Python: how to calculate the time difference between two dates in a pandas DataFrame

mitkmikd posted on 2023-02-02 in Python

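I have a DataFrame whose rows contain several date columns, and each date column holds both a date and a time. The rows are not spaced by a fixed time increment, so for each row I want to compute the time difference, in seconds, between the current datetime and the previous one.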

import pandas as pd
data = pd.date_range("1/1/2011", periods=10, freq="h")

In the snippet above the step between consecutive timestamps is 1 hour, i.e. 3600 seconds, so I want a list of tuples of the form [(<prev_datetime>, <current_datetime>, <time_difference>), ...].

Answer #1 (vxf3dgd4)

    • "I want a list of tuples of the form [(previous datetime, current datetime, time difference), ...]"

In that case, combine list with zip and compute the time difference with total_seconds:

import pandas as pd

data = pd.date_range("1/1/2011", periods=10, freq="h")

# Pair each timestamp with its predecessor by position (also works for irregular steps)
L = list(zip(data[:-1],                                # <- previous time
             data[1:],                                 # <- current time
             (data[1:] - data[:-1]).total_seconds()))  # <- time diff in seconds

Note: if you are working with a DataFrame, replace data with pd.DatetimeIndex(df["date_column"]). Slicing the Series directly would make the subtraction align on the index labels instead of pairing rows by position.
Output:

print(L)

[(Timestamp('2011-01-01 00:00:00'),
  Timestamp('2011-01-01 01:00:00'),
  3600.0),
 (Timestamp('2011-01-01 01:00:00'),
  Timestamp('2011-01-01 02:00:00'),
  3600.0),
 (Timestamp('2011-01-01 02:00:00'),
  Timestamp('2011-01-01 03:00:00'),
  3600.0),
 (Timestamp('2011-01-01 03:00:00'),
  Timestamp('2011-01-01 04:00:00'),
  3600.0),
 (Timestamp('2011-01-01 04:00:00'),
  Timestamp('2011-01-01 05:00:00'),
  3600.0),
  ...
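
Since the question's real data is not evenly spaced, here is a minimal sketch of the same list/zip/total_seconds idea applied to a DataFrame column with irregular steps. The column name ts and the timestamps are hypothetical; converting the column to a DatetimeIndex keeps the slicing and subtraction positional.

```python
import pandas as pd

# Hypothetical DataFrame with irregular timestamps in a column named "ts"
df = pd.DataFrame({"ts": pd.to_datetime([
    "2011-01-01 00:00:00",
    "2011-01-01 00:30:00",
    "2011-01-01 02:00:00",
    "2011-01-01 02:00:45",
])})

# An Index subtracts element by element, so no label alignment happens
ts = pd.DatetimeIndex(df["ts"])

# (previous, current, seconds) for every consecutive pair of rows
pairs = list(zip(ts[:-1], ts[1:], (ts[1:] - ts[:-1]).total_seconds()))

for prev, cur, secs in pairs:
    print(prev, cur, secs)
```

N rows give N-1 tuples, because the first row has no predecessor.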

Answer #2 (nx7onnlm)

You can do this with pandas' diff function, which computes the difference between consecutive rows of the date column.

import pandas as pd

df = pd.DataFrame({"date": pd.date_range("1/1/2011", periods=10, freq="h")})

# Calculate the time difference between consecutive rows in seconds (NaN in the first row)
df["time_diff"] = df["date"].diff().dt.total_seconds()

# Create a list of (previous, current, diff) tuples; shift() moves rows by position,
# and [1:] drops the first row, which has no predecessor
result = list(zip(df["date"].shift(), df["date"], df["time_diff"]))[1:]

df

                 date  time_diff
0 2011-01-01 00:00:00        NaN
1 2011-01-01 01:00:00     3600.0
2 2011-01-01 02:00:00     3600.0
3 2011-01-01 03:00:00     3600.0
4 2011-01-01 04:00:00     3600.0
5 2011-01-01 05:00:00     3600.0
6 2011-01-01 06:00:00     3600.0
7 2011-01-01 07:00:00     3600.0
8 2011-01-01 08:00:00     3600.0
9 2011-01-01 09:00:00     3600.0

result

[(Timestamp('2011-01-01 00:00:00'), Timestamp('2011-01-01 01:00:00'), 3600.0),
 (Timestamp('2011-01-01 01:00:00'), Timestamp('2011-01-01 02:00:00'), 3600.0),
 (Timestamp('2011-01-01 02:00:00'), Timestamp('2011-01-01 03:00:00'), 3600.0),
 (Timestamp('2011-01-01 03:00:00'), Timestamp('2011-01-01 04:00:00'), 3600.0),
 (Timestamp('2011-01-01 04:00:00'), Timestamp('2011-01-01 05:00:00'), 3600.0),
 (Timestamp('2011-01-01 05:00:00'), Timestamp('2011-01-01 06:00:00'), 3600.0),
 (Timestamp('2011-01-01 06:00:00'), Timestamp('2011-01-01 07:00:00'), 3600.0),
 (Timestamp('2011-01-01 07:00:00'), Timestamp('2011-01-01 08:00:00'), 3600.0),
 (Timestamp('2011-01-01 08:00:00'), Timestamp('2011-01-01 09:00:00'), 3600.0)]
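
The diff approach does not depend on a fixed frequency, so it also covers the question's irregular rows. A minimal sketch, assuming hypothetical timestamps and a hypothetical helper column prev_date: storing the shifted column next to the diff keeps everything in the DataFrame, and itertuples(index=False, name=None) then yields plain tuples.

```python
import pandas as pd

# Hypothetical irregularly spaced timestamps (not one fixed step per row)
df = pd.DataFrame({"date": pd.to_datetime([
    "2011-01-01 00:00:00",
    "2011-01-01 00:00:10",
    "2011-01-01 01:00:10",
    "2011-01-02 01:00:10",
])})

df["prev_date"] = df["date"].shift()                    # previous row; NaT in row 0
df["time_diff"] = df["date"].diff().dt.total_seconds()  # seconds; NaN in row 0

# Skip the first row (no predecessor) and emit (previous, current, diff) tuples
result = list(
    df[["prev_date", "date", "time_diff"]].iloc[1:].itertuples(index=False, name=None)
)
print(result)
```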

Answer #3 (bvjveswy)

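This can be done with a list comprehension. DatetimeIndex.shift(1) moves every timestamp forward by one freq step (here +1 hour), so zipping data with data.shift(1) gives 10 pairs, and the last one pairs 09:00 with 10:00, a point that is not in the data. Since N points have only N-1 intervals, [:-1] drops that extra pair. This relies on the index having a regular freq; for irregular timestamps, shift raises an error.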

result = [(i[0],
           i[1],
           (i[1] - i[0]).total_seconds())
          for i in list(zip(data, data.shift(1)))[:-1]]

print(result)
[(Timestamp('2011-01-01 00:00:00', freq='H'),
  Timestamp('2011-01-01 01:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 01:00:00', freq='H'),
  Timestamp('2011-01-01 02:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 02:00:00', freq='H'),
  Timestamp('2011-01-01 03:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 03:00:00', freq='H'),
  Timestamp('2011-01-01 04:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 04:00:00', freq='H'),
  Timestamp('2011-01-01 05:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 05:00:00', freq='H'),
  Timestamp('2011-01-01 06:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 06:00:00', freq='H'),
  Timestamp('2011-01-01 07:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 07:00:00', freq='H'),
  Timestamp('2011-01-01 08:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 08:00:00', freq='H'),
  Timestamp('2011-01-01 09:00:00', freq='H'),
  3600.0)]
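
The comprehension above depends on DatetimeIndex.shift, which needs the index to carry a frequency. A minimal sketch, assuming a hypothetical irregular index, showing that shift(1) fails there and that the same comprehension works when neighbours are paired by position with zip(data, data[1:]):

```python
import pandas as pd

# Hypothetical irregular index: its freq is None
data = pd.DatetimeIndex(["2011-01-01 00:00", "2011-01-01 00:20", "2011-01-01 03:00"])

try:
    data.shift(1)
    shift_ok = True
except ValueError as err:  # pandas raises NullFrequencyError, a ValueError subclass
    shift_ok = False
    print("shift failed:", err)

# Same comprehension, pairing neighbours by position; zip stops at the shorter
# input, so no [:-1] is needed
result = [(prev, cur, (cur - prev).total_seconds())
          for prev, cur in zip(data, data[1:])]
print(result)
```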
