Calculating the delta between time series peaks with pandas

np8igboo  posted on 2023-04-10  in Other

I have two pandas DataFrames, as follows:

df1:

| dt | z |
| --- | --- |
| 1970-01-17 21:39:10.304 | 0.87602 |
| 1970-01-17 21:39:10.344 | 0.99907 |
| … | … |

df2:

| dt | z |
| --- | --- |
| 1970-01-17 21:39:23.312 | 0.84904 |
| 1970-01-17 21:39:23.352 | 1.00542 |
| … | … |

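Here dt is the index and z (with values between 0 and 2) is the only column, so each DataFrame can be plotted as a time series. I plotted both time series.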

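The two DataFrames should contain roughly the same values, because both of them registered the same events. The problem is that they recorded those values with a time delay relative to each other.
I need to calculate the delta between the two time series, in chronological order. Is it possible to compute this starting from the DataFrames?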

jjjwad0x  #1

Edited following the OP's update.
See the comments in the code.

import numpy as np
import pandas as pd
from datetime import timedelta

# sample data for two data frames
df1 = pd.DataFrame(
    data=np.random.rand(20) + 1,
    columns=["col"],
    index=pd.date_range(
        start="2023-04-01T00:00",
        end="2023-04-07T00:00",
        periods=20)
)

df2 = pd.DataFrame(
    data=np.random.rand(20) + 1,
    columns=["col"],
    index=pd.date_range(
        start="2023-04-01T00:00",
        end="2023-04-07T00:00",
        periods=20)
)

# choose according to your needs
offset_threshold = timedelta(days=1, hours=6)

# choose according to your needs
peak_strength_threshold = 1.8

# highlight peaks by applying some threshold or "peak detection"
df1["peak"] = df1["col"] > peak_strength_threshold
df2["peak"] = df2["col"] > peak_strength_threshold

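# the peaks of df2 only need to be selected once, not on every iteration
df2_peaks = df2[df2.peak]

for index1, row1 in df1.iterrows():
    # skip non-peaks; also skip everything if df2 has no peak to match against
    if row1.peak and not df2_peaks.empty:
        # find the df2 peak that is closest in time to this df1 peak
        best_match_in_df2 = df2_peaks.iloc[abs(df2_peaks.index - index1).argmin()]

        # signed offset: positive when the df1 peak is later than its df2 match
        delta = index1 - best_match_in_df2.name

        # only report the match if it is close enough in time
        if abs(delta) < offset_threshold:
            print(delta)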
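
For longer series, the row-by-row loop above could probably be replaced by a single nearest-neighbour join. The following is only a sketch of that idea using `pd.merge_asof` with `direction="nearest"`. The helper name `match_peaks`, the column names `t1`, `t2` and `delta`, and the demo data are assumptions made for illustration and are not part of the original answer. Note that `merge_asof` handles pairs lying exactly at the tolerance boundary by its own rules, which may differ from the strict `<` comparison used in the loop.

```python
import numpy as np
import pandas as pd


def match_peaks(df1, df2, column="col", peak_threshold=1.8,
                tolerance=pd.Timedelta(days=1, hours=6)):
    """Pair every peak of df1 with the nearest-in-time peak of df2.

    Returns one row per matched pair with columns t1, t2 and delta = t1 - t2.
    """
    # Same simple thresholding as in the loop version; keep only peak timestamps.
    peaks1 = pd.DataFrame({"t1": df1[df1[column] > peak_threshold].index})
    peaks2 = pd.DataFrame({"t2": df2[df2[column] > peak_threshold].index})

    # merge_asof requires both key columns to be sorted.
    peaks1 = peaks1.sort_values("t1")
    peaks2 = peaks2.sort_values("t2")

    # For each t1, take the closest t2 in either direction; pairs further
    # apart than `tolerance` stay unmatched (t2 becomes NaT).
    merged = pd.merge_asof(
        peaks1, peaks2,
        left_on="t1", right_on="t2",
        direction="nearest", tolerance=tolerance,
    )

    # Drop unmatched peaks and compute the signed offset for the rest.
    matched = merged.dropna(subset=["t2"]).reset_index(drop=True)
    matched["delta"] = matched["t1"] - matched["t2"]
    return matched


# Hypothetical demo data: the second series lags the first by 3 hours.
index = pd.date_range("2023-04-01", periods=10, freq="D")
values = np.ones(10)
values[[2, 7]] = 1.9  # two artificial peaks above the threshold
demo1 = pd.DataFrame({"col": values}, index=index)
demo2 = pd.DataFrame({"col": values}, index=index + pd.Timedelta(hours=3))

print(match_peaks(demo1, demo2))
```

With the demo data, both peaks are matched and each delta is minus three hours, because every peak in `demo2` occurs three hours after its counterpart in `demo1`. As in the loop, several peaks of `df1` may be paired with the same peak of `df2`; whether that is acceptable depends on the data.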
