pandas: Is there a way to find overlaps in time periods in two DataFrames in Python and return the max and min timestamps?

Posted by x33g5p2x on 2023-08-01 in Python


I have two Pandas dataframes of events, each with start and end times for its time periods:

DF1

Group        amin             amax
1   2023-07-03 10:45:00 2023-07-03 16:00:00
2   2023-07-04 11:00:00 2023-07-04 11:00:00
3   2023-07-04 11:30:00 2023-07-04 18:15:00


DF2

Group        amin             amax  
1   2023-07-03 13:30:00 2023-07-03 13:30:00
2   2023-07-03 14:30:00 2023-07-03 15:30:00
3   2023-07-03 16:30:00 2023-07-03 16:30:00
4   2023-07-03 17:00:00 2023-07-03 17:00:00
5   2023-07-04 15:45:00 2023-07-04 16:30:00

Ideally, I would like to iterate over the two dataframes to create a new dataframe that finds the overlaps between them and gives the minimum and maximum of each overall overlap:

Group        amin             amax  
1   2023-07-03 10:45:00 2023-07-03 17:00:00
2   2023-07-04 11:30:00 2023-07-04 18:15:00

Does anyone have any suggestions on how to do this? Thank you.

pandas

Source: https://stackoverflow.com/questions/76685578/is-there-a-way-to-find-overlaps-in-time-periods-in-two-dataframes-on-python-and

1 Answer


xriantvc1#

Using the dataframes you provided:

import pandas as pd

df1 = pd.DataFrame(
    {
        "amin": ["2023-07-03 10:45:00", "2023-07-04 11:00:00", "2023-07-04 11:30:00"],
        "amax": ["2023-07-03 16:00:00", "2023-07-04 11:00:00", "2023-07-04 18:15:00"],
    }
)

df2 = pd.DataFrame(
    {
        "amin": [
            "2023-07-03 13:30:00",
            "2023-07-03 14:30:00",
            "2023-07-03 16:30:00",
            "2023-07-03 17:00:00",
            "2023-07-04 15:45:00",
        ],
        "amax": [
            "2023-07-03 13:30:00",
            "2023-07-03 15:30:00",
            "2023-07-03 16:30:00",
            "2023-07-03 17:00:00",
            "2023-07-04 16:30:00",
        ],
    }
)

Here is one way to do it with Pandas concat, shift and boolean indexing:

# Combine both frames and convert to datetime so the sort is chronological
df = pd.concat([df1, df2], ignore_index=True)
for col in ("amin", "amax"):
    df[col] = pd.to_datetime(df[col])
df = df.sort_values(["amin", "amax"], ignore_index=True)

# Remove rows with no intersections
df = df[
    ~(
        (df["amin"] >= df["amax"].shift(1))
        & (df["amax"] <= df["amin"].shift(-1).fillna(df["amax"].max()))
    )
]

# Remove rows contained within others
df = df[
    ~(
        ((df["amin"] >= df["amin"].shift(-1)) & (df["amax"] <= df["amax"].shift(-1)))
        | ((df["amin"] >= df["amin"].shift(1)) & (df["amax"] <= df["amax"].shift(1)))
    )
].reset_index(drop=True)
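
The filters above rely on `shift` to line each row up with its neighbors once the rows are sorted. As a minimal sketch of that alignment, here is a hypothetical three-row frame named `demo` (the frame and the helper columns `prev_amax`, `next_amin` and `starts_after_prev` are illustrative only and not part of the original data):

```python
import pandas as pd

# Hypothetical frame, already sorted by start time
demo = pd.DataFrame(
    {
        "amin": pd.to_datetime(
            ["2023-01-01 08:00", "2023-01-01 09:00", "2023-01-01 12:00"]
        ),
        "amax": pd.to_datetime(
            ["2023-01-01 10:00", "2023-01-01 09:30", "2023-01-01 13:00"]
        ),
    }
)

# shift(1) pairs each row with the previous row's value,
# shift(-1) pairs it with the next row's value
demo["prev_amax"] = demo["amax"].shift(1)
demo["next_amin"] = demo["amin"].shift(-1)

# True when a row starts at or after the end of the row just before it.
# The first row has no predecessor (NaT), so its comparison is False.
demo["starts_after_prev"] = demo["amin"] >= demo["prev_amax"]

print(demo)
```

Because only immediate neighbors are compared, the result depends on the rows being sorted first.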

Printing the result gives:

print(df)
# Output

                 amin                amax
0 2023-07-03 10:45:00 2023-07-03 16:00:00
1 2023-07-04 11:30:00 2023-07-04 18:15:00
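
As a possible alternative, the same kind of result can be sketched by clustering intervals with a running maximum of the end times, rather than comparing only adjacent rows. This is not the method from the answer above. The function name `merge_overlaps` and the helper column `source` are assumptions made for illustration, and the sketch assumes that an overlap only counts when a cluster contains rows from both dataframes:

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "amin": ["2023-07-03 10:45:00", "2023-07-04 11:00:00", "2023-07-04 11:30:00"],
        "amax": ["2023-07-03 16:00:00", "2023-07-04 11:00:00", "2023-07-04 18:15:00"],
    }
)
df2 = pd.DataFrame(
    {
        "amin": ["2023-07-03 13:30:00", "2023-07-03 14:30:00", "2023-07-03 16:30:00",
                 "2023-07-03 17:00:00", "2023-07-04 15:45:00"],
        "amax": ["2023-07-03 13:30:00", "2023-07-03 15:30:00", "2023-07-03 16:30:00",
                 "2023-07-03 17:00:00", "2023-07-04 16:30:00"],
    }
)


def merge_overlaps(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return min/max bounds of interval clusters that span both frames."""
    # Tag each row with its origin so cross-frame overlaps can be identified
    df = pd.concat([left.assign(source=1), right.assign(source=2)], ignore_index=True)
    for col in ("amin", "amax"):
        df[col] = pd.to_datetime(df[col])
    df = df.sort_values(["amin", "amax"], ignore_index=True)

    # A new cluster starts when a row begins after every earlier row has ended
    running_end = df["amax"].cummax().shift()
    cluster = (df["amin"] > running_end).cumsum()

    out = df.groupby(cluster).agg(
        amin=("amin", "min"),
        amax=("amax", "max"),
        sources=("source", "nunique"),
    )
    # Keep only clusters that contain rows from both frames
    return out.loc[out["sources"] == 2, ["amin", "amax"]].reset_index(drop=True)


result = merge_overlaps(df1, df2)
print(result)
```

On the sample data this yields the same two rows as the output above. Intervals that merely touch (a start equal to the running end) are treated as overlapping here; changing `>` to `>=` would separate them.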
