pandas: Is there a way in Python to find overlapping time periods across two DataFrames and return the minimum and maximum timestamps?

Posted by x33g5p2x on 2023-08-01 in Python

I have two Pandas DataFrames of events, each with the start and end time of a time period:

DF1

Group        amin             amax
1   2023-07-03 10:45:00 2023-07-03 16:00:00
2   2023-07-04 11:00:00 2023-07-04 11:00:00
3   2023-07-04 11:30:00 2023-07-04 18:15:00


DF2

Group        amin             amax  
1   2023-07-03 13:30:00 2023-07-03 13:30:00
2   2023-07-03 14:30:00 2023-07-03 15:30:00
3   2023-07-03 16:30:00 2023-07-03 16:30:00
4   2023-07-03 17:00:00 2023-07-03 17:00:00
5   2023-07-04 15:45:00 2023-07-04 16:30:00


Ideally, I would like to iterate over both DataFrames to build a new DataFrame that finds the overlaps between them and gives the minimum and maximum of each overall overlap:

Group        amin             amax  
1   2023-07-03 10:45:00 2023-07-03 17:00:00
2   2023-07-04 11:30:00 2023-07-04 18:15:00


Does anyone have suggestions on how to do this? Thank you.

xriantvc (answer #1)

Using the DataFrames you provided:

import pandas as pd

df1 = pd.DataFrame(
    {
        "amin": ["2023-07-03 10:45:00", "2023-07-04 11:00:00", "2023-07-04 11:30:00"],
        "amax": ["2023-07-03 16:00:00", "2023-07-04 11:00:00", "2023-07-04 18:15:00"],
    }
)

df2 = pd.DataFrame(
    {
        "amin": [
            "2023-07-03 13:30:00",
            "2023-07-03 14:30:00",
            "2023-07-03 16:30:00",
            "2023-07-03 17:00:00",
            "2023-07-04 15:45:00",
        ],
        "amax": [
            "2023-07-03 13:30:00",
            "2023-07-03 15:30:00",
            "2023-07-03 16:30:00",
            "2023-07-03 17:00:00",
            "2023-07-04 16:30:00",
        ],
    }
)

Here is one way to do it using Pandas concat, shift, and boolean indexing:

# Merge and use proper dtypes
df = pd.concat([df1, df2]).sort_values(["amin", "amax"], ignore_index=True)
for col in ("amin", "amax"):
    df[col] = pd.to_datetime(df[col])

# Remove rows with no intersections
df = df[
    ~(
        (df["amin"] >= df["amax"].shift(1))
        & (df["amax"] <= df["amin"].shift(-1).fillna(df["amax"].max()))
    )
]

# Remove rows contained within others
df = df[
    ~(
        ((df["amin"] >= df["amin"].shift(-1)) & (df["amax"] <= df["amax"].shift(-1)))
        | ((df["amin"] >= df["amin"].shift(1)) & (df["amax"] <= df["amax"].shift(1)))
    )
].reset_index(drop=True)
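
The masks above compare each row with its neighbours through shift. As a minimal sketch of that mechanism (the toy series `starts` and `ends` are made up for illustration and are not part of the answer), shift(1) lines each row up with the previous row, shift(-1) with the next one, and the row that has no neighbour gets NaT, which makes any comparison False:

```python
import pandas as pd

# Hypothetical toy data: three periods, already sorted by start time
starts = pd.Series(pd.to_datetime(["2023-07-03 10:45", "2023-07-03 13:30", "2023-07-03 14:30"]))
ends = pd.Series(pd.to_datetime(["2023-07-03 16:00", "2023-07-03 13:30", "2023-07-03 15:30"]))

prev_end = ends.shift(1)       # row i holds the end of row i-1; row 0 becomes NaT
next_start = starts.shift(-1)  # row i holds the start of row i+1; the last row becomes NaT

# True where a row starts at or after the end of the row just before it.
# Comparisons against NaT evaluate to False, so row 0 is never flagged.
starts_after_prev = starts >= prev_end
print(starts_after_prev.tolist())
```

This is presumably why the answer fills the last shifted value with fillna: without it, the final row could never satisfy the "ends before the next row starts" condition.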


Printing the resulting df gives:

print(df)
# Output

                 amin                amax
0 2023-07-03 10:45:00 2023-07-03 16:00:00
1 2023-07-04 11:30:00 2023-07-04 18:15:00
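
The neighbour-based masks above only look one row back and one row ahead. As an alternative sketch, not the answer's method, overlapping periods can be chained into clusters with a running maximum of the end times, keeping only clusters that contain rows from both frames. The helper name `overlapping_spans` and the columns `src` and `n_src` are hypothetical names introduced for illustration, and the sketch assumes that periods which merely touch count as overlapping:

```python
import pandas as pd

df1 = pd.DataFrame({
    "amin": ["2023-07-03 10:45:00", "2023-07-04 11:00:00", "2023-07-04 11:30:00"],
    "amax": ["2023-07-03 16:00:00", "2023-07-04 11:00:00", "2023-07-04 18:15:00"],
})
df2 = pd.DataFrame({
    "amin": ["2023-07-03 13:30:00", "2023-07-03 14:30:00", "2023-07-03 16:30:00",
             "2023-07-03 17:00:00", "2023-07-04 15:45:00"],
    "amax": ["2023-07-03 13:30:00", "2023-07-03 15:30:00", "2023-07-03 16:30:00",
             "2023-07-03 17:00:00", "2023-07-04 16:30:00"],
})


def overlapping_spans(a, b):
    """Hypothetical helper: min/max of each cluster of overlapping periods
    that contains at least one row from each input frame."""
    # Tag every row with its source frame, then stack and sort by start time
    both = pd.concat([a.assign(src=1), b.assign(src=2)], ignore_index=True)
    both["amin"] = pd.to_datetime(both["amin"])
    both["amax"] = pd.to_datetime(both["amax"])
    both = both.sort_values(["amin", "amax"], ignore_index=True)

    # Latest end time seen among all earlier rows (NaT for the first row)
    running_end = both["amax"].cummax().shift()
    # A new cluster starts when a row begins after everything before it has ended
    cluster = (both["amin"] > running_end).cumsum().rename("cluster")

    spans = both.groupby(cluster).agg(
        amin=("amin", "min"),
        amax=("amax", "max"),
        n_src=("src", "nunique"),
    )
    # Keep clusters that include rows from both frames
    return spans.loc[spans["n_src"] == 2, ["amin", "amax"]].reset_index(drop=True)


result = overlapping_spans(df1, df2)
print(result)
```

On the sample data this should give the same two rows as the output above. Note that clusters are chained transitively, so a cluster may also merge periods from the same frame if a period from the other frame bridges them.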
