Python: merging DataFrames on columns

pvcm50d1  posted on 2023-03-28  in  Python

I have the following DataFrame, df1:
| startTimeIso|endTimeIso|id|
| --------------|--------------|--------------|
| 2023-03-07T03:28:56.969000|2023-03-07T03:29:25.396000|5|
| 2023-03-07T03:57:08.734000|2023-03-07T03:58:08.734000|7|
| 2023-03-07T04:18:08.734000|2023-03-07T04:20:10.271000|16|
| 2023-03-07T07:58:08.734000|2023-03-07T07:58:10.271000|21|
And a second one, df2:
| startTimeIso|endTimeIso|value|
| --------------|--------------|--------------|
| 2023-03-07T03:28:57.169000|2023-03-07T03:29:25.996000|True|
| 2023-03-07T03:57:08.734000|2023-03-07T03:58:08.734000|True|
| 2023-03-07T05:38:08.734000|2023-03-07T05:40:10.271000|True|
| 2023-03-07T07:58:08.934000|2023-03-07T07:58:10.371000|True|
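I want to check, for each row of df2, whether it matches a row of df1. A tolerance of 1 second is allowed, and both startTimeIso and endTimeIso should be taken into account.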
The result should look like this, df_merged:
| startTimeIso|endTimeIso|value|startTimeIso_y|endTimeIso_y|id|
| --------------|--------------|--------------|--------------|--------------|--------------|
| 2023-03-07T03:28:57.169000|2023-03-07T03:29:25.996000|True|2023-03-07T03:28:56.969000|2023-03-07T03:29:25.396000|5|
| 2023-03-07T03:57:08.734000|2023-03-07T03:58:08.734000|True|2023-03-07T03:57:08.734000|2023-03-07T03:58:08.734000|7|
| 2023-03-07T05:38:08.734000|2023-03-07T05:40:10.271000|True|None|None|None|
| 2023-03-07T07:58:08.934000|2023-03-07T07:58:10.371000|True|2023-03-07T07:58:08.734000|2023-03-07T07:58:10.271000|21|
Rows_found = 3
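
For anyone who wants to reproduce the problem, the two frames can be built roughly as follows. This is only a sketch: the values are copied from the tables in the question, and the column names `id` and `value` are taken from the printed output in the answer, so the dtypes in the asker's real data may differ.

```python
import pandas as pd

# df1: reference intervals with an id (values copied from the question)
df1 = pd.DataFrame({
    "startTimeIso": [
        "2023-03-07T03:28:56.969000",
        "2023-03-07T03:57:08.734000",
        "2023-03-07T04:18:08.734000",
        "2023-03-07T07:58:08.734000",
    ],
    "endTimeIso": [
        "2023-03-07T03:29:25.396000",
        "2023-03-07T03:58:08.734000",
        "2023-03-07T04:20:10.271000",
        "2023-03-07T07:58:10.271000",
    ],
    "id": [5, 7, 16, 21],
})

# df2: intervals to look up in df1 (values copied from the question)
df2 = pd.DataFrame({
    "startTimeIso": [
        "2023-03-07T03:28:57.169000",
        "2023-03-07T03:57:08.734000",
        "2023-03-07T05:38:08.734000",
        "2023-03-07T07:58:08.934000",
    ],
    "endTimeIso": [
        "2023-03-07T03:29:25.996000",
        "2023-03-07T03:58:08.734000",
        "2023-03-07T05:40:10.271000",
        "2023-03-07T07:58:10.371000",
    ],
    "value": [True, True, True, True],
})

print(df1)
print(df2)
```

The timestamps are deliberately left as strings here, since that is how they appear in the question; they need to be parsed before any time-based merge.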

Answer #1 (km0tfn4u)

Use merge_asof with a tolerance:

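import pandas as pd

# Parse the ISO-8601 strings into datetimes; merge_asof needs sortable, comparable keys
df1[['startTimeIso', 'endTimeIso']] = df1[['startTimeIso', 'endTimeIso']].apply(pd.to_datetime)
df2[['startTimeIso', 'endTimeIso']] = df2[['startTimeIso', 'endTimeIso']].apply(pd.to_datetime)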

out = pd.merge_asof(
    df2.sort_values(by='startTimeIso'),
    df1.sort_values(by='startTimeIso')
       .rename(columns={'startTimeIso': 'startTimeIso_y'}),
    left_on='startTimeIso', right_on='startTimeIso_y',
    direction='nearest', tolerance=pd.Timedelta('1s'),
    suffixes=(None, '_y')
)

print(out)

Output:

             startTimeIso              endTimeIso  value            endTimeIso_y    id
0 2023-03-07 03:28:57.169 2023-03-07 03:29:25.996   True 2023-03-07 03:29:25.396   5.0
1 2023-03-07 03:57:08.734 2023-03-07 03:58:08.734   True 2023-03-07 03:58:08.734   7.0
2 2023-03-07 05:38:08.734 2023-03-07 05:40:10.271   True                     NaT   NaN
3 2023-03-07 07:58:08.934 2023-03-07 07:58:10.371   True 2023-03-07 07:58:10.271  21.0
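
The question also asks for a `Rows_found` count. One way to get it, sketched here under the assumption that a row counts as "found" when the merge filled in an `id`, is to count the non-null values in that column. The frames `left` and `right` below are cut-down stand-ins for df2 and df1 that carry only the start times, and `rows_found` is a name chosen for this example.

```python
import pandas as pd

# Cut-down stand-in for df2: only the start times (already sorted)
left = pd.DataFrame({
    "startTimeIso": pd.to_datetime([
        "2023-03-07T03:28:57.169000",
        "2023-03-07T03:57:08.734000",
        "2023-03-07T05:38:08.734000",
        "2023-03-07T07:58:08.934000",
    ]),
})

# Cut-down stand-in for df1: start times (already renamed) and id
right = pd.DataFrame({
    "startTimeIso_y": pd.to_datetime([
        "2023-03-07T03:28:56.969000",
        "2023-03-07T03:57:08.734000",
        "2023-03-07T04:18:08.734000",
        "2023-03-07T07:58:08.734000",
    ]),
    "id": [5, 7, 16, 21],
})

out = pd.merge_asof(
    left, right,
    left_on="startTimeIso", right_on="startTimeIso_y",
    direction="nearest", tolerance=pd.Timedelta("1s"),
)

# Unmatched rows get NaN in the right-hand columns, so count the filled ones
rows_found = int(out["id"].notna().sum())
print(rows_found)  # 3
```

Note that `id` becomes a float column after the merge because the unmatched row holds NaN, which is why the answer's output shows `5.0` rather than `5`.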

If you want to match on either the start or the end, run two merges and combine them with combine_first:

out1 = pd.merge_asof(
    df2.sort_values(by='startTimeIso').reset_index(),
    df1.sort_values(by='startTimeIso')
       .rename(columns={'startTimeIso': 'startTimeIso_y'}),
    left_on='startTimeIso', right_on='startTimeIso_y',
    direction='nearest', tolerance=pd.Timedelta('1s'),
    suffixes=(None, '_y')
)

out2 = pd.merge_asof(
    df2.sort_values(by='endTimeIso').reset_index(),
    df1.sort_values(by='endTimeIso')
       .rename(columns={'endTimeIso': 'endTimeIso_y'}),
    left_on='endTimeIso', right_on='endTimeIso_y',
    direction='nearest', tolerance=pd.Timedelta('1s'),
    suffixes=(None, '_y')
)

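# Align both results on df2's original row labels before combining,
# so rows pair up correctly even if the two sort orders differ
out = out1.set_index('index').combine_first(out2.set_index('index'))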

print(out)
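
The approach above accepts a match on the start or the end. The question's wording suggests both should be within tolerance, so here is a possible variant, offered as a sketch rather than as part of the original answer: merge on the start time, then blank out the right-hand columns wherever the end times differ by more than the tolerance. The names `tol`, `end_ok` and `right_cols` are introduced for this example only.

```python
import pandas as pd

tol = pd.Timedelta("1s")

df1 = pd.DataFrame({
    "startTimeIso": pd.to_datetime([
        "2023-03-07T03:28:56.969000", "2023-03-07T03:57:08.734000",
        "2023-03-07T04:18:08.734000", "2023-03-07T07:58:08.734000",
    ]),
    "endTimeIso": pd.to_datetime([
        "2023-03-07T03:29:25.396000", "2023-03-07T03:58:08.734000",
        "2023-03-07T04:20:10.271000", "2023-03-07T07:58:10.271000",
    ]),
    "id": [5, 7, 16, 21],
})
df2 = pd.DataFrame({
    "startTimeIso": pd.to_datetime([
        "2023-03-07T03:28:57.169000", "2023-03-07T03:57:08.734000",
        "2023-03-07T05:38:08.734000", "2023-03-07T07:58:08.934000",
    ]),
    "endTimeIso": pd.to_datetime([
        "2023-03-07T03:29:25.996000", "2023-03-07T03:58:08.734000",
        "2023-03-07T05:40:10.271000", "2023-03-07T07:58:10.371000",
    ]),
    "value": [True, True, True, True],
})

# Step 1: nearest match on the start time, within the tolerance
out = pd.merge_asof(
    df2.sort_values("startTimeIso"),
    df1.sort_values("startTimeIso").rename(
        columns={"startTimeIso": "startTimeIso_y", "endTimeIso": "endTimeIso_y"}
    ),
    left_on="startTimeIso", right_on="startTimeIso_y",
    direction="nearest", tolerance=tol,
)

# Step 2: keep the match only if the end times also agree within the tolerance
# (comparisons against NaT are False, so unmatched rows stay unmatched)
end_ok = (out["endTimeIso"] - out["endTimeIso_y"]).abs() <= tol
right_cols = ["startTimeIso_y", "endTimeIso_y", "id"]
for col in right_cols:
    out[col] = out[col].where(end_ok)

rows_found = int(out["id"].notna().sum())
print(out)
print(rows_found)  # 3
```

One caveat: merge_asof picks only the single nearest start time per row, so if several df1 rows start within the tolerance, a candidate whose end time would have matched can be missed. For the sample data this does not occur.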
