pandas: Merge 2 dataframes based on timestamp and date range

Posted by gudnpqoy on 2023-11-15 in Other


I have two dataframes. The first has date ranges and the second has timestamps (both in ISO 8601 format).
df1:
| date_start|date_end|state|
| --|--|--|
| 2023-10-23T12-59-23-100Z| 2023-10-23T19-27-15-763Z|state3|
| 2023-10-24T08-00-00-100Z| 2023-10-24T11-00-46-331Z|state3|
df2:
| timestamp|value|
| --|--|
| 2023-10-23T17-18-56-341Z| 1500 |
| 2023-10-24T11-46-31-887Z| 4671 |
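I would like to get a filtered version of df2: a row of df2 is part of the output if at least one row of df1 satisfies date_end >= timestamp >= date_start for that row's timestamp.
Expected output:
| timestamp|value|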
| --|--|
| 2023-10-23T17-18-56-341Z| 1500 |
I wrote an algorithm that does this by iterating over df2 row by row, but I hope there is a better way. Thanks in advance!

pandas

Source: https://stackoverflow.com/questions/77451996/merge-2-dataframes-based-on-timestamp-and-date-range

2 Answers


zf2sa74q #1

You can do this with the merge_asof function in pandas. Try this:

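import pandas as pd

# Assuming df1 has columns 'start_date' and 'end_date', and df2 has 'timestamp',
# all already parsed to datetime (merge_asof needs sorted numeric/datetime keys)

# For each timestamp, attach the latest interval whose start_date <= timestamp
merged_df = pd.merge_asof(df2.sort_values('timestamp'), df1.sort_values('start_date'), left_on='timestamp', right_on='start_date')

# Keep only the rows whose timestamp also lies before the matched end_date
result_df = merged_df[(merged_df['timestamp'] >= merged_df['start_date']) & (merged_df['timestamp'] <= merged_df['end_date'])]

# Keep only the original df2 columns
output_df = result_df[df2.columns]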

This should give you a filtered version of df2 based on your condition.
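To make the merge_asof idea concrete, here is a minimal, self-contained sketch on the sample data from the question. It assumes the column names shown in the question's tables (`date_start`, `date_end`, `timestamp`, `value`) and that the strings follow the hyphen-separated layout shown there; the format string `FMT` and the helper column `ts` are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (column names are assumed).
df1 = pd.DataFrame({
    "date_start": ["2023-10-23T12-59-23-100Z", "2023-10-24T08-00-00-100Z"],
    "date_end": ["2023-10-23T19-27-15-763Z", "2023-10-24T11-00-46-331Z"],
    "state": ["state3", "state3"],
})
df2 = pd.DataFrame({
    "timestamp": ["2023-10-23T17-18-56-341Z", "2023-10-24T11-46-31-887Z"],
    "value": [1500, 4671],
})

# Assumed layout of the strings as displayed in the question.
FMT = "%Y-%m-%dT%H-%M-%S-%fZ"

# merge_asof needs datetime (or numeric) keys, sorted on both sides.
intervals = df1.assign(
    date_start=pd.to_datetime(df1["date_start"], format=FMT),
    date_end=pd.to_datetime(df1["date_end"], format=FMT),
).sort_values("date_start")
events = df2.assign(ts=pd.to_datetime(df2["timestamp"], format=FMT)).sort_values("ts")

# Backward search: each event gets the latest interval with date_start <= ts.
merged = pd.merge_asof(events, intervals, left_on="ts", right_on="date_start")

# The start bound already holds for matched rows, so only the end bound is checked.
# Events with no earlier interval get NaT here, and NaT comparisons are False.
inside = merged["ts"] <= merged["date_end"]
filtered = merged.loc[inside, ["timestamp", "value"]]
print(filtered)
```

Note that merge_asof only compares each timestamp with the single most recent interval by start, so this sketch is reliable only when the intervals in df1 do not overlap.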

Answered 2023-11-15

lmyy7pcs #2

I would use numpy to do the comparisons (>= and <=) element-wise, then keep only the timestamps that fall within at least one interval (using any):

import numpy as np

# check my first edit if you need to parse the datetime columns

start, end = df1["date_start"].to_numpy(), df1["date_end"].to_numpy()
timestamps = df2["timestamp"].to_numpy()[:, None]

# does the timestamp fall within at least one interval [start, end]?
m = np.logical_and(timestamps >= start, timestamps <= end).any(axis=1)

out = df2.loc[m]

Output:

print(out)

                  timestamp  value
0  2023-10-23T17-18-56-341Z   1500

[1 rows x 2 columns]
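The code above relies on the datetime columns being comparable. As a hedged sketch of the same broadcasting idea with explicit parsing, the version below converts the strings to datetime64 first. The function name `filter_in_any_interval` and the format string `FMT` are assumptions for illustration, as are the column names taken from the question's tables.

```python
import numpy as np
import pandas as pd

# Assumed layout of the strings as displayed in the question.
FMT = "%Y-%m-%dT%H-%M-%S-%fZ"

def filter_in_any_interval(events: pd.DataFrame, intervals: pd.DataFrame) -> pd.DataFrame:
    """Keep the rows of `events` whose timestamp lies in at least one interval."""
    # Column vector of shape (n_events, 1) so it broadcasts against (n_intervals,).
    ts = pd.to_datetime(events["timestamp"], format=FMT).to_numpy()[:, None]
    start = pd.to_datetime(intervals["date_start"], format=FMT).to_numpy()
    end = pd.to_datetime(intervals["date_end"], format=FMT).to_numpy()
    # Boolean matrix of shape (n_events, n_intervals), reduced along the intervals.
    mask = np.logical_and(ts >= start, ts <= end).any(axis=1)
    return events.loc[mask]

# Sample data copied from the question.
df1 = pd.DataFrame({
    "date_start": ["2023-10-23T12-59-23-100Z", "2023-10-24T08-00-00-100Z"],
    "date_end": ["2023-10-23T19-27-15-763Z", "2023-10-24T11-00-46-331Z"],
    "state": ["state3", "state3"],
})
df2 = pd.DataFrame({
    "timestamp": ["2023-10-23T17-18-56-341Z", "2023-10-24T11-46-31-887Z"],
    "value": [1500, 4671],
})

out = filter_in_any_interval(df2, df1)
print(out)
```

Unlike the merge_asof approach, this checks every interval, so overlapping intervals are handled. The trade-off is the intermediate boolean matrix of size len(df2) × len(df1), which can become large for big inputs.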


Answered 2023-11-15
