pandas: Group datetimes based on another array of datetimes (Python)

Posted by iaqfqrcu on 2023-02-27 in Python


Suppose I have two pandas DataFrames, each with one datetime column, as follows:
| Col_1 | Col_2 |
| --- | --- |
| 2023-01-02 | 2023-01-02 09:00 |
| 2023-01-03 | 2023-01-02 20:00 |
| 2023-01-04 | 2023-01-03 01:00 |
| 2023-01-05 | 2023-01-04 09:00 |
| 2023-01-08 | 2023-01-05 16:00 |
| | 2023-01-06 09:00 |
| | 2023-01-08 12:00 |
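I want to fit the Col_2 datetimes to the Col_1 dates using the following conditions:
If the time is before 12 pm, keep the same date when Col_1 contains that date; otherwise switch to the next date in Col_1. If the time is at or after 12 pm, switch to the next date in Col_1.
However, if the last value is at or after 12 pm and Col_1 has no later date, just add one day to the last date in Col_1.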
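For example, I would like Col_2 to end up like this:
| Col_1 | Col_2 |
| --- | --- |
| 2023-01-02 | 2023-01-02 |
| 2023-01-03 | 2023-01-03 |
| 2023-01-04 | 2023-01-03 |
| 2023-01-05 | 2023-01-04 |
| 2023-01-08 | 2023-01-08 |
| | 2023-01-08 |
| | 2023-01-09 |
Is there a pandas function that can do this? If not, how should I write the code?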

pandas

Source: https://stackoverflow.com/questions/75559862/group-the-datetime-based-on-another-array-of-the-datetime-in-pandas-python

1 answer


jtoj6r0c1#

You can write a custom function that looks up the matching date in df1, then apply it to df2['Col_2']:

import pandas as pd

df1 = pd.DataFrame({
    'Col_1': pd.to_datetime([
        '2023-01-02',
        '2023-01-03',
        '2023-01-04',
        '2023-01-05',
        '2023-01-08'
    ])
})

df2 = pd.DataFrame({
    'Col_2': pd.to_datetime([
        '2023-01-02 9:00',
        '2023-01-02 20:00',
        '2023-01-03 1:00',
        '2023-01-04 9:00',
        '2023-01-05 16:00',
        '2023-01-06 9:00',
        '2023-01-08 12:00'])
})

# Take a timestamp and return the matching reference date from df1['Col_1'].
# Assumes df1['Col_1'] is sorted in ascending order.
def get_date_from_df1(ts, df1):
    day = ts.floor(freq='d')
    matches = df1.index[df1['Col_1'] == day]  # rows with the same date
    later = df1.index[df1['Col_1'] > day]     # rows with a later date

    if len(matches) > 0 and ts.hour < 12:
        # before noon and the date exists in df1: keep the same date
        idx = matches[0]
    elif len(later) > 0:
        # at or after noon, or the date is missing from df1: use the next date
        idx = later[0]
    elif len(matches) > 0:
        # at or after noon on the last date in df1: add one day
        return day + pd.Timedelta(days=1)
    else:
        # no matching or later date in df1: return None (to avoid an error)
        return None

    return df1.loc[idx, 'Col_1']

df2['Col_2'] = df2['Col_2'].apply(lambda ts: get_date_from_df1(ts, df1))

Result:

>>> df2
       Col_2
0 2023-01-02
1 2023-01-03
2 2023-01-03
3 2023-01-04
4 2023-01-08
5 2023-01-08
6 2023-01-09
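
For larger frames, the row-wise apply can likely be replaced by a vectorized lookup. The following is only a sketch under some assumptions: `snap_to_reference` is a hypothetical helper name, the reference dates are unique and normalized to midnight, and any timestamp that falls past the last reference date (even one before noon) falls back to "last date plus one day", which goes slightly beyond what the question specifies. The idea is that adding 12 hours before flooring pushes noon-or-later timestamps onto the next calendar day, so one `np.searchsorted` call covers both rules.

```python
import numpy as np
import pandas as pd


def snap_to_reference(ts: pd.Series, ref: pd.Series) -> pd.Series:
    """Map each timestamp in `ts` to a date from `ref` (hypothetical helper).

    Assumes `ref` holds unique dates normalized to midnight.
    """
    ref_sorted = np.sort(ref.to_numpy())
    # Shift by 12 hours, then floor to the day: timestamps before noon stay on
    # their own day, timestamps at or after noon move to the next day.
    target = (ts + pd.Timedelta(hours=12)).dt.floor("D").to_numpy()
    # Position of the first reference date that is >= the target day.
    pos = np.searchsorted(ref_sorted, target, side="left")
    # Assumption: anything past the last reference date maps to last date + 1 day.
    fallback = ref_sorted[-1] + np.timedelta64(1, "D")
    padded = np.append(ref_sorted, fallback)
    return pd.Series(padded[pos], index=ts.index, name=ts.name)


col_1 = pd.Series(pd.to_datetime([
    "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05", "2023-01-08",
]), name="Col_1")
col_2 = pd.Series(pd.to_datetime([
    "2023-01-02 09:00", "2023-01-02 20:00", "2023-01-03 01:00",
    "2023-01-04 09:00", "2023-01-05 16:00", "2023-01-06 09:00",
    "2023-01-08 12:00",
]), name="Col_2")

result = snap_to_reference(col_2, col_1)
print(result)
```

On the sample data this should give the same seven dates as the apply-based version. If timestamps beyond the last reference date should stay unmatched instead, the fallback entry could be replaced with `np.datetime64("NaT")`.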

Answered 2023-02-27
