pandas 如何获取特定时间间隔内的最新行以形成 Dataframe ?

c9qzyr3d  于 2022-12-21  发布在  其他
关注(0)|答案(2)|浏览(218)

假设3个孩子正在比赛,看谁能在几天内卖出最多的糖果、巧克力棒和饼干。他们在当天的08:15:00(上午8:15)开始比赛,并同意将他们的销售输入到跟踪器中,如下面的数据框所示:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Harvey', 'Khala', 'Gaddy', 'Harvey', 'Khala', 'Gaddy', 'Harvey', 'Khala', 'Gaddy', 'Khala', 'Harvey', 'Gaddy'],
    'Timestamp': ['2022-01-01 08:17:23.12', '2022-01-01 08:22:58.76', '2022-01-01 08:19:02.57', '2022-01-01 08:55:43.99','2022-01-01 08:41:23.10', '2022-01-01 09:14:59.99', '2022-01-01 09:15:02.02', '2022-01-01 09:44:43.30','2022-01-01 09:54:23.71', '2022-01-01 10:15:00.00', '2022-01-01 10:15:02.99', '2022-01-01 10:19:43.52'],
    'Candy': [2, 1, 3, 3, 5, 4, 6, 6, 4, 10, 9, 14],
    'Chocolate Bars': [4, np.nan, 6, 7, 8, 6, 7, 13, 10, 19, 11, 11],
    'Cookies': [1, 1, 4, 2, 4, 5, 5, 8, 11, 8, 15, 17]
})
    
    Name    Timestamp   Candy   Chocolate Bars  Cookies
0   Harvey  2022-01-01 08:17:23.12  2   4   1
1   Khala   2022-01-01 08:22:58.76  1   NaN 1
2   Gaddy   2022-01-01 08:19:02.57  3   6   4
3   Harvey  2022-01-01 08:55:43.99  3   7   2
4   Khala   2022-01-01 08:41:23.10  5   8   4
5   Gaddy   2022-01-01 09:14:59.99  4   6   5
6   Harvey  2022-01-01 09:15:02.02  6   7   5
7   Khala   2022-01-01 09:44:43.30  6   13  8
8   Gaddy   2022-01-01 09:54:23.71  4   10  11
9   Khala   2022-01-01 10:15:00.00  10  19  8
10  Harvey  2022-01-01 10:15:02.99  9   11  15
11  Gaddy   2022-01-01 10:19:43.52  14  11  17

现在,我们的目的是创建一个新的数据框,以1小时为间隔(一小时的窗口示例为08:15:00.00 - 09:14:59.99)捕获每个孩子的最新销售以及捕获他们的窗口。因此,数据框将如下所示:

Name    Window  Timestamp   Candy   Chocolate Bars  Cookies
1   Harvey  09:15:00.00 2022-01-01 08:55:43.99  3   7   2
2   Khala   09:15:00.00 2022-01-01 08:41:23.10  5   8   4
3   Gaddy   09:15:00.00 2022-01-01 09:14:59.99  4   6   5
4   Harvey  10:15:00.00 2022-01-01 09:15:02.02  6   7   5
5   Khala   10:15:00.00 2022-01-01 09:44:43.30  6   13  8
6   Gaddy   10:15:00.00 2022-01-01 09:54:23.71  4   10  11
7   Khala   11:15:00.00 2022-01-01 10:15:00.00  10  19  8
8   Harvey  11:15:00.00 2022-01-01 10:15:02.99  9   11  15
9   Gaddy   11:15:00.00 2022-01-01 10:19:43.52  14  11  17
dfuffjeb

dfuffjeb1#

我要做的第一件事是将timestamp列转换为datetime,以使其更易于使用

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Harvey', 'Khala', 'Gaddy', 'Harvey', 'Khala', 'Gaddy', 'Harvey', 'Khala', 'Gaddy', 'Khala', 'Harvey', 'Gaddy'],
    'Timestamp': ['2022-01-01 08:17:23.12', '2022-01-01 08:22:58.76', '2022-01-01 08:19:02.57', '2022-01-01 08:55:43.99','2022-01-01 08:41:23.10', '2022-01-01 09:14:59.99', '2022-01-01 09:15:02.02', '2022-01-01 09:44:43.30','2022-01-01 09:54:23.71', '2022-01-01 10:15:00.00', '2022-01-01 10:15:02.99', '2022-01-01 10:19:43.52'],
    'Candy': [2, 1, 3, 3, 5, 4, 6, 6, 4, 10, 9, 14],
    'Chocolate Bars': [4, np.nan, 6, 7, 8, 6, 7, 13, 10, 19, 11, 11],
    'Cookies': [1, 1, 4, 2, 4, 5, 5, 8, 11, 8, 15, 17]
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

然后,下一步是添加窗柱

# Get window
window_start = pd.to_timedelta("15min")
df["Window"] = (df["Timestamp"] - window_start).dt.floor("1h") + window_start

你可以先把时间移动15分钟,这样只需要把小时数加回去,如果你不想把日期留在窗口里,也可以这样做。
最后一步是对时间戳进行排序,并仅为每个窗口和每个人保留一个时间戳

# Keep only one row per window and person
df = df.sort_values("Timestamp", ascending=False).groupby(["Name", "Window"]).head(1)
df = df.sort_index().reset_index(drop=True)
whitzsjs

whitzsjs2#

Timestamp数据行转型为datetime之后,您可以结合使用DataFrame.groupby方法和.resample方法:

df["Timestamp"] = pd.to_datetime(df["Timestamp")
cols = ['Candy', 'Chocolate Bars', 'Cookies']

(df
.groupby("Name")
.resample("60T", offset="15T", on="Timestamp", label="right")
.last()
.loc[:, cols]
.reset_index()
.sort_values("Timestamp")
)

Name    Timestamp   Candy   Chocolate Bars  Cookies
0   Gaddy   2022-01-01 09:15:00 4   6.0      5
3   Harvey  2022-01-01 09:15:00 3   7.0      2
6   Khala   2022-01-01 09:15:00 5   8.0      4
1   Gaddy   2022-01-01 10:15:00 4   10.0    11
4   Harvey  2022-01-01 10:15:00 6   7.0      5
7   Khala   2022-01-01 10:15:00 6   13.0     8
2   Gaddy   2022-01-01 11:15:00 14  11.0    17
5   Harvey  2022-01-01 11:15:00 9   11.0    15
8   Khala   2022-01-01 11:15:00 10  19.0     8

相关问题