I have the following DataFrame:
  status1 status2 location1           datetime1 grouping service capacity
0      xx      xx        xx 01-01-2020 11:50:00       xx      xx      150
1      xx      xx        xx 01-01-2020 11:57:00       xx      xx      200
2      xx      xx        xx 01-01-2020 11:59:00       xx      xx      200
3      xx      xx        xx 01-01-2020 13:59:00       xx      xx      200
...
x      xx      xx        xx 01-02-2020 13:59:00       xx      xx      300
x      xx      xx        xx 01-03-2020 13:04:00       xx      xx      300
...
x      xx      xx        xx 07-03-2021 13:04:00       xx      xx      400
x      xx      xx        xx 07-03-2021 13:04:00       xx      xx      300
x      xx      xx        xx 07-03-2021 13:04:00       xx      xx      300
I would like to summarize capacity by week on a rolling basis.
For example, I would like:
   WeekStartingSunday  countofstatus1  sumofcapacity
0                   1              50           3000
1                   2              30           2000
2                   3             ...            ...
3                   4             ...            ...
...
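So week 1 would contain the totals for all dates in the first week of 2020, with the week starting on Sunday. I would also like to create tables for weeks starting on other days, such as Monday, Tuesday, and so on.
I tried df.groupby('capacity').rolling(7).sum(), but I think it just sums every 7 rows.
I also tried: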
group = pd.pivot_table(df,columns='capacity', index='datetime1')
group2 = group.resample('D').sum().rolling(7).sum()
group2.sort_index().head(15)
But the result looks like this:
capacity      1.0    2.0   2.25   2.40    3.0 ...
datetime1
2020-01-01    NaN    NaN    NaN    NaN    NaN ...
2021-01-02    NaN    NaN    NaN    NaN    NaN ...
...
2021-01-07  322.1  326.5    117    0.0  275.2 ...
...
Is this possible in pandas?
1 Answer
ilmyapht #1
You can try the following:
Convert the date strings to datetime format, assuming the dates are in dd-mm-YYYY, then group by week number:
```python
import pandas as pd

# Use dayfirst=True for dates in dd-mm-YYYY
df['datetime1'] = pd.to_datetime(df['datetime1'], dayfirst=True)
# ISO week number of each date
df['WeekStartingSunday'] = df['datetime1'].dt.isocalendar().week
# Count rows and sum capacity per week number
df_out = (df.groupby('WeekStartingSunday', as_index=False)
            .agg(countofstatus1=('status1', 'count'), sumofcapacity=('capacity', 'sum'))
         )
```
Input data:
  status1 status2 location1           datetime1 grouping service capacity
0      xx      xx        xx 01-01-2020 11:50:00       xx      xx      150
1      xx      xx        xx 01-01-2020 11:57:00       xx      xx      200
2      xx      xx        xx 01-01-2020 11:59:00       xx      xx      200
3      xx      xx        xx 01-01-2020 13:59:00       xx      xx      200
4      xx      xx        xx 01-02-2020 13:59:00       xx      xx      300
5      xx      xx        xx 01-03-2020 13:04:00       xx      xx      300
6      xx      xx        xx 07-03-2021 13:04:00       xx      xx      400
7      xx      xx        xx 07-03-2021 13:04:00       xx      xx      300
8      xx      xx        xx 07-03-2021 13:04:00       xx      xx      300
Output:
print(df_out)
   WeekStartingSunday  countofstatus1  sumofcapacity
0                   1               4            750
1                   5               1            300
2                   9               4           1300
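Note that `dt.isocalendar().week` returns ISO week numbers, which start on Monday and repeat every year, so week 9 of 2020 and week 9 of 2021 are merged in the output above. If you need weeks that really start on Sunday (or on any other weekday), one possible approach is to bucket the dates into weekly periods instead. The following is only a sketch under some assumptions: the helper `weekly_summary`, the mapping `WEEK_END_FOR_START`, and the `WeekStarting` column are names made up for illustration, and the sample frame just rebuilds the rows shown above with the date column named `datetime1` as in the question.

```python
import pandas as pd

# Sample rows mirroring the input data above (the 'xx' values are placeholders).
df = pd.DataFrame({
    'status1': ['xx'] * 9,
    'datetime1': [
        '01-01-2020 11:50:00', '01-01-2020 11:57:00', '01-01-2020 11:59:00',
        '01-01-2020 13:59:00', '01-02-2020 13:59:00', '01-03-2020 13:04:00',
        '07-03-2021 13:04:00', '07-03-2021 13:04:00', '07-03-2021 13:04:00',
    ],
    'capacity': [150, 200, 200, 200, 300, 300, 400, 300, 300],
})
df['datetime1'] = pd.to_datetime(df['datetime1'], dayfirst=True)

# pandas anchors weekly periods on the day the week ENDS
# ('W-SAT' = Sunday through Saturday), so map each start day to its end day.
WEEK_END_FOR_START = {
    'SUN': 'SAT', 'MON': 'SUN', 'TUE': 'MON', 'WED': 'TUE',
    'THU': 'WED', 'FRI': 'THU', 'SAT': 'FRI',
}

def weekly_summary(frame, week_start='SUN'):
    """Hypothetical helper: count rows and sum capacity per calendar week."""
    freq = 'W-' + WEEK_END_FOR_START[week_start]
    # to_period assigns each timestamp to its week; start_time is that week's first day.
    week_start_date = frame['datetime1'].dt.to_period(freq).dt.start_time
    return (frame.assign(WeekStarting=week_start_date)
                 .groupby('WeekStarting', as_index=False)
                 .agg(countofstatus1=('status1', 'count'),
                      sumofcapacity=('capacity', 'sum')))

sunday_weeks = weekly_summary(df, 'SUN')
monday_weeks = weekly_summary(df, 'MON')
print(sunday_weeks)
```

Passing `'MON'`, `'TUE'`, and so on gives the tables for the other weekdays. With the sample data, the Sunday-based result has four weeks, starting 2019-12-29, 2020-01-26, 2020-03-01 and 2021-03-07, with counts 4, 1, 1 and 3, and the two March weeks from different years stay separate because the grouping key is a full date rather than a week number.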