pandas: converting data points over time into a total count

Posted by bsxbgnwa on 2022-12-09 in Other


I am trying to do some time trend analysis of cyclone data but am having difficulties getting the pd dataframe into a format where I can do that...
Right now the dataframe looks like this:
| | date | season | stormid |
| ------------ | ------------ | ------------ | ------------ |
| 0 | 1970-01-04 12:00:00 | 1970 | SH071970 |
| 1 | 1970-01-04 18:00:00 | 1970 | SH071970 |
| 2 | 1970-01-05 00:00:00 | 1970 | SH071970 |
| 3 | 1970-01-05 06:00:00 | 1970 | SH071970 |
| 4 | 1970-01-05 12:00:00 | 1970 | SH071970 |
| 5 | 1970-01-05 18:00:00 | 1970 | SH071970 |
| 6 | 1970-01-06 00:00:00 | 1970 | SH071970 |
| 7 | 1970-01-06 06:00:00 | 1970 | SH071970 |
| 8 | 1970-01-06 12:00:00 | 1970 | SH071970 |
| 9 | 1970-01-06 18:00:00 | 1970 | SH071970 |
| 10 | 1970-01-07 00:00:00 | 1970 | SH071970 |
And I would like to have all the entries across days somehow combined into a total day count, for example like this:
| | start date | end date | days | season | stormid |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 0 | 1970-01-04 | 1970-01-11 | 15 | 1970 | SH071970 |
So on and so forth for all the different stormids.

pandas

Source: https://stackoverflow.com/questions/74741962/converting-data-points-over-time-into-total-count

1 Answer


56lgkhnf1#

If you want the number of days between the minimum and maximum date in each group, use numpy.ptp and Timedelta.days:

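import numpy as np
import pandas as pd

# Parse the timestamps so that min/max and subtraction operate on datetimes
df['date'] = pd.to_datetime(df['date'])

out = (df.groupby(['season', 'stormid'], as_index=False)
         # days: whole days between the first and last observation
         .agg(days=('date', lambda s: np.ptp(s).days),
              # start/end: first and last timestamp, truncated to midnight
              start_date=('date', lambda s: s.min().normalize()),
              end_date=('date', lambda s: s.max().normalize())
             )
       )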

Note: lambda s: np.ptp(s).days is equivalent to lambda s: (s.max()-s.min()).days. If you want to round up, use lambda s: (s.max()-s.min()).ceil('D').days (3 days here).

Output:

   season   stormid  days start_date   end_date
0    1970  SH071970     2 1970-01-04 1970-01-07
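
To see where the 2 and the 3 come from, here is a minimal sketch that rebuilds the sample rows from the question and compares truncating with rounding up. The 6-hourly spacing is taken from the table in the question; the variable name `span` is only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question: one storm, observations every 6 hours
df = pd.DataFrame({
    "date": pd.date_range("1970-01-04 12:00", "1970-01-07 00:00",
                          freq=pd.Timedelta(hours=6)),
    "season": 1970,
    "stormid": "SH071970",
})

# Elapsed time between the first and last observation (2 days 12 hours)
span = df["date"].max() - df["date"].min()

print(span.days)            # 2: .days drops the remaining 12 hours
print(span.ceil("D").days)  # 3: round the span up to full days first
```

The storm lasted two and a half days, so `.days` alone reports 2 and rounding up with `ceil('D')` reports 3.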

Alternative

df['date'] = pd.to_datetime(df['date'])

out = (df.groupby(['season', 'stormid'], as_index=False)
         .agg(start_date=('date', lambda s: s.min().normalize()),
              end_date=('date', lambda s: s.max().normalize())
             )
         .assign(days=lambda d: d['end_date'].sub(d['start_date']).dt.days)
       )

Output:

   season   stormid start_date   end_date  days
0    1970  SH071970 1970-01-04 1970-01-07     3
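
Subtracting the normalized dates gives 3 because it measures the gap between January 4 and January 7. If "days" should instead mean every calendar day on which the storm has at least one observation, counting distinct dates is one option. This is a sketch under that assumption about the desired definition, not part of the original answer; the helper column `day` is a name introduced here for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question: one storm, observations every 6 hours
df = pd.DataFrame({
    "date": pd.date_range("1970-01-04 12:00", "1970-01-07 00:00",
                          freq=pd.Timedelta(hours=6)),
    "season": 1970,
    "stormid": "SH071970",
})

out = (df.assign(day=df["date"].dt.normalize())  # truncate each timestamp to midnight
         .groupby(["season", "stormid"], as_index=False)
         .agg(start_date=("day", "min"),
              end_date=("day", "max"),
              days=("day", "nunique"))           # distinct calendar days observed
       )
print(out)
```

For the sample storm this counts January 4, 5, 6 and 7, so `days` is 4 rather than 3. Unlike the date difference, this count also skips any calendar day without observations.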

Count

If you just want the number of rows per group (which is not really "days"):

out = (df.groupby(['season', 'stormid'], as_index=False)
         .agg(days=('date', 'size'))
       )

Output:

   season   stormid  days
0    1970  SH071970    11
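
One detail worth knowing when counting rows: 'size' includes rows whose date is missing, while 'count' only counts non-null dates. The tiny frame below is made up for illustration (it is not the question's data), and the column names `n_rows` and `n_valid` are hypothetical.

```python
import pandas as pd

# Made-up example: three rows for one storm, one of them with a missing date
df = pd.DataFrame({
    "date": pd.to_datetime(["1970-01-04 12:00", "1970-01-04 18:00", None]),
    "season": 1970,
    "stormid": "SH071970",
})

out = (df.groupby(["season", "stormid"], as_index=False)
         .agg(n_rows=("date", "size"),    # all rows in the group, NaT included
              n_valid=("date", "count"))  # only rows with a non-null date
       )
print(out)
```

Here `n_rows` is 3 and `n_valid` is 2. With complete data, as in the question, both give the same number.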

2022-12-09
