I am trying to do some time trend analysis of cyclone data but am having difficulties getting the pd dataframe into a format where I can do that...
Right now the dataframe looks like this:
| | date | season | stormid |
| ------------ | ------------ | ------------ | ------------ |
| 0 | 1970-01-04 12:00:00 | 1970 | SH071970 |
| 1 | 1970-01-04 18:00:00 | 1970 | SH071970 |
| 2 | 1970-01-05 00:00:00 | 1970 | SH071970 |
| 3 | 1970-01-05 06:00:00 | 1970 | SH071970 |
| 4 | 1970-01-05 12:00:00 | 1970 | SH071970 |
| 5 | 1970-01-05 18:00:00 | 1970 | SH071970 |
| 6 | 1970-01-06 00:00:00 | 1970 | SH071970 |
| 7 | 1970-01-06 06:00:00 | 1970 | SH071970 |
| 8 | 1970-01-06 12:00:00 | 1970 | SH071970 |
| 9 | 1970-01-06 18:00:00 | 1970 | SH071970 |
| 10 | 1970-01-07 00:00:00 | 1970 | SH071970 |
And I would like to have all the entries across days somehow combined into a total day count, for example like this:
| | start date | end date | days | season | stormid |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 0 | 1970-01-04 | 1970-01-11 | 15 | 1970 | SH071970 |
So on and so forth for all the different stormids.
1 Answer
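If you want the number of days between the minimum and maximum date of each group, use `numpy.ptp` and `Timedelta.days`.
Note that `lambda s: np.ptp(s).days` is equivalent to `lambda s: (s.max() - s.min()).days`. If you want to round up, use `lambda s: (s.max() - s.min()).ceil('D').days` (3 days here).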
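A minimal sketch of that approach, assuming the `date` column is already `datetime64` and using a small frame rebuilt from the rows shown in the question. It uses the `(s.max() - s.min()).days` form, which the note above describes as equivalent to `np.ptp(s).days`; the variable names `df`, `out` and `span` are illustrative.

```python
import pandas as pd

# Rebuild the 11 six-hourly rows from the question (assumed sample data).
df = pd.DataFrame({
    "date": pd.date_range("1970-01-04 12:00", periods=11,
                          freq=pd.Timedelta(hours=6)),
    "season": 1970,
    "stormid": "SH071970",
})

# One row per storm: first timestamp, last timestamp, whole days in between.
out = (
    df.groupby(["season", "stormid"], as_index=False)
      .agg(**{
          "start date": ("date", "min"),
          "end date": ("date", "max"),
          "days": ("date", lambda s: (s.max() - s.min()).days),
      })
)
print(out)

# Truncated versus rounded-up day count for the same span (2 days 12 hours).
span = df["date"].max() - df["date"].min()
print(span.days, span.ceil("D").days)  # 2 3
```

The dictionary unpacking is only needed because the requested column names contain spaces; with names such as `start_date`, plain keyword arguments would work as well.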
Alternative
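One way to get the same result without a lambda, sketched here as an assumption rather than the answerer's exact code, is to aggregate the minimum and maximum first and then subtract the two columns. The sample frame is again rebuilt from the question's rows.

```python
import pandas as pd

# Rebuild the 11 six-hourly rows from the question (assumed sample data).
df = pd.DataFrame({
    "date": pd.date_range("1970-01-04 12:00", periods=11,
                          freq=pd.Timedelta(hours=6)),
    "season": 1970,
    "stormid": "SH071970",
})

# Aggregate start and end per storm with built-in reducers only.
out = (
    df.groupby(["season", "stormid"], as_index=False)
      .agg(**{"start date": ("date", "min"), "end date": ("date", "max")})
)

# Vectorised subtraction gives a timedelta column; .dt.days keeps whole days.
out["days"] = (out["end date"] - out["start date"]).dt.days
print(out)
```

Avoiding the per-group Python lambda tends to matter mostly when there are many storms, since the subtraction then runs once over the whole column.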
Count
If you only want the number of rows per group (which is not really "days"), aggregate with `count` instead.
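A short sketch of the row-count variant, under the same assumptions about the sample data. Keeping the column name `days` follows the layout requested in the question, although the value is a number of records, not a duration.

```python
import pandas as pd

# Rebuild the 11 six-hourly rows from the question (assumed sample data).
df = pd.DataFrame({
    "date": pd.date_range("1970-01-04 12:00", periods=11,
                          freq=pd.Timedelta(hours=6)),
    "season": 1970,
    "stormid": "SH071970",
})

# "count" tallies the non-null date entries in each (season, stormid) group.
out = (
    df.groupby(["season", "stormid"], as_index=False)
      .agg(**{
          "start date": ("date", "min"),
          "end date": ("date", "max"),
          "days": ("date", "count"),
      })
)
print(out)
```

With six-hourly records, dividing this count by four would only approximate the storm's duration, so the min/max difference is usually the safer measure.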