I have a large dataframe (extract below) and want to create a new dataframe containing the last "In-progress" row and the 3rd last "In-progress" row based on the Time for each ID.
I am new to Pandas and can't work out how to do it. Any help would be appreciated.
Dataframe:
| Time | State | ID | Ref | Name |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 10:00 | In-progress | 54887 | 1 | Jim |
| 10:00 | In-progress | 54887 | 2 | Jon |
| 10:00 | In-progress | 54887 | 3 | Rob |
| 10:00 | In-progress | 54887 | 4 | Sam |
| 11:00 | In-progress | 54887 | 1 | Jim |
| 11:00 | In-progress | 54887 | 2 | Jon |
| 11:00 | In-progress | 54887 | 3 | Rob |
| 11:00 | In-progress | 54887 | 4 | Sam |
| 12:00 | In-progress | 54887 | 1 | Jim |
| 12:00 | In-progress | 54887 | 2 | Jon |
| 12:00 | In-progress | 54887 | 3 | Rob |
| 12:00 | In-progress | 54887 | 4 | Sam |
| 13:00 | Done | 54887 | 1 | Jim |
| 13:00 | Done | 54887 | 2 | Jon |
| 13:00 | Done | 54887 | 3 | Rob |
| 13:00 | Done | 54887 | 4 | Sam |
| 10:00 | In-progress | 65228 | a | Anya |
| 10:00 | In-progress | 65228 | b | Lot |
| 10:00 | In-progress | 65228 | c | Ted |
| 10:00 | In-progress | 65228 | d | Tom |
| 11:00 | In-progress | 65228 | a | Anya |
| 11:00 | In-progress | 65228 | b | Lot |
| 11:00 | In-progress | 65228 | c | Ted |
| 11:00 | In-progress | 65228 | d | Tom |
| 12:00 | In-progress | 65228 | a | Anya |
| 12:00 | In-progress | 65228 | b | Lot |
| 12:00 | In-progress | 65228 | c | Ted |
| 12:00 | In-progress | 65228 | d | Tom |
| 13:00 | Done | 65228 | a | Anya |
| 13:00 | Done | 65228 | b | Lot |
| 13:00 | Done | 65228 | c | Ted |
| 13:00 | Done | 65228 | d | Tom |
Desired Result:
| Time | State | ID | Ref | Name |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 10:00 | In-progress | 54887 | 1 | Jim |
| 10:00 | In-progress | 54887 | 2 | Jon |
| 10:00 | In-progress | 54887 | 3 | Rob |
| 10:00 | In-progress | 54887 | 4 | Sam |
| 12:00 | In-progress | 54887 | 1 | Jim |
| 12:00 | In-progress | 54887 | 2 | Jon |
| 12:00 | In-progress | 54887 | 3 | Rob |
| 12:00 | In-progress | 54887 | 4 | Sam |
| 10:00 | In-progress | 65228 | a | Anya |
| 10:00 | In-progress | 65228 | b | Lot |
| 10:00 | In-progress | 65228 | c | Ted |
| 10:00 | In-progress | 65228 | d | Tom |
| 12:00 | In-progress | 65228 | a | Anya |
| 12:00 | In-progress | 65228 | b | Lot |
| 12:00 | In-progress | 65228 | c | Ted |
| 12:00 | In-progress | 65228 | d | Tom |
2条答案
按热度按时间hzbexzde1#
倒数第三
使用
groupby.tail
:输出量:
最后一个和倒数第三个(不包括倒数第二个)
使用
groupby.cumcount
:输出量:
62lalag42#
获取每个ID最后和最后3次
df1
按
merge
过滤df
实验结果: