pandas 根据条件获取数据框中的最后一条记录和倒数第三条记录

s4n0splo  于 2022-12-09  发布在  其他
关注(0)|答案(2)|浏览(223)

I have a large dataframe (extract below) and want to create a new dataframe containing the last "In-progress" row and the 3rd last "In-progress" row based on the Time for each ID.
I am new to Pandas and can't work out how to do it. Any help would be appreciated.
Dataframe:
| Time | State | ID | Ref | Name |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 10:00 | In-progress | 54887 | 1 | Jim |
| 10:00 | In-progress | 54887 | 2 | Jon |
| 10:00 | In-progress | 54887 | 3 | Rob |
| 10:00 | In-progress | 54887 | 4 | Sam |
| 11:00 | In-progress | 54887 | 1 | Jim |
| 11:00 | In-progress | 54887 | 2 | Jon |
| 11:00 | In-progress | 54887 | 3 | Rob |
| 11:00 | In-progress | 54887 | 4 | Sam |
| 12:00 | In-progress | 54887 | 1 | Jim |
| 12:00 | In-progress | 54887 | 2 | Jon |
| 12:00 | In-progress | 54887 | 3 | Rob |
| 12:00 | In-progress | 54887 | 4 | Sam |
| 13:00 | Done | 54887 | 1 | Jim |
| 13:00 | Done | 54887 | 2 | Jon |
| 13:00 | Done | 54887 | 3 | Rob |
| 13:00 | Done | 54887 | 4 | Sam |
| 10:00 | In-progress | 65228 | a | Anya |
| 10:00 | In-progress | 65228 | b | Lot |
| 10:00 | In-progress | 65228 | c | Ted |
| 10:00 | In-progress | 65228 | d | Tom |
| 11:00 | In-progress | 65228 | a | Anya |
| 11:00 | In-progress | 65228 | b | Lot |
| 11:00 | In-progress | 65228 | c | Ted |
| 11:00 | In-progress | 65228 | d | Tom |
| 12:00 | In-progress | 65228 | a | Anya |
| 12:00 | In-progress | 65228 | b | Lot |
| 12:00 | In-progress | 65228 | c | Ted |
| 12:00 | In-progress | 65228 | d | Tom |
| 13:00 | Done | 65228 | a | Anya |
| 13:00 | Done | 65228 | b | Lot |
| 13:00 | Done | 65228 | c | Ted |
| 13:00 | Done | 65228 | d | Tom |
Desired Result:
| Time | State | ID | Ref | Name |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 10:00 | In-progress | 54887 | 1 | Jim |
| 10:00 | In-progress | 54887 | 2 | Jon |
| 10:00 | In-progress | 54887 | 3 | Rob |
| 10:00 | In-progress | 54887 | 4 | Sam |
| 12:00 | In-progress | 54887 | 1 | Jim |
| 12:00 | In-progress | 54887 | 2 | Jon |
| 12:00 | In-progress | 54887 | 3 | Rob |
| 12:00 | In-progress | 54887 | 4 | Sam |
| 10:00 | In-progress | 65228 | a | Anya |
| 10:00 | In-progress | 65228 | b | Lot |
| 10:00 | In-progress | 65228 | c | Ted |
| 10:00 | In-progress | 65228 | d | Tom |
| 12:00 | In-progress | 65228 | a | Anya |
| 12:00 | In-progress | 65228 | b | Lot |
| 12:00 | In-progress | 65228 | c | Ted |
| 12:00 | In-progress | 65228 | d | Tom |

hzbexzde

hzbexzde1#

倒数第三

使用groupby.tail

out = (df[df['State'].eq('In-progress')]
       .groupby(['Time', 'ID']).tail(3)
      )

输出量:

Time        State     ID Ref Name
1   10:00  In-progress  54887   2  Jon
2   10:00  In-progress  54887   3  Rob
3   10:00  In-progress  54887   4  Sam
5   11:00  In-progress  54887   2  Jon
6   11:00  In-progress  54887   3  Rob
7   11:00  In-progress  54887   4  Sam
9   12:00  In-progress  54887   2  Jon
10  12:00  In-progress  54887   3  Rob
11  12:00  In-progress  54887   4  Sam
17  10:00  In-progress  65228   b  Lot
18  10:00  In-progress  65228   c  Ted
19  10:00  In-progress  65228   d  Tom
21  11:00  In-progress  65228   b  Lot
22  11:00  In-progress  65228   c  Ted
23  11:00  In-progress  65228   d  Tom
25  12:00  In-progress  65228   b  Lot
26  12:00  In-progress  65228   c  Ted
27  12:00  In-progress  65228   d  Tom
最后一个和倒数第三个(不包括倒数第二个)

使用groupby.cumcount

idx = (df[df['State'].eq('In-progress')]
       .groupby(['Time', 'ID']).cumcount(ascending=False)
       .isin([0,2]).loc[lambda x: x]
       .index
      )

out = df.loc[idx]

输出量:

Time        State     ID Ref Name
1   10:00  In-progress  54887   2  Jon
3   10:00  In-progress  54887   4  Sam
5   11:00  In-progress  54887   2  Jon
7   11:00  In-progress  54887   4  Sam
9   12:00  In-progress  54887   2  Jon
11  12:00  In-progress  54887   4  Sam
17  10:00  In-progress  65228   b  Lot
19  10:00  In-progress  65228   d  Tom
21  11:00  In-progress  65228   b  Lot
23  11:00  In-progress  65228   d  Tom
25  12:00  In-progress  65228   b  Lot
27  12:00  In-progress  65228   d  Tom
62lalag4

62lalag42#

获取每个ID最后和最后3次

df1 = (df[df['State'].eq('In-progress')]
       .drop_duplicates(['ID', 'Time'])
       .groupby('ID')['Time'].nth([-3, -1]).reset_index())

df1

ID      Time
0   54887   10:00
1   54887   12:00
2   65228   10:00
3   65228   12:00

merge过滤df

df1.merge(df, how='left').reindex(columns=df.columns)

实验结果:

Time    State       ID      Ref Name
0   10:00   In-progress 54887   1   Jim
1   10:00   In-progress 54887   2   Jon
2   10:00   In-progress 54887   3   Rob
3   10:00   In-progress 54887   4   Sam
4   12:00   In-progress 54887   1   Jim
5   12:00   In-progress 54887   2   Jon
6   12:00   In-progress 54887   3   Rob
7   12:00   In-progress 54887   4   Sam
8   10:00   In-progress 65228   a   Anya
9   10:00   In-progress 65228   b   Lot
10  10:00   In-progress 65228   c   Ted
11  10:00   In-progress 65228   d   Tom
12  12:00   In-progress 65228   a   Anya
13  12:00   In-progress 65228   b   Lot
14  12:00   In-progress 65228   c   Ted
15  12:00   In-progress 65228   d   Tom

相关问题