在Pandas中基于条件查找最长连续链

cngwdvgl  于 2023-02-28  发布在  其他
关注(0)|答案(3)|浏览(155)
df=pd.DataFrame({"date":['2021-06-02', '2021-06-07', '2021-06-15', '2021-06-19', '2021-06-23', '2021-09-05', '2021-10-08', '2021-11-11', '2022-03-26', '1950-05-28', '1953-10-11', '1954-06-01', '1955-05-28', '1955-10-30', '1956-12-16', '1957-10-27', '1958-02-20', '1959-11-21', '1960-02-07', '1960-07-03'],
             "home_team":['Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana'],
             "away_team":['Denmark', 'Latvia', 'France', 'Portugal', 'Hungary', 'Armenia', 'Romania', 'Liechtenstein', 'Israel', 'Nigeria', 'Nigeria', 'Sierra Leone', 'Nigeria', 'Nigeria', 'Sierra Leone', 'Nigeria', 'Togo', 'Nigeria', 'Sierra Leone', 'Egypt'],
             "home_score":[1, 7, 0, 4, 2, 6, 2, 9, 2, 1, 1, 2, 1, 7, 4, 3, 5, 5, 6, 2],
              "away_score":[1, 1, 1, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 3, 2, 2, 2, 2]
             })

    date    home_team   away_team   home_score  away_score
0   2021-06-02  Germany Denmark 1   1
1   2021-06-07  Germany Latvia  7   1
2   2021-06-15  Germany France  0   1
3   2021-06-19  Germany Portugal    4   2
4   2021-06-23  Germany Hungary 2   2
5   2021-09-05  Germany Armenia 6   0
6   2021-10-08  Germany Romania 2   1
7   2021-11-11  Germany Liechtenstein   9   0
8   2022-03-26  Germany Israel  2   0
9   1950-05-28  Ghana   Nigeria 1   0
10  1953-10-11  Ghana   Nigeria 1   0
11  1954-06-01  Ghana   Sierra Leone    2   0
12  1955-05-28  Ghana   Nigeria 1   0
13  1955-10-30  Ghana   Nigeria 7   0
14  1956-12-16  Ghana   Sierra Leone    4   3
15  1957-10-27  Ghana   Nigeria 3   3
16  1958-02-20  Ghana   Togo    5   2
17  1959-11-21  Ghana   Nigeria 5   2
18  1960-02-07  Ghana   Sierra Leone    6   2
19  1960-07-03  Ghana   Egypt   2   2

在这个Pandas数据框中,我想计算每支球队最长的主场连胜长度。我希望我的结果如下:

Team   HomeWinChain
Germany    4
Ghana      6

我试过不同的方法,但都得不到想要的结果。

kx5bkwkv

kx5bkwkv1#

您可以使用双groupby,首先确定获胜的延伸,然后获得最大的延伸:

# is the row a win?
win = df['home_score'].gt(df['away_score'])

# subselect wins, groupby consecutive wins, get size, get max size
(win[win]
 .groupby([df['home_team'], (~win).cumsum()])
 .size()
 .groupby(level='home_team').max()
)

输出:

home_team
Germany    4
Ghana      6
dtype: int64

所有伸展运动的中间阶段:

(win[win]
 .groupby([df['home_team'], (~win).cumsum()])
 .size()
 #.groupby(level='home_team').max()
)

输出:

home_team   
Germany    1    1
           2    1
           3    4
Ghana      3    6
           4    3
dtype: int64
sigwle7e

sigwle7e2#

import pandas as pd

df=pd.DataFrame({"date":['2021-06-02', '2021-06-07', '2021-06-15', '2021-06-19', '2021-06-23', '2021-09-05', '2021-10-08', '2021-11-11', '2022-03-26', '1950-05-28', '1953-10-11', '1954-06-01', '1955-05-28', '1955-10-30', '1956-12-16', '1957-10-27', '1958-02-20', '1959-11-21', '1960-02-07', '1960-07-03'],
             "home_team":['Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana'],
             "away_team":['Denmark', 'Latvia', 'France', 'Portugal', 'Hungary', 'Armenia', 'Romania', 'Liechtenstein', 'Israel', 'Nigeria', 'Nigeria', 'Sierra Leone', 'Nigeria', 'Nigeria', 'Sierra Leone', 'Nigeria', 'Togo', 'Nigeria', 'Sierra Leone', 'Egypt'],
             "home_score":[1, 7, 0, 4, 2, 6, 2, 9, 2, 1, 1, 2, 1, 7, 4, 3, 5, 5, 6, 2],
              "away_score":[1, 1, 1, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 3, 2, 2, 2, 2]
             })

df['HomeWinChain'] = 0
cnt = 1
for i in range(len(df[df['home_team'] == 'Germany'])):
    if df['home_score'][i] > df['away_score'][i]:
        df['HomeWinChain'][i] = cnt
        cnt += 1
    else:
        cnt = 1
cnt = 1        
for i in range(len(df[df['home_team'] == 'Germany']), len(df)):
    if df['home_score'][i] > df['away_score'][i]:
        df['HomeWinChain'][i] = cnt
        cnt += 1
    else:
        cnt = 1

df[['home_team','HomeWinChain']].groupby('home_team').aggregate({'HomeWinChain':'max'})

你可以做得很好。你可以考虑如何减少代码。

axr492tv

axr492tv3#

def function1(dd:pd.DataFrame):
    col1=dd.home_score.sub(dd.away_score).gt(0)
    col2=col1.eq(False).cumsum()
    return dd.assign(col1=col1).groupby(col2).col1.sum().max()

df.groupby('home_team').apply(function1).reset_index()

输出:

home_team
Germany    4
Ghana      6

相关问题