在Pandas中基于条件查找最长连续链

cngwdvgl  于 2023-02-28  发布在  其他
关注(0)|答案(3)|浏览(186)
  1. df=pd.DataFrame({"date":['2021-06-02', '2021-06-07', '2021-06-15', '2021-06-19', '2021-06-23', '2021-09-05', '2021-10-08', '2021-11-11', '2022-03-26', '1950-05-28', '1953-10-11', '1954-06-01', '1955-05-28', '1955-10-30', '1956-12-16', '1957-10-27', '1958-02-20', '1959-11-21', '1960-02-07', '1960-07-03'],
  2. "home_team":['Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana'],
  3. "away_team":['Denmark', 'Latvia', 'France', 'Portugal', 'Hungary', 'Armenia', 'Romania', 'Liechtenstein', 'Israel', 'Nigeria', 'Nigeria', 'Sierra Leone', 'Nigeria', 'Nigeria', 'Sierra Leone', 'Nigeria', 'Togo', 'Nigeria', 'Sierra Leone', 'Egypt'],
  4. "home_score":[1, 7, 0, 4, 2, 6, 2, 9, 2, 1, 1, 2, 1, 7, 4, 3, 5, 5, 6, 2],
  5. "away_score":[1, 1, 1, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 3, 2, 2, 2, 2]
  6. })
  7. date home_team away_team home_score away_score
  8. 0 2021-06-02 Germany Denmark 1 1
  9. 1 2021-06-07 Germany Latvia 7 1
  10. 2 2021-06-15 Germany France 0 1
  11. 3 2021-06-19 Germany Portugal 4 2
  12. 4 2021-06-23 Germany Hungary 2 2
  13. 5 2021-09-05 Germany Armenia 6 0
  14. 6 2021-10-08 Germany Romania 2 1
  15. 7 2021-11-11 Germany Liechtenstein 9 0
  16. 8 2022-03-26 Germany Israel 2 0
  17. 9 1950-05-28 Ghana Nigeria 1 0
  18. 10 1953-10-11 Ghana Nigeria 1 0
  19. 11 1954-06-01 Ghana Sierra Leone 2 0
  20. 12 1955-05-28 Ghana Nigeria 1 0
  21. 13 1955-10-30 Ghana Nigeria 7 0
  22. 14 1956-12-16 Ghana Sierra Leone 4 3
  23. 15 1957-10-27 Ghana Nigeria 3 3
  24. 16 1958-02-20 Ghana Togo 5 2
  25. 17 1959-11-21 Ghana Nigeria 5 2
  26. 18 1960-02-07 Ghana Sierra Leone 6 2
  27. 19 1960-07-03 Ghana Egypt 2 2

在这个Pandas数据框中,我想计算每支球队最长的主场连胜长度。我希望我的结果如下:

  1. Team HomeWinChain
  2. Germany 4
  3. Ghana 6

我试过不同的方法,但都得不到想要的结果。

kx5bkwkv

kx5bkwkv1#

您可以使用双groupby,首先确定获胜的延伸,然后获得最大的延伸:

  1. # is the row a win?
  2. win = df['home_score'].gt(df['away_score'])
  3. # subselect wins, groupby consecutive wins, get size, get max size
  4. (win[win]
  5. .groupby([df['home_team'], (~win).cumsum()])
  6. .size()
  7. .groupby(level='home_team').max()
  8. )

输出:

  1. home_team
  2. Germany 4
  3. Ghana 6
  4. dtype: int64

所有伸展运动的中间阶段:

  1. (win[win]
  2. .groupby([df['home_team'], (~win).cumsum()])
  3. .size()
  4. #.groupby(level='home_team').max()
  5. )

输出:

  1. home_team
  2. Germany 1 1
  3. 2 1
  4. 3 4
  5. Ghana 3 6
  6. 4 3
  7. dtype: int64
展开查看全部
sigwle7e

sigwle7e2#

  1. import pandas as pd
  2. df=pd.DataFrame({"date":['2021-06-02', '2021-06-07', '2021-06-15', '2021-06-19', '2021-06-23', '2021-09-05', '2021-10-08', '2021-11-11', '2022-03-26', '1950-05-28', '1953-10-11', '1954-06-01', '1955-05-28', '1955-10-30', '1956-12-16', '1957-10-27', '1958-02-20', '1959-11-21', '1960-02-07', '1960-07-03'],
  3. "home_team":['Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana', 'Ghana'],
  4. "away_team":['Denmark', 'Latvia', 'France', 'Portugal', 'Hungary', 'Armenia', 'Romania', 'Liechtenstein', 'Israel', 'Nigeria', 'Nigeria', 'Sierra Leone', 'Nigeria', 'Nigeria', 'Sierra Leone', 'Nigeria', 'Togo', 'Nigeria', 'Sierra Leone', 'Egypt'],
  5. "home_score":[1, 7, 0, 4, 2, 6, 2, 9, 2, 1, 1, 2, 1, 7, 4, 3, 5, 5, 6, 2],
  6. "away_score":[1, 1, 1, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 3, 2, 2, 2, 2]
  7. })
  8. df['HomeWinChain'] = 0
  9. cnt = 1
  10. for i in range(len(df[df['home_team'] == 'Germany'])):
  11. if df['home_score'][i] > df['away_score'][i]:
  12. df['HomeWinChain'][i] = cnt
  13. cnt += 1
  14. else:
  15. cnt = 1
  16. cnt = 1
  17. for i in range(len(df[df['home_team'] == 'Germany']), len(df)):
  18. if df['home_score'][i] > df['away_score'][i]:
  19. df['HomeWinChain'][i] = cnt
  20. cnt += 1
  21. else:
  22. cnt = 1
  23. df[['home_team','HomeWinChain']].groupby('home_team').aggregate({'HomeWinChain':'max'})

你可以做得很好。你可以考虑如何减少代码。

展开查看全部
axr492tv

axr492tv3#

  1. def function1(dd:pd.DataFrame):
  2. col1=dd.home_score.sub(dd.away_score).gt(0)
  3. col2=col1.eq(False).cumsum()
  4. return dd.assign(col1=col1).groupby(col2).col1.sum().max()
  5. df.groupby('home_team').apply(function1).reset_index()

输出:

  1. home_team
  2. Germany 4
  3. Ghana 6

相关问题