pandas: get the row containing the highest league level per year [duplicate]

Posted by 6ju8rftf on 2024-01-04 in Other

This question already has answers here:

Custom sorting in pandas dataframe (5 answers)
Closed 2 days ago.
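I want to get the row for the highest league level a player played at in each year. I have a list of the levels I want to compare, ordered from lowest to highest, and I build the DataFrame with the code below.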

    import pandas as pd

    prospect = pd.read_html('https://www.baseball-reference.com/register/player.fcgi?id=bishop000hun')[0]
    levels = ['Rk', 'A-', 'A', 'A+', 'AA', 'AAA', 'MLB']
    prospect = prospect[['Year', 'Tm', 'Lg', 'Lev', 'PA']][prospect['Lev'].isin(levels)]
    prospect = prospect.sort_values('Lev', ascending = False).groupby(['Year']).tail(1)

However, this is the output I get:

        Year            Tm    Lg Lev   PA
    6   2019  Salem-Keizer  NORW  A-  117
    15  2022        Eugene  NORW  A+  358
    11  2021      San Jose   LAW   A    9


What I want is for the 2021 row to be the one with level A+ rather than A. Can someone help me fix this? Thanks in advance.
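For context, the likely cause is that `sort_values('Lev')` compares the level names as plain strings, not by their position in `levels`. The sketch below uses a made-up Series (`lev` is not from the question's data) to show how the two orderings differ:

```python
import pandas as pd

levels = ['Rk', 'A-', 'A', 'A+', 'AA', 'AAA', 'MLB']

# Hypothetical sample of level labels, for illustration only.
lev = pd.Series(['A', 'A+', 'A-', 'Rk'])

# Plain string sort: characters are compared by code point,
# so '+' sorts before '-' and 'A' sorts before both.
plain = lev.sort_values(ascending=False).tolist()

# Sort by position in `levels` instead (highest level first).
ranked = sorted(lev, key=levels.index, reverse=True)

print(plain)   # ['Rk', 'A-', 'A+', 'A']
print(ranked)  # ['A+', 'A', 'A-', 'Rk']
```

With the string ordering, `tail(1)` after a descending sort returns the alphabetically smallest label in each year, which is why 2021 comes back as A instead of A+.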

Answer #1 (uqzxnwby)

You can do it like this:

    import pandas as pd

    # Example data
    data = {
        'Year': [2019, 2019, 2022, 2022, 2021, 2021],
        'Tm': ['A', 'B', 'C', 'D', 'E', 'F'],
        'Lg': ['NORW', 'ABC', 'NORW', 'ABC', 'LAW', 'ABC'],
        'Lev': ['A-', 'Rk', 'A+', 'A-', 'A', 'A+'],
        'PA': [117, 100, 358, 200, 9, 50]
    }
    prospect = pd.DataFrame(data)
    levels = ['Rk', 'A-', 'A', 'A+', 'AA', 'AAA', 'MLB']

    # Convert 'Lev' to an ordered categorical so it sorts by position in `levels`
    prospect['Lev'] = pd.Categorical(prospect['Lev'], categories=levels, ordered=True)

    # Sort by year, then by level from highest to lowest, and keep the first
    # row per year. Sorting on 'Lev' (not 'PA') is what selects the highest
    # level; head(1) keeps whole rows, whereas first() picks the first
    # non-null value column by column.
    prospect = (
        prospect.sort_values(['Year', 'Lev'], ascending=[True, False])
        .groupby('Year').head(1)
        .reset_index(drop=True)
    )
    print(prospect)

Output:

       Year Tm    Lg Lev   PA
    0  2019  A  NORW  A-  117
    1  2021  F   ABC  A+   50
    2  2022  C  NORW  A+  358
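As a variation on the ordered-categorical idea, the rows can also be picked without sorting, by taking the index of the largest category code within each year. This is only a sketch on made-up data (`df`, `top_idx` and `top` are illustrative names), arranged so that the higher level in 2021 has the smaller PA:

```python
import pandas as pd

levels = ['Rk', 'A-', 'A', 'A+', 'AA', 'AAA', 'MLB']

# Made-up rows: in 2021 the higher level (A+) has fewer plate appearances.
df = pd.DataFrame({
    'Year': [2019, 2019, 2021, 2021],
    'Lev':  ['Rk', 'A-', 'A', 'A+'],
    'PA':   [100, 117, 200, 15],
})
df['Lev'] = pd.Categorical(df['Lev'], categories=levels, ordered=True)

# cat.codes gives each level's position in `levels`; idxmax returns,
# for every year, the index label of the row with the highest position.
top_idx = df['Lev'].cat.codes.groupby(df['Year']).idxmax()
top = df.loc[top_idx]

print(top)
```

Here the 2021 row that comes back is the A+ one with PA 15, which shows the selection depends on the level and not on PA.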

Answer #2 (guykilcj)

Your example:

    import pandas as pd

    prospect = pd.read_html('https://www.baseball-reference.com/register/player.fcgi?id=bishop000hun')[0]
    levels = ['Rk', 'A-', 'A', 'A+', 'AA', 'AAA', 'MLB']
    prospect = prospect[['Year', 'Tm', 'Lg', 'Lev', 'PA']][prospect['Lev'].isin(levels)]

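Use sort_values with its key argument, so rows are ordered by each level's position in levels instead of alphabetically: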

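    # Map each level name to its rank: {'Rk': 0, 'A-': 1, ..., 'MLB': 6}
    m = {j: i for i, j in enumerate(levels)}
    # Sort ascending by rank, then keep the last (highest-level) row per year
    out = prospect.sort_values('Lev', key=lambda x: x.map(m)).groupby(['Year']).tail(1)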


Output:

        Year            Tm    Lg Lev   PA
    6   2019  Salem-Keizer  NORW  A-  117
    10  2021        Eugene   HAW  A+   15
    15  2022        Eugene  NORW  A+  358
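If this is needed for several players, the key-based sort could be wrapped in a small helper. The function name `highest_level_rows`, its parameters and the sample frame below are hypothetical, and the sketch assumes every value in the level column appears in `levels` (as it does after the `isin` filter above):

```python
import pandas as pd

def highest_level_rows(df, levels, year_col='Year', level_col='Lev'):
    """Return one row per year: the row with the highest-ranked level."""
    rank = {name: i for i, name in enumerate(levels)}
    # Stable sort by rank, so rows tied on level keep their original order.
    ordered = df.sort_values(level_col, key=lambda s: s.map(rank), kind='stable')
    # The last row seen for each year is the one with the highest rank.
    return ordered.drop_duplicates(subset=year_col, keep='last').sort_values(year_col)

levels = ['Rk', 'A-', 'A', 'A+', 'AA', 'AAA', 'MLB']

# Made-up sample rows, not the real Baseball-Reference table.
sample = pd.DataFrame({
    'Year': [2021, 2019, 2021, 2022, 2019],
    'Lev':  ['A', 'Rk', 'A+', 'A+', 'A-'],
    'PA':   [9, 100, 15, 358, 117],
})

result = highest_level_rows(sample, levels)
print(result)
```

When a player has two stints at the same top level in one year, this keeps only the later of those rows in the original order; whether that is the right tie-break depends on what you want to report.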
