pandas 我如何使用for循环来创建自动NBA统计网页抓取代码多个赛季？

oxosxuxt 于 2023-06-20 发布在其他

关注(0)|答案(1)|浏览(98)

早上好！现在我正在制作NBA统计数据框架来分析球员的赛季统计数据。现在，我成功地编写了代码，可以自动按季节生成 Dataframe 。然而，我正在解决的问题是，我可以将一季的 Dataframe 保存为单个xlsx文件（例如：data_2001.xlsx，data_2002.xlsx，..... data_2022.xlsx），但我想使用for循环将其转换为包含许多季节的**“one”**xlsx文件。（例如：data_2001~2022.xlsx）

years = [2001,2002]

for y in years:
    # Season
    url = 'https://www.basketball-reference.com/leagues/NBA_{}_advanced.html'.format(y)
    html = urlopen(url)
    soup = BeautifulSoup(html)

    soup.findAll('tr',limit=2)
    # use getText()to extract the text we need into a list
    headers = [th.getText() for th in soup.findAll('tr',limit=2)[0].findAll('th')]
    # exclude the first column as we will not need the ranking order from Basketball Reference for the analysis
    headers = headers[1:]
    
    rows = soup.findAll('tr')[1:]
    player_stats = [[td.getText() for td in rows[i].findAll('td')] for i in range (len(rows))]
    stats = pd.DataFrame(player_stats, columns = headers)

    
    stats['Year'] = y             #Added column ['Year'] to recognize season in the dataframe  
    stats['Season_type'] = 'RS'.  #Added column ['Season_type'] to recognize season type in the dataframe  
    stats = stats.apply(pd.to_numeric,errors='ignore') # changed object data types to float to maniupulate data
    stats = stats[stats['G']>=57]                #Only players who played more than 70% of games
    stats = stats.drop(stats.columns[[18,23]],axis=1) # drop NAN columns

  
    print(f'Finished scraping data for the {y}.')
    lag = np.random.uniform(low=5,high=10)
    print(f'...waiting {round(lag,1)} seconds')
    time.sleep(lag)
    
    y_str = str(y)
    stats.to_excel('data_'+y_str+'.xlsx', index=False). # save as xlsx file

这段代码的输出为data2001.xlsx和data2002.xlsx

data_2001 = pd.read_excel('data2001.xlsx')
data_2002 = pd.read_excel('data2002.xlsx')

For data_2001 For data_2002
但我想做的是一个包含很多季节的xlsx文件。我应该给我的for循环做什么修改？谢谢你帮我！！

pandas

来源：https://stackoverflow.com/questions/76451145/how-can-i-use-for-loop-to-create-automated-nba-stat-web-crawling-code-for-many-s

1条答案

按热度按时间

yzuktlbb1#

你可以简单地替换最后一行，在那里你用一个append保存每个dataframe，所以每个循环都会将每个包含季节的dataframe（在你的循环中）附加到一个列表中，然后传递给pd.concat()。在循环之前，你可以这样做：

season_df = []
for y in years:
   #no need to change anything here until
   #...
   print(f'Finished scraping data for the {y}.')
   lag = np.random.uniform(low=5,high=10)
   print(f'...waiting {round(lag,1)} seconds')
   time.sleep(lag)
   season_df.append(stats)
#Outside the for loop
all_seasons_df = pd.concat(season_df)

赞(0）回复(0）举报 2023-06-20

我来回答

pandas 我如何使用for循环来创建自动NBA统计网页抓取代码多个赛季？

1条答案

相关问题

热门标签

最新问答