pandas: Looking for simple Python scraping help: trouble identifying sections and classes with BeautifulSoup

g0czyy6m posted on 2022-11-27 in Python

I'm trying to learn how to scrape data. I'm very new to Python, so please keep it simple. While searching YouTube I found a tutorial, and I tried to scrape some data from https://www.pgatour.com/competition/2022/hero-world-challenge/leaderboard.html.

from bs4 import BeautifulSoup
import requests

SCRAPE = requests.get("https://www.pgatour.com/competition/2022/hero-world-challenge/leaderboard.html")

print(SCRAPE)

# <Response [200]> = successful

#http response status codes
    #Information Responses 100-199
    #Successful 200-299
    #Redirects 300-399
    #Client Errors 400-499
    #Server Errors 500-599

soup = BeautifulSoup(SCRAPE.content, 'html.parser')

# tell BeautifulSoup the data is HTML so it can parse it

table = soup.find_all('div', class_="leaderboard leaderboard-table large" )

#pick the large section that contains all the info you need
    #then, pick each smaller section, find the type and class.

for row in table:  # "row" instead of "list" avoids shadowing the built-in list
    name = row.find('div', class_="player-name-col")
    position = row.find('td', class_="position")
    total = row.find('td', class_="total")

    print(name, position, total)

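Above is my code. I've also included a picture with Inspect open, so I can show you what I was thinking when I tried to find the tags and classes inside the leaderboard.
When I run it, nothing gets printed. Any help would be greatly appreciated!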
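When `find_all` seems to return nothing, a quick check is to run the same call against HTML you control and compare a page shell with fully rendered markup. The sketch below is a minimal illustration only: `SHELL_HTML`, `RENDERED_HTML`, `count_leaderboards` and the player data are hypothetical stand-ins, not the real pgatour.com markup. It reuses the exact `class_` string from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a JavaScript-driven page as requests downloads it:
# no leaderboard markup yet, only a script that would build it in the browser.
SHELL_HTML = """
<html><body>
  <div id="app"></div>
  <script src="leaderboard.js"></script>
</body></html>
"""

# Hypothetical stand-in for the DOM the browser's Inspect panel shows after JS runs.
RENDERED_HTML = """
<html><body>
  <div class="leaderboard leaderboard-table large">
    <table><tr>
      <td class="position">1</td>
      <td><div class="player-name-col">Player A</div></td>
      <td class="total">-18</td>
    </tr></table>
  </div>
</body></html>
"""

def count_leaderboards(html):
    """Return how many divs match the question's exact class string."""
    soup = BeautifulSoup(html, "html.parser")
    # A class_ value containing spaces matches the exact class attribute string.
    return len(soup.find_all("div", class_="leaderboard leaderboard-table large"))

print(count_leaderboards(SHELL_HTML))     # 0, so the for loop body never runs
print(count_leaderboards(RENDERED_HTML))  # 1
```

If the count is 0 for the HTML that `requests` returns while the element is visible in Inspect, the content is most likely being added by JavaScript after the page loads.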

mrwjdhj3


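The data is loaded dynamically by JavaScript, and bs4 can't render JS. That's why your code prints nothing: the leaderboard markup isn't in the HTML that requests downloads, so find_all returns an empty list and the for loop never runs. You can, however, pull the data you need from the API instead.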

Example:

import pandas as pd
import requests

# Leaderboard JSON that the page loads via JavaScript. The userTrackingId is a
# token with an expiry ("exp") claim, so this exact URL may stop working.
api_url = 'https://lbdata.pgatour.com/2022/r/478/leaderboard.json?userTrackingId=eyJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2Njg5OTEzNTcsIm5iZiI6MTY2ODk5MTM1NywiZXhwIjoxNjY4OTkzMTU3fQ.eTvZpdJgVp5yzSQz4J8n8ovzaBnKPmLhZm6gfitKJeU'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
data = []

res = requests.get(api_url, headers=headers)
res.raise_for_status()  # raise an error right away on a 4xx/5xx response
for item in res.json()['rows']:
    # each row is one player; keep only the total score relative to par
    data.append({'total': item['total']})

df = pd.DataFrame(data)
print(df)

Output:

total
0         -18
1         -17
2         -15
3         -15
4         -14
5         -14
6         -13
7         -13
8         -11
9         -11
10        -11
11        -10
12        -10
13         -8
14         -8
15         -8
16         -7
17         -6
18         +1
19         +6
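The `total` values above keep their sign (`+1`, `+6`), which suggests the API returns them as strings; that is an inference from the printed output, not something the answer states. If so, sorting or arithmetic on that column would operate on text. A minimal sketch of converting them to integers, assuming a hypothetical `SAMPLE_PAYLOAD` shaped like the `rows`/`total` fields read above, and assuming (unverified) that even par may appear as `"E"`. The helper names `score_to_int` and `totals_frame` are hypothetical.

```python
import pandas as pd

# Hypothetical payload mirroring only the fields the answer reads:
# a 'rows' list whose items carry a signed 'total' string.
SAMPLE_PAYLOAD = {"rows": [{"total": "-18"}, {"total": "+1"}, {"total": "E"}, {"total": "-7"}]}

def score_to_int(total):
    """Convert a to-par string such as '-18', '+1' or 'E' (even, assumed) to an int."""
    # int() accepts a leading '+' or '-' sign.
    return 0 if total == "E" else int(total)

def totals_frame(payload):
    """Build a DataFrame with the raw string and a numeric column, best score first."""
    df = pd.DataFrame({"total": [row["total"] for row in payload["rows"]]})
    df["total_num"] = df["total"].map(score_to_int)
    return df.sort_values("total_num").reset_index(drop=True)

print(totals_frame(SAMPLE_PAYLOAD))
```

In the answer's loop, the same conversion could be applied as `score_to_int(item['total'])` when appending to `data`, provided the real field format matches this assumption.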
