I'm trying to learn how to scrape data. I'm very new to Python, so please go easy on me. While searching YouTube I found a tutorial, and I tried to scrape some data from https://www.pgatour.com/competition/2022/hero-world-challenge/leaderboard.html.
First attempt:
from bs4 import BeautifulSoup
import requests
SCRAPE = requests.get("https://www.pgatour.com/competition/2022/hero-world-challenge/leaderboard.html")
print(SCRAPE)
#Response [200] = Successful...
#http response status codes
#Information Responses 100-199
#Successful 200-299
#Redirects 300-399
#Client Errors 400-499
#Server Errors 500-599
soup = BeautifulSoup(SCRAPE.content, 'html.parser')
#tells that the data is html and we need to parse it
table = soup.find_all('div', class_="leaderboard leaderboard-table large" )
#pick the large section that contains all the info you need
#then, pick each smaller section, find the type and class.
for board in table:
    # "board" instead of "list", so the built-in list type is not shadowed
    name = board.find('div', class_="player-name-col")
    position = board.find('td', class_="position")
    total = board.find('td', class_="total")
    print(name, position, total)
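Above is my code. I've also included a screenshot with the inspector open, so I can show you what I was thinking when I tried to find the tag type and class inside the leaderboard.
When I run it, nothing is printed. Any help would be greatly appreciated!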
1 Answer
mrwjdhj31#
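The data is loaded dynamically by JavaScript, and bs4 cannot render JavaScript. The HTML that requests downloads therefore does not contain the leaderboard markup, so find_all returns an empty list and the loop body never runs, which is why your code prints nothing. You can, however, extract the data you need from the API. Example: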
Output:
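
As a rough sketch of the API approach (not the answerer's original code): open the browser's developer tools, watch the Network tab while the leaderboard loads, find the request that returns JSON, and fetch that URL directly with requests. The endpoint URL, the JSON keys ("players", "name", "position", "total") and the sample record below are all hypothetical placeholders; replace them with whatever the Network tab actually shows.

```python
# Hypothetical placeholder: replace with the JSON URL that the browser's
# Network tab shows the leaderboard page requesting.
API_URL = "https://example.com/leaderboard.json"


def extract_rows(payload):
    """Return (name, position, total) tuples from a decoded JSON payload.

    The keys used here are assumed for illustration only; the real API
    may name and nest its fields differently.
    """
    rows = []
    for player in payload.get("players", []):
        rows.append(
            (player.get("name"), player.get("position"), player.get("total"))
        )
    return rows


def fetch_leaderboard(url=API_URL):
    """Download the JSON document and extract the leaderboard rows."""
    # Imported here so extract_rows can be tried without requests installed.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return extract_rows(response.json())


if __name__ == "__main__":
    # Made-up sample in the assumed shape, so the parsing step runs offline.
    sample = {"players": [{"name": "Player A", "position": "1", "total": "-10"}]}
    for name, position, total in extract_rows(sample):
        print(name, position, total)
```

Once the real endpoint is known, calling fetch_leaderboard() with that URL replaces the BeautifulSoup parsing entirely, because the response is already structured data rather than HTML.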