How do I scrape data from a script tag in a React web app in Python?

Posted by xuo3flqw on 2022-12-27 in Python


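I'm trying to extract data from RateMyProfessor, but because it is a React application, all of the professor information is created dynamically, so the response from requests.get() does not contain the rendered data I want to parse. However, I found that the same data is also present in a script tag, and that tag does come back in the requests.get() response.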

<script> window.__RELAY_STORE__ = {"legacyId":774048,"avgRating":2.6,"numRatings":12} </script>

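There is more in the Relay store, but this is the part I'm trying to parse. I should also add that the page has multiple script tags.
I'm currently using Selenium to render the whole page, but that takes a long time. Is there a way to access window.__RELAY_STORE__ so that I don't have to render the site every time?
For anyone curious, this is what I wrote to fetch the page that contains window.__RELAY_STORE__: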

import requests
page = requests.get("https://www.ratemyprofessors.com/search/teachers?query=Michael&sid=U2Nob29sLTM5OQ==")
print(page.content)

python-3.x

Source: https://stackoverflow.com/questions/74925010/how-to-scrape-data-from-a-script-tag-in-a-react-web-app-in-python

1 Answer

4ngedf3f1#

If you inspect the page, you will notice that the script is in the body. Just extract the script from the body, as shown in the code.

import requests
from bs4 import BeautifulSoup
import re
page = requests.get("https://www.ratemyprofessors.com/search/teachers?query=Michael&sid=U2Nob29sLTM5OQ==")
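# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(page.text, 'html.parser')
# Extract the part you want here: the first <script> tag inside <body>
script = soup.find("body").find("script")
# Use a regex to pre-process the string and pull out the bracketed arrays
for items in re.findall(r"(\[.*\])", script.string):
    print(items)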

This prints each bracketed array that the regular expression matches in the script text.
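If the goal is the whole object assigned to window.__RELAY_STORE__ rather than only its bracketed arrays, one possible approach is to locate the assignment and let the json module decode the value that follows it. The sketch below is not part of the original answer. It assumes the assigned value is valid JSON, which is true of the snippet in the question but is not guaranteed for the live page. The helper name extract_relay_store and the SAMPLE_HTML string are made up for illustration; in practice the HTML would come from requests.get(...).text.

```python
import json
import re

# Sample markup based on the snippet in the question, plus an unrelated
# script tag to mimic a page with several of them (illustrative only).
SAMPLE_HTML = """
<html><body>
<script>var unrelated = 1;</script>
<script> window.__RELAY_STORE__ = {"legacyId":774048,"avgRating":2.6,"numRatings":12} </script>
</body></html>
"""


def extract_relay_store(html):
    """Return the value assigned to window.__RELAY_STORE__, or None if absent.

    Hypothetical helper: assumes the assigned value is valid JSON.
    """
    match = re.search(r"window\.__RELAY_STORE__\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (a trailing ';' or '</script>').
    store, _end = json.JSONDecoder().raw_decode(html, match.end())
    return store


store = extract_relay_store(SAMPLE_HTML)
print(store["avgRating"], store["numRatings"])
```

Because the search runs over the full page text, it does not matter which of the several script tags holds the store, and no browser rendering is involved.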

Answered 2022-12-27
