Improve the speed of Selenium web scraping in Python?

Posted by hl0ma9xz on 2023-08-02 in Python


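I am using a very simple script to scrape information from a public forum. At the moment each URL takes about 2 minutes to scrape, and there are 20,000 URLs.
Is there a way to speed this process up?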

from bs4 import BeautifulSoup
from selenium import webdriver

urls = ['url1', 'url2', ...]
for url in urls:
    page = webdriver.Chrome()
    page.get(url)
    
    soup = BeautifulSoup(page.page_source,"lxml")
    messages = soup.findAll("div", class_="bbWrapper")
        
    for message in messages:
        print(message.text)
    
    page.quit()


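I am using Selenium to avoid the following error: [error message missing]
I tried running Chrome headless, but it was blocked by Cloudflare.
I have read that Selenium Stealth can avoid the Cloudflare block, but I don't know how to install Selenium Stealth in an Anaconda Python environment.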

None of "Access Denied page with headless Chrome on Linux while headed Chrome works on windows using Selenium through Python", "How to automate login to a site which is detecting my attempts to login using selenium-stealth", or "Can a website detect when you are using Selenium with chromedriver?" answers this question, because none of them is about improving performance.

python

Source: https://stackoverflow.com/questions/76694226/improve-the-speed-of-selenium-web-scraping-in-python

1 Answer


oalqel3c1#

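Here are some suggestions for improving the code:
1. Avoid instantiating Chrome for every URL. Move page = webdriver.Chrome() and page.quit() outside the loop so that the browser instance is reused.
2. Split the process into two steps. First, retrieve and save the HTML content of each URL. Then do the parsing separately.
3. Consider multithreading, and take a look at the threading module. It can help run multiple tasks concurrently.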
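
The three suggestions above could be combined roughly as follows. This is only a sketch under some assumptions, not a tested solution for the forum in question: the helper names (make_driver, fetch_chunk, fetch_all, parse_messages) and the workers parameter are made up for illustration, and it uses concurrent.futures.ThreadPoolExecutor (which is built on threads) rather than the threading module directly. Each worker thread opens its own browser, because sharing a single WebDriver instance between threads is not safe.

```python
from concurrent.futures import ThreadPoolExecutor


def make_driver():
    # Imported lazily so the other helpers can be used without Selenium installed.
    from selenium import webdriver
    return webdriver.Chrome()


def fetch_chunk(urls, driver_factory=make_driver):
    """Step 1: download the HTML of several URLs with ONE browser instance."""
    driver = driver_factory()
    pages = {}
    try:
        for url in urls:
            driver.get(url)
            pages[url] = driver.page_source
    finally:
        driver.quit()  # close the browser even if a page fails
    return pages


def fetch_all(urls, workers=4, driver_factory=make_driver):
    """Fetch in several threads; each thread owns its own browser."""
    # Spread the URLs over the workers and drop empty chunks.
    chunks = [urls[i::workers] for i in range(workers)]
    chunks = [chunk for chunk in chunks if chunk]
    pages = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(lambda c: fetch_chunk(c, driver_factory), chunks):
            pages.update(result)
    return pages


def parse_messages(html):
    """Step 2: parse saved HTML separately, with the selector from the question."""
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "lxml")
    return [div.text for div in soup.find_all("div", class_="bbWrapper")]
```

One way to use it would be pages = fetch_all(urls, workers=4), followed by a loop calling parse_messages(html) on each saved page. A small number of workers is probably the safer starting point, since every worker is a full Chrome process and many parallel requests may make a Cloudflare block more likely.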

Answered on 2023-08-02
