How do you get proxy integration working in scrapy-playwright?

Posted by ve7v8dk2 on 2023-05-29 in Other


I am trying to set up a proxy with scrapy-playwright, but I always get this error:

playwright._impl._api_types.Error: net::ERR_TIMED_OUT at http://whatismyip.com/
=========================== logs ===========================
navigating to "http://whatismyip.com/", waiting until "load"

when running this code:

from scrapy import Spider, Request
from scrapy_playwright.page import PageMethod

class ProxySpider(Spider):
    name = "check_proxy_ip"
    custom_settings = {
        "PLAYWRIGHT_LAUNCH_OPTIONS": {
            "proxy": {
                "server": "http://host:port",
                "username": "user",
                "password": "pass",
            },
        },
        "PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT": "300000",
    }

    def start_requests(self):
        yield Request("http://whatismyip.com",
                      meta=dict(
                          playwright=True,
                          playwright_include_page=True,
                          playwright_page_methods=[PageMethod('wait_for_selector', 'span.ipv4-hero')]
                      ),
                      callback=self.parse,
                      )

    def parse(self, response):
        print(response.text)

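The proxy is a paid one and I have checked that it works. DOWNLOAD_DELAY in settings.py is set to DOWNLOAD_DELAY=30. The error occurs whether PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT is set to 0, 10000, or 300000 (as in the code above). What is the problem?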


Source: https://stackoverflow.com/questions/74725399/how-do-you-get-proxy-integration-in-scrapy-playwright-working

1 Answer


oyjwcjzk1#

Playwright also supports providing a proxy at context creation time, as shown below.

# Passed as the meta argument of Request(...)
meta=dict(
    playwright=True,
    playwright_include_page=True,
    playwright_page_methods=[PageMethod('wait_for_selector', 'span.ipv4-hero')],
    playwright_context_kwargs=dict(
        proxy=dict(
            server="http://host:port",
            username="user",
            password="pass",
        ),
    ),
)
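
If the same proxy should apply to every request, one alternative is to declare it once for the default context in the spider settings. The following is a minimal sketch, assuming the PLAYWRIGHT_CONTEXTS setting of scrapy-playwright (a mapping from context name to the keyword arguments used to create that context). The server and credentials are the placeholders from the question, and the PROXY name is introduced here only for illustration.

```python
# Sketch of a settings fragment, not a complete spider.
# PROXY is a hypothetical helper name; the values are placeholders.
PROXY = {
    "server": "http://host:port",
    "username": "user",
    "password": "pass",
}

# Intended to be assigned to Spider.custom_settings (or merged into settings.py).
custom_settings = {
    "PLAYWRIGHT_CONTEXTS": {
        # "default" is the context used when a request names no other context.
        "default": {"proxy": PROXY},
    },
}
```

With this in place, the requests no longer need to carry playwright_context_kwargs themselves.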

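If no context is created manually, scrapy-playwright creates a default context and reuses it for every request. Once that context exists, the playwright_context_kwargs option is ignored on later requests that use the same context, so all new requests go through the same proxy. As the documentation states:
"Please note that if a context with the specified name already exists, that context is used and playwright_context_kwargs are ignored."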
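
Because of that rule, sending requests through different proxies requires a different context name for each proxy. Here is a rough sketch of one way to do that, assuming the playwright_context meta key of scrapy-playwright for naming the context. The proxy_meta helper and the "proxy-host-port" naming scheme are hypothetical, not part of the library.

```python
from urllib.parse import urlsplit


def proxy_meta(server, username=None, password=None):
    """Build request meta that routes a request through `server`.

    Hypothetical helper: each proxy gets its own context name, so
    playwright_context_kwargs is not ignored when the proxy changes.
    `server` must carry a numeric port, e.g. "http://10.0.0.1:8080".
    """
    proxy = {"server": server}
    if username is not None:
        proxy["username"] = username
        proxy["password"] = password or ""

    parts = urlsplit(server)
    # Assumed naming scheme; any string unique per proxy would do.
    context_name = f"proxy-{parts.hostname}-{parts.port}"

    return {
        "playwright": True,
        "playwright_context": context_name,
        "playwright_context_kwargs": {"proxy": proxy},
    }
```

Inside a spider this would be used as Request(url, meta=proxy_meta("http://10.0.0.1:8080", "user", "pass")), where the address is a made-up example.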
I hope this solves your problem.

Answered on 2023-05-29
