Scrapy: how do you get proxy integration working in scrapy-playwright?

Posted by ve7v8dk2 on 2023-05-29 in Other

I am trying to set up a proxy with scrapy-playwright, but I always get this error:

playwright._impl._api_types.Error: net::ERR_TIMED_OUT at http://whatismyip.com/
=========================== logs ===========================
navigating to "http://whatismyip.com/", waiting until "load"

when running this code:

from scrapy import Spider, Request
from scrapy_playwright.page import PageMethod

class ProxySpider(Spider):
    name = "check_proxy_ip"
    custom_settings = {
        "PLAYWRIGHT_LAUNCH_OPTIONS": {
            "proxy": {
                "server": "http://host:port",
                "username": "user",
                "password": "pass",
            },
        },
        "PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT": "300000",
    }

    def start_requests(self):
        yield Request("http://whatismyip.com",
                      meta=dict(
                          playwright=True,
                          playwright_include_page=True,
                          playwright_page_methods=[PageMethod('wait_for_selector', 'span.ipv4-hero')]
                      ),
                      callback=self.parse,
                      )

    def parse(self, response):
        print(response.text)

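The proxy I am using is a paid one, and it works when I check it on its own. DOWNLOAD_DELAY in settings.py is set to 30. The error occurs whether PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT is set to 0, 10000, or 300000 (as in the code above). What is the problem?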

oyjwcjzk (answer #1)

Playwright also supports supplying a proxy at context creation time, as shown below.

meta=dict(
    playwright=True,
    playwright_include_page=True,
    playwright_page_methods=[PageMethod('wait_for_selector', 'span.ipv4-hero')],
    playwright_context_kwargs=dict(
        proxy=dict(
            server="http://host:port",
            username="user",
            password="pass",
        ),
    ),
)

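If no context is created manually, scrapy-playwright creates a default context and reuses it for every request. Within the same context, the playwright_context_kwargs option is ignored for subsequent requests, so all new requests go through the same proxy. As stated in the documentation:

"Please note that if a context with the specified name already exists, that context is used and playwright_context_kwargs are ignored."

I hope this solves your problem.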
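
Because context kwargs are only applied when a context is first created, one way to use more than one proxy is to give each proxy its own context name through the playwright_context meta key. The following is a minimal sketch under that assumption: proxy_meta is a hypothetical helper (not part of scrapy-playwright), and the context names and host values are placeholders. It only builds the meta dictionary, so it runs without Scrapy installed.

```python
from typing import Any, Dict, Optional


def proxy_meta(
    context_name: str,
    server: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Request.meta dict asking for a named context with its own proxy.

    Hypothetical helper for illustration; only the meta keys themselves
    (playwright, playwright_context, playwright_context_kwargs) come from
    scrapy-playwright.
    """
    proxy: Dict[str, str] = {"server": server}
    # Credentials are optional; include them only when given.
    if username is not None:
        proxy["username"] = username
    if password is not None:
        proxy["password"] = password
    return {
        "playwright": True,
        # Use a distinct name per proxy: if a context with this name already
        # exists, it is reused and playwright_context_kwargs are ignored.
        "playwright_context": context_name,
        "playwright_context_kwargs": {"proxy": proxy},
    }


# Placeholder values; replace them with real proxy endpoints.
meta_a = proxy_meta("proxy_a", "http://host-a:port", "user", "pass")
meta_b = proxy_meta("proxy_b", "http://host-b:port")
```

Inside a spider, the result would be passed as Request("http://whatismyip.com", meta=meta_a). Requests that share a context name share that context's proxy, so switching proxies means switching names.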
