I am trying to set up a proxy with scrapy-playwright, but I always get the error
playwright._impl._api_types.Error: net::ERR_TIMED_OUT at http://whatismyip.com/
=========================== logs ===========================
navigating to "http://whatismyip.com/", waiting until "load"
when executing this code:
from scrapy import Spider, Request
from scrapy_playwright.page import PageMethod


class ProxySpider(Spider):
    name = "check_proxy_ip"
    custom_settings = {
        "PLAYWRIGHT_LAUNCH_OPTIONS": {
            "proxy": {
                "server": "http://host:port",
                "username": "user",
                "password": "pass",
            },
        },
        "PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT": "300000",
    }

    def start_requests(self):
        yield Request(
            "http://whatismyip.com",
            meta=dict(
                playwright=True,
                playwright_include_page=True,
                playwright_page_methods=[PageMethod('wait_for_selector', 'span.ipv4-hero')]
            ),
            callback=self.parse,
        )

    def parse(self, response):
        print(response.text)
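The proxy I tried is a paid one and I have checked that it works. DOWNLOAD_DELAY in settings.py is set to DOWNLOAD_DELAY=30. The error occurs whether PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT is set to 0, 10000 or 300000 (as in the code above). What is the problem?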
1 Answer
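Playwright also supports providing a proxy at context creation time; with scrapy-playwright this is passed through the playwright_context_kwargs request meta key.
If a context is not created manually, a default context is created and reused for every request. Within the same context, the playwright_context_kwargs option is ignored for subsequent requests, so the same proxy is used for all new requests. As stated in the documentation: "Please note that if a context with the specified name already exists, that context is used and playwright_context_kwargs are ignored."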
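To make this concrete, here is a minimal sketch of building the request meta for a named context that has its own proxy. The helper name proxied_playwright_meta and the context names are hypothetical, made up for illustration; the meta keys playwright, playwright_context and playwright_context_kwargs are the ones scrapy-playwright reads, and the proxy dict uses Playwright's server/username/password fields. The placeholder host, port and credentials are the ones from the question.

```python
from typing import Any, Dict, Optional


def proxied_playwright_meta(
    context_name: str,
    server: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Build Request.meta asking scrapy-playwright for a named context with a proxy.

    Hypothetical helper, not part of scrapy-playwright. The context name matters:
    if a context with that name already exists it is reused and the kwargs
    (including the proxy) are ignored, so each proxy needs its own name.
    """
    proxy: Dict[str, str] = {"server": server}
    # Credentials are optional; only include them when supplied.
    if username is not None:
        proxy["username"] = username
    if password is not None:
        proxy["password"] = password
    return {
        "playwright": True,
        "playwright_context": context_name,
        "playwright_context_kwargs": {"proxy": proxy},
    }


if __name__ == "__main__":
    # Placeholder values taken from the question; replace with real ones.
    meta = proxied_playwright_meta("proxy-a", "http://host:port", "user", "pass")
    print(meta)
```

In the spider this would be used as yield Request("http://whatismyip.com", meta=proxied_playwright_meta("proxy-a", "http://host:port", "user", "pass"), callback=self.parse). Switching to a different proxy later means using a different context name, since kwargs for an existing name are ignored.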
I hope this solves your problem.