After switching from Scrapy to scrapy-redis, the headers of the start URL requests changed

mlmc2os5  posted on 2022-11-09  in  Redis

I have a Scrapy project that I want to convert to scrapy-redis. The main spider file looks like this:

import scrapy
from scrapy_redis.spiders import RedisSpider


class MySpider(RedisSpider):
    name = 'ScrapyBot'
    redis_key = 'myspider:start_urls'
    start_urls = []

    # Custom headers that should be sent with the start requests
    my_header = {
        "Host": "jd.com",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0",
    }

    def start_requests(self):
        for url in MySpider.start_urls:
            yield scrapy.Request(
                url=url,
                headers=MySpider.my_header,
                callback=self.parse,
            )

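The requests work fine in plain Scrapy, but after adding the scrapy-redis part, the headers of the start requests (captured with Fiddler) revert to the defaults: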

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en
User-Agent: Scrapy/1.6.0 (+https://scrapy.org)
Accept-Encoding: gzip,deflate

This causes the server to return a 403 error. How can I fix the headers of the start URL requests in scrapy-redis?

6ljaweal  #1

You can set default request headers in the settings.py file, like this:

DEFAULT_REQUEST_HEADERS = {
    "Host": "jd.com",
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0",
}
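
If the headers need to be forced on every request rather than only supplied as defaults, one possible alternative is a small downloader middleware. The sketch below is not part of the original answer: the class name `FixedHeadersMiddleware`, the module name `middlewares.py`, the `myproject` package name, and the priority value 543 are all assumptions chosen for illustration. It relies only on the `process_request(request, spider)` hook of Scrapy downloader middlewares, so it also applies to requests that scrapy-redis builds from URLs read out of Redis. The `Host` value is copied from the question and only fits requests sent to that host.

```python
# middlewares.py (hypothetical module name)

# Headers copied from the question; adjust them to the target site.
FIXED_HEADERS = {
    "Host": "jd.com",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) "
        "Gecko/20100101 Firefox/84.0"
    ),
}


class FixedHeadersMiddleware:
    """Downloader middleware that overwrites selected headers on every request."""

    def process_request(self, request, spider):
        for name, value in FIXED_HEADERS.items():
            # Plain assignment replaces any value set earlier in the chain.
            request.headers[name] = value
        # Returning None tells Scrapy to keep processing this request.
        return None


# settings.py: enable the middleware (the "myproject" package name is an assumption).
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.FixedHeadersMiddleware": 543,
# }
```

Because the middleware assigns the values instead of using `setdefault`, it overrides headers regardless of how the request was created, which may or may not be what you want for requests that set their own headers.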