使用Scrapy抓取反馈网站时未获得所需的输出

9udxz4iz  于 2022-11-09  发布在  其他
关注(0)|答案(1)|浏览(185)

这是我用来抓取网站的代码

import scrapy
class DellLatitudeSpider(scrapy.Spider):
    name = 'dell_latitude'
    allowed_domains = ['www.dell.com/community']

    def start_request(self):
        yield scrapy.Request(url='https://www.dell.com/community/Latitude/bd-p/Latitude?ref=lithium_menu', callback=self.parse, headers='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36')

    def parse(self, response):
        for activity in response.xpath("//table[@class='lia-list-wide']"):
            yield {
                'board': 'Laptops',
                'sub board': 'Latitude',
                'title': activity.xpath("//a[@class='page-link lia-link-navigation lia-custom-event']/text()").get(),
                'url': activity.xpath("//a[@class='page-link lia-link-navigation lia-custom-event']/@href").get()

            }

最初,我得到了一个错误,我通过添加User-Agent纠正了这个错误,但是现在,当我抓取文件时,它显示抓取了0个页面,其他什么都没有。
下面是我获得输出的方法。

2022-07-08 17:38:01 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-07-08 17:38:01 [scrapy.core.engine] INFO: Spider opened
2022-07-08 17:38:01 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-07-08 17:38:01 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-07-08 17:38:01 [scrapy.core.engine] INFO: Closing spider (finished)
2022-07-08 17:38:01 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.005837,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 7, 8, 12, 8, 1, 807968),
 'log_count/DEBUG': 1,
 'log_count/INFO': 10,
 'start_time': datetime.datetime(2022, 7, 8, 12, 8, 1, 802131)}
2022-07-08 17:38:01 [scrapy.core.engine] INFO: Spider closed (finished)
goqiplq2

goqiplq21#

你的headers参数应该是一个将头文件名称Map到头文件值的dict,不是吗?
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36'}

相关问题