Scrapy Cluster callback requests not working, stuck in the meta passthrough middleware

mjqavswn · posted 2021-06-04 · tagged Kafka

I am trying to use Scrapy Cluster (watching the crawl through the Kibana debug panel), but requests that use a callback do not work. The same spider works fine in plain Scrapy, but in Scrapy Cluster no data is scraped and the crawl gets stuck in the meta passthrough middleware.
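
For context, in plain Scrapy the spider can be started directly, while in Scrapy Cluster a crawl is normally seeded by feeding a JSON request through the Kafka monitor; the appid, crawlid and attrs values read from response.meta later in the spider come from that request. A rough sketch of both invocations (the appid/crawlid/spiderid values are placeholders, and the exact feed fields depend on the Scrapy Cluster version):

# Plain Scrapy, where the spider reportedly works (search is the -a argument defined in __init__)
scrapy crawl ebay_data -a search="iphone 64GB"

# Scrapy Cluster: seed the crawl through the Kafka monitor (placeholder appid/crawlid/spiderid)
python kafka_monitor.py feed '{"url": "http://www.ebay.com", "appid": "testapp", "crawlid": "abc123", "spiderid": "ebay_data"}'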

# Imports assuming this spider lives in Scrapy Cluster's crawling package
from crawling.spiders.redis_spider import RedisSpider
from crawling.items import RawResponseItem


class EbayDataSpider(RedisSpider):
    name = 'ebay_data'

    # Allow a custom parameter (-a flag in the scrapy command)
    def __init__(self, search="iphone 64GB", *args, **kwargs):
        self.search_string = search
        super(EbayDataSpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        # Extract the trksid to build a search request
        trksid = response.css("input[type='hidden'][name='_trksid']").xpath(
            "@value").extract()[0]

        # Build the url and start the requests
        yield response.follow(url="http://www.ebay.com/sch/i.html?_from=R40&_trksid=" + trksid +
                             "&_nkw=" +
                             self.search_string.replace(
                                 ' ', '+') + "&_sacat=0",
                             callback=self.parse_link)

    # Parse the search results
    def parse_link(self, response):
        # Extract the list of products
        results = response.xpath(
            '//div/div/ul/li[contains(@class, "s-item" )]')

        # Extract info for each product
        for product in results:
            product_url = product.xpath(
                './/a[@class="s-item__link"]/@href').extract_first()
            # Yield inside the loop so every product link is followed,
            # not just the last one
            yield response.follow(url=product_url, callback=self.parse_product_details)

    def parse_product_details(self, response):
        # capture raw response
        item = RawResponseItem()

        # populated from response.meta
        item['appid'] = response.meta['appid']
        item['crawlid'] = response.meta['crawlid']
        item['attrs'] = response.meta['attrs']
        # populated from raw HTTP response
        item["url"] = response.request.url
        item["response_url"] = response.url
        item["status_code"] = response.status
        item["status_msg"] = "OK"
        item["response_headers"] = self.reconstruct_headers(response)
        item["request_headers"] = response.request.headers
        #item["body"] = response.body
        item["body"] = "This is empty body from amazon spider"
        item["links"] = []

        # Add more data from details page
        item['p_brand'] = response.xpath(
            "//div[@id='viTabs_0_is']//tbody//tr[1]//td[4]//span/text()").extract()
        item['p_title'] = response.xpath("//h1[@id='itemTitle']/text()").extract()
        item['p_price'] = response.xpath("//span[@id='prcIsum']/text()").extract()

        yield item
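
One detail that may be relevant to the meta passthrough problem (only a sketch, under the assumption that the cluster keys are not automatically re-attached to requests yielded from a callback): response.meta can be copied onto the follow-up request explicitly, so that appid, crawlid and attrs are still present when parse_product_details reads them.

# Hypothetical workaround sketch: carry the Scrapy Cluster meta forward explicitly
# so that response.meta['appid'] / ['crawlid'] / ['attrs'] exist in the next callback
yield response.follow(url=product_url,
                      callback=self.parse_product_details,
                      meta=response.meta)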
