Scrapy: one item with multiple parsing functions

Posted by cnjp1d6j on 2023-02-08 in Other

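I am scraping a website with Scrapy and Python, and I am having trouble populating the item I have created.
The products are scraped properly, and everything works as long as the information sits inside the response.xpath used in the for loop.
"trend" and "number" are added to the item correctly using ItemLoader.
However, the product date is not inside the response.xpath shown below; it is in the page title, available through response.css('title').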

import scrapy
import datetime
from trends.items import Trend_item
from scrapy.loader import ItemLoader

#Initiate the spider

class trendspiders(scrapy.Spider):
    name = 'milk'
    start_urls = ['https://thewebsiteforthebestmilk/ireland/2022-03-16/7/']

    def parse(self, response):
        # Each table row holds one trend and its number
        for milk_unique in response.xpath('/html/body/main/div/div[2]/div[1]/section[1]/div/div[3]/table/tbody/tr'):
            l = ItemLoader(item=Trend_item(), selector=milk_unique, response=response)
            l.add_css('trend', 'a::text')
            l.add_css('number', 'span.small.text-muted::text')
            # yield inside the loop so that every row produces an item
            yield l.load_item()

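How can I add 'date' to my item (it is found in response.css('title'))?
I tried adding l.add_css('date', "response.css('title')") inside the for loop, but it returned an error.
Should I create a new parsing function? If so, how do I send the information to the same item?
I hope I have made myself clear.
Thanks a lot for your help,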

scrapy

Source: https://stackoverflow.com/questions/75293571/scrapy-one-item-with-multiple-parsing-functions

2 Answers

wgx48brx #1

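Since the date lies outside the selector you use for each row, extract it once before the for loop; it does not need to be updated on every iteration.
Then, with the item loader, you can use l.add_value to load it together with the other fields.
For example: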

import scrapy
from scrapy.loader import ItemLoader
from trends.items import Trend_item

class trendspiders(scrapy.Spider):
    name = 'trends'
    start_urls = ['https://getdaytrends.com/ireland/2022-03-16/7/']

    def parse(self, response):
        # The title is page-level, so read it once before iterating over the rows
        date_str = response.xpath("//title/text()").get()
        for trend_unique in response.xpath('/html/body/main/div/div[2]/div[1]/section[1]/div/div[3]/table/tbody/tr'):
            l = ItemLoader(item=Trend_item(), selector=trend_unique, response=response)
            l.add_css('trend', 'a::text')
            l.add_css('number', 'span.small.text-muted::text')
            # add_value stores an already-extracted value instead of running a selector
            l.add_value('date', date_str)
            yield l.load_item()

kse8i1jr #2

If response.css('title').get() gives you the answer you need, why not use the same CSS in add_css: