Scrapy: No adapter found for objects of type: 'itemadapter.adapter.ItemAdapter'

xpszyzbs  posted on 2022-11-09  in Other

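I want to change the names of the images downloaded from a web page. I want to use the standard names provided by the website, rather than cleaning up the request URL.
I have the following pipelines.py: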

from itemadapter import ItemAdapter
from scrapy.pipelines.images import ImagesPipeline

class ScrapyExercisesPipeline:
    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        return adapter

class DownfilesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None, item=None):
        adapter = ScrapyExercisesPipeline().process_item()[0]
        image_name: str = f'{adapter}.jpg'
        return image_name

This produces the following error:
raise TypeError(f"No adapter found for objects of type: {type(item)} ({item})")
TypeError: No adapter found for objects of type: <class 'itemadapter.adapter.ItemAdapter'> (...)

scraper.py:

import scrapy
from scrapy_exercises.items import ScrapyExercisesItem

class TestSpider(scrapy.Spider):
    name = 'test'
    #allowed_domains = ['x']
    start_urls = ['https://www.meadowhall.co.uk/eatdrinkshop?page=1']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                cb_kwargs = {'pg':0}
            )
    def parse(self, response,pg):
        pg=0
        content_page = response.xpath("//div[@class='view-content']//div")
        for cnt in content_page:
            image_url = cnt.xpath(".//img//@src").get()
            image_name = cnt.xpath(".//img//@alt").get()
            if image_url != None:
                pg+=1
                items = ScrapyExercisesItem()
                if image_name == '':
                    items['name'] = 'unknown'+f'{pg}'
                    items['images'] = [image_url]
                    yield items
                else:
                    items['name'] = image_name
                    items['images'] = [image_url]
                    yield items

settings.py

ITEM_PIPELINES = {
    #'scrapy.pipelines.images.ImagesPipeline': 1,
    'scrapy_exercises.pipelines.ScrapyExercisesPipeline':45,
    'scrapy_exercises.pipelines.DownfilesPipeline': 55
    }
from pathlib import Path
import os
BASE_DIR = Path(__file__).resolve().parent.parent
IMAGES_STORE = os.path.join(BASE_DIR, 'images')
IMAGES_URLS_FIELD = 'images'
IMAGES_RESULT_FIELD = 'results'

Answer #1 (gab6jxml)

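You are calling a pipeline from inside another pipeline, and that pipeline is also registered in your settings to run as a pipeline in its own right. It would be simpler to read the name field from the item in your DownfilesPipeline and return it.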
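
As a side note on where the TypeError probably comes from: Scrapy hands whatever process_item returns to the next pipeline, and ScrapyExercisesPipeline returns the adapter instead of the item, so the next stage ends up trying to wrap a wrapper. Below is a stdlib-only simulation of that chaining. StandInAdapter, ReturnsAdapter, ReturnsItem, Downstream, run_chain and chain_error are hypothetical stand-ins made up for illustration; they are not Scrapy or itemadapter APIs.

```python
# Stdlib-only simulation of pipeline chaining. Every name here is a
# hypothetical stand-in, not part of Scrapy or itemadapter.


class StandInAdapter:
    """Wraps plain dicts only, mimicking an adapter that rejects other types."""

    def __init__(self, item):
        if not isinstance(item, dict):
            raise TypeError(
                f"No adapter found for objects of type: {type(item)} ({item})"
            )
        self.item = item


class ReturnsAdapter:
    """Mirrors the question's first pipeline: it returns the wrapper."""

    def process_item(self, item, spider):
        adapter = StandInAdapter(item)
        return adapter  # bug: the wrapper, not the item, goes downstream


class ReturnsItem:
    """Same stage, but returning the original item."""

    def process_item(self, item, spider):
        StandInAdapter(item)  # wrap only for local use
        return item


class Downstream:
    """A later stage that wraps whatever it receives."""

    def process_item(self, item, spider):
        StandInAdapter(item)  # raises if it is handed a wrapper
        return item


def run_chain(pipelines, item):
    """Feed each stage's return value into the next stage."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider=None)
    return item


def chain_error(pipelines, item):
    """Return the TypeError message raised by the chain, or None."""
    try:
        run_chain(pipelines, item)
    except TypeError as exc:
        return str(exc)
    return None
```

In this simulation the chain [ReturnsItem(), Downstream()] runs cleanly, while [ReturnsAdapter(), Downstream()] fails in the second stage with a "No adapter found" message, which matches the shape of the error in the question.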
Change your pipelines.py file to:

from itemadapter import ItemAdapter
from scrapy.pipelines.images import ImagesPipeline

class DownfilesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None, item=None):
        return item['name'] + '.jpg'
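
One thing to watch for: the value returned by file_path is treated as a path relative to IMAGES_STORE, and the name here comes from the image's alt text, which may be empty or contain characters such as "/". A minimal sketch of a cleanup step, assuming a hypothetical helper called safe_image_name (not part of Scrapy) and an arbitrary choice of which characters to keep:

```python
import re


def safe_image_name(name, fallback="unknown", ext=".jpg"):
    """Turn scraped text into a flat file name (hypothetical helper)."""
    # Replace each run of characters other than word characters, dots
    # and dashes with a single underscore, then trim stray dots and
    # underscores from both ends.
    cleaned = re.sub(r"[^\w.\-]+", "_", name or "").strip("._")
    # Fall back to a placeholder when nothing usable is left.
    return (cleaned or fallback) + ext
```

Inside file_path this could be used as `return safe_image_name(item['name'])` in place of the plain string concatenation.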

You also need to turn off ScrapyExercisesPipeline in your settings.
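
For example, the ITEM_PIPELINES entry in settings.py could look like this, with the first pipeline commented out the same way the stock ImagesPipeline already is in the question (a sketch that reuses the module paths and priority from the question):

```python
ITEM_PIPELINES = {
    # 'scrapy.pipelines.images.ImagesPipeline': 1,
    # 'scrapy_exercises.pipelines.ScrapyExercisesPipeline': 45,  # turned off
    'scrapy_exercises.pipelines.DownfilesPipeline': 55,
}
```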
