Scrapy Splash AttributeError: 'HtmlResponse' object has no attribute 'data'

Posted by q3aa0525 on 2023-05-22 in Other

I have a simple Scrapy spider that should take a screenshot of a page. Below is my code. When I run it, I get this error:
Traceback (most recent call last):
  File "c:\users\xxxxx\appdata\local\programs\python\python37\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\xxxxxx\scrapy_screenshot\scrapy_screenshot\spiders\extract.py", line 19, in parse_result
    imgdata = base64.b64decode(response.data['png'])
AttributeError: 'HtmlResponse' object has no attribute 'data'

import json
import base64
import scrapy
from scrapy_splash import SplashRequest

class ExtractSpider(scrapy.Spider):
    name = 'extract'

    def start_requests(self):
        url = 'https://stackoverflow.com/'
        splash_args = {
            'html': 1,
            'png': 1
        }
        yield SplashRequest(url, self.parse_result, endpoint='render.json', args=splash_args)

    def parse_result(self, response):
        imgdata = base64.b64decode(response.data['png'])
        filename = 'some_image.png'
        with open(filename, 'wb') as f:
            f.write(imgdata)
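
For reference: with endpoint='render.json' and png=1, Splash returns a JSON body whose "png" field is a base64-encoded PNG. scrapy-splash exposes that decoded JSON as response.data, but only on a SplashJsonResponse. The stdlib-only sketch below mirrors this. The two classes are stand-ins (not the real Scrapy or scrapy-splash classes), png_bytes is a hypothetical helper, and the payload is made up (just the 8-byte PNG signature rather than a real screenshot). It performs the same decoding step as parse_result and raises a clearer error for the situation in the traceback.

```python
import base64
import json


class HtmlResponse:
    """Stand-in for scrapy.http.HtmlResponse: it has no .data attribute."""

    def __init__(self, text):
        self.text = text


class SplashJsonResponse:
    """Stand-in for scrapy_splash.SplashJsonResponse: .data is the parsed JSON body."""

    def __init__(self, text):
        self.text = text
        self.data = json.loads(text)


def png_bytes(response):
    """Hypothetical helper: decode the base64 'png' field of a render.json result."""
    data = getattr(response, "data", None)
    if data is None:
        # The case from the traceback: the response never passed through
        # SplashMiddleware, so the callback received a plain HtmlResponse.
        raise RuntimeError(
            f"{type(response).__name__} has no 'data'; "
            "is scrapy_splash.SplashMiddleware enabled?"
        )
    return base64.b64decode(data["png"])


# Hypothetical render.json body: only the 8-byte PNG signature, not a real image.
body = json.dumps({
    "html": "<html></html>",
    "png": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii"),
})

print(png_bytes(SplashJsonResponse(body)))  # b'\x89PNG\r\n\x1a\n'
try:
    png_bytes(HtmlResponse("<html></html>"))
except RuntimeError as exc:
    print(exc)  # HtmlResponse has no 'data'; is scrapy_splash.SplashMiddleware enabled?
```

In a real spider, the returned bytes are what gets written to some_image.png.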

Answer #1

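For me, the problem was that settings.py had no effect on which middlewares were enabled, so adding the settings directly to the spider via custom_settings fixed it. Without scrapy_splash.SplashMiddleware enabled, the SplashRequest is sent straight to the target URL instead of to Splash's render.json endpoint. The callback therefore receives a plain HtmlResponse, and only the SplashJsonResponse produced by SplashMiddleware has a data attribute.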

import json
import base64
import scrapy
from scrapy_splash import SplashRequest

class ExtractSpider(scrapy.Spider):
    name = 'extract'
    custom_settings = {
        "SPLASH_URL": "http://localhost:8050",
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_splash.SplashCookiesMiddleware": 723,
            "scrapy_splash.SplashMiddleware": 725,
            "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
        },
        # Enable Splash Deduplicate Args Filter
        "SPIDER_MIDDLEWARES": {
            "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
        },
        # Define the Splash DupeFilter
        "DUPEFILTER_CLASS": "scrapy_splash.SplashAwareDupeFilter",
        "HTTPCACHE_STORAGE": "scrapy_splash.SplashAwareFSCacheStorage",
    }

    def start_requests(self):
        url = 'https://stackoverflow.com/'
        splash_args = {
            'html': 1,
            'png': 1
        }
        yield SplashRequest(url, self.parse_result, endpoint='render.json', args=splash_args)

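    def parse_result(self, response):
        # With SplashMiddleware enabled, render.json responses arrive as
        # SplashJsonResponse, whose .data holds the decoded JSON body.
        imgdata = base64.b64decode(response.data['png'])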
        filename = 'some_image.png'
        with open(filename, 'wb') as f:
            f.write(imgdata)
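
Why these particular priority numbers? The Scrapy docs state that downloader middlewares' process_request methods run in increasing priority order and their process_response methods run in decreasing order. The sketch below does not import Scrapy. It only applies that documented rule to the DOWNLOADER_MIDDLEWARES dict from the answer; call_orders is a hypothetical helper. The takeaway, as I read the scrapy-splash README's reasoning, is that on the way back HttpCompressionMiddleware (810) handles the response before SplashMiddleware (725). The Splash body is therefore presumably already decompressed when SplashMiddleware turns it into a SplashJsonResponse.

```python
# Downloader middleware priorities copied from the answer's custom_settings.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}


def call_orders(middlewares):
    """Return (process_request order, process_response order).

    Mimics the documented rule: process_request runs from low to high
    priority values, process_response runs in the reverse order.
    """
    request_order = [
        name for name, _ in sorted(middlewares.items(), key=lambda kv: kv[1])
    ]
    response_order = list(reversed(request_order))
    return request_order, response_order


requests_first, responses_first = call_orders(DOWNLOADER_MIDDLEWARES)
print("process_request: ", [n.rsplit(".", 1)[-1] for n in requests_first])
print("process_response:", [n.rsplit(".", 1)[-1] for n in responses_first])
# process_request:  ['SplashCookiesMiddleware', 'SplashMiddleware', 'HttpCompressionMiddleware']
# process_response: ['HttpCompressionMiddleware', 'SplashMiddleware', 'SplashCookiesMiddleware']
```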
