Scrapy is sending a response of type None to my custom file pipeline

Posted by atmip9wb on 2022-11-09 in Other


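Using Scrapy, I want to download files and save them under different file names.
First, if I enable the default files pipeline, the files (which may be HTML or PDF) download just fine.
To rename them, I wrote the class below and overrode the file_path method.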

class MyCustomFilePipeline(FilesPipeline):
    def file_path(self, request, response=None, info=None, *, item=None):        
        # extract 191148 from http://mywebsite.com/filedownload.asp?pn=191148&yr=2022
        pn = re.search(r'(?<=pn\=)\d+', request.url).group()
        print(f'{request.url} - {pn}')
        print(type(response)) # <-- this prints as <class 'NoneType'>

        response_contentype = response.headers['Content-Type'].decode('ASCII')
        ext = 'html'
        if response_contentype  == 'text/html':
            ext = 'html'
        elif response_contentype == 'application/pdf':
            ext = 'pdf'
        print(f'{pn}.{ext}') # <-- this is not printed  
        return f'{pn}.{ext}'

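I enabled it in settings.py. In the console, for each request URL, I get the output of the first two print statements (added to the code above for debugging).
However, response is <class 'NoneType'>.
Surprisingly, print(f'{pn}.{ext}') never prints.
No file download starts, and files is not populated.
Why is no request made and no response received? What am I missing?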

scrapy

Source: https://stackoverflow.com/questions/73383199/scrapy-is-sending-response-of-type-none-to-my-custom-file-pipeline

1 Answer


5t7ly7z51#

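1. You don't see print(f'{pn}.{ext}') because response is None, so accessing response.headers raises an error, and that stops the rest of the method from running.
2. Why response is None: if you look at the FilesPipeline source, file_path is executed twice. It is called once in media_to_download without a response (which is why you get None) and once in media_downloaded with the response.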
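The two calls described above can be mimicked with a small toy model. This is only a sketch, not Scrapy's real implementation: FakeResponse and FakePipeline are hypothetical stand-ins, and only the method names media_to_download, media_downloaded and file_path are taken from FilesPipeline.

```python
class FakeResponse:
    """Hypothetical stand-in for a Scrapy response; it only carries headers."""

    def __init__(self, headers):
        self.headers = headers


class FakePipeline:
    """Toy model of the two call sites; not Scrapy's actual code."""

    def __init__(self):
        # Records the type name of `response` on every file_path call.
        self.seen = []

    def file_path(self, request_url, response=None):
        self.seen.append(type(response).__name__)
        return "placeholder.bin"

    def media_to_download(self, request_url):
        # Runs before any download happens, so no response exists yet.
        return self.file_path(request_url)

    def media_downloaded(self, response, request_url):
        # Runs after the download, so the response can be passed along.
        return self.file_path(request_url, response=response)


pipeline = FakePipeline()
url = "http://mywebsite.com/filedownload.asp?pn=191148&yr=2022"
pipeline.media_to_download(url)
pipeline.media_downloaded(FakeResponse({"Content-Type": b"application/pdf"}), url)
print(pipeline.seen)  # ['NoneType', 'FakeResponse']
```

The first recorded type is NoneType, which matches what the debug print in the question shows for the pre-download call.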
One way to make the code work is to wrap it in an if response check, although I don't know whether this is the best solution:

import re

from scrapy.pipelines.files import FilesPipeline


class MyCustomFilePipeline(FilesPipeline):
    def file_path(self, request, response=None, info=None, *, item=None):
        # file_path is also called before the download with response=None;
        # in that case this method falls through and implicitly returns None.
        if response:
            # extract 191148 from http://mywebsite.com/filedownload.asp?pn=191148&yr=2022
            pn = re.search(r"(?<=pn\=)\d+", request.url).group()
            print(f"{request.url} - {pn}")
            print(type(response))  # no longer NoneType inside this branch
            response_contentype = response.headers["Content-Type"].decode("ASCII")
            ext = "html"
            if response_contentype == "text/html":
                ext = "html"
            elif response_contentype == "application/pdf":
                ext = "pdf"
            print(f"{pn}.{ext}")  # <-- this works now
            return f"{pn}.{ext}"
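If you would rather not return None from the call that has no response, the naming logic could be moved into a small helper that tolerates a missing header. This is a sketch under assumptions: build_file_name and EXTENSIONS are hypothetical names that are not part of Scrapy, and the fallback to html when nothing is known is my own choice, not something the question specifies.

```python
import re
from typing import Optional

# Hypothetical mapping; extend it for other content types you expect.
EXTENSIONS = {"text/html": "html", "application/pdf": "pdf"}


def build_file_name(url: str, content_type: Optional[bytes] = None) -> str:
    """Return '<pn>.<ext>', defaulting to 'html' when no header is known."""
    match = re.search(r"(?<=pn=)\d+", url)
    if match is None:
        raise ValueError(f"no pn parameter in {url!r}")
    ext = "html"
    if content_type is not None:
        # Drop parameters such as '; charset=utf-8' before the lookup.
        mime = content_type.decode("ascii").split(";")[0].strip().lower()
        ext = EXTENSIONS.get(mime, ext)
    return f"{match.group()}.{ext}"


url = "http://mywebsite.com/filedownload.asp?pn=191148&yr=2022"
print(build_file_name(url))                      # 191148.html
print(build_file_name(url, b"application/pdf"))  # 191148.pdf
```

Inside file_path this could be called as build_file_name(request.url, response.headers.get("Content-Type") if response else None). One caveat, as far as I understand the pipeline: the name computed before the download can differ from the final one (html versus pdf), so the check for an already downloaded file may not find the stored file.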

