Scrapy: how do I use TextResponse, HtmlResponse and XmlResponse?

Posted by qf9go6mv on 2023-01-02 in Other

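I see there are several response types, but how do I tell Scrapy to return an HtmlResponse?
I assume the goal is to write def parse(self, response: HtmlResponse):. Or should it be used some other way? Are there any usage examples?
This is the example from the Scrapy tutorial. How do I use HtmlResponse here instead of the default?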

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'https://quotes.toscrape.com/page/1/',
            'https://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = f'quotes-{page}.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log(f'Saved file {filename}')

scrapy

Source: https://stackoverflow.com/questions/74950510/how-to-use-textresponse-htmlresponse-and-xmlresponse

1 Answer

yqkkidmi1#

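Scrapy tries to identify the type of the response it receives and calls parse with that specific type. As far as I know, parse is never called with the base type Response. The identification is done in scrapy/responsetypes.py using several signals: the MIME type, the body, the headers, and so on.
Here is the MIME type identification map: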

CLASSES = {
    'text/html': 'scrapy.http.HtmlResponse',
    'application/atom+xml': 'scrapy.http.XmlResponse',
    'application/rdf+xml': 'scrapy.http.XmlResponse',
    'application/rss+xml': 'scrapy.http.XmlResponse',
    'application/xhtml+xml': 'scrapy.http.HtmlResponse',
    'application/vnd.wap.xhtml+xml': 'scrapy.http.HtmlResponse',
    'application/xml': 'scrapy.http.XmlResponse',
    'application/json': 'scrapy.http.TextResponse',
    'application/x-json': 'scrapy.http.TextResponse',
    'application/json-amazonui-streaming': 'scrapy.http.TextResponse',
    'application/javascript': 'scrapy.http.TextResponse',
    'application/x-javascript': 'scrapy.http.TextResponse',
    'text/xml': 'scrapy.http.XmlResponse',
    'text/*': 'scrapy.http.TextResponse',
}
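To make the lookup concrete, here is a minimal sketch of how a table like this can be consulted: try the exact MIME type first, then the `type/*` wildcard. It is a simplified stand-in, not Scrapy's own code. The helper name `lookup_response_class`, the trimmed table, and the base-class fallback for unmatched types are assumptions made for illustration.

```python
# Trimmed copy of the MIME type table quoted above (values kept as dotted paths).
CLASSES = {
    "text/html": "scrapy.http.HtmlResponse",
    "application/rss+xml": "scrapy.http.XmlResponse",
    "application/json": "scrapy.http.TextResponse",
    "text/xml": "scrapy.http.XmlResponse",
    "text/*": "scrapy.http.TextResponse",
}

# Assumption: anything the table does not cover falls back to the base class.
FALLBACK = "scrapy.http.Response"


def lookup_response_class(mimetype):
    """Hypothetical helper: exact match first, then the 'type/*' wildcard."""
    if mimetype is None:
        return FALLBACK
    mimetype = mimetype.lower()
    if mimetype in CLASSES:
        return CLASSES[mimetype]
    # "text/plain" has no entry of its own, so it is matched through "text/*".
    wildcard = mimetype.split("/")[0] + "/*"
    return CLASSES.get(wildcard, FALLBACK)


print(lookup_response_class("text/html"))
print(lookup_response_class("text/plain"))
```

The wildcard entry is what lets any `text/...` type without its own row still be treated as text.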

Since parse is called with one of these subclasses, the developer can work with it directly through the response argument.

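from scrapy.http import HtmlResponse

def parse(self, response):
    # Scrapy has already chosen the subclass; just check which one arrived.
    if isinstance(response, HtmlResponse):
        ...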

Answered 2023-01-02
