How to scrape multiple quotes per page with Scrapy

2fjabf4q  posted on 2022-11-09  in  Other

I have written a spider that scrapes a single quote from a page, but I don't know how to extend it to scrape all the quotes on the page.

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quote'
    start_urls = ['https://www.goodreads.com/quotes/tag/love?page=2']

    def parse(self, response):
        url=response.url
        text=response.css(".mediumText:nth-child(2) .quoteText::text").get().strip()
        author=response.css(".mediumText:nth-child(2) .authorOrTitle::text").get().strip()
        yield{"text":text,"author":author,"url":url }

Screenshot: https://prnt.sc/pd3ei5z-9VwZ

kuhbmx9i  #1

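To scrape multiple items from a page, you need to loop over the selector that matches each quote's container, as in the example below, and yield one item per quote.
I have also added code that follows the link to the next page and scrapes those pages as well.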

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quote'
    start_urls = ['https://www.goodreads.com/quotes/tag/love']

    def parse(self, response):
        # Each quote on the page sits in its own quoteDetails container
        for quote in response.xpath("//div[contains(@class,'quoteDetails')]"):
            # The XPaths below start with "." so they only match inside this quote
            yield {
                'url': response.url,
                'text': quote.xpath("normalize-space(./div[@class='quoteText']/text())").get(),
                'author': quote.xpath("normalize-space(.//span[@class='authorOrTitle']/text())").get()
            }

        # Follow the "next" link if there is one; otherwise the crawl stops here
        next_page = response.xpath("//a[@class='next_page']/@href").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
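
If you want to see the two steps in isolation (one item per quote container, then working out the next-page URL), here is a rough sketch that runs without Scrapy or network access. It is not the spider itself: it uses the standard library's xml.etree.ElementTree in place of Scrapy selectors, the markup in SAMPLE_PAGE is made up and only borrows the class names from the code above, and parse_page is a hypothetical helper name. The urljoin call stands in for what response.follow does with a relative href, which is to resolve it against the URL of the current response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Made-up, well-formed markup that only mimics the class names used above.
SAMPLE_PAGE = """
<html><body>
  <div class="quoteDetails">
    <div class="quoteText">  First quote  <span class="authorOrTitle">Author A</span></div>
  </div>
  <div class="quoteDetails">
    <div class="quoteText">  Second quote  <span class="authorOrTitle">Author B</span></div>
  </div>
  <a class="next_page" href="/quotes/tag/love?page=2">next</a>
</body></html>
"""


def parse_page(markup, page_url):
    """Hypothetical helper: return (items, next_url) for one page of markup."""
    root = ET.fromstring(markup)
    items = []
    # One container per quote; every lookup inside the loop is relative to it.
    for quote in root.findall(".//div[@class='quoteDetails']"):
        text_div = quote.find("./div[@class='quoteText']")
        author = quote.find(".//span[@class='authorOrTitle']")
        items.append({
            "url": page_url,
            # split/join collapses whitespace, similar in spirit to normalize-space()
            "text": " ".join((text_div.text or "").split()),
            "author": " ".join((author.text or "").split()),
        })
    # Resolve the relative href against the current page URL, or stop if absent.
    next_link = root.find(".//a[@class='next_page']")
    next_url = urljoin(page_url, next_link.get("href")) if next_link is not None else None
    return items, next_url


items, next_url = parse_page(SAMPLE_PAGE, "https://www.goodreads.com/quotes/tag/love")
for item in items:
    print(item)
print("next:", next_url)
```

The point to take away is the same as in the spider: select the containers first, then query relative to each container, so that each quote's text and author stay paired.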
