Why does Scrapy stop working and throw an error whenever I rename the parse function?

Posted by nkcskrwz on 2022-11-09 in Other
from scrapy import Spider
from selenium import webdriver
from scrapy.selector import Selector
from scrapy.http import Request

from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

class BookSeleniumSpider(Spider):
    name = 'book_selenium'
    allowed_domains = ['books.toscrape.com']

    def start_req(self):
        s = Service('C:\\Users\\aps\\Documents\\chromedriver.exe')
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--no-sandbox')
        self.driver = webdriver.Chrome(service=s, options=chrome_options)

        # Get the site we want to start scraping
        self.driver.get('http://books.toscrape.com')

        sel = Selector(text=self.driver.page_source)
        books = sel.xpath('//h3/a/@href').extract()

        for book in books:
            url = 'http://books.toscrape.com/' + book
            print(url)

    def parse_book(self, response):
        pass

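When I rename the parse function to start_req, the spider stops working. When I change the name back to parse, it works fine. I don't know why. Can someone explain this to me?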

Answer #1, by ncecgwcz

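When Scrapy creates a request, the request is given a callback function that processes the response. Unless you specify that callback explicitly, Scrapy falls back to the spider's parse method. So when you rename the method, Scrapy throws an error because the spider no longer defines the default callback.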
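The fallback described above can be illustrated with a small model in plain Python. This is only a sketch: MiniRequest, MiniSpider, and handle_response are hypothetical stand-ins written for this example, not Scrapy's real classes, and the real engine does considerably more work. The point is just the lookup order: an explicit callback first, otherwise the parse method.

```python
# Simplified stand-ins for illustration only; these are NOT Scrapy's real classes.

class MiniRequest:
    """A request that may or may not carry an explicit callback."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


class MiniSpider:
    """Base class whose default callback is a placeholder that must be overridden."""

    def parse(self, response):
        raise NotImplementedError(
            f"{type(self).__name__}.parse callback is not defined"
        )


def handle_response(spider, request, response):
    """Call the request's callback, falling back to spider.parse when none is set."""
    callback = request.callback or spider.parse
    return callback(response)


class RenamedSpider(MiniSpider):
    # The handler was renamed, so the inherited placeholder parse() is still in place.
    def start_req(self, response):
        return f"handled {response}"


spider = RenamedSpider()

# No explicit callback: the fallback reaches the placeholder and fails.
try:
    handle_response(spider, MiniRequest("http://books.toscrape.com"), "page")
    outcome = "ok"
except NotImplementedError as exc:
    outcome = str(exc)
print(outcome)

# Explicit callback: the renamed method is called directly.
explicit = MiniRequest("http://books.toscrape.com", callback=spider.start_req)
print(handle_response(spider, explicit, "page"))
```

In Scrapy itself, the equivalent of the second case is passing the callback argument when constructing a Request, which avoids relying on the default name at all.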
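If for some reason you really do want a different name, one low-effort option is to assign the method you want to use to the parse attribute in the class body. For example: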

class BookSeleniumSpider(Spider):
    name = 'book_selenium'
    allowed_domains = ['books.toscrape.com']

    def start_req(self):
        s = Service('C:\\Users\\aps\\Documents\\chromedriver.exe')
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--no-sandbox')
        self.driver = webdriver.Chrome(service=s, options=chrome_options)

        # Get the site we want to start scraping
        self.driver.get('http://books.toscrape.com')

        sel = Selector(text=self.driver.page_source)
        books = sel.xpath('//h3/a/@href').extract()
        for book in books:
            url = 'http://books.toscrape.com/' + book
            print(url)

    def parse_book(self, response):
        pass
        ...

    ...
    ...
    parse = parse_book
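To see why the assignment works, here is a minimal sketch in plain Python with no Scrapy dependency. BaseSpider and AliasedSpider are hypothetical names used only for this illustration. Assigning inside the class body simply binds the same function object to a second name, so any lookup of parse finds parse_book.

```python
class BaseSpider:
    """Hypothetical stand-in for a base spider; not Scrapy's real class."""

    def parse(self, response):
        raise NotImplementedError("parse callback is not defined")


class AliasedSpider(BaseSpider):
    def parse_book(self, response):
        return f"parsed {response}"

    # Must appear after the def above: class bodies run top to bottom,
    # so parse_book has to exist before it can be assigned.
    parse = parse_book


spider = AliasedSpider()

# Calling parse now runs parse_book instead of the base-class placeholder.
print(spider.parse("page"))

# Both names refer to the very same function object.
print(AliasedSpider.parse is AliasedSpider.parse_book)
```

Note that the aliased method still has to accept the response argument, because that is what the caller passes to parse.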
