Scrapy - AttributeError: 'dict' object has no attribute 'dont_filter'

hs1rzwqc · asked 2022-11-09

I'm trying to run this code. The webdriver opens the page, but shortly afterwards it stops working and I get the error: AttributeError: 'dict' object has no attribute 'dont_filter'. Here is my code:

import scrapy
from scrapy import Spider
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from scrapy.selector import Selector
from scrapy.http import Request

class RentalMarketSpider(Spider):
    name = 'rental_market'
    allowed_domains = ['home.co.uk']

    def start_requests(self):
        s=Service('/Users/chrisb/Desktop/Scrape/Home/chromedriver')
        self.driver = webdriver.Chrome(service=s)
        self.driver.get('https://www.home.co.uk/for_rent/ampthill/current_rents?location=ampthill')
        sel = Selector(text=self.driver.page_source)

        tot_prop_rent = sel.xpath('.//div[1]/table/tbody/tr[1]/td[2]/text()').extract_first()
        last_14_days = sel.xpath('.//div[1]/table/tbody/tr[2]/td[2]/text()').extract_first()
        average = sel.xpath('.//div[1]/table/tbody/tr[3]/td[2]/text()').extract_first()
        median = sel.xpath('.//div[1]/table/tbody/tr[4]/td[2]/text()').extract_first()

        one_b_num_prop = sel.xpath('.//div[3]/table/tbody/tr[2]/td[2]/text()').extract_first()
        one_b_average = sel.xpath('.//div[3]/table/tbody/tr[2]/td[3]/text()').extract_first()

        yield {
                'tot_prop_rent': tot_prop_rent,
                'last_14_days': last_14_days,
                'average': average,
                'median': median,
                'one_b_num_prop': one_b_num_prop,
                'one_b_average': one_b_average
            }

Below is the full error I'm getting. I've searched everywhere but couldn't find a clear answer on how to get rid of it:

2021-12-23 17:43:26 [twisted] CRITICAL: Unhandled Error
Traceback (most recent call last):
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/commands/crawl.py", line 27, in run
    self.crawler_process.start()
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/crawler.py", line 327, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/twisted/internet/base.py", line 1318, in run
    self.mainLoop()
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/twisted/internet/base.py", line 1328, in mainLoop
    reactorBaseSelf.runUntilCurrent()
--- <exception caught here> ---
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/twisted/internet/base.py", line 994, in runUntilCurrent
    call.func(*call.args,**call.kw)
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/utils/reactor.py", line 50, in __call__
    return self._func(*self._a,**self._kw)
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/core/engine.py", line 137, in _next_request
    self.crawl(request, spider)
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/core/engine.py", line 218, in crawl
    self.schedule(request, spider)
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/core/engine.py", line 223, in schedule
    if not self.slot.scheduler.enqueue_request(request):
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/core/scheduler.py", line 78, in enqueue_request
    if not request.dont_filter and self.df.request_seen(request):
builtins.AttributeError: 'dict' object has no attribute 'dont_filter'

2021-12-23 17:43:26 [scrapy.core.engine] INFO: Closing spider (finished)

Any advice would be greatly appreciated. Thanks for your time.

xn1cxnb4 #1

I can't see anything wrong with your code as such. Possibly you are using an older version of ChromeDriver, which returns a malformed object.

Solution

Make sure your ChromeDriver is up to date and matches the version of the Chrome browser you are running.

tl;dr

FIND_ELEMENT command returns a dict object value
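
A quick way to check whether Chrome and ChromeDriver are out of sync is to print the versions the driver itself reports. This is just a minimal sketch I'm adding here (not part of the original answer), reusing the chromedriver path from the question:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Start a throwaway session only to read the version information it reports.
service = Service('/Users/chrisb/Desktop/Scrape/Home/chromedriver')
driver = webdriver.Chrome(service=service)

caps = driver.capabilities
print('Chrome version:      ', caps.get('browserVersion'))
print('ChromeDriver version:', caps.get('chrome', {}).get('chromedriverVersion'))

driver.quit()

If the two major versions differ, update ChromeDriver to match the installed Chrome.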

2admgd59 #2

start_requests should yield individual Request objects, not a dict.
I was facing the same problem. I think it works if we move that work into another function (a callback). This solution isn't the best, but it's better than nothing:

import scrapy
from scrapy import Spider
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

class RentalMarketSpider(Spider):
    name = 'rental_market'
    allowed_domains = ['home.co.uk']

    def start_requests(self):
        # Yield a real Request object here; the item dict is yielded later, from the callback.
        yield scrapy.Request(url='https://scrapy.org/', callback=self.parse)

    def parse(self, response):
        # Do the Selenium work inside the callback instead of start_requests.
        s = Service('/Users/chrisb/Desktop/Scrape/Home/chromedriver')
        self.driver = webdriver.Chrome(service=s)
        self.driver.get('https://www.home.co.uk/for_rent/ampthill/current_rents?location=ampthill')
        sel = Selector(text=self.driver.page_source)

        tot_prop_rent = sel.xpath('.//div[1]/table/tbody/tr[1]/td[2]/text()').extract_first()
        last_14_days = sel.xpath('.//div[1]/table/tbody/tr[2]/td[2]/text()').extract_first()
        average = sel.xpath('.//div[1]/table/tbody/tr[3]/td[2]/text()').extract_first()
        median = sel.xpath('.//div[1]/table/tbody/tr[4]/td[2]/text()').extract_first()

        one_b_num_prop = sel.xpath('.//div[3]/table/tbody/tr[2]/td[2]/text()').extract_first()
        one_b_average = sel.xpath('.//div[3]/table/tbody/tr[2]/td[3]/text()').extract_first()

        yield {
            'tot_prop_rent': tot_prop_rent,
            'last_14_days': last_14_days,
            'average': average,
            'median': median,
            'one_b_num_prop': one_b_num_prop,
            'one_b_average': one_b_average
        }

I just tested it and it works.
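
One follow-up note: the snippet above never closes the browser. A variation worth considering (an illustrative sketch only, not part of the tested answer) is to create the driver once in the constructor and quit it in Scrapy's closed() hook, which is called when the spider finishes:

import scrapy
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

class RentalMarketSpider(scrapy.Spider):
    name = 'rental_market'
    allowed_domains = ['home.co.uk']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Start one browser for the whole crawl instead of inside a callback.
        service = Service('/Users/chrisb/Desktop/Scrape/Home/chromedriver')
        self.driver = webdriver.Chrome(service=service)

    def start_requests(self):
        # Still yield a Request object, as in the answer above.
        yield scrapy.Request(url='https://scrapy.org/', callback=self.parse)

    def parse(self, response):
        # Reuse the shared driver; the rest of the XPath extraction from the answer goes here.
        self.driver.get('https://www.home.co.uk/for_rent/ampthill/current_rents?location=ampthill')
        sel = Selector(text=self.driver.page_source)
        yield {'tot_prop_rent': sel.xpath('.//div[1]/table/tbody/tr[1]/td[2]/text()').extract_first()}

    def closed(self, reason):
        # Scrapy calls closed() automatically when the spider finishes.
        self.driver.quit()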
