Scrapy eBay scraper stops after 5 pages

vwkv1x7d · asked on 2022-11-09 in Other · 1 answer · 115 views

I'm a complete noob using Scrapy for the first time. I've set it up to grab some information, but it always stops after 5 pages. I'd like it to scrape many more pages, since there are at least 20 available.

import scrapy
from myproject.items import EbaySold

class EbaySpider(scrapy.Spider):
    name = 'EbaySold'
    allowed_domains = ['www.ebay.com']
    start_urls = ['https://www.ebay.com/b/Apple-Unlocked-Smartphones/9355/bn_599372?LH_Sold=1&mag=1&rt=nc&_dmd=1&_pgn=1&_sop=13']

    def parse(self, response):
        products = response.css('li.s-item')

        product_item = EbaySold()
        for product in products:
            product_item['name'] = product.css('h3.s-item__title::text').get()
            if product_item['name'] is None:
                product_item['name'] = product.css('span.BOLD::text').get()
            product_item['sold_price'] = product.css('span.POSITIVE::text').get()
            product_item['date_sold'] = product.css('div.s-item__title-tag::text').get().replace('SOLD ', '')
            yield product_item

        next_page = response.css('a[type=next]').attrib['href']

        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

holgip5t · 1#

Make sure the following settings are configured in your Scrapy project's settings.py file. They stop Scrapy from obeying eBay's robots.txt, keep cookies enabled, and send browser-like headers, which should keep the crawl from being cut off after a few pages.

ROBOTSTXT_OBEY = False
COOKIES_ENABLED = True
DEFAULT_REQUEST_HEADERS = {
  'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
  'accept-language': 'en-US,en;q=0.9',
  'accept-encoding': 'gzip, deflate, br',
  'sec-fetch-site': 'same-origin',
  'upgrade-insecure-requests': 1,
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
CONCURRENT_REQUESTS = 2 # small number

Then try running the spider again.
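For reference, you can launch the spider from the project root with the usual scrapy crawl EbaySold command (add -o sold.json if you want the items written to a file; the file name is just an example). If you prefer to drive it from a Python script instead, here is a minimal sketch, assuming it is run from the project root so that the settings.py above is picked up:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Load the project settings (including the ones shown above)
process = CrawlerProcess(get_project_settings())
# 'EbaySold' is the spider name defined in the question's spider class
process.crawl("EbaySold")
# Start crawling; this call blocks until the spider finishes
process.start()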
