Scrapy shell CSS selector on a div class returns an empty list

z4iuyo4d  posted on 2022-11-09  in Shell

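I am trying to scrape T-shirt prices from the following link: https://www.adidas.com/us/search?q=tshirt
In the page source shown by the browser I can see a line like this: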

<div class="gl-price-item gl-price-item--sale notranslate">$36</div>

This is what I did:

>>> fetch('https://www.adidas.com/us/search?q=tshirt')
2022-09-25 23:50:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.adidas.com/us/search?q=tshirt> (referer: None)
>>> response.css('div.gl-price-item.gl-price-item--sale.notranslate')
[]

I expected response.css('div.gl-price-item.gl-price-item--sale.notranslate') to return at least one item, since the page contains an element with those classes showing $36, but I get an empty list. Why does this happen?
What am I doing wrong?

9rnv2umw  #1

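You get an empty list because the prices are loaded dynamically from an API by JavaScript. Scrapy does not render JavaScript, so that content is not in the HTML it downloads and the selector has nothing to match. You can, however, use Scrapy to pull all the required data directly from the API.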
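One way to confirm that an element is missing from the downloaded HTML, rather than the selector being wrong, is to check the raw markup for the class name. The following is a minimal sketch using only the standard library; the helper name class_in_static_html and both sample strings are made up for illustration and do not come from the adidas page.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collects every CSS class that appears in the parsed markup."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def class_in_static_html(html, class_name):
    """Return True if class_name appears on any element in the raw HTML."""
    finder = ClassFinder()
    finder.feed(html)
    return class_name in finder.classes


# Hypothetical samples: one server-rendered fragment, one JavaScript shell.
static_page = '<div class="gl-price-item notranslate">$36</div>'
js_shell = '<div id="app"></div><script src="bundle.js"></script>'

print(class_in_static_html(static_page, "gl-price-item"))  # True
print(class_in_static_html(js_shell, "gl-price-item"))     # False
```

In the Scrapy shell the same check could be run as class_in_static_html(response.text, 'gl-price-item'). A False result suggests the element is created in the browser after the page loads.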

Example:

import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        # Send a browser-like User-Agent with the request.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
        }

        # API endpoint that returns the search results as JSON.
        api_url = 'https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt'

        yield scrapy.Request(
            url=api_url,
            headers=headers,
            callback=self.parse,
            method='GET',
        )

    def parse(self, response):
        # Decode the JSON body (response.json() requires Scrapy 2.2 or later).
        resp = response.json()

        # Each entry in the item list carries the regular and the sale price.
        for item in resp['raw']['itemList']['items']:
            yield {
                'price': item['price'],
                'salePrice': item['salePrice'],
            }

Output:

{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 45, 'salePrice': 45}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 150, 'salePrice': 60}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 36}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 32, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 10}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 55, 'salePrice': 55}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 18}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 45, 'salePrice': 45}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 32, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 15}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 18}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 110, 'salePrice': 110}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 22, 'salePrice': 22}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 30}

...and so on
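The extraction step can also be tried offline, without sending any request. The sketch below assumes a payload with the same raw -> itemList -> items nesting that the spider reads. The sample dictionary is hypothetical (its first two price pairs are copied from the output above), and the helper extract_prices and the onSale field are illustrative additions, not part of the API response.

```python
def extract_prices(payload):
    """Yield price dicts from a payload shaped like raw -> itemList -> items."""
    items = payload.get("raw", {}).get("itemList", {}).get("items", [])
    for item in items:
        price = item.get("price")
        sale_price = item.get("salePrice")
        yield {
            "price": price,
            "salePrice": sale_price,
            # On sale only when both values exist and the sale price is lower.
            "onSale": (
                price is not None
                and sale_price is not None
                and sale_price < price
            ),
        }


# Hypothetical payload; the third item deliberately lacks a sale price.
sample = {
    "raw": {
        "itemList": {
            "items": [
                {"price": 35, "salePrice": 21},
                {"price": 40, "salePrice": 36},
                {"price": 30},
            ]
        }
    }
}

results = list(extract_prices(sample))
for row in results:
    print(row)
```

Using .get() instead of direct indexing means an item with a missing key yields None rather than raising KeyError. Whether real responses ever omit these keys is an assumption here.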
