Scrapy crawling using a jQuery search

mcvgt66p  posted on 2022-11-09  in  jQuery

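I am trying to extract some information from the website https://www.moparpartsgiant.com/ using product numbers, for example ['5175788 AA','82214506 AB','UN 051 D1 AA']. The search response is an .html page for the product, and these pages have irregular URLs, so I cannot run a spider by changing just one part of the URL. I tried to run the crawl through scrapy.FormRequest: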

def parse(self, response,**kwargs):
    yield scrapy.FormRequest.from_response(response,
                                           formdata={'input_name': '5175788AA'},
                                           callback=self.parse_product
                                           )

But I cannot set input_name, because the form's input has no name attribute.
How can I run the search using scrapy.FormRequest? Or how can I simulate the request that performs the search?
Thanks in advance for your answers!

to94eoyn1#

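The URLs that the keywords above lead to, along with the information on those pages, are not dynamic, but clicking the search option is. If you type a keyword into the search manually, you get the URL that the search leads to. So you can supply all of those URLs and scrape the required data from them. You could automate the same thing with selenium/playwright, which is slower and more complex, but it may not be an easy task with Scrapy alone.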
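As a small illustration of how the collected URLs relate to the part numbers in the question: in the three URLs used in this answer, the page name ends with the part number in lowercase and without spaces. The sketch below relies only on that observation; the helper names `normalize_part_number` and `match_urls` are hypothetical, and nothing here is confirmed to hold for other parts on the site.

```python
import re

# URLs found by searching manually on the site (the same three used in this answer).
KNOWN_URLS = [
    "https://www.moparpartsgiant.com/parts/mopar-2-way~5175788aa.html",
    "https://www.moparpartsgiant.com/parts/mopar-accessories_bag_kit_storage-82214506ab.html",
    "https://www.moparpartsgiant.com/parts/mopar-a-c-duct-left~un051d1aa.html",
]


def normalize_part_number(part_number: str) -> str:
    """Remove whitespace and lowercase the part number."""
    return re.sub(r"\s+", "", part_number).lower()


def match_urls(part_numbers, urls):
    """Map each part number to the first URL ending in '<number>.html', or None."""
    result = {}
    for part_number in part_numbers:
        suffix = normalize_part_number(part_number) + ".html"
        result[part_number] = next((u for u in urls if u.endswith(suffix)), None)
    return result


if __name__ == "__main__":
    parts = ["5175788 AA", "82214506 AB", "UN 051 D1 AA"]
    for number, url in match_urls(parts, KNOWN_URLS).items():
        print(number, "->", url)
```

This only pairs part numbers with URLs you have already collected; it does not discover the descriptive prefix of a URL, which is why the search step itself still has to be done manually or with a browser automation tool.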

Working code with example:

import scrapy
from scrapy.crawler import CrawlerProcess


class TestSpider(scrapy.Spider):
    name = 'test'

    # Product pages found by searching for each part number manually
    urls = [
        'https://www.moparpartsgiant.com/parts/mopar-2-way~5175788aa.html',
        'https://www.moparpartsgiant.com/parts/mopar-accessories_bag_kit_storage-82214506ab.html',
        'https://www.moparpartsgiant.com/parts/mopar-a-c-duct-left~un051d1aa.html'
    ]

    def start_requests(self):
        for url in self.urls:
            yield scrapy.Request(url=url, callback=self.parse_details)

    def parse_details(self, response):
        # Each <li> holds a label in <span> and its value in the following <div>
        d = {x.css('span::text').get(): x.css('span+div::text').get() for x in response.css('ul.pn-detail-list li')}
        yield d


if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(TestSpider)
    process.start()

Output:

2022-10-06 03:49:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.moparpartsgiant.com/parts/mopar-accessories_bag_kit_storage-82214506ab.html>
{}
2022-10-06 03:49:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.moparpartsgiant.com/parts/mopar-a-c-duct-left~un051d1aa.html> (referer: None)
2022-10-06 03:49:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.moparpartsgiant.com/parts/mopar-2-way~5175788aa.html>
{'Part Description': 'Wiring Kit 2 Way Female', 'Replaced By': '5175788AB', 'Manufacturer': 'Mopar'}
2022-10-06 03:49:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.moparpartsgiant.com/parts/mopar-a-c-duct-left~un051d1aa.html>
{'Part Description': 'Vent A/C Duct Left', 'Position': 'Left', 'Replaced By': '5179768AA', 'Manufacturer': 'Mopar'}
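Note that the log above shows an empty item `{}` for the 82214506ab page, which means the `ul.pn-detail-list li` selector matched nothing there; the reason is not visible from this output. A minimal sketch of one way to avoid yielding such empty items follows, assuming you build the item from (label, value) pairs. The helper name `clean_details` is hypothetical.

```python
def clean_details(pairs):
    """Build a dict from (label, value) pairs, skipping rows without a label."""
    details = {}
    for label, value in pairs:
        if label is None:
            # No <span> text was found for this row, so there is no usable key.
            continue
        details[label.strip()] = value.strip() if value is not None else None
    return details


# Hypothetical usage inside parse_details of the spider above:
#
#     rows = response.css('ul.pn-detail-list li')
#     item = clean_details(
#         (x.css('span::text').get(), x.css('span+div::text').get()) for x in rows
#     )
#     if item:
#         yield item
#     else:
#         self.logger.warning("No details found on %s", response.url)
```

Logging the URL instead of yielding `{}` makes it easier to see which pages need a different selector.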
