Running multiple scrapes with Scrapy and writing to separate CSV files

hgtggwj0 · posted 2022-11-09

I wrote a script to scrape my website with Scrapy, but at the moment I run 8 separate scripts, one per collection, each saving its results to its own CSV file (collection 1 saves to collection-1.csv, and so on).
Is there a way to run multiple spiders from a single script while still saving the scraped data to a unique file per spider?
The current script is shown below.

import scrapy
from scrapy.crawler import CrawlerProcess
import csv

# Module-level CSV file and writer, shared by the spider below.
cs = open('results/collection-1-results.csv', 'w', newline='', encoding='utf-8')
header_names = ['stk', 'name', 'price', 'url']
csv_writer = csv.DictWriter(cs, fieldnames=header_names)
csv_writer.writeheader()

class XXX(scrapy.Spider):
    name = 'XXX'
    start_urls = [
        'website-url.com',
    ]

    def parse(self, response):
        product_urls = response.css('div.grid-uniform a.product-grid-item::attr(href)').extract()

        for product_url in product_urls:
            yield scrapy.Request(url='website-url.com' + product_url, callback=self.next_parse_two)

        next_url = response.css('ul.pagination-custom li a[title="Next »"]::attr(href)').get()
        if next_url is not None:
            yield scrapy.Request(url='website-url.com' + next_url, callback=self.parse)

    def next_parse_two(self, response):
        # Build one row per product page and append it to the shared CSV.
        item = dict()
        item['stk'] = response.css('script#swym-snippet::text').get().split('stk:')[1].split(',')[0]
        item['name'] = response.css('h1.h2::text').get()
        item['price'] = response.css('span#productPrice-product-template span.visually-hidden::text').get()
        item['url'] = response.url
        csv_writer.writerow(item)
        cs.flush()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(XXX)
process.start()

zzwlnbp8 · answer #1

Yes, you can run multiple spiders from the same script: call process.crawl() once for each spider class before starting the process, adding as many spiders as you need, like so:

process.crawl(X)
process.crawl(Xx)
process.crawl(Xxx)
process.start()
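
Note, though, that as written the question's script would still funnel every spider's rows through the one module-level csv_writer, so each spider also needs its own output target. One way to get that is Scrapy's built-in FEEDS setting (available since Scrapy 2.1), declared per spider class, so the framework writes the yielded items to CSV and no manual csv.DictWriter is needed. Below is a minimal sketch of that approach; the class names CollectionOne/CollectionTwo, the start URLs, and the file paths are placeholders standing in for the collections in the question, and the field extraction is copied from the question's code unchanged:

import scrapy
from scrapy.crawler import CrawlerProcess

class CollectionSpider(scrapy.Spider):
    # Shared base class: subclasses only set a name, a start URL and a FEEDS
    # target. Items yielded from parse_product() are written by Scrapy itself.
    header_names = ['stk', 'name', 'price', 'url']

    def parse(self, response):
        for product_url in response.css('div.grid-uniform a.product-grid-item::attr(href)').extract():
            yield response.follow(product_url, callback=self.parse_product)

        next_url = response.css('ul.pagination-custom li a[title="Next »"]::attr(href)').get()
        if next_url is not None:
            yield response.follow(next_url, callback=self.parse)

    def parse_product(self, response):
        # Same field extraction as in the question; the yielded dict becomes one CSV row.
        yield {
            'stk': response.css('script#swym-snippet::text').get().split('stk:')[1].split(',')[0],
            'name': response.css('h1.h2::text').get(),
            'price': response.css('span#productPrice-product-template span.visually-hidden::text').get(),
            'url': response.url,
        }

class CollectionOne(CollectionSpider):
    name = 'collection-1'  # placeholder spider name
    start_urls = ['https://website-url.com/collections/collection-1']  # placeholder URL
    custom_settings = {
        'FEEDS': {'results/collection-1-results.csv': {'format': 'csv', 'fields': CollectionSpider.header_names}},
    }

class CollectionTwo(CollectionSpider):
    name = 'collection-2'
    start_urls = ['https://website-url.com/collections/collection-2']
    custom_settings = {
        'FEEDS': {'results/collection-2-results.csv': {'format': 'csv', 'fields': CollectionSpider.header_names}},
    }

process = CrawlerProcess({'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'})
process.crawl(CollectionOne)
process.crawl(CollectionTwo)
process.start()  # both spiders run in the same reactor; each writes its own CSV

Because custom_settings is applied per crawler, each spider gets its own feed export, so adding a ninth collection is just one more small subclass and one more process.crawl() call.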
