Scrapy and pandas: appending data only works for a few rows

Posted by 6ss1mwsb on 2023-01-17 in Other

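On every iteration my script writes to the Excel file starting from row 2, but I need it to append the data below the last row each time.
The code needs to write each new batch starting after the last row.
The code is as follows: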

import scrapy
from scrapy.crawler import CrawlerProcess
import pandas as pd

class plateScraper(scrapy.Spider):
    name = 'scrapePlate'
    allowed_domains = ['dvlaregistrations.direct.gov.uk']

    def start_requests(self):
        df = pd.read_excel('data.xlsx')
        columnA_values = df['PLATE']
        for row in columnA_values:
            global plate_num_xlsx
            plate_num_xlsx = row
            base_url = f"https://dvlaregistrations.direct.gov.uk/search/results.html?search={plate_num_xlsx}&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto="
            url = base_url
            yield scrapy.Request(url)

    def parse(self, response):
        itemList = []
        for row in response.css('div.resultsstrip'):
            plate = row.css('a::text').get()
            price = row.css('p::text').get()
            if plate_num_xlsx == plate.replace(" ", "").strip():
                item = {"plate": plate.strip(), "price": price.strip()}
                itemList.append(item)
                yield item
            else:
                item = {"plate": plate.strip(), "price": "-"}
                itemList.append(item)
                yield item

        with pd.ExcelWriter('output_res.xlsx', mode='a', if_sheet_exists='overlay') as writer:
            df_output = pd.DataFrame(itemList)
            df_output.to_excel(writer, sheet_name='result', index=False, header=True)

process = CrawlerProcess()
process.crawl(plateScraper)
process.start()

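It writes the data in batches. That is, each time it rewrites the same 12 or so rows instead of appending further down. Strange, isn't it? I'd like to know why this happens and how to fix it so that all the data is written from top to bottom.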

Answer #1 (taor4pac)

Try this:

with pd.ExcelWriter('output.xlsx', mode='a') as writer:
    df_output = pd.DataFrame(itemList)
    df_output.to_excel(writer, sheet_name='result', index=False, header=True)
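
A likely reason for the behaviour described in the question: `to_excel` defaults to `startrow=0`, so with `if_sheet_exists='overlay'` every call to `parse` writes its header in row 1 and its data from row 2, on top of the previous batch. Note also that `mode='a'` on its own leaves `if_sheet_exists` at its default of `'error'`, so writing to a `result` sheet that already exists is expected to raise a `ValueError`. Below is a minimal sketch of one way to append instead. It assumes the openpyxl engine, a pandas version that supports `'overlay'`, and that every batch has the same columns in the same order. `append_rows` is a hypothetical helper name, not part of pandas.

```python
from pathlib import Path

import pandas as pd


def append_rows(items, path="output_res.xlsx", sheet_name="result"):
    """Append a batch of dict rows below the last used row of an Excel sheet."""
    df = pd.DataFrame(items)
    if df.empty:
        return  # nothing to write for this batch

    if not Path(path).exists():
        # First batch: create the workbook and write the header row once.
        df.to_excel(path, sheet_name=sheet_name, index=False, engine="openpyxl")
        return

    with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                        if_sheet_exists="overlay") as writer:
        sheet = writer.sheets.get(sheet_name)
        # openpyxl's max_row is 1-based while startrow is 0-based, so
        # startrow=max_row lands on the first free row. A sheet that does
        # not exist yet starts at row 0 and gets a header.
        start_row = sheet.max_row if sheet is not None else 0
        df.to_excel(writer, sheet_name=sheet_name, index=False,
                    header=sheet is None, startrow=start_row)
```

In the spider from the question, the `with pd.ExcelWriter(...)` block at the end of `parse` could then be replaced by `append_rows(itemList)`. Collecting all items and writing the file once when the crawl finishes may be simpler still, since it avoids reopening the workbook for every response.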
