Scrapy to SQLite using an item pipeline: the entire record is dropped because a list has no value

Posted by lzfw57am on 2021-08-25 in Java

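I am trying to use Scrapy's item pipeline to insert four items from my spider into a table. Everything works except when the doc_id item is empty (the value comes from the original source and is cleaned with a regex).
My problem is that as soon as an empty value is encountered, the whole row is dropped, whereas exporting the data to a CSV file works fine even when one column has no value.
Here is the KeyError I get: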

    Traceback (most recent call last):
      File "/Users/opt/anaconda3/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
        current.result = callback(current.result, *args,**kw)
      File "/Users/opt/anaconda3/lib/python3.7/site-packages/scrapy/utils/defer.py", line 154, in f
        return deferred_from_coro(coro_f(*coro_args,**coro_kwargs))
      File "/Users/user/document_scraper/doc/pipelines.py", line 32, in process_item
        self.store_db(item)
      File "/Users/user/document_scraper/doc/pipelines.py", line 40, in store_db
        item['doc_id'][0],
      File "/Users/opt/anaconda3/lib/python3.7/site-packages/scrapy/item.py", line 83, in __getitem__
        return self._values[key]
    KeyError: 'doc_id'

My items.py:

    import re

    import scrapy
    from scrapy.loader import ItemLoader
    from itemloaders.processors import TakeFirst, MapCompose, Compose


    class DocItem(scrapy.Item):
        date = scrapy.Field()
        office = scrapy.Field(output_processor=TakeFirst())
        doc_body = scrapy.Field()
        # Pull the four digits out of strings such as "C. 1234"
        doc_id = scrapy.Field(
            input_processor=MapCompose(
                lambda item: re.findall(r'C\.\s\d{4}', item)[0].split('C. ')[1]
            )
        )
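A note on the `doc_id` input processor: `re.findall(...)[0]` raises an `IndexError` whenever the pattern does not match. The sketch below is one possible replacement, not the original author's code. It uses a hypothetical helper name, `extract_doc_id`, and returns `None` when there is no match. According to the itemloaders documentation, `MapCompose` ignores `None` results, so the helper could be passed as `MapCompose(extract_doc_id)`. The field would then simply be left unset for those records.

```python
import re

# Same pattern as in items.py, with a capture group for the four digits.
DOC_ID_PATTERN = re.compile(r'C\.\s(\d{4})')


def extract_doc_id(text):
    """Return the four-digit id following 'C. ', or None if it is absent.

    extract_doc_id is a hypothetical helper introduced for illustration.
    """
    match = DOC_ID_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1)


with_id = extract_doc_id("Document C. 1234 issued by the office")
without_id = extract_doc_id("Document without an identifier")
```

Here `with_id` holds the digits as a string and `without_id` is `None`, so a record without an id no longer raises inside the processor.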

My pipelines.py:

    import sqlite3


    class DocPipeline(object):
        def __init__(self):
            self.create_connection()
            self.create_table()

        def create_connection(self):
            self.conn = sqlite3.connect("//document.db")
            self.curr = self.conn.cursor()

        def create_table(self):
            self.curr.execute("""DROP TABLE IF EXISTS DOC""")
            self.curr.execute("""CREATE TABLE DOC(
                DOC_DATE date,
                OFFICE text,
                BODY text,
                DOC_ID number
                )""")

        def process_item(self, item, spider):
            self.store_db(item)
            return item

        def store_db(self, item):
            self.curr.execute("""INSERT INTO DOC VALUES (?,?,?,?)""", (
                item['date'][0],
                item['office'],
                item['doc_body'][0],
                item['doc_id'][0],
            ))
            self.conn.commit()

How can I tell SQLite or Scrapy that I want my item returned regardless of whether the value in one of the columns is None?
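One possible approach, sketched under some assumptions: the `KeyError` comes from `item['doc_id']` being looked up on an item where that field was never set, so the lookup could fall back to `None`, which `sqlite3` stores as NULL. The example uses a plain dict as a stand-in for the Scrapy item (Scrapy items also offer a dict-like `.get()`), an in-memory database instead of `document.db`, and a hypothetical helper named `first_or_none`. The sample values are made up for illustration.

```python
import sqlite3


def first_or_none(item, key):
    """Return the first element of item[key], or None if the field is
    missing or empty. Plain (non-list) values are returned unchanged.

    first_or_none is a hypothetical helper introduced for illustration.
    """
    value = item.get(key)
    if not value:
        return None
    if isinstance(value, (list, tuple)):
        return value[0]
    return value


def store_db(curr, item):
    # A missing field becomes None, which sqlite3 writes as NULL.
    curr.execute(
        "INSERT INTO DOC VALUES (?,?,?,?)",
        (
            first_or_none(item, "date"),
            first_or_none(item, "office"),
            first_or_none(item, "doc_body"),
            first_or_none(item, "doc_id"),
        ),
    )


# In-memory database so the sketch does not touch any file on disk.
conn = sqlite3.connect(":memory:")
curr = conn.cursor()
curr.execute(
    "CREATE TABLE DOC (DOC_DATE date, OFFICE text, BODY text, DOC_ID number)"
)

# Stand-in for a scraped item whose doc_id was never populated.
item = {"date": ["2021-08-25"], "office": "Registry", "doc_body": ["some text"]}
store_db(curr, item)
conn.commit()

rows = curr.execute("SELECT * FROM DOC").fetchall()
```

With this change the row is inserted with a NULL `DOC_ID` instead of the whole record being lost. In the pipeline from the question, the same helper would be called inside `store_db` with `self.curr`.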
