Scrapy: getting the relationships right with an association object

ds97pgxw  posted on 2022-11-09 in Other

I am using Scrapy to scrape a website and store the results in a database of the following form (defined in models.py, structured as in the tutorial):

    from sqlalchemy import create_engine, Column, Table, ForeignKey, MetaData
    from sqlalchemy.orm import relationship
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy import (Integer, String, Date, DateTime, Float, Boolean, Text)
    from scrapy.utils.project import get_project_settings

    Base = declarative_base()

    def db_connect():
        return create_engine(get_project_settings().get("CONNECTION_STRING"))

    def create_table(engine):
        Base.metadata.create_all(engine)

    Article_author = Table('article_author', Base.metadata,
      Column('article_id', Integer, ForeignKey('article.article_id'), primary_key=True),
      Column('author_id', Integer, ForeignKey('author.author_id'), primary_key=True),
      Column('author_number', Integer)
    )

    class Article(Base):
      __tablename__ = "article"

      article_id    = Column(Integer, primary_key=True)
      article_title = Column('name', String(50), unique=True)
      authors = relationship('Author', secondary='article_author',lazy='dynamic', backref="article") 

    class Author(Base):
      __tablename__ = "author"

      author_id        = Column(Integer, primary_key=True)
      author_name     = Column('name', String(50), unique=True)
      articles = relationship('Article', secondary='article_author',lazy='dynamic', backref="article")

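I get an error when I try to add the author number (for example first or second author) to the automatically created association table "article_author", because I do not know how to access that table from the pipelines.py script. There is a many-to-many relationship between the article and author tables, since an author can write several articles and an article can have several authors. The article table has a unique article_id and the author table has a unique author_id. The association table has a unique (article_id, author_id) pair. In pipelines.py there is a function process_item in which an Article instance is created, after which the author table and the association table are updated accordingly. The question is how to insert the author number.
Is there a relationship that should be added in models.py?
The content of pipelines.py is as follows: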

    from sqlalchemy.orm import sessionmaker
    from scrapy.exceptions import DropItem
    from tutorial.models import Article, Author, Article_author, db_connect, create_table

    class SavePipeline(object):

        def __init__(self):
            """
            Initializes database connection and sessionmaker
            Creates tables
            """
            engine = db_connect()
            create_table(engine)
            self.Session = sessionmaker(bind=engine)

        def process_item(self, item, spider):
            session = self.Session()
            article = Article()
            #article_author = Article_author()

            #check whether the current article has authors or not
            if 'author' in item:
                for author,n in zip(item["author"],item["n"]):
                    writer = Author(author_name=author)
                    # check whether author already exists in the database
                    exist = session.query(Author).filter_by(author_name=writer.author_name).first()
                    if exist is not None:
                        # the current author exists
                        writer = exist
                    article.authors.append(writer)
                    # attempts to store the author number (this is the part that fails)
                    nr = Article_author(author_number=n)
                    article.article_author.append(nr)
                    #article_author.append(nr)
                    #article.authors.append(pag)
                    #article_author.author_number = n

            try:
                session.add(article)
                session.commit()

            except:
                session.rollback()
                raise

            finally:
                session.close()

            return item

The error shown in the terminal is an IntegrityError, because the new row cannot be associated with an author_id:

sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: article_author.author_id
[SQL: INSERT INTO article_author (article_id, author_number) VALUES (?, ?)]
[parameters: (30, 2)]
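
The constraint failure above can be reproduced outside Scrapy, which may help to see what the statement is missing. The following is a minimal sketch using only the standard-library sqlite3 module. The CREATE TABLE statement is hand-written to approximate the association table from models.py (SQLAlchemy marks primary-key columns as NOT NULL), and the author_id value 7 is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hand-written approximation of the association table from models.py.
conn.execute(
    """
    CREATE TABLE article_author (
        article_id    INTEGER NOT NULL,
        author_id     INTEGER NOT NULL,
        author_number INTEGER,
        PRIMARY KEY (article_id, author_id)
    )
    """
)

error_message = None
try:
    # Same shape as the failing statement: no author_id is supplied.
    conn.execute(
        "INSERT INTO article_author (article_id, author_number) VALUES (?, ?)",
        (30, 2),
    )
except sqlite3.IntegrityError as exc:
    error_message = str(exc)

# Supplying both halves of the composite key succeeds (7 is a hypothetical id).
conn.execute(
    "INSERT INTO article_author (article_id, author_id, author_number) "
    "VALUES (?, ?, ?)",
    (30, 7, 2),
)
stored = conn.execute(
    "SELECT article_id, author_id, author_number FROM article_author"
).fetchall()
```

In other words, the row carrying author_number has to be the same row that carries both foreign keys; a second row holding only the number cannot be inserted.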

When an Article_author instance is created in process_item and appended via

    nr = Article_author(author_number=n)
    article_author.append(nr)

this results in an AttributeError:

article_author.append(nr)
AttributeError: 'Article_author' object has no attribute 'append'

When it is added through the article's authors member

article.authors.append(pag)

it gives a ValueError:

ValueError: Bidirectional attribute conflict detected: Passing object <Article_author at 0x7f9007276c70> to attribute "Article.authors" triggers a modify event on attribute "Article.article_author" via the backref "Article_author.article".

Setting the attribute directly raises no error, but leaves the column empty:

article_author.author_number = n

cczfrluj  #1

I solved this by defining the relationships from the association table and appending through that table; see https://docs.sqlalchemy.org/en/14/glossary.html#term-association-relationship
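
The association-object pattern described in the linked glossary entry might look roughly like this for the models in the question. This is a minimal sketch and not necessarily the code the answer used. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database; the names ArticleAuthor, author_links, article_links and add_article, as well as the sample titles and author names, are hypothetical.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class ArticleAuthor(Base):
    """Association object: one row per (article, author) pair plus extra data."""
    __tablename__ = "article_author"

    article_id = Column(Integer, ForeignKey("article.article_id"), primary_key=True)
    author_id = Column(Integer, ForeignKey("author.author_id"), primary_key=True)
    author_number = Column(Integer)  # 1 = first author, 2 = second author, ...

    # The relationships are defined from the association table outwards.
    article = relationship("Article", back_populates="author_links")
    author = relationship("Author", back_populates="article_links")


class Article(Base):
    __tablename__ = "article"

    article_id = Column(Integer, primary_key=True)
    article_title = Column("name", String(50), unique=True)
    author_links = relationship("ArticleAuthor", back_populates="article")


class Author(Base):
    __tablename__ = "author"

    author_id = Column(Integer, primary_key=True)
    author_name = Column("name", String(50), unique=True)
    article_links = relationship("ArticleAuthor", back_populates="author")


def add_article(session, title, names):
    """Store one article; names are given in author order."""
    article = Article(article_title=title)
    for number, name in enumerate(names, start=1):
        # Reuse the author if it is already in the database.
        author = session.query(Author).filter_by(author_name=name).first()
        if author is None:
            author = Author(author_name=name)
        # Append the association object, not the Author itself.
        article.author_links.append(
            ArticleAuthor(author=author, author_number=number)
        )
    session.add(article)
    session.commit()
    return article


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

add_article(session, "First paper", ["Ada", "Grace"])
add_article(session, "Second paper", ["Grace"])

rows = [
    (link.article.article_title, link.author.author_name, link.author_number)
    for link in session.query(ArticleAuthor).order_by(
        ArticleAuthor.article_id, ArticleAuthor.author_number
    )
]
```

Because each ArticleAuthor row is created with its author and then appended to the article, both foreign keys and author_number end up in the same INSERT. In a Scrapy pipeline the body of add_article would presumably move into process_item, with the titles and names taken from the item.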
