pyspark: AWS Glue job upsert from one DB table to another DB table

Posted by bihw5rsg on 2022-12-22 in Spark

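I'm trying to create a fairly basic Glue job.
I have two different AWS RDS MariaDB instances, each with a similar table (the field names differ).
I want to transform the data from table A so that it fits the schema of table B (this seems simple and is already working).
Then I want to update all existing entries (matched on a specific key) and insert all the ones that don't exist yet.
I used the basic transform job. If table B is empty, the inserts work fine (AWS roles/permissions/ports are OK) and the transform works.
But when table B already contains rows, I get a duplicate key error, which is expected because the job only tries to insert.
I'm very unsure what the simplest solution is and where I can read about it.
The key on which table B should be updated is central_requisition_id (it is the primary key in table A, but not the primary key in table B).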

from awsglue.transforms import ApplyMapping

# Rename and cast table A's columns so they match table B's schema.
schemaapplymapping = ApplyMapping.apply(
    frame=some_frame,
    mappings=[
        ("supplier_id", "int", "central_parent_supplier_id", "int"),
        ("description", "string", "description", "string"),
        ("id", "int", "central_requisition_id", "int"),
    ],
    transformation_ctx="schemaapplymapping",
)

pyspark

Source: https://stackoverflow.com/questions/74842225/aws-glue-job-upsert-from-one-db-table-to-annother-db-table

1 Answer

nmpmafwu1#

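I'm not sure about your exact requirements, but you can get around the duplicate key error by setting the write mode to overwrite. Note that overwrite replaces the existing contents of the destination table rather than updating individual rows by key.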

# dest_jdbc_url, username, password and dest are defined elsewhere in the job.
df.write.format("jdbc").options(
    url=dest_jdbc_url,
    user=username,
    password=password,
    dbtable=dest,
).mode("overwrite").save()
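
For a row-level upsert keyed on central_requisition_id, one possible approach is to have each Spark partition run MariaDB's INSERT ... ON DUPLICATE KEY UPDATE. The following is only a minimal sketch under several assumptions that are not in the question: table B has a UNIQUE index on central_requisition_id (required for the duplicate-key clause to fire, since the column is not the primary key), a DB-API driver that uses %s placeholders (PyMySQL-style) is available to the job, and the names build_upsert_sql, make_partition_writer, connect and table_b are hypothetical.

```python
from typing import Any, Callable, Iterable, Mapping, Sequence


def build_upsert_sql(table: str, columns: Sequence[str], key: str) -> str:
    """Build a MariaDB INSERT ... ON DUPLICATE KEY UPDATE statement.

    Assumes `key` is covered by a UNIQUE index or primary key in `table`.
    """
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("need at least one non-key column to update")
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))  # %s: PyMySQL-style paramstyle (assumption)
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in non_key)
    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


def make_partition_writer(
    connect: Callable[[], Any],  # hypothetical factory returning a DB-API connection
    sql: str,
    columns: Sequence[str],
) -> Callable[[Iterable[Mapping[str, Any]]], None]:
    """Return a function that upserts one partition of rows over one connection."""

    def write_partition(rows: Iterable[Mapping[str, Any]]) -> None:
        # Rows only need to support row["column"] lookups.
        params = [tuple(row[c] for c in columns) for row in rows]
        if not params:
            return  # empty partition: do not open a connection
        conn = connect()
        try:
            cur = conn.cursor()
            cur.executemany(sql, params)
            conn.commit()
        finally:
            conn.close()

    return write_partition
```

In a Glue job this could be wired up roughly as df = schemaapplymapping.toDF() followed by df.foreachPartition(make_partition_writer(connect, sql, columns)), where columns lists the three mapped column names. That assumes the connect function can be serialized to the executors and that the driver package is installed there.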

Answered 2022-12-22
