How do I perform an upsert (insert + update) from a PySpark DataFrame into an Azure SQL Database table?

Posted by xqkwcwgp on 2024-01-06 in Spark


I'm trying to do an upsert from a PySpark DataFrame into a SQL table.
sparkdf is my PySpark DataFrame. Test is my SQL table in an Azure SQL Database.
So far I have the following:

def write_to_sqldatabase(final_table, target_table):
    # Write the DataFrame's rows into a SQL table over JDBC (append mode issues plain INSERTs)
    final_table.write.format("jdbc") \
        .option("url", f"jdbc:sqlserver://{SERVER};databaseName={DATABASE}") \
        .option("dbtable", f'....{target_table}') \
        .option("user", USERNAME) \
        .option("password", PASSWORD) \
        .mode("append") \
        .save()
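A JDBC write with `mode("append")` only issues INSERT statements, so rows whose `Id` already exists in the target are duplicated, or rejected if `Id` is a primary key; none of Spark's save modes (append, overwrite, ignore, errorifexists) performs an update. A common workaround is to load the incoming rows into a staging table first and then apply one set-based upsert from staging into the target. The sketch below shows those semantics locally, using Python's built-in `sqlite3` as a stand-in for Azure SQL and SQLite's `INSERT ... ON CONFLICT DO UPDATE` in place of T-SQL `MERGE`; the tables `Test`/`Test_staging` and the `Name` column are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Azure SQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Test_staging (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Test VALUES (?, ?)", [(1, "old-1"), (2, "old-2")])

# Step 1: replace the staging contents with the incoming batch
# (the role a JDBC write in "overwrite" mode would play).
incoming = [(2, "new-2"), (3, "new-3")]
conn.execute("DELETE FROM Test_staging")
conn.executemany("INSERT INTO Test_staging VALUES (?, ?)", incoming)

# Step 2: one set-based upsert from staging into the target.
# Matched Ids are updated, new Ids are inserted (needs SQLite >= 3.24).
# "WHERE true" avoids a parsing ambiguity between the SELECT and ON CONFLICT.
conn.execute("""
    INSERT INTO Test (Id, Name)
    SELECT Id, Name FROM Test_staging WHERE true
    ON CONFLICT(Id) DO UPDATE SET Name = excluded.Name
""")
conn.commit()

rows = conn.execute("SELECT Id, Name FROM Test ORDER BY Id").fetchall()
print(rows)  # [(1, 'old-1'), (2, 'new-2'), (3, 'new-3')]
```

Row 1 is untouched, row 2 is updated and row 3 is inserted, which is the behaviour an upsert into `Test` is meant to have.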

I also tried:

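# Spark SQL MERGE INTO: only works when `target` is a Spark catalog table
# whose format supports it (e.g. Delta Lake), not a JDBC table
spark.sql("""
merge into target t
using source s
on s.Id = t.Id
when matched then
update set *
when not matched then insert *
""")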
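If the MERGE is run on Azure SQL itself rather than in Spark SQL, T-SQL `MERGE` has no `UPDATE SET *` / `INSERT *` shorthand (that form is Delta Lake syntax), so every column has to be listed, and the statement must be terminated with a semicolon. A minimal sketch of a hypothetical helper, `build_merge_sql`, that generates such a statement, assuming the staging table has the same column names as the target and that all table and column names are trusted (they are interpolated into the SQL, not parameterized):

```python
def build_merge_sql(target, staging, key_cols, cols):
    """Return a T-SQL MERGE that upserts rows from `staging` into `target`.

    Hypothetical helper. `target` and `staging` are already-qualified names
    such as "dbo.Test"; column names are bracket-quoted. All names are
    assumed to be trusted input.
    """

    def q(name):
        # Bracket-quote an identifier, escaping any closing bracket.
        return "[" + name.replace("]", "]]") + "]"

    # Join condition on the key column(s).
    on = " AND ".join(f"t.{q(c)} = s.{q(c)}" for c in key_cols)
    # On a match, non-key columns take the staged values.
    non_keys = [c for c in cols if c not in key_cols]
    update = ", ".join(f"{q(c)} = s.{q(c)}" for c in non_keys)
    insert_cols = ", ".join(q(c) for c in cols)
    insert_vals = ", ".join(f"s.{q(c)}" for c in cols)

    sql = f"MERGE INTO {target} AS t\nUSING {staging} AS s\nON {on}\n"
    if update:  # a key-only table has nothing to update
        sql += f"WHEN MATCHED THEN UPDATE SET {update}\n"
    # T-SQL requires MERGE to end with a semicolon.
    sql += f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals});"
    return sql


print(build_merge_sql("dbo.Test", "dbo.Test_staging", ["Id"], ["Id", "Name"]))
# MERGE INTO dbo.Test AS t
# USING dbo.Test_staging AS s
# ON t.[Id] = s.[Id]
# WHEN MATCHED THEN UPDATE SET [Name] = s.[Name]
# WHEN NOT MATCHED THEN INSERT ([Id], [Name]) VALUES (s.[Id], s.[Name]);
```

The `dbo` schema, `Test_staging` table and `Name` column in the usage line are assumptions for illustration. The column list would normally come from the DataFrame, e.g. `sparkdf.columns`.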

Finally, I tried:

jdbc_url = f"jdbc:sqlserver://{SERVER};database={DATABASE};user={USERNAME};password={PASSWORD}"
# Register the DataFrame as a temp view; it exists only in the Spark session
sparkdf.createOrReplaceTempView('source')
# Attempt to run the MERGE on SQL Server through the JDBC reader's "dbtable" option
df = spark.read \
    .format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", "(merge into target t using source s on s.Id = t.Id when matched then update set * when not matched then insert *) AS subquery") \
    .load()
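Because `source` is a Spark temp view, SQL Server cannot see it, and Spark places the `dbtable` text inside the FROM clause of a SELECT, where T-SQL does not accept a MERGE. One way to combine the pieces, sketched under assumptions: write the DataFrame to a staging table with an ordinary JDBC write, then execute a T-SQL MERGE on the server through `java.sql.DriverManager` in the driver JVM. The function name `upsert_via_staging`, the `staging_table` parameter and the ready-made `merge_sql` string are hypothetical; `spark.sparkContext._jvm` is an internal PySpark attribute rather than a public API, and the Microsoft SQL Server JDBC driver must be loadable by `DriverManager` on the driver.

```python
def upsert_via_staging(spark, df, jdbc_url, user, password, staging_table, merge_sql):
    """Stage `df` in SQL Server over JDBC, then run `merge_sql` on the server.

    Hypothetical helper: `merge_sql` is assumed to be a complete T-SQL MERGE
    (explicit column lists, terminating semicolon) from `staging_table`
    into the target table.
    """
    # 1) Load the rows into a staging table that SQL Server can see.
    #    With truncate=true, "overwrite" empties an existing staging table
    #    instead of dropping and recreating it.
    (df.write.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", staging_table)
        .option("user", user)
        .option("password", password)
        .option("truncate", "true")
        .mode("overwrite")
        .save())

    # 2) Run the MERGE on the server from the Spark driver via java.sql.
    #    `_jvm` is internal to PySpark, not a public API.
    jvm = spark.sparkContext._jvm
    conn = jvm.java.sql.DriverManager.getConnection(jdbc_url, user, password)
    try:
        stmt = conn.createStatement()
        try:
            # JDBC connections auto-commit by default, so the MERGE is
            # committed here; returns the update count reported by the driver.
            return stmt.executeUpdate(merge_sql)
        finally:
            stmt.close()
    finally:
        conn.close()
```

It could be called, for instance, as `upsert_via_staging(spark, sparkdf, jdbc_url, USERNAME, PASSWORD, "dbo.Test_staging", merge_sql)`, where `merge_sql` is a hand-written T-SQL MERGE from `dbo.Test_staging` into `dbo.Test` with explicit column lists (`dbo` is an assumed schema). Doing the MERGE as a single set-based statement on the server avoids sending one round trip per row from the executors.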