How to insert huge data from a PySpark DataFrame into a SQL Server table

jdg4fx2g · posted 2023-11-16 in Spark

I have nearly 38190123 rows of data to push to a SQL Server table. I have tried the following:

df.write.format("jdbc").option("url", jdbc_url).option("dbtable",target_table).option("user", username).option("password", password).option("batchsize", 100).mode("overwrite").save()

I have also tried repartitioning, but the job runs forever in every case. Is there any way to push this DataFrame, slowly if need be, or faster if possible? Are there any parameters I should look at or modify on the SQL Server side? I think a bulk insert is not required here, since 38190123 is a small number for a bulk insert. Can you suggest methods other than bulk insert? I don't even know what a bulk insert is.

I am running this PySpark code in Azure Synapse.

bq3bfh9z 1#

Increase batchsize from the default of 1000, and also use the SQL Spark connector (the Apache Spark connector for SQL Server and Azure SQL) instead of the generic JDBC data source; refer to the connector's documentation for details.
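A minimal sketch of both suggestions, reusing the question's jdbc_url, target_table, username and password variables. The batchsize of 100000 and the partition count of 16 are illustrative values to tune, and the second write assumes the Microsoft Apache Spark connector (format "com.microsoft.sqlserver.jdbc.spark") is installed on the Synapse Spark pool:

# Option 1: plain JDBC, with a much larger batchsize and explicit
# parallelism so each partition writes over its own connection.
(df.repartition(16)                    # 16 parallel writers is an assumption; tune per cluster
   .write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", target_table)
   .option("user", username)
   .option("password", password)
   .option("batchsize", 100000)        # far larger than the default 1000
   .mode("overwrite")
   .save())

# Option 2: the dedicated SQL Spark connector, which can use bulk-copy APIs.
(df.write
   .format("com.microsoft.sqlserver.jdbc.spark")
   .option("url", jdbc_url)
   .option("dbtable", target_table)
   .option("user", username)
   .option("password", password)
   .option("batchsize", 100000)
   .option("tableLock", "true")        # connector-specific option; verify against the connector docs
   .mode("overwrite")
   .save())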
