I have nearly 38 million rows (38,190,123) of data to push to a SQL Server table. I have tried the following:
df.write.format("jdbc").option("url", jdbc_url).option("dbtable",target_table).option("user", username).option("password", password).option("batchsize", 100).mode("overwrite").save()
I have also tried repartitioning, but the job runs forever in every case. Is there any way to push this DataFrame, even slowly, or ideally faster? Is there any parameter I should look at or modify on the SQL Server side? I think a bulk insert is not required here, since 38,190,123 rows seems like a small number for bulk insert. Can you suggest methods other than bulk insert? I don't even know what a bulk insert is.
I am running this PySpark code in Azure Synapse.
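For reference, this is roughly the shape of what I tried with repartitioning. It is only a sketch: jdbc_url, target_table, username and password are the same variables as above, and the partition count and batch size are guesses I picked, not values anyone recommended.

# Sketch only: repartition so several tasks write in parallel, and raise
# batchsize so each round trip to SQL Server inserts more rows.
num_partitions = 16  # assumed value; would need tuning to the Spark pool size

(df.repartition(num_partitions)
   .write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", target_table)
   .option("user", username)
   .option("password", password)
   .option("batchsize", 10000)               # Spark's JDBC default is 1000; 100 is very slow
   .option("numPartitions", num_partitions)  # caps the number of concurrent JDBC connections
   .mode("overwrite")
   .save())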
1 Answer
Increase the batchsize from its default of 1000, and also consider using the SQL Spark connector instead of the generic JDBC writer.
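A minimal sketch of both suggestions, assuming the Apache Spark connector for SQL Server (format name com.microsoft.sqlserver.jdbc.spark) is installed on the Synapse Spark pool; the exact option set can vary by connector version, and the batch size below is just an illustrative value.

# Sketch: write through the SQL Server Spark connector with a larger batch size.
# Assumes the connector library is attached to the Synapse Spark pool.
(df.write
   .format("com.microsoft.sqlserver.jdbc.spark")
   .option("url", jdbc_url)
   .option("dbtable", target_table)
   .option("user", username)
   .option("password", password)
   .option("batchsize", 100000)   # much larger than the JDBC default of 1000
   .option("tableLock", "true")   # table lock enables a faster bulk-style load
   .mode("overwrite")
   .save())

If the connector cannot be installed, the same batchsize increase still applies to the plain JDBC writer shown in the question.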