SQL Server Write dataframe to Azure SQL database from Databricks notebook

zd287kbt  于 2023-06-04  发布在  其他
关注(0)|答案(1)|浏览(3799)

I am quite new to Databricks and I was trying to write a data frame into the Azure SQL database. But I am getting an error.

org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find data source: com.microsoft.sqlserver.jdbc.spark.

Below is my code:

# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').config("spark.jars", "mysql-connector-java-6.0.6.jar").getOrCreate()

# Create DataFrame 
columns = ["id", "name","age","gender"]
data = [(1, "James",30,"M"), (2, "Ann",40,"F"),
    (3, "Jeff",41,"M"),(4, "Jennifer",20,"F")]

sampleDF = spark.sparkContext.parallelize(data).toDF(columns)

and writing this data frame to the Azure SQL database.

sampleDF.write.format("com.microsoft.sqlserver.jdbc.spark") \
  .mode("overwrite") \
  .option("url", "jdbc:sqlserver://sqlservername.database.windows.net;databaseName=sqldbname;") \
  .option("dbtable", "df") \
  .option("user", "***") \
  .option("password", "***") \
  .save()

When I try to write this data frame, I get the above error.

xzabzqsa

xzabzqsa1#

The version that you're trying to use needs the installation of the external library to your cluster - its source code repository lists specific versions that you need to install based on the Spark version used in DBR version that you use. Databricks by default ships with JDBC driver that uses another name - just sqlserver ( doc ). You should use something like this:

sampleDF.write.format("sqlserver") \
  .mode("overwrite") \
  .option("url", "jdbc:sqlserver://sqlservername.database.windows.net;databaseName=sqldbname;") \
  .option("dbtable", "df") \
  .option("user", "***") \
  .option("password", "***") \
  .save()

相关问题