I am quite new to Databricks, and I am trying to write a DataFrame to an Azure SQL database, but I am getting an error:
```
org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find data source: com.microsoft.sqlserver.jdbc.spark.
```
Below is my code:

```python
# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .config("spark.jars", "mysql-connector-java-6.0.6.jar") \
    .getOrCreate()

# Create DataFrame
columns = ["id", "name", "age", "gender"]
data = [(1, "James", 30, "M"), (2, "Ann", 40, "F"),
        (3, "Jeff", 41, "M"), (4, "Jennifer", 20, "F")]
sampleDF = spark.sparkContext.parallelize(data).toDF(columns)
```
I then write this DataFrame to the Azure SQL database:
```python
sampleDF.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("url", "jdbc:sqlserver://sqlservername.database.windows.net;databaseName=sqldbname;") \
    .option("dbtable", "df") \
    .option("user", "***") \
    .option("password", "***") \
    .save()
```
When I try to write this DataFrame, I get the error above.
1 Answer
The format you're trying to use (`com.microsoft.sqlserver.jdbc.spark`) requires installing an external library on your cluster; its source-code repository lists the specific connector versions to install depending on the Spark version in the DBR version you use. Alternatively, Databricks ships by default with a SQL Server connector registered under a different, shorter format name: just `sqlserver` (see the docs). You should use something like this:
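A minimal sketch of a write using the built-in `sqlserver` format, wrapped in a helper function for clarity; the host, database, table, and credential values are placeholders to replace with your own:

```python
def write_to_azure_sql(df, host, database, table, user, password):
    """Write a DataFrame to Azure SQL using Databricks' built-in
    `sqlserver` format -- no extra library installation needed.
    All connection parameters are caller-supplied placeholders."""
    (df.write
       .format("sqlserver")       # built-in short name, not com.microsoft.sqlserver.jdbc.spark
       .mode("overwrite")
       .option("host", host)      # e.g. sqlservername.database.windows.net
       .option("port", "1433")    # default Azure SQL port
       .option("database", database)
       .option("dbtable", table)  # e.g. dbo.df
       .option("user", user)
       .option("password", password)
       .save())

# Usage (placeholder values):
# write_to_azure_sql(sampleDF, "sqlservername.database.windows.net",
#                    "sqldbname", "dbo.df", "***", "***")
```

With this built-in format you pass `host` and `database` as separate options instead of a single JDBC `url`, and nothing needs to be installed on the cluster.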