Read from HDFS and write to MySQL

gtlvzcf8 posted on 2022-12-09 in HDFS

I'm new to big data development. My use case is to read data from HDFS, process it with Spark, and save it to a MySQL database. The reason for saving to MySQL is that the reporting tool points at MySQL. So I came up with the flow below to achieve this. Can someone validate it and suggest any optimizations/changes needed?

val df = spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema","true")
    .option("nullValue","NA")
    .option("mode","failfast")
    .load("hdfs://localhost:9000/user/testuser/samples.csv")  

val resultsdf = df.select("Sample","p16","Age","Race").filter($"Anatomy".like("BOT"))  

val prop=new java.util.Properties
prop.setProperty("driver", "com.mysql.cj.jdbc.Driver")  
prop.setProperty("user", "root")  
prop.setProperty("password", "pw")  
val url = "jdbc:mysql://localhost:3306/meta" 
 df.write.mode(SaveMode.Append).jdbc(url,"sample_metrics",prop)

a0zr77ik1#

A change is required in the line val resultsdf = ... : you are filtering on the column Anatomy, but you didn't include that column in the select clause. Add it, otherwise you will end up with an AnalysisException: unable to resolve column Anatomy.

val resultsdf = df.select("Sample","p16","Age","Race", "Anatomy").filter($"Anatomy".like("BOT"))

Optimizations: you can use additional JDBC properties such as numPartitions and batchsize, as shown in the sketch below. You can read about these properties in the Spark SQL JDBC data source documentation.
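A minimal sketch of how those properties could be set on the write above; the values shown are illustrative placeholders, not tuned recommendations:

prop.setProperty("numPartitions", "8")   // max parallel JDBC connections for the write; illustrative value
prop.setProperty("batchsize", "10000")   // rows per JDBC insert batch (Spark's default is 1000); illustrative value
resultsdf.write.mode(SaveMode.Append).jdbc(url, "sample_metrics", prop)

Larger batch sizes reduce round trips to MySQL, and numPartitions caps how many concurrent connections Spark opens, so it should be set with the MySQL server's connection limits in mind.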
