How to insert a DataFrame with a map column into a Hive table

Posted by qvtsj1bj on 2021-05-27 in Hadoop

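I have a DataFrame with multiple columns, one of which is of type map(string, string). I can print this DataFrame, and the column shows up as a map with data such as Map("PUN" -> "Pune"). I want to write this DataFrame to a Hive table (stored as Avro) that has the same column with a map type.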

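import org.apache.spark.sql.functions.{col, lit, map}
// Add a constant city name, then build a map column keyed by the city code
val result = Df.withColumn("cname", lit("Pune"))
  .withColumn("city_code_name", map(lit("PUN"), col("cname")))
result.show(false)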

// Table: created as an external Hive table, stored as Avro, with an Avro schema

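After dropping this map-type column, I can save the DataFrame to the Hive Avro table.
How I save to the Hive table:
spark.save - saves the Avro files
spark.sql - creates a partition on the Hive table using the Avro file location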

hadoop Hive apache-spark apache-spark-sql complextype

Source: https://stackoverflow.com/questions/60438988/how-to-insert-dataframe-having-map-column-in-hive-table

2 Answers

wqlqzqxt1#

You can achieve this with saveAsTable, for example:

Df\
        .write\
        .saveAsTable(name='tableName',
                     format='com.databricks.spark.avro',
                     mode='append',
                     path='avroFileLocation')

Change the mode option to whichever value suits your use case.

Answered on 2021-05-27

kognpnkq2#

Take this test case from Spark's own tests as an example:

test("Insert MapType.valueContainsNull == false") {
    val schema = StructType(Seq(
      StructField("m", MapType(StringType, StringType, valueContainsNull = false))))
    val rowRDD = spark.sparkContext.parallelize(
      (1 to 100).map(i => Row(Map(s"key$i" -> s"value$i"))))
    val df = spark.createDataFrame(rowRDD, schema)
    df.createOrReplaceTempView("tableWithMapValue")
    sql("CREATE TABLE hiveTableWithMapValue(m Map <STRING, STRING>)")
    sql("INSERT OVERWRITE TABLE hiveTableWithMapValue SELECT m FROM tableWithMapValue")

    checkAnswer(
      sql("SELECT * FROM hiveTableWithMapValue"),
      rowRDD.collect().toSeq)

    sql("DROP TABLE hiveTableWithMapValue")
  }

Also, if you want to use the save option, you can try saveAsTable as follows:

Seq(9 -> "x").toDF("i", "j")
        .write.format("hive").mode(SaveMode.Overwrite).option("fileFormat", "avro").saveAsTable("t")

yourDataFrameWithMapColumn.write.partitionBy is the way to create partitions.
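To make the partitionBy suggestion concrete, here is a minimal sketch that combines it with the format("hive") and fileFormat option shown above. It is untested against the asker's setup and rests on several assumptions: the session has Hive support and dynamic partitioning enabled, the toy rows stand in for the real DataFrame, and the table name city_partitioned_avro is hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, map}

object MapColumnToHive {
  def main(args: Array[String]): Unit = {
    // Hive support is needed for format("hive"); the two dynamic-partition
    // settings let Spark create partitions from the data itself.
    val spark = SparkSession.builder()
      .appName("map-column-to-hive")
      .enableHiveSupport()
      .config("hive.exec.dynamic.partition", "true")
      .config("hive.exec.dynamic.partition.mode", "nonstrict")
      .getOrCreate()
    import spark.implicits._

    // Toy rows standing in for the real DataFrame (assumption).
    val df = Seq(("PUN", "Pune"), ("BOM", "Mumbai")).toDF("code", "cname")
      .withColumn("city_code_name", map(col("code"), col("cname")))

    // Write a Hive table stored as Avro, partitioned by cname.
    // "city_partitioned_avro" is a hypothetical table name.
    df.write
      .format("hive")
      .option("fileFormat", "avro")
      .partitionBy("cname")
      .mode(SaveMode.Overwrite)
      .saveAsTable("city_partitioned_avro")

    // Read the table back to confirm the map column was written.
    spark.table("city_partitioned_avro").show(false)
    spark.stop()
  }
}
```

Note that this lets Spark create and manage the table. If the external table from the question already exists, the target name and save mode would need to be adjusted accordingly.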

Answered on 2021-05-27
