scala - Reading a single-line JSON in Spark where the column keys are variable

zlhcx6iw · posted 2021-06-26 in Hive

I have a single-line JSON file that looks like this:

{"Hotel Dream":{"Guests":20,"Address":"14 Naik Street","City":"Manila"},"Serenity Stay":{"Guests":35,"Address":"10 St Marie Road","City":"Manila"}....}

If I read the JSON into the Spark context as follows, the schema comes out as:

val hotelDF = sqlContext.read.json("file").printSchema

root
 |-- Hotel Dream: struct (nullable = true)
 |    |-- Address: string (nullable = true)
 |    |-- City: string (nullable = true)
 |    |-- Guests: long (nullable = true)
 |-- Serenity Stay: struct (nullable = true)
 |    |-- Address: string (nullable = true)
 |    |-- City: string (nullable = true)
 |    |-- Guests: long (nullable = true)

I want to transpose the varying columns (Hotel Dream, Serenity Stay, etc.) so that the DataFrame ends up with a regular schema:

Hotel: string (nullable = true)
Guests: string (nullable = true)
Address: string (nullable = true)
City: string (nullable = true)

I have also tried reading the JSON in with textFile and wholeTextFiles, but since there are no line breaks I cannot map over the contents with a map function.
Any input on how to read data in this format?
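The reshaping being asked for is essentially turning each top-level JSON key into a row. A minimal sketch of that logic in plain Scala, using a `Map` to stand in for the parsed JSON (the values here are illustrative, taken from the sample above):

```scala
// Each top-level key (the hotel name) becomes one row of a regular table.
// The nested map stands in for that hotel's parsed JSON object.
val parsed: Map[String, Map[String, String]] = Map(
  "Hotel Dream"   -> Map("Guests" -> "20", "Address" -> "14 Naik Street", "City" -> "Manila"),
  "Serenity Stay" -> Map("Guests" -> "35", "Address" -> "10 St Marie Road", "City" -> "Manila")
)

// Flatten into (Hotel, Guests, Address, City) tuples.
val rows: Seq[(String, String, String, String)] =
  parsed.toSeq.map { case (hotel, fields) =>
    (hotel, fields("Guests"), fields("Address"), fields("City"))
  }

rows.foreach(println)
```

In Spark the same idea applies: iterate over `hotelDF.schema.fieldNames` and pull `name + ".Guests"`, `name + ".Address"`, etc. out of each struct column.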

iugsix8n

Here is a solution based on what I understood from your question (though it is not a perfect one):

import org.apache.spark.sql.functions.{col, lit}
import sqlContext.implicits._  // for .toDF

// Seed DataFrame with a dummy row so union has something to start from;
// it is filtered out at the end.
var newDataFrame = Seq(("test", "test", "test", "test")).toDF("Hotel", "Address", "City", "Guests")

// Each top-level field is one hotel: copy its name into a Hotel column
// and pull the nested struct fields up to the top level.
for (name <- hotelDF.schema.fieldNames) {
  val tempdf = hotelDF.withColumn("Hotel", lit(name))
    .withColumn("Address", hotelDF(name + ".Address"))
    .withColumn("City", hotelDF(name + ".City"))
    .withColumn("Guests", hotelDF(name + ".Guests"))
  newDataFrame = newDataFrame.union(tempdf.select("Hotel", "Address", "City", "Guests"))
}

// Drop the dummy seed row.
newDataFrame.filter(col("Hotel") =!= "test").show()
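The dummy seed row can be avoided by mapping each field name to its own set of rows and folding them together with a reduce, instead of starting from a placeholder. The shape of that pattern, sketched over plain Scala collections standing in for DataFrames (the data is illustrative):

```scala
// Stand-ins for the per-hotel struct columns in the DataFrame.
val fieldNames = Seq("Hotel Dream", "Serenity Stay")
val structs = Map(
  "Hotel Dream"   -> ("14 Naik Street", "Manila", 20L),
  "Serenity Stay" -> ("10 St Marie Road", "Manila", 35L)
)

// One "DataFrame" (here: a Seq of rows) per hotel, combined by reduce,
// so no dummy seed row and no trailing filter are needed.
val combined: Seq[(String, String, String, Long)] = fieldNames
  .map { name =>
    val (addr, city, guests) = structs(name)
    Seq((name, addr, city, guests))
  }
  .reduce(_ ++ _)
```

In Spark the equivalent would be roughly `hotelDF.schema.fieldNames.map(name => hotelDF.select(lit(name).as("Hotel"), col(name + ".Address"), col(name + ".City"), col(name + ".Guests"))).reduce(_ union _)`: one select per hotel, then a single reduce, with no "test" row to filter out.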
