Hi everyone, I'm reading a JSON document into a DataFrame in Scala. It looks like this:
{
"key": {
"code": "1"
},
"data": {
"array": [
{
"type": "a",
"bool": true
}
]
}
}
Then I need to produce a DataFrame with the columns key_code and data_array, where the data_array column holds this value (as a string):
[
{
"type": "a",
"bool": true
}
]
The key_code column is no problem, but I'm stuck on data_array. I tried this:
dataFrame
.withColumn(
"data_array",
explode(col("data.array"))
)
.withColumn("DATA_ARRAY", col("data_array").cast("String"))
But I get ["a", true] instead of the expected result shown above. Can anyone help?
Thanks!
1 Answer
cuxqih211#
Try the to_json function, available from Spark 2.2:
```
df.show()
//+-------------+---+
//|         data|key|
//+-------------+---+
//|[[[true, a]]]|[1]|
//+-------------+---+

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Flatten key.code, explode the array, then re-wrap each element and serialize it as JSON
df.withColumn("key_code", col("key.code")).
  withColumn("da", explode(col("data.array"))).
  withColumn("DATA_ARRAY", to_json(array(col("da")))).
  drop("data", "key", "da").
  show(false)

// or using selectExpr (the column produced by explode is named "col" by default)
df.selectExpr("key.code as key_code", "explode(data.array)").
  selectExpr("key_code", "to_json(array(col)) as DATA_ARRAY").show(false)

//+--------+--------------------------+
//|key_code|DATA_ARRAY                |
//+--------+--------------------------+
//|1       |[{"bool":true,"type":"a"}]|
//+--------+--------------------------+
```
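
If the goal is one JSON string per input record rather than one per array element, a possible variation is to skip explode and pass the array column straight to to_json, which accepts arrays of structs from Spark 2.2 onward. This is only a sketch: the local SparkSession, the `sample` string, and the `result` name are assumptions made for illustration, and the DataFrame is rebuilt from the question's JSON so the snippet can be run on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_json}

// Assumption: a local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("to-json-sketch")
  .getOrCreate()
import spark.implicits._

// The sample record from the question, written as a single-line JSON string.
val sample = """{"key":{"code":"1"},"data":{"array":[{"type":"a","bool":true}]}}"""
val df = spark.read.json(Seq(sample).toDS())

// Serialize the whole array column at once: no explode, so each input record stays one row.
val result = df.select(
  col("key.code").as("key_code"),
  to_json(col("data.array")).as("data_array")
)

result.show(false)
```

For the single-element array in the question this should yield the same string as the explode version. With several elements, the explode version produces one row per element, while this variation keeps them together in one string. The key order in the output follows the inferred schema, which is why bool appears before type.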