I have a DataFrame like the one below. A blank cell means the value is missing (null).
+---+---+----+---+---+
| a1| b1|  c1| d1| e1|
+---+---+----+---+---+
|  1|  a|foo1|  4|  5|
|   |  b| bar|  4|  6|
|   |  c| mnc|   |  7|
+---+---+----+---+---+
//Schema
root
|-- a1: long (nullable = true)
|-- b1: string (nullable = true)
|-- c1: string (nullable = true)
|-- d1: long (nullable = true)
|-- e1: long (nullable = true)
I want to convert it to JSON, one JSON string per row:
from pyspark.sql import functions as F
result = df.withColumn("JSON", F.to_json(F.struct([F.when(df[x].isNotNull(), df[x]).otherwise(F.lit(None)).alias(x) for x in df.columns])))
The result looks like this:
{"a1":1,"b1":"a","c1":"foo1","d1":4,"e1":5}
{"b1":"b","c1":"bar","d1":4,"e1":6}
{"b1":"c","c1":"mnc","e1":7}
So columns whose value is null are left out of the JSON structure.
One workaround is to use F.lit("") instead of F.lit(None), like this:
result = df.withColumn("JSON", F.to_json(F.struct([F.when(df[x].isNotNull(), df[x]).otherwise(F.lit("")).alias(x) for x in df.columns])))
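But using F.lit("") converts everything to a string: both branches of when/otherwise must share one type, so the long columns are cast to string to match the empty string.
So the result I get is: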
{"a1":"1","b1":"a","c1":"foo1","d1":"4","e1":"5"}
{"a1":"","b1":"b","c1":"bar","d1":"4","e1":"6"}
{"a1":"","b1":"c","c1":"mnc","d1":"","e1":"7"}
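Can you suggest a way around this? For example, is there a way to keep the missing columns in the JSON while each column keeps its original type?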