root
|-- eid: string (nullable = true)
|-- keys: array (nullable = true)
| |-- element: string (containsNull = true)
|-- type: string (nullable = true)
|-- values: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: string (containsNull = true)
I need to parse a JSON file with the above schema into a structured format using a Spark DataFrame. The 'keys' column holds the column names, and the corresponding values sit in the 'values' column.
Sample data file: {'type':'logs','eid':'1','keys':['crt_ts','id','upd_ts','km','pivl','distance','speed'],'values':['12343.0000.012','aaga1567','1333.333.333','565656','10.5','121','64']}
Expected output:
eid  crt_ts          id        upd_ts        km      pivl  distance  speed  type
1    12343.0000.012  AAGA1567  1333.333.333  565656  10.5  121       64     logs
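
For illustration, a minimal PySpark sketch of loading such a file (the path data.json and the session setup are assumptions, not part of the original post):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parse-keys-values").getOrCreate()

# Read the sample file; Spark's JSON reader accepts the single-quoted fields
# shown above because allowSingleQuotes is enabled by default.
df = spark.read.json("data.json")   # "data.json" is a placeholder path
df.printSchema()
df.show(truncate=False)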
1 Answer
Please check the code below; I used groupBy, pivot & agg:
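
A minimal PySpark sketch of that groupBy/pivot/agg approach, assuming df is the DataFrame read from the JSON file and that values is a flat array of strings as in the sample row (if it is nested as the printed schema suggests, it would need to be flattened first):

from pyspark.sql import functions as F

# Pair keys[i] with values[i] and turn each pair into its own row.
exploded = (
    df.select("eid", "type", F.explode(F.arrays_zip("keys", "values")).alias("kv"))
      .select("eid", "type",
              F.col("kv.keys").alias("key"),
              F.col("kv.values").alias("value"))
)

# One wide row per (eid, type): pivot on the key name and take its value.
result = exploded.groupBy("eid", "type").pivot("key").agg(F.first("value"))
result.show(truncate=False)

For the sample row this yields a single record with eid, type, and one column per entry of keys, matching the expected output above (column order aside, since pivot sorts the distinct key values).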