PySpark: ArrayType elements of a nested StructType in complex JSON

62lalag4, posted 2021-07-14 in Spark

I am creating a PySpark DataFrame by reading messages from a Kafka topic. Each message is a complex JSON document like this:

{
"paymentEntity": {  
"id": 3081458,
"details": {
  "values": [
    {
      "CardType": "VisaDebit"
    },
    {
      "CardNumber": "********8759"
    },
    {
      "WorldPayMasterId": "c670b980c50eb50373f66a1fe2bf8e70d320a0f7"
    }
  ]
}}}

After reading it into a DataFrame, the schema and data look like this:

root
 |-- details: struct (nullable = true)
 |    |-- values: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- CardNumber: string (nullable = true)
 |    |    |    |-- CardType: string (nullable = true)
 |    |    |    |-- WorldPayMasterId: string (nullable = true)
 |-- id: long (nullable = true)

+-----------------------------------------------------------------------------------+-------+
|details                                                                            |id     |
+-----------------------------------------------------------------------------------+-------+
|[[[, VisaDebit,], [********8759,,], [,, c670b980c50eb50373f66a1fe2bf8e70d320a0f7]]]|3081458|
+-----------------------------------------------------------------------------------+-------+

If I transform it with the following code:

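from pyspark.sql.functions import col, explode

# One output row per element of the values array, plus the top-level id
jsonDF = jsonDF.withColumn("paymentEntity-details-values", explode(col("paymentEntity.details.values"))) \
            .withColumn("id", col("paymentEntity.id")).drop("paymentEntity")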

then the output is as follows:

root
 |-- paymentEntity-details-values: struct (nullable = true)
 |    |-- CardNumber: string (nullable = true)
 |    |-- CardType: string (nullable = true)
 |    |-- WorldPayMasterId: string (nullable = true)
 |-- id: long (nullable = true)

+---------------------------------------------+-------+
|paymentEntity-details-values                 |id     |
+---------------------------------------------+-------+
|[, VisaDebit,]                               |3081458|
|[********8759,,]                             |3081458|
|[,, c670b980c50eb50373f66a1fe2bf8e70d320a0f7]|3081458|
+---------------------------------------------+-------+

I would like to transform the DataFrame into the output below, without exploding the array field:

+------------+---------+---------------------------------------------------+-------+
|cardnumber  |CardType |WorldPayMasterId                                   |id     |
+------------+---------+---------------------------------------------------+-------+
|********8759|VisaDebit|c670b980c50eb50373f66a1fe2bf8e70d320a0f7           |3081458|
+------------+---------+---------------------------------------------------+-------+

Could anyone suggest how to achieve this? Any help is appreciated.
