PySpark: parsing JSON into DynamicFrame columns

Posted by z9smfwbn on 2023-10-15 in Spark

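Hello, I am trying to parse a JSON file with the structure below into DynamicFrame columns. I need every field as a separate column, like this: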

changedFields|First Name|Last Name| ...... | id
             |          |         |        |  
             |          |         |        |
|-- employees: array
|    |-- element: struct
|    |    |-- changedFields: array
|    |    |    |-- element: choice
|    |    |    |    |-- int
|    |    |    |    |-- string
|    |    |-- fields: struct
|    |    |    |-- First Name: string
|    |    |    |-- Last Name: string
|    |    |    |-- Status: string
|    |    |    |-- Employee #: string
|    |    |    |-- Marital Status: string
|    |    |    |-- Address Line 1: string
|    |    |    |-- Mobile Phone: string
|    |    |    |-- Work Email: string
|    |    |    |-- Hire Date: string
|    |    |    |-- Original Hire Date: string
|    |    |    |-- Effective Date: string
|    |    |    |-- Location: string
|    |    |    |-- Division: string
|    |    |    |-- Department: string
|    |    |    |-- Job Title: string
|    |    |    |-- Reports To: string
|    |    |-- id: string
Answer #1 (ogsagwnx)

from pyspark.sql import functions as F

# One row per employee: explode the array, then expand the resulting struct
df = spark.read.json("test.json").select(F.explode("employees").alias("employee")).select("employee.*")

# Promote every key of the nested "fields" struct to its own column
df2 = df.select("changedFields", "fields.*", "id")
df2.show()
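The two steps in this answer (explode the `employees` array, then promote the keys of `fields`) can be checked without a Spark session. The following is only a plain-Python sketch of the same logic. The sample document and the helper name `flatten_employees` are hypothetical and are not part of the original question.

```python
import json

# Hypothetical sample document that follows the schema in the question.
raw = json.loads("""
{"employees": [
  {"id": "101", "changedFields": ["Status", 7],
   "fields": {"First Name": "Ada", "Last Name": "Lovelace", "Status": "Active"}},
  {"id": "102", "changedFields": [],
   "fields": {"First Name": "Alan", "Last Name": "Turing", "Status": "Inactive"}}
]}
""")


def flatten_employees(doc):
    """Return one flat dict per employee: changedFields, every key of fields, id."""
    rows = []
    for employee in doc["employees"]:      # plays the role of F.explode("employees")
        row = {"changedFields": employee["changedFields"]}
        row.update(employee["fields"])     # plays the role of select("fields.*")
        row["id"] = employee["id"]
        rows.append(row)
    return rows


rows = flatten_employees(raw)
for row in rows:
    print(row)
```

Each resulting dict has the layout asked for in the question: `changedFields` first, then one key per entry of `fields`, then `id`.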
Answer #2 (5gfr0r5j)

You can use the dynamicFrame.unnest() function to flatten nested JSON. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-unnest
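To get a feel for what this kind of flattening does to a single record, here is a minimal plain-Python sketch. It is not Glue's implementation. The helper `unnest_record`, the sample record, and the choice of joining parent and child names with a dot are all assumptions made for illustration. Arrays are left untouched, so the sketch is applied to one employee record rather than to the whole `employees` array.

```python
def unnest_record(record, prefix=""):
    """Recursively lift nested dict keys to the top level, joining names with '.'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested struct: recurse and carry the parent name as a prefix
            flat.update(unnest_record(value, name))
        else:
            # Scalars and arrays are kept as they are
            flat[name] = value
    return flat


# Hypothetical employee record shaped like one element of "employees"
employee = {
    "id": "101",
    "changedFields": ["Status", 7],
    "fields": {"First Name": "Ada", "Status": "Active"},
}

flat = unnest_record(employee)
print(flat)
```

With this naming, the nested keys come out as `fields.First Name` and `fields.Status`, so a rename step would still be needed to get the bare column names shown in the question.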
