Is there a way to flatten a complex data type (array of array of struct) without using the explode function?

Posted by 798qvoo8 on 2021-05-27 in Spark


I am trying to flatten a complex schema in PySpark. The data is too large for the explode function (I have read that explode is a very expensive function). Here is my schema:

|-- A: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- B: string (nullable = true)
 |    |    |    |-- C: string (nullable = true)

I want to flatten it to:

|-- A: array (nullable = true)
|    |-- B: string (nullable = true)
|    |-- C: string (nullable = true)

I tried df.select("A.*"), but I got an exception:

: org.apache.spark.sql.AnalysisException: Can only star expand struct data types. Attribute: `ArrayBuffer(A)`;

Thanks in advance!

apache-spark pyspark apache-spark-sql pyspark-dataframes

Source: https://stackoverflow.com/questions/63079304/is-there-a-way-i-can-flatten-a-complex-datatypes-array-of-array-of-struct-withou

1 answer


2hh7jdfx1#

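Check the code below. Both variants use flatten (available since Spark 2.4) instead of explode, so no extra rows are generated. The first expression pulls the field values out of the structs into arrays of strings; the second keeps the structs and only merges the inner arrays into one, which gives the array-of-struct schema asked for.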

scala> df.printSchema
root
 |-- A: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- A: string (nullable = true)
 |    |    |    |-- B: string (nullable = true)

scala> df.withColumn("A",expr("flatten(transform(A,x -> array(x.A,x.B)))")).printSchema
root
 |-- A: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)

scala> df.withColumn("A",flatten($"A")).printSchema
root
 |-- A: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- A: string (nullable = true)
 |    |    |-- B: string (nullable = true)

2021-05-27
