当一个结果为空时,如何合并两个分解结果?

s5a0g9ez  于 2021-05-27  发布在  Spark
关注(0)|答案(1)|浏览(568)

环境:spark2.4.5
我的spark sql:

SELECT A.*
FROM table_0
    LATERAL VIEW explode(table_0.Array_0) exploded_a_values AS A
UNION
SELECT B.*
FROM table_0
    LATERAL VIEW explode(table_0.Array_1) exploded_a_values AS B

分解结构a和b具有相同的模式。其中一个为空时出错:

Can only star expand struct data types. Attribute: `ArrayBuffer)`;

注意数组中的元素是struct类型。我的目的是在不同的数组中挑选不同的元素。
那么我怎么处理这样的空箱呢?如果你能给我一些建议,我将不胜感激。

js5cn81o

js5cn81o1#

当您使用 explode(table_0.Array_0) exploded_a_values AS A ,
在这里 exploded_a_values 变成 tableA 成为表示 exploded column 因此,你不能打电话 A.* 但你当然可以打电话 exploded_a_values.* 因此修改后的查询如下所示-

1. 读取输入

val table_0 =  spark.range(1, 5)
      .withColumn("Array_0", array(lit(1), lit(2)))
      .withColumn("Array_1", array(lit(null).cast(IntegerType)))
    table_0.show(false)
    table_0.printSchema()

输出-

+---+-------+-------+
|id |Array_0|Array_1|
+---+-------+-------+
|1  |[1, 2] |[]     |
|2  |[1, 2] |[]     |
|3  |[1, 2] |[]     |
|4  |[1, 2] |[]     |
+---+-------+-------+

root
 |-- id: long (nullable = false)
 |-- Array_0: array (nullable = false)
 |    |-- element: integer (containsNull = false)
 |-- Array_1: array (nullable = false)
 |    |-- element: integer (containsNull = true)

2. 运行联合查询

table_0.createOrReplaceTempView("table_0")

    val processed = spark.sql(
      """
        |SELECT exploded_a_values.*, table_0.id
        |FROM table_0
        |    LATERAL VIEW explode(table_0.Array_0) exploded_a_values AS A
        |UNION
        |SELECT exploded_b_values.*, table_0.id
        |FROM table_0
        |    LATERAL VIEW explode(table_0.Array_1) exploded_b_values AS B
      """.stripMargin)
    processed.show(false)
    processed.printSchema()

输出-

+----+---+
|A   |id |
+----+---+
|2   |2  |
|2   |4  |
|null|2  |
|null|4  |
|1   |1  |
|2   |1  |
|1   |2  |
|1   |3  |
|2   |3  |
|1   |4  |
|null|1  |
|null|3  |
+----+---+

root
 |-- A: integer (nullable = true)
 |-- id: long (nullable = false)

注意:联合只能在具有兼容列类型的表上执行。

编辑-1(根据评论)

试用 Array<struct> ,同样的查询对我很有效-结果如下:

+------+---+
|A     |id |
+------+---+
|[a, 2]|1  |
|[a, 2]|2  |
|[a, 2]|4  |
|null  |2  |
|null  |4  |
|[a, 2]|3  |
|null  |1  |
|null  |3  |
+------+---+

root
 |-- A: struct (nullable = true)
 |    |-- f1: string (nullable = false)
 |    |-- f2: integer (nullable = false)
 |-- id: long (nullable = false)

有关完整示例,请参阅本要点

相关问题