How to unnest an array with keys to join on afterwards?

Posted by r7s23pms on 2021-06-26 in Hive

I have two tables, table1 and table2. table1 is large, but table2 is small. I also have a UDF whose interface is defined as follows:

--table1--
id
1
2
3

--table2--
category
a
b
c
d
e
f
g

UDF: foo(id: Int): List[String]

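I plan to first call the UDF to get the corresponding categories, foo(table1.id), which returns a WrappedArray, and then join on category in table2 to do further operations. The expected result looks like this: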

--view--

id,category
1,a
1,c
1,d
2,b
2,c
3,e
3,f
3,g

I tried to find an unnest method in Hive, but with no luck. Could anyone help me? Thanks!

Hive apache-spark apache-spark-sql hiveql

Source: https://stackoverflow.com/questions/43411832/how-to-unnest-array-with-keys-to-join-on-afterwards

1 answer

kkih6yb81#

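I believe you want to use the explode function or the Dataset's flatMap operator. The explode function creates a new row for each element in the given array or map column. The flatMap operator first applies a function to all elements of the Dataset and then flattens the results, returning a new Dataset.
After applying your UDF foo(id: Int): List[String] you end up with a Dataset that has a column of type array.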

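// spark-shell session: udf, explode and the $"..." syntax are already in scope there.
// Stand-in for foo: maps id to the letters from 'a' up to ('a' + id).
val fooUDF = udf { id: Int => ('a' to ('a'.toInt + id).toChar).map(_.toString) }

// table1 with fooUDF applied
val table1 = spark.range(3).withColumn("foo", fooUDF('id))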

scala> table1.show
+---+---------+
| id|      foo|
+---+---------+
|  0|      [a]|
|  1|   [a, b]|
|  2|[a, b, c]|
+---+---------+

scala> table1.printSchema
root
 |-- id: long (nullable = false)
 |-- foo: array (nullable = true)
 |    |-- element: string (containsNull = true)

scala> table1.withColumn("fooExploded", explode($"foo")).show
+---+---------+-----------+
| id|      foo|fooExploded|
+---+---------+-----------+
|  0|      [a]|          a|
|  1|   [a, b]|          a|
|  1|   [a, b]|          b|
|  2|[a, b, c]|          a|
|  2|[a, b, c]|          b|
|  2|[a, b, c]|          c|
+---+---------+-----------+
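
The answer mentions flatMap but only demonstrates explode. As a rough sketch of the flatMap route, assuming the UDF logic is available as a plain Scala function: the object name FlatMapSketch, the helper pairs, and the stand-in body of foo (the same letter-range logic as fooUDF above) are illustrative and not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FlatMapSketch {
  // Hypothetical stand-in for foo(id: Int): List[String]; the real logic is not shown in the question.
  def foo(id: Int): List[String] =
    ('a' to ('a'.toInt + id).toChar).map(_.toString).toList

  // Emits one (id, category) row per element returned by foo.
  def pairs(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val ids = Seq(0, 1, 2).toDS()
    ids.flatMap(id => foo(id).map(category => (id, category)))
      .toDF("id", "category")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatMap-sketch")
      .getOrCreate()
    pairs(spark).show()
    spark.stop()
  }
}
```

Unlike explode, this variant calls foo as ordinary Scala code and never materializes the intermediate array column, which may or may not matter depending on where foo has to run.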

With that, the join should be easy.
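
One possible shape for that join, sketched under a few assumptions: the object name ExplodeThenJoin and the helper build are illustrative, the body of foo is a stand-in whose mapping is read off the expected output in the question, and the broadcast hint is a suggestion based on table2 being described as small rather than something the answer prescribes.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{broadcast, explode, udf}

object ExplodeThenJoin {
  // Hypothetical stand-in for foo; the mapping mirrors the expected output in the question.
  val foo: Int => Seq[String] = {
    case 1 => Seq("a", "c", "d")
    case 2 => Seq("b", "c")
    case 3 => Seq("e", "f", "g")
    case _ => Seq.empty
  }

  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val fooUDF = udf(foo)

    val table1 = Seq(1, 2, 3).toDF("id")
    val table2 = Seq("a", "b", "c", "d", "e", "f", "g").toDF("category")

    // One row per (id, category) pair produced by the UDF.
    val exploded = table1.select($"id", explode(fooUDF($"id")).as("category"))

    // table2 is small, so hint Spark to broadcast it instead of shuffling table1.
    exploded.join(broadcast(table2), Seq("category")).select("id", "category")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-join-sketch")
      .getOrCreate()
    build(spark).orderBy("id", "category").show()
    spark.stop()
  }
}
```

With the sample data, every exploded category exists in table2, so the inner join keeps all eight (id, category) pairs listed in the question's expected view.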

