我有一个嵌套数组,我想在其中获取所有元素,并将它们分别放入一个新列。这是我目前所拥有的。尝试编写2个方法,但都不起作用。我从未注解代码中得到的当前错误是cannot resolve 'split(response.indicator, ',')' due to data type mismatch: argument 1 requires string type, however, 'response.indicator' is of array<struct<_VALUE:string,_number:bigint>> type.;;
File.withColumn("response.indicator", explode(col("response.ind")))
.withColumn("response.indicator", split(col("response.indicator"), ","))
.withColumn("key", col("response.indicator").getItem(1))
.withColumn("value", col("response.indicator").getItem(0))
.groupBy("ID")
.pivot("key")
.agg(first("value"))
.show(true);
/*File.select("response.indicator").collectAsList().forEach(row -> {
String name = String.valueOf(row.getList(0).get(1));
String value = String.valueOf(row.getList(0).get(0));
File.withColumn(name, col(value));
});*/
下面是架构
|-- ID: integer (nullable = true)
|-- response: struct (nullable = true)
| |-- indicator: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- _VALUE: string (nullable = true)
| | | |-- _number: long (nullable = true)
我的数据的外观
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|ID |response
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 |[WrappedArray([N,7], [N,8], [N,9], [N,19], [N,20], [N,22], [N,12], [N,1], [N,2], [N,3], [N,4], [N,5], [N,6], [N,10], [N,11], [N,13], [N,14], [N,15], [N,16], [N,17], [N,18], [N,21], [N,25], [N,26])] |
| 2 |[WrappedArray([Y,1], [N,8], [N,9], [N,19], [N,22], [Y,22], [N,20], [Y,7], [Y,23], [N,3], [Y,4], [N,11], [N,6], [Y,27], [N,5], [N,13], [N,14], [N,15], [Y,16], [N,17], [Y,18], [N,21], [N,25], [N,26])] |
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
我想让它看起来像
+--------------------+-----------------------------+
|ID | 1 | 2 | 3 | etc
+--------------------+-----------------------------+
| 1 | N | N | N | etc
| 2 | Y | NULL | N | etc
+--------------------+-----------------------------+
1条答案
按热度按时间lp0sw83n1#
问题是
split(col("response.indicator"), ",")
需要一个字符串列,而response.indicator
实际上是一个struct
。要“展开”一个名为s
的struct
,可以按如下所示使用s.*
: