Spark/Scala: converting a [Map of arrays] to a [Map of Maps]

33qvvth1 posted on 2022-11-16 in Apache

I want to change how the data is stored in one column of a DataFrame. The column "content-value" currently has the following type:

 |-- content-value: map (nullable = true)
 |    |-- key: integer
 |    |-- value: array (valueContainsNull = true)
 |    |    |-- element: string (containsNull = true)

The data is currently stored like this:

{4 -> [5191, 57, -46, POS2], 5 -> [5413, 56, 48, POS2], 2 -> [5421, -59, 47, POS2], 1 -> [5237, -59, -47, POS2], 3 -> [5153, -10, 42, POS1]}

I want to change it into a Map of Maps like this:

{4 -> {value -> 5191, x -> 57, y -> -46, pos -> POS2}, 5 -> {value -> 5413, x -> 56, y -> 48, pos -> POS2}, 2 -> {value -> 5421, x -> -59, y -> 47, pos -> POS2}, 1 -> {value -> 5237, x -> -59, y -> -47, pos -> POS2}, 3 -> {value -> 5153, x -> -10, y -> 42, pos -> POS1}}

I tried creating a new column with the keys ["value", "x", "y", "pos"] and using map_from_arrays, but without success.
Any help is appreciated!

nlejzf6q #1

Using a Dataset:

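// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import spark.implicits._

case class Value(value: String, x: String, y: String, pos: String)

// Sample data: one row holding a Map from key to a four-element array.
val ds = spark.createDataset[Map[Int, Array[String]]](Seq(Map(4 -> Array("5191", "57", "-46", "POS2"))))

// Turn each array into a Value, keeping the outer map's keys.
// Assumes every array has at least four elements.
val dsFinal =
  ds.map(el => el.map {
    case (key, value) => key -> Value(value(0), value(1), value(2), value(3))
  })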

This gives the following (each inner value is a struct with the fields value, x, y, pos, not a map):

+----------------------------+
|value                       |
+----------------------------+
|{4 -> {5191, 57, -46, POS2}}|
+----------------------------+
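The output above holds a struct per key. If a literal Map of Maps is needed, as in the question's desired output, one possible sketch is to zip the field names with each array instead of building a case class. The object name `MapOfMaps` and the helpers `toNamedMap` and `toMapOfMaps` are hypothetical names used only for illustration; the field names come from the question. The block is plain Scala, so it runs without Spark.

```scala
object MapOfMaps {
  // Field names taken from the question's desired output.
  val fieldNames: Seq[String] = Seq("value", "x", "y", "pos")

  // Pair each array element with its field name, position by position.
  // zip stops at the shorter side, so a short array simply yields fewer fields.
  def toNamedMap(values: Array[String]): Map[String, String] =
    fieldNames.zip(values.toSeq).toMap

  // Apply the conversion to every entry of the outer map, keeping its keys.
  def toMapOfMaps(row: Map[Int, Array[String]]): Map[Int, Map[String, String]] =
    row.map { case (key, values) => key -> toNamedMap(values) }
}
```

With spark.implicits._ in scope, something like ds.map(MapOfMaps.toMapOfMaps _) should apply this to the Dataset from the answer, although that step is not verified here. All inner values stay strings, since the source arrays are arrays of strings.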
