scala - collapse a JSON string column into key/value columns

cuxqih21 · posted 2021-05-27 in Spark
Follow (0) | Answers (1) | Views (345)

I have a DataFrame with the following structure:

  | a  | b   | c                                                  |
  ------------------------------------------------------------------
  | 01 | ABC | {"key1":"valueA","key2":"valueC"}                  |
  | 02 | ABC | {"key1":"valueA","key2":"valueC"}                  |
  | 11 | DEF | {"key1":"valueB","key2":"valueD","key3":"valueE"}  |
  | 12 | DEF | {"key1":"valueB","key2":"valueD","key3":"valueE"}  |

I want to turn it into this:

  | a  | b   | key  | value  |
  ----------------------------
  | 01 | ABC | key1 | valueA |
  | 01 | ABC | key2 | valueC |
  | 02 | ABC | key1 | valueA |
  | 02 | ABC | key2 | valueC |
  | 11 | DEF | key1 | valueB |
  | 11 | DEF | key2 | valueD |
  | 11 | DEF | key3 | valueE |
  | 12 | DEF | key1 | valueB |
  | 12 | DEF | key2 | valueD |
  | 12 | DEF | key3 | valueE |

Ideally in an efficient way, since the dataset may be quite large.


lkaoscv71#

Try parsing the JSON string with `from_json` into a map, then use the `explode` function to turn each map entry into its own row. Example:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
val df = Seq(("01", "ABC", """{"key1":"valueA","key2":"valueC"}""")).toDF("a", "b", "c")

// Parse the JSON string column into a map of string -> string
val schema = MapType(StringType, StringType)

// explode emits one (key, value) row per map entry
df.withColumn("d", from_json(col("c"), schema))
  .selectExpr("a", "b", "explode(d)")
  .show(10, false)
//+---+---+----+------+
//|a |b |key |value |
//+---+---+----+------+
//|01 |ABC|key1|valueA|
//|01 |ABC|key2|valueC|
//+---+---+----+------+
```
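Applied to all four sample rows from the question, the same approach produces the requested shape. A minimal sketch (assuming an existing `SparkSession` named `spark` for the `toDF` implicits):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

// Assumes a SparkSession `spark` is already in scope
import spark.implicits._

// Full sample data from the question
val df = Seq(
  ("01", "ABC", """{"key1":"valueA","key2":"valueC"}"""),
  ("02", "ABC", """{"key1":"valueA","key2":"valueC"}"""),
  ("11", "DEF", """{"key1":"valueB","key2":"valueD","key3":"valueE"}"""),
  ("12", "DEF", """{"key1":"valueB","key2":"valueD","key3":"valueE"}""")
).toDF("a", "b", "c")

// Parse each JSON string into a map, then explode the map into key/value rows.
// Rows with three keys (11, 12) yield three output rows each.
val schema = MapType(StringType, StringType)
df.withColumn("d", from_json(col("c"), schema))
  .select(col("a"), col("b"), explode(col("d")))
  .show(false)
```

Because `from_json` and `explode` are native Spark SQL expressions, this runs entirely inside Catalyst with no user-defined functions, which keeps it efficient on large datasets.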
