Union can only be performed on tables with compatible column types

1tuwyuhd  posted on 2021-05-27 in Spark

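u"Union can only be performed on tables with the compatible column types. map<string,int> <> struct<int:int,long:null> at the Nth column of the second table"
Here is what the schemas look like:
Dataset 1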

    root
     |-- name: string (nullable = true)
     |-- count: struct (nullable = true)
     |    |-- int: integer (nullable = true)
     |    |-- long: null (nullable = true)

Dataset 2

    root
     |-- name: string (nullable = true)
     |-- count: map (nullable = true)
     |    |-- key: string
     |    |-- value: integer (valueContainsNull = true)

The union of the two DataFrames fails when I run the following command:

    data = dataset1_df.union(dataset2_df)

How can I resolve this?
Update: I would like to change the schemas so that they look like this:
Dataset 1

    root
     |-- name: string (nullable = true)
     |-- count: long

Dataset 2

    root
     |-- name: string (nullable = true)
     |-- count: long
ewm0tg9j  #1

A simple solution is to convert the mismatched column of one DataFrame so that its type matches the other DataFrame's, as follows:

    import org.apache.spark.sql.functions.map
    // Enables the $"..." column syntax; assumes a SparkSession named `spark` (as in spark-shell)
    import spark.implicits._

    // Sample data shaped like dataset 1: `count` is a struct
    val df1 = spark.sql("select 'foo' name, named_struct('int', 1, 'long', null) count")
    df1.show(false)
    df1.printSchema()
    /**
      * +----+-----+
      * |name|count|
      * +----+-----+
      * |foo |[1,] |
      * +----+-----+
      *
      * root
      *  |-- name: string (nullable = false)
      *  |-- count: struct (nullable = false)
      *  |    |-- int: integer (nullable = false)
      *  |    |-- long: null (nullable = true)
      */

    // Sample data shaped like dataset 2: `count` is a map<string,int>
    val df2 = spark.sql("select 'bar' name, map('2', 3) count")
    df2.show(false)
    df2.printSchema()
    /**
      * +----+--------+
      * |name|count   |
      * +----+--------+
      * |bar |[2 -> 3]|
      * +----+--------+
      *
      * root
      *  |-- name: string (nullable = false)
      *  |-- count: map (nullable = false)
      *  |    |-- key: string
      *  |    |-- value: integer (valueContainsNull = false)
      */

    // Rebuild df1's `count` as map<string,int> so both sides have the same type:
    // the key is count.int cast to string, the value is count.long cast to integer
    df1.withColumn("count",
        map($"count.int".cast("string"), $"count.long".cast("integer")))
      .union(df2)
      .show(false)
    /**
      * +----+--------+
      * |name|count   |
      * +----+--------+
      * |foo |[1 ->]  |
      * |bar |[2 -> 3]|
      * +----+--------+
      */
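
For the target schema in the question's update (count: long on both sides), one possible variation is sketched below. The question does not say how the struct or the map should collapse into a single long, so both rules here are assumptions made for illustration: the struct prefers its long field and falls back to int, and the map is reduced to the sum of its values. The names df1Long, df2Long and result are hypothetical, and the aggregate SQL function requires Spark 2.4 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, expr}

// Local session for the sketch; in spark-shell a session named `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("union-sketch").getOrCreate()

// Same sample data as in the answer above
val df1 = spark.sql("select 'foo' name, named_struct('int', 1, 'long', null) count")
val df2 = spark.sql("select 'bar' name, map('2', 3) count")

// Assumption: take count.long when it is present, otherwise fall back to count.int
val df1Long = df1.withColumn("count",
  coalesce(col("count.long").cast("long"), col("count.int").cast("long")))

// Assumption: collapse the map into the sum of its values (starting from a bigint zero)
val df2Long = df2.withColumn("count",
  expr("aggregate(map_values(`count`), 0L, (acc, v) -> acc + v)"))

// Both `count` columns are now of type long, so the union is type-compatible
val result = df1Long.union(df2Long)
result.printSchema()
result.show(false)
```

If a different rule fits the data better (for example, looking up one specific map key), only the two withColumn expressions need to change; the union itself stays the same once both columns share a type.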