Creating a MapType from ArrayType() and StructType()

fruv7luv posted on 2021-05-17 in Spark

I have a JSON document that looks like this:

"mapping_field" : {
        "values" : {
            "key1" : {
                "id" : "key1",
                "field1" : "value1",
                "field2" : "value2"
            },
            "key2" : {
                "id" : "key2",
                "field1" : "value3",
                "field2" : "value4"
            }
        },
        "keys" : [
            "key1",
            "key2"
        ]
}

I am trying to map this structure to a Spark schema. I created the following, but it does not work. I also tried removing the ArrayType from the "values" field.

StructType("mapping_field",
    MapType(
        StructField("keys", ArrayType(StringType())),
        StructField("values", ArrayType(StructType([
            StructField("id",StringType()),
            StructField("field1",StringType()),
            StructField("field2",StringType())
        ])))
    )
)

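Also note that "key1" and "key2" are dynamic fields, generated from unique identifiers, and there can be more than two keys. Has anyone managed to map an ArrayType to a StructType?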

gab6jxml #1

A StructType for the JSON you provided:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{ArrayType, MapType, StructField, StructType, StringType}

val json = """ {
    "mapping_field" : {
            "values" : {
                "key1" : {
                    "id" : "key1",
                    "field1" : "value1",
                    "field2" : "value2"
                },
                "key2" : {
                    "id" : "key2",
                    "field1" : "value3",
                    "field2" : "value4"
                }
            },
            "keys" : [
                "key1",
                "key2"
            ]
    }
  }
  """

// "values" is a MapType because its keys (key1, key2, ...) are dynamic;
// each map value is a struct with the fixed fields id, field1 and field2.
val struct = StructType(
  StructField("mapping_field", StructType(
    StructField("values", MapType(StringType, StructType(
      StructField("id", StringType, false) ::
      StructField("field1", StringType, false) ::
      StructField("field2", StringType, false) :: Nil)
    ), false) ::
    StructField("keys", ArrayType(StringType), false) :: Nil
  ), false) :: Nil)

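// `spark` is the active SparkSession (predefined in spark-shell).
import spark.implicits._
// Parse the JSON string column with the schema defined above.
val df = List(json)
    .toDF("json_col")
    .select(from_json($"json_col", struct))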
