Apache Spark: creating a struct from column values

ffscu2ro  posted on 2021-05-16  in  Spark

I am trying to convert my DataFrame to JSON so that I can push it into Elasticsearch. This is what my DataFrame looks like:

Provider   Market   Avg.  Deviation
XM         NY       10    5
TL         AT       8     6
LM         CA       7     8

I want each row to end up like this:

Column
XM: {
   NY: {
     Avg: 10,
     Deviation: 5
   }
}

How can I create something like this?


jckbn6z7 #1

Check the code below; you can adapt it to your requirements.

scala> :paste
// Entering paste mode (ctrl-D to finish)

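// Assumes columns named provider, market, avg, deviation.
// A column literally named "Avg." needs backticks: $"`Avg.`"
df
  .select(
    to_json(
      struct(
        // Outer map: provider -> (market -> {avg, deviation})
        map(
          $"provider",
          map($"market", struct($"avg", $"deviation"))
        ).as("json_data")
      )
    ).as("data")
  )
  // Pull the nested map back out of the wrapper struct
  .select(get_json_object($"data", "$.json_data").as("data"))
  .show(false)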

Output:

+--------------------------------------+
|data                                  |
+--------------------------------------+
|{"XM":{"NY":{"avg":10,"deviation":5}}}|
|{"TL":{"AT":{"avg":8,"deviation":6}}} |
|{"LM":{"CA":{"avg":7,"deviation":8}}} |
+--------------------------------------+

cnwbcb6i #2

In case anyone wants to do this the PySpark way (Spark 2.1+, where `to_json` is available):

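from pyspark.sql import Row
from pyspark.sql.functions import create_map, get_json_object, struct, to_json

# Assumes an active SparkSession named `spark` (as in the pyspark shell).
# The values are strings here, so they appear quoted in the JSON output.
row = Row('Provider', 'Market', 'Avg', 'Deviation')
row_df = spark.createDataFrame(
    [row('XM', 'NY', '10', '5'),
     row('TL', 'AT', '8', '6'),
     row('LM', 'CA', '7', '8')])
row_df.show()

# The unnamed map inside struct() is auto-named col1; get_json_object
# extracts it so the wrapper struct does not appear in the result.
row_df.select(
    to_json(
        struct(
            # Provider -> (Market -> {Avg, Deviation})
            create_map(
                row_df.Provider,
                create_map(row_df.Market, struct(row_df.Avg, row_df.Deviation))
            )
        )
    ).alias("json")
).select(get_json_object('json', '$.col1').alias('json')).show(truncate=False)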

Output:

+--------+------+---+---------+
|Provider|Market|Avg|Deviation|
+--------+------+---+---------+
|      XM|    NY| 10|        5|
|      TL|    AT|  8|        6|
|      LM|    CA|  7|        8|
+--------+------+---+---------+

+------------------------------------------+
|json                                      |
+------------------------------------------+
|{"XM":{"NY":{"Avg":"10","Deviation":"5"}}}|
|{"TL":{"AT":{"Avg":"8","Deviation":"6"}}} |
|{"LM":{"CA":{"Avg":"7","Deviation":"8"}}} |
+------------------------------------------+
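
To make the target shape concrete, the per-row nesting that both answers build can be modelled in plain Python without Spark. This is only a rough sketch for checking expected output: `rows`, `to_nested` and `docs` are hypothetical names that do not come from either answer, and the values are written as integers on the assumption that numeric JSON is what Elasticsearch should receive.

```python
import json

# Sample rows mirroring the DataFrame in the question (integers, not strings).
rows = [
    {"Provider": "XM", "Market": "NY", "Avg": 10, "Deviation": 5},
    {"Provider": "TL", "Market": "AT", "Avg": 8, "Deviation": 6},
    {"Provider": "LM", "Market": "CA", "Avg": 7, "Deviation": 8},
]


def to_nested(row):
    """Nest one row as Provider -> Market -> {Avg, Deviation}."""
    return {
        row["Provider"]: {
            row["Market"]: {"Avg": row["Avg"], "Deviation": row["Deviation"]}
        }
    }


# Compact separators give the same spacing-free layout as Spark's to_json.
docs = [json.dumps(to_nested(r), separators=(",", ":")) for r in rows]

for doc in docs:
    print(doc)
```

With integer inputs the numbers are emitted unquoted, as in the Scala output; passing strings such as '10' would give quoted values, as in the PySpark output above.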
