pyspark — Is there a way to specify the column names generated by the inline_outer function in Spark SQL?

ghg1uchk  posted 2023-02-15 in Spark
Follow (0) | Answers (2) | Views (163)

I have a table named order, as follows:
| id | campaigns |
| --- | --- |
| 2 | [{"id":"1","title":"test","type":"one"},{"id":"2","title":"test2","type":"two"}] |
| 5 | [{"id":"3","title":"test3","type":"three"}] |
My expected output:
| id | campaignId | title | type |
| --- | --- | --- | --- |
| 2 | 1 | test | one |
| 2 | 2 | test2 | two |
| 5 | 3 | test3 | three |
My code:

SELECT orderId AS id, id AS campaignid, title, type
FROM (
    SELECT id AS orderId, inline_outer(from_json(campaigns, 'ARRAY<STRUCT<id: STRING, title: STRING, type: STRING>>'))
    FROM order
);

I have to rename the id field to orderId in the subquery because the campaigns field also contains an id key.

    • Q: Is there a way to specify the column names generated by the inline_outer function in Spark SQL?

I tried two other ways of writing this, but neither of them is valid Spark SQL syntax.
Thanks in advance.


kb5ga3dv1#

You need to cast the from_json output to change the column names:

SELECT id,
       inline_outer(cast(from_json(campaigns, 'ARRAY<STRUCT<id: STRING, title: STRING, type: STRING>>')
                         as ARRAY<STRUCT<campaignId: STRING, title: STRING, type: STRING>>))
FROM order
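For reference, Spark SQL can also alias a generator's output columns directly via LATERAL VIEW, which avoids both the cast and the subquery rename. A sketch against the same order table; LATERAL VIEW OUTER inline matches inline_outer's behavior of keeping rows whose array is null or empty:

```sql
SELECT o.id, c.campaignId, c.title, c.type
FROM order o
LATERAL VIEW OUTER inline(
    from_json(o.campaigns, 'ARRAY<STRUCT<id: STRING, title: STRING, type: STRING>>')
) c AS campaignId, title, type;
```

The AS clause after the view alias names the generated columns, so campaignId can be chosen freely even though the struct field is called id.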

uyto3xhc2#

Here is a solution entirely in pyspark:

from pyspark.sql import functions as F, types as T

# Define schema of the JSON
schema = T.ArrayType(
    T.StructType(
        [
            T.StructField("id", T.StringType()),
            T.StructField("title", T.StringType()),
            T.StructField("type", T.StringType()),
        ]
    )
)
# Alternatively, a map schema also works for this example
schema = T.ArrayType(T.MapType(T.StringType(), T.StringType()))

# Parse the JSON string into an array-of-structs column
df = df.withColumn(
    "campaigns",
    F.from_json("campaigns", schema),
)

# Explode the array into one row per element
# (note: explode drops rows with null/empty arrays;
#  use F.explode_outer to keep them, matching inline_outer)
df = df.withColumn("campaign", F.explode("campaigns"))

# Flatten the struct and rename its id field to campaignId
df = df.select(
    "id",
    F.col("campaign.id").alias("campaignId"),
    F.col("campaign.title"),
    F.col("campaign.type"),
)
+---+----------+-----+-----+
| id|campaignId|title| type|
+---+----------+-----+-----+
|  2|         1| test|  one|
|  2|         2|test2|  two|
|  5|         3|test3|three|
+---+----------+-----+-----+
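Conceptually, from_json plus inline_outer (or explode) just parses the JSON string and turns each struct in the array into its own row. A plain-Python sketch of that transformation on the question's sample data, with no Spark required:

```python
import json

# The sample "order" table: (id, campaigns-as-JSON-string)
rows = [
    (2, '[{"id":"1","title":"test","type":"one"},'
        '{"id":"2","title":"test2","type":"two"}]'),
    (5, '[{"id":"3","title":"test3","type":"three"}]'),
]

# Parse each JSON array, then emit one output row per inner object,
# carrying the outer id alongside the renamed campaign fields
flattened = [
    (order_id, c["id"], c["title"], c["type"])
    for order_id, campaigns in rows
    for c in json.loads(campaigns)
]
# → [(2, "1", "test", "one"), (2, "2", "test2", "two"), (5, "3", "test3", "three")]
```

The column naming happens at the point where each dict is unpacked, which is exactly what the cast and LATERAL VIEW aliasing tricks express in Spark SQL.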
