PySpark: create a MapType column from a string column

jxct1oxe · posted 2023-08-02 in Spark

Hi, I have a table with a column that looks like this:
VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13
From this row of data, the output should be a MapType column:
| MapColumn |
| --------- |
| {"VER": "some_ver", "DLL": "some_dll", "as": "bcd,2.sc4", "OR": "SCT", "SG": "3", "SLC": "13"} |
I tried using explode, but I end up with a separate row for each key/value pair:

```python
df = df.withColumn("map_col", f.explode(f.split(f.col("data"), " "))) \
       .withColumn("key", f.trim(f.split(f.col("data"), ":").getItem(0))) \
       .withColumn("value", f.trim(f.split(f.col("data"), ":").getItem(1))) \
       .withColumn("map_col", f.create_map(f.col("key"), f.col("value")))
```


**Answer 1** · az31mfrm

Note that this only works if:

1. `':'` separates key from value and appears nowhere else;
2. a single space separates entries and appears nowhere else.
```python
from pyspark.sql import Column
from pyspark.sql import functions as F

_data = [
    ('VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13',),
]
df = spark.createDataFrame(_data, ['data'])

def parse_struct(x: Column) -> Column:
    # Split one 'key:value' entry into a two-field struct.
    inner_split = F.split(x, pattern=':')
    return F.struct(inner_split.getItem(0), inner_split.getItem(1))

# Split the line on whitespace, turn each entry into a (key, value) struct,
# then assemble the array of structs into a map.
split = F.split('data', pattern=r'\s').alias('split')
map_col = F.map_from_entries(F.transform(split, parse_struct))
df2 = df.withColumn('map_column', map_col)
df2.select('map_column').show(10, False)
```

```
+----------------------------------------------------------------------------------+
|map_column                                                                        |
+----------------------------------------------------------------------------------+
|{VER -> some_ver, DLL -> some_dll, as -> bcd,2.sc4, OR -> SCT, SG -> 3, SLC -> 13}|
+----------------------------------------------------------------------------------+
```
