PySpark: How do I split a string that has multiple delimiters in Python?

Posted by bvhaajcl on 2022-11-21 in Spark

I have a PySpark DataFrame with one column that contains strings like the following:

qwe1
tre1.eyyu
cvbn.poiu.sdfg

- A value can be a single string with no delimiter (qwe1).
- A value can have one delimiter, i.e. ".", with characters on both sides of it (tre1.eyyu).
- A value can have two delimiters (cvbn.poiu.sdfg).

The code is as follows:

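p1 = "<path_to_parquet file>"
df_ref_parquet = spark.read.option('header', True).parquet(p1)
# Collect the FILList column to the driver as a plain Python list
table = [x["FILList"] for x in df_ref_parquet.rdd.collect()]
fil_cd_left = []
for row in table:
    # Split on "." and keep the result instead of discarding it
    parts = row.split(".")
    # Leftmost 4 characters of the segment before the first "."
    fil_cd_left.append(parts[0][0:4])
print(fil_cd_left)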

I want to create 3 lists.

- So I wrote a script that iterates over the data frame, splits each value on ".", and builds a first list holding all values in the original format shown above.
- I then used Python slicing to take the leftmost 4 characters before the "." delimiter and appended them to another list.

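However, I am unable to create the other two lists: one holding the rightmost characters after the delimiter, and one holding the characters between the two delimiters.
Please advise, and let me know if I have not explained this clearly; I will try to rephrase.
Note: I searched other posts on Stack Overflow, but they do not seem to apply to my scenario.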

Answer 1, by z9smfwbn

If you want a list of three elements every time, predefine it.

for row in table:
    out = ['','', '']
    for index, word in enumerate(row.split('.', maxsplit=2)):
        out[index] = word
    print(out)

Output:

['qwe1', '', '']
['tre1', 'eyyu', '']
['cvbn', 'poiu', 'sdfg']
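
To get the three separate lists the question asks for, the padded rows can be transposed with `zip`. This is a minimal sketch, not part of the original answer: the helper name `split_padded` is hypothetical, `table` is hard-coded with the sample values from the question, and it assumes the same positional convention as the output above (a value with a single delimiter puts its second part in the middle list, not the right one).

```python
# Sample values from the question; in practice `table` would hold the
# values collected from the DataFrame column.
table = ["qwe1", "tre1.eyyu", "cvbn.poiu.sdfg"]


def split_padded(value, sep=".", width=3):
    """Split `value` on `sep` into exactly `width` parts, padding with ''.

    Hypothetical helper for illustration only.
    """
    parts = value.split(sep, maxsplit=width - 1)
    return parts + [""] * (width - len(parts))


# One padded 3-element list per input value.
rows = [split_padded(value) for value in table]

# Transpose the rows into one list per position.
left, middle, right = (list(column) for column in zip(*rows))

print(left)    # ['qwe1', 'tre1', 'cvbn']
print(middle)  # ['', 'eyyu', 'poiu']
print(right)   # ['', '', 'sdfg']
```

Note that the unpacking on the `zip` line assumes `table` is non-empty; an empty input would need to be handled separately.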
