pyspark: How to split a string having more than one separator in Python?

Posted by bvhaajcl on 2022-11-21 in Spark


I have a PySpark DataFrame with a column that contains strings like the following:

qwe1
tre1.eyyu
cvbn.poiu.sdfg

- A value could be a single string (qwe1).
- A value could have one delimiter, i.e. ".", with characters on both sides of it (tre1.eyyu).
- A value could have two delimiters (cvbn.poiu.sdfg).
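For reference, here is a minimal sketch of how Python's built-in str.split treats each of these three shapes. The sample values are copied from above; the names values and parts_per_value are illustrative and not part of the original script.

```python
# Sample values copied from the question
values = ["qwe1", "tre1.eyyu", "cvbn.poiu.sdfg"]

# str.split returns a list with one element per "."-separated segment
parts_per_value = [value.split(".") for value in values]

for value, parts in zip(values, parts_per_value):
    print(value, "->", parts, "segments:", len(parts))
```

A value without any "." still yields a one-element list, so the number of segments varies from 1 to 3 across rows.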

The code is as follows:

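p1 = "<path_to_parquet file>"
df_ref_parquet = spark.read.option('header', True).parquet(p1)
# Collect the FILList column to the driver as a plain Python list of strings
table = [x["FILList"] for x in df_ref_parquet.rdd.collect()]
fil_cd_left = []
for row in table:
    # Keep the result of split(); the original call discarded it
    parts = row.split(".")
    # Text before the first "." (the original sliced row[0:4], which assumes a 4-character prefix)
    fil_cd_left.append(parts[0])
print(fil_cd_left)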

I want to create 3 lists.

- So I wrote a script that iterates over the DataFrame, splits each value on ".", and builds a first list holding all values in the original format shown above.
- I then used Python slicing to take the leftmost 4 characters before the "." delimiter and appended them to a second list.

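However, I am unable to create the other two lists, which should hold the characters to the right of the last delimiter and the characters between the two delimiters.
Please advise, and let me know if I have not explained this clearly; I will try to rephrase.
Note: I searched other posts on Stack Overflow, but they do not seem to match my scenario.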

pyspark

Source: https://stackoverflow.com/questions/74320317/how-to-split-a-string-having-more-than-one-separator-in-python

1 Answer


z9smfwbn1#

If you want a list of three items every time, predefine it.

for row in table:
    out = ['','', '']
    for index, word in enumerate(row.split('.', maxsplit=2)):
        out[index] = word
    print(out)

Output:

['qwe1', '', '']
['tre1', 'eyyu', '']
['cvbn', 'poiu', 'sdfg']
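To get from these padded rows to the three separate lists the question asks for, one option is to transpose them with zip. This is only a sketch: the helper split_into_three and the names fil_cd_middle and fil_cd_right are hypothetical, and it assumes that a value with a single "." should place its second part in the middle list, as in the output above.

```python
def split_into_three(value, sep="."):
    """Split on sep and pad with empty strings so the result always has 3 items."""
    parts = value.split(sep, maxsplit=2)
    return parts + [""] * (3 - len(parts))


# Sample values from the question; in the real script this is the collected column
table = ["qwe1", "tre1.eyyu", "cvbn.poiu.sdfg"]

padded = [split_into_three(row) for row in table]

# zip(*padded) turns rows into columns; this assumes table is not empty
fil_cd_left, fil_cd_middle, fil_cd_right = (list(col) for col in zip(*padded))

print(fil_cd_left)
print(fil_cd_middle)
print(fil_cd_right)
```

If the second part of a one-delimiter value such as tre1.eyyu belongs in the right-hand list instead, the padding step would need to insert the empty string in the middle position rather than at the end.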

Posted 2022-11-21
