I have a PySpark DataFrame with a column that contains strings like the following:

    qwe1
    tre1.eyyu
    cvbn.poiu.sdfg
- A value can be a single string with no delimiter (qwe1).
- A value can have one delimiter, ".", with characters on both sides of it (tre1.eyyu).
- A value can have two delimiters (cvbn.poiu.sdfg).
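The three shapes above correspond directly to what Python's built-in `str.split(".")` returns: one, two, or three pieces. A minimal illustration using the sample values from the column:

```python
# Sample values from the column, one per shape described above
samples = ["qwe1", "tre1.eyyu", "cvbn.poiu.sdfg"]

for value in samples:
    parts = value.split(".")  # no delimiter -> 1 piece, one "." -> 2, two "." -> 3
    print(f"{value!r}: {len(parts)} piece(s) -> {parts}")

# Prints:
# 'qwe1': 1 piece(s) -> ['qwe1']
# 'tre1.eyyu': 2 piece(s) -> ['tre1', 'eyyu']
# 'cvbn.poiu.sdfg': 3 piece(s) -> ['cvbn', 'poiu', 'sdfg']
```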
Here is my code:
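
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    p1 = "<path_to_parquet file>"
    df_ref_parquet = spark.read.parquet(p1)  # Parquet stores its own schema; no 'header' option needed
    # Collect the FILList column to the driver as a plain Python list (the first list)
    table = [x["FILList"] for x in df_ref_parquet.rdd.collect()]

    fil_cd_left = []
    for row in table:
        parts = row.split(".")  # split() returns a new list; it does not change row
        fil_cd_left.append(parts[0][0:4])  # up to 4 characters left of the first "."
    print(fil_cd_left)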
I want to create three lists.
- To do this, I wrote a script that iterates over the DataFrame, splits each value on ".", and builds a first list holding all the values in their original format, as shown above.
- I then used Python slicing to take the leftmost 4 characters before the delimiter "." and appended them to a second list.
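A fixed-width slice such as `row[0:4]` only yields "the part before the delimiter" when that part is exactly 4 characters long. A minimal sketch of a width-independent alternative using the built-in `str.partition`, which splits at the first occurrence only; `left_of_delimiter` is a hypothetical helper and `"ab.cdef"` a hypothetical value, neither from the original data:

```python
def left_of_delimiter(value: str, sep: str = ".") -> str:
    """Return everything before the first `sep`, or the whole string if `sep` is absent."""
    head, _sep, _tail = value.partition(sep)  # splits at the first occurrence only
    return head


# Hypothetical value whose left part is shorter than 4 characters
value = "ab.cdef"
print(value[0:4])                       # prints ab.c (the slice runs past the delimiter)
print(left_of_delimiter(value))         # prints ab
print(left_of_delimiter("qwe1"))        # prints qwe1 (no delimiter, whole string)
print(left_of_delimiter("cvbn.poiu.sdfg"))  # prints cvbn
```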
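However, I can't figure out how to create the other two lists: one holding the characters to the right of the last delimiter, and one holding the characters between the two delimiters.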
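One way to build all three lists in a single pass over the collected values is to keep the full `split(".")` result and pick pieces by position. This is a minimal sketch in plain Python; `split_three`, `fil_cd_middle` and `fil_cd_right` are hypothetical names, `table` stands in for the list collected from the DataFrame, and storing `None` when a piece does not exist is an assumption, since the question does not say what those slots should hold:

```python
from typing import List, Optional, Tuple


def split_three(
    values: List[str],
) -> Tuple[List[str], List[Optional[str]], List[Optional[str]]]:
    """Split each value on "." into left, middle and right lists.

    Assumption: a value with no "." has no right part, and a value with
    fewer than two "." has no middle part; None marks the missing piece.
    """
    left, middle, right = [], [], []
    for value in values:
        parts = value.split(".")
        left.append(parts[0])  # before the first "."
        right.append(parts[-1] if len(parts) > 1 else None)  # after the last "."
        # Between the first and last "." (exactly parts[1] when there are two delimiters)
        middle.append(".".join(parts[1:-1]) if len(parts) > 2 else None)
    return left, middle, right


table = ["qwe1", "tre1.eyyu", "cvbn.poiu.sdfg"]
fil_cd_left, fil_cd_middle, fil_cd_right = split_three(table)
print(fil_cd_left)    # ['qwe1', 'tre1', 'cvbn']
print(fil_cd_middle)  # [None, None, 'poiu']
print(fil_cd_right)   # [None, 'eyyu', 'sdfg']
```

Indexing with `parts[-1]` keeps "the rightmost piece" correct for both one- and two-delimiter values, which a fixed slice cannot do.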
Please advise, and let me know if I haven't explained this clearly; I'll try to rephrase.
Note: I searched other posts on Stack Overflow, but they don't seem to apply to my scenario.
1 answer

Answer by z9smfwbn:
If you expect a three-item list every time, define it up front.
Output:
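The idea of predefining a three-item list can be sketched as follows. This is a hypothetical illustration, not the answer's own code: `to_three_slots` is an invented helper, and using an empty string as the placeholder for missing pieces is an assumption.

```python
def to_three_slots(value: str, fill: str = "") -> list:
    """Return exactly three slots for a value, padding missing pieces with `fill`."""
    slots = [fill, fill, fill]   # predefine the three-item list
    parts = value.split(".", 2)  # maxsplit=2, so never more than three pieces
    slots[:len(parts)] = parts   # overwrite the leading slots with the pieces found
    return slots


table = ["qwe1", "tre1.eyyu", "cvbn.poiu.sdfg"]
rows = [to_three_slots(v) for v in table]
# Transpose the per-value rows into three column lists
first, second, third = (list(col) for col in zip(*rows))
print(rows)    # [['qwe1', '', ''], ['tre1', 'eyyu', ''], ['cvbn', 'poiu', 'sdfg']]
print(first)   # ['qwe1', 'tre1', 'cvbn']
print(second)  # ['', 'eyyu', 'poiu']
print(third)   # ['', '', 'sdfg']
```

Because the slots are filled from the left, a one-delimiter value such as `tre1.eyyu` puts its right-hand piece in the second list rather than the third. If that piece should always land in the third list, assign `parts[-1]` to `slots[2]` explicitly instead.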