Inserting characters at given character counts in PySpark

sf6xfgos posted on 2023-04-21 in Apache

I'm looking to insert a special character at specific character counts in a PySpark string, e.g. from

"M202876QC0581AADMM01"
to
"M-202876-QC0581-AA-DMM01"

(1-6-6-2-): insert a hyphen after the first character, then after the next 6 characters, then after the next 6, then after the next 2.
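To make the 1-6-6-2 split concrete, here is a minimal plain-Python sketch using string slicing, outside Spark. It is only meant to check the segment boundaries; the function name `insert_hyphens` is hypothetical, and it assumes whatever remains after the last segment is kept as a final piece.

```python
def insert_hyphens(s, lengths, sep="-"):
    """Hypothetical helper: split s into consecutive pieces of the given
    lengths, keep any remainder as a final piece, and join with sep."""
    pieces = []
    pos = 0
    for n in lengths:
        pieces.append(s[pos:pos + n])
        pos += n
    if pos < len(s):
        pieces.append(s[pos:])  # remaining characters, e.g. "DMM01"
    return sep.join(pieces)


print(insert_hyphens("M202876QC0581AADMM01", [1, 6, 6, 2]))
# M-202876-QC0581-AA-DMM01
```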

I tried the following, but no luck yet:

from pyspark.sql import functions as F

df = spark.createDataFrame([('M202876QC0581AADMM01',)], ['str'])
(df.withColumn("str", F.regexp_replace(F.col("str"), r"(\d{0})(\d{3})(\d{3})", "$1-$2-$3"))).collect()

Out[121]: [Row(str='M-202-876QC0581AADMM01')]
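That output shows why the attempt falls short: `\d` matches digits only and nothing anchors the pattern to the start of the string, so the regex skips the leading `M` and matches the first run of six digits, `202876`, while `(\d{0})` matches an empty string and contributes nothing. The sketch below reproduces this with Python's `re` module, assuming its behaviour matches Spark's Java regex for these simple constructs (Python writes the group reference `$1` as `\1`).

```python
import re

s = "M202876QC0581AADMM01"
pat = r"(\d{0})(\d{3})(\d{3})"

# \d matches digits only and there is no ^ anchor, so the search skips
# the leading "M" and the first match is the digit run "202876".
m = re.search(pat, s)
print(m.start(), m.group(0))  # 1 202876

# (\d{0}) matches the empty string, so group 1 adds nothing before the first "-".
print(re.sub(pat, r"\1-\2-\3", s))  # M-202-876QC0581AADMM01
```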
Answer 1, by xdnvmnnf

You're almost there; try this:

from pyspark.sql.functions import regexp_replace

df = spark.createDataFrame([("M202876QC0581AADMM01",)], ["str"])

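# ^ anchors at the start of the string and . matches any character (not just
# digits), giving groups of 1, 6, 6 and 2 characters plus the remainder as $5.
pat = r"^(.{1})(.{6})(.{6})(.{2})(.+)"
df = df.withColumn("str", regexp_replace("str", pat, r"$1-$2-$3-$4-$5"))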

Output:

df.show(truncate=False)

+------------------------+
|str                     |
+------------------------+
|M-202876-QC0581-AA-DMM01|
+------------------------+
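If the group sizes change, the pattern and replacement string can be generated from a list of lengths rather than written by hand. This is a sketch under assumptions, not part of the answer: `build_pattern` is a hypothetical helper that only builds the two strings, which would then be passed as `regexp_replace("str", pat, repl)`. Here the result is checked locally with Python's `re` after rewriting the Java-style `$n` references as Python's `\g<n>`, assuming both regex dialects agree on this simple pattern.

```python
import re


def build_pattern(lengths):
    """Hypothetical helper: build a Java-regex pattern and replacement that
    insert '-' after each fixed-width segment, keeping the rest as a last group."""
    groups = "".join("(.{%d})" % n for n in lengths)
    pattern = "^" + groups + "(.+)"
    # Spark's regexp_replace uses Java-style $n group references.
    replacement = "-".join("$%d" % i for i in range(1, len(lengths) + 2))
    return pattern, replacement


pat, repl = build_pattern([1, 6, 6, 2])
print(pat)   # ^(.{1})(.{6})(.{6})(.{2})(.+)
print(repl)  # $1-$2-$3-$4-$5

# Local check with Python's re: rewrite $n as \g<n>, Python's group syntax.
py_repl = re.sub(r"\$(\d+)", r"\\g<\1>", repl)
print(re.sub(pat, py_repl, "M202876QC0581AADMM01"))  # M-202876-QC0581-AA-DMM01
```

Because the final `(.+)` needs at least one character, strings of 15 characters or fewer do not match, and `regexp_replace` returns a non-matching value unchanged.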
