Replacing a column substring with a regular expression in Spark

Posted by drkbr07n on 2021-05-17 in Spark

I have a Scala Spark DataFrame that looks like this:

    import spark.implicits._ // for toDF; assumes a SparkSession named spark, as in spark-shell
    val df = Seq(("1ST ST","NICK"),("2ND STREET","SAM"),("3RD AVE","ERIC"),("4TH AVENUE","SARAH")).toDF("STREET_NAME","NAME")

I want to replace the substring STREET with ST and the substring AVENUE with AVE in the column STREET_NAME. I tried the following, but it didn't work:

    df.withColumn(STREET_NAME,
      regexp_replace(
        $"STREET_NAME",
        lit("STREET"),
        "ST"
      )
    )

Or is there a better way to replace substrings?

jjjwad0x (answer #1)

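lit is not needed: in the regexp_replace(e: Column, pattern: String, replacement: String) overload, the second argument is the pattern to match as a plain string, not a column. Spark also has an overload that takes both the pattern and the replacement as columns, but a column pattern combined with a string replacement, as in the question, matches neither overload. The column name passed to withColumn also has to be a quoted string: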

    df.withColumn("STREET_NAME", regexp_replace($"STREET_NAME", "STREET", "ST"))
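Because the pattern argument of regexp_replace is a Java regular expression rather than literal text, characters such as . or ( keep their regex meaning. The plain-Scala sketch below needs no Spark; the object and method names are hypothetical. It mirrors the two behaviours with String.replaceAll (regex, like regexp_replace) and String.replace (literal, like the Spark SQL replace function), and uses Pattern.quote as one way to force a literal match.

```scala
import java.util.regex.{Matcher, Pattern}

// Hypothetical helpers for illustration only; not part of Spark.
object ReplaceSemantics {
  // Regex replacement: "." in the pattern matches any single character.
  def regexReplace(s: String, pattern: String, replacement: String): String =
    s.replaceAll(pattern, replacement)

  // Literal replacement: the target is matched character for character.
  def literalReplace(s: String, target: String, replacement: String): String =
    s.replace(target, replacement)

  // Regex-based call made literal by quoting both the pattern and the
  // replacement (so "$" or "\" in the replacement is not treated specially).
  def quotedRegexReplace(s: String, target: String, replacement: String): String =
    s.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement))
}
```

For example, regexReplace("4TH AVENUE", "AVE.", "AVE") matches "AVEN" and produces "4TH AVEUE", while literalReplace leaves the string unchanged because the literal text "AVE." does not occur in it. Spark compiles the regexp_replace pattern with java.util.regex, so the same quoting should carry over, e.g. regexp_replace(col("STREET_NAME"), Pattern.quote("AVE."), "AVE"); treat that as an assumption to check on your Spark version.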

ohtdti5x (answer #2)

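Try Spark's built-in replace function (in Spark SQL) or regexp_replace(). replace substitutes literal text, while regexp_replace matches a regular expression.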

    import org.apache.spark.sql.functions.{col, regexp_replace}

    df.show()
    //+-----------+-----+
    //|STREET_NAME| NAME|
    //+-----------+-----+
    //|     1ST ST| NICK|
    //| 2ND STREET|  SAM|
    //|    3RD AVE| ERIC|
    //| 4TH AVENUE|SARAH|
    //+-----------+-----+
    df.createOrReplaceTempView("tmp")
    spark.sql("select replace(replace(STREET_NAME,'STREET','ST'),'AVENUE','AVE') as STREET_NAME,NAME from tmp").show()
    //+-----------+-----+
    //|STREET_NAME| NAME|
    //+-----------+-----+
    //|     1ST ST| NICK|
    //|     2ND ST|  SAM|
    //|    3RD AVE| ERIC|
    //|    4TH AVE|SARAH|
    //+-----------+-----+
    // or using the regexp_replace function
    df.withColumn("STREET_NAME", regexp_replace(regexp_replace(col("STREET_NAME"), "STREET", "ST"), "AVENUE", "AVE")).show()
    //+-----------+-----+
    //|STREET_NAME| NAME|
    //+-----------+-----+
    //|     1ST ST| NICK|
    //|     2ND ST|  SAM|
    //|    3RD AVE| ERIC|
    //|    4TH AVE|SARAH|
    //+-----------+-----+
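The nested regexp_replace calls above grow by one level for every new abbreviation. A fold over an ordered list of (pattern, replacement) rules is one way to keep that manageable. The sketch below is a hypothetical helper, not a Spark API: applyAll is generic over the value type, and abbreviate instantiates it for plain strings with String.replaceAll so the rules can be checked without a SparkSession.

```scala
// Hypothetical helper (not part of Spark): apply an ordered list of
// (pattern, replacement) rules by folding over them, which is what the
// nested regexp_replace(regexp_replace(...)) calls do by hand.
object ReplaceRules {
  // Rules are applied left to right; each step receives the previous result.
  val abbreviations: Seq[(String, String)] = Seq(
    "STREET" -> "ST",
    "AVENUE" -> "AVE"
  )

  // Generic over the value type A, so the same fold can work on a String
  // (with String.replaceAll) or on a Spark Column (with regexp_replace).
  def applyAll[A](init: A, rules: Seq[(String, String)])(step: (A, String, String) => A): A =
    rules.foldLeft(init) { case (acc, (pattern, replacement)) =>
      step(acc, pattern, replacement)
    }

  // Plain-Scala instance, handy for testing the rules locally.
  def abbreviate(s: String): String =
    applyAll(s, abbreviations)((acc, p, r) => acc.replaceAll(p, r))
}
```

In Spark, the same fold could start from a column instead, for example df.withColumn("STREET_NAME", ReplaceRules.applyAll(col("STREET_NAME"), ReplaceRules.abbreviations)((c, p, r) => regexp_replace(c, p, r))). Because the rules run in order, a later pattern can match text produced by an earlier replacement, so the order of the list matters.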
