How to find the words in a string that contain digits

4szc88ey posted on 2021-05-27 in Spark

I want to use Spark to find the words in a string that contain digits.

Example: String = "abc def ghi2 xyz4"
Expected answer: ghi2 xyz4

Answer #1 (yquaqz18)

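1. Split the input into a Seq and convert it to a DataFrame.
2. Add a where clause that uses the `rlike` function with a regex to check for digits.
3. Map each Row to a String with `.map(row => row.getString(0))`, then join the results and print the string.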

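```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
object CheckDigitInString {
  def main(args: Array[String]): Unit = {
    val input = "abc def ghi2 xyz4"
    // The original used a project-specific helper (Constant.getSparkSess);
    // a local SparkSession is built here instead
    val spark = SparkSession.builder()
      .appName("CheckDigitInString")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    // One row per word, in a single column named "value"
    val inputDf = input.split(" ").toSeq.toDF()
    // Keep only the words that contain at least one digit
    val output = inputDf.where(col("value").rlike(".*[0-9]+.*"))
      .map(row => row.getString(0)) // Row -> String
      .collect()
      .mkString(" ")
    println(output) // ghi2 xyz4
    spark.stop()
  }
}
```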
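As far as I know, Spark's `rlike` evaluates the regex with Java's `Matcher.find`, i.e. it succeeds if the pattern occurs anywhere in the value, so the `.*[0-9]+.*` above could likely be shortened to `[0-9]` or `\\d`. Java's `String.matches`, used in the RDD variant of answer #2, instead has to match the whole string, which is why it needs `.*` on both sides. The sketch below compares the two behaviours on plain Strings without Spark; the object and method names are hypothetical, chosen for illustration only.

```scala
import java.util.regex.Pattern

// Compares "find" vs. whole-string "matches" regex semantics on plain Strings.
object RegexSemantics {
  private val digit = Pattern.compile("[0-9]")

  // find(): true if a digit occurs anywhere in s
  // (assumption: this is the behaviour Spark's rlike relies on)
  def containsDigitFind(s: String): Boolean = digit.matcher(s).find()

  // String.matches anchors the pattern to the WHOLE string
  def containsDigitMatchesBare(s: String): Boolean = s.matches("[0-9]")

  // Wrapping with .* makes matches() accept a digit anywhere
  def containsDigitMatchesWrapped(s: String): Boolean = s.matches(".*[0-9].*")

  def main(args: Array[String]): Unit =
    for (w <- Seq("abc", "ghi2", "7"))
      println(s"$w find=${containsDigitFind(w)} " +
        s"bare=${containsDigitMatchesBare(w)} wrapped=${containsDigitMatchesWrapped(w)}")
}
```

For "ghi2", `find` and the wrapped `matches` return true while the bare `matches("[0-9]")` returns false, since "ghi2" is not a single digit.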

Answer #2 (5rgfhyps)

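Input:

```
+-----------------+
|            value|
+-----------------+
|abc def ghi2 xyz4|
| 0d2 234 AXZ Mxei|
+-----------------+
```

```scala
// Assumes an existing SparkSession named `spark` (e.g. in spark-shell)
import org.apache.spark.sql.functions._
import spark.implicits._

Seq("abc def ghi2 xyz4", "0d2 234 AXZ Mxei").toDF()
  .select($"*", monotonically_increasing_id().as("id"))        // tag each input row
  .select($"id", explode(split($"value", " ")))                // one row per word, column "col"
  .select($"*", regexp_extract($"col", "\\d", 0).as("digit"))  // first digit, or "" if none
  .filter($"digit".notEqual(""))                               // keep words containing a digit
  .groupBy($"id")
  .agg(concat_ws(" ", collect_list($"col")).as("value"))       // rejoin words per input row
  .show()
```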

Output:

```
+---+---------+
| id|    value|
+---+---------+
|  0|ghi2 xyz4|
|  1|  0d2 234|
+---+---------+
```
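The explode, filter and groupBy pipeline can be mirrored on local Scala collections to see what each step does. This is a sketch, not Spark code: `zipWithIndex` stands in for `monotonically_increasing_id`, and the object and method names are hypothetical. Two caveats about the Spark version: `monotonically_increasing_id` guarantees unique, increasing ids but not consecutive ones, so the ids 0 and 1 above depend on how the data is partitioned; and the order in which `collect_list` gathers words can vary after a shuffle.

```scala
// Local-collection analogue of explode -> filter -> groupBy/concat_ws (no Spark).
object ExplodeRegroup {
  private val digitRe = "\\d".r

  def wordsWithDigitsPerRow(rows: Seq[String]): Seq[(Long, String)] = {
    // split + explode: one (id, word) pair per word; the row index plays the role of "id"
    val exploded: Seq[(Long, String)] =
      rows.zipWithIndex.flatMap { case (value, idx) =>
        value.split(" ").toSeq.map(word => (idx.toLong, word))
      }
    // regexp_extract(col, "\\d", 0) yields "" when there is no digit; drop those words
    val withDigits = exploded.filter { case (_, word) =>
      digitRe.findFirstIn(word).getOrElse("") != ""
    }
    // groupBy(id) + concat_ws(" ", collect_list(col)), sorted by id for stable output
    withDigits
      .groupBy(_._1)
      .map { case (id, pairs) => (id, pairs.map(_._2).mkString(" ")) }
      .toSeq
      .sortBy(_._1)
  }

  def main(args: Array[String]): Unit =
    wordsWithDigitsPerRow(Seq("abc def ghi2 xyz4", "0d2 234 AXZ Mxei")).foreach(println)
}
```

Rows without any matching word simply disappear after the filter, which is also what happens in the Spark version, since no group is formed for their id.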

Or using an RDD:

```scala
// Per row: keep the words that contain a digit
Seq("abc def ghi2 xyz4", "0d2 234 AXZ Mxei").toDF().rdd
  .map(row => row.getString(0).split(" ").filter(word => word.matches(".*\\d.*")))
```
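The RDD expression produces an `RDD[Array[String]]`, one array per input row, so it prints poorly as is; appending `.map(_.mkString(" "))` (and `.collect()` to bring the result to the driver) should give the same strings as the DataFrame version. The sketch below applies the same per-row lambda to a local Seq so it can be checked without a Spark cluster; the object and method names are hypothetical.

```scala
// Runs the RDD version's per-row lambda on a local Seq, then joins each row's matches.
object RddLambdaCheck {
  // Same logic as the lambda inside .rdd.map(...)
  def keepWordsWithDigits(value: String): Array[String] =
    value.split(" ").filter(word => word.matches(".*\\d.*"))

  def main(args: Array[String]): Unit = {
    val rows = Seq("abc def ghi2 xyz4", "0d2 234 AXZ Mxei")
    // Equivalent of appending .map(_.mkString(" ")) to the RDD pipeline
    rows.map(keepWordsWithDigits).map(_.mkString(" ")).foreach(println)
  }
}
```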
