I want to use Spark to find the words in a string that contain a digit.
Example : String = "abc def ghi2 xyz4"
Answer : ghi2 xyz4
yquaqz181#
1. Split the input into a Seq.
2. Add a where clause with a regex, using rlike to check for digits.
3. Map each row to a string with .map(row => row.getString(0)).
4. Join the strings and print the result.
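import org.apache.spark.sql.functions._

object CheckDigitInString {
  def main(args: Array[String]): Unit = {
    val input = "abc def ghi2 xyz4"
    // Constant.getSparkSess is the answerer's own helper (not shown here);
    // it is expected to return a SparkSession.
    val spark = Constant.getSparkSess
    import spark.implicits._
    // One row per word; toDF names the single column "value".
    val inputDf = input.split(" ").toSeq.toDF
    // Keep only the words that contain at least one digit.
    val output = inputDf.where(col("value").rlike(".*[0-9]+.*"))
      .map(row => row.getString(0))
      .collect().mkString(" ")
    println(output)
  }
}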
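The per-word check in the steps above can be tried without a Spark session. The following is a minimal sketch in plain Scala; the object name `DigitTokens` and the method `tokensWithDigits` are hypothetical names chosen for illustration. It also shows why the pattern matters: `String.matches` must match the whole word, so it needs the `.*` padding, whereas an unanchored search only needs `[0-9]`. To my understanding Spark's `rlike` also searches for a match anywhere in the value, so `rlike("[0-9]")` should behave the same as the longer pattern, but treat that as an assumption to verify on your Spark version.

```scala
object DigitTokens {
  // Unanchored pattern: a single digit anywhere in the word is enough.
  private val HasDigit = "[0-9]".r

  /** Keeps the space-separated words that contain at least one digit. */
  def tokensWithDigits(input: String): String =
    input.split(" ")
      .filter(word => HasDigit.findFirstIn(word).isDefined)
      .mkString(" ")

  def main(args: Array[String]): Unit = {
    println(tokensWithDigits("abc def ghi2 xyz4")) // ghi2 xyz4

    // String.matches requires the whole word to match the pattern.
    println("ghi2".matches("[0-9]"))     // false
    println("ghi2".matches(".*[0-9].*")) // true
  }
}
```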
5rgfhyps2#
+-----------------+
|            value|
+-----------------+
|abc def ghi2 xyz4|
| 0d2 234 AXZ Mxei|
+-----------------+
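// Assumes import org.apache.spark.sql.functions._ and import spark.implicits._ are in scope.
Seq("abc def ghi2 xyz4", "0d2 234 AXZ Mxei").toDF()
  .select(col("*"), monotonically_increasing_id().as("id"))  // tag each input row with an id
  .select(col("id"), explode(split(col("value"), " ")))       // one row per word, in column "col"
  .select(col("*"), regexp_extract(col("col"), "\\d", 0).as("digit"))
  .filter(col("digit").notEqual(""))                           // keep words in which a digit was found
  .groupBy(col("id"))
  .agg(concat_ws(" ", collect_list(col("col"))).as("value"))  // rebuild one string per input row
  .show()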
Output:
+---+---------+
| id|    value|
+---+---------+
|  0|ghi2 xyz4|
|  1|  0d2 234|
+---+---------+
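Or use the RDD API:
// The result is an RDD[Array[String]] holding the digit-containing words of each row.
Seq("abc def ghi2 xyz4", "0d2 234 AXZ Mxei").toDF().rdd
  .map(row => row.getString(0).split(" ").filter(word => word.matches(".*\\d.*")))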
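A further variant, not from either answer, stays in the DataFrame API and filters the words of each row in place, so no `explode` or `groupBy` is needed. This is a sketch that assumes Spark 3.0 or later, where `org.apache.spark.sql.functions.filter` accepts a Scala lambda. The object name `DigitTokensPerRow`, the method `keepDigitTokens`, and the local-mode session settings are hypothetical choices for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_join, col, filter, split}

object DigitTokensPerRow {
  /** Replaces column "value" with only those of its words that contain a digit. */
  def keepDigitTokens(df: DataFrame): DataFrame =
    df.select(
      array_join(
        // Higher-order filter over the array of words (assumes Spark 3.0+).
        filter(split(col("value"), " "), word => word.rlike("[0-9]")),
        " "
      ).as("value")
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; adjust master and appName as needed.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DigitTokensPerRow")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("abc def ghi2 xyz4", "0d2 234 AXZ Mxei").toDF("value")
    keepDigitTokens(df).show()

    spark.stop()
  }
}
```

Because the array is filtered inside each row, the words keep their original order and no row id column has to be generated.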