Finding variable-length strings in a column

nvbavucw  posted on 2021-05-29  in  Spark

I have a column like this:

Class
A
AA
BB
AAAA
ABA
AAAAA

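What I want to do is filter this column so that only the values made up entirely of A, with no other characters, remain. The result should look like this: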

Class
A
AA
AAAA
AAAAA

Is there a way to do this in Spark?

bfhwhh0e #1

Check the following code.

scala> val df = Seq("A","AA","BB","AAAA","ABA","AAAAA","BAB").toDF("Class")
df: org.apache.spark.sql.DataFrame = [Class: string]

scala> df.filter(!col("Class").rlike("[^A]+")).show
+-----+
|Class|
+-----+
|    A|
|   AA|
| AAAA|
|AAAAA|
+-----+
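
The negated pattern works because, as far as I know, rlike is true when the regex matches anywhere in the value (Java "find" semantics) rather than requiring a full match. So `[^A]+` flags every value containing at least one non-A character, and the `!` keeps the rest. Below is a minimal plain-Scala sketch of that logic without Spark, assuming `java.util.regex` behaves like rlike here; the names `RlikeSemantics`, `noNonA` and `onlyA` are made up for illustration.

```scala
import java.util.regex.Pattern

object RlikeSemantics {
  // Hypothetical helper mimicking !col("Class").rlike("[^A]+"):
  // true when no character other than A occurs anywhere in the string.
  private val nonA = Pattern.compile("[^A]+")
  def noNonA(s: String): Boolean = !nonA.matcher(s).find()

  // Hypothetical helper mimicking col("Class").rlike("^A+$"):
  // true when the whole string is one or more A.
  private val allA = Pattern.compile("^A+$")
  def onlyA(s: String): Boolean = allA.matcher(s).find()

  val values: Seq[String] = Seq("A", "AA", "BB", "AAAA", "ABA", "AAAAA", "BAB")

  def main(args: Array[String]): Unit = {
    println(values.filter(noNonA).mkString(",")) // A,AA,AAAA,AAAAA
    println(values.filter(onlyA).mkString(","))  // A,AA,AAAA,AAAAA
    // The two forms differ on the empty string:
    println(noNonA("")) // true  (nothing non-A was found, so the row is kept)
    println(onlyA(""))  // false (at least one A is required)
  }
}
```

On the sample data both forms give the same rows. If the column can contain empty strings, the anchored form `^A+$` is probably the safer choice.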

v1uwarro #2

Try using the rlike function.

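    // Needed outside spark-shell: col for the filter, implicits for toDS()
    import org.apache.spark.sql.functions.col
    import spark.implicits._

    val data1 =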
      """
        |Class
        |A
        |AA
        |BB
        |AAAA
        |ABA
        |AAAAA
      """.stripMargin
    val stringDS1 = data1.split(System.lineSeparator())
      .map(_.split("\\|").map(_.replaceAll("""^[ \t]+|[ \t]+$""", "")).mkString(","))
      .toSeq.toDS()
    val df1 = spark.read
      .option("sep", ",")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("nullValue", "null")
      .csv(stringDS1)
    df1.show(false)
    df1.printSchema()

    /**
      * +-----+
      * |Class|
      * +-----+
      * |A    |
      * |AA   |
      * |BB   |
      * |AAAA |
      * |ABA  |
      * |AAAAA|
      * +-----+
      *
      * root
      * |-- Class: string (nullable = true)
      */

    df1.filter(col("Class").rlike("""^A+$"""))
      .show(false)

    /**
      * +-----+
      * |Class|
      * +-----+
      * |A    |
      * |AA   |
      * |AAAA |
      * |AAAAA|
      * +-----+
      */
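
If the CSV round-trip above is only there to build the sample data, the same anchored filter can be sketched more compactly. This is a rough standalone version, assuming a local SparkSession; the object name `OnlyAFilter`, the helper `onlyA`, the master URL and the app name are illustrative choices, not part of the original answer. It also shows the equivalent SQL-expression form of the filter.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object OnlyAFilter {
  // Hypothetical helper: keep rows whose Class value is one or more A and nothing else.
  def onlyA(df: DataFrame): DataFrame =
    df.filter(col("Class").rlike("^A+$"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("only-a-filter")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("A", "AA", "BB", "AAAA", "ABA", "AAAAA").toDF("Class")

    // Column API form
    onlyA(df).show(false)

    // Same condition written as a SQL expression string
    df.filter("Class RLIKE '^A+$'").show(false)

    spark.stop()
  }
}
```

Rows where Class is null are dropped by both forms, since the condition evaluates to null rather than true.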
