How to perform a Luhn check on a DataFrame column in Spark Scala

7y4bm7vi  posted 2023-10-18 in Scala

The df has a string column such as "100256437". I want to add a column that checks whether the value passes the Luhn check: true if it does, false otherwise.

def Mod10(c: Column): Column = {
    var (odd, sum) = (true, 0)
 
    for (int <- c.reverse.map { _.toString.toShort }) {
      println(int)
      if (odd) sum += int
      else sum += (int * 2 % 10) + (int / 5)
      odd = !odd
    }
    lit(sum % 10 === 0)
  }

Error output:

error: value reverse is not a member of org.apache.spark.sql.Column
    for (int <- c.reverse.map { _.toString.toShort }) {
                  ^
error: value === is not a member of Int
    lit(sum % 10 === 0)
                 ^
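The first error is the crux: a `Column` is a symbolic expression that Spark compiles into a query plan, not a driver-side collection, so you cannot iterate its characters with `reverse`/`map`; likewise `===` is a `Column` operator, not something defined on `Int`. For reference, the same loop written against a plain `String` (a sketch; the name `mod10` and the digits-only input are assumptions) works fine:

```scala
// Driver-side sketch of the questioner's Mod10, operating on a plain String
// instead of a Column (assumes the input contains only digits).
def mod10(s: String): Boolean = {
  var (odd, sum) = (true, 0)
  for (digit <- s.reverse.map(_.asDigit)) {
    if (odd) sum += digit
    else sum += digit * 2 % 10 + digit / 5 // doubled digit, reduced to one digit
    odd = !odd
  }
  sum % 10 == 0
}
```

To apply per-row logic like this to a column, it has to be wrapped in a UDF, as in the answers below.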

j7dteeu81#

It looks like you are working with Spark DataFrames.
Suppose you have

val df = List("100256437", "79927398713").toDF()

df.show()
+-----------+
|      value|
+-----------+
|  100256437|
|79927398713|
+-----------+

Now you can implement this Luhn test as a UDF:

val isValidLuhn = udf { (s: String) =>
  val array = s.toCharArray.map(_.asDigit)

  val len = array.length

  var i = 1
  while (i <= len) {                 // <= so the leftmost digit is reached too
    if (i % 2 == 0) {                // double every second digit from the right
      var updated = array(len - i) * 2
      if (updated > 9) updated -= 9  // digit sum of a doubled digit (max 18)
      array(len - i) = updated
    }
    i = i + 1
  }

  array.sum % 10 == 0
}

It can be used as follows:

val dfWithLuhnCheck = df.withColumn("isValidLuhn", isValidLuhn(col("value")))

dfWithLuhnCheck.show()
+-----------+-----------+
|      value|isValidLuhn|
+-----------+-----------+
|  100256437|       true|
|79927398713|       true|
+-----------+-----------+
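Since the UDF body is ordinary Scala, it can be unit-tested without a SparkSession by factoring it out into a named function (the name `luhnCheck` is illustrative). One detail worth testing: with a loop bound of `i < len`, the leftmost digit of an even-length number is never doubled, so a valid number like "18" would wrongly fail; `i <= len` covers it.

```scala
// Plain-Scala version of the UDF body, testable without Spark.
def luhnCheck(s: String): Boolean = {
  val digits = s.toCharArray.map(_.asDigit)
  val len = digits.length
  var i = 1
  while (i <= len) {                 // <= so index 0 is reached on even lengths
    if (i % 2 == 0) {                // every second digit from the right
      var updated = digits(len - i) * 2
      if (updated > 9) updated -= 9  // digit sum of a doubled digit (max 18)
      digits(len - i) = updated
    }
    i += 1
  }
  digits.sum % 10 == 0
}
```

The UDF then reduces to `val isValidLuhn = udf(luhnCheck _)`.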

vlju58qv2#

**Spark 3.5+** has a built-in `luhn_check` SQL function:

expr("luhn_check(value)")

Full example:

val df = List("8112189876", "79927398714").toDF()

val df2 = df.withColumn("luhn_valid", expr("luhn_check(value)"))
df2.show()
// +-----------+----------+
// |      value|luhn_valid|
// +-----------+----------+
// | 8112189876|      true|
// |79927398714|     false|
// +-----------+----------+
