Extracting a value with a regular expression in SQL

yiytaume posted on 2021-07-26 in Java

I have data in the following format and I am trying to extract the id part from it: {"memberurn"=urn:li:member:10000012}. Here is my code:

CAST(regexp_extract(key.memberurn, 'urn:li:member:(\\d+)', 1) AS BIGINT) AS member_id

In the output, member_id is null. What am I doing wrong?


kxkpmulp1#

Try this:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.types import LongType

spark = SparkSession.builder \
    .appName('practice') \
    .getOrCreate()

sc = spark.sparkContext

# One-row DataFrame holding the sample string in column "a"
df = sc.parallelize([
    [""" {"memberurn"=urn:li:member:10000012}"""]]).toDF(["a"])

df.show(truncate=False)

+-------------------------------------+
|a                                    |
+-------------------------------------+
| {"memberurn"=urn:li:member:10000012}|
+-------------------------------------+

# Raw string so \d reaches the regex engine unchanged; group 2 is the numeric id
df1 = df.withColumn("id", F.regexp_extract(F.col('a'),
    r'(urn:li:member:)(\d+)', 2))

df2 = df1.withColumn("id", df1["id"].cast(LongType()))

df2.show()

+-------------------------------------+--------+
|a                                    |id      |
+-------------------------------------+--------+
| {"memberurn"=urn:li:member:10000012}|10000012|
+-------------------------------------+--------+

df2.printSchema()

root
  |-- a: string (nullable = true)
  |-- id: long (nullable = true)
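
A null after the cast usually means the pattern did not match: in Spark, regexp_extract returns an empty string when there is no match, and casting an empty string to BIGINT yields null (with ANSI mode off). The sketch below uses plain Python's re module rather than Spark to show how one extra level of backslash escaping changes what the pattern means. The helper name extract_member_id is hypothetical, and whether escaping is really the cause in the asker's engine is an assumption that depends on how that engine unescapes string literals.

```python
import re

def extract_member_id(text, pattern):
    """Return the first capture group as an int, or None when nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return int(match.group(1))

row = '{"memberurn"=urn:li:member:10000012}'

# The regex engine sees \d+ (one or more digits): the pattern matches.
ok = extract_member_id(row, r'urn:li:member:(\d+)')

# The regex engine sees \\d+ (a literal backslash followed by 'd's): no match.
over_escaped = extract_member_id(row, r'urn:li:member:(\\d+)')

print(ok)            # 10000012
print(over_escaped)  # None
```

If the second case is what the SQL engine ends up evaluating, trying a single backslash or the equivalent character class [0-9]+ is a quick way to check.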

velaa5lx2#

In Scala:

Using a regular expression

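import org.apache.spark.sql.functions._
import spark.implicits._  // enables the $"col" syntax used further down

val df = spark.range(1).withColumn("memberurn", lit("urn:li:member:10000012"))
df.withColumn("member_id",
  expr("""CAST(regexp_extract(memberurn, 'urn:li:member:(\\d+)', 1) AS BIGINT)"""))
  .show(false)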

    /**
      * +---+----------------------+---------+
      * |id |memberurn             |member_id|
      * +---+----------------------+---------+
      * |0  |urn:li:member:10000012|10000012 |
      * +---+----------------------+---------+
      */

Using substring_index

df.withColumn("member_id",
      substring_index($"memberurn", ":", -1).cast("bigint"))
      .show(false)

    /**
      * +---+----------------------+---------+
      * |id |memberurn             |member_id|
      * +---+----------------------+---------+
      * |0  |urn:li:member:10000012|10000012 |
      * +---+----------------------+---------+
      */
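
For comparison, the same "take everything after the last colon" idea can be sketched in plain Python, outside Spark. The helper name last_segment is hypothetical, and it assumes the value is the bare URN (as in the memberurn column above) rather than the full {"memberurn"=...} string, where a trailing } would have to be stripped first.

```python
def last_segment(urn, sep=":"):
    """Return the text after the last separator, like substring_index(urn, sep, -1)."""
    return urn.rsplit(sep, 1)[-1]

member_id = int(last_segment("urn:li:member:10000012"))
print(member_id)  # 10000012
```

This avoids regex escaping entirely, at the cost of not validating that the last segment is numeric before the conversion.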
