PySpark: ParseException when using rtrim with expr

6yt4nkrj  posted on 2023-05-16  in Spark

I want to trim extra characters from a given column, "name". To do this I use the expr function and pass it a SQL expression that trims the extra characters.

from pyspark.sql import Row
from pyspark.sql.functions import expr

data = [
    Row(id = 1, name = "Lisa Brenan", phone = Row(home = "+1 23456789", personal = None), projects = ["CIBC", "Shell"], salary = 11000),
    Row(id = 2, name = " Thomas Kingston", phone = Row(home = "+1 98765432", personal = "+1 2345665432"), projects = ["BMW"], salary = 15000),
    Row(id = 3, name = "[Lucy Pierson]", phone = Row(home = None, personal = None), projects= None, salary = 20000)
]

df = spark.createDataFrame(data)

df.\
    withColumn("correct_name", expr("rtrim(TRAILING ']' FROM name)")).\
    select("correct_name").\
    show()

I get the following error message:

ParseException: 
[PARSE_SYNTAX_ERROR] Syntax error at or near 'FROM'.(line 1, pos 19)

== SQL ==
rtrim(TRAILING ']' FROM name)
-------------------^^^

Please tell me what causes this and how to fix it correctly.

j0pj023g #1

I found the answer to my own question.

from pyspark.sql import Row
from pyspark.sql.functions import expr

data = [
    Row(id = 1, name = "Lisa Brenan", phone = Row(home = "+1 23456789", personal = None), projects = ["CIBC", "Shell"], salary = 11000),
    Row(id = 2, name = " Thomas Kingston", phone = Row(home = "+1 98765432", personal = "+1 2345665432"), projects = ["BMW"], salary = 15000),
    Row(id = 3, name = "[Lucy Pierson]", phone = Row(home = None, personal = None), projects= None, salary = 20000)
]

df = spark.createDataFrame(data)

df.\
    withColumn("correct_name", expr("trim(TRAILING ']' FROM name)")).\
    select("correct_name").\
    show()

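The TRAILING and LEADING keywords cannot be combined with rtrim or ltrim; the `<keyword> ... FROM ...` form is only accepted by trim. That is why the original expression failed to parse.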
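To make the rule above harder to get wrong, here is a minimal sketch of a helper that always emits the keyword form through trim(...). The name `trim_expr` and its parameters are hypothetical and not part of PySpark; it only builds the SQL text that would then be passed to expr().

```python
# Hypothetical helper (not part of PySpark): builds the SQL text for expr().
_SIDES = {"LEADING", "TRAILING", "BOTH"}


def trim_expr(column: str, chars: str, side: str = "TRAILING") -> str:
    """Return a Spark SQL trim(...) expression string for the given column."""
    side = side.upper()
    if side not in _SIDES:
        raise ValueError(f"side must be one of {sorted(_SIDES)}, got {side!r}")
    # Keep the sketch simple: refuse characters that would need escaping
    # inside a single-quoted SQL string literal.
    if "'" in chars or "\\" in chars:
        raise ValueError("chars must not contain quotes or backslashes")
    # The keyword form is always paired with trim, never rtrim/ltrim.
    return f"trim({side} '{chars}' FROM {column})"


print(trim_expr("name", "]"))             # trim(TRAILING ']' FROM name)
print(trim_expr("name", "[", "leading"))  # trim(LEADING '[' FROM name)
```

With a live session, the result would be used as `df.withColumn("correct_name", expr(trim_expr("name", "]")))`.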

nr7wwzry #2

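This is what you are looking for. Note that with rtrim no FROM clause is needed inside expr(): the characters to trim are passed as the first argument and the column as the second. Let me know if this helps.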

from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.appName('ss').getOrCreate()

data = [
    Row(id = 1, name = "Lisa Brenan", phone = Row(home = "+1 23456789", personal = None), projects = ["CIBC", "Shell"], salary = 11000),
    Row(id = 2, name = " Thomas Kingston", phone = Row(home = "+1 98765432", personal = "+1 2345665432"), projects = ["BMW"], salary = 15000),
    Row(id = 3, name = "[Lucy Pierson]", phone = Row(home = None, personal = None), projects= None, salary = 20000)
]

df = spark.createDataFrame(data=data)

df = (
    df
    .withColumn("correct_name", expr("rtrim(']', name)"))
    .select("correct_name")
    )

df.show()

Output:

+----------------+
|    correct_name|
+----------------+
|     Lisa Brenan|
| Thomas Kingston|
|   [Lucy Pierson|
+----------------+
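Note that the output above still carries the leading "[" on the third row and the leading space on the second, because only trailing "]" characters are removed. As a rough plain-Python analogy, and assuming Spark treats the trim string as a set of characters in the same way Python's str.strip family does, the variants behave like this on the sample names (the Spark SQL forms in the comments are the assumed counterparts, not something this snippet executes):

```python
names = ["Lisa Brenan", " Thomas Kingston", "[Lucy Pierson]"]

# Analogue of rtrim(']', name) or trim(TRAILING ']' FROM name):
# removes only trailing ']' characters.
trailing_only = [n.rstrip("]") for n in names]

# Analogue of trim(BOTH '[]' FROM name):
# removes '[' or ']' from both ends, but leaves spaces alone.
both_brackets = [n.strip("[]") for n in names]

# Analogue of trim(BOTH '[] ' FROM name):
# also removes surrounding spaces.
brackets_and_spaces = [n.strip("[] ") for n in names]

print(trailing_only)        # ['Lisa Brenan', ' Thomas Kingston', '[Lucy Pierson']
print(both_brackets)        # ['Lisa Brenan', ' Thomas Kingston', 'Lucy Pierson']
print(brackets_and_spaces)  # ['Lisa Brenan', 'Thomas Kingston', 'Lucy Pierson']
```

If the goal is a fully cleaned name, the last variant is probably the one to mirror in the expr() call.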
