I want to trim extra characters from a given column, "name". To do this I use the expr function and pass it a SQL expression that trims the extra characters.
from pyspark.sql import Row
from pyspark.sql.functions import expr
data = [
    Row(id = 1, name = "Lisa Brenan", phone = Row(home = "+1 23456789", personal = None), projects = ["CIBC", "Shell"], salary = 11000),
    Row(id = 2, name = " Thomas Kingston", phone = Row(home = "+1 98765432", personal = "+1 2345665432"), projects = ["BMW"], salary = 15000),
    Row(id = 3, name = "[Lucy Pierson]", phone = Row(home = None, personal = None), projects= None, salary = 20000)
]
df = spark.createDataFrame(data)
df.\
    withColumn("correct_name", expr("rtrim(TRAILING ']' FROM name)")).\
    select("correct_name").\
    show()
I get the following error message:
ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'FROM'.(line 1, pos 19)
== SQL ==
rtrim(TRAILING ']' FROM name)
-------------------^^^
Please tell me the cause and the correct way to fix it.
2 answers
j0pj023g1#
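I found the answer to my own question.
The TRAILING and LEADING keywords cannot be combined with rtrim or ltrim. The TRAILING ... FROM form belongs to the trim function, so the parser rejects it inside rtrim. That is why it fails.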
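As a minimal sketch of that fix, the expression can be rewritten with trim instead of rtrim. The helper names trim_sql and add_correct_name are hypothetical and exist only for this illustration; the sketch assumes a DataFrame with a string column called name, as in the question, and does no quoting or escaping of the characters passed in.

```python
# Hypothetical helper (not part of PySpark): builds the Spark SQL text
# trim(<side> '<chars>' FROM <column>). No escaping is done on chars.
def trim_sql(column: str, chars: str, side: str = "TRAILING") -> str:
    side = side.upper()
    if side not in ("BOTH", "LEADING", "TRAILING"):
        raise ValueError("side must be BOTH, LEADING or TRAILING")
    return f"trim({side} '{chars}' FROM {column})"


# Hypothetical wrapper: adds a correct_name column to the question's df.
def add_correct_name(df, chars: str = "]", side: str = "TRAILING"):
    # Imported here so trim_sql can be used without a Spark installation.
    from pyspark.sql.functions import expr

    return df.withColumn("correct_name", expr(trim_sql("name", chars, side)))


# The expression that replaces rtrim(TRAILING ']' FROM name):
print(trim_sql("name", "]"))          # trim(TRAILING ']' FROM name)
print(trim_sql("name", "[]", "BOTH")) # trim(BOTH '[]' FROM name)
```

With the df from the question, add_correct_name(df).select("correct_name").show() should run without the ParseException. Note that TRAILING ']' only removes the closing bracket, so "[Lucy Pierson]" keeps its leading "["; passing chars="[]" and side="BOTH" strips both brackets.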
nr7wwzry2#
This is what you are looking for. Note that no FROM clause is needed inside expr(). Let me know if this helps. It returns: