PySpark: how to make list elements lowercase?

4si2a6ki  posted on 2022-12-11 in Spark

I have a DataFrame in which one of the columns holds a list of words. How can I make them lowercase efficiently? The DataFrame has many columns, but the column I am trying to lowercase looks like this:

B
['Summer','Air Bus','Got']
['Parmin','Home']

Note:
In pandas I do df['B'].str.lower()


iyfamqjs  #1

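If I understand correctly, you have a column that is an array of strings.
To lowercase the strings, you can apply the lower function to each element like this: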

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

data = [
    {"B": ["Summer", "Air Bus", "Got"]},
]

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(data)
# apply lower() to every element of the array column B
df = df.withColumn("result", F.expr("transform(B, x -> lower(x))"))
df.show(truncate=False)

Result:

+----------------------+----------------------+                                 
|B                     |result                |
+----------------------+----------------------+
|[Summer, Air Bus, Got]|[summer, air bus, got]|
+----------------------+----------------------+

toiithl6  #2

Slightly different from @vladsiv's answer: this tries to address the question in the comments above about passing a dynamic column name.

# set column name
m = "B"

# use F.transform directly, rather than inside an F.expr (F.transform requires Spark 3.1+)
df = df.withColumn("result", F.transform(F.col(m), lambda x:F.lower(x)))
