python: Converting column data to uppercase in PySpark

rslzwgfq · posted 2023-01-19 in Python

I have a dataset df with the columns

country | indicator | date | year&week | value

and I want to convert only the data in the country column to uppercase using PySpark (only the data, not the header). I tried:

import pyspark.sql.functions as f

df.select("*", f.upper("country"))
display(df)

but I get the error: 'NoneType' object has no attribute 'select'.
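(A note on the likely cause, which is my assumption rather than something stated in the question: this error means df itself is None before select is ever called. A common way this happens is assigning the result of a method that returns None, such as display(df) or df.show(), back to df. A plain-Python sketch of the pitfall:)

```python
def show_table(rows):
    """Prints rows and, like display()/show(), implicitly returns None."""
    for r in rows:
        print(r)

rows = [("spain", 1), ("germany", 2)]
rows = show_table(rows)   # rows is now None, not the data
print(rows is None)       # True
```

After such a reassignment, any further method call on the variable raises exactly this kind of AttributeError on NoneType.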

mf98qq94 · Answer #1

I would not use select here: select does not change the DataFrame in place, it returns a new DataFrame with the generated column appended, so calling it without assigning the result has no effect. I used withColumn instead, which works fine; see the following snippet:

import pyspark.sql.functions as f
import pandas as pd

# Sample Data
data = {
  "country": ["United States", "Canada", "spain", "germany"],
  "indicator": ["1", "2", "3", "4"],
  "date": ["2022/01/01", "2021/01/01", "2020/01/01", "2019/01/01"],
  "year&week": ["2022-52", "2021-34", "2020-32", "2019-45"],
  "value": ["56", "28", "258", "425"]
}
df = pd.DataFrame.from_dict(data)
# Convert to a Spark DataFrame ('spark' is the active SparkSession,
# predefined in Databricks notebooks)
df = spark.createDataFrame(df)
# Apply your function to the column you choose
df = df.withColumn("country", f.upper(f.col("country")))

Now you can check the result with df.show() or display(df), and you will get the following output:

df.show()
+-------------+---------+----------+---------+-----+
|      country|indicator|      date|year&week|value|
+-------------+---------+----------+---------+-----+
|UNITED STATES|        1|2022/01/01|  2022-52|   56|
|       CANADA|        2|2021/01/01|  2021-34|   28|
|        SPAIN|        3|2020/01/01|  2020-32|  258|
|      GERMANY|        4|2019/01/01|  2019-45|  425|
+-------------+---------+----------+---------+-----+
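(As a side note, if your data is still a pandas DataFrame, as in the sample above before spark.createDataFrame is called, the same transformation can be done with pandas' vectorized string accessor; a minimal sketch:)

```python
import pandas as pd

# Sample pandas DataFrame (hypothetical subset of the columns above)
pdf = pd.DataFrame({"country": ["spain", "germany"], "value": [258, 425]})

# Series.str.upper() uppercases every value in the column
pdf["country"] = pdf["country"].str.upper()
print(pdf["country"].tolist())  # ['SPAIN', 'GERMANY']
```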
new9mtju · Answer #2

simpleData = [["Canada","Y"],["Spain","N"],["Brazil","Y"],
   ["Japan","Y"],["India","N"] ]

df = spark.createDataFrame(simpleData,["country","indicator"])

# input

display(df)

import pyspark.sql.functions as f

upperDf=df.withColumn("country", f.upper("country"))

# output

display(upperDf)
