pyspark - Convert a DataFrame column with month number into another DataFrame column with month name

jdzmm42g · Published 2021-07-13 in Spark

I am trying to convert a DataFrame month number column into the corresponding month name column. I tried the following:

    df_month_name = df.withColumn('month_name',calendar.month_abbr['MONTH_NUMBER'])

I got this error:

    AttributeError: 'function' object has no attribute 'month_abbr'

Please let me know if there is a better way to do this. Thanks!


3hvapo4f 1#

You can use to_date to convert the month number to a date, then date_format to get the month name:

    from pyspark.sql import functions as F
    df = spark.createDataFrame([("1",), ("2",), ("3",), ("4",), ("5",)], ["month_number"])
    df1 = df.withColumn("month_name", F.date_format(F.to_date("month_number", "MM"), "MMMM")) \
        .withColumn("month_abbr", F.date_format(F.to_date("month_number", "MM"), "MMM"))
    df1.show()
    # +------------+----------+----------+
    # |month_number|month_name|month_abbr|
    # +------------+----------+----------+
    # |           1|   January|       Jan|
    # |           2|  February|       Feb|
    # |           3|     March|       Mar|
    # |           4|     April|       Apr|
    # |           5|       May|       May|
    # +------------+----------+----------+

Note that on Spark 3 you need to set spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY") in order to parse the month number as a date.
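As a minimal sketch of where that setting goes, assuming the SparkSession is already available as spark:

    # Spark 3.x only: fall back to the legacy (Spark 2) datetime parser so that
    # to_date("month_number", "MM") accepts bare values such as "1" or "12".
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")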
You can also use a map column that holds the mapping month_number -> month_abbr:

    import calendar
    import itertools
    from pyspark.sql import functions as F

    # map literal 1 -> 'Jan', 2 -> 'Feb', ..., 12 -> 'Dec'
    months = F.create_map(*[
        F.lit(m) for m in itertools.chain(*[(x, calendar.month_abbr[x]) for x in range(1, 13)])
    ])
    df1 = df.withColumn("month_abbr", months[F.col("month_number")])

Another approach, using a UDF:

    import calendar
    from pyspark.sql import functions as F

    month_name = F.udf(lambda x: calendar.month_name[int(x)])
    month_abbr = F.udf(lambda x: calendar.month_abbr[int(x)])
    df1 = df.withColumn("month_name", month_name(F.col("month_number"))) \
        .withColumn("month_abbr", month_abbr(F.col("month_number")))

n1bvdmb6 2#

If anyone wants to do this in Scala, it can be done as follows:

    // Sample data
    val df = Seq(("1"),("2"),("3"),("4"),("5"),("6"),("7"),("8"),("9"),("10"),("11"),("12")).toDF("month_number")
    import org.apache.spark.sql.functions._
    val df1 = df.withColumn("Month_Abbr", date_format(to_date($"month_number", "MM"), "MMM"))
    display(df1)

You should see output like the following:
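For reference, a sketch of the expected result (month numbers 1-12 mapped to their abbreviations), assuming the same Spark 3 legacy-parser note from the first answer applies here as well:

    month_number | Month_Abbr
    -------------+-----------
               1 | Jan
               2 | Feb
               3 | Mar
               4 | Apr
               5 | May
               6 | Jun
               7 | Jul
               8 | Aug
               9 | Sep
              10 | Oct
              11 | Nov
              12 | Dec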
