Spark DataFrame: converting a string to a date returns null

3pmvbmvn  posted on 2023-05-18  in Apache

When I try to convert a string date column in a Spark DataFrame to the date type, every value comes back as null:

# Create a list of data
data = [(1, "20230517"), (2, "20230518"), (3, "20230519"), (4, "null")]

# Create a DataFrame from the list of data
df = spark.createDataFrame(data, ("id", "date"))

df.show()

df.printSchema()

root
 |-- id: long (nullable = true)
 |-- date: string (nullable = true)

# Cast the string "date" column to the date type
df1 = df.withColumn("date", df.date.cast('date'))
df1.select('date').show()

+--------+
|date    |
+--------+
|    null|
|    null|
|    null|
|    null|
+--------+

8yoxcaq7 #1

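For this, use F.to_date() and specify the format you want to parse (yyyyMMdd in your case). A plain cast('date') only recognizes ISO-style strings such as 2023-05-17, which is why it returned null here: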

F.to_date('date', format='yyyyMMdd')
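For intuition only, here is a rough plain-Python analogy of pattern-based parsing. It is not Spark's parser: `parse_yyyymmdd` is a hypothetical helper, and `%Y%m%d` is the `strptime` counterpart of the `yyyyMMdd` pattern. A value that does not match the pattern, such as the literal string "null", ends up as None, much like the null in the Spark result.

```python
from datetime import date, datetime
from typing import Optional


def parse_yyyymmdd(value: str) -> Optional[date]:
    """Parse value with the strptime pattern %Y%m%d; return None on mismatch."""
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        # The string does not fit the pattern (for example the text "null").
        return None


samples = ["20230517", "20230518", "20230519", "null"]
parsed = [parse_yyyymmdd(s) for s in samples]
print(parsed)
```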

The full code I used:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName('spark_session').getOrCreate()

# Create a list of data
data = [(1, "20230517"), (2, "20230518"), (3, "20230519"), (4, "null")]

# Create a DataFrame from the list of data
df = spark.createDataFrame(data, ("id", "date"))

# Parse the string "date" column as a date using the yyyyMMdd pattern
df1 = df.withColumn("date", F.to_date('date', format='yyyyMMdd'))
df1.select('date').show()

+----------+
|      date|
+----------+
|2023-05-17|
|2023-05-18|
|2023-05-19|
|      null|
+----------+
