Error occurs when writing DataFrame from PySpark

Posted by y53ybaqx on 2021-09-29 in Java

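Some background: we created an empty frame in Apache Superset. We then use Cloudera to write the output of an API call into that empty frame. The output is converted to a Python DataFrame so it can be stored.
An example of the schema: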

try:
    print('Define schema to create PySpark Dataframe from Pandas Dataframe')
    spark_schema = StructType([
                                 StructField('community', StringType(), True),
                                 StructField('community_id', StringType(), True),
                                 StructField('domain_name', StringType(), True),
                                 StructField('domain_id', StringType(), True),
                                 StructField('full_name', StringType(), True),
                                 StructField('name', StringType(), True),
                                 StructField('asset_id', StringType(), True),
                                 StructField('configuration', StringType(), True),
                                 StructField('configuration_id', StringType(), True),
                                 StructField('status', StringType(), True),
                                 StructField('country', StringType(), True),
                                 StructField('explicitly_approved_by', StringType(), True),
                                 StructField('explicitly_approved_on', DateType(), True),

When I execute this piece of code to write:
df1.write.mode('overwrite').parquet('<path>')

This is the error I get:
Write PySpark Dataframe into S3 Bucket in Analytical Cluster
Invalid argument, not a string or column: 8.962915881474812 of type <class 'float'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
Email Notification Sent
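
The error text shows that a plain Python float (8.962915881474812) was passed to a PySpark call that expects a column name or a Column object. The question does not show where that float comes from. As a first diagnostic step, the sketch below lists values whose Python type does not match what the schema declares. It assumes the API output is available as a list of dicts, which is an assumption and not stated in the question. The helper name `find_type_mismatches`, the subset of fields checked, and the sample rows are all hypothetical.

```python
from datetime import date

# Expected Python type per field, mirroring part of the schema above:
# StringType -> str, DateType -> datetime.date.
EXPECTED_TYPES = {
    "community": str,
    "status": str,
    "explicitly_approved_on": date,
}


def find_type_mismatches(records, expected=EXPECTED_TYPES):
    """Return (row_index, field, value, actual_type_name) for each bad value."""
    mismatches = []
    for index, record in enumerate(records):
        for field, expected_type in expected.items():
            value = record.get(field)
            # None is allowed because every field is declared nullable (True).
            if value is not None and not isinstance(value, expected_type):
                mismatches.append((index, field, value, type(value).__name__))
    return mismatches


# Hypothetical sample rows; real rows would come from the API call.
sample = [
    {"community": "Finance", "status": "Approved",
     "explicitly_approved_on": date(2021, 7, 1)},
    {"community": "Risk", "status": 8.962915881474812,
     "explicitly_approved_on": "2021-07-02"},
]

for row in find_type_mismatches(sample):
    print(row)
```

If a check like this comes back empty, the float is more likely being passed directly to a column function somewhere before the write, which is the case the error message's hint about `lit` refers to.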

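We have PySpark string, datetime, and timestamp columns, and we are trying to write them to Superset as varchar, date, and timestamp columns.
One major issue is that PySpark does not accept the VarcharType that Superset accepts, and Superset does not accept the string type that PySpark accepts.
How can we resolve this?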
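
One way to look at the type-name mismatch is to keep the table definition on the SQL side as it is and translate its type names into Spark's names when building the DataFrame schema. The sketch below does only that translation. The mapping table, the helper names `to_spark_type` and `to_schema_string`, and the column lengths are assumptions made for illustration, and it also assumes that the varchar columns of the target table can take values from Spark string columns, which depends on the storage behind Superset.

```python
import re

# Assumed correspondence between SQL-side type names and Spark type names.
SQL_TO_SPARK = {
    "VARCHAR": "string",
    "CHAR": "string",
    "DATE": "date",
    "TIMESTAMP": "timestamp",
}


def to_spark_type(sql_type):
    """Map e.g. 'VARCHAR(255)' to 'string'; raise on unknown types."""
    # Drop any length/precision suffix such as '(255)' before the lookup.
    base = re.sub(r"\(.*\)", "", sql_type).strip().upper()
    if base not in SQL_TO_SPARK:
        raise ValueError(f"No Spark type mapped for SQL type {sql_type!r}")
    return SQL_TO_SPARK[base]


def to_schema_string(columns):
    """Build a 'name: type, ...' datatype string from (name, sql_type) pairs."""
    return ", ".join(
        f"{name}: {to_spark_type(sql_type)}" for name, sql_type in columns
    )


# Hypothetical table definition; the lengths are made up for illustration.
table_columns = [
    ("community", "VARCHAR(255)"),
    ("explicitly_approved_on", "DATE"),
]
print(to_schema_string(table_columns))
```

The resulting string could be passed as the `schema` argument of `spark.createDataFrame`, which accepts a datatype string as an alternative to a `StructType`.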

Source: https://stackoverflow.com/questions/68542426/error-occurs-when-writing-dataframe-from-pyspark
