How to update the schema in PySpark

axr492tv  posted 2021-07-13 in Spark

I have a JSON dataset that I read as follows:

myjsondata = spark.read.json("/FileStore/tables/customer.json")

myjsondata.printSchema()


I want to update this schema, so I defined the following DDL string:

myjsondataDDL="address_id INT,birth_country String,birthdate date,customer_id INT,demographics STRUCT<buy_potential: string,credit_rating: string,education_status: string,income_range: array<>,purchase_estimate:INT,vehicle_count: INT>,email_address: string,firstname: string,gender: string,is_preffered_customer: string,lastname: string,salutation: string"

I am not able to update the schema with this. How can I do it?

hwazgwia 1#

Please try the schema below. Your schema has a few syntax errors: some unneeded colons (a colon is only required before the type of a field inside a STRUCT) and a missing array element type.

myjsondataDDL = """
    address_id INT,
    birth_country String,
    birthdate date,
    customer_id INT,
    demographics STRUCT<buy_potential: string, credit_rating: string, education_status: string, income_range: array<int>, purchase_estimate:INT, vehicle_count: INT>,
    email_address string,
    firstname string,
    gender string,
    is_preffered_customer string,
    lastname string,
    salutation string
"""
myjsondata = spark.read.schema(myjsondataDDL).json("/FileStore/tables/customer.json")
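
For reference, the same schema can also be built programmatically with StructType and StructField instead of a DDL string. The sketch below is not from the original answer; it mirrors the field names and types above and assumes the file path from the question:

from pyspark.sql.types import (
    StructType, StructField, StringType, IntegerType, DateType, ArrayType
)

# Programmatic equivalent of the corrected DDL string above.
demographics_type = StructType([
    StructField("buy_potential", StringType()),
    StructField("credit_rating", StringType()),
    StructField("education_status", StringType()),
    StructField("income_range", ArrayType(IntegerType())),
    StructField("purchase_estimate", IntegerType()),
    StructField("vehicle_count", IntegerType()),
])

myjsondataSchema = StructType([
    StructField("address_id", IntegerType()),
    StructField("birth_country", StringType()),
    StructField("birthdate", DateType()),
    StructField("customer_id", IntegerType()),
    StructField("demographics", demographics_type),
    StructField("email_address", StringType()),
    StructField("firstname", StringType()),
    StructField("gender", StringType()),
    StructField("is_preffered_customer", StringType()),
    StructField("lastname", StringType()),
    StructField("salutation", StringType()),
])

# Apply the schema while reading and confirm it was picked up.
myjsondata = spark.read.schema(myjsondataSchema).json("/FileStore/tables/customer.json")
myjsondata.printSchema()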
