How to dynamically get a list of columns from a schema in PySpark

yr9zkbsy  posted on 2023-03-28  in Spark

How can I get the columns of the schema below into a list in PySpark?
This is what the schema looks like:
[Image: schema screenshot] https://i.stack.imgur.com/jYgUQ.png

my_list = parsed_df.schema.fields
for field in my_list:
    print(field.name)

ilmyapht1#

Hope this helps.
Example DataFrame with a struct column:

root
 |-- name: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- middlename: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |-- state: string (nullable = true)
 |-- gender: string (nullable = true)

To get all the columns into a list, use the code below:

struct_fields = df2.schema['name'].dataType.fieldNames()
print("struct fields :",struct_fields)
print("columns without struct:", df2.columns)
final_columns = df2.columns + struct_fields
print("final columns: ", final_columns)

Output:

struct fields : ['firstname', 'middlename', 'lastname']
columns without struct: ['name', 'state', 'gender']
final columns:  ['name', 'state', 'gender', 'firstname', 'middlename', 'lastname']
