Change the schema of a DataFrame to a different schema

dauxcl2d posted on 2021-05-22 in Spark

I have a DataFrame like this:

    df.printSchema()
    root
     |-- id: integer (nullable = true)
     |-- data: struct (nullable = true)
     |    |-- foo01: string (nullable = true)
     |    |-- bar01: string (nullable = true)
     |    |-- foo02: string (nullable = true)
     |    |-- bar02: string (nullable = true)

I want to turn it into this:

    root
     |-- id: integer (nullable = true)
     |-- foo: struct (nullable = true)
     |    |-- foo01: string (nullable = true)
     |    |-- foo02: string (nullable = true)
     |-- bar: struct (nullable = true)
     |    |-- bar01: string (nullable = true)
     |    |-- bar02: string (nullable = true)

What is the best way to do this?

Answer #1, by ekqde3dh

You can use the struct function together with select, like this:

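    from pyspark.sql import functions as F
    # Pull the nested fields out of "data" and regroup them into two new structs
    finalDF = df.select(
        "id",
        F.struct("data.foo01", "data.foo02").alias("foo"),
        F.struct("data.bar01", "data.bar02").alias("bar"),
    )
    finalDF.printSchema()  # printSchema is a method, so it must be called with ()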

Resulting schema:

    root
     |-- id: integer (nullable = true)
     |-- foo: struct (nullable = false)
     |    |-- foo01: string (nullable = true)
     |    |-- foo02: string (nullable = true)
     |-- bar: struct (nullable = false)
     |    |-- bar01: string (nullable = true)
     |    |-- bar02: string (nullable = true)

Answer #2, by a5g8bdjr

You can simply use the PySpark struct function:

    from pyspark.sql.functions import struct
    new_df = df.select(
        'id',
        struct('data.foo01', 'data.foo02').alias('foo'),
        struct('data.bar01', 'data.bar02').alias('bar'),
    )

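An additional note on the PySpark struct function: it accepts either column names as strings (also passed as a single list), which simply moves existing columns into the struct, or Column expressions.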
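If data holds many such fields (say foo01 through foo50), listing them by hand gets tedious. Here is a minimal sketch that relies on one assumption: every nested field is named as a letter prefix followed by digits. It reads the field names from the schema, groups them by prefix, and passes each group to struct as a list of column-name strings. The helpers group_fields_by_prefix and regroup_struct are hypothetical names for illustration. They are not part of PySpark.

```python
import re
from collections import defaultdict


def group_fields_by_prefix(field_names):
    """Group field names by their letter prefix, e.g. "foo01" -> "foo".

    Assumption (hypothetical naming scheme): fields look like <prefix><digits>.
    Names that do not match are skipped. Prefixes keep first-seen order.
    """
    groups = defaultdict(list)
    for name in field_names:
        match = re.fullmatch(r"([A-Za-z_]+)\d+", name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


def regroup_struct(df, struct_col="data"):
    """Replace struct_col with one struct column per field-name prefix."""
    # Imported here so the pure-Python helper above works without PySpark
    from pyspark.sql import functions as F

    # df.schema[name] returns a StructField; its dataType is the nested StructType
    field_names = df.schema[struct_col].dataType.fieldNames()
    new_structs = [
        # struct() accepts a single list of column-name strings
        F.struct([f"{struct_col}.{name}" for name in names]).alias(prefix)
        for prefix, names in group_fields_by_prefix(field_names).items()
    ]
    other_cols = [c for c in df.columns if c != struct_col]
    return df.select(*other_cols, *new_structs)
```

For the DataFrame in the question, regroup_struct(df).printSchema() should list id, foo and bar in that order. foo comes first because foo01 is the first field seen, and Python dicts keep insertion order.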
