How to handle duplicate columns in a table definition

piv4azn7 posted on 2021-05-27 in Spark

Given a DataFrame with the following schema. The problem is that the DataFrame is dynamic, and so are its fields, so you cannot assume a fixed schema in advance.

root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
 |-- c: string (nullable = true)
 |-- a: string (nullable = true)
 |-- b: long (nullable = true)
 |-- c: long (nullable = true)
 |-- d: long (nullable = true)
 |-- a: long (nullable = true)

The following error is shown:

Found duplicate column(s) in table definition

How can I rename the columns to remove the ambiguity?


8yparm6h #1

Here is how to rename the columns:

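import org.apache.spark.sql.SparkSession

// Assumes a local session; in spark-shell, `spark` already exists and these two lines can be skipped.
val spark = SparkSession.builder().master("local[*]").appName("rename-columns").getOrCreate()
import spark.implicits._

// Sample DataFrame in which the column name "a" appears twice
val df = Seq(
  ("a", 1, "a"),
  ("a", 1, "a"),
  ("a", 1, "a")
).toDF("a", "x", "a")

// toDF assigns the new names by position, so the list must contain
// exactly one name per column
val columns = List("a", "b", "c")
val newDF = df.toDF(columns: _*)

newDF.show(false)
newDF.printSchema()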

New output:

+---+---+---+
|a  |b  |c  |
+---+---+---+
|a  |1  |a  |
|a  |1  |a  |
|a  |1  |a  |
+---+---+---+

New schema:

root
 |-- a: string (nullable = true)
 |-- b: integer (nullable = false)
 |-- c: string (nullable = true)
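
The example hard-codes the new names, but the question says the schema is not known in advance. One possible way to handle that is to derive the unique names from the existing column names. The following is only a minimal sketch: `DedupeColumns` and `dedupe` are hypothetical names (not part of Spark), and the `_<n>` suffix scheme is an assumption, not something the original answer prescribes. The helper is plain Scala, so it can be tried without a Spark session.

```scala
// Hypothetical helper, not part of Spark: makes every column name unique
// by appending "_<n>" to repeated names, keeping the original order.
object DedupeColumns {
  def dedupe(names: Seq[String]): Seq[String] = {
    // Names already handed out, including generated ones
    val used = scala.collection.mutable.Set.empty[String]
    names.map { name =>
      var candidate = name
      var suffix = 1
      // Keep trying name_1, name_2, ... until the candidate is unused
      while (used.contains(candidate)) {
        candidate = s"${name}_$suffix"
        suffix += 1
      }
      used += candidate
      candidate
    }
  }

  def main(args: Array[String]): Unit = {
    // Column names taken from the schema in the question
    val original = Seq("a", "b", "c", "a", "b", "c", "d", "a")
    println(dedupe(original).mkString(", "))
    // prints: a, b, c, a_1, b_1, c_1, d, a_2
  }
}
```

With a DataFrame `df`, the result could then be applied positionally in the same way as the hard-coded list, for example `df.toDF(DedupeColumns.dedupe(df.columns): _*)`. Note that this sketch compares names case-sensitively, while Spark by default resolves column names case-insensitively, so names that differ only in case may need to be normalized first.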
