From Scala to Spark

2o7dmzc5  posted on 2022-11-09  in  Scala

I have to convert the code below to Spark, but I don't understand what exactly Seq does in this code.

val tempFactDF = unionTempDF.join(fact.select("x","y","d","f","s"),
                                  Seq("x","y","d","f")).dropDuplicates

bnlyeluc1#

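Here, Seq("x","y","d","f") is the list of column names to join on, so the join is performed on multiple columns. Both DataFrames must contain these columns, and the join type defaults to inner.
In terms of which rows are matched, it is equivalent to the join below. The difference is that the Seq form keeps a single copy of each join column in the result, whereas the explicit condition keeps the x, y, d, f columns from both sides. The original code additionally calls dropDuplicates on the joined result.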

val joiningTable = fact.select("x","y","d","f","s")
unionTempDF.join(joiningTable,
  unionTempDF("x") === joiningTable("x") &&
  unionTempDF("y") === joiningTable("y") &&
  unionTempDF("d") === joiningTable("d") &&
  unionTempDF("f") === joiningTable("f"))
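
To make the difference between the two join forms concrete, here is a minimal sketch that runs both on tiny DataFrames and compares the resulting column lists. The object name SeqJoinSketch, the sample rows, the extra "tag" column, and the local SparkSession settings are all assumptions made for illustration; they do not come from the original question.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; names and data are illustrative only.
object SeqJoinSketch {

  // Returns the output column names of (Seq-based join, explicit-condition join).
  def joinedColumns(): (Seq[String], Seq[String]) = {
    val spark = SparkSession.builder()
      .master("local[1]")            // assumption: run locally for the demo
      .appName("seq-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for unionTempDF and fact.select("x","y","d","f","s")
    val left  = Seq((1, "a", 10, 100, "L")).toDF("x", "y", "d", "f", "tag")
    val right = Seq((1, "a", 10, 100, "R")).toDF("x", "y", "d", "f", "s")

    // Join on column names: each join column appears once in the result.
    val byNames = left.join(right, Seq("x", "y", "d", "f")).dropDuplicates()

    // Join on an explicit condition: the join columns of both sides are kept.
    val byExpr = left.join(right,
      left("x") === right("x") &&
      left("y") === right("y") &&
      left("d") === right("d") &&
      left("f") === right("f"))

    val result = (byNames.columns.toSeq, byExpr.columns.toSeq)
    spark.stop()
    result
  }

  def main(args: Array[String]): Unit = {
    val (namesCols, exprCols) = joinedColumns()
    println(namesCols.mkString(","))  // x,y,d,f,tag,s
    println(exprCols.mkString(","))   // x,y,d,f,tag,x,y,d,f,s
  }
}
```

With the explicit-condition form, referring to "x" in the joined result by name alone is ambiguous, which is one practical reason to prefer the Seq form when the column names match on both sides.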
