Scala: how to update the second DataFrame's exists value if the row exists in the first DataFrame

Posted by pdkcd3nj on 2021-05-27 in Spark


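I have two DataFrames. I want to check whether df1 contains any row of df2, matching on the key columns a and b. If a match is found, exists should be set to true for that row of df2; the rows of df1 should then be added as new rows with exists set to false.
df1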

a | b | c | d
1 | 1 | 3 | 4
2 | 2 | 4 | 1
3 | 3 | 5 | 3

df2

a | b | c | d
1 | 1 | 4 | 5
4 | 4 | 3 | 2

The result should be
df3

a | b | c | d | exists
1 | 1 | 4 | 5 | True
4 | 4 | 3 | 2 | False
1 | 1 | 3 | 4 | False
2 | 2 | 4 | 1 | False
3 | 3 | 5 | 3 | False

So far I have this:

val newdf = df1.join(df2, df1("a")===df2("a") && df1("b") === df2("b"), "left")
   .select(df2("a"), df2("b"),df2("c"),df2("d"),when(df2("a").isNull, false).otherwise(true).alias("exists"))

It returns

a | b | c | d | exists
1 | 1 | 4 | 5 | True
rest of the rows are null
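
The nulls follow from the join direction: a left join starting from df1 keeps every df1 row, but the select reads only df2's columns, which are null wherever df1 has no match, and the unmatched df2 row (4, 4) is dropped entirely. The sketch below reproduces this with the sample data from the question; the object name `LeftJoinNulls`, the helper `joinedKeys`, and the column aliases are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LeftJoinNulls {
  // Hypothetical helper: same join as in the question, but exposing the key
  // columns of both sides so the nulls on the df2 side become visible.
  def joinedKeys(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("a") === df2("a") && df1("b") === df2("b"), "left")
      .select(
        df1("a").alias("df1_a"), df1("b").alias("df1_b"),
        df2("a").alias("df2_a"), df2("b").alias("df2_b"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("left-join-nulls").getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df1 = Seq((1, 1, 3, 4), (2, 2, 4, 1), (3, 3, 5, 3)).toDF("a", "b", "c", "d")
    val df2 = Seq((1, 1, 4, 5), (4, 4, 3, 2)).toDF("a", "b", "c", "d")

    // Rows for keys (2, 2) and (3, 3) carry nulls in the df2 columns;
    // key (4, 4) does not appear because it exists only in df2.
    joinedKeys(df1, df2).show()
    spark.stop()
  }
}
```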

scala DataFrame apache-spark

Source: https://stackoverflow.com/questions/63235330/how-to-update-second-dataframes-exists-value-if-row-exists-in-first-dataframe

1 Answer


68bkxrlz1#

Try left_semi and left_anti joins, then combine the resulting datasets with unionAll. Example:
```
import org.apache.spark.sql.functions.lit

// df2 rows whose (a, b) key also exists in df1
df2.join(df1,Seq("a","b"),"left_semi").withColumn("exists",lit("True")).
// df2 rows whose (a, b) key has no match in df1
unionAll(df2.join(df1,Seq("a","b"),"left_anti").withColumn("exists",lit("False"))).
// every df1 row, appended with exists = "False"
unionAll(df1.withColumn("exists",lit("False"))).show()
//+---+---+---+---+------+
//| a| b| c| d|exists|
//+---+---+---+---+------+
//| 1| 1| 4| 5| True|
//| 4| 4| 3| 2| False|
//| 1| 1| 3| 4| False|
//| 2| 2| 4| 1| False|
//| 3| 3| 5| 3| False|
//+---+---+---+---+------+
```
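
As a possible alternative to the semi/anti pair above, the same result can be sketched with a single left join against the distinct keys of df1. This is a minimal sketch, not the answerer's method: the object name `ExistsFlag`, the helper `withExistsFlag`, and the temporary `matched` column are hypothetical, and exists comes out as a boolean column rather than the strings "True"/"False" used above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object ExistsFlag {
  // Hypothetical helper: flags df2 rows whose (a, b) key appears in df1,
  // then appends every df1 row with exists = false.
  def withExistsFlag(df1: DataFrame, df2: DataFrame): DataFrame = {
    // Distinct keys of df1 plus a marker column; distinct() keeps df2 rows
    // from being duplicated when df1 repeats a key.
    val keys = df1.select("a", "b").distinct().withColumn("matched", lit(true))

    val flaggedDf2 = df2
      .join(keys, Seq("a", "b"), "left")
      .withColumn("exists", col("matched").isNotNull) // null marker means no match
      .drop("matched")

    // unionByName aligns columns by name instead of by position.
    flaggedDf2.unionByName(df1.withColumn("exists", lit(false)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("exists-flag").getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df1 = Seq((1, 1, 3, 4), (2, 2, 4, 1), (3, 3, 5, 3)).toDF("a", "b", "c", "d")
    val df2 = Seq((1, 1, 4, 5), (4, 4, 3, 2)).toDF("a", "b", "c", "d")

    withExistsFlag(df1, df2).show()
    spark.stop()
  }
}
```

The row order of the result is not guaranteed; add an explicit orderBy if a stable order matters.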


Answered on 2021-05-27
