How can we use a where condition in Spark for the Hive query below?

Posted by 3lxsmp7m on 2021-07-12 in Spark

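I am fairly new to Spark with Scala. The query below contains a subquery. As far as I know, Spark does not support subqueries. Also, does the group by function support multiple columns at once?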

select id, email from test1 
where country in (select distinct salary from test2)
group by id ,email ;

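I started converting the query above to Spark as shown below, but the problem is how to apply a where condition whose values come from a different DataFrame. Can we use a join here? How do I convert the whole query to Spark?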

val m = test1.select("id","email")
val k = test2.select("salary").distinct
val l = m.groupby("id","salary")

scala apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/66524996/how-we-can-use-where-condition-in-spark-for-below-hive-query

1 Answer

zfciruhq1#

You can try using a semi join to express the subquery:

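// Keep "country" so it is available for the join condition.
val m = test1.select("id", "email", "country")
val k = test2.select("salary").distinct

// left_semi keeps only the rows of m that have a match in k, and only m's columns.
val df = m.join(k, m("country") === k("salary"), "left_semi")
// "salary" is not in the semi-join output; select the grouped columns instead.
val l = df.select("id", "email").distinct()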
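To try the semi-join approach locally, here is a minimal, self-contained sketch written in script style (for example, pasted into spark-shell). Only the table and column names come from the question. The sample rows, the local master setting, and the app name are assumptions made for illustration, and the values are chosen only so that some `country` values match some `salary` values. The sketch also runs the original query through `spark.sql` on temporary views, since Spark SQL (2.0 and later) accepts `IN` subqueries in a `WHERE` clause.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation only; master and app name are assumptions.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("semi-join-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows; only the column names come from the question.
val test1 = Seq(
  (1, "a@example.com", "100"),
  (1, "a@example.com", "100"), // duplicate row, removed by distinct() below
  (2, "b@example.com", "200"),
  (3, "c@example.com", "300")
).toDF("id", "email", "country")

val test2 = Seq("100", "100", "300").toDF("salary")

// DataFrame API: the semi join plays the role of WHERE country IN (subquery).
// A semi join never duplicates left-side rows, so test2 needs no distinct here.
val viaJoin = test1
  .join(test2, test1("country") === test2("salary"), "left_semi")
  .select("id", "email")
  .distinct() // same effect as GROUP BY id, email with no aggregates

// Spark SQL: the original query, run unchanged against temporary views.
test1.createOrReplaceTempView("test1")
test2.createOrReplaceTempView("test2")
val viaSql = spark.sql(
  """select id, email from test1
    |where country in (select distinct salary from test2)
    |group by id, email""".stripMargin)

viaJoin.show()
viaSql.show()
```

Both results should contain the same (id, email) pairs. On the question about grouping: `groupBy` (capital B) accepts several column names at once, for example `groupBy("id", "email")`, but it returns a grouped dataset that needs an aggregation before it becomes a DataFrame again, which is why `distinct()` is the simpler choice when no aggregate is computed.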

Posted 2021-07-12
