MySQL query to HiveQL

Posted by ui7jx7zq on 2021-06-28 in Hive

Work (id, rank)
Data:

work
------------------
1 | A
1 | B
1 | C
1 | D
2 | A
2 | C
2 | B
3 | C

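I need to find all pairs of IDs that have ranks in common, together with the count of shared ranks. A pair should be shown only when the rank count is greater than 2, and the results should be printed in descending order. I have written a MySQL query for this, but I am new to Spark SQL and HiveQL, so please help me with how to do this. For example, with the above data the result set should be:

---------------------
id1 | id2 | Count
---------------------
 A  | B   |  3
 B  | C   |  3

The MySQL query is:

select a.id,b.id
from work as a, work as b
where a.id>b.id
group by a.id,b.id having group_concat(distinct a.rank order by a.rank)=group_concat(distinct b.rank order by b.rank)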

sql Hive apache-spark apache-spark-sql hiveql

Source: https://stackoverflow.com/questions/40444326/mysql-query-to-hiveql

1 Answer

aij0ehis1#

I don't think Hive supports group_concat(). I think this does the same thing:

select a.id, b.id, a.cnt
from (select a.*, count(*) over (partition by a.id) as cnt
      from work a
     ) a join
     (select b.*, count(*) over (partition by b.id) as cnt
      from work b
     ) b
     on a.rank = b.rank and a.cnt = b.cnt
where a.id < b.id   -- I *think* this is allowed in Hive; if not, a subquery or an expression in the `having` clause will do the same thing
group by a.id, b.id, a.cnt
having count(*) = a.cnt;

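This is a more natural way to get the pairs of IDs that have the same ranks. In fact, it should be more efficient than the MySQL version in almost any database, because the cross join in that version generates a lot of data.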
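
For comparison, the group_concat() comparison from the MySQL query can probably be approximated in HiveQL by combining collect_set(), sort_array(), and concat_ws(). The following is only a sketch under stated assumptions: the table is work(id, rank) as in the question, rank is stored as a string, no rank value contains a comma, and the CTE name ranks_per_id is made up for illustration. It has not been run against a particular Hive or Spark version.

```sql
-- Sketch: emulate MySQL's group_concat(distinct rank order by rank).
-- Assumes work(id, rank) with rank as a string and no commas in rank values.
-- ranks_per_id is a hypothetical name chosen for this example.
with ranks_per_id as (
    select id,
           -- collect_set drops duplicates, sort_array fixes the order,
           -- concat_ws joins the sorted array into one comparable string
           concat_ws(',', sort_array(collect_set(rank))) as ranks
    from work
    group by id
)
select a.id as id1, b.id as id2
from ranks_per_id a join
     ranks_per_id b
     on a.ranks = b.ranks      -- same set of ranks
where a.id < b.id;             -- report each pair only once
```

Because each ID is reduced to a single row before the join, this form also avoids the cross join over the raw rows. Like the MySQL query, it returns only pairs whose rank sets are identical.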

Answered 2021-06-28
