How can I write queries that avoid a single reducer in Hive SELECT DISTINCT and size(collect_set()) queries?

Posted by slmsl1lt on 2021-05-29 in Hadoop

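How can I rewrite these queries so that the reduce stage does not run on a single reducer? It takes a very long time, and I lose the benefit of parallelism.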

select id
, count(distinct locations) AS unique_locations
  from
  mytable
;

select id
, size(collect_set(locations)) AS unique_locations
  from
  mytable
;

Answer #1 (wrrgggsh)

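For count(distinct var), split the work into two levels: deduplicate in a subquery, then count the result. The inner SELECT DISTINCT can be spread across many reducers, so the final count only has to handle rows that are already unique: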

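SELECT
  count(1)  -- counts the deduplicated rows; a NULL location counts as one value
FROM (
  -- deduplication can run on multiple reducers
  SELECT DISTINCT locations AS unique_locations
  FROM mytable
) t;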
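
If the count is meant to be per id, as the id column in the question suggests, the same idea can be sketched with GROUP BY. This is only an assumption about the intent; the table and column names are taken from the question. Grouping on (id, locations) first lets Hive distribute the deduplication by key, and count(locations) in the outer query skips NULLs the way count(distinct locations) does.

```sql
-- Sketch: per-id distinct count without count(distinct ...).
-- Assumes the question wants one row per id.
SELECT
  id,
  count(locations) AS unique_locations  -- count(col) ignores NULLs
FROM (
  -- one row per distinct (id, locations) pair
  SELECT id, locations
  FROM mytable
  GROUP BY id, locations
) t
GROUP BY id;
```

Whether this is faster than the original depends on the data: it helps most when each id has many repeated locations.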

I think the same approach works for size(collect_set()):

SELECT
  size(unique_locations)
FROM (
  -- build the set of distinct locations once
  SELECT collect_set(locations) AS unique_locations
  FROM mytable
) t;
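
A possible variant, offered as a sketch rather than a tested fix: collect_set without a GROUP BY is itself a global aggregate, so it may help to deduplicate in a subquery first. That way the final aggregation only sees values that are already distinct. The table and column names are taken from the question.

```sql
-- Sketch: deduplicate first, then collect.
SELECT
  size(collect_set(locations)) AS unique_locations
FROM (
  -- this stage can run on multiple reducers
  SELECT DISTINCT locations
  FROM mytable
) t;
```

The outer query still ends in a single aggregation, but its input is limited to one row per distinct location.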
