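How can I rewrite these queries to avoid ending up with a single reducer in the reduce phase? It takes a very long time, and I lose the benefit of parallelism.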
select id , count(distinct locations) AS unique_locations from mytable ;
and
select id , size(collect_set(locations)) AS unique_locations from mytable ;
wrrgggsh1#
For count(distinct var), use two queries: an inner query that deduplicates and an outer query that counts:
SELECT count(1) FROM ( SELECT DISTINCT locations AS unique_locations FROM mytable ) t;
I think the same applies to size(collect_set(...)):
SELECT size(unique_locations) FROM ( SELECT collect_set(locations) AS unique_locations FROM mytable ) t;
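The queries in the question also select id, so a per-id count may be what is wanted. A minimal sketch of the same two-step idea under that assumption is below. The GROUP BY id is my addition and does not appear in the original question or answer:

```sql
-- Assumption: one distinct-location count per id is wanted.
-- Inner query: deduplicate (id, locations) pairs.
-- Outer query: count the surviving rows for each id.
SELECT id, count(1) AS unique_locations
FROM (
  SELECT DISTINCT id, locations
  FROM mytable
) t
GROUP BY id;
```

Both steps here are keyed on columns (the pair in the inner query, id in the outer one), so the rows can in principle be spread over several reducers. How many are actually used still depends on the data and on the cluster settings.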