I have the following records:
1000, 1001, 1002 to 1999,
2000, 2001, 2002 to 2999,
3000, 3001, 3002 to 3999
I want to process this record set with Hive so that reducer-1 handles the data from 1000 to 1999, reducer-2 handles 2000 to 2999, and reducer-3 handles 3000 to 3999. Please help me solve this.
Answer (iecba09b1):
Use DISTRIBUTE BY. Mapper output is grouped by the DISTRIBUTE BY expression, and all rows with the same value of that expression are sent to the same reducer for processing:
select ...
from ...
distribute by case when col between 1000 and 1999 then 1
when col between 2000 and 2999 then 2
when col between 3000 and 3999 then 3
end
Or simply:
distribute by floor(col/1000)
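To see why this grouping works, and why the number of reducers matters, here is a minimal Python sketch that simulates the assignment. It is not Hive code. It assumes a simplified model in which the reducer index is the non-negative hash of the distribute key modulo the number of reducers, and in which a small integer hashes to itself. The function names (`bucket_key`, `assign_reducer`, `simulate`) are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def bucket_key(col: int) -> int:
    """Same expression as in the answer: floor(col / 1000)."""
    return col // 1000


def assign_reducer(key: int, num_reducers: int) -> int:
    """ASSUMPTION: simplified partitioning model, not Hive's actual code.

    The reducer index is the non-negative hash of the key modulo the
    number of reducers; a small integer is assumed to hash to itself.
    """
    return (key & 0x7FFFFFFF) % num_reducers


def simulate(values, num_reducers):
    """Group the values by the reducer index the model assigns to them."""
    groups = defaultdict(list)
    for v in values:
        groups[assign_reducer(bucket_key(v), num_reducers)].append(v)
    return dict(groups)


records = range(1000, 4000)
three = simulate(records, 3)  # one range per reducer
two = simulate(records, 2)    # too few reducers: two ranges share one

for idx in sorted(three):
    print(idx, three[idx][0], three[idx][-1])
```

Under this model, three reducers each receive exactly one of the three ranges, while two reducers force two ranges onto the same reducer. So the number of reducers for the job has to be at least three for the split asked about in the question. Also note that the reducer index is derived from the key, so it does not have to match the "reducer-1", "reducer-2", "reducer-3" numbering; the guarantee is only that each range is processed together.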