How to distribute before transform in Hive?

ghg1uchk posted on 2021-05-30 in Hadoop


In Hive, I want to distribute a table by one column and use Python to transform each distributed part.

For example:

For the records with one specific value in column D, I do the following:

from
    (select *
    from raw_table
    where D=12345
    sort by A)
    sb
insert overwrite table u_12345
partition (X,Y)
select transform(cast(A as double),B,C,D,E,F,X,Y)
using 'hello.py'
as A,B,C,D,E,F,X,Y
;
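For reference, TRANSFORM streams each input row to the script as one tab-separated line on stdin and reads tab-separated lines back from its stdout. The question does not show hello.py, so the Python sketch below is purely hypothetical: `transform_line` is an illustrative name, and its body only normalises A and echoes the row where the real per-row logic would go.

```python
import sys


def transform_line(line: str) -> str:
    """Hypothetical stand-in for hello.py's per-row logic.

    Hive TRANSFORM feeds each row as one tab-separated line (SQL NULL is
    sent as the two characters \\N) and expects tab-separated lines back.
    Here A is parsed as a float and every other column is echoed unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    a = fields[0]
    if a != "\\N":
        a = str(float(a))  # A was already CAST to DOUBLE by the query
    return "\t".join([a, *fields[1:]])


if __name__ == "__main__":
    # One output line per input line; print() adds the trailing newline.
    for raw in sys.stdin:
        print(transform_line(raw))
```

The output must have as many columns, in the same order, as the `as A,B,C,D,E,F,X,Y` list, because Hive maps the returned fields to those names positionally.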

Now I want to run this computation for every distinct value of D, so I wrote the following:

from raw_table
insert overwrite table clean_data
partition (X,Y)
select transform(cast(A as double),B,C,D,E,F,X,Y)
using 'hello.py'
as A,B,C,D,E,F,X,Y
distribute by D
;

But this does not do what I want.

hadoop Hive

Source: https://stackoverflow.com/questions/25758590/how-to-distribute-before-transform-in-hive

1 answer


klsxnrf11#

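You can do the DISTRIBUTE BY in a subquery, so that the rows are distributed before they reach the transform script rather than afterwards. I have not tested this: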

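from (
    -- a FROM subquery needs an alias (t); SORT BY keeps each D's rows
    -- contiguous within a reducer, ordered by A as in the single-D query
    select A,B,C,D,E,F,X,Y
    from raw_table
    distribute by D
    sort by D, A) t
insert overwrite table clean_data
partition (X,Y)
select transform(cast(A as double),B,C,D,E,F,X,Y)
using 'hello.py'
as A,B,C,D,E,F,X,Y ;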
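A rough way to see why the DISTRIBUTE BY belongs in the subquery: rows are routed to reducers by a hash of D, then sorted within each reducer, and only then piped into the script. The Python model below is a simplification, not Hive's implementation; Python's built-in `hash()` stands in for Hive's partitioning hash (an assumption), and `distribute_and_sort` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Any


def distribute_and_sort(rows: list[dict[str, Any]], num_reducers: int,
                        dist_col: str,
                        sort_cols: list[str]) -> list[list[dict[str, Any]]]:
    """Rough model of DISTRIBUTE BY dist_col SORT BY sort_cols.

    Python's hash() is only a stand-in for Hive's partitioner; the point is
    that equal dist_col values always land in the same reducer.
    """
    buckets: dict[int, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        buckets[hash(row[dist_col]) % num_reducers].append(row)
    # SORT BY orders rows within each reducer only, not globally.
    return [sorted(buckets[r], key=lambda row: tuple(row[c] for c in sort_cols))
            for r in range(num_reducers)]
```

Each inner list corresponds to the input one script instance would receive: every row for a given D appears in exactly one list, and the rows for that D sit next to each other, so the script can process one D at a time.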

On my cluster I use:

create table clean_data as
select
transform (key, B,C,D,E,F,G)
USING 'reducer_script.py' as (key, B,C,D,E,F,G_reduced)
from (select key, B,C,D,E,F,G from raw_table distribute by key sort by key, D) t ;
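Because of `distribute by key sort by key, D`, a reducer script can treat each run of equal keys as one complete group. The answer does not say what reducer_script.py computes, so the Python sketch below assumes a hypothetical reduction: G_reduced is a running sum of G per key, in D order. The name `reduce_lines` is illustrative only.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical reducer: replace G with a running sum of G per key.

    This relies on DISTRIBUTE BY key SORT BY key, D: every row for a key
    reaches the same script instance, contiguously and ordered by D, so
    groupby() sees each key exactly once.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for _key, group in groupby(rows, key=lambda fields: fields[0]):
        running = 0.0  # reset at the start of each key's group
        for key, b, c, d, e, f, g in group:
            running += float(g)
            yield "\t".join([key, b, c, d, e, f, str(running)])


if __name__ == "__main__":
    for out in reduce_lines(sys.stdin):
        print(out)
```

Without the SORT BY, rows for the same key could be interleaved with other keys, and `groupby()` would split one key into several groups and restart the sum.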

Answered 2021-05-30
