Hive: bucketing for a partitioned table

Posted by nfzehxib on 2021-05-29 in Hadoop


Here is my script:

--table without partition

drop table if exists ufodata;
create table ufodata ( sighted string, reported string, city string, shape string, duration string, description string )
row format delimited
fields terminated by '\t'
Location '/mapreduce/hive/ufo';

--load my data in ufodata

load data local inpath '/home/training/downloads/ufo_awesome.tsv' into table ufodata;

--create partition table
drop table if exists partufo;
create table partufo ( sighted string, reported string, city string, shape string, duration string, description string )
partitioned by ( year string )
clustered by (year) into 6 buckets
row format delimited
fields terminated by '\t';

--by default dynamic partition is not set
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
--by default bucketing is false
set hive.enforce.bucketing=true;

--loading mydata
insert overwrite table partufo
partition (year)
select sighted, reported, city, shape, duration, description, SUBSTR(TRIM(sighted), 1,4) from ufodata;

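Error message:
FAILED: Error in semantic analysis: Invalid column reference
I am trying to bucket my partitioned table. If I remove "clustered by (year) into 6 buckets", the script works fine. How can I bucket a partitioned table?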

hadoop Hive Bucket

Source: https://stackoverflow.com/questions/33140270/hive-bucketing-for-the-partitioned-table

3 Answers


gcmastyq1#

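When using dynamic partitioning, first create a temporary (staging) table that contains all the columns, including the partition column, and load the data into it.
Then create the actual partitioned table, declaring the partition column in the PARTITIONED BY clause. When loading from the staging table, the partition column must come last in the SELECT clause.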
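
The two steps above could look roughly like the following. This is only a sketch: the table names events_staging and events_part, their columns, and the input path are hypothetical and are not taken from the question.

```sql
-- Hypothetical staging table: holds every column, including the future partition column.
CREATE TABLE events_staging (id BIGINT, payload STRING, event_year STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

-- Hypothetical input path.
LOAD DATA LOCAL INPATH '/tmp/events.tsv' INTO TABLE events_staging;

-- The partition column is declared only in PARTITIONED BY, not in the column list.
CREATE TABLE events_part (id BIGINT, payload STRING)
PARTITIONED BY (event_year STRING);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column (event_year) comes last in the SELECT list.
INSERT OVERWRITE TABLE events_part PARTITION (event_year)
SELECT id, payload, event_year
FROM events_staging;
```

Hive matches dynamic partition columns by position rather than by name, which is why the order of the SELECT list matters here.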

2021-05-30

3gtaxfhh2#

You can create a bucketed table with partitions using the syntax below.

CREATE TABLE bckt_movies
(mov_id BIGINT , mov_name STRING ,prod_studio STRING, col_world DOUBLE , col_us_canada DOUBLE , col_uk DOUBLE , col_aus DOUBLE)
PARTITIONED BY (rel_year STRING)
CLUSTERED BY(mov_id) INTO 6 BUCKETS;
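
Once a table like this has been populated, its layout can be inspected with a few standard statements. The sketch below assumes that bckt_movies has already been loaded and that a partition with rel_year = '2015' exists; that value is made up for illustration.

```sql
-- List the partitions that exist for the table.
SHOW PARTITIONS bckt_movies;

-- The detailed description includes the bucket count and the bucketing columns.
DESCRIBE FORMATTED bckt_movies;

-- Read one of the six buckets within a single (hypothetical) partition.
SELECT *
FROM bckt_movies TABLESAMPLE (BUCKET 1 OUT OF 6 ON mov_id)
WHERE rel_year = '2015';
```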

2021-05-30

2mbi3lxu3#

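There is one important thing to consider when bucketing in Hive.
The same column cannot be used for both bucketing and partitioning. The reason is as follows:
Clustering and sorting happen within a partition. Within each partition there is only one value for the partition column (year, in your case), so clustering or sorting on that column would have no effect. That is why you get the error.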
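
Applied to the table in the question, this means keeping year as the partition column and bucketing on one of the regular columns instead. The sketch below uses shape as the bucketing column; that choice is arbitrary and only an assumption, and any non-partition column would be accepted. It also assumes the ufodata table from the question is already loaded.

```sql
DROP TABLE IF EXISTS partufo;

-- Partition by year, bucket by a regular column (shape is an arbitrary choice).
CREATE TABLE partufo (
  sighted STRING, reported STRING, city STRING,
  shape STRING, duration STRING, description STRING
)
PARTITIONED BY (year STRING)
CLUSTERED BY (shape) INTO 6 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Needed on older Hive releases; later releases enforce bucketing on their own.
SET hive.enforce.bucketing=true;

-- The derived year value goes last so it maps to the dynamic partition column.
INSERT OVERWRITE TABLE partufo PARTITION (year)
SELECT sighted, reported, city, shape, duration, description,
       SUBSTR(TRIM(sighted), 1, 4) AS year
FROM ufodata;
```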

2021-05-29