Can we create a table in Hive that has both partitioning and bucketing?

a6b3iqyw  posted on 2021-05-29 in Hadoop

Is it possible to create a Hive table that is both partitioned and bucketed?

93ze6v8z  #1

Yes, you can. In that case, the data inside each partition is further divided into buckets.

vsmadaxz  #2

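Yes.
Partitioning splits the data into multiple directories on HDFS, and each directory is one partition. For example, if the table is defined as follows: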

CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)
COMMENT 'A bucketed copy of user_info'
PARTITIONED BY(ds STRING)
CLUSTERED BY(user_id) INTO 256 BUCKETS;

then HDFS will contain directories like these:

/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/
/user/hive/warehouse/user_info_bucketed/ds=2011-01-13/

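Bucketing controls how the data inside each partition is distributed across files: each row is assigned to one of the 256 buckets by hashing user_id, so every partition directory on HDFS will contain files like these: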

/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000000_0
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000001_0
...
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000255_0
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000000_0
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000001_0
...
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000255_0
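The bucket files only appear once data has been written through Hive, so a rough sketch of populating one partition may help. It assumes a hypothetical unbucketed source table named user_info with the same columns plus ds; that table is not part of the definition above. The hive.enforce.bucketing property applies to Hive 1.x and earlier only.

```sql
-- On Hive 1.x and earlier this makes the insert produce one file per bucket.
-- Omit this line on Hive 2.x and later, where bucketing is always enforced.
SET hive.enforce.bucketing = true;

-- user_info is a hypothetical source table (user_id, firstname, lastname, ds).
INSERT OVERWRITE TABLE user_info_bucketed PARTITION (ds = '2011-01-11')
SELECT user_id, firstname, lastname
FROM user_info
WHERE ds = '2011-01-11';

-- Read one bucket of one partition: the WHERE clause selects the partition
-- directory, and TABLESAMPLE selects the bucket within it.
SELECT *
FROM user_info_bucketed TABLESAMPLE (BUCKET 1 OUT OF 256 ON user_id)
WHERE ds = '2011-01-11';
```

Loading files directly with LOAD DATA would not redistribute rows into buckets, which is why the sketch goes through INSERT ... SELECT.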

References:
https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl+bucketedtables
http://www.hadooptpoint.com/hive-buckets-optimization-techniques/

inn6fuwd  #3

Yes, this is straightforward.
Try the following:

CREATE TABLE IF NOT EXISTS employee_partition_bucket
( 
employeeID Int,
firstName String,
designation String,
salary Int
) 
PARTITIONED BY (department string)
CLUSTERED BY (designation) INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

In this example, I created partitions on department and buckets on designation.
Hope this helps.
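To fill a table like this, one possible approach is a dynamic-partition insert, sketched below. The staging table employee_staging is hypothetical (assumed to hold the four data columns plus department), and the two SET lines are only needed where dynamic partitioning is not already enabled.

```sql
-- Allow Hive to create partitions from the values of the department column.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- employee_staging is a hypothetical plain table:
-- (employeeID, firstName, designation, salary, department).
-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE employee_partition_bucket PARTITION (department)
SELECT employeeID, firstName, designation, salary, department
FROM employee_staging;

-- List the partitions that were created, one per distinct department.
SHOW PARTITIONS employee_partition_bucket;
```

Each department directory should then hold up to 2 bucket files, with rows placed according to the hash of designation.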
