Split a Hive table into subtables by field value

Posted by lymnna71 on 2021-06-26 in Hive


I have a Hive table foo with several fields. One of them is some_id, which has between 5,000 and 10,000 distinct values. For each value (e.g. 10385) I need to run a CTAS query like this:

CREATE TABLE bar_10385 AS 
SELECT * FROM foo WHERE some_id=10385 AND other_id=10385;

What is the best way to run this whole series of queries?
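If you really need one physical table per id, a common approach is to generate all the statements into one script and run that file once, for example with hive -f create_bar_tables.hql. The sketch below is a hypothetical Python helper (the function names and file name are assumptions, not part of Hive). It assumes you have already exported the distinct ids, e.g. from SELECT DISTINCT some_id FROM foo. Note that every generated CTAS scans foo again, so 5,000-10,000 statements mean thousands of full scans.

```python
"""Sketch: write one CTAS statement per id into a single HiveQL script.

Assumes the distinct ids were fetched beforehand (e.g. with
SELECT DISTINCT some_id FROM foo) and are passed in as integers.
"""
from typing import Iterable


def ctas_statement(some_id: int) -> str:
    # Mirrors the question's query: both columns must equal the id.
    return (
        f"CREATE TABLE bar_{some_id} AS\n"
        f"SELECT * FROM foo WHERE some_id={some_id} AND other_id={some_id};"
    )


def build_ctas_script(ids: Iterable[int]) -> str:
    # int() keeps arbitrary text out of table names and SQL;
    # the set removes duplicates so no table is created twice.
    unique_ids = sorted({int(i) for i in ids})
    return "\n\n".join(ctas_statement(i) for i in unique_ids) + "\n"


if __name__ == "__main__":
    # Save this output as e.g. create_bar_tables.hql and run it with hive -f.
    print(build_ctas_script([10386, 10385, 10385]))
```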

Hive hiveql batch-processing

Source: https://stackoverflow.com/questions/52077930/split-hive-table-on-subtables-by-field-value

1 answer


Answer #1 by nue99wik

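You can store all of these tables in a single partitioned table. This lets you load all the data with a single query, and query performance does not suffer, because a query that filters on the partition key reads only the matching partition.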

Create table T (
... --columns here
) 
partitioned by (id int); --new calculated partition key

Load the data with a single query; it reads the source table only once:

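set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;     --no static partition value is given
set hive.exec.max.dynamic.partitions=20000;         --default is 1000; must exceed the number of ids
set hive.exec.max.dynamic.partitions.pernode=20000; --default is 100

insert overwrite table T partition(id)
select ..., --columns
       case when some_id=10385 AND other_id=10385 then 10385 
            when some_id=10386 AND other_id=10386 then 10386
            ...
            --and so on
            else 0 --default partition for records not attributed
        end as id --partition column, must be the last column selected
   from foo
  where some_id in (10385,10386) AND other_id in (10385,10386) --filter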
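With 5,000-10,000 ids, typing the CASE branches and IN lists by hand is impractical. The following is a minimal sketch of a hypothetical Python helper that generates the statement above for any id list; columns stands in for foo's real column list, which the answer elides as "...", and the example column names are made up.

```python
"""Sketch: generate the answer's INSERT OVERWRITE ... PARTITION(id) statement
for an arbitrary list of ids. `columns` is a hypothetical placeholder for
foo's real column list (elided as "..." in the answer).
"""
from typing import Sequence


def build_partition_insert(columns: Sequence[str], ids: Sequence[int]) -> str:
    unique_ids = sorted({int(i) for i in ids})
    if not unique_ids:
        raise ValueError("need at least one id")
    id_list = ",".join(str(i) for i in unique_ids)
    # One WHEN branch per id, same pattern as the answer.
    branches = "\n".join(
        f"            when some_id={i} AND other_id={i} then {i}"
        for i in unique_ids
    )
    select_cols = ",\n       ".join(columns)
    return (
        "insert overwrite table T partition(id)\n"
        f"select {select_cols},\n"
        "       case\n"
        f"{branches}\n"
        "            else 0 --default partition for records not attributed\n"
        "        end as id --partition column must come last\n"
        "  from foo\n"
        f" where some_id in ({id_list}) AND other_id in ({id_list});"
    )


if __name__ == "__main__":
    print(build_partition_insert(["some_id", "other_id", "payload"], [10385, 10386]))
```

Because every branch has the form some_id=X AND other_id=X then X, the matched rows could arguably also be loaded without a CASE at all, by selecting some_id as id with where some_id = other_id (plus the IN filter if only some ids are wanted). This keeps the query text short, but it drops the default partition 0 instead of filling it.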

You can then query this table by specifying the partition:

select * from T where id = 10385; --partition pruning makes this fast: only that partition is read
create view bar_10385 as select * from T where id = 10385; --optional: a view named like your table that acts the same way
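If existing queries expect tables named bar_<id>, one view per id can be generated the same way. This is a hypothetical Python sketch; CREATE VIEW IF NOT EXISTS is standard HiveQL, but note that SELECT * on T also returns the partition column id, so each view has one more column than the original CTAS table would have had.

```python
"""Sketch: emit one CREATE VIEW per id on top of the partitioned table T,
so queries that reference bar_<id> keep working.
"""
from typing import Iterable


def build_view_script(ids: Iterable[int]) -> str:
    # int() keeps the generated names and literals numeric; set() deduplicates.
    statements = [
        f"CREATE VIEW IF NOT EXISTS bar_{i} AS SELECT * FROM T WHERE id = {i};"
        for i in sorted({int(x) for x in ids})
    ]
    return "\n".join(statements) + "\n"


if __name__ == "__main__":
    print(build_view_script([10386, 10385]))
```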
