如何将一个分区的数据插入/复制到配置单元中的多个分区?

disbfnqx  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(466)

我有关于 day='2019-01-01' 在我的配置单元表中,我想将相同的数据复制到整个2019年1月(i、 e.英寸 '2019-01-02' , '2019-01-03' ... '2019-01-31' )
我正在尝试以下内容,但数据仅插入“2019-01-02”,而不是插入“2019-01-03”。

INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02', dt='2019-01-03')
SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';
0s7z1bwu

0s7z1bwu1#

将所有数据与所需日期范围的日历日期交叉连接。使用动态分区:

set hivevar:start_date=2019-01-02; 
set hivevar:end_date=2019-01-31; 

set hive.exec.dynamic.partition=true; 
set hive.exec.dynamic.partition.mode=nonstrict;  

with date_range as 
(--this query generates date range
select date_add ('${hivevar:start_date}',s.i) as dt 
  from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s
)

INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
SELECT id, name, marks, r.dt --partition column is the last one
  FROM db_t.students s 
       CROSS JOIN date_range r
 WHERE s.dt='2019-01-01'
DISTRIBUTE BY r.dt;

另一种可能的解决方案是使用 hadoop fs -cp 或者 hadoop distcp (对shell中的每个分区或use循环重复此步骤):

hadoop fs -cp '/usr/warehouse/students/dt=2019-01-01' '/usr/warehouse/students/dt=2019-01-02'

还有一个使用union all的解决方案:

set hive.exec.dynamic.partition=true; 
    set hive.exec.dynamic.partition.mode=nonstrict;      

    INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
    SELECT id, name, marks, '2019-01-02' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
    UNION ALL
     SELECT id, name, marks, '2019-01-03' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
    UNION ALL
     SELECT id, name, marks, '2019-01-04' as dt FROM db_t.students s WHERE s.dt='2019-01-01' 
    UNION ALL
    ... 
  ;

相关问题