sql - Selecting incremental data from multiple tables in Hive

Posted by jgzswidk on 2021-05-29 in Hadoop


I have five tables (a, b, c, d, e) in a Hive database, and I have to merge the data from these tables based on logic on the "id" column.
The condition is:

select * from A
UNION
select * from B (only ids not in A)
UNION
select * from C (only ids not in A and B)
UNION
select * from D (only ids not in A, B and C)
UNION
select * from E (only ids not in A, B, C and D)

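This data has to be inserted into a final table.
One approach is to create a target table (target), append the data from each UNION stage to it, and then join this table against the next UNION stage.
This would be part of my .hql file: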

insert into target 
(select * from A
UNION 
select B.* from 
A 
RIGHT OUTER JOIN B
on A.id=B.id
where ISNULL(A.id));
INSERT INTO target
select C.* from 
target 
RIGHT outer JOIN C
ON target.id=C.id
where ISNULL(target.id);
INSERT INTO target
select D.* from 
target 
RIGHT OUTER JOIN D
ON target.id=D.id
where ISNULL(target.id);
INSERT INTO target
select E.* from 
target 
RIGHT OUTER JOIN E
ON target.id=E.id
where ISNULL(target.id);

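Is there a better way to achieve this? I assume we have to do multiple joins/lookups either way. I am looking for the best way to achieve this in:
1) Hive on Tez
2) Spark SQL
Many thanks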

sql hadoop Hive apache-spark-sql apache

Source: https://stackoverflow.com/questions/45326914/selecting-incremental-data-from-multiple-tables-in-hive

2 Answers


9udxz4iz1#

If id is unique within each table, row_number can be used instead of rank.

select      *
from       (select      *
                       ,rank () over
                        (
                            partition by    id
                            order by        src
                        )                           as rnk
            from        (           
                                    select 1 as src,* from a
                        union all   select 2 as src,* from b
                        union all   select 3 as src,* from c
                        union all   select 4 as src,* from d
                        union all   select 5 as src,* from e
                        ) t
            ) t
where       rnk = 1
;
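
Since the question ends with an insert into the final table, here is a sketch of the row_number variant feeding target directly. It assumes id is unique within each table and that all five tables and target share the same columns; col1 and col2 are hypothetical column names standing in for the real ones. The columns are listed explicitly because select * on the outer query would also return the helper columns (src and the row number).

```sql
-- Sketch only: col1 and col2 are hypothetical placeholders for the real columns.
insert into table target
select  id, col1, col2
from   (select  src, id, col1, col2,
                -- number the rows of each id, lowest src (earliest table) first
                row_number() over (partition by id order by src) as rn
        from   (          select 1 as src, id, col1, col2 from a
                union all select 2 as src, id, col1, col2 from b
                union all select 3 as src, id, col1, col2 from c
                union all select 4 as src, id, col1, col2 from d
                union all select 5 as src, id, col1, col2 from e
               ) u
       ) r
where   rn = 1;   -- keep only the row from the highest-priority table
```

This reads each table once and avoids the repeated joins against target, at the cost of one shuffle on id for the window function.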

2021-05-29

mw3dktmi2#

I think I would do it like this:

with ids as (
      select id, min(which) as which
      from (select id, 1 as which from a union all
            select id, 2 as which from b union all
            select id, 3 as which from c union all
            select id, 4 as which from d union all
            select id, 5 as which from e
           ) x
      group by id
     )
select a.*
from a join ids on a.id = ids.id and ids.which = 1
union all
select b.*
from b join ids on b.id = ids.id and ids.which = 2
union all
select c.*
from c join ids on c.id = ids.id and ids.which = 3
union all
select d.*
from d join ids on d.id = ids.id and ids.which = 4
union all
select e.*
from e join ids on e.id = ids.id and ids.which = 5;
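
Whichever query is used, one way to sanity-check the result is to confirm that no id appears more than once in target. This is a minimal sketch and assumes id is unique within each source table; otherwise a single table could legitimately contribute repeated ids.

```sql
-- Returns one row per id that was loaded more than once; expect an empty result.
select   id, count(*) as cnt
from     target
group by id
having   count(*) > 1;
```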

2021-05-29
