sql—从配置单元中的多个表中选择增量数据

jgzswidk  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(370)

我在hive数据库中有五个表(a,b,c,d,e),我必须根据“id”列上的逻辑合并这些表中的数据。
条件是:

  1. Select * from A
  2. UNION
  3. select * from B (except ids not in A)
  4. UNION
  5. select * from C (except ids not in A and B)
  6. UNION
  7. select * from D(except ids not in A,B and C)
  8. UNION
  9. select * from E(except ids not in A,B,C and D)

必须将此数据插入最终表中。
一种方法是在目标表(target)中创建一个表,并为其附加每个联合阶段的数据,然后使用此表与另一个联合阶段进行连接。
这将是我的.hql文件的一部分:

  1. insert into target
  2. (select * from A
  3. UNION
  4. select B.* from
  5. A
  6. RIGHT OUTER JOIN B
  7. on A.id=B.id
  8. where ISNULL(A.id));
  9. INSERT INTO target
  10. select C.* from
  11. target
  12. RIGHT outer JOIN C
  13. ON target.id=C.id
  14. where ISNULL(target.id);
  15. INSERT INTO target
  16. select D.* from
  17. target
  18. RIGHT OUTER JOIN D
  19. ON target.id=D.id
  20. where ISNULL(target.id);
  21. INSERT INTO target
  22. select E.* from
  23. target
  24. RIGHT OUTER JOIN E
  25. ON target.id=E.id
  26. where ISNULL(target.id);

有没有更好的方法来实现这一点?我假设我们无论如何都要做多重连接/查找。我期待着在将来找到实现这一点的最佳方法
1) 泰兹的Hive
2) Sparksql
非常感谢

9udxz4iz

9udxz4iz1#

如果 id 在每个表中是唯一的 row_number 可以用来代替 rank .

  1. select *
  2. from (select *
  3. ,rank () over
  4. (
  5. partition by id
  6. order by src
  7. ) as rnk
  8. from (
  9. select 1 as src,* from a
  10. union all select 2 as src,* from b
  11. union all select 3 as src,* from c
  12. union all select 4 as src,* from d
  13. union all select 5 as src,* from e
  14. ) t
  15. ) t
  16. where rnk = 1
  17. ;
展开查看全部
mw3dktmi

mw3dktmi2#

我想我应该这样做:

  1. with ids as (
  2. select id, min(which) as which
  3. from (select id, 1 as which from a union all
  4. select id, 2 as which from b union all
  5. select id, 3 as which from c union all
  6. select id, 4 as which from d union all
  7. select id, 5 as which from e
  8. ) x
  9. )
  10. select a.*
  11. from a join ids on a.id = ids.id and ids.which = 1
  12. union all
  13. select b.*
  14. from b join ids on b.id = ids.id and ids.which = 2
  15. union all
  16. select c.*
  17. from c join ids on c.id = ids.id and ids.which = 3
  18. union all
  19. select d.*
  20. from d join ids on d.id = ids.id and ids.which = 4
  21. union all
  22. select e.*
  23. from e join ids on e.id = ids.id and ids.which = 5;
展开查看全部

相关问题