hive在最近的日期左连接

iezvtpos  于 2021-05-29  发布在  Hadoop
关注(0)|答案(4)|浏览(450)

我试图在连接时使用一个键和两个表中最近的日期连接配置单元中的两个表。例如:下面是两个输入表

  1. <----------TABLE A-------------> <------------TABLE B------------>
  2. A_id A_date changed_col B_id B_date B_value A_id
  3. ********** *********** ********** ************
  4. A01 2017-03-20 ABC B01 2017-04-02 200 A01
  5. A01 2017-04-01 XYZ B01 2017-04-04 500 A01
  6. A01 2017-04-05 LLL

但是,当我将表b与表a连接在一起时,它应该在表a中为同一个键(a\ id)查找最近的最低日期。以下是预期输出表:

  1. B_id B_date A_id A_date changed col B_value
  2. ********** **** ****** ******************
  3. B01 2017-02-04 A01 2017-01-04 XYZ 200
  4. B01 2017-04-04 A01 2017-01-04 XYZ 500

非常感谢您的帮助。谢谢

zysjyyx4

zysjyyx41#

  1. select B.B_id
  2. ,B.B_date
  3. ,B.A_id
  4. ,A.A_date
  5. ,A.changed_col
  6. ,B_value
  7. from B
  8. left join (select *
  9. from (select B.B_id
  10. ,A.A_date
  11. ,A.changed_col
  12. ,row_number () over
  13. (
  14. partition by B.B_id
  15. order by A.A_date desc
  16. ) as rn
  17. from B
  18. join A
  19. on A.A_id = B.A_id
  20. where A.A_date <= B.B_date
  21. ) A
  22. where rn = 1
  23. ) A
  24. on A.B_id =
  25. B.B_id
  1. +------+------------+------+------------+-------------+---------+
  2. | b_id | b_date | a_id | a_date | changed_col | b_value |
  3. +------+------------+------+------------+-------------+---------+
  4. | B01 | 2017-04-02 | A01 | 2017-04-01 | XYZ | 200 |
  5. | B01 | 2017-04-04 | A01 | 2017-04-01 | XYZ | 500 |
  6. +------+------------+------+------------+-------------+---------+
展开查看全部
ycl3bljg

ycl3bljg2#

  1. select B.B_id
  2. ,B.B_date
  3. ,B.A_id
  4. ,A.A_data.A_date as A_date
  5. ,A.A_data.changed_col as changed_col
  6. ,B_value
  7. from B
  8. left join (select B.B_id
  9. ,max (named_struct ('A_date',A_date,'changed_col',changed_col)) as A_data
  10. from B
  11. join A
  12. on A.A_id = B.A_id
  13. where A.A_date <= B.B_date
  14. group by B.B_id
  15. ) A
  16. on A.B_id =
  17. B.B_id
  1. +------+------------+------+------------+-------------+---------+
  2. | b_id | b_date | a_id | a_date | changed_col | b_value |
  3. +------+------------+------+------------+-------------+---------+
  4. | B01 | 2017-04-02 | A01 | 2017-04-01 | XYZ | 200 |
  5. | B01 | 2017-04-04 | A01 | 2017-04-01 | XYZ | 500 |
  6. +------+------------+------+------------+-------------+---------+
展开查看全部
lrpiutwd

lrpiutwd3#

另一种选择是:

  1. select B.B_id, B.B_date, B.A_id, A1.A_id, A1.A_date, A1.changed_col, B.B_value
  2. from
  3. TABLE_B as B,
  4. (
  5. select A_id, A_date, chaged_col
  6. from TABLE_A as A
  7. where A_date = (select min(A_date) from TABLE_A where A_id = A.A_id) ) as A1
  8. where B.A_id = A.A_id
gk7wooem

gk7wooem4#

  1. select B_id
  2. ,dt as B_date
  3. ,A_id
  4. ,A_data.A_date as A_date
  5. ,A_data.changed_col
  6. ,B_value
  7. from (select B_id,dt,B_value,A_id,tab
  8. ,max
  9. ( case
  10. when tab = 'A'
  11. then named_struct ('A_date',dt,'changed_col',changed_col)
  12. end
  13. ) over
  14. (
  15. partition by A_id
  16. order by dt,tab
  17. rows between unbounded preceding
  18. and current row
  19. ) as A_data
  20. from (select B_id,B_date as dt,B_value,A_id
  21. ,'B' as tab,null as changed_col
  22. from B
  23. union all
  24. select null as B_id,A_date as dt,null as B_value,A_id
  25. ,'A' as tab,changed_col
  26. from A
  27. ) t
  28. ) t
  29. where tab = 'B'
  1. +------+------------+------+------------+-------------+---------+
  2. | b_id | b_date | a_id | a_date | changed_col | b_value |
  3. +------+------------+------+------------+-------------+---------+
  4. | B01 | 2017-04-02 | A01 | 2017-04-01 | XYZ | 200 |
  5. | B01 | 2017-04-04 | A01 | 2017-04-01 | XYZ | 500 |
  6. +------+------------+------+------------+-------------+---------+
展开查看全部

相关问题