使用Hive我需要找到第一个在一个特定的ID和最后一个出相同的同一会话,而不是最后一个出的一天

hujrc8aj  于 2023-10-18  发布在  Hive
关注(0)|答案(1)|浏览(142)

我有这样的数据:

  1. +------+----------+---------+---------+-------------------+----+----------+
  2. | ID | punchdate|punchtype|punchtime| punchdatetime| uuu| feed_date|
  3. +------+----------+---------+---------+-------------------+----+----------+
  4. | 1000|2023-08-08| IN| 05:40:00|2023-08-08 05:40:00| 002|2023-08-08|
  5. | 1000|2023-08-08| OUT| 06:24:00|2023-08-08 05:24:00| 002|2023-08-08|
  6. | 10008|2023-08-08| IN| 05:44:00|2023-08-08 05:44:00| 001|2023-08-08|
  7. | 10008|2023-08-08| OUT| 05:46:00|2023-08-08 05:46:00| 001|2023-08-08|
  8. | 10008|2023-08-08| IN| 05:54:00|2023-08-08 05:54:00| 001|2023-08-08|
  9. | 10008|2023-08-08| OUT| 06:00:00|2023-08-08 06:00:00| 001|2023-08-08|
  10. |000003|2023-08-08| IN| 05:44:00|2023-08-08 05:44:00| 001|2023-08-08|
  11. |000003|2023-08-08| OUT| 05:46:00|2023-08-08 05:46:00| 001|2023-08-08|
  12. |000003|2023-08-08| IN| 05:54:00|2023-08-08 05:54:00| 001|2023-08-08|
  13. +------+----------+---------+---------+-------------------+----+----------+
  14. Need to extract like this in hive
  15. +-------+----------+-----------------------+---------------------+----+----------+
  16. | id | punchdate| punchINtime | punchOUTtime | uuu| feed_date|
  17. +-------+----------+-----------------------+---------------------+----+----------+
  18. | 1000|2023-08-08| 2023-08-08 05:40:00 | 2023-08-08 06:24:00 | 002|2023-08-08|
  19. | 10008|2023-08-08| 2023-08-08 05:44:00 | 2023-08-08 05:46:00 | 001|2023-08-08|
  20. | 10008|2023-08-08| 2023-08-08 05:54:00 | 2023-08-08 06:00:00 | 001|2023-08-08|
  21. | 000003|2023-08-08| 2023-08-08 05:44:00 | 2023-08-08 05:46:00 | 001|2023-08-08|
  22. | 000003|2023-08-08| 2023-08-08 05:54:00 | current_timestamp | 001|2023-08-08|
  23. +-------+----------+-----------------------+---------------------+----+----------+

我已经尝试了几个查询使用最小,最大,领先和滞后。可以帮助我在如何写这个在Hive

rqenqsqc

rqenqsqc1#

我想这应该对你有用:

  1. SELECT
  2. id,
  3. punchdate,
  4. punchINtime,
  5. punchOUTtime,
  6. uuu,
  7. feed_date
  8. FROM (
  9. SELECT
  10. id,
  11. punchdate,
  12. MIN(CASE WHEN punchtype = 'IN' THEN punchdatetime END) AS punchINtime,
  13. MAX(CASE WHEN punchtype = 'OUT' THEN punchdatetime END) AS punchOUTtime,
  14. uuu,
  15. feed_date,
  16. ROW_NUMBER() OVER (PARTITION BY id, feed_date ORDER BY punchdatetime) AS rn
  17. FROM table
  18. GROUP BY id, punchdate, uuu, feed_date
  19. ) subquery
  20. WHERE punchINtime IS NOT NULL OR rn = 1;
展开查看全部

相关问题