配置单元查询中的sql临时连接(时间上非常接近的事件)

y53ybaqx  于 2021-05-29  发布在  Hadoop
关注(0)|答案(1)|浏览(393)

我需要一个我很难理解的Hive查询。
我有一个像这样的时间序列:

time                          source    word1   word2    ...etc
2012-02-01 23:43:16.9988243    0001      2B3B    FAF0
2012-02-01 23:43:16.9993561    0002      2326    ABAA
2012-02-01 23:43:16.9998879    0002      2327    ABAA

我需要一个这样的查询,如果 source 满足特定条件时,除了该记录外,它还应该从第二个记录开始及时返回一个或多个非常接近的记录 source 满足不同的条件。
到目前为止,我的尝试是这样的:

SELECT time
    FROM messages C
    JOIN messages D on
       D.time
           BETWEEN C.time - INTERVAL '0.001' SECOND
           AND     C.time + INTERVAL '0.001' SECOND

    WHERE C.source = '0001'
    AND   D.Source = '0002'
    AND   C.word1 = '2B3B'
    AND   D.word2 = 'ABAA'

它应该返回上面示例数据中的第一个和第二个记录(不应该返回第三个,因为时间比.001秒远)。
但查询不起作用。错误消息是

FAILED: SemanticException '0.001' encountered with 0 children
bis0qfac

bis0qfac1#

这将是一个幼稚的解决方案:

select  *

from                    messages c
        cross join      messages m 

where   m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second

    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'

;
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
|            time            | source | word1 | word2 |            time            | source | word1 | word2 |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
| 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |   0002 |  2326 | ABAA  |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+

这是性能良好的解决方案

select  *

from                    messages c

        join            messages m

        on              floor (cast(c.time as decimal(37,7)) / (2 * 0.001))   =
                        floor (cast(m.time as decimal(37,7)) / (2 * 0.001))

where   m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second

    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'

union all

select  *

from                    messages c

        join            messages m

        on              floor ((cast(c.time as decimal(37,7)) + 0.001) / (2 * 0.001))   =
                        floor ((cast(m.time as decimal(37,7)) + 0.001) / (2 * 0.001))

where   floor (cast(c.time as decimal(37,7)) / (2 * 0.001))     <>
        floor (cast(m.time as decimal(37,7)) / (2 * 0.001))

    and m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second

    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
|            time            | source | word1 | word2 |           _col4            | _col5 | _col6 | _col7 |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
| 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |  0002 |  2326 | ABAA  |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+

插图

事件a和b将被工会的上层所有人抓住。
事件b和c将被工会的下半部分抓住。

0        0.002    0.004    0.006    0.008    0.01      
    |        |        |        |        |        |
-------------------------------------------------------
                      |        |
                      |        |
                          A  B  C
                           |        |
                           |        |
-------------------------------------------------------
         |        |        |        |        |                
         0.001    0.003    0.005    0.007    0.009

相关问题