为什么在where子句中包含null时,记录却从左join中删除

xhv8bpkk  于 2021-06-25  发布在  Hive
关注(0)|答案(3)|浏览(284)

我运行以下查询:

create table c.hello as
select a.*, b.timestamp, b.alert 
from nice a
left join bye b 
  on a.number = b.number_nb 
where (Unix_Timestamp(a.time) - Unix_Timestamp(b.timestamp) >= 0)
  and (Unix_Timestamp(a.time) - Unix_Timestamp(b.timestamp) <= 86400) 
   or b.alert_timestamp is null;

为什么我的hello表返回的记录比nice表少?如何解决此问题,我希望保留表nice中的所有记录。我以为where子句中的or语句可以处理这个问题,但不知道为什么不能。我知道where子句将查询转换为内部连接,但是我认为where子句with is null应该可以解决这个问题。你能帮忙吗?

kyks70gy

kyks70gy1#

您可以通过将相关条件移到 ON 条款:

create table c.hello as
    select a.*, b.timestamp, b.alert 
    from nice a left join
         bye b 
         on a.number = b.number_nb and
            (Unix_Timestamp(a.time) - Unix_Timestamp(b.timestamp) >= 0) and
            (Unix_Timestamp(a.time) - Unix_Timestamp(b.timestamp) <= 86400) ;

代码失败的原因是 a 在中有匹配的行 b 这两个条件都不满足。因此,这两个时间限制没有得到满足。价值不是 null 因为有匹配的行。

lqfhib0f

lqfhib0f2#

有一些最小的可复制的例子总是有帮助的。让我们考虑以下示例:

with A as (
select stack (3,
1, '2019-11-10 10:10:10.000',
2, '2019-12-10 10:10:10.000',
4, '2019-25-10 10:10:10.000'
) as (number_nb,ts)
),

B as (
select stack (2,
1, '2019-11-10 10:10:10.000',
2, '2019-25-10 10:10:10.000' --big difference in timestamp here
) as (number_nb,ts)
)

select a_number_nb, a_ts, 
       b_number_nb, b_ts,
       ts_diff
from
(
select a.number_nb as a_number_nb,a.ts as a_ts, b.number_nb as b_number_nb, b.ts as b_ts, Unix_Timestamp(a.ts) - Unix_Timestamp(b.ts) as ts_diff  
  from A a 
       left join B b on a.number_nb=b.number_nb
)s;

退货:

a_number_nb a_ts                    b_number_nb b_ts                    ts_diff
1           2019-11-10 10:10:10.000 1           2019-11-10 10:10:10.000 0
2           2019-12-10 10:10:10.000 2           2019-25-10 10:10:10.000 -34300800
4           2019-25-10 10:10:10.000 NULL        NULL                    NULL

现在如果你加上where子句

where (ts_diff between 0 and 86400) or b_number_nb is NULL

甚至这个:

where (ts_diff between 0 and 86400) or ts_diff is NULL

结果是:

a_number_nb a_ts                    b_number_nb b_ts                    ts_diff
1           2019-11-10 10:10:10.000 1           2019-11-10 10:10:10.000 0
4           2019-25-10 10:10:10.000 NULL        NULL                    NULL

过滤掉第二行,因为不满足(0和86400之间的ts_差异)。
如果你真的想要这个结果:

a_number_nb a_ts                    b_number_nb b_ts
1           2019-11-10 10:10:10.000 1           2019-11-10 10:10:10.000
2           2019-12-10 10:10:10.000 NULL        NULL
4           2019-25-10 10:10:10.000 NULL        NULL

然后删除where条件,改用用例语句:

... --Use same CTEs as in first example, skipped

select a_number_nb, a_ts, 
       case when (ts_diff between 0 and 86400) then b_number_nb end b_number_nb , --NULL if ts_diff NOT between 0 and 86400
       case when (ts_diff between 0 and 86400) then b_ts        end b_ts
       --ts_diff
from
(
select a.number_nb as a_number_nb,a.ts as a_ts, b.number_nb as b_number_nb, b.ts as b_ts, Unix_Timestamp(a.ts) - Unix_Timestamp(b.ts) as ts_diff  
  from A a 
       left join B b on a.number_nb=b.number_nb
)s
6rvt4ljy

6rvt4ljy3#

因为你在 WHERE 子句,这就是筛选出行。
将该条件作为 JOIN predicate ,如:

create table c.hello as
select a.*, b.timestamp, b.alert 
from nice a
left join bye b 
  on a.number = b.number_nb 
 and (Unix_Timestamp(a.time) - Unix_Timestamp(b.timestamp) >= 0)
 and (Unix_Timestamp(a.time) - Unix_Timestamp(b.timestamp) <= 86400) 
  or b.alert_timestamp is null;

你看到区别了吗?

相关问题