Hive query drops NULL values with a "NOT column=value" WHERE clause

b5buobof  posted on 2021-05-29 in Hadoop

Sample data in table1:

year month day utmsource
2017 03    26  NULL
2017 03    27  NULL
2017 03    27  facebook
2017 03    27  newsletter
2017 03    27  banner
2017 03    27  facebook

Expected selection:

year month day utmsource
2017 03    27  NULL
2017 03    27  newsletter
2017 03    27  banner

My Hive queries:

-- result = 0, it did not include the NULL utmsource record
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource="facebook"

-- result = 1 the NULL utmsource record is included
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND (utmsource IS NULL OR NOT utmsource="facebook")

-- also returns 0, the NULL utmsource record is not included
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';

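Questions:
Can someone explain this behavior?
Is there a setting I can change so that I get the result of query 2 without adding the extra OR condition to my query? In other words, so that "not equals" includes NULL values in the result.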

whitzsjs1#

What you want is a NULL-safe equality (or inequality) operator. ANSI SQL has one called IS DISTINCT FROM. Hive appears to use the MySQL version, which is <=>. So you can do:

SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';

This operator is described in the documentation.
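
As for why the plain comparison drops the row: in SQL, comparing NULL to anything with = yields NULL rather than true or false, NOT applied to NULL is still NULL, and WHERE keeps only rows whose condition is true. A minimal sketch of the difference follows. It assumes a Hive version that allows SELECT without FROM and supports <=>; the column aliases are made up for illustration, and the parentheses just make the grouping explicit. The commented values follow the documented semantics of the operators.

```sql
-- Sketch only: aliases are illustrative, not from the original query.
SELECT
  CAST(NULL AS STRING) = 'facebook'          AS eq,           -- NULL
  NOT (CAST(NULL AS STRING) = 'facebook')    AS not_eq,       -- NULL, so WHERE drops the row
  CAST(NULL AS STRING) <=> 'facebook'        AS safe_eq,      -- false
  NOT (CAST(NULL AS STRING) <=> 'facebook')  AS not_safe_eq;  -- true, so WHERE keeps the row
```
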
I should also point out that you might find this a simpler formulation of the SELECT:

SELECT (COUNT(*) - COUNT(utmsource)) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';
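
This works because COUNT(*) counts every row, while COUNT(utmsource) skips rows where utmsource is NULL. A small sketch on inline data, assuming a Hive version that allows SELECT without FROM and UNION ALL inside a subquery; the three rows mirror the expected selection from the question, and the subquery alias t is made up:

```sql
-- Sketch only: inline rows stand in for the filtered table.
SELECT COUNT(*)                    AS all_rows,       -- 3
       COUNT(utmsource)            AS non_null_rows,  -- 2
       COUNT(*) - COUNT(utmsource) AS amountnull      -- 1
FROM (
  SELECT CAST(NULL AS STRING) AS utmsource
  UNION ALL
  SELECT 'newsletter' AS utmsource
  UNION ALL
  SELECT 'banner' AS utmsource
) t;
```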

Overall, though, this seems the simplest:

SELECT COUNT(*) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND utmsource IS NULL;

The comparison to 'facebook' is unnecessary, because a row whose utmsource is NULL can never equal 'facebook'.
