Change the default Hive result to some value

56lgkhnf  posted on 2021-05-29  in  Hadoop

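I am trying to get the count of duplicate records from a table, but for a particular partition no data is available, so Hive prints only "OK" as the result. Is it possible to replace this result with a value such as 0 or NULL? I have already tried nvl, coalesce, and case, and it still shows only "OK". The goal is just to check the duplicate count, so I need at least one value in the output.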

select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from  xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1

9jyewag0  #1

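It does not return any rows on an empty dataset because you are using group by together with a having filter. With no rows there is nothing to group, so group by produces no groups and the query returns no rows. The same query without group by and having returns 0: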

select  nvl(count(*),0) cnt, 'xyz' AS TABLE_NAME
from  xyz
where data_dt='20170423'

As a workaround, you can UNION ALL your query with a placeholder row of NULLs that is produced only when the dataset is empty:

select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from  xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1

UNION ALL --returns 1 row on empty dataset

select a.col1, a.col2, a.DUPLICATE_ROW_COUNT, a.TABLE_NAME
  from (select null col1, null col2, null AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
       ) a -- placeholder row; the inner join drops it when the dataset is non-empty
 inner join (
       select count(*) cnt -- returns 0 when the original query yields no rows
         from ( -- your original query
               select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
                 from xyz
                where data_dt='20170423'
                group by col1, col2
               having count(*) > 1
              ) q -- your original query
       ) s on s.cnt = 0

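You could also use a CTE ( WITH ) and WHERE NOT EXISTS instead of the inner join on your subquery; I have not tested that.
You can also fetch the result in the shell and test whether it is empty: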

dataset=$(hive -e "set hive.cli.print.header=false; [YOUR QUERY HERE]")

# test on empty dataset

if [[ -z "$dataset" ]] ; then 
  dataset=0
fi
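
The empty-output check above can also be wrapped in a small reusable function. This is only a sketch: `default_if_empty` and its optional fallback argument are hypothetical names, not part of Hive, and the query output is passed in as a plain string so the function can be tried without a Hive installation. It uses the portable `[ ... ]` test rather than `[[ ... ]]`.

```shell
# Hypothetical helper (not part of Hive): print the captured query output,
# or a fallback value (default 0) when that output is empty.
default_if_empty() {
  if [ -z "$1" ]; then
    printf '%s\n' "${2:-0}"
  else
    printf '%s\n' "$1"
  fi
}

# Stand-ins for what $(hive -e "...") would capture (assumed sample values).
empty_result=""
nonempty_result="a b 3 xyz"

default_if_empty "$empty_result"        # prints 0
default_if_empty "$nonempty_result"     # prints the row unchanged
default_if_empty "$empty_result" NULL   # prints NULL
```

With Hive, the captured output would be passed as the first argument, for example `default_if_empty "$dataset"`.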
