Change default Hive result to some values

Posted by 56lgkhnf on 2021-05-29 in Hadoop

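I am trying to get the count of duplicate records from a table, but for a particular partition no data is available, so Hive only prints "OK" as the result. Is it possible to change this result to a value such as 0 or NULL? I have already tried nvl, coalesce, and case, and Hive still only prints "OK". The goal is just to check the duplicate count, so I need at least one value to be returned.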

select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from  xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1

hadoop Hive apache-spark-sql

Source: https://stackoverflow.com/questions/43568451/change-default-hive-result-to-some-values

1 answer

9jyewag01#

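The query returns no rows on an empty dataset because you are using group by together with a having filter. With nothing to group, group by produces no groups, so no rows are returned. The same count without group by and having returns 0: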

select  nvl(count(*),0) cnt, 'xyz' AS TABLE_NAME
from  xyz
where data_dt='20170423'

As a workaround, you can UNION ALL the result with a row of NULLs that appears only when the dataset is empty:

select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from  xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1

UNION ALL --adds 1 row when the dataset is empty

select a.col1, a.col2, a.DUPLICATE_ROW_COUNT, a.TABLE_NAME
  from (select null col1, null col2, null AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
       )a --inner join returns no rows when the dataset is non-empty
      inner join (
select count(*) cnt from  --returns 0 on an empty dataset
( --your original query
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from  xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1
)q --your original query
)s on s.cnt=0

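You could also use a CTE (WITH) and WHERE NOT EXISTS instead of the inner join for your subquery; I have not tested this.
You can also capture the result in the shell and test for an empty value: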

dataset=$(hive -e "set hive.cli.print.header=false; [YOUR QUERY HERE]")

# test on empty dataset

if [[ -z "$dataset" ]] ; then 
  dataset=0
fi
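
The empty-result check above can be wrapped in a small helper so that the fallback value is reusable. This is a minimal sketch and not part of the original answer: `default_if_empty` and `run_query` are hypothetical names, and `run_query` is a stub that stands in for the `hive -e` call so the logic can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the given text, or a fallback when it is empty.
# $1 = query output, $2 = fallback value (defaults to 0 when omitted)
default_if_empty() {
  if [ -z "$1" ]; then
    printf '%s\n' "${2:-0}"
  else
    printf '%s\n' "$1"
  fi
}

# Hypothetical stub standing in for:
#   hive -e "set hive.cli.print.header=false; [YOUR QUERY HERE]"
# It prints nothing, which mimics a query that returns no rows.
run_query() {
  :
}

# Capture the (empty) query output and substitute the default.
dataset=$(default_if_empty "$(run_query)")
echo "$dataset"
```

To use it against a real table, replace the body of `run_query` with the actual `hive -e` command. Passing a second argument, for example `default_if_empty "$(run_query)" NULL`, changes the fallback value.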

Answered 2021-05-29
