sql—所有属性上null和not null计数的分析值

yrdbyhpb 于 2021-07-13 发布在 Hive

关注(0)|答案(2)|浏览(405)

我有一个表，它在下表中有3个属性，使用hive/sql我有一个由50多个属性组成的用例，这些属性要求我对这个值nullcount，notnull等进行分析，根据当前的用例，我不能使用pyspark/plsql进行任何数据转换，这是我遇到的障碍
为了演示的目的，假设我有3个属性，对于马来西亚和泰国的2个表，它们只需要notnull计数

describe malaysia
ColumnName |  Datatype
custid     |  String
productid  |  String
addressid  |  String
describe thailand
ColumnName |  Datatype
custid     |  String
productid  |  String
addressid  |  String

按照我目前的方法，我就是这样做的，

select COUNT(CASE WHEN custid IS NOT NULL then 1 ELSE NULL END) as "NullCount",
COUNT(CASE WHEN productid IS NOT NULL then 1 ELSE NULL END) as "NullCount",
COUNT(CASE WHEN addressid IS NOT NULL then 1 ELSE NULL END) as "NullCount" 
from malaysia 
union
select COUNT(CASE WHEN custid IS NOT NULL then 1 ELSE NULL END) as "NullCount",
COUNT(CASE WHEN productid IS NOT NULL then 1 ELSE NULL END) as "NullCount",
COUNT(CASE WHEN addressid IS NOT NULL then 1 ELSE NULL END) as "NullCount" 
from thailand

因为我知道每个表有50多个属性，所以查询会很长，我觉得这不是正确的方法，因为维护会很混乱和困难。我想知道是否有更好的方法产生同样的结果。

sql Hive bigdata

来源：https://stackoverflow.com/questions/66978815/profiling-values-for-null-and-not-null-count-on-all-attributes

2条答案

按热度按时间

6yoyoihd1#

修正你的数据模型！您应该将所有数据放在一个表中，表中有一列 country . 然后你可以简单地做：

select country,
       count(custid) as custid_not_null,
       count(productid) as productid_not_null,
       count(addressid) as addressid_not_null
from all_countries
group by country ;

请注意 count(<expression>) 统计非- NULL 表达式的值。这个 case 表达完全是多余的。
如果没有，您可以创建一个视图：

create view all_counties as
    select 'malaysia' as country, t.*
    from malaysia m
    union all
    select 'thailand' as country, t.*
    from thailand t
    union all
    . . .;

但我建议你把所有的数据放在一张表里。

展开查看全部

赞(0）回复(0）举报 2021-07-13

ztigrdn82#

您需要稍微更改一下sql。用sum代替count。为了更好的分析，我增加了总数。我没有任何简单的方法，但你可以把列列表放在xl中，并从中创建一个公式，自动创建这个字符串 SUM(CASE WHEN custid IS NOT NULL then 1 ELSE 0 END) as "Nullcustid" .

select 
COUNT(*) "TotalCount",
SUM(CASE WHEN custid IS NOT NULL then 1 ELSE 0 END) as "NullCust",
SUM(CASE WHEN productid IS NOT NULL then 1 ELSE  0 END) as "Nullprod",
SUM(CASE WHEN addressid IS NOT NULL then 1 ELSE 0 END) as "Nulladdress" 
from malaysia 
union
select 
COUNT(*) "TotalCount",
SUM(CASE WHEN custid IS NOT NULL then 1 ELSE 0 END) as "NullCust",
SUM(CASE WHEN productid IS NOT NULL then 1 ELSE  0 END) as "Nullprod",
SUM(CASE WHEN addressid IS NOT NULL then 1 ELSE 0 END) as "Nulladdress" 
from thailand

赞(0）回复(0）举报 2021-07-13

我来回答

sql—所有属性上null和not null计数的分析值

2条答案

相关问题

热门标签

最新问答