sql—所有属性上null和not null计数的分析值

yrdbyhpb  于 2021-07-13  发布在  Hive
关注(0)|答案(2)|浏览(405)

我有一个表,它在下表中有3个属性,使用hive/sql我有一个由50多个属性组成的用例,这些属性要求我对这个值nullcount,notnull等进行分析,根据当前的用例,我不能使用pyspark/plsql进行任何数据转换,这是我遇到的障碍
为了演示的目的,假设我有3个属性,对于马来西亚和泰国的2个表,它们只需要notnull计数

  1. describe malaysia
  2. ColumnName | Datatype
  3. custid | String
  4. productid | String
  5. addressid | String
  6. describe thailand
  7. ColumnName | Datatype
  8. custid | String
  9. productid | String
  10. addressid | String

按照我目前的方法,我就是这样做的,

  1. select COUNT(CASE WHEN custid IS NOT NULL then 1 ELSE NULL END) as "NullCount",
  2. COUNT(CASE WHEN productid IS NOT NULL then 1 ELSE NULL END) as "NullCount",
  3. COUNT(CASE WHEN addressid IS NOT NULL then 1 ELSE NULL END) as "NullCount"
  4. from malaysia
  5. union
  6. select COUNT(CASE WHEN custid IS NOT NULL then 1 ELSE NULL END) as "NullCount",
  7. COUNT(CASE WHEN productid IS NOT NULL then 1 ELSE NULL END) as "NullCount",
  8. COUNT(CASE WHEN addressid IS NOT NULL then 1 ELSE NULL END) as "NullCount"
  9. from thailand

因为我知道每个表有50多个属性,所以查询会很长,我觉得这不是正确的方法,因为维护会很混乱和困难。我想知道是否有更好的方法产生同样的结果。

6yoyoihd

6yoyoihd1#

修正你的数据模型!您应该将所有数据放在一个表中,表中有一列 country . 然后你可以简单地做:

  1. select country,
  2. count(custid) as custid_not_null,
  3. count(productid) as productid_not_null,
  4. count(addressid) as addressid_not_null
  5. from all_countries
  6. group by country ;

请注意 count(<expression>) 统计非- NULL 表达式的值。这个 case 表达完全是多余的。
如果没有,您可以创建一个视图:

  1. create view all_counties as
  2. select 'malaysia' as country, t.*
  3. from malaysia m
  4. union all
  5. select 'thailand' as country, t.*
  6. from thailand t
  7. union all
  8. . . .;

但我建议你把所有的数据放在一张表里。

展开查看全部
ztigrdn8

ztigrdn82#

您需要稍微更改一下sql。用sum代替count。为了更好的分析,我增加了总数。我没有任何简单的方法,但你可以把列列表放在xl中,并从中创建一个公式,自动创建这个字符串 SUM(CASE WHEN custid IS NOT NULL then 1 ELSE 0 END) as "Nullcustid" .

  1. select
  2. COUNT(*) "TotalCount",
  3. SUM(CASE WHEN custid IS NOT NULL then 1 ELSE 0 END) as "NullCust",
  4. SUM(CASE WHEN productid IS NOT NULL then 1 ELSE 0 END) as "Nullprod",
  5. SUM(CASE WHEN addressid IS NOT NULL then 1 ELSE 0 END) as "Nulladdress"
  6. from malaysia
  7. union
  8. select
  9. COUNT(*) "TotalCount",
  10. SUM(CASE WHEN custid IS NOT NULL then 1 ELSE 0 END) as "NullCust",
  11. SUM(CASE WHEN productid IS NOT NULL then 1 ELSE 0 END) as "Nullprod",
  12. SUM(CASE WHEN addressid IS NOT NULL then 1 ELSE 0 END) as "Nulladdress"
  13. from thailand

相关问题