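I have a dataset made up of the following columns: model, mileage, manufacturer, engine displacement, engine power, body type, color, skt_year, transmission, door count, seat count, fuel type, creation date, view date, price. I want to see how many missing values each attribute has, show the number of missing values per column, and find the columns where more than 50% of the values are missing. How can I do this in Hive?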
bksxznpy1#
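You can write a SQL statement in Hive that counts how many null values a column has. Unfortunately, you need to compute this for each column separately.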
select
    tot_count,
    null_model count_null_model,
    100*null_model/tot_count percent_null_model,
    null_mileage count_null_mileage,
    100*null_mileage/tot_count percent_null_mileage
    -- ... repeat the count/percent pair for each remaining column
from
    (select count(*) tot_count,
        sum(if(mileage is null,1,0)) as null_mileage,
        sum(if(model is null,1,0)) as null_model
        -- ... one sum(if(...)) per remaining column
    from my_table) rs
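Here, sum(if(mileage is null,1,0)) as null_mileage counts how many null values the mileage column has. In the outer query, 100*null_mileage/tot_count percent_null_mileage computes the percentage of null values. If needed, you can add a filter on it, such as 100*null_mileage/tot_count > 50.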
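Because the query above returns a single wide row, a filter such as 100*null_mileage/tot_count > 50 keeps or drops that whole row rather than listing the offending columns. One possible way around this, sketched here as an assumption and not as part of the original answer, is to unpivot the counts with Hive's stack() table function so that each column becomes its own row. The aliases t, col_name and null_count are illustrative names, and only two of the columns are shown.

```sql
-- Sketch: one output row per column whose null share exceeds 50%.
-- my_table, model and mileage come from the answer above;
-- t, col_name and null_count are hypothetical aliases.
select
    col_name,
    null_count,
    100 * null_count / tot_count as percent_null
from
    (select count(*) as tot_count,
        sum(if(model is null, 1, 0)) as null_model,
        sum(if(mileage is null, 1, 0)) as null_mileage
    from my_table) rs
lateral view stack(
    2,                        -- number of rows to generate
    'model', null_model,
    'mileage', null_mileage
) t as col_name, null_count
where 100 * null_count / tot_count > 50;
```

To cover the remaining columns, add one sum(if(...)) per column to the inner query, add the matching name/count pair to stack(), and raise the row count in its first argument accordingly.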