Running multiple SQL queries in Hive/Impala to test for pass or fail

lqfhib0f  posted on 2021-06-25  in Hive

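I am running 100 queries (test cases) to check data quality in Hive/Impala. Most of the queries check for NULL values under some condition. I use conditional aggregation to evaluate trivial test cases like the ones below. I would like to add a more complex query condition to this kind of check, and I would also like to see the count whenever NULLs are present.
How can I incorporate the more complex query and add the count when NULLs exist? The expected output is shown below.
What I have so far: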

SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS' ELSE 'FAIL' END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS' ELSE 'FAIL' END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS' ELSE 'FAIL' END) as car_sale_test       
FROM car_data;

The more complex type of query I want to add:

SELECT Count(*), 
       car_job 
FROM   car_data 
WHERE  car_job NOT IN ( "car_type", "car_license", "car_cancellation", 
                        "car_color", "car_contract", "car_metal", "car_number" ) 
        OR car_job IS NULL 
GROUP  BY car_job

Example of the expected output:

car_type_test  car_color_test  car_sale_test  car_job_test
PASS           PASS             PASS           FAIL
                                               102
hvvq6cgz  #1

I would suggest putting the result and the count on one line rather than two:

-- COUNT(*) - COUNT(col) is the number of NULLs in col.
-- The explicit CAST is there because Impala does not implicitly convert numbers to STRING.
SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_type) AS STRING))
        END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_color) AS STRING))
        END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_sale) AS STRING))
        END) as car_sale_test
FROM car_data;
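
The query above covers only the NULL checks and leaves out the car_job condition from the question. One possible way to fold it in, offered as a sketch rather than as part of the original answer, is to count the offending rows with a conditional COUNT in an inner query and format the result in an outer one. The aliases bad_type, bad_job and t are made up for illustration, and the query assumes that a single total is wanted (as in the expected output) rather than one count per car_job value:

```sql
-- Sketch: the inner query counts failing rows, the outer query formats PASS / FAIL (n).
SELECT (CASE WHEN bad_type = 0 THEN 'PASS'
             ELSE CONCAT('FAIL (', CAST(bad_type AS STRING), ')')
        END) AS car_type_test,
       (CASE WHEN bad_job = 0 THEN 'PASS'
             ELSE CONCAT('FAIL (', CAST(bad_job AS STRING), ')')
        END) AS car_job_test
FROM (
    SELECT COUNT(*) - COUNT(car_type) AS bad_type,   -- number of NULL car_type values
           -- COUNT ignores NULLs, so only rows matching the condition are counted
           COUNT(CASE WHEN car_job IS NULL
                        OR car_job NOT IN ('car_type', 'car_license', 'car_cancellation',
                                           'car_color', 'car_contract', 'car_metal', 'car_number')
                      THEN 1 END) AS bad_job
    FROM car_data
) t;
```

The car_color and car_sale checks can be added the same way. Using COUNT(CASE ... THEN 1 END) instead of SUM means an empty table yields 0, and therefore PASS, rather than NULL.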
