Hive SQL: dynamically get null column counts from a table

Posted by bqujaahr on 2021-06-26 in Hive


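I am using the DataStax + Spark integration with the Spark SQL Thrift server, which gives me a Hive SQL interface for querying tables in Cassandra.
The tables in my database are created dynamically. What I want is a count of the null values in each column of a table, given only the table name.
I can get the column names with describe database.table, but in Hive SQL, how do I use its output in another select query that counts the nulls for all columns?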
Update 1: stack trace with Dudu's solution
Error running query: TExecuteStatementResp(status=TStatus(errorCode=0, errorMessage="org.apache.spark.sql.AnalysisException: Invalid usage of '*' in explode/json_tuple/UDTF;", sqlState=None, infoMessages=[
  "org.apache.hive.service.cli.HiveSQLException:org.apache.spark.sql.AnalysisException: Invalid usage of '*' in explode/json_tuple/UDTF;:16:15",
  'org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation:org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute:SparkExecuteStatementOperation.scala:258',
  'org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation:runInternal:SparkExecuteStatementOperation.scala:152',
  'org.apache.hive.service.cli.operation.Operation:run:Operation.java:257',
  'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:388',
  'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:369',
  'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:262',
  'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:437',
  'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313',
  'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298',
  'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
  'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
  'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
  'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286',
  'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142',
  'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617',
  'java.lang.Thread:run:Thread.java:745'
], statusCode=3), operationHandle=None)

Hive apache-spark-sql hiveql

Source: https://stackoverflow.com/questions/44585722/hive-sql-dynamically-get-null-column-counts-from-a-table

2 Answers


ldioqlga1#

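The solution below does not need to handle each column separately. The result is a column index together with the number of null values in that column.
You can later join it, by column index, to the column information retrieved from the metastore.
One limitation: a string that contains exactly the text null will be counted as a null.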
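To see why a string holding the literal text null gets miscounted, the idea can be mimicked outside Hive. The following is a minimal Python sketch, not the Hive implementation: it assumes a NULL is rendered as the text 'null' when a row is formatted (as printf with %s does in Hive), joins each row's values with the \u0001 delimiter, splits them again, and counts 'null' per position. The function name count_nulls_by_position and the sample rows are made up for illustration.

```python
from collections import Counter

DELIM = "\u0001"  # same delimiter character the HiveQL query uses


def count_nulls_by_position(rows):
    """Count values rendered as the text 'null' at each column position."""
    counts = Counter()
    for row in rows:
        # Assumption: a NULL (None here) is printed as the text 'null'.
        rendered = DELIM.join("null" if v is None else str(v) for v in row)
        for pos, val in enumerate(rendered.split(DELIM)):
            counts[pos] += int(val == "null")
    return dict(sorted(counts.items()))


rows = [
    (1, None, "hello"),
    (2, 3.5, "null"),  # a real string that happens to read 'null'
    (3, None, None),
]
print(count_nulls_by_position(rows))  # {0: 0, 1: 2, 2: 2}
```

Position 2 is reported with two nulls although only one value is really NULL, which is the limitation described above.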

Demo

The CTE (mytable, defined by with mytable as) can obviously be replaced with an actual table.

with        mytable as 
            (
                select  stack
                        (
                            5

                           ,1   ,1.2     ,date '2017-06-21'     ,null
                           ,2   ,2.3     ,null                  ,null
                           ,3   ,null    ,null                  ,'hello'
                           ,4   ,4.5     ,null                  ,'world'
                           ,5   ,null    ,date '2017-07-22'     ,null
                        ) as (id,amt,dt,txt)
            )

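-- How the lateral view works:
--   field(unhex(1),t.*,unhex(1))-2 evaluates to (number of columns - 1), so
--   concat/repeat builds a printf format with one %s per column, delimited by \u0001.
--   printf renders a NULL as the text 'null'; split + posexplode then yield one
--   (pos,val) row per column of each input row.
select      pe.pos                                          as col_index
           ,count(case when pe.val='null' then 1 end)       as nulls_count

from        mytable t lateral view posexplode (split(printf(concat('%s',repeat('\u0001%s',field(unhex(1),t.*,unhex(1))-2)),t.*),'\\x01')) pe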

group by    pe.pos       
;

+-----------+-------------+
| col_index | nulls_count |
+-----------+-------------+
|         0 |           0 |
|         1 |           2 |
|         2 |           3 |
|         3 |           3 |
+-----------+-------------+
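
The col_index values can then be matched to column names. A small Python sketch, assuming the column names have already been fetched in table order (for example from the output of describe) and the query result has been read as (col_index, nulls_count) pairs. Both lists are hard-coded from the demo, and label_null_counts is a hypothetical helper:

```python
def label_null_counts(column_names, indexed_counts):
    """Map each positional null count to its column name."""
    return {column_names[idx]: nulls for idx, nulls in indexed_counts}


column_names = ["id", "amt", "dt", "txt"]          # order as in the demo table
indexed_counts = [(0, 0), (1, 2), (2, 3), (3, 3)]  # rows of the result above

labeled = label_null_counts(column_names, indexed_counts)
print(labeled)  # {'id': 0, 'amt': 2, 'dt': 3, 'txt': 3}
```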


tyky79it2#

Instead of describe database.table, you can use

Select column_name from system_schema.columns where keyspace_name='YOUR KEYSPACE' and table_name='YOUR TABLE'

That table also has a column named kind, whose values are partition_key, clustering and regular.
Columns whose kind is partition_key or clustering cannot contain null values.
For the other columns you can use

select sum(CASE WHEN col1 is NULL THEN 1 ELSE 0 END) as col1_cnt, sum(CASE WHEN col2 is NULL THEN 1 ELSE 0 END) as col2_cnt from table1;

You can also try the query below (I have not tried it myself):

SELECT COUNT(*)-COUNT(col1) As A, COUNT(*)-COUNT(col2) As B, COUNT(*)-COUNT(col3) As C
FROM YourTable;

For the above query, you can store count(*) in a variable instead of computing it every time.
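
Since the COUNT(*)-COUNT(col) query is untested in this answer, here is a quick way to check the idea. COUNT(col) skips NULLs in standard SQL, so its difference from COUNT(*) is the number of NULLs in that column. The sketch uses Python's built-in sqlite3 module as a stand-in engine (an assumption: it is not Hive or Spark SQL, although this aggregate behaviour is standard), with a made-up table null_demo holding the demo data from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE null_demo (id INTEGER, amt REAL, dt TEXT, txt TEXT)")
conn.executemany(
    "INSERT INTO null_demo VALUES (?, ?, ?, ?)",
    [
        (1, 1.2, "2017-06-21", None),
        (2, 2.3, None, None),
        (3, None, None, "hello"),
        (4, 4.5, None, "world"),
        (5, None, "2017-07-22", None),
    ],
)

# COUNT(*) counts all rows; COUNT(col) counts only rows where col is not NULL.
null_counts = conn.execute(
    """
    SELECT COUNT(*) - COUNT(id)  AS id_nulls,
           COUNT(*) - COUNT(amt) AS amt_nulls,
           COUNT(*) - COUNT(dt)  AS dt_nulls,
           COUNT(*) - COUNT(txt) AS txt_nulls
    FROM null_demo
    """
).fetchone()
conn.close()

print(null_counts)  # (0, 2, 3, 3)
```

The counts agree with the nulls_count column in the first answer's demo output.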
Note: system_schema.columns is a Cassandra table, and the Cassandra user needs read permission on it.
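
Putting the pieces of this answer together, the per-column query can be generated from the column metadata. A minimal Python sketch under stated assumptions: the (column_name, kind) pairs have already been read from system_schema.columns with whatever client is available, the identifiers are trusted and need no quoting, and build_null_count_query is a hypothetical helper, not part of any library. Key columns are skipped because they cannot be null.

```python
KEY_KINDS = {"partition_key", "clustering"}  # these columns are never null


def build_null_count_query(table, columns):
    """Build one SELECT that counts NULLs in every non-key column.

    columns: iterable of (column_name, kind) pairs, as stored in
    system_schema.columns. Names are assumed to be trusted, plain identifiers.
    """
    regular = [name for name, kind in columns if kind not in KEY_KINDS]
    if not regular:
        raise ValueError("no non-key columns to check")
    select_list = ",\n       ".join(
        f"COUNT(*) - COUNT({name}) AS {name}_nulls" for name in regular
    )
    return f"SELECT {select_list}\nFROM {table}"


# Hypothetical metadata for illustration only.
columns = [("id", "partition_key"), ("amt", "regular"), ("txt", "regular")]
query = build_null_count_query("my_keyspace.my_table", columns)
print(query)
```

The generated text can then be submitted through the same Hive SQL interface as any hand-written query.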
