impala/hive以获取表列表及其大小

zte4gxcn  于 2021-05-29  发布在  Hadoop
关注(0)|答案(1)|浏览(758)

我在oracledb中使用了一个查询来生成数据库中的表列表及其所有者和相应的表大小。这是我共享的示例查询。

select owner, table_name, round((num_rows*avg_row_len)/(1024*1024)) MB 
from all_tables 
where owner not like 'SYS%'  -- Exclude system tables.
and num_rows > 0  -- Ignore empty Tables.
order by MB desc -- Biggest first.

我想要 Impala /Hive的类似输出。
注意:我试过了 show table stats <table_name> 它将显示单个表的统计信息。但是我想一次得到所有的数据。有人能帮我吗。

oxcyiej7

oxcyiej71#

Hive

客户端

show table extended like '.*'
tableName:t100k
owner:cloudera
location:file:/home/cloudera/local/t100k
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 i}
partitioned:false
partitionColumns:
totalNumberFiles:1
totalFileSize:588895
maxFileSize:588895
minFileSize:588895
lastAccessTime:0
lastUpdateTime:1492675975000

tableName:t10k
owner:cloudera
location:file:/home/cloudera/local/t10k
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 i}
partitioned:false
partitionColumns:
totalNumberFiles:1
totalFileSize:48894
maxFileSize:48894
minFileSize:48894
lastAccessTime:0
lastUpdateTime:1492675978000

tableName:t1k
owner:cloudera
location:file:/home/cloudera/local/t1k
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 i}
partitioned:false
partitionColumns:
totalNumberFiles:1
totalFileSize:3893
maxFileSize:3893
minFileSize:3893
lastAccessTime:0
lastUpdateTime:1492675983000

tableName:t1m
owner:cloudera
location:file:/home/cloudera/local/t1m
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 i}
partitioned:false
partitionColumns:
totalNumberFiles:1
totalFileSize:6888896
maxFileSize:6888896
minFileSize:6888896
lastAccessTime:0
lastUpdateTime:1492675968000

元存储(例如mysql)

select    d.name                                                                                         as db_name
         ,t.tbl_name                                                                                     as tbl_name
         ,from_unixtime(min(t.create_time))                                                              as create_time
         ,min(t.owner)                                                                                   as owner
         ,min(case when tp.param_key = 'COLUMN_STATS_ACCURATE'  then tp.param_value                 end) as COLUMN_STATS_ACCURATE
         ,min(case when tp.param_key = 'last_modified_by'       then tp.param_value                 end) as last_modified_by
         ,min(case when tp.param_key = 'last_modified_time'     then from_unixtime(tp.param_value)  end) as last_modified_time  
         ,min(case when tp.param_key = 'numFiles'               then tp.param_value                 end) as numFiles
         ,min(case when tp.param_key = 'numRows'                then tp.param_value                 end) as numRows
         ,min(case when tp.param_key = 'rawDataSize'            then tp.param_value                 end) as rawDataSize
         ,min(case when tp.param_key = 'totalSize'              then tp.param_value                 end) as totalSize
         ,min(case when tp.param_key = 'transient_lastDdlTime'  then from_unixtime(tp.param_value)  end) as transient_lastDdlTime

from            metastore.DBS           as d
          join  metastore.TBLS          as t
          on    t.db_id = d.db_id
          join  metastore.TABLE_PARAMS  as tp
          on    tp.tbl_id = t.tbl_id

group by  d.name
         ,t.tbl_name

order by  d.name
         ,t.tbl_name
+---------+----------+---------------------+----------+-----------------------+------------------+---------------------+----------+---------+-------------+-----------+-----------------------+
| db_name | tbl_name |     create_time     |  owner   | COLUMN_STATS_ACCURATE | last_modified_by | last_modified_time  | numFiles | numRows | rawDataSize | totalSize | transient_lastDdlTime |
+---------+----------+---------------------+----------+-----------------------+------------------+---------------------+----------+---------+-------------+-----------+-----------------------+
| local   | t100k    | 2017-04-19 23:25:20 | cloudera | true                  | cloudera         | 2017-04-19 23:27:28 |        1 |  100000 |      488895 |    588895 | 2017-04-20 01:12:55   |
| local   | t10k     | 2017-04-19 23:25:26 | cloudera | true                  | cloudera         | 2017-04-19 23:27:26 |        1 |   10000 |       38894 |     48894 | 2017-04-20 01:12:58   |
| local   | t1k      | 2017-04-19 23:25:30 | cloudera | true                  | cloudera         | 2017-04-19 23:27:22 |        1 |    1000 |        2893 |      3893 | 2017-04-20 01:13:03   |
| local   | t1m      | 2017-04-19 23:20:59 | cloudera | true                  | cloudera         | 2017-04-19 23:27:30 |        1 | 1000000 |     5888896 |   6888896 | 2017-04-20 01:12:48   |
+---------+----------+---------------------+----------+-----------------------+------------------+---------------------+----------+---------+-------------+-----------+-----------------------+

相关问题