For Avro, ORC, or Parquet tables I can use the corresponding library to read the schema. But if the input/output format is text and the data is stored in CSV files, how can I get the schema programmatically? Thanks.
eh57zj3b1#
You can use the DESCRIBE statement, which shows metadata about a table such as its column names and their data types. DESCRIBE FORMATTED displays additional information in a layout familiar to Apache Hive users. Example: I created a table as follows.
CREATE TABLE IF NOT EXISTS Employee_Local(
    EmployeeId INT, Name STRING, Designation STRING, State STRING, Number STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
DESCRIBE statement
You can use the abbreviation DESC in place of DESCRIBE.
hive> DESCRIBE Employee_Local;
OK
employeeid int
name string
designation string
state string
number string
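If you want those (name, type) pairs in code rather than on the console, the rows that DESCRIBE prints are easy to split. A minimal sketch, assuming rows of the form "name<whitespace>type[<whitespace>comment]" (the helper name `parseDescribe` is my own, not a Hive API):

```scala
// Sketch: split the plain-text rows printed by DESCRIBE into
// (column, type) pairs. Rows with fewer than two fields are dropped.
def parseDescribe(rows: Seq[String]): Seq[(String, String)] =
  rows.map(_.trim.split("\\s+"))
      .collect { case Array(name, tpe, _*) => (name, tpe) }
```

The same splitting works whether you collected the rows over Hive JDBC or from a captured shell transcript.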
DESCRIBE FORMATTED statement
hive> describe formatted Employee_Local;
OK
# col_name              data_type           comment
employeeid              int
name                    string
designation             string
state                   string
number                  string
# Detailed Table Information
Database: default
Owner: cloudera
CreateTime: Fri Mar 15 10:53:35 PDT 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee_test
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1552672415
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.544 seconds, Fetched: 31 row(s)
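The storage section is what tells you this is a plain-text table: TextInputFormat plus LazySimpleSerDe with field.delim set to a comma is effectively a CSV table. Pulling one of those key/value lines out of the captured output is a string exercise; a sketch, assuming "Key:<whitespace>value" lines (the helper name `formattedValue` is my own):

```scala
// Sketch: extract a value such as "InputFormat:" from the lines that
// DESCRIBE FORMATTED prints. Keys end with ':'; the value follows.
def formattedValue(lines: Seq[String], key: String): Option[String] =
  lines.collectFirst {
    case l if l.trim.startsWith(key + ":") =>
      l.trim.stripPrefix(key + ":").trim
  }
```

Checking `formattedValue(lines, "InputFormat")` against `org.apache.hadoop.mapred.TextInputFormat` is one way to detect a text-backed table before deciding how to read it.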
You can even get the schema of a Hive table from the spark shell, like this:
scala> spark.sql("desc formatted test_loop").collect().foreach(println)
[policyid,bigint,null]
[statecode,string,null]
[county,string,null]
[eq_site_limit,bigint,null]
[hu_site_limit,bigint,null]
[fl_site_limit,bigint,null]
[fr_site_limit,bigint,null]
[tiv_2011,bigint,null]
[tiv_2012,double,null]
[eq_site_deductible,double,null]
[hu_site_deductible,double,null]
[fl_site_deductible,double,null]
[fr_site_deductible,double,null]
[point_latitude,double,null]
[point_longitude,double,null]
[line,string,null]
[construction,string,null]
[point_granularity,bigint,null]
[,,]
[# Detailed Table Information,,]
[Database:,default,]
[Owner:,mapr,]
[Create Time:,Fri May 26 17:56:04 EDT 2017,]
[Last Access Time:,Wed Dec 31 19:00:00 EST 1969,]
[Location:,maprfs:/user/hv2/warehouse/test_loop,]
[Table Type:,MANAGED,]
[Table Parameters:,,]
[ rawDataSize,254192494,]
[ numFiles,1,]
[ transient_lastDdlTime,1495845784,]
[ totalSize,251167564,]
[ numRows,3024360,]
[,,]
[# Storage Information,,]
[SerDe Library:,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
[InputFormat:,org.apache.hadoop.mapred.TextInputFormat,]
[OutputFormat:,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,]
[Compressed:,No,]
[Storage Desc Parameters:,,]
[ serialization.format,1,]
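Rather than parsing that printed output at all, Spark can hand you the schema as a structured value: spark.table("test_loop").schema returns a StructType whose fields carry the name and DataType directly. If you do work from the collected desc formatted rows, the column part is the prefix before the first blank or "# ..." section row; a sketch of that slice (the helper name `columnRows` is my own):

```scala
// Sketch: keep only the column rows from collected `desc formatted`
// output, which run until the first empty or "# ..." section-header row.
def columnRows(rows: Seq[(String, String)]): Map[String, String] =
  rows.takeWhile { case (name, _) => name.nonEmpty && !name.startsWith("#") }
      .toMap
```

Applied to the output above, this would stop at the empty row before "# Detailed Table Information" and keep just the eighteen column/type pairs.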