How to get the schema of a Hive table created with TextInputFormat/OutputFormat using Java

sqserrrh · published 2021-06-27 in Hive

If it is an Avro, ORC, or Parquet table, I can use the respective library to get the schema. But if the input/output format is text and the data is stored in CSV files, how can I obtain the schema programmatically?
Thanks

eh57zj3b 1#

You can use the DESCRIBE statement, which shows metadata about the table, such as column names and their data types. DESCRIBE FORMATTED displays additional information in a layout familiar to Apache Hive users.
Example:
I created a table as follows.

  CREATE TABLE IF NOT EXISTS Employee_Local(
    EmployeeId INT, Name STRING, Designation STRING, State STRING, Number STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

The DESCRIBE statement
You can use the abbreviation DESC for the DESCRIBE statement.

  hive> DESCRIBE Employee_Local;
  OK
  employeeid        int
  name              string
  designation       string
  state             string
  number            string
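Since the question asks for Java specifically, the same DESCRIBE statement can be issued over Hive JDBC and its result set read back as (column, type) pairs. This is a minimal sketch, not code from the answer: it assumes a reachable HiveServer2 and the hive-jdbc driver on the classpath (modern hive-jdbc self-registers via the JDBC 4 service loader), and the JDBC URL shown in the comment is a placeholder.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveSchemaFetcher {

    // Builds the DESCRIBE statement; rejects obviously unsafe table names.
    public static String describeQuery(String table) {
        if (!table.matches("[A-Za-z0-9_.]+")) {
            throw new IllegalArgumentException("Invalid table name: " + table);
        }
        return "DESCRIBE " + table;
    }

    // Runs DESCRIBE over JDBC and collects column name -> type until the
    // first section marker ("# ...") that DESCRIBE FORMATTED would emit.
    public static Map<String, String> fetchSchema(Connection conn, String table)
            throws Exception {
        Map<String, String> schema = new LinkedHashMap<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(describeQuery(table))) {
            while (rs.next()) {
                String col = rs.getString(1);
                if (col == null || col.trim().isEmpty() || col.startsWith("#")) {
                    break;
                }
                String type = rs.getString(2);
                schema.put(col.trim(), type == null ? "" : type.trim());
            }
        }
        return schema;
    }

    public static void main(String[] args) throws Exception {
        // A connection is only attempted when a URL and table are supplied, e.g.
        //   jdbc:hive2://localhost:10000/default Employee_Local   (placeholder URL)
        if (args.length == 2) {
            try (Connection conn = DriverManager.getConnection(args[0])) {
                System.out.println(fetchSchema(conn, args[1]));
            }
        }
    }
}
```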

The DESCRIBE FORMATTED statement

  hive> describe formatted Employee_Local;
  OK
  # col_name            data_type           comment
  employeeid            int
  name                  string
  designation           string
  state                 string
  number                string

  # Detailed Table Information
  Database:             default
  Owner:                cloudera
  CreateTime:           Fri Mar 15 10:53:35 PDT 2019
  LastAccessTime:       UNKNOWN
  Protect Mode:         None
  Retention:            0
  Location:             hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee_test
  Table Type:           MANAGED_TABLE
  Table Parameters:
      transient_lastDdlTime   1552672415

  # Storage Information
  SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  InputFormat:          org.apache.hadoop.mapred.TextInputFormat
  OutputFormat:         org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Compressed:           No
  Num Buckets:          -1
  Bucket Columns:       []
  Sort Columns:         []
  Storage Desc Params:
      field.delim             ,
      serialization.format    ,
  Time taken: 0.544 seconds, Fetched: 31 row(s)
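If you already have the DESCRIBE FORMATTED output as plain text (for example, captured from `hive -e "DESCRIBE FORMATTED Employee_Local"`), a small helper can extract just the column section. This is a minimal sketch assuming the output layout shown above; the class name is mine, not part of any Hive API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DescribeOutputParser {

    // Extracts column name -> type from DESCRIBE / DESCRIBE FORMATTED text output.
    public static Map<String, String> parse(String output) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (String raw : output.split("\\R")) {
            String line = raw.trim();
            // Skip noise lines and the "# col_name data_type comment" header.
            if (line.isEmpty() || line.equals("OK") || line.startsWith("# col_name")) {
                continue;
            }
            // "# Detailed Table Information" marks the end of the column section.
            if (line.startsWith("#") || line.startsWith("Time taken")) {
                break;
            }
            String[] parts = line.split("\\s+", 3);
            if (parts.length >= 2) {
                schema.put(parts[0], parts[1]);
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        String sample = "OK\n"
                + "# col_name data_type comment\n"
                + "employeeid int\n"
                + "name string\n"
                + "# Detailed Table Information\n"
                + "Database: default";
        System.out.println(parse(sample));  // prints {employeeid=int, name=string}
    }
}
```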

You can even get the schema of a Hive table from the spark-shell, as follows:

  scala> spark.sql("desc formatted test_loop").collect().foreach(println)
  [policyid,bigint,null]
  [statecode,string,null]
  [county,string,null]
  [eq_site_limit,bigint,null]
  [hu_site_limit,bigint,null]
  [fl_site_limit,bigint,null]
  [fr_site_limit,bigint,null]
  [tiv_2011,bigint,null]
  [tiv_2012,double,null]
  [eq_site_deductible,double,null]
  [hu_site_deductible,double,null]
  [fl_site_deductible,double,null]
  [fr_site_deductible,double,null]
  [point_latitude,double,null]
  [point_longitude,double,null]
  [line,string,null]
  [construction,string,null]
  [point_granularity,bigint,null]
  [,,]
  [# Detailed Table Information,,]
  [Database:,default,]
  [Owner:,mapr,]
  [Create Time:,Fri May 26 17:56:04 EDT 2017,]
  [Last Access Time:,Wed Dec 31 19:00:00 EST 1969,]
  [Location:,maprfs:/user/hv2/warehouse/test_loop,]
  [Table Type:,MANAGED,]
  [Table Parameters:,,]
  [  rawDataSize,254192494,]
  [  numFiles,1,]
  [  transient_lastDdlTime,1495845784,]
  [  totalSize,251167564,]
  [  numRows,3024360,]
  [,,]
  [# Storage Information,,]
  [SerDe Library:,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
  [InputFormat:,org.apache.hadoop.mapred.TextInputFormat,]
  [OutputFormat:,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,]
  [Compressed:,No,]
  [Storage Desc Parameters:,,]
  [  serialization.format,1,]
