Hive table from multiple Avro files?

Posted by f0brbegy on 2021-06-02 in Hadoop

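I have thousands of Avro files in HDFS directories laid out as yyyy/mm/dd/. Each directory may hold 200-400 .avro files containing that day's data.
When I create an external table, I believe the LOCATION property assumes a single file... Is there a way to point it at a directory of files and have it read all of them?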

hadoop Hive avro

Source: https://stackoverflow.com/questions/35583299/hive-table-from-multiple-avro-files

2 answers

r6vfmomb1#

Make sure you declare the partition column when you create the table. Then use ALTER TABLE to add partitions as needed, like this:

-- The partition column is quoted as `date` because DATE is a reserved keyword in Hive 1.2.0 and later.
-- Each partition location is a directory; Hive reads every file directly inside it.
create external table mydatabase.NEW_TABLE
partitioned by (`date` string)
row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
tblproperties ('avro.schema.literal'='{
"name": "my_record",
"type": "record",
"fields": [
   {"name":"boolean1", "type":"boolean"},
   {"name":"int1", "type":"int"},
   {"name":"long1", "type":"long"},
   {"name":"float1", "type":"float"},
   {"name":"double1", "type":"double"},
   {"name":"string1", "type":"string"},
   {"name": "nullable_int", "type": ["int", "null"]}]}');
alter table mydatabase.NEW_TABLE add partition (`date`='20150304') location '/path/to/somefiles/20150304';
alter table mydatabase.NEW_TABLE add partition (`date`='20150305') location '/path/to/somefiles/20150305';
alter table mydatabase.NEW_TABLE add partition (`date`='20150306') location '/path/to/somefiles/20150306';

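You can add as many partitions as you need. I recommend making this an external table, so that dropping the table or a partition by mistake does not delete the underlying data.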
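With thousands of daily directories, typing one ALTER TABLE statement per day gets tedious. Below is a minimal sketch that prints the statements for a date range. It assumes the yyyy/mm/dd layout from the question and a single `date` partition key in yyyymmdd form; the function name `add_partition_statements` and the base path are hypothetical.

```python
from datetime import date, timedelta


def add_partition_statements(table, base_path, start, end):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day, start to end inclusive.

    Assumes directories are laid out as <base_path>/yyyy/mm/dd and that the
    table has a single string partition column named `date`.
    """
    base = base_path.rstrip("/")
    day = start
    while day <= end:
        key = day.strftime("%Y%m%d")        # partition value, e.g. 20150304
        subdir = day.strftime("%Y/%m/%d")   # directory, e.g. 2015/03/04
        yield (
            f"alter table {table} add if not exists "
            f"partition (`date`='{key}') location '{base}/{subdir}';"
        )
        day += timedelta(days=1)


if __name__ == "__main__":
    for stmt in add_partition_statements(
        "mydatabase.NEW_TABLE", "/path/to/somefiles", date(2015, 3, 4), date(2015, 3, 6)
    ):
        print(stmt)
```

The output can be saved to a file and run with `hive -f`. The `if not exists` clause makes the script safe to rerun after new days arrive.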

ac1kyiln2#

Straight from the Hive documentation:

hive.mapred.supports.subdirectories
  Default Value: false
  Added In: Hive 0.10.0 with HIVE-3276

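Whether the running version of Hadoop supports subdirectories for tables/partitions. Many Hive optimizations can be applied if the Hadoop version supports subdirectories for tables/partitions. This support was added by MAPREDUCE-1501.
In turn, the Hadoop feature itself can be enabled via mapred.input.dir.recursive.
Reference: that post (among others)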
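As an illustration, both properties can be passed to a query through the Hive CLI's `--hiveconf` option. The sketch below only builds the command line; the helper name `hive_command` and the sample query are hypothetical, and whether these settings are enough to read nested yyyy/mm/dd directories depends on your Hive and Hadoop versions (newer Hadoop releases use a different name for the recursive-input property).

```python
import shlex

# The two properties named above, both switched on.
RECURSIVE_SETTINGS = {
    "hive.mapred.supports.subdirectories": "true",
    "mapred.input.dir.recursive": "true",
}


def hive_command(query, settings=RECURSIVE_SETTINGS):
    """Build an argv list that runs `query` with each setting passed via --hiveconf."""
    argv = ["hive"]
    for key, value in settings.items():
        argv += ["--hiveconf", f"{key}={value}"]
    argv += ["-e", query]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command; nothing is executed here.
    print(shlex.join(hive_command("select count(*) from mydatabase.NEW_TABLE")))
```

Inside an interactive Hive session, the equivalent is a `set key=value;` statement for each property before running the query.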
