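I recently set up Hive and created an external table to access a database in MongoDB. Now, if I run a query like this: SELECT id FROM users LIMIT 1;
it takes 18 seconds on average to execute. It takes the same amount of time even when LIMIT
is set to 10, 100, 1000, or 10000. The log contains the following: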
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min=null, max= { "_id" : { "$oid" : "55cdbffaa9ad1735c531a362"}}
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdbffaa9ad1735c531a362"}}, max= { "_id" : { "$oid" : "55cdc000a9ad1735d5cb42ab"}}
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc000a9ad1735d5cb42ab"}}, max= { "_id" : { "$oid" : "55cdc002a9ad1735d5cb56f9"}}
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc002a9ad1735d5cb56f9"}}, max= { "_id" : { "$oid" : "55cdc008a9ad1735eaffb513"}}
2015-08-24 09:19:37,919 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc008a9ad1735eaffb513"}}, max= { "_id" : { "$oid" : "55cdc00ba9ad1735eaffc961"}}
2015-08-24 09:19:37,919 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc00ba9ad1735eaffc961"}}, max= { "_id" : { "$oid" : "55cdc012a9ad1735fab2a0dd"}}
2015-08-24 09:19:37,919 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc012a9ad1735fab2a0dd"}}, max= null
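There are actually many similar lines in between, which I have omitted. From the log I can only guess that even with LIMIT 1,
Hive fetches the entire collection from MongoDB and then picks one row to display. Is there a way to change this so that Hive fetches only one row when I use LIMIT 1?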
1 Answer
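With Hive tables (and this is probably true for external tables as well), if you select only a specific field from the table, a MapReduce job (or whichever execution engine you are using) gets launched, whereas if you select * no MapReduce job is needed, so it is much faster. This may be the reason for the slowness.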
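One way to check this is to compare the query plans and then try Hive's fetch task conversion setting. This is only a sketch: the table name users and the column id come from the question, and whether the setting actually helps for a MongoDB-backed external table is an assumption I have not verified. The split computation shown in the log may still run over the whole collection either way.

```sql
-- Compare the two plans. A fetch-only plan contains just a fetch stage,
-- with no MapReduce (or Tez/Spark) stage in front of it.
EXPLAIN SELECT * FROM users LIMIT 1;
EXPLAIN SELECT id FROM users LIMIT 1;

-- hive.fetch.task.conversion accepts none, minimal, or more.
-- With "more", simple column selections with LIMIT can be served
-- by a fetch task instead of a full job.
SET hive.fetch.task.conversion=more;
SELECT id FROM users LIMIT 1;
```

If the second EXPLAIN still shows a job stage after changing the setting, then the column projection is not what costs the 18 seconds and the time is probably going elsewhere, for example into computing the splits.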