Hive takes a long time on a LIMIT 1 query

thtygnil posted on 2021-06-03 in Hadoop

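I recently set up Hive and created an external table to access a database in MongoDB. When I run a query such as SELECT id FROM users LIMIT 1; it takes 18 seconds on average to execute. It takes the same amount of time even when LIMIT is set to 10, 100, 1000, or 10000. The log contains the following: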

2015-08-24 09:19:37,918 INFO  [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min=null, max= { "_id" : { "$oid" : "55cdbffaa9ad1735c531a362"}}
2015-08-24 09:19:37,918 INFO  [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdbffaa9ad1735c531a362"}}, max= { "_id" : { "$oid" : "55cdc000a9ad1735d5cb42ab"}}
2015-08-24 09:19:37,918 INFO  [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc000a9ad1735d5cb42ab"}}, max= { "_id" : { "$oid" : "55cdc002a9ad1735d5cb56f9"}}
2015-08-24 09:19:37,918 INFO  [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc002a9ad1735d5cb56f9"}}, max= { "_id" : { "$oid" : "55cdc008a9ad1735eaffb513"}}
2015-08-24 09:19:37,919 INFO  [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc008a9ad1735eaffb513"}}, max= { "_id" : { "$oid" : "55cdc00ba9ad1735eaffc961"}}
2015-08-24 09:19:37,919 INFO  [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc00ba9ad1735eaffc961"}}, max= { "_id" : { "$oid" : "55cdc012a9ad1735fab2a0dd"}}
2015-08-24 09:19:37,919 INFO  [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc012a9ad1735fab2a0dd"}}, max= null

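There are actually many similar lines in between, which I have omitted. From the log, I can only guess that even with LIMIT 1, Hive fetches the entire collection from MongoDB and then picks one row to display. Is there a way to change this so that Hive fetches only one row when I use LIMIT 1?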


eni9jsuy #1

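With Hive tables (and this may be true for external tables as well), if you select only a specific field, a MapReduce job (or a job on whichever execution engine you are using) is launched, whereas SELECT * needs no MapReduce job and is therefore much faster. This may be the reason for the slowness.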
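One way to check whether this is what is happening is to look at the query plan and at Hive's fetch-task setting. The following is a minimal sketch, not a confirmed fix: it reuses the users table and id column from the question, and it assumes that hive.fetch.task.conversion (a standard Hive property that accepts none, minimal, or more) also takes effect for a table backed by the MongoDB storage handler, which the thread does not confirm. Even when no job is launched, the connector may still compute all of its input splits, so the timing may not change.

```sql
-- Inspect the plan: if it contains a Map Reduce stage, a job is launched
-- for this query; a plan with only a Fetch Operator stage runs without one.
EXPLAIN SELECT id FROM users LIMIT 1;

-- Allow simple SELECT / filter / LIMIT queries to run as a fetch task.
-- Assumption: this applies to the MongoDB-backed external table as well.
SET hive.fetch.task.conversion=more;

-- Re-run both forms and compare their timings.
SELECT id FROM users LIMIT 1;
SELECT * FROM users LIMIT 1;
```

If both queries take about the same time after the change, the delay probably comes from somewhere other than job startup, for example from the split creation shown in the log.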
