Hive on Hadoop keeps freezing when I run a CREATE request with many objects

Posted by fnvucqvd on 2021-06-02 in Hadoop

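Hive works when I create simple tables, but when I try to run any CREATE TABLE statement that contains a large number of objects, it freezes right after printing the following: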

  Query ID = root_20160321031616_6fbfd536-f3e5-4517-ab8b-2dc8ddb34b85
  Total jobs = 3
  Launching Job 1 out of 3
  Number of reduce tasks is set to 0 since there's no reduce operator
  Starting Job = job_1458530057671_0001, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1458530057671_0001/
  Kill Command = /usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop job -kill job_1458530057671_0001

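I don't remember whether the "... no reduce operator" message appeared back when it was working.
The code I am trying to run is relatively simple: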

  create table BMO_F069_table as
  select
  get_json_object(BMO_F069.json, '$.text') as text,
  get_json_object(BMO_F069.json, '$.in_reply_to_user_id') as in_reply_to_user_id,
  get_json_object(BMO_F069.json, '$.id') as id,
  get_json_object(BMO_F069.json, '$.favorite_count') as favorite_count,
  get_json_object(BMO_F069.json, '$.coordinates') as coordinates,
  get_json_object(BMO_F069.json, '$.id_str') as id_str,
  get_json_object(BMO_F069.json, '$.user.location') as location,
  get_json_object(BMO_F069.json, '$.lang') as lang,
  get_json_object(BMO_F069.json, '$.indices') as indices,
  get_json_object(BMO_F069.json, '$.type') as type,
  get_json_object(BMO_F069.json, '$.hashtags') as hashtags,
  get_json_object(BMO_F069.json, '$.user_mentions') as user_mentions,
  get_json_object(BMO_F069.json, '$.user.screen_name') as screen_name,
  get_json_object(BMO_F069.json, '$.user.name') as name,
  get_json_object(BMO_F069.json, '$.in_reply_to_screen_name') as in_reply_to_screen_name,
  get_json_object(BMO_F069.json, '$.retweet_count') as retweet_count,
  get_json_object(BMO_F069.json, '$.favorited') as favorited,
  get_json_object(BMO_F069.json, '$.retweeted_status') as retweeted_status,
  get_json_object(BMO_F069.json, '$.user') as user,
  get_json_object(BMO_F069.json, '$.followers_count') as followers_count,
  get_json_object(BMO_F069.json, '$.statuses_count') as statuses_count,
  get_json_object(BMO_F069.json, '$.description') as description,
  get_json_object(BMO_F069.json, '$.geo_enabled') as geo_enabled,
  get_json_object(BMO_F069.json, '$.favourites_count') as favourites_count,
  get_json_object(BMO_F069.json, '$.created_at') as created_at,
  get_json_object(BMO_F069.json, '$.time_zone') as time_zone,
  get_json_object(BMO_F069.json, '$.listed_count') as listed_count,
  get_json_object(BMO_F069.json, '$.in_reply_to_user_id_str') as in_reply_to_user_id_str
  from BMO_F069;

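The data is about 60 MB. Unfortunately, I don't know enough about the cluster to give specific specs, sorry, but I would still appreciate your feedback. Over the past few weeks I have run similar queries hundreds of times, on data as large as half a TB, without any problems. When it freezes partway through a job, it also blocks any newly submitted jobs. Is there a way to reset it?
When I start hive from the terminal, I get the opening messages below. Is this normal? I don't remember seeing them before.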

  16/03/21 21:16:55 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
  16/03/21 21:16:55 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
  16/03/21 21:16:55 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
  16/03/21 21:16:55 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist

Thank you very much for your help.

Answer 1, by cuxqih21

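When you launch a badly unoptimized job, Hive will still try to finish it, no matter how long that takes.
Since you haven't provided any useful information about the cluster specs, the data volume, or the query, my guess is that either your query is poorly written or your cluster lacks the resources to complete the request in a reasonable time.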
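One way to check the "poorly written query" possibility is to look at the plan before launching anything, and to try a cheaper form of the JSON extraction on a few rows. The sketch below assumes the table `BMO_F069` and its `json` column from the question. The alias `t` and the three keys chosen are only illustrative. It is not established that JSON parsing is what makes the job hang, so treat this as a diagnostic, not a fix.

```sql
-- Show the execution plan without submitting a job to the cluster.
EXPLAIN
SELECT
  get_json_object(BMO_F069.json, '$.text') AS text,
  get_json_object(BMO_F069.json, '$.user.location') AS location
FROM BMO_F069;

-- json_tuple extracts several top-level keys from the JSON string in a
-- single call, instead of one get_json_object call per column.
-- It only handles top-level keys, so nested paths such as
-- '$.user.location' still need get_json_object.
-- The alias t and the LIMIT are illustrative choices.
SELECT t.text, t.id, t.lang
FROM BMO_F069
LATERAL VIEW json_tuple(BMO_F069.json, 'text', 'id', 'lang') t AS text, id, lang
LIMIT 10;
```

If the small `LIMIT` query also hangs right after "Starting Job", the query text is probably not the issue, and the second possibility in the answer (cluster resources) becomes the more likely one.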
