Hive query with join: space available is below the configured reserved amount

emeijp43, posted 2021-06-02 in Hadoop

I'm running a SQL query with Hive on a single-node cluster and getting the following error:

    MapReduce Jobs Launched:
    Stage-Stage-20: HDFS Read: 4456448 HDFS Write: 0 FAIL
    Total MapReduce CPU Time Spent: 0 msec

In the log at http://localhost:50070/logs/hadoop-hadoop-namenode-hadoop.log, the available space appears to be below the configured reserved amount:

    org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker:
    Space available on volume '/dev/mapper/vg_hadoop-lv_root' is 40734720,
    which is below the configured reserved amount 104857600
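
For context (not part of the original post): the 104857600 in that message is the NameNode's reserved-space threshold, dfs.namenode.resource.du.reserved (100 MB by default); when free space on a checked volume drops below it, the NameNode puts itself into safe mode. A minimal sketch for confirming the configured value and the actual free space, assuming a standard Hadoop install:

    # Print the configured reserved amount in bytes (default 104857600 = 100 MB)
    hdfs getconf -confKey dfs.namenode.resource.du.reserved

    # Free space on the root volume the NameNode is checking
    df -h /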

Do you know why this error occurs?
Also, in Disk Analyzer I had 12.6 GB of free space before executing the query, and by the time execution stopped with the error, Disk Analyzer showed only 2 GB free. I also resized the VM to more than 30 GB, and the same thing happened.
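
The execution log below shows Hive spilling map-join hash tables under /tmp/hadoopadmin, which is a likely place for that disappearing space to be going. A minimal sketch for watching where the disk fills while the query runs (paths taken from the log below):

    # Overall free space on the root volume
    df -h /

    # Size of Hive's local scratch data for this user
    du -sh /tmp/hadoopadmin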
Full error:

    Warning: Map Join MAPJOIN[110][bigTable=?] in task 'Stage-20:MAPRED' is a cross product
    Warning: Shuffle Join JOIN[8][tables = [part, supplier]] in Stage 'Stage-1:MAPRED' is a cross product
    Query ID = hadoopadmin_20160324175146_7ab8931d-eeac-4e03-b833-3592ed96521f
    Total jobs = 9
    Stage-27 is selected by condition resolver.
    Stage-1 is filtered out by condition resolver.
    16/03/24 17:51:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Execution log at: /tmp/hadoopadmin/hadoopadmin_20160324175146_7ab8931d-eeac-4e03-b833-3592ed96521f.log
    2016-03-24 17:52:01 Starting to launch local task to process map join; maximum memory = 518979584
    2016-03-24 17:52:05 Dump the side-table for tag: 1 with group count: 1 into file: file:/tmp/hadoopadmin/614990eb-e755-4bca-bccf-be19bd5c6882/hive_2016-03-24_17-51-46_111_5082675810708688029-1/-local-10017/HashTable-Stage-20/MapJoin-mapfile61--.hashtable
    2016-03-24 17:52:06 Uploaded 1 File to: file:/tmp/hadoopadmin/614990eb-e755-4bca-bccf-be19bd5c6882/hive_2016-03-24_17-51-46_111_5082675810708688029-1/-local-10017/HashTable-Stage-20/MapJoin-mapfile61--.hashtable (938915 bytes)
    2016-03-24 17:52:06 End of local task; Time Taken: 4.412 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 2 out of 9
    Number of reduce tasks is set to 0 since there's no reduce operator
    Job running in-process (local Hadoop)
    2016-03-24 17:52:10,043 Stage-20 map = 0%, reduce = 0%
    2016-03-24 17:53:10,214 Stage-20 map = 0%, reduce = 0%
    2016-03-24 17:54:10,272 Stage-20 map = 0%, reduce = 0%
    2016-03-24 17:55:10,336 Stage-20 map = 0%, reduce = 0%
    2016-03-24 17:56:10,386 Stage-20 map = 0%, reduce = 0%
    2016-03-24 17:57:10,435 Stage-20 map = 0%, reduce = 0%
    log4j:ERROR Failed to flush writer,
    java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
        at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:369)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.log(Category.java:856)
        at org.apache.commons.logging.impl.Log4JLogger.fatal(Log4JLogger.java:239)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:171)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Ended Job = job_local60483225_0001 with errors
    Error during job, obtaining debugging information...
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    MapReduce Jobs Launched:
    Stage-Stage-20: HDFS Read: 4472832 HDFS Write: 0 FAIL
    Total MapReduce CPU Time Spent: 0 msec
    hive>

The query:

    select
        nation,
        o_year,
        sum(amount) as sum_profit
    from
        (select
            n_name as nation,
            year(o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from part,
            supplier,
            lineitem,
            partsupp,
            orders,
            nation
        where
            s_suppkey = l_suppkey and
            ps_suppkey = l_suppkey and
            ps_partkey = l_partkey and
            p_partkey = l_partkey and
            o_orderkey = l_orderkey and
            s_nationkey = n_nationkey and
            p_name like '%plum%' ) as profit
    group by nation, o_year
    order by nation, o_year desc;
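
(Not part of the original post.) A cheap way to spot cross products before running anything is Hive's EXPLAIN, which compiles the query and prints the stage plan, surfacing the same compile-time cross-product warnings, without executing the job. A minimal illustration with two hypothetical tables t1 and t2:

    -- Plan contains a cross product: no join keys connect the tables
    EXPLAIN SELECT * FROM t1, t2;

    -- Plan contains a keyed join instead
    EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.k = t2.k;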

Answer by kd3sttzy:

This is probably your problem:

    Warning: Map Join MAPJOIN[110][bigTable=?] in task 'Stage-20:MAPRED' is a cross product
    Warning: Shuffle Join JOIN[8][tables = [part, supplier]] in Stage 'Stage-1:MAPRED' is a cross product

With many keys, a cross product can easily turn a table of a few GB into one of several TB (for example, crossing a 1-million-row table with a 10,000-row table already yields 10 billion rows). Re-evaluate your query and make sure it does what you think it does.
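
One extra guard worth knowing (my addition, not part of this answer): with the classic MapReduce engine, Hive's strict mode rejects Cartesian products at compile time, so a stray cross product fails fast instead of slowly filling the disk:

    -- Reject cartesian products (and some other risky patterns) up front
    set hive.mapred.mode=strict;

Note that strict mode also rejects ORDER BY without LIMIT, which the query above uses, so treat this as a diagnostic aid rather than a drop-in setting here.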
Edit: now that you've added the query, I can say more. This part:

    from part,
        supplier,
        lineitem,
        partsupp,
        orders,
        nation

is where you can optimize. It creates a Cartesian product, and that is your problem. What happens is that all the tables are first combined into one big cross product, and only afterwards are records filtered by your where clause, instead of being joined with on clauses (see the before/after sketch at the end of this answer). Try this (admittedly uglier) optimized version of the query:

    select
        nation,
        o_year,
        sum(amount) as sum_profit
    from
        (select
            n_name as nation,
            year(o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from
            orders o join
            (select
                l_extendedprice,
                l_discount,
                l_quantity,
                l_orderkey,
                n_name,
                ps_supplycost
            from part p join
                (select
                    l_extendedprice,
                    l_discount,
                    l_quantity,
                    l_partkey,
                    l_orderkey,
                    n_name,
                    ps_supplycost
                from partsupp ps join
                    (select
                        l_suppkey,
                        l_extendedprice,
                        l_discount,
                        l_quantity,
                        l_partkey,
                        l_orderkey,
                        n_name
                    from
                        (select s_suppkey, n_name
                        from nation n join supplier s on n.n_nationkey = s.s_nationkey
                        ) s1 join lineitem l on s1.s_suppkey = l.l_suppkey
                    ) l1 on ps.ps_suppkey = l1.l_suppkey and ps.ps_partkey = l1.l_partkey
                ) l2 on p.p_name like '%plum%' and p.p_partkey = l2.l_partkey
            ) l3 on o.o_orderkey = l3.l_orderkey
        ) profit
    group by nation, o_year
    order by nation, o_year desc;

Based on this benchmark script.
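
The core of the rewrite is replacing the implicit comma joins with explicit join ... on clauses, so every join carries its keys. A minimal before/after sketch with two hypothetical tables a and b:

    -- Before: comma join; older Hive planners build the full cross
    -- product first and only then apply the filter
    select * from a, b where a.k = b.k;

    -- After: explicit join; the key condition is part of the join itself
    select * from a join b on a.k = b.k;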

