为什么在这个mapreduce中最后的reduce步骤非常慢(hiveql、hdfs和mapreduce)

72qzrwbm  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(365)

一些背景信息:
我正在使用dataikudss、hdfs和分区数据集。我运行了一个特殊的作业(配置单元查询),它有两个输入数据集—一个是非常大的分区数据集,另一个是小的(约250行,2列)非分区数据集。我们把分区表称为a,把非分区表称为b。
问题:
查询的格式如下:,

SELECT a.f1, f2, ..., fn
  FROM A as a
  LEFT JOIN B as b
  ON a.f1 = b.f1
WHERE {PARTITION_FILTER}

以下是mapreduce作业的当前输出(请记住此作业仍在运行):

[09:05:53] [INFO] [dku.utils]  - INFO  : Total jobs = 4
[09:05:53] [INFO] [dku.utils]  - INFO  : Starting task [Stage-10:CONDITIONAL] in serial mode
[09:05:53] [INFO] [dku.utils]  - INFO  : Stage-11 is filtered out by condition resolver.
[09:05:53] [INFO] [dku.utils]  - INFO  : Stage-1 is selected by condition resolver.
[09:05:53] [INFO] [dku.utils]  - INFO  : Launching Job 1 out of 4
[09:05:53] [INFO] [dku.utils]  - INFO  : Starting task [Stage-1:MAPRED] in serial mode
[09:05:53] [INFO] [dku.utils]  - INFO  : Number of reduce tasks not specified. Estimated from input data size: 307
[09:05:53] [INFO] [dku.utils]  - INFO  : In order to change the average load for a reducer (in bytes):
[09:05:53] [INFO] [dku.utils]  - INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
[09:05:53] [INFO] [dku.utils]  - INFO  : In order to limit the maximum number of reducers:
[09:05:53] [INFO] [dku.utils]  - INFO  :   set hive.exec.reducers.max=<number>
[09:05:53] [INFO] [dku.utils]  - INFO  : In order to set a constant number of reducers:
[09:05:53] [INFO] [dku.utils]  - INFO  :   set mapreduce.job.reduces=<number>
[09:05:53] [INFO] [dku.utils]  - INFO  : number of splits:75
[09:05:53] [INFO] [dku.utils]  - INFO  : Submitting tokens for job: job_1529508387832_10211
[09:05:53] [INFO] [dip.hiveserver2.log.sniffer]  - sniffed applicationId => application_1529508387832_10211/
[09:05:53] [INFO] [dku.utils]  - INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/bin/hadoop job  -kill job_1529508387832_10211
[09:05:53] [INFO] [dku.utils]  - INFO  : Hadoop job information for Stage-1: number of mappers: 75; number of reducers: 307
[09:05:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:05:47,749 Stage-1 map = 0%,  reduce = 0%
[09:06:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:06:48,444 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 278.18 sec
[09:06:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:06:51,682 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 373.35 sec
[09:07:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:00,159 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 501.46 sec
[09:07:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:02,235 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 539.6 sec
[09:07:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:07,472 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 1389.81 sec
[09:07:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:10,605 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 2172.52 sec
[09:07:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:12,574 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 2577.81 sec
[09:07:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:13,604 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 2865.44 sec
[09:07:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:15,747 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 3110.21 sec
[09:07:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:19,898 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 4080.2 sec
[09:07:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:21,988 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 4522.48 sec
[09:07:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:23,015 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 4755.96 sec
[09:07:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:29,335 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 5710.85 sec
[09:07:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:31,407 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 5948.34 sec
[09:07:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:34,555 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 6399.6 sec
[09:07:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:37,663 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 6811.22 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:38,695 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 7087.68 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:39,729 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 7288.22 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:40,769 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 7520.54 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:41,903 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 7771.37 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:42,930 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 7936.9 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:45,035 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 8254.78 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:46,075 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 8428.35 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:47,111 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 8661.23 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:48,153 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 8834.37 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:49,193 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 8983.68 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:50,227 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 9149.94 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:51,263 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 9268.9 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:52,301 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 9415.86 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:53,352 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 9540.63 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:54,381 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 9711.54 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:55,421 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 9823.52 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:56,453 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 10010.83 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:57,492 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 10081.9 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:58,532 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 10230.13 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:59,576 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 10392.61 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:00,607 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 10483.38 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:01,649 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 10618.16 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:02,672 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 10684.82 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:03,695 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 10701.95 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:04,720 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 10767.21 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:05,750 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 10849.92 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:06,780 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 10924.45 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:07,902 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 11000.21 sec
[09:08:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:09,965 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 11013.58 sec
[09:08:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:10,991 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 11057.76 sec
[09:08:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:14,216 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 11157.51 sec
[09:08:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:17,362 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 11392.85 sec
[09:08:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:20,460 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 11610.7 sec
[09:08:28] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:26,106 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 11781.65 sec
[09:08:28] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:28,163 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 11788.58 sec
[09:08:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:35,410 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 12167.24 sec
[09:08:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:38,540 Stage-1 map = 86%,  reduce = 1%, Cumulative CPU 12317.09 sec
[09:08:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:39,583 Stage-1 map = 86%,  reduce = 2%, Cumulative CPU 12329.8 sec
[09:08:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:40,631 Stage-1 map = 86%,  reduce = 3%, Cumulative CPU 12333.61 sec
[09:08:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:47,927 Stage-1 map = 87%,  reduce = 3%, Cumulative CPU 12651.77 sec
[09:08:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:48,970 Stage-1 map = 88%,  reduce = 4%, Cumulative CPU 12826.37 sec
[09:08:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:51,092 Stage-1 map = 88%,  reduce = 5%, Cumulative CPU 12857.19 sec
[09:08:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:54,214 Stage-1 map = 90%,  reduce = 5%, Cumulative CPU 13037.63 sec
[09:09:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:59,396 Stage-1 map = 91%,  reduce = 5%, Cumulative CPU 13117.71 sec
[09:09:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:00,440 Stage-1 map = 92%,  reduce = 5%, Cumulative CPU 13238.06 sec
[09:09:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:01,485 Stage-1 map = 93%,  reduce = 5%, Cumulative CPU 13249.8 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:05,660 Stage-1 map = 94%,  reduce = 5%, Cumulative CPU 13306.0 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:06,706 Stage-1 map = 97%,  reduce = 5%, Cumulative CPU 13393.5 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:07,751 Stage-1 map = 97%,  reduce = 6%, Cumulative CPU 13409.12 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:08,795 Stage-1 map = 98%,  reduce = 6%, Cumulative CPU 13433.07 sec
[09:09:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:09,835 Stage-1 map = 98%,  reduce = 8%, Cumulative CPU 13474.03 sec
[09:09:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:10,874 Stage-1 map = 98%,  reduce = 9%, Cumulative CPU 13484.64 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:14,004 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 13580.71 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:15,118 Stage-1 map = 100%,  reduce = 13%, Cumulative CPU 13619.15 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:16,160 Stage-1 map = 100%,  reduce = 16%, Cumulative CPU 13707.2 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:17,210 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 14456.75 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:18,258 Stage-1 map = 100%,  reduce = 39%, Cumulative CPU 14708.07 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:19,291 Stage-1 map = 100%,  reduce = 40%, Cumulative CPU 14768.29 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:20,329 Stage-1 map = 100%,  reduce = 41%, Cumulative CPU 14834.88 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:21,360 Stage-1 map = 100%,  reduce = 42%, Cumulative CPU 14902.4 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:22,399 Stage-1 map = 100%,  reduce = 45%, Cumulative CPU 15040.16 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:23,433 Stage-1 map = 100%,  reduce = 47%, Cumulative CPU 15165.58 sec
[09:09:28] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:28,627 Stage-1 map = 100%,  reduce = 63%, Cumulative CPU 15792.29 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:30,711 Stage-1 map = 100%,  reduce = 64%, Cumulative CPU 15889.21 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:31,753 Stage-1 map = 100%,  reduce = 65%, Cumulative CPU 15898.95 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:32,789 Stage-1 map = 100%,  reduce = 66%, Cumulative CPU 15927.26 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:33,822 Stage-1 map = 100%,  reduce = 71%, Cumulative CPU 16086.93 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:34,866 Stage-1 map = 100%,  reduce = 90%, Cumulative CPU 16711.13 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:35,907 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 16795.34 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:36,952 Stage-1 map = 100%,  reduce = 95%, Cumulative CPU 16881.47 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:37,995 Stage-1 map = 100%,  reduce = 96%, Cumulative CPU 16891.18 sec
[09:09:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:44,249 Stage-1 map = 100%,  reduce = 97%, Cumulative CPU 16958.21 sec
[09:09:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:45,292 Stage-1 map = 100%,  reduce = 98%, Cumulative CPU 17011.88 sec
[09:09:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:47,378 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17055.07 sec
[09:10:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:10:47,421 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17545.89 sec
[09:11:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:11:47,872 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17764.45 sec
[09:12:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:12:48,287 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 18330.86 sec
[09:13:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:13:48,855 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 19232.58 sec
...
...
...
...
...
[10:22:20] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:22:17,545 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 26857.28 sec
[10:23:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:23:17,654 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 26974.96 sec
[10:24:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:24:18,112 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 27081.0 sec
[10:25:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:25:18,964 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 27187.52 sec
[10:26:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:26:19,404 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 27287.58 sec

如您所见,reduce阶段的最后1%需要相当长的时间(到目前为止,1小时32分钟的总时间中有1小时27分钟)。
我很难找到一个清晰简洁的解释为什么会发生这种情况的资源。另外值得一提的是,我对mapreduce有一个非常基本的了解,但我目前正在hdfs环境中研究mapreduce。从我所读到的内容来看,由于reduce的性质,reduce任务的大部分可能被固定在一个reducer上 JOIN 正在执行。
如果有人能对正在发生的事情给出一个高层次的解释,给我指出一些好的资源的方向,并给我一些建议,告诉我如何在未来的工作中通过改变我的Hive查询来避免这种情况,那将是非常感激的。

lxkprmvk

lxkprmvk1#

如果final reducer是一个join,那么它看起来像join key中的skew。首先检查两件事:
检查b.f1连接键是否没有重复项:

select b.f1, count(*) cnt from B b 
 group by b.f1 
having count(*)>1 order by cnt desc;

检查a.f1的分布:

select a.f1, count(*) cnt from A a
 group by a.f1  
order by cnt desc
limit 10;

此查询将显示倾斜的键。
如果存在倾斜(具有相同值的行太多),则分别连接倾斜的键,使用union all:

SELECT a.f1, f2, ..., fn
  FROM ( select * from A where f1 = skewed_value) as a --skewed
  LEFT JOIN B as b
  ON a.f1 = b.f1
WHERE {PARTITION_FILTER}
UNION ALL
SELECT a.f1, f2, ..., fn
  FROM ( select * from A where f1 != skewed_value) as a --all other
  LEFT JOIN B as b
  ON a.f1 = b.f1
WHERE {PARTITION_FILTER}

最后,如果没有歪斜和重复的问题,那么尝试增加reducer的并行性:获取每个reducer配置的当前字节数
设置hive.exec.reducers.bytes.per.reducer;
通常这将返回大约1g的值。尝试除以2,在查询之前设置新值,并检查它将启动多少个reducer和性能。成功的标准是更多的减速机已经开始和性能提高。

set hive.exec.reducers.bytes.per.reducer=67108864;

每个减速机的字节数越少,启动的减速机就越多,从而增加了并行性;
更新:尝试启用map join,您的第二个表足够小,可以容纳在内存中,mapjoin将在没有reducer的情况下工作,并且在reducer上使用skew也没有问题。
如何启用mapjoin:https://stackoverflow.com/a/49154414/2700344

相关问题