Hive job is taking too much time

gg0vcinb asked on 2021-06-02 in Hadoop

This stage is a join between table a (100,000 rows) and table b (5,000,000 rows) on a single key.
Table a has only two columns, with the table id as the join key.
I have tried many ways to turn this stage into a map join instead of a common join, but it still runs as a common join and takes a very long time. Any suggestions for speeding it up?
Also, why does the reduce phase always hit 67% so quickly and then crawl forward step by step, taking a very long time after that?

2015-12-21 01:12:55,635 Stage-2 map = 0%,  reduce = 0%
2015-12-21 01:13:39,342 Stage-2 map = 20%,  reduce = 0%, Cumulative CPU 5.49 sec
2015-12-21 01:13:43,618 Stage-2 map = 40%,  reduce = 0%, Cumulative CPU 31.79 sec
2015-12-21 01:13:45,692 Stage-2 map = 60%,  reduce = 0%, Cumulative CPU 34.42 sec
2015-12-21 01:13:46,735 Stage-2 map = 73%,  reduce = 0%, Cumulative CPU 45.1 sec
2015-12-21 01:13:48,812 Stage-2 map = 80%,  reduce = 0%, Cumulative CPU 46.87 sec
2015-12-21 01:13:57,125 Stage-2 map = 93%,  reduce = 0%, Cumulative CPU 60.03 sec
2015-12-21 01:13:58,160 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 61.46 sec
2015-12-21 01:14:42,001 Stage-2 map = 100%,  reduce = 67%, Cumulative CPU 72.34 sec
2015-12-21 01:15:42,196 Stage-2 map = 100%,  reduce = 67%, Cumulative CPU 141.27 sec
2015-12-21 01:16:31,357 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 183.86 sec
2015-12-21 01:17:31,587 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 245.5 sec
2015-12-21 01:18:31,840 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 306.58 sec
2015-12-21 01:19:32,275 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 371.49 sec
2015-12-21 01:20:32,549 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 433.61 sec
2015-12-21 01:20:58,591 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 457.46 sec
2015-12-21 01:21:58,904 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 516.95 sec
2015-12-21 01:22:59,143 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 576.51 sec
2015-12-21 01:23:59,480 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 636.39 sec
2015-12-21 01:24:59,810 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 692.75 sec
2015-12-21 01:25:59,978 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 757.39 sec
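
For reference, a minimal sketch of the settings that usually decide whether Hive converts a join like this into a map join; table_a, table_b, id and val are placeholder names rather than the poster's actual schema, and the threshold value is only an example.

-- Let Hive convert the join automatically when one input is small enough,
-- and raise the "small table" threshold (bytes) above table_a's size on disk.
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=100000000;

-- Older Hive releases also honour an explicit hint naming the small table:
SELECT /*+ MAPJOIN(a) */ a.id, a.val, b.*
FROM table_a a
JOIN table_b b ON a.id = b.id;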

aemubtdh1#

Your reducers are progressing slowly, step by step, and simply need time to finish.
A MapReduce job essentially has three stages: the map tasks, the shuffle, and the reduce tasks.
Each stage contributes 33.33% to the overall completion figure. The first two stages, the map tasks and the shuffle of a large amount of data, have already finished, which is why the reducer shows 67% complete. The remaining progress depends entirely on the reduce tasks, and this reduce-side join is what takes the time.
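
Putting numbers on that model using the poster's own log: the jump to reduce = 67% at 01:14:42 marks the end of the map and shuffle phases, and the reading of 69% at 01:20:58 implies the reduce tasks themselves are only about (69 − 67) × 3 ≈ 6% done at that point, which is why the overall percentage creeps up so slowly.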


yrefmtwq2#

You can set the number of reducers with set mapreduce.job.reduces=<number_of_reducers> . If that gives no speedup, please paste the full log. You could start with 4 and see whether it improves performance.
Also, give some details about the cluster configuration: single node or multi-node, and if multi-node, how many nodes, etc.
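
Concretely, the suggestion amounts to something like the following before re-running the query (4 is just the starting value this answer proposes; tune it from there):

-- ask MapReduce for 4 reduce tasks for this job instead of the default
SET mapreduce.job.reduces=4;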
