Java: What is the best way to determine how much time an EMR job spends on map and reduce tasks?

Posted by 6ioyuze2 on 2021-05-30 in Hadoop

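I am running a custom JAR Hadoop job on Amazon's AWS EMR, and I want to collect data on how long all the map tasks take to run versus how long the reduce tasks take. Is there a way within the framework to get at this data that I have not found? If not, does anyone have suggestions on the best way to generate it?
Thank you.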

Answer #1, by 1szpjjfi

You can find this information in the Job Counters section of the client log. For example:

Job Counters 
        Killed reduce tasks=1
        Launched map tasks=1
        Launched reduce tasks=7
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=1071855
        Total time spent by all reduces in occupied slots (ms)=4083210
      **Total time spent by all map tasks (ms)=23819**
      **Total time spent by all reduce tasks (ms)=45369**
        Total vcore-milliseconds taken by all map tasks=23819
        Total vcore-milliseconds taken by all reduce tasks=45369
        Total megabyte-milliseconds taken by all map tasks=34299360
        Total megabyte-milliseconds taken by all reduce tasks=130662720
Map-Reduce Framework
        Map input records=3929235
        Map output records=15716940
        Map output bytes=132989251
        Map output materialized bytes=633590
        Input split bytes=86
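
If you want to collect these two values automatically rather than read them by eye, one option is to parse the counter lines out of the captured client log. The following is a minimal sketch, assuming the log has been saved as text in the `name=value` layout shown above. `JobCounterLogParser` and its `parse` method are hypothetical names introduced for illustration; they are not part of Hadoop or EMR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop class): pulls "name=value" counters
// out of the text of a MapReduce client log.
class JobCounterLogParser {
    // Matches lines such as "Total time spent by all map tasks (ms)=23819",
    // tolerating leading whitespace and optional "**" emphasis markers.
    private static final Pattern COUNTER_LINE =
            Pattern.compile("^\\s*\\**\\s*(.+?)=(\\d+)\\**\\s*$");

    // Returns counter name -> value, in the order the counters appear.
    // Section headers such as "Job Counters" contain no "=" and are skipped.
    static Map<String, Long> parse(String logText) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : logText.split("\\R")) {
            Matcher m = COUNTER_LINE.matcher(line);
            if (m.matches()) {
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // Sample lines copied from the log excerpt above.
        String log = String.join("\n",
                "Job Counters ",
                "        Launched map tasks=1",
                "        Launched reduce tasks=7",
                "        Total time spent by all map tasks (ms)=23819",
                "        Total time spent by all reduce tasks (ms)=45369");
        Map<String, Long> counters = parse(log);
        long mapMs = counters.get("Total time spent by all map tasks (ms)");
        long reduceMs = counters.get("Total time spent by all reduce tasks (ms)");
        System.out.println("map ms: " + mapMs + ", reduce ms: " + reduceMs);
    }
}
```

The counter names are matched as literal strings, so this breaks if a different Hadoop version words them differently. If you control the driver code, reading the counters from the job object after it completes is likely more robust than parsing text; check the `JobCounter` enum for your Hadoop version to find the matching entries.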
