What does "Total time spent by all map tasks" include on Hadoop?

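After a Hadoop job completes successfully, it prints a summary of various counters; see the example below. My question is what the Total time spent by all map tasks counter includes. In particular, when a map task is not node-local, does it include the time spent copying data?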

17/01/25 09:06:12 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=2941
                FILE: Number of bytes written=241959
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=3251
                HDFS: Number of bytes written=2051
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=23168
                Total time spent by all reduces in occupied slots (ms)=4957
                Total time spent by all map tasks (ms)=5792
                Total time spent by all reduce tasks (ms)=4957
                Total vcore-milliseconds taken by all map tasks=5792
                Total vcore-milliseconds taken by all reduce tasks=4957
                Total megabyte-milliseconds taken by all map tasks=23724032
                Total megabyte-milliseconds taken by all reduce tasks=5075968
        Map-Reduce Framework
                Map input records=9
                Map output records=462
                Map output bytes=4986
                Map output materialized bytes=2941
                Input split bytes=109
                Combine input records=462
                Combine output records=221
                Reduce input groups=221
                Reduce shuffle bytes=2941
                Reduce input records=221
                Reduce output records=221
                Spilled Records=442
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=84
                CPU time spent (ms)=2090
                Physical memory (bytes) snapshot=471179264
                Virtual memory (bytes) snapshot=4508950528
                Total committed heap usage (bytes)=326631424
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=3142
        File Output Format Counters
                Bytes Written=2051

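I believe the data copy time is included in the Total time spent by all map tasks metric.
First, if you check the server-side code (mostly related to resource management), you can see that the MILLIS_MAPS constant (which corresponds to the metric you are referring to) is updated in the TaskAttemptImpl class from the duration of the task attempt. The task attempt's launch time is set when the container has been launched and is about to start executing (as far as I can tell from the source code, neither component appears to move any data at this point; only the split metadata is passed).
Now, once the container has started, the InputFormat opens an InputStream, which is responsible for fetching the data the Mapper needs to start processing (at this point the stream can be attached to different file systems, but have a look at DistributedFileSystem). You can check the steps performed in the MapTask.runNewMapper(...) method, where you have: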

input.initialize(split, mapperContext);
mapper.run(mapperContext);

(I'm using Hadoop 2.6.)
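To make the reasoning concrete, here is a minimal, self-contained sketch of why a remote read ends up inside the measured duration. It is not Hadoop code: the class MapAttemptTimingSketch, its methods, the simulated clock, and all millisecond values are hypothetical and chosen only for illustration. The only thing taken from the answer is the ordering: the launch time is recorded first, then the input is initialized and the mapper runs, and the duration is the difference between finish and launch.

```java
// Illustrative only: a toy model of a map task attempt, not Hadoop's implementation.
public class MapAttemptTimingSketch {
    // Simulated clock in milliseconds, so the example is deterministic.
    private long nowMillis = 0;

    // Stand-in for input.initialize(split, mapperContext): opening the input
    // stream, which may connect to a remote DataNode when the split is not local.
    void initializeInput(long openCostMillis) {
        nowMillis += openCostMillis;
    }

    // Stand-in for mapper.run(mapperContext): pulling records through the
    // stream (read cost) and calling map() on each of them (compute cost).
    void runMapper(long readCostMillis, long computeCostMillis) {
        nowMillis += readCostMillis + computeCostMillis;
    }

    // The duration is measured from launch to finish, so everything that
    // happens in between, including reading the input, is part of it.
    long runAttempt(long openCostMillis, long readCostMillis, long computeCostMillis) {
        long launchTime = nowMillis;
        initializeInput(openCostMillis);
        runMapper(readCostMillis, computeCostMillis);
        long finishTime = nowMillis;
        return finishTime - launchTime;
    }

    public static void main(String[] args) {
        // Hypothetical costs: same compute work, cheaper reads for the local split.
        long local = new MapAttemptTimingSketch().runAttempt(5, 20, 100);
        long remote = new MapAttemptTimingSketch().runAttempt(15, 200, 100);
        System.out.println("node-local attempt (ms): " + local);   // 125
        System.out.println("non-local attempt (ms): " + remote);   // 315
    }
}
```

In this toy model the two attempts do identical map() work, yet the non-local one reports a longer duration purely because its reads are slower, which is the behaviour the answer expects from the real counter.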
