After a Hadoop job completes successfully, a summary of the various counters is printed; see the example below. My question is about the Total time spent by all map tasks counter: does it include the data copy time, in particular when a map task is not node-local?
17/01/25 09:06:12 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=2941
FILE: Number of bytes written=241959
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=3251
HDFS: Number of bytes written=2051
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=23168
Total time spent by all reduces in occupied slots (ms)=4957
Total time spent by all map tasks (ms)=5792
Total time spent by all reduce tasks (ms)=4957
Total vcore-milliseconds taken by all map tasks=5792
Total vcore-milliseconds taken by all reduce tasks=4957
Total megabyte-milliseconds taken by all map tasks=23724032
Total megabyte-milliseconds taken by all reduce tasks=5075968
Map-Reduce Framework
Map input records=9
Map output records=462
Map output bytes=4986
Map output materialized bytes=2941
Input split bytes=109
Combine input records=462
Combine output records=221
Reduce input groups=221
Reduce shuffle bytes=2941
Reduce input records=221
Reduce output records=221
Spilled Records=442
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=84
CPU time spent (ms)=2090
Physical memory (bytes) snapshot=471179264
Virtual memory (bytes) snapshot=4508950528
Total committed heap usage (bytes)=326631424
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=3142
File Output Format Counters
Bytes Written=2051
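For reference, the same counter can also be read programmatically once the job has finished. A minimal sketch, assuming you still hold the Job object that submitted the job (the class and method names below are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;

public class MapMillisReader {
    // Prints the "Total time spent by all map tasks (ms)" counter of a finished job.
    static void printMapMillis(Job job) throws IOException {
        Counter mapMillis = job.getCounters().findCounter(JobCounter.MILLIS_MAPS);
        System.out.println("Total time spent by all map tasks (ms) = " + mapMillis.getValue());
    }
}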
1 answer
I think the data copy time is included in the Total time spent by all map tasks metric. First of all, if you check the server-side code (mostly the resource-management side), you can see that the MILLIS_MAPS constant (which corresponds to the metric you quoted) is fed, in the TaskAttemptImpl class, with the duration of the task attempt. The attempt's launch time is set when its container has been launched and is about to start executing (as far as I can tell from the source, neither component has moved any data at that point; only the split metadata has been passed around).
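To make that concrete, here is a simplified illustration of the accounting (only the finishTime minus launchTime idea is taken from TaskAttemptImpl; the class name, method name and numbers below are made up):

import org.apache.hadoop.mapreduce.JobCounter;

// Simplified illustration, not the real TaskAttemptImpl code: the value added to
// JobCounter.MILLIS_MAPS is the attempt's wall-clock duration from container launch
// to finish, so everything the task does after launch falls inside this window.
public class MillisMapsIllustration {
    static long millisForAttempt(long launchTimeMs, long finishTimeMs) {
        return finishTimeMs - launchTimeMs; // this is what gets added to MILLIS_MAPS
    }

    public static void main(String[] args) {
        // hypothetical attempt launched at t=1000 ms and finished at t=6792 ms
        System.out.println(JobCounter.MILLIS_MAPS + " += " + millisForAttempt(1000L, 6792L));
    }
}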
Now, once the container is running, the InputFormat opens an InputStream that is responsible for fetching the data the mapper needs before it can start processing (at that point the stream may be attached to different file systems, but look at DistributedFileSystem). You can check the steps performed in the MapTask.runNewMapper(...) method (I am looking at Hadoop 2.6); a condensed paraphrase of that flow is sketched below.
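This is a heavily condensed paraphrase, not the real method body; the actual code also wires up the output collector, the mapper context, the committer and progress reporting, and the generic class below exists only to make the sketch compile:

import java.io.IOException;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Sketch of the MapTask.runNewMapper(...) flow: the RecordReader is created and
// initialized inside the already-running map task, so opening the input stream
// (and any remote HDFS reads for a non-local split) happens within the attempt's
// measured duration.
class RunNewMapperSketch<K1, V1, K2, V2> {
    void run(InputFormat<K1, V1> inputFormat,
             InputSplit split,
             Mapper<K1, V1, K2, V2> mapper,
             TaskAttemptContext taskContext,
             Mapper<K1, V1, K2, V2>.Context mapperContext)
            throws IOException, InterruptedException {
        // 1. create the record reader for this attempt's split
        RecordReader<K1, V1> input = inputFormat.createRecordReader(split, taskContext);
        // 2. initialize it; for file-based formats this is where the InputStream is
        //    opened (e.g. through DistributedFileSystem)
        input.initialize(split, taskContext);
        // 3. run the user's mapper over the records
        mapper.run(mapperContext);
        input.close();
    }
}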