I have two questions:

I have a setup with one master node and one worker node, and I am processing a 5 MB file. I noticed that with or without the datanode, the total processing time is almost the same: the CPU time spent is close to 6 seconds either way. Can anyone tell me whether the datanode is actually doing the work?
How can I monitor this? (A sketch for reading these counters programmatically follows the listing below.)
Map input records=1
Map output records=802685
Map output bytes=8428185
Map output materialized bytes=10033561
Input split bytes=97
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=10033561
Reduce input records=802685
Reduce output records=3
Spilled Records=1605370
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=527
CPU time spent (ms)=5800
Physical memory (bytes) snapshot=550604800
Virtual memory (bytes) snapshot=5864865792
Total committed heap usage (bytes)=421007360
Peak Map Physical memory (bytes)=416796672
Peak Map Virtual memory (bytes)=2929139712
Peak Reduce Physical memory (bytes)=133808128
Peak Reduce Virtual memory (bytes)=2935726080
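On the monitoring question, one option is to read the same counters and per-task reports programmatically through the Hadoop 2.x client API. This is a minimal sketch, not a definitive tool: the class name JobInspector is mine, and the job ID is assumed to be passed on the command line (take it from the job submission output).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.mapreduce.TaskReport;
import org.apache.hadoop.mapreduce.TaskType;

public class JobInspector {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Cluster cluster = new Cluster(conf);

        // args[0] is a job ID such as "job_1234567890123_0001" (placeholder)
        Job job = cluster.getJob(JobID.forName(args[0]));

        // The same counter printed above as "CPU time spent (ms)"
        long cpuMs = job.getCounters()
                        .findCounter(TaskCounter.CPU_MILLISECONDS).getValue();
        System.out.println("CPU time spent (ms) = " + cpuMs);

        // Per-task reports show the state and progress of each map task
        for (TaskReport r : job.getTaskReports(TaskType.MAP)) {
            System.out.println(r.getTaskID() + " state=" + r.getState()
                               + " progress=" + r.getProgress());
        }
        cluster.close();
    }
}
```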
When running a 20 MB file, I get an out-of-memory error, even though I have allocated 4 GB for this job. I would like to know why Hadoop consumes so many resources.
It is just a simple MapReduce job that reads plain text like the sample below and produces count output (a minimal sketch of such a job follows the sample).
,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,
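For reference, here is a minimal sketch of the kind of counting job described, assuming the standard Hadoop WordCount pattern; the class and method names are mine, not the asker's actual code. Two details in the counters above are worth noting: Combine input records=0 means no combiner ran, so all ~800k map output records were spilled and shuffled even though there are only 3 distinct keys, and Map input records=1 suggests the whole file arrives as a single line, buffered as one record in the mapper. Setting a combiner, as below, would collapse the map output before the shuffle.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TokenCount {
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Split on commas, matching the sample input above
            for (String tok : value.toString().split(",")) {
                if (!tok.isEmpty()) {
                    word.set(tok);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            result.set(sum);
            ctx.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "token count");
        job.setJarByClass(TokenCount.class);
        job.setMapperClass(TokenMapper.class);
        // Pre-aggregate map output; the counters above show
        // "Combine input records=0", i.e. no combiner was configured
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```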
Does anyone have an answer?