Hadoop processing time in the DataNode

kulphzqa, posted on 2021-06-01 in Hadoop

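I have two questions:
I have a setup with 1 master node and 1 other node, and I am processing a 5 MB file. I found that the total processing time is almost the same with or without the DataNode, as the counters below show: the CPU time spent is nearly 6 seconds. Can anyone tell me whether the DataNode is actually doing the work?
How can I monitor this?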

            Map input records=1
            Map output records=802685
            Map output bytes=8428185
            Map output materialized bytes=10033561
            Input split bytes=97
            Combine input records=0
            Combine output records=0
            Reduce input groups=3
            Reduce shuffle bytes=10033561
            Reduce input records=802685
            Reduce output records=3
            Spilled Records=1605370
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=527
            CPU time spent (ms)=5800
            Physical memory (bytes) snapshot=550604800
            Virtual memory (bytes) snapshot=5864865792
            Total committed heap usage (bytes)=421007360
            Peak Map Physical memory (bytes)=416796672
            Peak Map Virtual memory (bytes)=2929139712
            Peak Reduce Physical memory (bytes)=133808128
            Peak Reduce Virtual memory (bytes)=2935726080
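
To make the counters above easier to read, here is a rough sketch in plain Python (not Hadoop code) that derives a few ratios from them. The values are copied from the job output; the dictionary and variable names are only illustrative, and the interpretation in the comments is my assumption rather than something the output states.

```python
# Counter values copied from the job output above.
counters = {
    "Map input records": 1,
    "Map output records": 802685,
    "Map output bytes": 8428185,
    "Combine input records": 0,
    "Spilled Records": 1605370,
    "CPU time spent (ms)": 5800,
}

# One input record produced every output record, which suggests the
# whole file reached the mapper as a single line (assumption).
records_per_input = counters["Map output records"] / counters["Map input records"]

# Average serialized size of one map output record, in bytes.
bytes_per_record = counters["Map output bytes"] / counters["Map output records"]

# How many times, on average, each map output record was spilled.
spill_ratio = counters["Spilled Records"] / counters["Map output records"]

print(f"output records per input record: {records_per_input:.0f}")
print(f"average bytes per output record: {bytes_per_record:.2f}")
print(f"spills per output record: {spill_ratio:.1f}")
```

A spill ratio of 2 would be consistent with every record being written to disk once on the map side and once on the reduce side, and Combine input records=0 indicates that no combiner ran, but I am not certain that this is the right reading.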

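I get an out-of-memory error when running a 20 MB file, even though I have allocated 4 GB for this processing. I would like to know why Hadoop consumes so many resources.
It is just a MapReduce job that takes simple text like the sample below as input and outputs the count of each item.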

,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,TrainBUS,car,
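
For reference, the logic of the job is roughly the following. This is a minimal simulation in plain Python, not the actual Hadoop job: `map_line` and `reduce_pairs` are hypothetical names, and skipping empty tokens is an assumption about how the real mapper treats the leading and trailing commas.

```python
from itertools import groupby


def map_line(line):
    """Emit one (token, 1) pair per non-empty comma-separated token."""
    for token in line.split(","):
        if token:  # assumption: empty tokens are ignored
            yield token, 1


def reduce_pairs(pairs):
    """Sort by key (stands in for the shuffle), then sum each group."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    }


# A shortened version of the sample line above.
sample = ",TrainBUS,car,TrainBUS,car,"
pairs = list(map_line(sample))
counts = reduce_pairs(pairs)
print(len(pairs), counts)
```

Note that a single input line produces one intermediate pair per token, so if the whole file is one line, one map call emits every record of the job.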

Does anyone have an answer?
