Only seeing 1 reducer

9jyewag0  posted on 2021-05-29  in Hadoop

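I recently installed Hadoop 2.7.1 in pseudo-distributed mode with YARN on an 8-core, 28 GB RAM virtual machine running Ubuntu 14.04 LTS.
Our files are typically 20-40 GB, so we are trying to find the best configuration for a single VM. We set the configuration in mapred-site.xml (below) to allow multiple mappers and reducers to run (with slowstart=1 so that the reducers only start after all maps have finished). I see multiple map tasks, but only one reduce task.
Our previous Hadoop (2.2.0) cluster ran on 2-4 nodes, so many of the settings below come from that setup.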
mapred-site.xml:

<property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>

  <property>
      <name>mapreduce.task.io.sort.factor</name>
      <value>48</value>
  </property>
  <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>512</value>
  </property>

  <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx4096m</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>3072</value>
    <description>upper memory limit (MB) that Hadoop allows allocated to a mapper</description>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx2048m</value>
    <description>maximum JVM heap size for map tasks</description>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>5120</value>
    <description>upper memory limit (MB) that Hadoop allows allocated to a reducer</description>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx4096m</value>
    <description>maximum JVM heap size for reduce tasks</description>
  </property>

  <property>
      <name>mapreduce.tasktracker.map.tasks.maximum</name>
      <value>8</value>
      <description>maximum MAP tasks that can be run in parallel on this node </description>
  </property>
<property>
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
    <description>maximum REDUCE tasks that can be run in parallel on this node </description>
</property>
<property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>1</value>
    <description>Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.</description>
</property>

core-site.xml:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/drive1/cluster/hadoop/tmp</value>
 </property>
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
 </property>

hdfs-site.xml:

<property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/mnt/drive1/cluster/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/mnt/drive1/cluster/hadoop/hdfs/datanode</value>
  </property>

yarn-site.xml:

<property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
  </property>
  <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>24576</value>
  </property>
  <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
  </property>
  <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>false</value>
  </property>
ix0qys7i  #1

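According to the documentation,
mapreduce.job.reduces defaults to 1.
Description: the default number of reduce tasks per job.
You can override the value per job, or override it for the whole cluster by setting the property in mapred-site.xml.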
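
As a minimal sketch of the cluster-wide override, the property could be added to mapred-site.xml alongside the settings from the question. The value 4 is only an assumption, chosen to match the reduce limit the question was aiming for; the right number depends on the job and on the memory available to YARN.

```xml
<property>
    <name>mapreduce.job.reduces</name>
    <!-- 4 is an assumed example value, not a recommendation; the default is 1 -->
    <value>4</value>
    <description>default number of reduce tasks per job</description>
</property>
```

For a per-job override, the same property can usually be passed on the command line as -D mapreduce.job.reduces=4, provided the job's driver parses generic options (for example through ToolRunner), or set in the driver code with Job.setNumReduceTasks(4).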
