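I just set up Hadoop/YARN 2.x (specifically v0.23.3) in pseudo-distributed mode.
I followed the instructions from a few blogs and websites, which give more or less the same setup recipe. I also consulted the 3rd edition of O'Reilly's Hadoop book (which, ironically, was the least helpful).
THE PROBLEM: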
After running "start-dfs.sh" and then "start-yarn.sh", while all of the daemons
do start (as indicated by jps(1)), the Resource Manager web portal
(Here: http://localhost:8088/cluster/nodes) indicates 0 (zero) job-nodes in the
cluster. So while submitting the example/test Hadoop job indeed does get
scheduled, it pends forever because, I assume, the configuration doesn't see a
node to run it on.
Below are the steps I performed, including resultant configuration files.
Hopefully the community can help me out... (And thank you in advance).
CONFIGURATION:
The following environment variables are set in the ~/.profile of both my own and the hadoop UNIX accounts:
export HADOOP_HOME=/home/myself/APPS.d/APACHE_HADOOP.d/latest
# Note: /home/myself/APPS.d/APACHE_HADOOP.d/latest -> hadoop-0.23.3
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_INSTALL=${HADOOP_HOME}
export HADOOP_CLASSPATH=${HADOOP_HOME}/lib
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop/conf
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop/conf
export JAVA_HOME=/usr/lib/jvm/jre
hadoop$ java -version
java version "1.7.0_06-icedtea"
OpenJDK Runtime Environment (fedora-2.3.1.fc17.2-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
# Although the above shows OpenJDK, the same problem happens with Sun's JRE/JDK.
The NAMENODE and DATANODE directories, which are also specified in etc/hadoop/conf/hdfs-site.xml:
/home/myself/APPS.d/APACHE_HADOOP.d/latest/YARN_DATA.d/HDFS.d/DATANODE.d/
/home/myself/APPS.d/APACHE_HADOOP.d/latest/YARN_DATA.d/HDFS.d/NAMENODE.d/
Next, the various XML configuration files (again, YARN/MRv2/v0.23.3 here):
hadoop$ pwd; ls -l
/home/myself/APPS.d/APACHE_HADOOP.d/latest/etc/hadoop/conf
lrwxrwxrwx 1 hadoop hadoop 16 Sep 20 13:14 core-site.xml -> ../core-site.xml
lrwxrwxrwx 1 hadoop hadoop 16 Sep 20 13:14 hdfs-site.xml -> ../hdfs-site.xml
lrwxrwxrwx 1 hadoop hadoop 18 Sep 20 13:14 httpfs-site.xml -> ../httpfs-site.xml
lrwxrwxrwx 1 hadoop hadoop 18 Sep 20 13:14 mapred-site.xml -> ../mapred-site.xml
-rw-rw-r-- 1 hadoop hadoop 10 Sep 20 15:36 slaves
lrwxrwxrwx 1 hadoop hadoop 16 Sep 20 13:14 yarn-site.xml -> ../yarn-site.xml
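Every file in this directory except slaves is a symlink into the parent directory, so one quick sanity check is to confirm that each link resolves to a readable file. The following is only a minimal sketch, assuming a POSIX shell; the function name check_conf_dir is made up for illustration and is not part of Hadoop.

```shell
#!/bin/sh
# check_conf_dir DIR: report whether each expected config file in DIR
# exists and is readable. The -r test follows symlinks, so a dangling
# link is reported as missing.
# Hypothetical helper for illustration only; not shipped with Hadoop.
check_conf_dir() {
    dir=$1
    status=0
    for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves; do
        if [ -r "$dir/$f" ]; then
            echo "ok:      $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}
```

It could be called as check_conf_dir "$HADOOP_CONF_DIR"; a non-zero exit status means at least one file could not be read.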
core-site.xml:
<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost/</value>
</property>
</configuration>
mapred-site.xml:
<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
<!-- Same problem whether this (legacy) stanza is included or not. -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml:
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/myself/APPS.d/APACHE_HADOOP.d/YARN_DATA.d/HDFS.d/NAMENODE.d</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/myself/APPS.d/APACHE_HADOOP.d/YARN_DATA.d/HDFS.d/DATANODE.d</value>
</property>
</configuration>
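The directories listed earlier contain a latest/ path component that the values in this file do not, so it may be worth confirming that the configured paths actually exist. This is a rough sketch under stated assumptions: list_hdfs_dirs and check_hdfs_dirs are hypothetical helper names, and the extraction is a plain sed text match on single-line <value>file:...</value> entries, not a real XML parser.

```shell
#!/bin/sh
# list_hdfs_dirs FILE: print the local path from every single-line
# <value>file:/...</value> entry in FILE, one path per line.
# Hypothetical helper; a text match with sed, not an XML parser.
list_hdfs_dirs() {
    sed -n 's|.*<value>file:\(/[^<]*\)</value>.*|\1|p' "$1"
}

# check_hdfs_dirs FILE: report whether each configured path is an existing
# directory. Returns non-zero if any path is missing.
check_hdfs_dirs() {
    # The loop runs in an explicit subshell so its exit status becomes
    # the status of the pipeline, and therefore of this function.
    list_hdfs_dirs "$1" | (
        status=0
        while IFS= read -r d; do
            if [ -d "$d" ]; then
                echo "exists:  $d"
            else
                echo "missing: $d"
                status=1
            fi
        done
        exit "$status"
    )
}
```

For example, check_hdfs_dirs "$HADOOP_CONF_DIR/hdfs-site.xml" would print one line per configured directory.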
yarn-site.xml:
<?xml version="1.0"?>
<!-- yarn-site.xml -->
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/myself/APPS.d/APACHE_HADOOP.d/YARN_DATA.d/TEMP.d</value>
</property>
</configuration>
etc/hadoop/conf/slaves:
localhost
# Community/friends, is this entry correct/needed for my pseudo-dist mode?
Other wrap-up notes:
(1) As you may have gleaned from above, all files/directories are owned
by the 'hadoop' UNIX user. There is a hadoop:hadoop, UNIX User and
Group, respectively.
(2) The following command was run after the NAMENODE & DATANODE directories
(listed above) were created (and whose paths were entered into
hdfs-site.xml):
hadoop$ hadoop namenode -format
(3) Next, I ran "start-dfs.sh", then "start-yarn.sh".
Here is jps(1) output:
hadoop@e6510$ jps
21979 DataNode
22253 ResourceManager
22384 NodeManager
22156 SecondaryNameNode
21829 NameNode
22742 Jps
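A check like the one above can be scripted. The sketch below reads jps-style output on stdin and prints any expected daemon that is absent; missing_daemons is a hypothetical helper name, and the daemon list is simply the five names from the output above. It only confirms that the processes are running. As this post shows, that says nothing about whether the NodeManager has registered with the ResourceManager.

```shell
#!/bin/sh
# missing_daemons: read jps-style output ("PID Name" per line) on stdin and
# print each expected daemon name that does not appear in it.
# Hypothetical helper; the list of names is taken from the jps output above.
missing_daemons() {
    # Keep only the second column (the process name).
    seen=$(awk '{print $2}')
    for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        echo "$seen" | grep -qx "$name" || echo "$name"
    done
}
```

Typical use would be jps | missing_daemons, which prints nothing when all five daemons are up.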
Thank you!
2 Answers
11dmarpk1#
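After a great deal of effort on this problem without success (and believe me, I tried everything), I stood up Hadoop using a different approach. Whereas above I downloaded the gzip/tar ball of the Hadoop distribution (again v0.23.3) from one of the download mirrors, this time I used the Cloudera CDH distribution of RPM packages, which I installed via their yum repos. In the hope that this helps someone else, here are the detailed steps.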
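Step 1:
For Hadoop 0.20.x (MapReduce version 1):
-or-
For Hadoop 0.23.x (MapReduce version 2):
In both cases above, installing the "pseudo" package (which stands for "pseudo-distributed Hadoop" mode) conveniently triggers the installation of all the other packages you need (via dependency resolution).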
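Step 2:
Install Sun/Oracle's Java JRE (if you haven't already done so). You can install it via the RPM they provide or via the portable gzip/tar-ball version. It doesn't matter which, as long as you set and export the JAVA_HOME environment variable appropriately and make sure ${JAVA_HOME}/bin/java is in your PATH.
Note: I actually create a symlink named "latest" and point/re-point it to the version-specific Java directory whenever I update Java. I was explicit above for the reader's benefit.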
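The environment setup described in this step could look roughly like the following in ~/.profile. This is only a sketch: /usr/java/latest is an assumed location for the "latest" symlink, not a path given in this answer, so adjust it to wherever your JRE actually lives.

```shell
#!/bin/sh
# Assumed location: /usr/java/latest is taken to be a symlink to the
# version-specific JRE directory. An already-set JAVA_HOME is kept as is.
JAVA_HOME=${JAVA_HOME:-/usr/java/latest}
export JAVA_HOME

# Prepend ${JAVA_HOME}/bin to PATH only if it is not already present,
# so sourcing the profile repeatedly does not grow PATH.
case ":$PATH:" in
    *":$JAVA_HOME/bin:"*) ;;
    *) PATH="$JAVA_HOME/bin:$PATH" ;;
esac
export PATH
```

Re-pointing the symlink after a Java update then requires no change to the profile.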
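Step 3:
Format HDFS as the "hdfs" UNIX user (which was created during the "yum install" above).
Step 4:
Manually start the Hadoop daemons.
Step 5:
Check that things are working. The following is for MapReduce v1 (it is not much different for MapReduce v2 at this superficial level).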
I hope this helps!
pxq42qpu2#
Noel,
A few days ago I followed the steps in this tutorial, http://www.thecloudavenue.com/search?q=0.23, and successfully set up a small cluster of 3 CentOS 6.3 machines.