I am using Ambari 1.7.
I am facing a strange problem. The first time the datanode is started, it shuts itself down within a few seconds. When I then try to restart the datanode, it does not start at all. Please help me resolve this.
The log from restarting the datanode is as follows:
2015-01-26 17:58:02,233 - Error while executing command 'start':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HDFS/package/scripts/datanode.py", line 37, in start
    datanode(action="start")
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HDFS/package/scripts/hdfs_datanode.py", line 55, in datanode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HDFS/package/scripts/utils.py", line 102, in service
    not_if=service_is_up
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'ulimit -c unlimited; su -s /bin/bash - hdfs -c 'export HADOOP_LIBEXEC_DIR=/usr/hdp/current/hadoop-client/libexec && /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode'' returned 1. stdin: is not a tty
starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-node1.out
After restarting the Hadoop cluster I ran into a similar problem. This is my log file, from /var/log/hadoop/hdfs/hadoop-hdfs-datanode-master.hadoopcluster.out
ulimit -a for user hdfs
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62510
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 128000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
1 Answer
Here is how I resolved this issue (it is not a perfect solution, but I am posting it here for your reference).
After several attempts and some Googling, I figured out that it was probably caused by an inconsistency between the namenode and the datanodes.
So I deleted the entire HDFS data directory from each datanode. The location of the datanode data directories can be found in hdfs-site.xml. After that, I formatted the namenode with the command
hadoop namenode -format
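For reference, the datanode-side steps look roughly like the sketch below. The property name dfs.datanode.data.dir and the config path /etc/hadoop/conf are the usual HDP 2.x defaults, but the example data path is an assumption, so check your own hdfs-site.xml before deleting anything:
# On each datanode: look up the configured data directories
grep -A1 'dfs.datanode.data.dir' /etc/hadoop/conf/hdfs-site.xml
# Remove the directory reported above (example path only; this deletes all block data on this node)
rm -rf /hadoop/hdfs/data
# On the namenode host: re-format the namenode as the hdfs user
su - hdfs -c 'hadoop namenode -format'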
At that point I was able to start the datanodes, but starting the namenode still failed. Finally, I deleted the namenode directory from the master machine and restarted the whole cluster.
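Roughly, that namenode-side cleanup looks like the following sketch. The property dfs.namenode.name.dir is the standard name, but the example path is an assumption, and note that this step destroys all HDFS metadata:
# On the master: look up the configured namenode metadata directory
grep -A1 'dfs.namenode.name.dir' /etc/hadoop/conf/hdfs-site.xml
# Remove it (example path only; all filesystem metadata is lost)
rm -rf /hadoop/hdfs/namenode
# Format again, then restart the whole cluster, e.g. through the Ambari web UI
su - hdfs -c 'hadoop namenode -format'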
Now it runs fine, but I inevitably lost the original data that was in the old HDFS.