Hadoop HDFS active NameNode does not become active after upgrade

Posted by 7cjasjjr on 2022-12-09 in HDFS

I am upgrading Hadoop from version 3.0.0 to version 3.2.2. These are the steps I followed:
1. Get the active NameNode:

$ hdfs haadmin -getServiceState nn1
standby

$ hdfs haadmin -getServiceState nn2
active
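This first check can be scripted. A minimal sketch: `find_active_nn` queries each configured service ID and reports the active one. The query command is passed as a parameter so the logic can be exercised without a live cluster; on a real cluster it would be called as `find_active_nn "hdfs haadmin -getServiceState" nn1 nn2`. The `mock_state` stub is an assumption for illustration, mirroring the states in the question.

```shell
# Sketch: report which configured service ID is the active NameNode.
# The query command is a parameter so this can run without a cluster.
find_active_nn() {
  cmd="$1"; shift
  for nn in "$@"; do
    if [ "$($cmd "$nn")" = "active" ]; then
      echo "$nn"
      return 0
    fi
  done
  return 1
}

# Stub standing in for "hdfs haadmin -getServiceState"; assumes nn2
# is active and nn1 is standby, as in the question.
mock_state() { if [ "$1" = "nn2" ]; then echo active; else echo standby; fi; }

find_active_nn mock_state nn1 nn2   # prints: nn2
```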

2. Enter safe mode and save the namespace (commands run on nn2):

$ hdfs dfsadmin -safemode enter
$ hdfs dfsadmin -saveNamespace

3. Stop all Hadoop services and upgrade the binaries on all nodes
4. Start the ZooKeeper failover controllers and journal nodes on the required nodes
5. Start nn2 (the last active NameNode) with the -upgrade -renameReserved flags:

$ hdfs --daemon start namenode -upgrade -renameReserved

From the logs:

STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = <hostname2>/<IP2>
STARTUP_MSG:   args = [-upgrade, -renameReserved]
STARTUP_MSG:   version = 3.2.2

org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode [-upgrade, -renameReserved]

org.apache.hadoop.hdfs.server.namenode.FSImage: Starting upgrade of local storage directories.
   old LV = -64; old CTime = 1636096354752.
   new LV = -65; new CTime = 1642257088033
org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Starting upgrade of storage directory

org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Performing upgrade of storage directory

org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state

6. Stop nn2 and start it normally, then turn off safe mode (we have these defined as services, so nn2 is started as a service):

$ hdfs --daemon stop namenode
$ sudo service hadoop-hdfs-namenode start

From the logs:

STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = <hostname2>/<IP2>
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 3.2.2

org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
The reported blocks 0 needs additional 2537 blocks to reach the threshold 0.9990 of total blocks 2540.
The minimum number of live datanodes is not required. Safe mode will be turned off automatically once the thresholds have been reached.

org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for standby state
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on active node every 120 seconds.

org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting standby checkpoint thread...
Checkpointing active NN to possible NNs: [http://<hostname1>:<port>]
Serving checkpoints at http://<hostname2>:<port>

org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode
org.apache.hadoop.ipc.Client: Retrying connect to server: <hostname1>/<ip1>:<port>

7. Turn off safe mode on the active NameNode (nn2 in this case):

$ hdfs dfsadmin -safemode leave
safemode: Call From <hostname2>/<IP2> to <hostname1>:<port> failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

8. Wait until safe mode is off:

$ hdfs dfsadmin -safemode get
safemode: Call From <hostname2>/<IP2> to <hostname1>:<port> failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

I expected safe mode to be turned off on nn2, and perhaps an exception about nn1 not running, but that is not what happened. I see nothing in the logs other than the repeated attempts to connect to <hostname1>/<IP1>:<port>.
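One possible workaround (an assumption on my part, not something the logs confirm) is to bypass the HA nameservice and address nn2's RPC endpoint directly via the generic `-fs` option, so the dfsadmin call does not depend on nn1 being reachable:

```shell
# Hedged sketch: target nn2 directly instead of the HA nameservice.
# <hostname2> and <port> are the placeholders from the question.
hdfs dfsadmin -fs hdfs://<hostname2>:<port> -safemode get
hdfs dfsadmin -fs hdfs://<hostname2>:<port> -safemode leave
```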
After step 8, I start the standby NameNode, issue hdfs namenode -bootstrapStandby, then restart both NameNodes and finalize the upgrade.
I have tested these upgrade steps at least 20-25 times, but this time I am stuck at step 7.
Since nn2 (hostname2) was the active NameNode before the upgrade, I expected it to come back up and become active again (and for safe mode to be turned off), but that did not happen this time. I cannot find anything related to this issue. Can anyone help?

nwwlzxa7

It looks like there is a problem with the safemode get command: if nn1 is not running, it does not check nn2 and instead returns an error. To work around this, I changed the steps to:
1. Bootstrap the standby NameNode with hdfs namenode -bootstrapStandby; the file system image is copied to nn2 and it shuts down. Then start the NameNode service on the nn2 host with sudo service hadoop-hdfs-namenode start
2. Turn off HDFS safe mode: hdfs dfsadmin -safemode leave
3. Wait for safe mode to turn off: hdfs dfsadmin -safemode get
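The waiting in the last step can be scripted as a small poll loop. A minimal sketch: the query command is passed as a parameter so the loop can be demonstrated here with stubs; on a real cluster it would be `hdfs dfsadmin -safemode get`. The stubs and the retry bound are assumptions for illustration.

```shell
# Sketch: poll the safe-mode state until it reports OFF, up to a
# bounded number of attempts. The query command is a parameter
# (normally "hdfs dfsadmin -safemode get") so the loop can be
# demonstrated without a cluster.
wait_safemode_off() {
  cmd="$1"; tries="$2"; i=0
  while [ "$i" -lt "$tries" ]; do
    if $cmd | grep -q "OFF"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Stubs standing in for the real command (assumptions for illustration).
mock_off() { echo "Safe mode is OFF"; }
mock_on()  { echo "Safe mode is ON"; }

wait_safemode_off mock_off 3 && echo "safe mode cleared"   # prints: safe mode cleared
```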
