I am upgrading Hadoop from version 3.0.0 to version 3.2.2. These are the steps I followed:
1. Find the active namenode:
$ hdfs haadmin -getServiceState nn1
standby
$ hdfs haadmin -getServiceState nn2
active
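(As a convenience, I sometimes query all service IDs in one go; nn1 and nn2 are the service IDs from our hdfs-site.xml, and this loop is just a sketch of that:)
$ for nn in nn1 nn2; do printf '%s: ' "$nn"; hdfs haadmin -getServiceState "$nn"; done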
2. Enter safe mode and save the namespace (commands run on nn2):
$ hdfs dfsadmin -safemode enter
$ hdfs dfsadmin -saveNamespace
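(saveNamespace is only accepted while the namenode is in safe mode, so as a sanity check I confirm the state first and expect it to report that safe mode is ON:)
$ hdfs dfsadmin -safemode get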
3. Stop all Hadoop services and upgrade the binaries on all nodes.
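(The exact service names depend on the installation; on our hosts they follow the same hadoop-hdfs-* naming used later in this post, so stopping everything looks roughly like this:)
$ sudo service hadoop-hdfs-namenode stop   # on both namenode hosts
$ sudo service hadoop-hdfs-datanode stop   # on every datanode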
4. Start the ZooKeeper failover controllers and journal nodes on the required nodes.
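(With the Hadoop 3 launcher these can be started the same way nn2 is started in the next step; journalnode and zkfc are the standard daemon names:)
$ hdfs --daemon start journalnode   # on each journal node host
$ hdfs --daemon start zkfc          # on each namenode host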
5. Start nn2 (the last active namenode) with the -upgrade -renameReserved flags:
$ hdfs --daemon start namenode -upgrade -renameReserved
From the logs:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = <hostname2>/<IP2>
STARTUP_MSG: args = [-upgrade, -renameReserved]
STARTUP_MSG: version = 3.2.2
org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode [-upgrade, -renameReserved]
org.apache.hadoop.hdfs.server.namenode.FSImage: Starting upgrade of local storage directories.
old LV = -64; old CTime = 1636096354752.
new LV = -65; new CTime = 1642257088033
org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Starting upgrade of storage directory
org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Performing upgrade of storage directory
org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
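(The LV -64 to -65 lines show the on-disk layout being migrated. Until the upgrade is finalized, HDFS keeps the pre-upgrade state in a previous/ directory inside each namenode storage directory, which is what makes a rollback possible; a quick check, with the storage path from hdfs-site.xml left as a placeholder:)
$ ls <dfs.namenode.name.dir>/
current  previous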
6. Stop nn2 and start it normally, then turn off safemode (we have these defined as services, so nn2 is started as a service):
$ hdfs --daemon stop namenode
$ sudo service hadoop-hdfs-namenode start
From the logs:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = <hostname2>/<IP2>
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.2.2
org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
The reported blocks 0 needs additional 2537 blocks to reach the threshold 0.9990 of total blocks 2540.
The minimum number of live datanodes is not required. Safe mode will be turned off automatically once the thresholds have been reached.
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for standby state
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on active node every 120 seconds.
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting standby checkpoint thread...
Checkpointing active NN to possible NNs: [http://<hostname1>:<port>]
Serving checkpoints at http://<hostname2>:<port>
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode
org.apache.hadoop.ipc.Client: Retrying connect to server: <hostname1>/<ip1>:<port>
7. Turn off safe mode on the active nn (nn2 in this case):
$ hdfs dfsadmin -safemode leave
safemode: Call From <hostname2>/<IP2> to <hostname1>:<port> failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
8. Wait until safe mode is off:
$ hdfs dfsadmin -safemode get
safemode: Call From <hostname2>/<IP2> to <hostname1>:<port> failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I expected safe mode to be turned off on nn2, with an exception only for nn1 (which is not running), but that is not what happened. Apart from the attempts to connect to <hostname1>/<IP1>:<port>, I don't see anything else in the logs.
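(I would have expected to be able to point dfsadmin at nn2 directly via the generic -fs option and bypass nn1 entirely; I have not verified this on 3.2.2, so treat it as a sketch:)
$ hdfs dfsadmin -fs hdfs://<hostname2>:<port> -safemode get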
After step 8, I normally start the standby namenode and issue hdfs namenode -bootstrapStandby, then restart both namenodes and finalize the upgrade.
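(Finalizing means running hdfs dfsadmin -finalizeUpgrade once the cluster is healthy; it removes the previous/ state and makes the 3.2.2 layout permanent, after which a rollback is no longer possible:)
$ hdfs dfsadmin -finalizeUpgrade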
I have tested these upgrade steps at least 20-25 times, but this time I am stuck at step 7. Since nn2 (hostname2) was the active namenode before the upgrade, I expected it to come back up as the active namenode again (and for safe mode to be turned off), but that did not happen in this case. I could not find anything related to this. Can anyone help resolve the issue?
1 Answer
It looks like the safemode get command has an issue: if nn1 is not running, it returns an error instead of also checking nn2. To work around this, I changed the steps to:
1. Bootstrap the standby namenode with hdfs namenode -bootstrapStandby; the filesystem image is copied over from nn2, after which the bootstrap process shuts down. Then start the namenode service on that host with sudo service hadoop-hdfs-namenode start.
2. Turn off HDFS safe mode:
$ hdfs dfsadmin -safemode leave
3. Wait until safe mode is off:
$ hdfs dfsadmin -safemode get
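(Instead of polling with get, dfsadmin also has a blocking variant that returns once safe mode is off, which may be more convenient here:)
$ hdfs dfsadmin -safemode wait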