Retrying connect to server: Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

w3nuxt5m asked on 2021-05-29 in Hadoop

I have three physical nodes. On each node, I start the Docker container with this command:

    docker run -v /home/user/.ssh:/root/.ssh --privileged \
      -p 5050:5050 -p 5051:5051 -p 5052:5052 -p 2181:2181 -p 8089:8081 \
      -p 6123:6123 -p 8084:8080 -p 50090:50090 -p 50070:50070 \
      -p 9000:9000 -p 2888:2888 -p 3888:3888 -p 4041:4040 -p 8020:8020 \
      -p 8485:8485 -p 7078:7077 -p 52222:22 -e WEAVE_CIDR=10.32.0.3/12 \
      -e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins \
      -e LIBPROCESS_IP=10.32.0.3 \
      -e MESOS_RESOURCES=ports*:[11000-11999] \
      -ti hadoop_marathon_mesos_flink_2 /bin/bash

I configured Hadoop as follows. core-site.xml:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://mycluster</value>
      </property>
    </configuration>

hdfs-site.xml:

    <configuration>
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://10.32.0.1:8485;10.32.0.2:8485;10.32.0.3:8485/mycluster</value>
      </property>
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/tmp/hadoop/dfs/jn</value>
      </property>
      <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
        <description>Logical name for this new nameservice</description>
      </property>
      <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
        <description>Unique identifiers for each NameNode in the nameservice</description>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>10.32.0.1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>10.32.0.2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>10.32.0.1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>10.32.0.2:50070</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/bin/true)</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/local/hadoop_store/hdfs/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/local/hadoop_store/hdfs/datanode</value>
      </property>
      <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
      </property>
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>10.32.0.1:2181,10.32.0.2:2181,10.32.0.3:2181</value>
      </property>
    </configuration>

The problem occurs when formatting the NameNode:

    hadoop namenode -format

It cannot format the NameNode, and I get this error:

    2019-05-06 06:35:09,969 INFO ipc.Client: Retrying connect to server: 10.32.0.2/10.32.0.2:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    2019-05-06 06:35:09,969 INFO ipc.Client: Retrying connect to server: 10.32.0.3/10.32.0.3:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    2019-05-06 06:35:09,987 ERROR namenode.NameNode: Failed to start namenode.
    org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown:
    10.32.0.1:8485: Call From 50c5244de4cd/10.32.0.1 to 50c5244de4cd:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

I have already published the ports Hadoop needs (see the docker run command above), but I still get Connection refused.
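
One way to narrow this down (assuming jps and netstat are available inside the container) is to check whether anything is listening on the JournalNode port at all:

    jps | grep JournalNode      # is a JournalNode process even running?
    netstat -tlnp | grep 8485   # is anything bound to the JournalNode RPC port?
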
Can anyone tell me what is wrong with the configuration?
Thanks in advance.

agyaoht7 1#

What solved this problem was the ZooKeeper configuration in core-site.xml. Below I describe the high-availability Hadoop configuration in detail. hdfs-site.xml:

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
      <description>Logical name for this new nameservice</description>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
      <description>Unique identifiers for each NameNode in the nameservice</description>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>10.32.0.1:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>10.32.0.2:8020</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>10.32.0.1:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>10.32.0.2:50070</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://10.32.0.1:8485;10.32.0.2:8485;10.32.0.3:8485/mycluster</value>
    </property>
    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hdfs/.ssh/id_rsa</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.connect-timeout</name>
      <value>30000</value>
    </property>
    <property>
      <name>dfs.permissions.superusergroup</name>
      <value>hdfs</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///usr/local/hadoop_store/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///usr/local/hadoop_store/hdfs/datanode</value>
    </property>
    <property>
      <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
      <value>false</value>
    </property>
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
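
Since dfs.ha.fencing.methods is now sshfence with the key /home/hdfs/.ssh/id_rsa, fencing only works if each NameNode can log into the other over SSH without a password. A minimal sanity check, assuming the key belongs to an hdfs user that exists on both NameNodes:

    # run on nn1 (10.32.0.1); should print the remote hostname without prompting
    ssh -i /home/hdfs/.ssh/id_rsa -o BatchMode=yes hdfs@10.32.0.2 hostname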

core-site.xml (for example, on node 10.32.0.1):

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/tmp/hadoop/dfs/journalnode</value>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://mycluster</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>0.0.0.0:2181,10.32.0.2:2181,10.32.0.3:2181</value>
    </property>

The ZooKeeper configuration, for example on 10.32.0.1, is:

    server.1=0.0.0.0:2888:3888
    server.2=10.32.0.2:2888:3888
    server.3=10.32.0.3:2888:3888
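
Each server.N entry must be matched by an id on disk on that node. A minimal sketch for node 10.32.0.1 (the server.1 entry), assuming /var/lib/zookeeper/data is ZooKeeper's configured dataDir:

    # write this node's id (1, 2 or 3, matching its server.N line) into myid
    echo 1 > /var/lib/zookeeper/data/myid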

As sketched above, I created the myid file in /var/lib/zookeeper/data, containing that node's id. First, remove all of the following folders:

    rm -rf /tmp/hadoop/dfs/journalnode
    rm -rf /usr/local/hadoop_store/hdfs/namenode
    rm -rf /usr/local/hadoop_store/hdfs/datanode
    rm -rf /opt/hadoop/logs/*

Then, create the following folders:

    mkdir /usr/local/hadoop_store/hdfs/namenode
    mkdir /usr/local/hadoop_store/hdfs/datanode

Then, give these folders the right permissions:

    chmod 777 /usr/local/hadoop_store/hdfs/namenode
    chmod 777 /usr/local/hadoop_store/hdfs/datanode
    chown -R root /usr/local/hadoop_store/hdfs/namenode
    chown -R root /usr/local/hadoop_store/hdfs/datanode
    chmod 777 /tmp/hadoop/dfs/journalnode
    chown -R root /tmp/hadoop/dfs/journalnode

Now you can format these folders. The most important part is how to format the three nodes; you must follow these stages:

1. Stop the HDFS services.
2. Start only the JournalNodes (since they need to be made aware of the formatting):

    /opt/hadoop/bin/hdfs --daemon start journalnode
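
Before moving on, it is worth confirming that a JournalNode JVM is actually up on all three nodes, since an unreachable JournalNode is exactly what caused the original error. A quick check, assuming jps from the JDK is on the PATH:

    jps | grep JournalNode   # should print one JournalNode process on each node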

3. On the first NameNode (as user hdfs or root):

       hadoop namenode -format

4. On the JournalNodes:

       hdfs namenode -initializeSharedEdits -force

5. Restart ZooKeeper:

       /home/zookeeper-3.4.14/bin/zkServer.sh restart

6. Format ZooKeeper:

    hdfs zkfc -formatZK -force   # force ZooKeeper to reinitialise

7. Restart the first NameNode:

    /opt/hadoop/bin/hdfs --daemon start namenode

8. On the second NameNode:

    hdfs namenode -bootstrapStandby -force   # force sync with the first NameNode

9. On every DataNode, clear the data directory:

    hadoop datanode -format

10. Restart the HDFS services:

    /opt/hadoop/sbin/start-dfs.sh
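
Once everything is back up, the HA state of both NameNodes can be verified with hdfs haadmin (part of the standard HDFS CLI; nn1 and nn2 are the ids from dfs.ha.namenodes.mycluster above):

    hdfs haadmin -getServiceState nn1   # expect one NameNode to report active...
    hdfs haadmin -getServiceState nn2   # ...and the other standby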

By the way, I have three nodes: two NameNodes and one DataNode. You can check the Hadoop logs in /opt/hadoop/logs/.
