distcp between nameservice1 and nameservice2

bvuwiixz posted on 2021-06-02 in Hadoop

We have CDH 5.2 and Cloudera Manager 5.
We want to copy data from nameservice2 to nameservice1.
Both clusters are on the same CDH version.
When I run hadoop distcp hdfs://nameservice2/foo/bar hdfs://nameservice1/bar/foo, I get the error java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice2. So I added the following configuration from nameservice2 to nameservice1,
in Cloudera Manager under HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml (Gateway Default Group):

<property>
<name>dfs.nameservices</name>
<value>nameservices2</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.nameservices2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices2</name>
<value>namenode36,namenode405</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.cc:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50470</value>
</property>

But I still get the same error.
Is there a solution?
Thanks

z9ju0rcb

With HDFS NameNode HA enabled, nameservice1 and nameservice2 are logical names, and you cannot append a port to a logical name.
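To make the logical-name point concrete, here is a simplified Python model of how an HA-aware client expands a logical authority into NameNode RPC addresses. It is a sketch, not Hadoop's actual client code: it only mirrors the keys that appear in this thread (dfs.nameservices, dfs.ha.namenodes.<ns>, dfs.namenode.rpc-address.<ns>.<nn>), the function names are hypothetical, and the default port 8020 is an assumption taken from the configs here. It also shows why a name the client does not recognise (for example nameservice2 when the keys are spelled nameservices2) ends up being treated as an ordinary hostname, whose DNS lookup then fails.

```python
from urllib.parse import urlparse

DEFAULT_NN_RPC_PORT = 8020  # assumption: NameNode RPC port used in this thread


def _split_csv(value: str) -> list[str]:
    """Split a comma-separated Hadoop property value, dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def resolve_authority(uri: str, conf: dict[str, str]) -> list[str]:
    """Hypothetical helper: NameNode RPC addresses a client would try for `uri`.

    Simplified model only; real Hadoop clients do more (proxy providers, retries).
    """
    parsed = urlparse(uri)
    host, port = parsed.hostname, parsed.port
    if host in _split_csv(conf.get("dfs.nameservices", "")):
        # Logical (HA) name: no port allowed; NameNodes come from the HA keys.
        if port is not None:
            raise ValueError(f"logical nameservice {host!r} must not carry a port")
        nn_ids = _split_csv(conf.get(f"dfs.ha.namenodes.{host}", ""))
        if not nn_ids:
            raise KeyError(f"dfs.ha.namenodes.{host} is not set")
        return [conf[f"dfs.namenode.rpc-address.{host}.{nn}"] for nn in nn_ids]
    # Unknown name: treated as a physical host. For "nameservice2" the DNS
    # lookup would then fail, which is what UnknownHostException reports.
    return [f"{host}:{port or DEFAULT_NN_RPC_PORT}"]


if __name__ == "__main__":
    conf = {
        "dfs.nameservices": "nameservice1,nameservice2",
        "dfs.ha.namenodes.nameservice2": "namenode36,namenode405",
        "dfs.namenode.rpc-address.nameservice2.namenode36": "hnn001.prod.cc:8020",
        "dfs.namenode.rpc-address.nameservice2.namenode405": "hnn002.prod.com:8020",
    }
    print(resolve_authority("hdfs://nameservice2/foo/bar", conf))
    # ['hnn001.prod.cc:8020', 'hnn002.prod.com:8020']

    # Keys spelled "nameservices2": the URI's "nameservice2" is no longer
    # recognised and falls through to a (non-existent) hostname.
    typo = {"dfs.nameservices": "nameservices2"}
    print(resolve_authority("hdfs://nameservice2/foo/bar", typo))
    # ['nameservice2:8020']
```

The point of the model is that the authority in the URI is only ever matched by exact name against dfs.nameservices; the port is irrelevant for a logical name because the failover proxy provider chooses among the listed NameNodes.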
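You have two options.
The simple way is to find the active NameNode of each cluster and use its hostname:port directly in the distcp command, as shown below. You can use the NameNode web UI to find the active NameNode of both clusters.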

hadoop distcp hdfs://hnn001.prod.cc:8020/foo/bar hdfs://<dest-cluster-active-nn-hostname>:8020/bar/foo

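The other way is to use the logical names of both clusters, as shown below. Before trying this command, make sure you have correctly configured both nameservice1 and nameservice2 in the client's hdfs-site.xml.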

hadoop distcp hdfs://nameservice2/foo/bar hdfs://nameservice1/bar/foo
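Before running the logical-name command, one way to confirm that the client configuration really defines both nameservices is to check the keys programmatically. The Python sketch below is a hypothetical helper, not a Cloudera or Hadoop tool: load_properties and check_ha_client_config are illustrative names, and it only checks the client-side HA keys used in this thread. It also catches unreplaced placeholders such as <Remote-nn1>, because those make the file malformed XML and load_properties then raises ParseError.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text: str) -> dict[str, str]:
    """Parse Hadoop-style <configuration><property><name/><value/></property> XML."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    props: dict[str, str] = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def _split_csv(value: str) -> list[str]:
    """Split a comma-separated property value, dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_ha_client_config(props: dict[str, str], nameservices: list[str]) -> list[str]:
    """Hypothetical checker: list missing HA client keys for each nameservice."""
    problems = []
    declared = _split_csv(props.get("dfs.nameservices", ""))
    for ns in nameservices:
        if ns not in declared:
            problems.append(f"{ns} is not listed in dfs.nameservices")
        if f"dfs.client.failover.proxy.provider.{ns}" not in props:
            problems.append(f"dfs.client.failover.proxy.provider.{ns} is missing")
        nn_ids = _split_csv(props.get(f"dfs.ha.namenodes.{ns}", ""))
        if not nn_ids:
            problems.append(f"dfs.ha.namenodes.{ns} is missing or empty")
        for nn in nn_ids:
            if f"dfs.namenode.rpc-address.{ns}.{nn}" not in props:
                problems.append(f"dfs.namenode.rpc-address.{ns}.{nn} is missing")
    return problems


if __name__ == "__main__":
    # Trimmed version of the question's snippet, wrapped in <configuration>.
    question_xml = """<configuration>
  <property><name>dfs.nameservices</name><value>nameservices2</value></property>
  <property><name>dfs.ha.namenodes.nameservices2</name><value>namenode36,namenode405</value></property>
</configuration>"""
    props = load_properties(question_xml)
    for problem in check_ha_client_config(props, ["nameservice1", "nameservice2"]):
        print(problem)
```

Against the question's naming it reports three problems for each of nameservice1 and nameservice2, since the keys there are spelled nameservices2 and never mention nameservice1.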

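Configuring the remote cluster's nameservice in the local cluster:
It looks like nameservice2 is local and nameservice1 is remote. You need to keep all the relevant properties of both nameservice1 and nameservice2 in the local cluster's client configuration, i.e.: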

<configuration>
<!-- Available nameservices -->
<property>
<name>dfs.nameservices</name>
<value>nameservice1,nameservice2</value>
</property>

<!-- Local nameservice2 properties -->
<property>
<name>dfs.client.failover.proxy.provider.nameservice2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservice2</name>
<value>namenode36,namenode405</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservice2.namenode36</name>
<value>hnn001.prod.cc:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservice2.namenode36</name>
<value>hnn001.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservice2.namenode36</name>
<value>hnn001.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservice2.namenode36</name>
<value>hnn001.prod.com:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservice2.namenode405</name>
<value>hnn002.prod.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservice2.namenode405</name>
<value>hnn002.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservice2.namenode405</name>
<value>hnn002.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservice2.namenode405</name>
<value>hnn002.prod.com:50470</value>
</property>

<!-- Remote nameservice1 properties -->
<!-- You can find these properties in the remote machine's hdfs-site.xml file -->

<property>
<name>dfs.client.failover.proxy.provider.nameservice1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservice1</name>
<value>namenodeXX,namenodeYY</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservice1.namenodeXX</name>
<value><Remote-nn1>:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservice1.namenodeXX</name>
<value><Remote-nn1>:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservice1.namenodeXX</name>
<value><Remote-nn1>:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservice1.namenodeXX</name>
<value><Remote-nn1>:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservice1.namenodeYY</name>
<value><Remote-nn2>:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservice1.namenodeYY</name>
<value><Remote-nn2>:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservice1.namenodeYY</name>
<value><Remote-nn2>:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservice1.namenodeYY</name>
<value><Remote-nn2>:50470</value>
</property>

<!-- Other properties --> 

</configuration>

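In the configuration above, replace all placeholders such as XX, YY, <Remote-nn1> and <Remote-nn2> with the corresponding values from the remote machine's hdfs-site.xml. Also make sure the nameservice ID in every property name matches the logical name in your hdfs:// URIs exactly: for hdfs://nameservice2 the keys must use nameservice2, not nameservices2.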
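The copy-from-remote step can also be scripted. The Python sketch below uses hypothetical helpers that work on plain name-to-value dicts (reading the two hdfs-site.xml files is left out): merge_remote_nameservice copies every property scoped to the remote nameservice, meaning keys that end in .<ns> or contain .<ns>., into the local client settings and appends the remote ID to dfs.nameservices while keeping the local one. This is a rough heuristic, not a Cloudera Manager feature; it may also pick up server-side keys scoped to that nameservice, which you may prefer to drop from a client-only snippet.

```python
import xml.etree.ElementTree as ET


def _split_csv(value: str) -> list[str]:
    """Split a comma-separated property value, dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def merge_remote_nameservice(local: dict[str, str], remote: dict[str, str],
                             remote_ns: str) -> dict[str, str]:
    """Hypothetical helper: copy every key scoped to `remote_ns` into the local config."""
    merged = dict(local)  # leave the caller's dict untouched
    for key, value in remote.items():
        # Nameservice-scoped keys end with ".<ns>" or contain ".<ns>."
        # (e.g. dfs.namenode.rpc-address.<ns>.<nn>); ".<ns>." avoids matching "<ns>0".
        if key.endswith(f".{remote_ns}") or f".{remote_ns}." in key:
            merged[key] = value
    declared = _split_csv(merged.get("dfs.nameservices", ""))
    if remote_ns not in declared:
        declared.append(remote_ns)  # keep the local nameservice, add the remote one
    merged["dfs.nameservices"] = ",".join(declared)
    return merged


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Serialise properties back into a <configuration> document, sorted by name."""
    root = ET.Element("configuration")
    for name in sorted(props):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = props[name]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    local = {"dfs.nameservices": "nameservice2",
             "dfs.ha.namenodes.nameservice2": "namenode36,namenode405"}
    # "remote-nn1.example" is a hypothetical hostname for illustration only.
    remote = {"dfs.nameservices": "nameservice1",
              "dfs.ha.namenodes.nameservice1": "namenodeXX,namenodeYY",
              "dfs.namenode.rpc-address.nameservice1.namenodeXX": "remote-nn1.example:8020",
              "fs.defaultFS": "hdfs://nameservice1"}
    print(to_hadoop_xml(merge_remote_nameservice(local, remote, "nameservice1")))
```

Because values are copied verbatim from the remote file, the real hostnames replace the XX/YY and <Remote-nn1>/<Remote-nn2> placeholders automatically; cluster-wide keys such as fs.defaultFS are deliberately not copied.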
