Hadoop: accessing an HDFS cluster configured for High Availability from a client program

jqjz2hbq · posted 2022-11-01 in Hadoop

I am trying to understand, for a client outside the HDFS cluster, why connecting to HDFS through the nameservice (which resolves to the active namenode in a High Availability setup) works in one program but not in another.
Non-working program:
When I simply read the two configuration files (core-site.xml and hdfs-site.xml) and then access an HDFS path, an error is thrown.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HadoopAccess {
      def main(args: Array[String]): Unit = {
        val hadoopConf = new Configuration(false)
        val coreSiteXML = "C:\\Users\\507\\conf\\core-site.xml"
        val HDFSSiteXML = "C:\\Users\\507\\conf\\hdfs-site.xml"
        hadoopConf.addResource(new Path("file:///" + coreSiteXML))
        hadoopConf.addResource(new Path("file:///" + HDFSSiteXML))
        println("hadoopConf : " + hadoopConf.get("fs.defaultFS"))
        val fs = FileSystem.get(hadoopConf)
        val check = fs.exists(new Path("/apps/hive"))
        //println("Checked : " + check)
      }
    }

Error: we get an UnknownHostException.
Hadoop configuration and output:

    hdfs://mycluster
    Configuration: file:/C:/Users/64507/conf/core-site.xml, file:/C:/Users/64507/conf/hdfs-site.xml
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
        at HadoopAccess$.main(HadoopAccess.scala:28)
        at HadoopAccess.main(HadoopAccess.scala)
    Caused by: java.net.UnknownHostException: mycluster
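
One quick way to pin this down is to print the HA-related keys straight from the Configuration after adding the two resources. This is a minimal diagnostic sketch, reusing the file paths and the nameservice name from the question (the object name CheckHAConf is just for illustration); if these come back null, the XML files were not actually picked up and the client falls back to treating mycluster as a plain hostname:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    object CheckHAConf {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration(false)
        conf.addResource(new Path("file:///C:/Users/507/conf/core-site.xml"))
        conf.addResource(new Path("file:///C:/Users/507/conf/hdfs-site.xml"))

        // All of these should be non-null if the two XML files were loaded.
        // A missing dfs.nameservices / proxy provider is exactly what makes the
        // client treat "mycluster" as a hostname and fail with UnknownHostException.
        println("fs.defaultFS = " + conf.get("fs.defaultFS"))
        println("dfs.nameservices = " + conf.get("dfs.nameservices"))
        println("dfs.client.failover.proxy.provider.mycluster = " +
          conf.get("dfs.client.failover.proxy.provider.mycluster"))
      }
    }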

Working program: when I explicitly set the High Availability properties on the hadoopConf object and pass it to the FileSystem object, the program works.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HadoopAccess {
      def main(args: Array[String]): Unit = {
        val hadoopConf = new Configuration(false)
        val coreSiteXML = "C:\\Users\\507\\conf\\core-site.xml"
        val HDFSSiteXML = "C:\\Users\\507\\conf\\hdfs-site.xml"
        hadoopConf.addResource(new Path("file:///" + coreSiteXML))
        hadoopConf.addResource(new Path("file:///" + HDFSSiteXML))
        hadoopConf.set("fs.defaultFS", hadoopConf.get("fs.defaultFS"))
        //hadoopConf.set("fs.defaultFS", "hdfs://mycluster")
        //hadoopConf.set("fs.default.name", hadoopConf.get("fs.defaultFS"))
        hadoopConf.set("dfs.nameservices", hadoopConf.get("dfs.nameservices"))
        hadoopConf.set("dfs.ha.namenodes.mycluster", "nn1,nn2")
        hadoopConf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1:8020")
        hadoopConf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2:8020")
        hadoopConf.set("dfs.client.failover.proxy.provider.mycluster",
          "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
        println(hadoopConf)
        /* val namenode = hadoopConf.get("fs.defaultFS")
           println("namenode: " + namenode) */
        val fs = FileSystem.get(hadoopConf)
        val check = fs.exists(new Path("hdfs://mycluster/apps/hive"))
        //println("Checked : " + check)
      }
    }

Is there any reason we need to set these values (dfs.nameservices, dfs.client.failover.proxy.provider.mycluster, dfs.namenode.rpc-address.mycluster.nn1, ...) on the hadoopConf object, given that they are already present in hdfs-site.xml and core-site.xml? These are the High Availability namenode settings.
I am running the above program either on an edge node or locally from IntelliJ.
Hadoop version: 2.7.3.2 (Hortonworks HDP 2.6.1)
My observation in the Spark Scala REPL:
When I run val hadoopConf = new Configuration(false) followed by val fs = FileSystem.get(hadoopConf), I get the local file system. But when I then run the following

    hadoopConf.addResource(new Path("file:///" + coreSiteXML))
    hadoopConf.addResource(new Path("file:///" + HDFSSiteXML))

then the file system changes to DistributedFileSystem. My assumption is that some client library that ships with Spark is not available somewhere during my build, or in a common place on the edge node.
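
A small REPL-style sketch of that observation (same conf file paths as above; the exact class names printed can vary by Hadoop version):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val hadoopConf = new Configuration(false)
    // No fs.defaultFS yet, so this resolves to the local file system implementation.
    println(FileSystem.get(hadoopConf).getClass.getName)

    hadoopConf.addResource(new Path("file:///C:/Users/507/conf/core-site.xml"))
    hadoopConf.addResource(new Path("file:///C:/Users/507/conf/hdfs-site.xml"))
    // With fs.defaultFS = hdfs://mycluster and the HA properties loaded,
    // this should now resolve to DistributedFileSystem.
    println(FileSystem.get(hadoopConf).getClass.getName)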


kgqe7b3p 1#

"Some client library that ships with Spark is not available somewhere during my build, or in a common place on the edge node"
That common place would be $SPARK_HOME/conf and/or $HADOOP_CONF_DIR, but if you are just running a plain Scala app with java -jar or from IntelliJ, this has nothing to do with Spark.
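
If you want the client to pick the files up from that common place instead of hard-coding Windows paths, a minimal sketch could look like the following (assuming HADOOP_CONF_DIR is set and contains core-site.xml and hdfs-site.xml; the object name is just for illustration):

    import java.io.File
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HadoopAccessFromConfDir {
      def main(args: Array[String]): Unit = {
        // Read the conf directory from the environment instead of hard-coding it.
        val confDir = sys.env.getOrElse("HADOOP_CONF_DIR",
          sys.error("HADOOP_CONF_DIR is not set"))

        val hadoopConf = new Configuration(false)
        hadoopConf.addResource(new Path(new File(confDir, "core-site.xml").toURI))
        hadoopConf.addResource(new Path(new File(confDir, "hdfs-site.xml").toURI))

        val fs = FileSystem.get(hadoopConf)
        println(fs.exists(new Path("/apps/hive")))
      }
    }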
"...these values already exist in hdfs-site.xml and core-site.xml"
Then they should be read from there accordingly, but overriding them in code doesn't hurt either.
Those values are required because they tell the client where the actual namenodes are running; otherwise it assumes that mycluster is the real DNS name of a single server, which it is not.
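
To make that resolution chain visible, here is a short REPL-style sketch (reusing the conf files from the question) that walks from the logical nameservice to the per-namenode RPC addresses the client actually dials:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    val conf = new Configuration(false)
    conf.addResource(new Path("file:///C:/Users/507/conf/core-site.xml"))
    conf.addResource(new Path("file:///C:/Users/507/conf/hdfs-site.xml"))

    // Logical nameservice (e.g. "mycluster") -> namenode ids -> real host:port pairs.
    val nameservice = conf.get("dfs.nameservices")
    val namenodeIds = conf.get("dfs.ha.namenodes." + nameservice).split(",").map(_.trim)
    namenodeIds.foreach { id =>
      val rpcAddress = conf.get("dfs.namenode.rpc-address." + nameservice + "." + id)
      println(nameservice + "/" + id + " -> " + rpcAddress)
    }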
