sbt project: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset

yx2lnoni · asked 2021-06-01 · Hadoop

I'm trying to use AvroParquetWriter to convert files in Avro format into Parquet files. I load the schema:

val schema: org.apache.avro.Schema = ... getSchema(...)
val parquetFile = new Path("Location/for/parquetFile.txt")
val writer = new AvroParquetWriter[GenericRecord](parquetFile, schema)

My code runs fine up to the point where the AvroParquetWriter is initialized. It then throws the following error:

> java.lang.RuntimeException: java.io.FileNotFoundException:
> java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
> -see https://wiki.apache.org/hadoop/WindowsProblems
>   at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:722)
>   at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:256)
>   at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:273)
>   at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:767)
>   at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:235) ...etc
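
For background (my understanding, not something stated in the error itself): on Windows, hadoop-common shells out to winutils.exe for local-file permission calls, which is why even a purely local Parquet write still asks for HADOOP_HOME. A quick sanity check, using only standard Scala/Java APIs, to see what Hadoop will find:

// Print what Hadoop's Shell class will see when it looks for winutils.
println(s"HADOOP_HOME env      = ${sys.env.get("HADOOP_HOME")}")
println(s"hadoop.home.dir prop = ${sys.props.get("hadoop.home.dir")}")
// If both print None, Shell.getWinUtilsPath fails with exactly the
// FileNotFoundException shown above.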

The suggestion in the message, and everything I've found searching, relates to running a Hadoop cluster on the machine. But I'm not running a Hadoop cluster, and I don't intend to. I've imported some of Hadoop's libraries in my sbt file for use alongside the rest of my program, but that doesn't start a local cluster.
This only just started happening. Of my two coworkers, one can run the program without any problem, and the other just started hitting the same issue I have. Here is the relevant portion of my build.sbt:

lazy val root = (project in file("."))
  .settings(
    commonSettings,
    name := "My project",
    version := "0.1",
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-common" % "2.9.0",
      "com.typesafe.akka" %% "akka-actor" % "2.5.2",
      "com.lightbend.akka" %% "akka-stream-alpakka-s3" % "0.9",
      "com.enragedginger" % "akka-quartz-scheduler_2.12" % "1.6.0-akka-2.4.x",
      "com.typesafe.akka" % "akka-agent_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-remote_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-stream_2.12" % "2.5.2",
      "org.apache.kafka" % "kafka-clients" % "0.10.2.1",
      "com.typesafe.akka" %% "akka-stream-kafka" % "0.16",
      "com.typesafe.akka" %% "akka-persistence" % "2.5.2",
      "org.iq80.leveldb" % "leveldb" % "0.7",
      "org.fusesource.leveldbjni" % "leveldbjni-all" % "1.8",
      "javax.mail" % "javax.mail-api" % "1.5.6",
      "com.sun.mail" % "javax.mail" % "1.5.6",
      "commons-io" % "commons-io" % "2.5",
      "org.apache.avro" % "avro" % "1.8.1",
      "net.liftweb" % "lift-json_2.12" % "3.1.0-M1",
      "com.google.code.gson" % "gson" % "2.8.1",
      "org.json4s" %% "json4s-jackson" % "3.5.2",
      "com.amazonaws" % "aws-java-sdk-s3" % "1.11.149",
      //"com.amazonaws" % "aws-java-sdk" % "1.11.286",
      "org.scalikejdbc" %% "scalikejdbc" % "3.0.0",
      "org.scalikejdbc" %% "scalikejdbc-config" % "3.0.0",
      "org.scalikejdbc" % "scalikejdbc-interpolation_2.12" % "3.0.2",
      "com.microsoft.sqlserver" % "mssql-jdbc" % "6.1.0.jre8",
      "org.apache.commons" % "commons-pool2" % "2.4.2",
      "commons-pool" % "commons-pool" % "1.6",
      "com.jcraft" % "jsch" % "0.1.54",
      "ch.qos.logback" % "logback-classic" % "1.2.3",
      "com.typesafe.scala-logging" %% "scala-logging" % "3.7.2",
      "org.scalactic" %% "scalactic" % "3.0.4",
      "mysql" % "mysql-connector-java" % "8.0.8-dmr",
      "org.scalatest" %% "scalatest" % "3.0.4" % "test"
    )
  )
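
One side note: AvroParquetWriter itself lives in parquet-avro, which doesn't appear in the snippet above, so presumably it is pulled in via commonSettings or elsewhere in the build. If it did need adding, a minimal sketch would be (the 1.9.0 version here is an assumption; match it to your Avro/Hadoop versions):

libraryDependencies += "org.apache.parquet" % "parquet-avro" % "1.9.0"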

Any ideas why the Hadoop-related dependency fails to run here?

unftdfkk · answer #1

The answer was to follow the error's advice:
I downloaded the latest winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin
Then I manually created the directory structure C:/Users/MyName/Hadoop/bin — note that the bin directory must be there. You can name the Hadoop/ directory whatever you like, but bin/ has to sit one level inside it.
I placed winutils.exe inside bin.
In my code, I had to add the following line above the initialization of the Parquet writer (I imagine anywhere before the initialization would work) to set the Hadoop home directory:

System.setProperty("hadoop.home.dir", "C:/Users/nhanak/Hadoop/")
val writer = new AvroParquetWriter[GenericRecord](parquetFile, iInfo.schema)

Optional: if you'd rather keep this inside the project than on your local machine — say, because other people will be pulling this repo, or you want to package it all into a jar to ship anywhere — create a directory structure inside the project and store winutils.exe there. For example, create src/main/resources/HadoopResources/bin in the project and put winutils.exe in that bin. Then, to use winutils.exe, set the Hadoop home like this:

import java.io.File

val file = new File("src/main/resources/HadoopResources")
System.setProperty("hadoop.home.dir", file.getAbsolutePath)
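
Putting the pieces together, here is a minimal end-to-end sketch of the fix. The record schema, output path, and object name are hypothetical placeholders; the AvroParquetWriter(Path, Schema) constructor is the same one used in the question (newer parquet-avro releases prefer AvroParquetWriter.builder):

import java.io.File

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter

object AvroToParquet {
  def main(args: Array[String]): Unit = {
    // Point Hadoop at the directory whose bin/ holds winutils.exe.
    // This must run before any Hadoop filesystem code is touched.
    val hadoopRes = new File("src/main/resources/HadoopResources")
    System.setProperty("hadoop.home.dir", hadoopRes.getAbsolutePath)

    // Hypothetical schema, for illustration only.
    val schema = new Schema.Parser().parse(
      """{"type":"record","name":"Example","fields":[{"name":"id","type":"int"}]}""")

    val writer = new AvroParquetWriter[GenericRecord](new Path("out/example.parquet"), schema)
    try {
      val record = new GenericData.Record(schema)
      record.put("id", 1)
      writer.write(record)
    } finally writer.close()
  }
}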
