hadoop-aws and Spark incompatibility

sczxawaw · posted 2021-07-13 in Hadoop

I am hitting a very strange dependency error with this simple Scala code:

    val spark: SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("HDFStoAWSExample")
      .getOrCreate()

    spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "ACCESS_KEY")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "SECRET_KEY")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3.amazonaws.com")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")

    val hdfsCSV = spark.read.option("header", true).csv("hdfs://localhost:19000/testCSV.csv")
    hdfsCSV.show()
    hdfsCSV.write.parquet("s3a://test/parquet/abcCSV")

And this simple sbt build file:

    name := "spark-amazon-s3-parquet"
    scalaVersion := "2.12.12"
    val sparkVersion = "3.0.1"
    libraryDependencies += "log4j" % "log4j" % "1.2.17"
    libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
    libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
    libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.3.0"
    libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.0"
    updateOptions := updateOptions.value.withCachedResolution(true)

Now, when I try to write the Parquet file, it complains about a missing class, org/apache/hadoop/tracing/SpanReceiverHost (full stack trace at the end).
I also tried version 2.7.3 of hadoop-common and hadoop-aws, but then S3 responded with a 400 Bad Request error (same code as above, only the versions of hadoop-common and hadoop-aws changed in the sbt file).
Does anyone know which versions of hadoop-common and hadoop-aws actually work together with Spark?
Full stack trace:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/tracing/SpanReceiverHost
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:634)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
        at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
        at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:723)
        at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:553)
        at HDFStoAWSExample$.delayedEndpoint$HDFStoAWSExample$1(HDFStoAWSExample.scala:16)
        at HDFStoAWSExample$delayedInit$body.apply(HDFStoAWSExample.scala:3)
        at scala.Function0.apply$mcV$sp(Function0.scala:39)
        at scala.Function0.apply$mcV$sp$(Function0.scala:39)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
        at scala.App.$anonfun$main$1$adapted(App.scala:80)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at scala.App.main(App.scala:80)
        at scala.App.main$(App.scala:78)
        at HDFStoAWSExample$.main(HDFStoAWSExample.scala:3)
        at HDFStoAWSExample.main(HDFStoAWSExample.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.tracing.SpanReceiverHost
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 28 more

PS: there is nothing wrong with my Hadoop setup itself; I can read from and write to HDFS without issue.


ulmd4ohb1#

As the stack trace suggests, you likely need to add hadoop-client as a dependency as well, at the same version as your other Hadoop artifacts.
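A minimal build.sbt sketch of that idea: the NoClassDefFoundError usually means Hadoop jars from two different major versions ended up on the classpath (SpanReceiverHost exists in Hadoop 2.x but was removed later, so a 2.x hadoop-hdfs pulled in transitively by Spark clashes with hadoop-common 3.3.0). Pinning hadoop-client alongside hadoop-aws at one shared version is one way to keep them aligned; the exact version numbers below are only what the question already uses, not a guarantee of compatibility:

```scala
// build.sbt sketch: keep every Hadoop artifact on a single version so
// Spark's transitive Hadoop 2.x jars don't mix with hadoop-aws 3.3.0.
name := "spark-amazon-s3-parquet"
scalaVersion := "2.12.12"

val sparkVersion  = "3.0.1"
val hadoopVersion = "3.3.0" // one shared version for all Hadoop artifacts

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % sparkVersion,
  "org.apache.spark"  %% "spark-sql"     % sparkVersion,
  "org.apache.hadoop" %  "hadoop-client" % hadoopVersion, // brings hadoop-common and hadoop-hdfs at the same version
  "org.apache.hadoop" %  "hadoop-aws"    % hadoopVersion
)
```

With hadoop-client declared explicitly, sbt's conflict resolution evicts the older transitive Hadoop jars instead of leaving a mixed classpath.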
