Java Spark cannot load a file from the local filesystem in Spark SQL

gstyhher · posted 2021-05-17 in Spark

I am new to Spark and am learning it with Java on Ubuntu 18.0, without an explicit cluster. I have a data.csv file saved on the local filesystem under the java/main/resources folder.
When I execute the code below,

SparkSession sparkSession = SparkSession.builder()
            .appName("sparksql").master("local[*]")
            .getOrCreate();

Dataset<Row> dataset = sparkSession.read()
                .option("header", true)
                .csv("/media/home/work/sparksamples/src/main/resources/exams/test.csv");

I get the following error:
20/11/23 16:07:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/DistributedFileSystem
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.listLeafFiles(InMemoryFileIndex.scala:316)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.$anonfun$bulkListLeafFiles$1(InMemoryFileIndex.scala:195)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)


How can I load a file from the local filesystem on Ubuntu without using HDFS?


thtygnil1#

This is caused by the hadoop-client jar being missing from the classpath in the latest version, 3.3. Add it as an explicit dependency:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.0</version>
</dependency>
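For context, a sketch of how this dependency might sit alongside Spark in the pom.xml (the spark-sql artifact and version numbers here are illustrative assumptions, not from the original post; the hadoop-client version should generally match the Hadoop line your Spark build targets):

```xml
<!-- Illustrative pom.xml fragment: versions are assumptions and should be
     aligned with the Hadoop line your Spark distribution was built against. -->
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.0</version>
    </dependency>
</dependencies>
```

Once the NoClassDefFoundError is resolved, a bare path like the one in the question reads from the local filesystem; Spark also accepts an explicit `file:///` prefix on the path to force the local filesystem scheme rather than the configured default.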
