spark-ml

xwbd5t1u · posted 2021-05-29 · in Hadoop
Follow (0) | Answers (2) | Views (446)

I am trying to run Spark ML algorithms in an environment that contains no Hadoop at all.
I have not been able to work out from the tutorials and other posts whether this is possible:
Can I run Spark without any version of Hadoop or HDFS, or do I have to install Hadoop in order to run Spark?
When I run spark-shell I get the following message:

    C:\spark-2.2.0-bin-without-hadoop\bin>spark-shell
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
            at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:124)
            at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:124)
            at scala.Option.getOrElse(Option.scala:121)
            at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:124)
            at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:110)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
            at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
            ... 7 more

Here is my sample program:

    package com.example.spark_example;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class Main {
        public static void main(String[] args) {
            String logFile = "C:\\spark-2.2.0-bin-without-hadoop\\README.md"; // Should be some file on your system
            SparkConf conf = new SparkConf().setAppName("Simple Application");
            JavaSparkContext sc = new JavaSparkContext(conf);
            JavaRDD<String> logData = sc.textFile(logFile).cache();
            long numAs = logData.filter((Function<String, Boolean>) s -> s.contains("a")).count();
            long numBs = logData.filter((Function<String, Boolean>) s -> s.contains("b")).count();
            System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
            sc.stop();
        }
    }

which results in the following exception:

    17/08/10 15:23:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/08/10 15:23:35 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
f4t66c6m 1#

Can I run Spark without any version of Hadoop?

You cannot. While Spark does not require a Hadoop cluster (YARN, HDFS), it does depend on the Hadoop libraries. If you don't have a Hadoop installation that provides them, use the full build, described as pre-built for Apache Hadoop. In your case:

    spark-2.2.0-bin-hadoop2.7
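With that Hadoop-bundled build, the sample program above can be run entirely locally, with no cluster of any kind. A sketch of the launch command from a Windows prompt (the jar path is hypothetical; the class name comes from the example program):

```shell
:: Assumes SPARK_HOME points at the extracted spark-2.2.0-bin-hadoop2.7 directory
:: and the example has been packaged into a jar (path below is illustrative).
"%SPARK_HOME%\bin\spark-submit" ^
    --class com.example.spark_example.Main ^
    --master local[*] ^
    target\spark-example.jar
```

`--master local[*]` runs the driver and executors inside a single JVM, which is why no YARN or HDFS installation is needed.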
tuwxkamq 2#

If you downloaded Apache Spark with a pre-built package type, then you already have all the libraries you need. To fix your problem you need to install winutils, a Windows wrapper library for Hadoop.
Just copy all of its files into:

    %SPARK_HOME%\bin

and add an environment variable HADOOP_HOME with the value %SPARK_HOME%.
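The two steps above can be sketched as a Windows cmd session (the winutils source location and exact file set are assumptions; adjust paths to your install):

```shell
:: Assumes winutils.exe (and any companion DLLs) have already been downloaded
:: and copied into %SPARK_HOME%\bin, and that SPARK_HOME is set in this session.
:: setx persists the variable; cmd expands %SPARK_HOME% before storing it.
setx HADOOP_HOME "%SPARK_HOME%"
:: Open a new console afterwards -- setx does not affect the current session.
```

Hadoop then resolves the binary as %HADOOP_HOME%\bin\winutils.exe, which is exactly the path the `null\bin\winutils.exe` error above was failing to construct.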
