Spark 1.5.1: creating an RDD from Cassandra (ClassNotFoundException: com.datastax.spark.connector.japi.rdd.CassandraTableScanJavaRDD)

c9qzyr3d posted on 2022-11-05 in Cassandra

I am trying to fetch records from Cassandra and create an RDD:

    JavaRDD<Encounters> rdd = javaFunctions(ctx).cassandraTable("keyspace1", "employee", mapRowTo(Encounters.class));

I get this error when submitting the job on Spark 1.5.1:

    Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/japi/rdd/CassandraTableScanJavaRDD
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
        at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:56)
        at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
    Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.japi.rdd.CassandraTableScanJavaRDD
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Current dependencies:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.5.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>1.5.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector-java_2.11</artifactId>
        <version>1.5.0-M2</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>3.0.0-alpha4</version>
    </dependency>

Java code:

    import com.tempTable.Encounters;
    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.SparkConf;
    import java.util.Date;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;

    Long now = new Date().getTime();
    SparkConf conf = new SparkConf(true)
            .setAppName("SparkSQLJob_" + now)
            .set("spark.cassandra.connection.host", "192.168.1.75")
            .set("spark.cassandra.connection.port", "9042");
    SparkContext ctx = new SparkContext(conf);
    JavaRDD<Encounters> rdd = javaFunctions(ctx).cassandraTable("keyspace1", "employee", mapRowTo(Encounters.class));
    System.out.println("rdd count = " + rdd.count());

Is there a problem with the versions in the dependencies?
Please help me resolve this error. Thanks in advance.


dxxyhpgq #1

You need to add your application's jar file via SparkConf:

    .setJars(Seq(System.getProperty("user.dir") + "/target/scala-2.10/sparktest.jar"))
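That snippet is Scala; in the question's Java code the same setting would look roughly like the sketch below (the jar path is only an example and depends on your build output):

    // Ship the application jar to the executors so the connector classes
    // end up on their classpath (the path below is only an example).
    SparkConf conf = new SparkConf(true)
            .setAppName("SparkSQLJob_" + now)
            .set("spark.cassandra.connection.host", "192.168.1.75")
            .set("spark.cassandra.connection.port", "9042")
            .setJars(new String[]{System.getProperty("user.dir") + "/target/sparktest.jar"});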

For more details, see http://www.datastax.com/dev/blog/common-spark-troubleshooting


dl5txlt9 #2

The simple answer is:

  • You need to bundle all dependencies into your jar file.
  • The executor machines must have all the relevant jar files on their classpath.

A solution for building a fat jar with Gradle:
    buildscript {
        dependencies {
            classpath 'com.github.jengelman.gradle.plugins:shadow:1.2.2'
        }
        repositories {
            jcenter()
        }
    }

    apply plugin: 'com.github.johnrengelman.shadow'

Then call "gradle shadowJar" to build the jar file. Submit the job with that jar and the problem should be resolved.
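The question's build uses Maven rather than Gradle; the equivalent there would be the maven-shade-plugin. A minimal sketch (the plugin version shown is just an example):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.1</version>
        <executions>
            <execution>
                <!-- build the fat jar during "mvn package" -->
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
            </execution>
        </executions>
    </plugin>

Running "mvn package" then produces a single jar that bundles the Cassandra connector classes, and that jar can be submitted to Spark.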

