I am trying to read Cassandra data with Spark.
DataFrame rdf = sqlContext.read()
        .option("keyspace", "readypulse")
        .option("table", "ig_posts")
        .format("org.apache.spark.sql.cassandra")
        .load();
rdf.registerTempTable("cassandra_table");
System.out.println(sqlContext.sql(
        "select count(external_id) from cassandra_table").collect()[0].getLong(0));
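For context, a minimal sketch of the setup the snippet above assumes (the app name and Cassandra host below are placeholders, not values from the original post):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;

// Spark 1.6-style setup; spark.cassandra.connection.host tells the
// connector where to reach the Cassandra cluster (placeholder address here).
SparkConf conf = new SparkConf()
        .setAppName("cassandra-read")
        .set("spark.cassandra.connection.host", "10.0.0.1");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);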
The job fails with the error below. I don't understand why a ShuffleMapTask is involved here, or why casting it to Task can fail at all.
16/03/30 02:27:15 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, ip-10-165-180-22.ec2.internal):
java.lang.ClassCastException: org.apache.spark.scheduler.ShuffleMapTask cannot be cast to org.apache.spark.scheduler.Task
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/03/30 02:27:15 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor ip-10-165-180-22.ec2.internal:
java.lang.ClassCastException (org.apache.spark.scheduler.ShuffleMapTask cannot be cast to org.apache.spark.scheduler.Task) [duplicate 1]
16/03/30 02:27:15 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
I am using EMR 4.4, Spark 1.6, Cassandra 2.2 (DataStax Community), and spark-cassandra-connector-java_2.10 1.6.0-M1 (I also tried 1.5.0).
I also tried the following code, but got the same error:
// 'functions' is presumably CassandraJavaUtil.javaFunctions(sc);
// KEYSPACE, tableName and columns are defined elsewhere.
CassandraJavaRDD<CassandraRow> cjrdd = functions.cassandraTable(
        KEYSPACE, tableName).select(columns);
logger.info("Got rows from cassandra " + cjrdd.count());

// Pull follower_count out of each row, treating missing values as 0.
JavaRDD<Double> jrdd2 = cjrdd.map(new Function<CassandraRow, Double>() {
    @Override
    public Double call(CassandraRow trainingRow) throws Exception {
        Object fCount = trainingRow.getRaw("follower_count");
        double count = 0;
        if (fCount != null) {
            count = (Long) fCount;
        }
        return count;
    }
});
logger.info("Mapper done : " + jrdd2.count());
logger.info("Mapper done values : " + jrdd2.collect());
2 answers
goucqfw61#
I recently ran into a similar problem because of --conf spark.executor.userClassPathFirst=true. Quoting Spark's official documentation:

spark.executor.userClassPathFirst: (Experimental) Same functionality as spark.driver.userClassPathFirst, but applied to executor instances.

I believe the exception comes from a jar version conflict: with userClassPathFirst enabled, Spark's own classes end up on the classpath twice and are loaded by two different classloaders, so a ShuffleMapTask from one classloader cannot be cast to a Task from the other, even though ShuffleMapTask extends Task. As the Spark documentation puts it, "the user's jar should never include Hadoop or Spark libraries, however, these will be added at runtime."
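To follow that advice in practice, here is a sketch for a Maven build (the coordinates below match the Spark 1.6 / Scala 2.10 stack from the question; your build may differ): mark Spark as provided so it is never bundled into the application jar, while the connector, which the cluster does not supply, keeps the default compile scope.

<!-- Spark is supplied by the cluster at runtime; keep it out of the fat jar. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<!-- The connector is not part of the cluster, so it is bundled. -->
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector-java_2.10</artifactId>
    <version>1.6.0-M1</version>
</dependency>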
gk7wooem2#
I was struggling with the same error, but setting userClassPathFirst=true did not help me. Instead I set "spark.driver.userClassPathFirst=false" and "spark.executor.userClassPathFirst=false", and that resolved my ClassCastException. Below is my command.
--deploy-mode client --class org.sdrc.kspmis.dashboardservice.KspDashboardServiceApplication --master spark://192.168.1.95:7077 dashboard-0.0.1-SNAPSHOT-shaded.jar
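Spelled out in full, the invocation looks roughly like this (a sketch: the spark-submit prefix and explicit --conf flags are assumed, and the class/jar names follow the command above; note that false is already Spark's default for both settings, so omitting the two --conf flags entirely has the same effect):

spark-submit \
  --deploy-mode client \
  --master spark://192.168.1.95:7077 \
  --conf spark.driver.userClassPathFirst=false \
  --conf spark.executor.userClassPathFirst=false \
  --class org.sdrc.kspmis.dashboardservice.KspDashboardServiceApplication \
  dashboard-0.0.1-SNAPSHOT-shaded.jar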