I'm getting the error below with Cassandra 3.11.9 and Spark 3.0.1 when I run my Java web application.
My question is: why does this happen only after the application is deployed? It does not happen in the development environment.
2021-03-24 08:50:41.150 INFO 19613 --- [uler event loop] org.apache.spark.scheduler.DAGScheduler : ShuffleMapStage 0 (collectAsList at FalhaService.java:60) failed in 7.513 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (gdbhml08 executor driver): java.lang.ArithmeticException: integer overflow
    at java.lang.Math.toIntExact(Math.java:1011)
    at org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:90)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$DateConverter$.toCatalystImpl(CatalystTypeConverters.scala:306)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$DateConverter$.toCatalystImpl(CatalystTypeConverters.scala:305)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:107)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:252)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:242)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:107)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:426)
    at com.datastax.spark.connector.datasource.UnsafeRowReader.read(UnsafeRowReaderFactory.scala:34)
    at com.datastax.spark.connector.datasource.UnsafeRowReader.read(UnsafeRowReaderFactory.scala:21)
    at com.datastax.spark.connector.datasource.CassandraPartitionReaderBase.$anonfun$getIterator$2(CassandraScanPartitionReaderFactory.scala:110)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
    at com.datastax.spark.connector.datasource.CassandraPartitionReaderBase.next(CassandraScanPartitionReaderFactory.scala:66)
    at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:79)
    at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:112)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Driver stacktrace: 2021-03-24 08:50:41.189 INFO 19613 --- [nio-8080-exec-2] org.apache.spark.scheduler.DAGScheduler : Job 0 failed: collectAsList at FalhaService.java:60, took 8.160348 s
The line of code that triggers the error:
List<Row> rows = dataset.collectAsList();
The surrounding code block:
Dataset<Row> dataset = session.sql(sql.toString());
// Spark evaluates lazily: collectAsList() is what actually runs the
// Cassandra scan, so this is the line where the failure surfaces.
List<Row> rows = dataset.collectAsList();
ListIterator<Row> t = rows.listIterator();
while (t.hasNext()) {
    Row row = t.next();
    grafico = new EstGraficoRelEstTela();
    grafico.setSuperficie(row.getLong(0));
    grafico.setSubsea(row.getLong(1) + row.getLong(2));
    grafico.setNomeTipoSensor(row.getString(3));
    graficoLocalFalhas.add(grafico);
}
session.close();
Thanks,
1 Answer
It looks like you have incorrect data in the database: some date field is very far in the future. If you look at the source code, you'll see that the date is first converted to milliseconds and then to days, and that conversion overflows the int. This may also explain why the code works in the development environment...
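For illustration, here is a minimal sketch of that arithmetic (a deliberate simplification, not Spark's actual DateTimeUtils code): narrowing a far-future date's epoch-day count to an int with Math.toIntExact throws exactly the java.lang.ArithmeticException: integer overflow seen in the stack trace.

import java.sql.Date;

public class DateOverflowSketch {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    public static void main(String[] args) {
        // A normal date fits comfortably in an int number of epoch days.
        Date ok = Date.valueOf("2021-03-24");
        System.out.println(Math.toIntExact(ok.getTime() / MILLIS_PER_DAY)); // ~18710

        // A corrupt, far-future date has an epoch-day count above
        // Integer.MAX_VALUE (roughly 5.8 million years), so the narrowing
        // conversion throws ArithmeticException: integer overflow.
        Date corrupt = new Date(Long.MAX_VALUE / 2);
        System.out.println(Math.toIntExact(corrupt.getTime() / MILLIS_PER_DAY));
    }
}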
You can ask the administrators to check the files for corrupted data, for example, using the nodetool scrub command.
P.S. Are you sure that you're using Spark 3.0.1? The positions of the functions in the error match Spark 3.1.1...
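One way to check is to log the Spark version the deployed application actually resolves at runtime, since the production classpath may differ from the dev one (assuming session is the SparkSession from the code above):

// Prints the version of the Spark libraries actually on the classpath.
System.out.println("Spark version: " + session.version());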