spark从不停止处理第一批数据

r55awzrz  于 2021-05-27  发布在  Spark
关注(0)|答案(1)|浏览(374)

我正在尝试运行在github上找到的一个应用程序,这个:https://github.com/csirt-mu/aida-framework
我在一个ubuntu18.04.1虚拟机上运行它。在它的数据处理管道的某个点上,它使用了spark,似乎在这一点上卡住了。我可以从webui看到,我发送到那里的一些数据是作为一个批接收的。但是,它似乎从未完成对第一批的处理(即使其中有0条记录)。不幸的是,我没有经验的Spark和不知道究竟是失败的。在搜索修复程序时,我遇到了这样的建议:可能没有足够的内核供所有执行者使用。我试图增加核心到3,但这没有帮助。
我已经提供了所有的网页界面,我希望他们显示的问题足够清楚。有人知道我做错了什么吗?
屏幕截图:spark 1spark 2spark 3spark 4spark 5spark 6
排队的和不完整的批处理作业的输出是

callForeachRDD at NativeMethodAccessorImpl.java:0
org.apache.spark.streaming.api.python.PythonDStream.callForeachRDD(PythonDStream.scala)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:498)
    py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    py4j.Gateway.invoke(Gateway.java:282)
    py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    py4j.commands.CallCommand.execute(CallCommand.java:79)
    py4j.GatewayConnection.run(GatewayConnection.java:238)
    java.lang.Thread.run(Thread.java:748)

编辑:我注意到在进程启动时会记录错误。我现在才意识到,因为这个过程没有停止。错误包括:

May 11 18:13:04 aida bash[5323]: 20/05/11 18:13:04 ERROR Utils: Uncaught exception in thread stdout writer for python3
May 11 18:13:04 aida bash[5323]: java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:163)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.dataDeserializeStream(SerializerManager.scala:209)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:596)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:886)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:100)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:99)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
May 11 18:13:04 aida bash[5323]:         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
May 11 18:13:04 aida bash[5323]: Exception in thread "stdout writer for python3" java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:163)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.dataDeserializeStream(SerializerManager.scala:209)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:596)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:886)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:100)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:99)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
May 11 18:13:04 aida bash[5323]:         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
May 11 18:13:04 aida bash[5323]: 20/05/11 18:13:04 ERROR Utils: Uncaught exception in thread stdout writer for python3
May 11 18:13:04 aida bash[5323]: java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:163)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.dataDeserializeStream(SerializerManager.scala:209)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:596)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:886)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:100)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:99)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
May 11 18:13:04 aida bash[5323]:         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
May 11 18:13:04 aida bash[5323]: Exception in thread "stdout writer for python3" java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:163)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.dataDeserializeStream(SerializerManager.scala:209)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:596)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:886)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:100)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:99)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
May 11 18:13:04 aida bash[5323]:         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
May 11 18:13:04 aida bash[5323]: 20/05/11 18:13:04 ERROR Utils: Uncaught exception in thread stdout writer for python3
May 11 18:13:04 aida bash[5323]: java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:163)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.dataDeserializeStream(SerializerManager.scala:209)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:596)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:886)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:100)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:99)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
May 11 18:13:04 aida bash[5323]:         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
May 11 18:13:04 aida bash[5323]: Exception in thread "stdout writer for python3" java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:163)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.serializer.SerializerManager.dataDeserializeStream(SerializerManager.scala:209)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:596)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:886)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:100)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:99)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
May 11 18:13:04 aida bash[5323]:         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
May 11 18:13:04 aida bash[5323]:         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
May 11 18:13:04 aida bash[5323]:         at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)

有人能帮我解决这些错误吗?

ecfdbz9o

ecfdbz9o1#

Kafka与spark jars之间有着矛盾的依赖关系。。。所以要么不使用lz压缩,要么使用snappy压缩,这样就可以了
或者按照这里的答案来解决冲突的jar。

相关问题