I am running the following command on an AWS EC2 instance that runs PySpark.
final_rdd.coalesce(1).saveAsTextFile('<Location for saving file>')
The command fails with the log below.
[Stage 1:> (0 + 1) / 1]19/06/12 05:08:41 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 7, ip-10-145-62-182.ec2.internal, executor 2): org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1556865500911_0446/container_1556865500911_0446_01_000003/pyspark.zip/pyspark/worker.py", line 262, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:128)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
	... 10 more
19/06/12 05:08:41 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
19/06/12 05:08:41 ERROR SparkHadoopWriter: Aborting job job_20190612050833_0014.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 10, ip-10-145-62-182.ec2.internal, executor 2): org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1556865500911_0446/container_1556865500911_0446_01_000003/pyspark.zip/pyspark/worker.py", line 262, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
1 Answer
You have a Python version mismatch: the worker nodes run Python 2.7 while the driver runs Python 3.5, and PySpark requires both sides to use the same minor version. Install a matching Python on every node, or point the driver and workers at the same interpreter, as shown in the sketch below.
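A minimal sketch of the interpreter fix, assuming python3 is installed at the same path on every node (the path /usr/bin/python3 below is an assumption; substitute whatever actually exists on your cluster). The two environment variables are the ones named in the error message, and they must be set before the SparkContext is created, because PySpark captures the worker executable name from the driver's environment at startup:

import os

# Hypothetical interpreter path -- it must exist on the driver and on every worker node.
PY3 = "/usr/bin/python3"

# Set before any SparkContext/SparkSession is built: PySpark reads PYSPARK_PYTHON
# on the driver and ships that executable name to the workers with each task.
os.environ["PYSPARK_PYTHON"] = PY3
os.environ["PYSPARK_DRIVER_PYTHON"] = PY3

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fix-python-version").getOrCreate()

Equivalently, export PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in the shell (or in conf/spark-env.sh) before launching pyspark or spark-submit. Whichever mechanism you use, the goal is the same: the driver and every worker must resolve to the same minor Python version.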