I am trying to run a Spark job with the spark2-submit command. The version of Spark installed on the cluster is Cloudera's Spark 2.1.0, and I am using the conf spark.yarn.jars to point at jars for version 2.4.0, like this:
spark2-submit \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/virtualenv/path/bin/python \
--conf spark.yarn.jars=hdfs:///some/path/spark24/* \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.task.cpus=2 \
--executor-cores 2 \
--executor-memory 4g \
--driver-memory 4g \
--archives /virtualenv/path \
--files /etc/hive/conf/hive-site.xml \
--name my_app \
test.py
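For context, the jars referenced by spark.yarn.jars would have been staged on HDFS beforehand; a minimal sketch of that step, assuming the jars come from a locally unpacked Spark 2.4.0 distribution (the local path is hypothetical, not taken from the post):
# Hypothetical staging step: copy the Spark 2.4.0 jars into the HDFS
# directory that spark.yarn.jars points at above.
hdfs dfs -mkdir -p /some/path/spark24
hdfs dfs -put /opt/spark-2.4.0-bin-hadoop2.7/jars/* /some/path/spark24/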
Here is my code in test.py:
import os
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
print("Spark Session created")
When I run the submit command, I see messages like this:
yarn.Client: Source and destination file systems are the same. Not copying hdfs:///some/path/spark24/some.jar
and then I get this error on the line that creates the Spark session:
spark = SparkSession.builder.getOrCreate()
File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/sql/session.py", line 169, in getOrCreate
File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 310, in getOrCreate
File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 259, in _ensure_initialized
File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/java_gateway.py", line 117, in launch_gateway
File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 175, in java_import
File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling None.None. Trace:
Authentication error: unexpected command.
The py4j in the error comes from the existing Spark installation, not from the version in my jars. Are my Spark 2.4 jars not being picked up? If I remove the jars conf, the same code runs fine, but presumably against the existing Spark version 2.1.0. Any clues on how to fix this? Thanks.
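One way to check whether the Spark 2.4 jars are actually being localized is to inspect the YARN logs of a failed attempt; a sketch, with a placeholder application id:
# Hypothetical check: search the container logs for whichever py4j jar
# ended up on the classpath of the failed application.
yarn logs -applicationId application_XXXXXXXXXX_XXXX | grep -i py4j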
1 Answer
xriantvc1
The problem was that Python was running from the wrong place. I had to submit from the right place, like this:
PYTHONPATH=./${virtualenv}/venv/lib/python3.6/site-packages/ spark2-submit
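Combined with the command from the question, the fix might look like the sketch below; only the jars conf is repeated here, and the remaining options from the question are omitted for brevity:
# Hypothetical full invocation: per the answer, the PYTHONPATH prefix
# makes the driver resolve pyspark/py4j from the virtualenv instead of
# the cluster's Spark 2.1.0 installation.
PYTHONPATH=./${virtualenv}/venv/lib/python3.6/site-packages/ \
spark2-submit \
  --conf spark.yarn.jars=hdfs:///some/path/spark24/* \
  test.py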