I am working on a server over SSH, and I load Spark with the following command:
module load spark/2.3.0
I want to create a Hive table and save my DataFrame into it, partitioned.
My code, mycode.py, is as follows:
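from os.path import abspath

from pyspark import SparkConf
from pyspark.sql import SparkSession, SQLContext

if __name__ == "__main__":
    appName = "mycode"  # placeholder: the question does not show where appName is defined
    warehouse_location = abspath('spark-warehouse')
    conf = (SparkConf()
            .setMaster("local[*]")
            .setAppName(appName)
            .set("spark.default.parallelism", 128)
            .set("spark.sql.shuffle.partitions", 128)
            )
    # enableHiveSupport() requires the Hive classes to be on Spark's classpath
    spark = (SparkSession.builder
             .config(conf=conf)
             .config("spark.sql.warehouse.dir", warehouse_location)
             .enableHiveSupport()
             .getOrCreate())
    sc = spark.sparkContext
    sqlContext = SQLContext(sparkContext=sc)
    sc.stop()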
This code raises the following exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o41.sessionState.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1064)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:141)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:140)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:140)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveSessionStateBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:235)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
... 16 more
How can I solve this problem? Where is my mistake? Note that I run the script with spark-submit mycode.py. I don't know whether I need to add any parameters to this command.
1 answer
In my case, this happened because Spark was missing the Hive dependencies.
What I did was add the jar to the PySpark dependencies.
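The answer does not say which jar was added or how. As a rough sketch only: the missing class org.apache.spark.sql.hive.HiveSessionStateBuilder normally ships in the spark-hive artifact (for Spark 2.3.0 built against Scala 2.11, a file named like spark-hive_2.11-2.3.0.jar), and spark-submit accepts extra jars as a comma-separated list through its --jars option. The helper names find_hive_jars and build_submit_command below are hypothetical, and the sketch assumes SPARK_HOME points at the installation loaded by the module. Other Hive jars that spark-hive depends on may also be needed.

```python
import glob
import os


def find_hive_jars(jars_dir):
    """Return the sorted paths of spark-hive jars found in jars_dir (possibly empty)."""
    pattern = os.path.join(jars_dir, "spark-hive_*.jar")
    return sorted(glob.glob(pattern))


def build_submit_command(script, extra_jars):
    """Build a spark-submit argument list; --jars takes a comma-separated list."""
    cmd = ["spark-submit"]
    if extra_jars:
        cmd += ["--jars", ",".join(extra_jars)]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Assumption: SPARK_HOME is set by the environment that provides Spark.
    spark_home = os.environ.get("SPARK_HOME", "")
    bundled = find_hive_jars(os.path.join(spark_home, "jars"))
    if bundled:
        print("spark-hive jar already bundled:", bundled)
    else:
        print("No spark-hive jar under SPARK_HOME/jars; Hive support cannot load.")
    # Hypothetical location of a separately obtained jar; adjust to your setup.
    extra = ["/path/to/spark-hive_2.11-2.3.0.jar"]
    print(" ".join(build_submit_command("mycode.py", extra)))
```

If the first check finds nothing, that matches the ClassNotFoundException in the question: the Spark build in use was most likely packaged without Hive support, so the jar has to be supplied explicitly or a Hive-enabled build used instead.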