Why can't I instantiate 'org.apache.spark.sql.hive.HiveSessionStateBuilder'?

zd287kbt posted on 2021-06-24 in Hive

I work on an SSH server, where I load Spark with the following command:

module load spark/2.3.0

I want to create a Hive table and save my DataFrame's partitions into it.
My code, mycode.py, is as follows:

from os.path import abspath

from pyspark import SparkConf
from pyspark.sql import SparkSession, SQLContext

if __name__ == "__main__":
    appName = "mycode"  # placeholder: appName was not defined in the original snippet
    warehouse_location = abspath('spark-warehouse')
    conf = (SparkConf()
            .setMaster("local[*]")
            .setAppName(appName)
            .set("spark.default.parallelism", 128)
            .set("spark.sql.shuffle.partitions", 128))

    # enableHiveSupport() is what triggers loading HiveSessionStateBuilder
    spark = (SparkSession.builder
             .config(conf=conf)
             .config("spark.sql.warehouse.dir", warehouse_location)
             .enableHiveSupport()
             .getOrCreate())
    sc = spark.sparkContext
    sqlContext = SQLContext(sparkContext=sc)
    sc.stop()

This code raises the following exception:

py4j.protocol.Py4JJavaError: An error occurred while calling o41.sessionState.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1064)
        at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:141)
        at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:140)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:140)
        at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveSessionStateBuilder
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:235)
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
        ... 16 more

How can I solve this? Where is my mistake? Note that I run the script with spark-submit mycode.py; I don't know whether I need to add any arguments to that command.
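
For context, once the session starts with Hive support, the write I am aiming for would look roughly like this (df and the partition column are placeholders, not my real code):

# illustrative sketch only: "df" and the column "year" are placeholder names
df.write \
    .mode("overwrite") \
    .partitionBy("year") \
    .saveAsTable("my_table")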


8qgya5xd #1

In my case, this happened because Spark was missing the Hive dependencies.
What I did was add the jar to the PySpark dependencies:

import os

submit_args = '--packages org.apache.spark:spark-hive_2.11:2.4.6 pyspark-shell'
if 'PYSPARK_SUBMIT_ARGS' not in os.environ:
    os.environ['PYSPARK_SUBMIT_ARGS'] = submit_args
else:
    # prepend only the --packages flag: the existing value must keep ending
    # in "pyspark-shell", and a blind += would concatenate without a space
    os.environ['PYSPARK_SUBMIT_ARGS'] = ('--packages org.apache.spark:spark-hive_2.11:2.4.6 '
                                         + os.environ['PYSPARK_SUBMIT_ARGS'])
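
Since the question uses spark-submit, the same dependency can also be pulled in on the command line rather than via PYSPARK_SUBMIT_ARGS. A minimal sketch, assuming the module's Spark 2.3.0 is built against Scala 2.11 (adjust the coordinates to your actual version):

spark-submit --packages org.apache.spark:spark-hive_2.11:2.3.0 mycode.py

The --packages flag makes spark-submit resolve the jar and its transitive dependencies from Maven Central and place them on the driver and executor classpaths, which is where the missing HiveSessionStateBuilder class would come from.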
