PySpark HiveContext error

yws3nbqq · posted 2021-06-29 in Java

Error when executing

    airflow@41166b660d82:~$ spark-submit --master yarn --deploy-mode cluster --keytab keytab_name.keytab --principal keytab_name@REALM --jars /path/to/spark-hive_2.11-2.3.0.jar sranje.py

from an Airflow Docker container that is not in the CDH environment (not managed by CDH CM). sranje.py does a simple SELECT * FROM a Hive table.
The application is accepted on CDH YARN and is attempted twice, both attempts failing with the following error:

    ...
    2020-12-31 10:11:43 INFO StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
    Traceback (most recent call last):
      File "sranje.py", line 21, in <module>
        source_df = hiveContext.table(hive_source).na.fill("")
      File "/dfs/dn4/yarn/nm/usercache/etladmin/appcache/application_1608187067076_0150/container_e29_1608187067076_0150_02_000001/pyspark.zip/pyspark/sql/context.py", line 366, in table
      File "/dfs/dn4/yarn/nm/usercache/etladmin/appcache/application_1608187067076_0150/container_e29_1608187067076_0150_02_000001/pyspark.zip/pyspark/sql/session.py", line 721, in table
      File "/dfs/dn4/yarn/nm/usercache/etladmin/appcache/application_1608187067076_0150/container_e29_1608187067076_0150_02_000001/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
      File "/dfs/dn4/yarn/nm/usercache/etladmin/appcache/application_1608187067076_0150/container_e29_1608187067076_0150_02_000001/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
    pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':"
    2020-12-31 10:11:43 ERROR ApplicationMaster:70 - User application exited with status 1
    2020-12-31 10:11:43 INFO ApplicationMaster:54 - Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
    ...

We assume that some .jar and Java dependencies are missing. Any ideas?
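One quick way to sanity-check that assumption from inside the job itself is to ask the driver JVM whether the Hive catalog class is loadable at all. A minimal diagnostic sketch (it goes through py4j's internal sc._jvm handle, so treat it as a debug aid only):

    # Diagnostic sketch: report which catalog implementation the session uses
    # and whether the Hive catalog class is visible on the driver's classpath.
    from pyspark.context import SparkContext
    from pyspark.sql import SparkSession

    sc = SparkContext.getOrCreate()
    spark = SparkSession(sc)

    # "hive" is required for HiveExternalCatalog; "in-memory" means no Hive support.
    print(spark.conf.get("spark.sql.catalogImplementation", "in-memory"))

    # Ask the driver JVM directly (via py4j) whether the class can be loaded.
    try:
        sc._jvm.java.lang.Class.forName("org.apache.spark.sql.hive.HiveExternalCatalog")
        print("HiveExternalCatalog is on the driver classpath")
    except Exception as e:
        print("HiveExternalCatalog NOT loadable: {}".format(e))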
Details:
- there is a valid Kerberos ticket before the spark command is executed
- if we omit --jars /path/to/spark-hive_2.11-2.3.0.jar, the Python error is different:

    ...
    pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':"
    ...

- the versions of Spark (2.3.0), Hadoop (2.6.0) and Java are the same as on CDH
- hive-site.xml, yarn-site.xml etc. are also provided and are valid (see the invocation sketch after the example script below)
- the same spark-submit application executes OK from a node inside the CDH cluster
- we tried adding extra --jars spark-hive_2.11-2.3.0.jar,spark-core_2.11-2.3.0.jar,spark-sql_2.11-2.3.0.jar,hive-hcatalog-core-2.3.0.jar,spark-hive-thriftserver_2.11-2.3.0.jar
- the developers use the following code as an example:

    # -*- coding: utf-8 -*-
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    from pyspark.context import SparkContext
    from pyspark.sql import SparkSession, SQLContext, HiveContext, functions as F
    from pyspark.sql.utils import AnalysisException
    from datetime import datetime

    sc = SparkContext.getOrCreate()
    spark = SparkSession(sc)
    sqlContext = SQLContext(sc)
    hiveContext = HiveContext(sc)
    current_date = str(datetime.now().strftime('%Y-%m-%d'))
    hive_source = "lnz_ch.lnz_cfg_codebook"
    source_df = hiveContext.table(hive_source).na.fill("")
    print("Number of records: {}".format(source_df.count()))
    print("First 20 rows of the table:")
    source_df.show(20)
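Regarding the client configs mentioned in the details above: in yarn cluster mode, hive-site.xml has to reach the driver container for the Hive catalog to come up. For illustration, an invocation that ships it explicitly with the job would look roughly like this — a sketch only; /etc/hive/conf/hive-site.xml is a placeholder for wherever the CDH client config lives in the container:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --keytab keytab_name.keytab \
      --principal keytab_name@REALM \
      --files /etc/hive/conf/hive-site.xml \
      --conf spark.sql.catalogImplementation=hive \
      --jars /path/to/spark-hive_2.11-2.3.0.jar \
      sranje.py

Files passed via --files are copied into the working directory of the driver and executor containers, which is on their classpath, so hive-site.xml shipped this way is picked up at startup.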

A different script fails with the same error:

    # -*- coding: utf-8 -*-
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    from pyspark.sql.types import *
    from pyspark.sql.functions import *
    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("ZekoTest").enableHiveSupport().getOrCreate()
        data = spark.sql("SELECT * FROM lnz_ch.lnz_cfg_codebook")
        data.show(20)
        spark.stop()
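As an aside on the second script: enableHiveSupport() is just shorthand for setting the catalog implementation, so both example scripts end up requesting the same HiveExternalCatalog, which is consistent with them failing identically. A minimal sketch of the equivalence:

    from pyspark.sql import SparkSession

    # enableHiveSupport() is shorthand for this config; either form triggers the
    # HiveExternalCatalog instantiation that fails in the traceback above.
    spark = (SparkSession.builder
             .appName("ZekoTest")
             .config("spark.sql.catalogImplementation", "hive")
             .getOrCreate())
    print(spark.conf.get("spark.sql.catalogImplementation"))  # expect "hive"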

Thank you.

No answers yet!
