Context

I have an operation that should be performed on some tables using pyspark. The operation involves accessing the Spark metastore (in Databricks) to fetch some metadata. Since I have many tables, I am parallelizing the operation across the cluster workers with an RDD, as in the code below:
from pyspark import SparkContext

base_spark_context = SparkContext.getOrCreate()
rdd = base_spark_context.parallelize(tables_list)
rdd.map(lambda table_name: sync_table(table_name)).collect()
The function sync_table() runs a query against the metastore, similar to this line of code:
spark_client.session.sql("select 1")
The problem is that this SQL execution does not succeed. Instead, I get a metastore-related error. Traceback:
py4j.protocol.Py4JJavaError: An error occurred while calling o20.sql.
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
(suppressed lines)
Caused by: java.lang.reflect.InvocationTargetException
(suppressed lines)
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db' with class loader sun.misc.Launcher$AppClassLoader@16c0663d, see the next exception for details.
(suppressed lines)
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /databricks/spark/work/app-20210413201900-0000/0/metastore_db.
Is there any limitation on accessing the Databricks metastore from the workers after parallelizing the operation this way? Or is it even possible to perform such an operation?
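For reference, the driver-side alternative I am comparing against would look roughly like the sketch below. This is only an illustration of the pattern, not working Spark code: `fetch_metadata` is a hypothetical stand-in for the real metastore query, which in my case would be a `spark_client.session.sql(...)` call on the driver.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_metadata(table_name):
    # Hypothetical placeholder for the real metastore query,
    # e.g. spark_client.session.sql(f"DESCRIBE TABLE {table_name}")
    return f"metadata for {table_name}"

def sync_tables_on_driver(tables_list, max_workers=8):
    # Run the metastore queries concurrently on the driver, where the
    # shared SparkSession (and its metastore client) is available,
    # instead of shipping the work to the executors via an RDD.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_metadata, tables_list))
```

The difference to the RDD approach above is that all queries still go through the single driver-side session; only the waiting on the metastore is overlapped.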