I just upgraded from Spark 1.3.1 to Spark 2.0.0 and wrote a simple piece of code that uses Spark SQL to interact with Hive (1.2.1). I put hive-site.xml into Spark's conf directory, and the SQL query returns the expected result, but it throws a strange AlreadyExistsException (message: Database default already exists). How can I ignore this?
[Code]
from pyspark.sql import SparkSession

# Build a local SparkSession with Hive support enabled, so Spark SQL
# talks to the Hive 1.2.1 metastore configured in conf/hive-site.xml.
ss = SparkSession.builder.appName("test").master("local") \
    .config("spark.ui.port", "4041") \
    .enableHiveSupport() \
    .getOrCreate()
ss.sparkContext.setLogLevel("INFO")

# List the tables in the current (default) Hive database.
ss.sql("show tables").show()
[Log]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/08 19:41:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/08 19:41:24 INFO execution.SparkSqlParser: Parsing command: show tables
16/08/08 19:41:25 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/08/08 19:41:26 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/08/08 19:41:26 INFO metastore.ObjectStore: ObjectStore, initialize called
16/08/08 19:41:26 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/08/08 19:41:26 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/08/08 19:41:26 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
16/08/08 19:41:27 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
16/08/08 19:41:27 INFO metastore.ObjectStore: Initialized ObjectStore
16/08/08 19:41:27 INFO metastore.HiveMetaStore: Added admin role in metastore
16/08/08 19:41:27 INFO metastore.HiveMetaStore: Added public role in metastore
16/08/08 19:41:27 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/08/08 19:41:27 INFO metastore.HiveMetaStore: 0: get_all_databases
16/08/08 19:41:27 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_all_databases
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/08/08 19:41:28 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/3fbc3578-fdeb-40a9-8469-7c851cb3733c_resources
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c/_tmp_space.db
16/08/08 19:41:28 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/8eaa63ec-9710-499f-bd50-6625bf4459f5_resources
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5/_tmp_space.db
16/08/08 19:41:28 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: create_database: Database(name:default, description:default database, locationUri:hdfs://localhost:9900/user/hive/warehouse, parameters:{})
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=create_database: Database(name:default, description:default database, locationUri:hdfs://localhost:9900/user/hive/warehouse, parameters:{})
16/08/08 19:41:28 ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy22.create_database(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:644)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy23.createDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:306)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:291)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:291)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:291)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:262)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:209)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:208)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:251)
at org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:290)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:99)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:98)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
at org.apache.spark.sql.hive.HiveSessionCatalog.<init>(HiveSessionCatalog.scala:51)
at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:49)
at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_database: default
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_database: default
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_database: default
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_database: default
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=*
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_tables: db=default pat=*
16/08/08 19:41:28 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:-2
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:-2) with 1 output partitions
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:-2)
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Missing parents: List()
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:-2), which has no missing parents
16/08/08 19:41:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.9 KB, free 366.3 MB)
16/08/08 19:41:29 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.4 KB, free 366.3 MB)
16/08/08 19:41:29 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.68.80.25:58224 (size: 2.4 KB, free: 366.3 MB)
16/08/08 19:41:29 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/08 19:41:29 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:-2)
16/08/08 19:41:29 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/08/08 19:41:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 5827 bytes)
16/08/08 19:41:29 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
16/08/08 19:41:29 INFO codegen.CodeGenerator: Code generated in 152.42807 ms
16/08/08 19:41:29 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1279 bytes result sent to driver
16/08/08 19:41:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 275 ms on localhost (1/1)
16/08/08 19:41:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/08/08 19:41:29 INFO scheduler.DAGScheduler: ResultStage 0 (showString at NativeMethodAccessorImpl.java:-2) finished in 0.288 s
16/08/08 19:41:29 INFO scheduler.DAGScheduler: Job 0 finished: showString at NativeMethodAccessorImpl.java:-2, took 0.538913 s
16/08/08 19:41:29 INFO codegen.CodeGenerator: Code generated in 13.588415 ms
+-------------------+-----------+
| tableName|isTemporary|
+-------------------+-----------+
| app_visit_log| false|
| cms_article| false|
| p4| false|
| p_bak| false|
+-------------------+-----------+
16/08/08 19:41:29 INFO spark.SparkContext: Invoking stop() from shutdown hook
PS: When I tested the same thing with Java, everything worked fine.
Any help would be appreciated.
2 Answers
Answer 1
As the log shows, this message does not mean anything bad happened; Spark is only checking whether the default database exists when it initializes the session catalog. In fact, these exception logs should not be shown at all when the default database already exists, and the query still succeeds, as the table listing at the end of the log confirms.
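If the ERROR line itself is what bothers you, the usual workaround is to raise the log threshold for the handler that emits it. A minimal sketch, assuming a stock log4j setup (the logger name is taken from the ERROR line in the log above), added to $SPARK_HOME/conf/log4j.properties:

# Silence the spurious AlreadyExistsException reported by the Hive
# metastore handler. This only hides the message; it does not change
# behavior, since the query already succeeds as the log shows.
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL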
Answer 2
Assuming you do not have an existing Hive warehouse or anything similar, try setting the following in spark-defaults.conf and then restart the Spark master.
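A plausible sketch of such a setting, assuming the intent was Spark 2.0's new spark.sql.warehouse.dir property (which replaced hive.metastore.warehouse.dir in 2.0), with the warehouse URI copied from the locationUri shown in the log above:

# Hypothetical spark-defaults.conf entry: give Spark an explicit warehouse
# location. The URI below comes from the log and may differ in your setup.
spark.sql.warehouse.dir    hdfs://localhost:9900/user/hive/warehouse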