PySpark 2.0 throws AlreadyExistsException(message:Database default already exists) when interacting with Hive

30byixjq posted on 2021-06-28 in Hive

I just upgraded from Spark 1.3.1 to Spark 2.0.0 and wrote a simple piece of code that uses Spark SQL to interact with Hive (1.2.1). I put hive-site.xml into Spark's conf directory, and the SQL query returns the expected result, but it throws a strange AlreadyExistsException(message:Database default already exists). How can I ignore this?
[Code]

from pyspark.sql import SparkSession

ss = SparkSession.builder.appName("test").master("local") \
    .config("spark.ui.port", "4041") \
    .enableHiveSupport() \
    .getOrCreate()
ss.sparkContext.setLogLevel("INFO")
ss.sql("show tables").show()

[Log]

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/08 19:41:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/08 19:41:24 INFO execution.SparkSqlParser: Parsing command: show tables
16/08/08 19:41:25 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/08/08 19:41:26 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/08/08 19:41:26 INFO metastore.ObjectStore: ObjectStore, initialize called
16/08/08 19:41:26 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/08/08 19:41:26 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/08/08 19:41:26 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
16/08/08 19:41:27 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
16/08/08 19:41:27 INFO metastore.ObjectStore: Initialized ObjectStore
16/08/08 19:41:27 INFO metastore.HiveMetaStore: Added admin role in metastore
16/08/08 19:41:27 INFO metastore.HiveMetaStore: Added public role in metastore
16/08/08 19:41:27 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/08/08 19:41:27 INFO metastore.HiveMetaStore: 0: get_all_databases
16/08/08 19:41:27 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_all_databases
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/08/08 19:41:28 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/3fbc3578-fdeb-40a9-8469-7c851cb3733c_resources
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c/_tmp_space.db
16/08/08 19:41:28 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/8eaa63ec-9710-499f-bd50-6625bf4459f5_resources
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5/_tmp_space.db
16/08/08 19:41:28 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: create_database: Database(name:default, description:default database, locationUri:hdfs://localhost:9900/user/hive/warehouse, parameters:{})
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=create_database: Database(name:default, description:default database, locationUri:hdfs://localhost:9900/user/hive/warehouse, parameters:{})
16/08/08 19:41:28 ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
    at com.sun.proxy.$Proxy22.create_database(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:644)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
    at com.sun.proxy.$Proxy23.createDatabase(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:306)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:291)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:291)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:291)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:262)
    at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:209)
    at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:208)
    at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:251)
    at org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:290)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:99)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
    at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:98)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
    at org.apache.spark.sql.hive.HiveSessionCatalog.<init>(HiveSessionCatalog.scala:51)
    at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:49)
    at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
    at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
    at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
    at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:211)
    at java.lang.Thread.run(Thread.java:745)
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_database: default
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_database: default
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_database: default
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_database: default
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=*
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_tables: db=default pat=*
16/08/08 19:41:28 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:-2
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:-2) with 1 output partitions
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:-2)
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Missing parents: List()
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:-2), which has no missing parents
16/08/08 19:41:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.9 KB, free 366.3 MB)
16/08/08 19:41:29 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.4 KB, free 366.3 MB)
16/08/08 19:41:29 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.68.80.25:58224 (size: 2.4 KB, free: 366.3 MB)
16/08/08 19:41:29 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/08 19:41:29 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:-2)
16/08/08 19:41:29 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/08/08 19:41:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 5827 bytes)
16/08/08 19:41:29 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
16/08/08 19:41:29 INFO codegen.CodeGenerator: Code generated in 152.42807 ms
16/08/08 19:41:29 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1279 bytes result sent to driver
16/08/08 19:41:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 275 ms on localhost (1/1)
16/08/08 19:41:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/08/08 19:41:29 INFO scheduler.DAGScheduler: ResultStage 0 (showString at NativeMethodAccessorImpl.java:-2) finished in 0.288 s
16/08/08 19:41:29 INFO scheduler.DAGScheduler: Job 0 finished: showString at NativeMethodAccessorImpl.java:-2, took 0.538913 s
16/08/08 19:41:29 INFO codegen.CodeGenerator: Code generated in 13.588415 ms
+-------------------+-----------+
|          tableName|isTemporary|
+-------------------+-----------+
|      app_visit_log|      false|
|        cms_article|      false|
|                 p4|      false|
|              p_bak|      false|
+-------------------+-----------+
16/08/08 19:41:29 INFO spark.SparkContext: Invoking stop() from shutdown hook

PS: when I test the same thing from Java, everything works fine.
Any help would be appreciated.

gcuhipw9 #1

As the log shows, this message does not mean anything bad happened; Spark is just checking whether the default database exists (it issues a create_database call, and the metastore reports that the database is already there). Really, these exception logs should not be printed at all when the default database already exists.
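If the goal is simply to hide the spurious ERROR entry rather than fix anything, one workaround is to raise the log threshold for the logger that emits it. A minimal sketch, assuming the stock log4j 1.x setup that Spark 2.0 ships with (conf/log4j.properties, created from log4j.properties.template):

# conf/log4j.properties
# The AlreadyExistsException above is logged by
# org.apache.hadoop.hive.metastore.RetryingHMSHandler; raising that logger's
# threshold to FATAL suppresses the harmless ERROR line while leaving the
# rest of the metastore logging untouched.
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL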

x6h2sr28 #2

Assuming you don't have an existing Hive warehouse or anything like that, try setting the following in spark-defaults.conf and then restart the Spark master.

spark.sql.warehouse.dir=file:///usr/lib/spark/..... (spark install dir)
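Alternatively, the same property can be set per application when the SparkSession is built, instead of globally in spark-defaults.conf. A minimal sketch, using a hypothetical local warehouse path that is not taken from the answer:

from pyspark.sql import SparkSession

# file:///tmp/spark-warehouse is a placeholder; replace it with a writable
# directory, e.g. under your Spark install dir as suggested above.
ss = SparkSession.builder.appName("test").master("local") \
    .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse") \
    .enableHiveSupport() \
    .getOrCreate()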
