Saving Hive tables across different warehouse directories in HDFS with a Spark application

uqdfh47h · posted 2021-05-29 in Hadoop

So far I have been looking into how to properly save a specific Hive table that is derived from a mapped source table in a particular database. Suppose testers and developers each have their own database: how can the set of tables each group can access be kept isolated from the other?
Right now I monitor the state of both databases through Hue. I also have a Spark program that runs on a YARN cluster and creates a table whose storage location depends on whether the user is a developer or a tester.
The Spark program I created is a simple application that reads a table from the current warehouse location and saves a new table named new_table, roughly as sketched below.
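In outline, the application looks something like the following sketch (the source table name source_table is a placeholder I am using for illustration, and Hive support has to be enabled so the job talks to the metastore configured in hive-site.xml):

  import org.apache.spark.sql.SparkSession

  object SaveNewTable {
    def main(args: Array[String]): Unit = {
      // enableHiveSupport() makes the session use the Hive metastore
      // configured through hive-site.xml.
      val spark = SparkSession.builder()
        .appName("save-new-table")
        .enableHiveSupport()
        .getOrCreate()

      // "source_table" is a placeholder for the table read from the
      // current warehouse location.
      val df = spark.table("source_table")

      // Persist the result as a managed table called new_table; its files
      // land under the warehouse directory of the current database.
      df.write.mode("overwrite").saveAsTable("new_table")

      spark.stop()
    }
  }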
I have the following hive-site.xml configuration, for example:

  <configuration>
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://xxxx:9083</value>
    </property>
    <property>
      <name>hive.metastore.client.socket.timeout</name>
      <value>300</value>
    </property>
    <!--<property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/yyyy/warehouse</value>
    </property>-->
    <property>
      <name>hive.warehouse.subdir.inherit.perms</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.auto.convert.join</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.auto.convert.join.noconditionaltask.size</name>
      <value>20971520</value>
    </property>
    <property>
      <name>hive.optimize.bucketmapjoin.sortedmerge</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.smbjoin.cache.rows</name>
      <value>10000</value>
    </property>
    <property>
      <name>hive.server2.logging.operation.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.server2.logging.operation.log.location</name>
      <value>/var/log/hive/operation_logs</value>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <value>-1</value>
    </property>
    <property>
      <name>hive.exec.reducers.bytes.per.reducer</name>
      <value>67108864</value>
    </property>
    <property>
      <name>hive.exec.copyfile.maxsize</name>
      <value>33554432</value>
    </property>
    <property>
      <name>hive.exec.reducers.max</name>
      <value>1099</value>
    </property>
    <property>
      <name>hive.vectorized.groupby.checkinterval</name>
      <value>4096</value>
    </property>
    <property>
      <name>hive.vectorized.groupby.flush.percent</name>
      <value>0.1</value>
    </property>
    <property>
      <name>hive.compute.query.using.stats</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.vectorized.execution.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.vectorized.execution.reduce.enabled</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.merge.mapfiles</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.merge.mapredfiles</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.cbo.enable</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.fetch.task.conversion</name>
      <value>minimal</value>
    </property>
    <property>
      <name>hive.fetch.task.conversion.threshold</name>
      <value>268435456</value>
    </property>
    <property>
      <name>hive.limit.pushdown.memory.usage</name>
      <value>0.1</value>
    </property>
    <property>
      <name>hive.merge.sparkfiles</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.merge.smallfiles.avgsize</name>
      <value>16777216</value>
    </property>
    <property>
      <name>hive.merge.size.per.task</name>
      <value>268435456</value>
    </property>
    <property>
      <name>hive.optimize.reducededuplication</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.optimize.reducededuplication.min.reducer</name>
      <value>4</value>
    </property>
    <property>
      <name>hive.map.aggr</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.map.aggr.hash.percentmemory</name>
      <value>0.5</value>
    </property>
    <property>
      <name>hive.optimize.sort.dynamic.partition</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.execution.engine</name>
      <value>mr</value>
    </property>
    <property>
      <name>spark.executor.memory</name>
      <value>996461772</value>
    </property>
    <property>
      <name>spark.driver.memory</name>
      <value>966367641</value>
    </property>
    <property>
      <name>spark.executor.cores</name>
      <value>4</value>
    </property>
    <property>
      <name>spark.yarn.driver.memoryOverhead</name>
      <value>102</value>
    </property>
    <property>
      <name>spark.yarn.executor.memoryOverhead</name>
      <value>167</value>
    </property>
    <property>
      <name>spark.dynamicAllocation.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>spark.dynamicAllocation.initialExecutors</name>
      <value>1</value>
    </property>
    <property>
      <name>spark.dynamicAllocation.minExecutors</name>
      <value>1</value>
    </property>
    <property>
      <name>spark.dynamicAllocation.maxExecutors</name>
      <value>2147483647</value>
    </property>
    <property>
      <name>hive.metastore.execute.setugi</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.support.concurrency</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.zookeeper.quorum</name>
      <value>xxxx,xxxx</value>
    </property>
    <property>
      <name>hive.zookeeper.client.port</name>
      <value>2181</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>xxxx,xxxx</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
    </property>
    <property>
      <name>hive.zookeeper.namespace</name>
      <value>hive_zookeeper_namespace_hive</value>
    </property>
    <property>
      <name>hive.cluster.delegation.token.store.class</name>
      <value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
    </property>
    <property>
      <name>hive.server2.enable.doAs</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.server2.use.SSL</name>
      <value>false</value>
    </property>
    <property>
      <name>spark.shuffle.service.enabled</name>
      <value>true</value>
    </property>
  </configuration>

Based on my current understanding, if I change the warehouse location (hive.metastore.warehouse.dir) to a different path, for example hdfs:/user/diff/warehouse, when submitting the Spark application to the YARN cluster with --files /file/hive-site.xml, then the Hive configuration picked up by the Spark application should detect the Hive tables that exist in that particular directory.
However, after doing this, it still stays on the hdfs:/user/hive/warehouse directory that hive.metastore.uris points to. As I understand it, hive.metastore.uris overrides the database location set in hive.metastore.warehouse.dir.
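For reference, a rough sketch of how such a per-application warehouse override is usually expressed when building the SparkSession. The path mirrors the one from the question and the database name diff_db is only illustrative; note that since Spark 2.0 it is spark.sql.warehouse.dir that Spark consults, while the locations of databases and tables that already exist are read from the metastore rather than from this property:

  import org.apache.spark.sql.SparkSession

  // Sketch: override the warehouse location for this application only.
  // The path mirrors the one mentioned in the question; adjust as needed.
  val spark = SparkSession.builder()
    .appName("warehouse-override")
    // Since Spark 2.0, spark.sql.warehouse.dir supersedes
    // hive.metastore.warehouse.dir from hive-site.xml.
    .config("spark.sql.warehouse.dir", "hdfs:///user/diff/warehouse")
    .enableHiveSupport()
    .getOrCreate()

  // Only databases and tables created from now on pick up the new location;
  // existing ones keep the paths already recorded in the metastore.
  spark.sql("CREATE DATABASE IF NOT EXISTS diff_db")   // diff_db is illustrative
  spark.sql("SHOW DATABASES").show()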
What am I doing wrong here? Is there something that needs to be configured differently in hive-site.xml? Any answer would be greatly appreciated. Thank you. I am still a novice developer when it comes to Spark and Hadoop.

wdebmtf2 (answer 1#)

Create separate databases.

Demo

Creating the databases is a one-time thing:

  hive> create database dev_db location '/user/hive/my_databases/dev';
  hive> create database tst_db location '/user/hive/my_databases/tst';

When creating a table, choose which database to use:

  hive> create table dev_db.my_dev_table (i int);
  hive> create table tst_db.my_tst_table (i int);

  hive> desc formatted dev_db.my_dev_table;

  # col_name data_type comment
  i int
  # Detailed Table Information
  Database: dev_db
  ...
  Location: hdfs://quickstart.cloudera:8020/user/hive/my_databases/dev/my_dev_table
  ...

  hive> desc formatted tst_db.my_tst_table;

  Database: tst_db
  ...
  Location: hdfs://quickstart.cloudera:8020/user/hive/my_databases/tst/my_tst_table
  ...
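Carried back to the Spark application from the question, the same idea amounts to qualifying the table name with the target database. A minimal sketch, assuming the role is passed in via a hypothetical ROLE environment variable and source_table is again a placeholder:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("save-to-dev-or-tst")
    .enableHiveSupport()
    .getOrCreate()

  // dev_db and tst_db were created once with explicit HDFS locations, as in
  // the answer above. ROLE is an assumed environment variable ("dev"/"tst").
  val targetDb = if (sys.env.getOrElse("ROLE", "dev") == "tst") "tst_db" else "dev_db"

  // "source_table" is a placeholder for the table read from the current warehouse.
  val df = spark.table("source_table")

  // Qualifying the table name with the database stores the data under that
  // database's location, e.g. /user/hive/my_databases/dev/new_table.
  df.write.mode("overwrite").saveAsTable(s"$targetDb.new_table")

With the locations fixed once at database creation time, the application does not need to change hive-site.xml or the warehouse directory at all.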
