Adding Hive partitions with Oozie

hof1towb · published 2021-06-29 in Hive
Follow (0) | Answers (2) | Views (605)

I'm using HDP 2.4.2 and trying to add partitions to an external Hive table with an Oozie coordinator job. I created a coordinator that runs the following workflow daily:

    <workflow-app name="addPartition" xmlns="uri:oozie:workflow:0.4">
      <start to="hive"/>
      <action name="hive">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <jdbc-url>jdbc:hive2://${jdbcPath}</jdbc-url>
          <password>yarn</password>
          <script>${appPath}/addPartition.q</script>
          <param>nameNode=${nameNode}</param>
          <param>dt=${dt}</param>
          <param>path=${path}</param>
        </hive2>
        <ok to="end" />
        <error to="fail" />
      </action>
      <kill name="fail">
        <message>
          Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
      </kill>
      <end name="end" />
    </workflow-app>
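For context, a job.properties along these lines would submit such a coordinator. This is only a sketch: the jobTracker port and appPath are assumptions, while nameNode and the JDBC host match the values that appear in the logs further down.

```properties
# Hypothetical job.properties (sketch -- jobTracker port and appPath are
# assumptions; nameNode and jdbcPath match the values seen in the logs)
nameNode=hdfs://bigdata01.local:8020
jobTracker=bigdata01.local:8050
jdbcPath=localhost:10000/default
appPath=${nameNode}/apps/addPartition
oozie.coord.application.path=${appPath}/coordinator.xml
oozie.use.system.libpath=true
```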

The executed script contains:

    CREATE EXTERNAL TABLE IF NOT EXISTS visits (sid BIGINT, os STRING, browser STRING, visit_time TIMESTAMP)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET;

    ALTER TABLE visits ADD PARTITION(dt = '${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';

If I run the job, the table is created but the partition is not added. In the logs I found:

    Beeline command arguments :
    -u
    jdbc:hive2://localhost:10000/default
    -n
    yarn
    -p
    yarn
    -d
    org.apache.hive.jdbc.HiveDriver
    --hivevar
    nameNode=hdfs://bigdata01.local:8020
    --hivevar
    dt=2016-01-05
    --hivevar
    path=2016/01/05
    -f
    addPartition.q
    -a
    delegationToken
    --hiveconf
    mapreduce.job.tags=oozie-1b3b2ee664df7ac9ee436379d784955a
    Fetching child yarn jobs
    tag id : oozie-1b3b2ee664df7ac9ee436379d784955a
    Child yarn jobs are found -
    =================================================================
    >>> Invoking Beeline command line now >>>
    [...]
    0: jdbc:hive2://localhost:10000/default> ALTER TABLE visits ADD PARTITION(dt = '${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';

It seems the parameters in the ALTER TABLE statement are not being substituted. To check this, I tried calling beeline directly from the CLI:

    beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar nameNode=hdfs://bigdata01.local:8020 --hivevar dt="2016-01-03" --hivevar path="2016/01/03" -e "ALTER TABLE visits ADD PARTITION(dt='${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';"

which results in an error:

    Connecting to jdbc:hive2://localhost:10000/default
    Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
    Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. partition spec is invalid; field dt does not exist or is empty (state=08S01,code=1)

If I run the ALTER statement without parameters:

    beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver -e "ALTER TABLE visits ADD PARTITION(dt='2016-01-03') LOCATION 'hdfs://bigdata01.local:8020/data/parquet/visitors/2016/01/03';"

or open a beeline console with the hivevars defined and execute the ALTER statement there:

    beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar nameNode=hdfs://bigdata01.local:8020 --hivevar dt="2016-01-03" --hivevar path="2016/01/03"
    0: jdbc:hive2://localhost:10000/default> ALTER TABLE visits ADD PARTITION(dt = '${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';

the partition is created.
Where is my mistake?
Update:
The values of the parameters for the hive2 action are defined in the oozie.properties file and in coordinator.xml:

    <property>
      <name>nameNode</name>
      <value>${nameNode}</value>
    </property>
    <property>
      <name>dt</name>
      <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1,'DAY'),'yyyy-MM-dd')}</value>
    </property>
    <property>
      <name>path</name>
      <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1,'DAY'),'yyyy/MM/dd')}</value>
    </property>
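As a plain-shell sanity check (GNU date assumed; this is only an illustration, not part of Oozie), the two EL expressions above resolve to the previous day in the two formats, matching the dt=2016-01-05 / path=2016/01/05 values seen in the log:

```shell
# Mirrors coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), ...)
# for a nominal time of 2016-01-06T00:00Z. GNU date is assumed.
nominal="2016-01-06"
dt=$(date -u -d "$nominal -1 day" +%Y-%m-%d)
path=$(date -u -d "$nominal -1 day" +%Y/%m/%d)
echo "dt=$dt"       # dt=2016-01-05
echo "path=$path"   # path=2016/01/05
```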

In the log you find

    Parameters:
    ------------------------
    nameNode=hdfs://bigdata01.local:8020
    dt=2016-01-05
    path=2016/01/05

before they are set as hivevars in the beeline call of the hive2 action.


yh2wf1be1#

Thanks for your help, but I've given up. Instead of the hive2 action I will use an ssh action to execute beeline with a static ALTER statement.

    <ssh xmlns="uri:oozie:ssh-action:0.1">
      <host>${sshUser}@${sshHost}</host>
      <command>"beeline"</command>
      <args>-u</args>
      <args>jdbc:hive2://localhost:10000/default</args>
      <args>-n</args>
      <args>yarn</args>
      <args>-p</args>
      <args>yarn</args>
      <args>-d</args>
      <args>org.apache.hive.jdbc.HiveDriver</args>
      <args>-e</args>
      <args>"ALTER TABLE visits ADD PARTITION(dt='${dt}') LOCATION '${nameNode}/data/raw/parquet/visitors/${path}';"</args>
      <capture-output />
    </ssh>

jfgube3f2#

Finally found the problem. You have to use double quotes instead of single quotes ;-) With the -e string wrapped in shell double quotes, the shell itself expands ${foo} to an empty string before beeline ever sees it; in single quotes the placeholder survives for beeline's --hivevar substitution.

    $ beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar foo=bar -e "SELECT '${foo}' as foo;"
    +------+--+
    | foo  |
    +------+--+
    |      |
    +------+--+
    $ beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar foo=bar -e 'SELECT "${foo}" as foo;'
    +------+--+
    | foo  |
    +------+--+
    | bar  |
    +------+--+
    $ beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar foo=bar -f selectFoo.q
    +------+--+
    | foo  |
    +------+--+
    | bar  |
    +------+--+
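The quoting effect can be reproduced without Hive at all. A minimal shell sketch, with echo standing in for beeline:

```shell
#!/bin/sh
# No Hive needed: the shell, not beeline, consumes ${foo} inside double quotes.
unset foo
double_quoted="SELECT '${foo}' as foo;"    # shell expands unset ${foo} -> empty
single_quoted='SELECT "${foo}" as foo;'    # shell leaves ${foo} untouched
echo "$double_quoted"   # SELECT '' as foo;
echo "$single_quoted"   # SELECT "${foo}" as foo;
```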
