I am using Hive 0.13.0, and I expected it to deal with table and column names containing non-alphanumeric characters, as the documentation says, but it does not.
I have been able to create a table with a dotted column name, for instance:
hive> create external table frb_test (recvTime string, fiwareServicePath string, entityId string, entityType string, `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad` string, `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md` array<struct<name:string,type:string,value:string>>) row format serde 'org.openx.data.jsonserde.JsonSerDe' location '/user/frb/test';
OK
Time taken: 0.286 seconds
As you can see, I am using https://github.com/rcongiu/hive-json-serde. Nevertheless, this is the content under hdfs:///user/frb/test:
$ hadoop fs -cat /user/frb/test/deleteme
{"recvTime":"2016-02-09T18:03:48.986Z","fiwareServicePath":"orl_sou","entityId":"ORL.SOU.DH.SSTA10","entityType":"ETS", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad":"10.673299789428711", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md":[{"name":"dofTimestamp","type":"ms","value":"2016-02-08T23:00:00.000Z"},{"name":"tag","type":"text","value":"ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad"},{"name":"description","type":"text","value":"Electrical heat load"},{"name":"quality","type":"0:GOOD, +0:ERROR","value":"10813440"},{"name":"max","type":"max","value":"null"},{"name":"min","type":"min","value":"null"},{"name":"lcl","type":"lcl","value":"null"},{"name":"ucl","type":"ucl","value":"null"}]}
I am not able to select the orl.sou.dh.ssta10.t.hvac.heatload column:
hive> add jar /home/frb/json-serde-1.3.7-jar-with-dependencies.jar;
hive> select `orl.sou.dh.ssta10.t.hvac.heatload` from frb_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455032234756_0008, Tracking URL = http://namenode.fiware.org:8088/proxy/application_1455032234756_0008/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job -kill job_1455032234756_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-02-11 17:05:56,150 Stage-1 map = 0%, reduce = 0%
2016-02-11 17:06:23,653 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1455032234756_0008 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1455032234756_0008_m_000000 (and more) from job job_1455032234756_0008
Task with the most failures(4):
-----
Task ID:
task_1455032234756_0008_m_000000
URL:
http://namenode.fiware.org:8088/taskdetails.jsp?jobid=job_1455032234756_0008&tipid=task_1455032234756_0008_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:157)
... 22 more
Caused by: java.lang.RuntimeException: cannot find field orl from [0:recvtime, 1:fiwareservicepath, 2:entityid, 3:entitytype, 4:orl.sou.dh.ssta10.t.hvac.heatload, 5:orl.sou.dh.ssta10.t.hvac.heatload_md]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:934)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:960)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:424)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:136)
... 22 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I have seen there is a Hive property regarding non-alphanumeric characters, hive.support.quoted.identifiers, which can be set to none (then Hive behaves as in version 0.12.0) or column, which I guess is the default in 0.13.0. Nevertheless, I tried setting it explicitly, with no result:
hive> set hive.support.quoted.identifiers=column;
hive> select `orl.sou.dh.ssta10.t.hvac.heatload` from frb_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455032234756_0009, Tracking URL = http://namenode.fiware.org:8088/proxy/application_1455032234756_0009/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job -kill job_1455032234756_0009
...
Caused by: java.lang.RuntimeException: cannot find field orl from [0:recvtime, 1:fiwareservicepath, 2:entityid, 3:entitytype, 4:orl.sou.dh.ssta10.t.hvac.heatload, 5:orl.sou.dh.ssta10.t.hvac.heatload_md]
...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
1 Answer
I would bet the HQL parser interprets the "dot" character as a way of accessing a field inside a struct, and nothing else.
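That would also explain the "cannot find field orl" message above. A minimal sketch of what the dot normally stands for (the table t and its single column are made up, purely for illustration):

hive> create table t (orl struct<sou:string>);
hive> select orl.sou from t;   -- 'sou' is resolved as a field of the struct column 'orl'

So even if the parser accepts the back-quoted identifier, at map time the object inspector apparently still treats everything before the first dot as a column name and goes looking for "orl", which does not exist.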
I would also bet that, among all the people involved in supporting "quoted identifiers" in Hive, nobody ever thought of a test case with a "dot" inside a column name. After all, who would be crazy enough to use a "dot" in a column name??
OK, maybe someone. But then, who would be crazy enough to also define a struct column, out of sheer perversity, just to throw an extra "dot" into the mix??
All right, let's assume it could happen. Then, would that hypothetical person stick to the very first Hive version that supports "quoted identifiers", before the feature has been battle-tested on real production systems, with no chance of benefiting from eventual bug fixes??
My 2 cents: since you apparently have no control over the garbage JSON you receive, just run sed on it (or a slow Java regex, if you prefer) to replace those dotted monstrosities with sane column names. And live happily ever after.
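For instance, something along these lines (the target directory /user/frb/test_clean and the new names heatload / heatload_md are made up, and it assumes your hadoop fs -put accepts "-" to read from stdin):

$ hadoop fs -mkdir /user/frb/test_clean
$ hadoop fs -cat /user/frb/test/deleteme \
    | sed -e 's/"ORL\.SOU\.DH\.SSTA10\.T\.HVAC\.HeatLoad":/"heatload":/g' \
          -e 's/"ORL\.SOU\.DH\.SSTA10\.T\.HVAC\.HeatLoad_md":/"heatload_md":/g' \
    | hadoop fs -put - /user/frb/test_clean/deleteme

Anchoring each pattern on the closing quote plus colon renames only the JSON keys and leaves the attribute values alone (the "tag" metadata entry contains the same dotted string as a value). Then recreate the external table over /user/frb/test_clean with heatload and heatload_md as column names, and the select should work without any back-quoting at all.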