在pig中使用hcatalog加载配置单元表时出错

ckx4rj1h  于 2021-06-02  发布在  Hadoop
关注(0)|答案(1)|浏览(347)

我试图在pig中使用hcatalog加载我的配置单元表,因为我已经编写了下面的代码,但是我得到了一个错误。我正在打开我的Pig壳 pig -useHCatalog 代码:

  1. A = LOAD 'patient_info' USING org.apache.hive.hcatalog.pig.HCatLoader();

错误:
错误hive.ql.metadata.table-无法从serde获取字段:com.ibm.spss.hive.serde2.xml.xmlserde java.lang.runtimeexception:metaexception(message:java.lang.classnotfoundexception class com.ibm.spss.hive.serde2.xml.xmlserde未找到),位于org.apache.hadoop.hive.ql.metadata.table.getdeserializerfrommetastore(表。java:275)在org.apache.hadoop.hive.ql.metadata.table.getdeserializer(表。java:255)在org.apache.hadoop.hive.ql.metadata.table.getcols(表。java:602)在org.apache.hive.hcatalog.common.hcatutil.gettableschemawithptncols(hcatutil。java:184)在org.apache.hive.hcatalog.pig.hcatloader.getschema(hcatloader。java:216)在org.apache.pig.newplan.logical.relational.loload.getschemafrommetadata(loload。java:175)在org.apache.pig.newplan.logical.relational.loload。java:89)位于org.apache.pig.parser.logicalplanbuilder.buildloadop(logicalplanbuilder)。java:866)在org.apache.pig.parser.logicalplangerator.load\子句(logicalplangerator。java:3568)在org.apache.pig.parser.logicalplangerator.op\子句(logicalplangerator。java:1625)在org.apache.pig.parser.logicalplangerator.general\语句(logicalplangerator。java:1102)位于org.apache.pig.parser.logicalplangerator.statement(logicalplangerator。java:560)在org.apache.pig.parser.logicalplangerator.query(logicalplangerator。java:421)在org.apache.pig.parser.queryparserdriver.parse(queryparserdriver。java:188)在org.apache.pig.pigserver$graph.parsequery(pigserver。java:1688)在org.apache.pig.pigserver$graph.registerquery(pigserver。java:1635)在org.apache.pig.pigserver.registerquery(pigserver。java:587)在org.apache.pig.tools.grunt.gruntparser.processpig(gruntparser。java:1093)在org.apache.pig.tools.pigscript.parser.pigscriptparser.parse(pigscriptparser。java:501)在org.apache.pig.tools.grunt.gruntparser.parsestoponerror(gruntparser。java:198)在org.apache.pig.tools.grunt.gruntparser.parsestoponerror(gruntparser。java:173)在org.apache.pig.tools.grunt.grunt.run(grunt。java:69)在org.apache.pig.main.run(main。java:547)在org.apache.pig.main.main(main。java:158)位于sun.reflect.nativemethodaccessorimpl.invoke0(本机方法)sun.reflect.nativemethodaccessorimpl.invoke(nativemethodaccessorimpl。java:57)在sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl。java:43)在java.lang.reflect.method.invoke(方法。java:606)在org.apache.hadoop.util.runjar.run(runjar。java:221)在org.apache.hadoop.util.runjar.main(runjar。java:136)原因:元异常(message:java.lang.classnotfoundexception class com.ibm.spss.hive.serde2.xml.xmlserde未找到),位于org.apache.hadoop.hive.metastore.metastoreutils.getdeserializer(metastoreutils)。java:400)
更新:
下面给出了我在hive中存储数据的命令。

  1. add jar /home/cloudera/hivexmlserde-1.0.5.3.jar;
  2. CREATE EXTERNAL TABLE patient_info (
  3. statusCode string,
  4. title string,
  5. startTime string,
  6. endTime string,
  7. frequencyValue string,
  8. frequencyUnits string
  9. )
  10. ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
  11. WITH SERDEPROPERTIES (
  12. "column.xpath.statusCode"="medicationsInfo/entryInfo/statusCode/text()",
  13. "column.xpath.title"="medications/code/code/text()",
  14. "column.xpath.startTime"="medications/xxx/startTime/text()",
  15. "column.xpath.endTime"="medications/xxx/endTime/text()",
  16. "column.xpath.frequencyValue"="medications/xxx/frequencyValue/text()",
  17. "column.xpath.frequencyUnits"="medications/xxx/frequencyUnits/text()",
  18. )
  19. STORED AS
  20. INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  21. OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
  22. TBLPROPERTIES (
  23. "xmlinput.start"="<medicationsInfo",
  24. "xmlinput.end"="</medicationsInfo>");
  25. load data inpath '/user/cloudera/xml' into table patient_info ;

样品:

  1. <Document>
  2. <ProductCode>
  3. <code>10160-0</code>
  4. <entryInfo>
  5. <statusCode>completed</statusCode>
  6. <startTime>20110729</startTime>
  7. <endTime>20110822</endTime>
  8. <strengthValue>24</strengthValue>
  9. <strengthUnits>h</strengthUnits>
  10. </entryInfo>
  11. <entryInfo>
  12. <statusCode>completed</statusCode>
  13. <startTime>20120130</startTime>
  14. <endTime>20120326</endTime>
  15. <strengthValue>12</strengthValue>
  16. <strengthUnits>h</strengthUnits>
  17. </entryInfo>
  18. <entryInfo>
  19. <statusCode>completed</statusCode>
  20. <startTime>20100412</startTime>
  21. <endTime>20110822</endTime>
  22. <strengthValue>8</strengthValue>
  23. <strengthUnits>d</strengthUnits>
  24. </entryInfo>
  25. </ProductCode>
  26. <ProductCode>
  27. <code>10160-0</code>
  28. <entryInfo>
  29. <statusCode>completed</statusCode>
  30. <startTime>20110729</startTime>
  31. <endTime>20110822</endTime>
  32. <strengthValue>24</strengthValue>
  33. <strengthUnits>h</strengthUnits>
  34. </entryInfo>
  35. <entryInfo>
  36. <statusCode>completed</statusCode>
  37. <startTime>20120130</startTime>
  38. <endTime>20120326</endTime>
  39. <strengthValue>12</strengthValue>
  40. <strengthUnits>h</strengthUnits>
  41. </entryInfo>
  42. <entryInfo>
  43. <statusCode>completed</statusCode>
  44. <startTime>20100412</startTime>
  45. <endTime>20110822</endTime>
  46. <strengthValue>8</strengthValue>
  47. <strengthUnits>d</strengthUnits>
  48. </entryInfo>
  49. </ProductCode>
  50. <Medicationsinfo>
  51. <code>10160-0</code>
  52. <entryInfo>
  53. <statusCode>completed</statusCode>
  54. <startTime>20110729</startTime>
  55. <endTime>20110822</endTime>
  56. <strengthValue>24</strengthValue>
  57. <strengthUnits>h</strengthUnits>
  58. </entryInfo>
  59. <entryInfo>
  60. <statusCode>completed</statusCode>
  61. <startTime>20120130</startTime>
  62. <endTime>20120326</endTime>
  63. <strengthValue>12</strengthValue>
  64. <strengthUnits>h</strengthUnits>
  65. </entryInfo>
  66. <entryInfo>
  67. <statusCode>completed</statusCode>
  68. <startTime>20100412</startTime>
  69. <endTime>20110822</endTime>
  70. <strengthValue>8</strengthValue>
  71. <strengthUnits>d</strengthUnits>
  72. </entryInfo>
  73. </Medicationsinfo>
  74. <Medicationsinfo>
  75. <code>10160-0</code>
  76. <entryInfo>
  77. <statusCode>completed</statusCode>
  78. <startTime>20110729</startTime>
  79. <endTime>20110822</endTime>
  80. <strengthValue>24</strengthValue>
  81. <strengthUnits>h</strengthUnits>
  82. </entryInfo>
  83. <entryInfo>
  84. <statusCode>completed</statusCode>
  85. <startTime>20120130</startTime>
  86. <endTime>20120326</endTime>
  87. <strengthValue>12</strengthValue>
  88. <strengthUnits>h</strengthUnits>
  89. </entryInfo>
  90. <entryInfo>
  91. <statusCode>completed</statusCode>
  92. <startTime>20100412</startTime>
  93. <endTime>20110822</endTime>
  94. <strengthValue>8</strengthValue>
  95. <strengthUnits>d</strengthUnits>
  96. </entryInfo>
  97. </Medicationsinfo>
  98. </Document>
xuo3flqw

xuo3flqw1#

外部表的定义无效。以下是一些选项:

方案1

  1. create external table patient_info
  2. (
  3. code string
  4. ,entryInfo string
  5. )
  6. row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
  7. with serdeproperties
  8. (
  9. "column.xpath.code" = "/Medicationsinfo/code/text()"
  10. ,"column.xpath.entryInfo" = "/Medicationsinfo/entryInfo"
  11. )
  12. stored as
  13. inputformat 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  14. outputformat 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
  15. location '/user/hive/warehouse/patient_info'
  16. tblproperties
  17. (
  18. "xmlinput.start" = "<Medicationsinfo"
  19. ,"xmlinput.end" = "</Medicationsinfo>"
  20. )
  21. ;
  22. select * from patient_info
  23. ;
  1. +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  2. | patient_info.code | patient_info.entryinfo |
  3. +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  4. | 10160-0 | <string><entryInfo><statusCode>completed</statusCode><startTime>20110729</startTime><endTime>20110822</endTime><strengthValue>24</strengthValue><strengthUnits>h</strengthUnits></entryInfo><entryInfo><statusCode>completed</statusCode><startTime>20120130</startTime><endTime>20120326</endTime><strengthValue>12</strengthValue><strengthUnits>h</strengthUnits></entryInfo><entryInfo><statusCode>completed</statusCode><startTime>20100412</startTime><endTime>20110822</endTime><strengthValue>8</strengthValue><strengthUnits>d</strengthUnits></entryInfo></string> |
  5. +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  6. | 10160-0 | <string><entryInfo><statusCode>completed</statusCode><startTime>20110729</startTime><endTime>20110822</endTime><strengthValue>24</strengthValue><strengthUnits>h</strengthUnits></entryInfo><entryInfo><statusCode>completed</statusCode><startTime>20120130</startTime><endTime>20120326</endTime><strengthValue>12</strengthValue><strengthUnits>h</strengthUnits></entryInfo><entryInfo><statusCode>completed</statusCode><startTime>20100412</startTime><endTime>20110822</endTime><strengthValue>8</strengthValue><strengthUnits>d</strengthUnits></entryInfo></string> |
  7. +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

方案2

  1. create external table patient_info
  2. (
  3. code string
  4. ,entryInfo array<map<string,map<string,string>>>
  5. )
  6. row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
  7. with serdeproperties
  8. (
  9. "column.xpath.code" = "/Medicationsinfo/code/text()"
  10. ,"column.xpath.entryInfo" = "/Medicationsinfo/entryInfo"
  11. )
  12. stored as
  13. inputformat 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  14. outputformat 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
  15. location '/user/hive/warehouse/patient_info'
  16. tblproperties
  17. (
  18. "xmlinput.start" = "<Medicationsinfo"
  19. ,"xmlinput.end" = "</Medicationsinfo>"
  20. )
  21. ;
  22. select * from patient_info
  23. ;
  1. +-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  2. | patient_info.code | patient_info.entryinfo |
  3. +-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  4. | 10160-0 | [{"entryInfo":{"statusCode":"completed","startTime":"20110729","strengthUnits":"h","endTime":"20110822","strengthValue":"24"}},{"entryInfo":{"statusCode":"completed","startTime":"20120130","strengthUnits":"h","endTime":"20120326","strengthValue":"12"}},{"entryInfo":{"statusCode":"completed","startTime":"20100412","strengthUnits":"d","endTime":"20110822","strengthValue":"8"}}] |
  5. | 10160-0 | [{"entryInfo":{"statusCode":"completed","startTime":"20110729","strengthUnits":"h","endTime":"20110822","strengthValue":"24"}},{"entryInfo":{"statusCode":"completed","startTime":"20120130","strengthUnits":"h","endTime":"20120326","strengthValue":"12"}},{"entryInfo":{"statusCode":"completed","startTime":"20100412","strengthUnits":"d","endTime":"20110822","strengthValue":"8"}}] |
  6. +-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

方案3

  1. create external table patient_info
  2. (
  3. code string
  4. ,entryInfo array<map<string,struct<statusCode:string,startTime:string,endTime:string,strengthValue:int,strengthUnits:string>>>
  5. )
  6. row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
  7. with serdeproperties
  8. (
  9. "column.xpath.code" = "/Medicationsinfo/code/text()"
  10. ,"column.xpath.entryInfo" = "/Medicationsinfo/entryInfo"
  11. )
  12. stored as
  13. inputformat 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  14. outputformat 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
  15. location '/user/hive/warehouse/patient_info'
  16. tblproperties
  17. (
  18. "xmlinput.start" = "<Medicationsinfo"
  19. ,"xmlinput.end" = "</Medicationsinfo>"
  20. )
  21. ;
  22. select * from patient_info
  23. ;
  1. +-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  2. | patient_info.code | patient_info.entryinfo |
  3. +-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  4. | 10160-0 | [{"entryInfo":{"statuscode":"completed","starttime":"20110729","endtime":"20110822","strengthvalue":24,"strengthunits":"h"}},{"entryInfo":{"statuscode":"completed","starttime":"20120130","endtime":"20120326","strengthvalue":12,"strengthunits":"h"}},{"entryInfo":{"statuscode":"completed","starttime":"20100412","endtime":"20110822","strengthvalue":8,"strengthunits":"d"}}] |
  5. | 10160-0 | [{"entryInfo":{"statuscode":"completed","starttime":"20110729","endtime":"20110822","strengthvalue":24,"strengthunits":"h"}},{"entryInfo":{"statuscode":"completed","starttime":"20120130","endtime":"20120326","strengthvalue":12,"strengthunits":"h"}},{"entryInfo":{"statuscode":"completed","starttime":"20100412","endtime":"20110822","strengthvalue":8,"strengthunits":"d"}}] |
  6. +-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

分解选项3

  1. select pi.code
  2. ,ei.i + 1 as i
  3. ,ei.entryInfo["entryInfo"].statusCode
  4. ,ei.entryInfo["entryInfo"].startTime
  5. ,ei.entryInfo["entryInfo"].endTime
  6. ,ei.entryInfo["entryInfo"].strengthValue
  7. ,ei.entryInfo["entryInfo"].strengthUnits
  8. from patient_info pi
  9. lateral view posexplode (entryInfo) ei as i,entryInfo
  10. ;
  1. +---------+---+------------+-----------+----------+---------------+---------------+
  2. | pi.code | i | statuscode | starttime | endtime | strengthvalue | strengthunits |
  3. +---------+---+------------+-----------+----------+---------------+---------------+
  4. | 10160-0 | 1 | completed | 20110729 | 20110822 | 24 | h |
  5. +---------+---+------------+-----------+----------+---------------+---------------+
  6. | 10160-0 | 2 | completed | 20120130 | 20120326 | 12 | h |
  7. +---------+---+------------+-----------+----------+---------------+---------------+
  8. | 10160-0 | 3 | completed | 20100412 | 20110822 | 8 | d |
  9. +---------+---+------------+-----------+----------+---------------+---------------+
  10. | 10160-0 | 1 | completed | 20110729 | 20110822 | 24 | h |
  11. +---------+---+------------+-----------+----------+---------------+---------------+
  12. | 10160-0 | 2 | completed | 20120130 | 20120326 | 12 | h |
  13. +---------+---+------------+-----------+----------+---------------+---------------+
  14. | 10160-0 | 3 | completed | 20100412 | 20110822 | 8 | d |
  15. +---------+---+------------+-----------+----------+---------------+---------------+
展开查看全部

相关问题