Error querying Avro data with Pig: Utf8 cannot be cast to java.lang.String

Posted by cld4siwp on 2021-06-25 in Pig

I used Flume to download Twitter data into HDFS, but when I try to query it with Pig I get a ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.String.

    grunt> A = LOAD '/apps/hive/warehouse/twtr_uk.db/twitterdata_09062015/' USING AvroStorage ('{
    >> "type" : "record",
    >> "name" : "Doc",
    >> "doc" : "adoc",
    >> "fields" : [
    >> {
    >> "name" : "id",
    >> "type" : "string"
    >> },
    >> {
    >> "name" : "user_friends_count",
    >> "type" : [ "int", "null" ]
    >> },
    >> {
    >> "name" : "user_location",
    >> "type" : [ "string", "null" ]
    >> },
    >> {
    >> "name" : "user_description",
    >> "type" : [ "string", "null" ]
    >> }, {
    >> "name" : "user_statuses_count",
    >> "type" : [ "int", "null" ]
    >> }, {
    >> "name" : "user_followers_count",
    >> "type" : [ "int", "null" ]
    >> }, {
    >> "name" : "user_name",
    >> "type" : [ "string", "null" ]
    >> }, {
    >> "name" : "user_screen_name",
    >> "type" : [ "string", "null" ]
    >> }, {
    >> "name" : "created_at",
    >> "type" : [ "string", "null" ]
    >> }, {
    >> "name" : "text",
    >> "type" : [ "string", "null" ]
    >> }, {
    >> "name" : "retweet_count",
    >> "type" : [ "long", "null" ]
    >> }, {
    >> "name" : "retweeted",
    >> "type" : [ "boolean", "null" ]
    >> }, {
    >> "name" : "in_reply_to_user_id",
    >> "type" : [ "long", "null" ]
    >> }, {
    >> "name" : "source",
    >> "type" : [ "string", "null" ]
    >> }, {
    >> "name" : "in_reply_to_status_id",
    >> "type" : [ "long", "null" ]
    >> }, {
    >> "name" : "media_url_https",
    >> "type" : [ "string", "null" ]
    >> }, {
    >> "name" : "expanded_url",
    >> "type" : [ "string", "null" ]
    >> } ]
    >> }');
    grunt> illustrate A;
    2015-06-11 10:07:05,361 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
    2015-06-11 10:07:05,382 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2015-06-11 10:07:05,382 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[ConstantCalculator, LoadTypeCastInserter, PredicatePushdownOptimizer, StreamTypeCastInserter], RULES_DISABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter]}
    2015-06-11 10:07:05,383 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2015-06-11 10:07:05,384 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2015-06-11 10:07:05,384 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2015-06-11 10:07:05,385 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
    2015-06-11 10:07:05,385 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2015-06-11 10:07:05,426 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2015-06-11 10:07:05,426 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[123,3] C: R:
    2015-06-11 10:07:05,436 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 6
    2015-06-11 10:07:05,436 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 6
    java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.String
        at org.apache.pig.impl.util.avro.AvroTupleWrapper.getMemorySize(AvroTupleWrapper.java:201)
        at org.apache.pig.impl.util.avro.AvroTupleWrapper.getMemorySize(AvroTupleWrapper.java:178)
        at org.apache.pig.pen.util.ExampleTuple.getMemorySize(ExampleTuple.java:97)
        at org.apache.pig.data.DefaultAbstractBag.sampleContents(DefaultAbstractBag.java:101)

ERROR 2997: Encountered IOException. Exception


#1 oipij1gg

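If the Avro data is already in HDFS, you do not need to specify the Avro schema explicitly, because Avro data files carry their schema in the file header. Try running it like this:

    A = LOAD '/apps/hive/warehouse/twtr_uk.db/twitterdata_09062015/' USING AvroStorage();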
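
As a rough sketch of that suggestion in the grunt shell, the session below loads the directory from the question without a schema argument and then checks what Pig sees. The DESCRIBE and LIMIT/DUMP steps are not part of the original answer. They are added here on the assumption that, since the stack trace passes through org.apache.pig.pen (the code behind ILLUSTRATE), it is worth confirming that a plain read of the data works before retrying ILLUSTRATE. The alias B is only illustrative.

```pig
-- Load without an explicit schema; AvroStorage takes the schema from the Avro files themselves.
A = LOAD '/apps/hive/warehouse/twtr_uk.db/twitterdata_09062015/' USING AvroStorage();

-- Print the schema Pig derived from the files, to compare with the hand-written one.
DESCRIBE A;

-- Read a handful of records through the normal execution path rather than ILLUSTRATE.
B = LIMIT A 5;
DUMP B;
```

If DESCRIBE shows the expected fields and DUMP returns rows, the data and the loader are fine and the problem is likely confined to how ILLUSTRATE samples Avro tuples.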
