AvroRuntimeException occurs when executing certain HQL in Hive

Posted by des4xlb0 on 2021-05-30 in Hadoop

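I am working through the Hadoop (2.6.0) Twitter example with Flume (1.5.2) and Hive (0.14.0). I successfully fetched data from Twitter through Flume and stored it in my own HDFS.
But when I use Hive to analyze this data (just selecting a single field from one table), the query fails with "java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.EOFException", and I could find almost no useful information about this exception.
In fact, most of the file's records are read successfully (in the output below, 5100 rows were fetched), but the query eventually fails. As a result, I cannot process all of the tweet files together.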

  Time taken: 1.512 seconds, Fetched: 5100 row(s)
  Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
  15/04/15 19:59:18 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
  java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.EOFException
      at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:663)
      at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
      at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
      at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
      at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
  Caused by: org.apache.avro.AvroRuntimeException: java.io.EOFException
      at org.apache.avro.file.DataFileStream.next(DataFileStream.java:222)
      at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:153)
      at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:52)
      at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:629)
      ... 15 more
  Caused by: java.io.EOFException
      at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
      at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
      at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
      at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
      at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
      at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:341)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
      at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
      at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
      at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
      ... 18 more

I created the table with the following HQL:

  CREATE TABLE tweets
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES ('avro.schema.url'='file:///home/hduser/hive-0.14.0-bin/tweetsdoc_new.avsc');

Then I loaded the tweets file from HDFS:

  LOAD DATA INPATH '/user/flume/tweets/FlumeData.1429098355304' OVERWRITE INTO TABLE tweets;

Can anyone tell me the likely cause, or an effective way to get more details about the exception?


eeq64g8w #1

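I had this annoying problem too.
I looked at the generated binary file and debugged the Avro deserialization a bit.
The cause of this EOFException is that Flume inserts a newline byte after each event (you can see 0x0A after each record).
The Avro deserializer then assumes the file is not finished and interprets that byte as the number of blocks to read, but it cannot read that many without hitting EOF.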
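
To make the last point concrete, here is a minimal Python sketch of how a single stray 0x0A byte decodes as a number. It assumes only what the Avro specification says about `long` values (variable-length, zigzag-encoded); the helper name `read_avro_long` is hypothetical and is not part of Avro, Flume, or Hive. Reading 0x0A this way gives a positive count, and the next read then runs off the end of the buffer, which is roughly the situation the stack trace shows.

```python
from typing import Tuple


def read_avro_long(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one zigzag varint long from buf starting at pos.

    Hypothetical helper for illustration; returns (value, next_position).
    Raises EOFError when the buffer ends before the value is complete.
    """
    raw = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise EOFError("buffer ended while reading a long")
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if not (byte & 0x80):           # high bit clear: last byte of the value
            break
        shift += 7
    value = (raw >> 1) ^ -(raw & 1)     # undo zigzag encoding
    return value, pos


# A lone newline byte left at the end of the data.
trailing = b"\x0a"

count, next_pos = read_avro_long(trailing)
print("stray 0x0A decodes as:", count)

# The reader now expects more data, but nothing is left.
try:
    read_avro_long(trailing, next_pos)
except EOFError as exc:
    print("next read fails:", exc)
```

If this is what is happening, the byte just before the failure point in a local copy of the file (fetched for example with `hdfs dfs -get`) should be 0x0A.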
