EOFException when running Nutch on Hadoop

6rqinv9w · posted 2021-06-02 · in Hadoop

I am running Nutch 2.3 on Hadoop 2.5.2, with Gora 0.6 on HBase 0.98.12. When executing the Nutch generate step, Hadoop throws an EOFException. Any suggestions are welcome.
2015-05-18 15:22:06,578 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 0%
2015-05-18 15:22:13,697 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 50%
2015-05-18 15:22:14,720 INFO [main] mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id: attempt_1431932258783_0006_r_000001_0, Status: FAILED
Error: java.io.EOFException
    at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
    at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
    at org.apache.hadoop.io.serializer.avro.AvroSerialization$AvroDeserializer.deserialize(AvroSerialization.java:127)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2015-05-18 15:22:21,901 INFO [main] mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id: attempt_1431932258783_0006_r_000001_1, Status: FAILED
Error: java.io.EOFException (identical stack trace)
2015-05-18 15:22:28,986 INFO [main] mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id: attempt_1431932258783_0006_r_000001_2, Status: FAILED
Error: java.io.EOFException (identical stack trace)
2015-05-18 15:22:37,078 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 100%
2015-05-18 15:22:37,109 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_1431932258783_0006 failed with state FAILED due to: Task failed task_1431932258783_0006_r_000001
Job failed as tasks failed. failedMaps:0 failedReduces:1
2015-05-18 15:22:37,256 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 50
    File System Counters
        FILE: Number of bytes read=22
        FILE: Number of bytes written=232081
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=612
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=1
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Failed reduce tasks=4
        Launched map tasks=1
        Launched reduce tasks=5
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=10399
        Total time spent by all reduces in occupied slots (ms)=23225
        Total time spent by all map tasks (ms)=10399
        Total time spent by all reduce tasks (ms)=23225
        Total vcore-seconds taken by all map tasks=10399
        Total vcore-seconds taken by all reduce tasks=23225
        Total megabyte-seconds taken by all map tasks=10648576
        Total megabyte-seconds taken by all reduce tasks=23782400
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Map output bytes=32
        Map output materialized bytes=62
        Input split bytes=612
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=14
        Reduce input records=0
        Reduce output records=0
        Spilled Records=1
        Shuffled Maps=1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=175
        CPU time spent (ms)=6860
        Physical memory (bytes) snapshot=628305920
        Virtual memory (bytes) snapshot=3198902272
        Total committed heap usage (bytes)=481820672
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
2015-05-18 15:22:37,266 ERROR [main] crawl.GeneratorJob (GeneratorJob.java:run(310)) - GeneratorJob: java.lang.RuntimeException: job failed: name=[t2]generate: 1431933684-12185, jobId=job_1431932258783_0006
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:213)
    at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:241)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:308)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:316)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The command that fails:
/usr/pro/nutch2.3/deploy/bin/nutch generate -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -topN 50000 -noNorm -noFilter -adddays 0 -crawlId t2 -batchId 1431933684-12185


mo49yndu1#

I had the same problem with the same configuration. It was resolved by adding

<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
  <description>A list of serialization classes that can be used for
  obtaining serializers and deserializers.</description>
</property>

to nutch-site.xml. Thanks to http://quabr.com/26180364/cant-run-nutch2-on-hadoop2-nutch-2-x-hadoop-2-4-0-hbase-0-94-18-gora-0-5
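As a quick sanity check before resubmitting the job, you can grep the deployed config for the property. The helper function and the `conf/nutch-site.xml` path below are hypothetical; adjust them to your deployment.

```shell
# Hypothetical helper: verify that io.serializations in a given
# nutch-site.xml lists WritableSerialization (path is an assumption).
check_serializations() {
  grep -A1 '<name>io.serializations</name>' "$1" 2>/dev/null \
    | grep -q 'WritableSerialization' \
    && echo "io.serializations OK" \
    || echo "io.serializations missing"
}

check_serializations conf/nutch-site.xml
```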


ftf50wuq2#

Follow this procedure and your problem may be solved.
Edit ivy.xml (be careful, this is a very important step):

<dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" />
<dependency org="org.apache.solr" name="solr-solrj" rev="4.1.0" conf="*->default" />

Add this line:

<dependency org="org.apache.hbase" name="hbase-common" rev="0.98.8-hadoop2" conf="*->default" />

Go to stack/apache-nutch-2.3.1/conf and edit gora.properties:

gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
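The same edit can be scripted. The helper below is a hypothetical sketch (back up gora.properties first); it replaces an existing `gora.datastore.default` line or appends one if none is present.

```shell
# Hypothetical helper: point Gora's default datastore at HBaseStore,
# idempotently. Usage: set_default_store <path to gora.properties>
set_default_store() {
  if grep -q '^gora.datastore.default=' "$1"; then
    # Replace whatever value is currently configured.
    sed -i 's|^gora.datastore.default=.*|gora.datastore.default=org.apache.gora.hbase.store.HBaseStore|' "$1"
  else
    # Property absent: append it.
    echo 'gora.datastore.default=org.apache.gora.hbase.store.HBaseStore' >> "$1"
  fi
}
```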

Edit hbase-site.xml:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <!-- Here you have to set the path where you want HBase to store its built-in ZooKeeper files. -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>hdfs://localhost:9000/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>

Edit nutch-site.xml:

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>NutchSpider</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>

Clean the Nutch build with `ant clean`, then rebuild with `ant runtime`.
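Spelled out as commands, the rebuild step looks like the sketch below. It assumes you run it from the Nutch source root with ant on the PATH; the `run()` wrapper only prints each command so you can review the sequence before executing it for real.

```shell
# Dry-run sketch of the rebuild sequence above; replace the body of
# run() with "$@" to actually execute the commands.
run() { echo "+ $*"; }

run ant clean      # remove previous build output
run ant runtime    # rebuild runtime/local and runtime/deploy
```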
