I'm using Hadoop 1.1.2, HBase 0.9, Nutch 2.2.1 and Solr. When I run Nutch without Hadoop, everything works fine. I can start a single-node cluster without any problem, but when I try to crawl with Hadoop in single-node mode, I get this warning:
17/12/08 14:42:30 WARN snappy.LoadSnappy: Snappy native library not loaded
Then, during the reduce phase of the job, I get these errors:
17/12/08 14:42:57 INFO mapred.JobClient: map 100% reduce 33%
17/12/08 14:42:59 INFO mapred.JobClient: map 100% reduce 50%
17/12/08 14:43:00 INFO mapred.JobClient: map 100% reduce 66%
17/12/08 14:43:08 INFO mapred.JobClient: Task Id : attempt_201712081441_0002_r_000000_0, Status : FAILED
java.lang.NullPointerException
at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
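The stack trace above shows the NullPointerException being thrown inside Avro's `Utf8` constructor, called from `GeneratorReducer.setup`. A common cause of this pattern is a configuration value that was never set: `conf.get(...)` returns null, and wrapping null in a string holder dereferences it. The sketch below is not Nutch's actual code; `Utf8Like` is a stand-in for `org.apache.avro.util.Utf8`, and the key name `generate.batch.id` is a hypothetical example.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (assumption, not Nutch's real implementation): why a
// reducer's setup() can throw an NPE inside a Utf8-style constructor when
// a required job property is missing from the configuration.
public class NullConfigDemo {

    // Stand-in for org.apache.avro.util.Utf8: dereferences its argument.
    static final class Utf8Like {
        final byte[] bytes;
        Utf8Like(String s) { this.bytes = s.getBytes(); } // NPE when s == null
    }

    // Returns a diagnostic string instead of crashing, to make the failure visible.
    static String trySetup(Map<String, String> conf) {
        String batchId = conf.get("generate.batch.id"); // hypothetical key name
        try {
            new Utf8Like(batchId);
            return "ok: batch id = " + batchId;
        } catch (NullPointerException e) {
            return "NPE: batch id missing from configuration";
        }
    }

    public static void main(String[] args) {
        // No batch id set: reproduces the NPE path from the stack trace.
        System.out.println(trySetup(new HashMap<>()));

        // With the property present, setup succeeds.
        Map<String, String> conf = new HashMap<>();
        conf.put("generate.batch.id", "1512741750-12345");
        System.out.println(trySetup(conf));
    }
}
```

In other words, the crash is more likely a missing or unset value reaching the reducer than a bug in `Utf8` itself.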
I'm using Java 6, because Java 8 produces the same error plus additional warnings. To run the crawl I use the following command:
hadoop jar apache-nutch-2.2.1.job org.apache.nutch.crawl.Crawler urls -solr http://localhost:8983/solr/ -depth 2
1 Answer
I have three log files. They are listed in log.index:

LOG_DIR:$HADOOP_HOME/libexec/../logs/userlogs/job_201712081441_0002/attempt_201712081441_0002_r_000000_0

```
stdout:0 -1
stderr:0 -1
syslog:0 -1
```
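The entries above appear to follow Hadoop's log.index layout, where each line records a stream name, its start offset, and a length, with `-1` meaning "read to end of file" (an assumption about the format, based on the sample shown). A small sketch of interpreting that layout:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (under the assumed format): interpret log.index lines of the form
// "LOG_DIR:<path>" and "<stream>:<startOffset> <length>", where length -1
// means the stream runs to end of file.
public class LogIndexReader {

    static List<String> describe(List<String> indexLines) {
        List<String> out = new ArrayList<>();
        for (String line : indexLines) {
            if (line.startsWith("LOG_DIR:")) {
                out.add("attempt log dir: " + line.substring("LOG_DIR:".length()));
            } else {
                String[] nameRest = line.split(":", 2);          // e.g. "stdout" / "0 -1"
                String[] offLen = nameRest[1].trim().split("\\s+");
                String len = "-1".equals(offLen[1]) ? "to EOF" : offLen[1] + " bytes";
                out.add(nameRest[0] + ": offset " + offLen[0] + ", " + len);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder path; the real one is the LOG_DIR from the answer above.
        List<String> sample = List.of(
            "LOG_DIR:/tmp/userlogs/job_0002/attempt_0002_r_000000_0",
            "stdout:0 -1",
            "stderr:0 -1",
            "syslog:0 -1");
        describe(sample).forEach(System.out::println);
    }
}
```

Since all three streams start at offset 0 and run to EOF, the next step would be to open the attempt's syslog file under LOG_DIR and look for the full reducer exception there.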