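I've read many SO threads about why generate/inject/parse/fetch take so long (or hang), but had no luck. I tried the solutions from the threads below, without success:
1) Nutch 2.1 URL injection takes a long time
2) Nutch 2.2.1 does not continue after the Injector job
and various other threads.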
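I'm using Nutch 2.3.1 and HBase 0.94.27. I have been following this tutorial and this one, and I was able to build successfully. But whenever I issue any command, it hangs.
Below are the logs I get when running the commands:
inject command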
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# ./bin/nutch inject seed/urls.txt
InjectorJob: starting at 2016-05-04 09:59:12
InjectorJob: Injecting urlDir: seed/urls.txt
generate command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch generate -topN 40
GeneratorJob: starting at 2016-05-04 09:54:08
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 40
fetch command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch fetch -all
FetcherJob: starting at 2016-05-04 10:00:14
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
parse command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch parse -all
ParserJob: starting at 2016-05-04 10:00:43
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: parsing all
updatedb command
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch updatedb -all
DbUpdaterJob: starting at 2016-05-04 10:02:24
DbUpdaterJob: updatinging all
Below are the HBase logs:
client /0:0:0:0:0:0:0:1:45216
2016-05-04 10:00:47,214 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000e, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,215 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:45216 which had sessionid 0x1547b2be4bc000e
2016-05-04 10:00:47,215 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000d, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,216 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:59934 which had sessionid 0x1547b2be4bc000d
2016-05-04 10:01:10,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000c, timeout of 40000ms exceeded
2016-05-04 10:01:10,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000c
2016-05-04 10:01:22,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000b, timeout of 40000ms exceeded
2016-05-04 10:01:22,003 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000b
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000e, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000d, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000e
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000d
2016-05-04 10:02:25,195 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59938
2016-05-04 10:02:25,202 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59938
2016-05-04 10:02:25,204 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc000f with negotiated timeout 40000 for client /127.0.0.1:59938
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59940
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59940
2016-05-04 10:02:25,825 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc0010 with negotiated timeout 40000 for client /127.0.0.1:59940
2016-05-04 10:04:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
2016-05-04 10:04:28,372 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@25e5c862
2016-05-04 10:04:28,379 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 0 catalog row(s) and gc'd 0 unreferenced parent region(s)
2016-05-04 10:09:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
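The ZooKeeper entries in the HBase log show client sessions being established, having their sockets closed, and expiring (for example, two sessions are established at 10:02:25, one second after `updatedb` started at 10:02:24). To line these events up against each command, a small script can group them by session ID. This is a minimal sketch: `summarize_sessions` is a hypothetical helper, not part of Nutch, HBase, or ZooKeeper, and it assumes the exact message formats seen in the excerpt above. It only summarizes the log; it does not diagnose the hang.

```python
import re

# Hypothetical helper (not part of Nutch, HBase, or ZooKeeper).
# Assumes the ZooKeeper server message formats seen in the HBase log excerpt.
PATTERNS = [
    ("established", re.compile(
        r"Established session (?P<sid>0x[0-9a-f]+) with negotiated timeout")),
    ("socket closed", re.compile(
        r"Closed socket connection for client \S+ which had sessionid (?P<sid>0x[0-9a-f]+)")),
    ("expired", re.compile(
        r"Expiring session (?P<sid>0x[0-9a-f]+), timeout of \d+ms exceeded")),
]


def summarize_sessions(lines):
    """Return {session_id: [event, ...]}, events in the order they appear."""
    sessions = {}  # dicts preserve insertion order in Python 3.7+
    for line in lines:
        for event, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                sessions.setdefault(match.group("sid"), []).append(event)
                break  # at most one event per log line
    return sessions
```

Fed the log excerpt above (for instance by passing an open file object, since iterating a file yields lines), it lists each session's events in order; sessions `0x1547b2be4bc000d` and `0x1547b2be4bc000e`, for example, show a closed socket followed by expiry.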
hadoop.log file
2016-05-04 10:42:18,132 INFO crawl.InjectorJob - InjectorJob: starting at 2016-05-04 10:42:18
2016-05-04 10:42:18,134 INFO crawl.InjectorJob - InjectorJob: Injecting urlDir: seed/urls.txt
2016-05-04 10:42:18,527 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
What exactly is the problem? I have configured everything correctly, but it still hangs. Why?