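I'm trying to crawl an AJAX-based site with Nutch, using protocol-selenium with the PhantomJS driver. I'm using apache-nutch-1.13, built from Nutch's GitHub repository. The crawls are launched as tasks on a Mesos-managed system. When I start Nutch's crawl script from a terminal on the server, everything works fine and the site is crawled the way I want. However, when I run the same crawl script with the same arguments inside a Mesos task, Nutch throws an exception: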
fetch of http://XXXXX failed with: java.lang.RuntimeException: org.openqa.selenium.NoSuchElementException: {"errorMessage":"Unable to find element with tag name 'body'","request":{"headers":{"Accept-Encoding":"gzip,deflate","Connection":"Keep-Alive","Content-Length":"35","Content-Type":"application/json; charset=utf-8","Host":"localhost:12215","User-Agent":"Apache-HttpClient/4.3.5 (java 1.5)"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"tag name\",\"value\":\"body\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a7f98ec0-b8aa-11e6-8b84-232b0d8e1024/element"}}
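My first thought was that something was wrong with the environment variables (HADOOP_HOME, PATH, CLASSPATH, ...), but I set the same variables in the Nutch script as in the terminal, and the result is still the same.
Do you know what I'm doing wrong?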
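To check that the two environments really match, one option is to print them from both places and diff the output. The following is a minimal sketch of a hypothetical diagnostic helper (`EnvSnapshot` is not part of Nutch or Mesos). The list of variables is an assumption; extend it as needed. It also prints the user and working directory, since a Mesos task runs inside its own sandbox directory and may run as a different user.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper (not part of Nutch): prints environment
// variables that may differ between an interactive shell and a Mesos task.
public class EnvSnapshot {

    // Assumed set of variables worth comparing; extend as needed.
    static final List<String> KEYS =
            List.of("HADOOP_HOME", "JAVA_HOME", "PATH", "CLASSPATH", "HOME");

    // Returns key -> value for each requested key ("<unset>" when missing),
    // preserving the order in which the keys were requested.
    static Map<String, String> snapshot(Map<String, String> env, List<String> keys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : keys) {
            result.put(key, env.getOrDefault(key, "<unset>"));
        }
        return result;
    }

    public static void main(String[] args) {
        snapshot(System.getenv(), KEYS)
                .forEach((k, v) -> System.out.println(k + "=" + v));
        // The user and working directory may also differ inside a Mesos sandbox.
        System.out.println("user.name=" + System.getProperty("user.name"));
        System.out.println("user.dir=" + System.getProperty("user.dir"));
    }
}
```

Running it once from the terminal and once as the Mesos task command (with the same launcher the crawl script uses), then diffing the two outputs, would show whether any of these values actually differ.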