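I have been using StormCrawler to crawl websites. Because the sites use HTTPS, I configured StormCrawler's default HTTPS protocol implementation. However, when I crawl some websites, I get the following exception: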
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) ~[?:1.8.0_131]
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) ~[?:1.8.0_131]
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) ~[?:1.8.0_131]
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382) ~[?:1.8.0_131]
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292) ~[?:1.8.0_131]
at sun.security.validator.Validator.validate(Validator.java:260) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124) ~[?:1.8.0_131]
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496) ~[?:1.8.0_131]
... 20 more
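Is there a mechanism that automatically downloads the certificates and configures the crawler to use them? How should I set up the crawler's configuration?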
1 answer
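This problem is not specific to StormCrawler. This answer explains that you can import the certificate manually, but that is not really an option unless you are crawling only that particular site. The other option is to disable certificate validation. That requires modifying the protocol implementation, but it should be doable.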
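To illustrate the "disable certificate validation" option, here is a minimal sketch using only the standard JDK `javax.net.ssl` API. It builds an `SSLContext` whose `X509TrustManager` accepts every server certificate, which is what would stop the `SunCertPathBuilderException` above. The class `TrustAllSsl` and its members are hypothetical names for illustration; this is not part of StormCrawler. Be aware that this removes all protection against man-in-the-middle attacks, so it only makes sense for crawling where the authenticity of the content does not matter.

```java
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/**
 * Hypothetical helper (not a StormCrawler class): builds an SSLContext that
 * accepts any server certificate. INSECURE: certificate validation is disabled.
 */
public final class TrustAllSsl {

    private TrustAllSsl() {}

    /** Trust manager whose checks never throw, so every certificate chain is accepted. */
    public static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: client certificates are not checked.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: skipping this check is what avoids the
            // "unable to find valid certification path" failure.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    /** Accepts any host name; the certificate is no longer verified anyway. */
    public static final HostnameVerifier ACCEPT_ANY_HOST = (hostname, session) -> true;

    /** Creates a TLS context that uses TRUST_ALL instead of the default truststore. */
    public static SSLContext createTrustAllContext()
            throws NoSuchAlgorithmException, KeyManagementException {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[] {TRUST_ALL}, new SecureRandom());
        return context;
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = createTrustAllContext();
        System.out.println("Protocol: " + context.getProtocol());
    }
}
```

Inside a modified protocol implementation, the context would then be handed to the HTTP client, for example via `HttpClientBuilder.setSSLContext(...)` together with a no-op hostname verifier in Apache HttpClient 4.5, or `OkHttpClient.Builder.sslSocketFactory(context.getSocketFactory(), TrustAllSsl.TRUST_ALL)` plus `hostnameVerifier(...)` in OkHttp 3. Where exactly to hook this in depends on the StormCrawler version you use.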
Have you tried the okhttp implementation? It may behave differently from the Apache HttpClient one. See the HTTP wiki.