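I want to deploy my application on Heroku. The application scrapes data from apartment listing websites, and for each URL I have several selectors. It is run with APScheduler. The logs show the following error: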
2020-08-10T11:02:56.259319+00:00 app[clock.1]: Running main
2020-08-10T11:04:34.374167+00:00 app[clock.1]: Job "main (trigger: interval[3:00:00], next run at: 2020-08-10 14:02:56 UTC)" raised an exception
2020-08-10T11:04:34.374183+00:00 app[clock.1]: Traceback (most recent call last):
2020-08-10T11:04:34.374184+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/apscheduler/executors/base.py", line 125, in run_job
2020-08-10T11:04:34.374184+00:00 app[clock.1]: retval = job.func(*job.args, **job.kwargs)
2020-08-10T11:04:34.374185+00:00 app[clock.1]: File "/app/scraper/common.py", line 70, in main
2020-08-10T11:04:34.374186+00:00 app[clock.1]: driver.get(listing.url)
2020-08-10T11:04:34.374187+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
2020-08-10T11:04:34.374188+00:00 app[clock.1]: self.execute(Command.GET, {'url': url})
2020-08-10T11:04:34.374188+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
2020-08-10T11:04:34.374189+00:00 app[clock.1]: self.error_handler.check_response(response)
2020-08-10T11:04:34.374189+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
2020-08-10T11:04:34.374190+00:00 app[clock.1]: raise exception_class(message, screen, stacktrace)
2020-08-10T11:04:34.374191+00:00 app[clock.1]: selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=netTimeout&u=x&d=The%20server%20at%20x%20is%20taking%20too%20long%20to%20respond.
Decoded:
about:neterror?e=netTimeout&u=x&d=The server at x is taking too long to respond.
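If I open the link myself, I can access it. I have already disabled JavaScript and images so that the page loads faster.
I'm not sure what the problem is here.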
3 Answers
tpgth1q7 #1
It turned out that the target website blocks Heroku. The solution is to use a proxy.
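The answer does not say how the proxy was configured. As a minimal sketch, assuming the scraper drives Firefox (the `about:neterror` page in the traceback suggests it does) and that the proxy needs no authentication, the proxy can be set through Firefox's manual proxy preferences. The helper names, the `PROXY_URL` environment variable and the proxy address are hypothetical, not part of the original application.

```python
import os
from urllib.parse import urlsplit


def proxy_preferences(proxy_url):
    """Translate a proxy URL into Firefox manual-proxy preferences."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or not parts.port:
        raise ValueError("proxy URL must include a host and a port")
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": parts.hostname,
        "network.proxy.http_port": parts.port,
        "network.proxy.ssl": parts.hostname,  # also used for https:// pages
        "network.proxy.ssl_port": parts.port,
    }


def make_driver(proxy_url):
    """Start headless Firefox that sends its traffic through the proxy."""
    # Imported here so proxy_preferences() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    for name, value in proxy_preferences(proxy_url).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


if __name__ == "__main__":
    # PROXY_URL is a hypothetical config var, e.g. http://proxy.example.com:8080
    driver = make_driver(os.environ["PROXY_URL"])
    try:
        driver.get("http://some-website.com")
    finally:
        driver.quit()
```

Reading the proxy address from a config var keeps it out of the repository. A proxy that requires a username and password is not covered by these preferences and would need a different setup.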
yzuktlbb #2
I think you want to wait until the element you are looking for is present:
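The code that accompanied this answer is not preserved here. One possible sketch of such a wait uses Selenium's explicit waits. Because the question mentions several selectors per URL, the selectors are joined into a single CSS selector list so the wait returns as soon as any of them matches. The helper names and the 30-second timeout are assumptions.

```python
def combine_selectors(selectors):
    """Join CSS selectors into one selector list that matches any of them."""
    cleaned = [s.strip() for s in selectors if s and s.strip()]
    if not cleaned:
        raise ValueError("at least one selector is required")
    return ", ".join(cleaned)


def wait_for_any(driver, selectors, timeout=30):
    """Block until an element matching any selector is in the DOM.

    Raises selenium.common.exceptions.TimeoutException if none appears
    within `timeout` seconds.
    """
    # Imported here so combine_selectors() works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, combine_selectors(selectors))
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
```

A call might look like `wait_for_any(driver, [".price", "#rent"])` right after `driver.get(listing.url)`. Note that the traceback in the question is raised inside `driver.get` itself, so a wait like this only comes into play once the page has actually loaded.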
b4wnujal #3
I ran into the same problem, so maybe this helps others avoid the same error. If the website uses http but you enter https, you get this exact error. Example:
Correct website:
http://some-website.com
driver.get('http://some-website.com')
Wrong: