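I am developing an application whose main job is to scrape text from LinkedIn profiles, process it, and return a dictionary of the most common words in those profiles. Everything ran fine on my local machine, but problems appeared when I deployed the application to Heroku. The scraping takes about 5-7 minutes, so I hit the request timeout on Heroku. To avoid this, I added Celery to the project to run the process in the background. Now I am having trouble getting this setup to work once it is deployed on Heroku.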
Project structure:
web-sourcing-tools
├── app
│   ├── agents
│   │   ├── __init__.py
│   │   ├── data_processing.py
│   │   ├── scraper.py
│   │   └── string_builder.py
│   ├── library
│   │   └── helpers.py
│   ├── pages
│   │   ├── __init__.py
│   │   └── home.md
│   └── __init__.py
├── static
│   ├── css
│   │   ├── mystyle.css
│   │   └── style3.css
│   └── images
│       └── favicon.ico
├── templates
│   ├── include
│   │   ├── sidebar.html
│   │   └── topnav.html
│   ├── base.html
│   ├── form.html
│   └── page.html
├── .gitignore
├── __init__.py
├── main.py
├── nltk.txt
├── Procfile
├── README.md
├── requirements.txt
├── runtime.txt
└── tasks.py
Procfile:
web: gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app
worker: celery worker --app=tasks.app
runtime.txt
tasks.py
from celery import Celery
import os
from app.agents.scraper import Scraper

app = Celery(__name__)
app.conf.update(
    BROKER_URL=os.environ["REDIS_URL"],
    CELERY_RESULT_BACKEND=os.environ["REDIS_URL"]
)


@app.task(name="scraper")
def scraper(username, password, query, n_pages):
    results = Scraper(username, password, query, n_pages)
    return results
main.py
from fastapi import FastAPI, Request, Form
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from fastapi.staticfiles import StaticFiles
from app.library.helpers import *
from app.agents.string_builder import string_builder
from tasks import scraper

LOGIN = os.environ.get("LOGIN")
PASS = os.environ.get("PASS")

app = FastAPI()
templates = Jinja2Templates(directory="templates")
app.mount("/static", StaticFiles(directory="static"), name="static")


@app.get("/", response_class=HTMLResponse)
async def home(request: Request):
    data = openfile("home.md")
    return templates.TemplateResponse("page.html", {"request": request, "data": data})


@app.post("/common-words")
def form_post(
    request: Request,
    string_or: str = Form(...),
    string_and: str = Form(...),
    string_not: str = Form(...),
):
    query = string_builder(OR=string_or, AND=string_and, NOT=string_not)
    n_page = 2
    task = scraper.delay(LOGIN, PASS, query, n_page)
    return templates.TemplateResponse(
        "form.html", context={"request": request, "result": task.get()}
    )


@app.get("/common-words")
def form_post(request: Request):
    result = ""
    return templates.TemplateResponse(
        "form.html", context={"request": request, "result": result}
    )


if __name__ == "__main__":
    app.run()
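In the POST handler above, `task.get()` blocks until the Celery task finishes, so the web request still lasts as long as the scrape does, and Heroku's router returns H12 once a request passes 30 seconds. A common way around this is to return the task id immediately and let the client poll a second endpoint. The sketch below shows only that submit-then-poll shape, using a `ThreadPoolExecutor` as a stand-in for the Celery worker so that it runs without Redis. The names `slow_scrape`, `submit`, `status`, and `jobs` are hypothetical and not part of the original project. With Celery, `submit` would correspond roughly to `scraper.delay(...).id` and `status` to checking `AsyncResult(task_id).ready()`.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the Celery worker (assumption: in the real app this role
# is played by the worker dyno, with Redis as broker and result backend).
executor = ThreadPoolExecutor(max_workers=1)
jobs = {}  # job id -> Future; Celery keeps this state in the result backend


def slow_scrape(query, n_pages):
    """Hypothetical placeholder for the long-running scrape."""
    time.sleep(0.2)
    return {"query": query, "pages": n_pages}


def submit(query, n_pages):
    """Start the job and return its id at once, without waiting for it."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(slow_scrape, query, n_pages)
    return job_id


def status(job_id):
    """Report the job state without blocking the caller."""
    future = jobs[job_id]
    if not future.done():
        return {"state": "PENDING", "result": None}
    if future.exception() is not None:
        return {"state": "FAILURE", "result": str(future.exception())}
    return {"state": "SUCCESS", "result": future.result()}
```

In a web handler, the POST route would return the value of `submit(...)` and a separate GET route would return `status(job_id)`, so that no single request waits for the whole scrape.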
Errors from the Heroku console:
2022-01-17T23:42:38.383531+00:00 heroku[router]: at=info method=GET path="/common-words" host=web-sourcing-tools.herokuapp.com request_id=21dd948a-b7e5-46f8-8c1c-9b5a3f091592 fwd="95.175.20.47" dyno=web.1 connect=0ms service=7ms status=200 bytes=6691 protocol=https
2022-01-17T23:43:11.505703+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=POST path="/common-words" host=web-sourcing-tools.herokuapp.com request_id=59465f6f-27d0-4583-83e8-40e6e6e5bd8d fwd="95.175.20.47" dyno=web.1 connect=0ms service=30000ms status=503 bytes=0 protocol=https
2022-01-17T23:43:12.148229+00:00 app[web.1]: 95.175.20.47:0 - "GET /favicon.ico HTTP/1.1" 404
2022-01-17T23:43:12.149208+00:00 heroku[router]: at=info method=GET path="/favicon.ico" host=web-sourcing-tools.herokuapp.com request_id=e079f8a2-a58b-4b3c-8bda-c2d4acd362ef fwd="95.175.20.47" dyno=web.1 connect=0ms service=3ms status=404 bytes=173 protocol=https
2022-01-17T23:44:10.922495+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=POST path="/common-words" host=web-sourcing-tools.herokuapp.com request_id=2664e65d-30a9-485f-8048-f67515d624a4 fwd="95.175.20.47" dyno=web.1 connect=0ms service=30000ms status=503 bytes=0 protocol=https
2022-01-17T23:44:15.101837+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=POST path="/common-words" host=web-sourcing-tools.herokuapp.com request_id=7c88d428-e3e5-4b0e-88f9-4769ac229c24 fwd="95.175.20.47" dyno=web.1 connect=0ms service=30000ms status=503 bytes=0 protocol=https
2022-01-17T23:42:56.000000+00:00 app[heroku-redis]: source=REDIS addon=redis-closed-93849 sample#active-connections=5 sample#load-avg-1m=0.16 sample#load-avg-5m=0.205 sample#load-avg-15m=0.215 sample#read-iops=0 sample#write-iops=0 sample#memory-total=15619140kB sample#memory-free=10414152kB sample#memory-cached=2560180kB sample#memory-redis=433568bytes sample#hit-rate=0.21569 sample#evicted-keys=0
2022-01-17T23:46:40.000000+00:00 app[heroku-redis]: source=REDIS addon=redis-closed-93849 sample#active-connections=8 sample#load-avg-1m=0.095 sample#load-avg-5m=0.15 sample#load-avg-15m=0.185 sample#read-iops=0 sample#write-iops=0 sample#memory-total=15619140kB sample#memory-free=10413852kB sample#memory-cached=2560192kB sample#memory-redis=499248bytes sample#hit-rate=0.21053 sample#evicted-keys=0
2022-01-17T23:50:40.000000+00:00 app[heroku-redis]: source=REDIS addon=redis-closed-93849 sample#active-connections=4 sample#load-avg-1m=0.175 sample#load-avg-5m=0.14 sample#load-avg-15m=0.17 sample#read-iops=0 sample#write-iops=0 sample#memory-total=15619140kB sample#memory-free=10414380kB sample#memory-cached=2560276kB sample#memory-redis=415400bytes sample#hit-rate=0.21053 sample#evicted-keys=0
2022-01-17T23:54:36.000000+00:00 app[heroku-redis]: source=REDIS addon=redis-closed-93849 sample#active-connections=4 sample#load-avg-1m=0.09 sample#load-avg-5m=0.1 sample#load-avg-15m=0.145 sample#read-iops=0 sample#write-iops=0 sample#memory-total=15619140kB sample#memory-free=10418696kB sample#memory-cached=2560544kB sample#memory-redis=415400bytes sample#hit-rate=0.21053 sample#evicted-keys=0
2022-01-17T23:58:20.000000+00:00 app[heroku-redis]: source=REDIS addon=redis-closed-93849 sample#active-connections=4 sample#load-avg-1m=0.18 sample#load-avg-5m=0.135 sample#load-avg-15m=0.145 sample#read-iops=0 sample#write-iops=0 sample#memory-total=15619140kB sample#memory-free=10418720kB sample#memory-cached=2560560kB sample#memory-redis=415400bytes sample#hit-rate=0.21053 sample#evicted-keys=0
2022-01-18T00:02:16.000000+00:00 app[heroku-redis]: source=REDIS addon=redis-closed-93849 sample#active-connections=4 sample#load-avg-1m=0.355 sample#load-avg-5m=0.315 sample#load-avg-15m=0.215 sample#read-iops=0 sample#write-iops=0.063241 sample#memory-total=15619140kB sample#memory-free=10421644kB sample#memory-cached=2560532kB sample#memory-redis=415400bytes sample#hit-rate=0.21053 sample#evicted-keys=0
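The router entries above are runs of `key=value` pairs, and the failing ones carry `code=H12` with `service=30000ms`. As a reading aid, here is a minimal sketch that splits such a line into a dict using only the standard library. The function names `parse_router_line` and `is_request_timeout` are hypothetical, and the parsing assumes the `heroku[router]: ` prefix seen in the log excerpt.

```python
import shlex


def parse_router_line(line):
    """Split a Heroku router log line into its key=value fields."""
    # Everything after "heroku[router]: " is a run of key=value pairs;
    # shlex handles quoted values such as desc="Request timeout".
    _, _, payload = line.partition("heroku[router]: ")
    fields = {}
    for token in shlex.split(payload):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def is_request_timeout(fields):
    """True when the router reported an H12 request timeout."""
    return fields.get("code") == "H12"
```

Applied to the POST entries in the log, this would show that every timed-out request was served by `web.1`, which is consistent with the web process waiting on the task result.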
In tasks.py, I import my main script with `from app.agents.scraper import Scraper`. This class returns a dict whose entries are a word and its quantity.
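The result shape mentioned above (a dict mapping each word to its count) can be pictured with a small standard-library sketch. This is not the `Scraper` class from the project, whose code is not shown; `most_common_words` is a hypothetical helper and the tokenizing rule (lowercase runs of letters and apostrophes) is an assumption made for illustration.

```python
import re
from collections import Counter


def most_common_words(text, top_n=10):
    """Return a dict of the top_n most frequent words and their counts."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return dict(Counter(words).most_common(top_n))
```

A plain dict like this is JSON-serializable, which matters because Celery has to serialize a task's return value before storing it in the result backend.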
In Heroku, I added the config vars as follows:
Do you have any idea where I made a mistake?
2 Answers
#1
This may be LinkedIn blocking or rate-limiting the IP ranges of cloud hosting services. https://github.com/spinlud/linkedin-jobs-scraper/issues/10#issuecomment-692537789
#2
Did you activate the worker dyno on the Resources page in Heroku? In my case, I had missed that step.
You can also update your Procfile like this: