rq (Redis Queue) work-horse terminated unexpectedly: suggestions on how to debug?

Posted by nqwrtyyt on 2021-06-09 in Redis

I am using an rq worker to process a large number of jobs, and I am running into a problem.

Observations

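The job returns: work-horse terminated unexpectedly; waitpid returned None
The job connects to a database and simply runs a few SQL statements, such as simple insert or delete statements.
The error message appears almost immediately: within seconds of the job starting.
Sometimes the jobs run fine with no issue.
On one of the jobs, I can see that it performed an insert but then simply returned the error.
On the rq worker, I see the following log entries.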

{"message": "my_queue: my_job() (dcf797c4-1434-4b77-a344-5bbb1f775113)"}
{"message": "Killed horse pid 8451"}
{"message": "Moving job to FailedJobRegistry (work-horse terminated unexpectedly; waitpid returned None)"}

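Digging into the rq code (https://github.com/rq/rq), the "Killed horse pid ..." line is a hint that rq is intentionally killing the job itself. The only place where the job termination code runs is the snippet below. To reach the self.kill_horse() line, a HorseMonitorTimeoutException must be raised and the difference utcnow() - job.started_at must be greater than job.timeout (the timeout is huge, by the way).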

while True:
    try:
        with UnixSignalDeathPenalty(self.job_monitoring_interval, HorseMonitorTimeoutException):
            retpid, ret_val = os.waitpid(self._horse_pid, 0)
        break
    except HorseMonitorTimeoutException:
        # Horse has not exited yet and is still running.
        # Send a heartbeat to keep the worker alive.
        self.heartbeat(self.job_monitoring_interval + 5)

        # Kill the job from this side if something is really wrong (interpreter lock/etc).
        if job.timeout != -1 and (utcnow() - job.started_at).total_seconds() > (job.timeout + 1):
            self.kill_horse()
            break
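
To make the kill condition in the excerpt concrete, here is a minimal standalone sketch of the same comparison. The function name monitor_would_kill and the sample timestamps are hypothetical and are not part of rq; the sketch only illustrates that a started_at value left over from an earlier run would satisfy the check within seconds, even with a very large timeout. Whether started_at is actually stale in this case is unconfirmed.

```python
from datetime import datetime, timedelta


def monitor_would_kill(started_at, timeout, now):
    """Re-state the check from the excerpt; a timeout of -1 means never kill."""
    if timeout == -1:
        return False
    return (now - started_at).total_seconds() > (timeout + 1)


# Hypothetical values chosen for illustration only.
now = datetime(2021, 6, 9, 23, 0, 5)
timeout = 6 * 60 * 60  # a "huge" timeout of six hours, in seconds

fresh_start = now - timedelta(seconds=5)  # job really started 5 seconds ago
stale_start = now - timedelta(days=1)     # assumed leftover from yesterday's run

print(monitor_would_kill(fresh_start, timeout, now))  # False
print(monitor_would_kill(stale_start, timeout, now))  # True
```

In the excerpt this check only runs after a HorseMonitorTimeoutException, that is, once job_monitoring_interval seconds have passed without the work-horse exiting.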

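Sometimes the jobs sit in the queue for a long time before a worker actually picks them up. I would have assumed that started_at gets reset when the job starts. This assumption might be wrong.
The jobs are created using rq-scheduler, and they are started periodically using a cron string (every day at 11 pm, etc.).
What should my next steps be?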
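
One possible next step is to inspect the timestamps stored on a job and see whether started_at predates the most recent enqueue, which would suggest it was carried over from a previous run. The sketch below is an assumption-laden helper, not rq code: timing_report is a hypothetical name, and it only reads the started_at, enqueued_at and timeout attributes from whatever object it is given.

```python
from datetime import datetime
from types import SimpleNamespace


def timing_report(job, now):
    """Summarise a job's timestamps; works on any object with these attributes."""
    started_at = getattr(job, "started_at", None)
    enqueued_at = getattr(job, "enqueued_at", None)
    report = {
        "enqueued_at": enqueued_at,
        "started_at": started_at,
        "timeout": getattr(job, "timeout", None),
        "seconds_since_start": None,
        "started_before_enqueued": False,
    }
    if started_at is not None:
        report["seconds_since_start"] = (now - started_at).total_seconds()
        if enqueued_at is not None:
            # A start time older than the latest enqueue hints at a stale value.
            report["started_before_enqueued"] = started_at < enqueued_at
    return report


# Stand-in object with hypothetical values. With rq installed, an object from
# rq.job.Job.fetch(job_id, connection=redis_connection) could be passed instead.
sample_job = SimpleNamespace(
    enqueued_at=datetime(2021, 6, 9, 23, 0, 0),
    started_at=datetime(2021, 6, 8, 23, 0, 2),
    timeout=21600,
)
print(timing_report(sample_job, now=datetime(2021, 6, 9, 23, 0, 5)))
```

The now argument should use the same convention (naive UTC or timezone-aware) as the timestamps on the job, otherwise the subtraction raises a TypeError.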

Source: https://stackoverflow.com/questions/61764915/rq-redis-queue-work-horse-terminated-unexpectedly-suggestions-on-how-to-debug

1 Answer

3hvapo4f1#

I think the latest release of rq (https://github.com/rq/rq/releases/tag/v1.4.0) has a fix for this. From the release notes: "Fixed a bug that may cause early termination of scheduled or requeued jobs. Thanks @rmartin48!"
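
A quick way to check whether an environment already runs a release at or after v1.4.0 is to compare the installed version string. This is a minimal sketch that assumes the fix quoted above shipped in 1.4.0; version_tuple and has_early_termination_fix are hypothetical helper names, and pre-release suffixes are ignored by the comparison.

```python
import re
from importlib import metadata

FIXED_IN = (1, 4, 0)  # assumption: the release referenced in the answer


def version_tuple(version):
    """Turn '1.4.0' into (1, 4, 0); non-numeric suffixes such as 'rc1' are dropped."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_early_termination_fix(installed):
    return version_tuple(installed) >= FIXED_IN


try:
    installed = metadata.version("rq")
    print(installed, has_early_termination_fix(installed))
except metadata.PackageNotFoundError:
    print("rq is not installed in this environment")
```

If the installed version is older, upgrading the package used by both the worker and rq-scheduler would be the first thing to try before debugging further.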

Posted 2021-06-09
