How are Aim run statuses handled?

Posted by qoefvg9y 25 days ago in Other

❓Question

It's extremely unclear to me how run status (active, finished, failed, etc.) is determined, specifically whether a run is active. In my code, I call `report_successful_finish` once my model has finished training and testing and I've uploaded the figures I want, but I can't tell whether this actually affects the state. Most of my runs transition to the finished state automatically, but not always. Does this happen automatically when the process exits? When the run object is destroyed?
My dashboard is littered with week-old runs that still show as in progress. In some cases, maybe the processes crashed? I can't tell. I've tried using the CLI to "close" them with little success; usually it reports no errors, but the run still shows as in progress.
I've searched the documentation extensively but found hardly anything about this.

rekjcdws (answer #1)

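Hey, @gpascale! Sorry for the late reply, and thanks for the question. We try to automatically transition a run to the finished state when the process exits (even if an exception is thrown). However, there are cases where the process hangs or is killed, and in those cases there is nothing we can do.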
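To make that limitation concrete, here is a minimal sketch using only the Python standard library. It assumes the tracker finalizes runs from an interpreter exit hook such as `atexit`; this illustrates the general mechanism and is not Aim's actual code, and the `CHILD` script and `run_child` helper are hypothetical. A hook registered this way runs on a normal exit and after an uncaught exception, but not when the process is terminated abruptly (simulated here with `os._exit`, which, like SIGKILL, skips exit hooks).

```python
import subprocess
import sys

# Hypothetical training script: an exit hook stands in for "mark the run finished".
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("status=finished", flush=True))
mode = sys.argv[1]
if mode == "raise":
    raise RuntimeError("training crashed")  # uncaught exception: hook still runs
if mode == "kill":
    os._exit(1)  # abrupt exit, like SIGKILL: hook is skipped
print("training done", flush=True)
"""


def run_child(mode: str) -> str:
    """Run the toy training script in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return result.stdout


for mode in ("ok", "raise", "kill"):
    state = "finished" if "status=finished" in run_child(mode) else "still active"
    print(mode, "->", state)
# prints: ok -> finished, raise -> finished, kill -> still active
```

Only the killed process leaves its run "active", which is the situation the fallback check described next is meant to clean up.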
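As a backup plan, however, the `aim up` command also runs a background task that looks for runs that are still marked active and whose lock is not held by any other process (which is what happens when the process was killed). So the only unhandled case should be a hung process. If you can share more details about how this happens, maybe I can help further, or try to reproduce it on my end to see what went wrong.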
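The lock-based fallback can be illustrated with an advisory file lock. This is a minimal, Unix-only sketch assuming each run holds an exclusive `flock` on a per-run lock file for as long as its process is alive; `acquire_run_lock`, `is_stale`, and the lock path are hypothetical, and this is not Aim's implementation. The kernel releases a `flock` when the owning process dies, so a checker that can grab the lock knows the run is orphaned. A hung process, on the other hand, keeps holding its lock, which is why that case goes undetected.

```python
import fcntl
import os
import tempfile


def acquire_run_lock(path: str):
    """Training-process side: hold an exclusive lock for the run's lifetime."""
    f = open(path, "w")
    fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    # Keep the file object open; closing it, or the process dying, releases the lock.
    return f


def is_stale(path: str) -> bool:
    """Background-checker side: True if no process holds the run's lock."""
    with open(path, "a") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # a live (or hung) process still holds the lock
        fcntl.flock(f, fcntl.LOCK_UN)
        return True  # nobody holds it: the owner is gone, so the run can be closed


lock_path = os.path.join(tempfile.mkdtemp(), "run.lock")
holder = acquire_run_lock(lock_path)
print(is_stale(lock_path))  # False: the "training process" is alive
holder.close()              # simulate the process being killed
print(is_stale(lock_path))  # True: the checker may mark the run finished
```

Each `open()` creates its own open file description, so the checker's lock attempt conflicts with the holder's even within one process, which is what makes this single-process demo meaningful.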
