我正在学习Elixir和Erlang/OTP,想了解在构建高可用性系统中拥有监督树的重要性。
我明白了管理员在管理工作进程生命周期中的重要性。但是,我仍然想知道为什么一些应用程序需要以层次结构的形式组织管理员,而不是只有一个管理员来管理所有的工作进程?拥有这样的结构有什么实际的好处吗?
借用Programming Elixir书中的一个例子,在哪种情况下,我们更喜欢第一种结构而不是第二种结构?
1. MainSupervisor
├── StashWorker
└── SubSupervisor
└──SequenceWorker
2. MainSupervisor
├── StashWorker
└── SequenceWorker
1条答案
按热度按时间um6iljoc1#
What you probably overlook is the famous “let it crash” philosophy, which makes process crashes and restarts the first-class citizen in OTP. We don’t treat process crashes as failures, but rather as an opportunity to redo it properly without the necessity to manually handle errors.
The main reason is to allow more grained control on what should have been restarted on failure. For that, we have
strategies
. Or, as @Andree restated it in comments:by organizing supervisions in hierarchies, we allow finer-grained control over how the system should respond should a subset of the system fails
Imagine the application that has a process responsible for a remote connection, and a bunch of processes, all using this resource. When the connection process crashes, it’s, in any case, being restarted by its supervisor but its
pid
changes. Meaning all the process that relied on thispid
should have been restarted as well. With:rest_for_one
strategy it’s easy out of the box.Another approach to this particular example would be to manage a connection in a process, supervised in another part of the tree, and upon connection issues manually crash the supervisor of pools using this connection to reinitialize all of them.
Even more, we might want to manually crash the process handling this connection to reinitialize it, instead of writing defensive code like
if no_conn, do: reload_config_and_restart_connection
we just let it crash and get reinitialized by the supervision tree with new proper config.Last but not least, if the supervisor does not trap exits, it would crash as well, propagating it up. That way we might reinitialize the whole branch of supervision tree without writing a line of code.