Distributed Erlang: how does the "prevent overlapping partitions" algorithm work?

Posted by cgfeq70w on 2022-12-08 in Erlang

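Quoting the Erlang documentation:
"As of OTP 25, global will by default prevent overlapping partitions due to network issues by actively disconnecting from nodes that report that they have lost connections to other nodes. This will cause fully connected partitions to form instead of leaving the network in a state with overlapping partitions."
I ran a set of experiments in which three nodes A, B, and C form a fully connected network (for example, nodes() on A evaluates to [B,C]), and then the link between B and C fails.
I observed that in some runs the resulting fully connected network consists of A and B, while in other runs it consists of A and C.
Graphically: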

    Fully connected   After fault       Scenario 1        Scenario 2
        A                 A                 A                 A
       / \               / \               /                   \
      /   \             /   \             /                     \
     /     \           /     \           /                       \
    B-------C         B       C         B       C         B       C
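To make the terminology concrete, here is a minimal Python sketch that checks whether a topology has overlapping partitions. It is not part of `global`; the helper names (`link`, `components`, `has_overlapping_partitions`) are hypothetical, and the definition used (a connected group of nodes that is not fully meshed) is my reading of the documentation quote above.

```python
from itertools import combinations


def link(a, b):
    """An undirected connection between two nodes."""
    return frozenset((a, b))


def components(nodes, links):
    """Group nodes into connected components over the undirected links."""
    remaining = set(nodes)
    result = []
    while remaining:
        start = remaining.pop()
        comp, frontier = {start}, [start]
        while frontier:
            cur = frontier.pop()
            for other in list(remaining):
                if link(cur, other) in links:
                    remaining.discard(other)
                    comp.add(other)
                    frontier.append(other)
        result.append(comp)
    return result


def has_overlapping_partitions(nodes, links):
    """True if some connected group of nodes is not fully meshed."""
    return any(
        link(a, b) not in links
        for comp in components(nodes, links)
        for a, b in combinations(sorted(comp), 2)
    )


nodes = {"A", "B", "C"}
fully_connected = {link("A", "B"), link("A", "C"), link("B", "C")}
after_fault = fully_connected - {link("B", "C")}  # B-C link is down
scenario_1 = {link("A", "B")}                     # C ends up alone
scenario_2 = {link("A", "C")}                     # B ends up alone
```

Under this definition only the "After fault" topology has overlapping partitions: A, B, and C are still one connected group, but B and C cannot reach each other directly. Both observed scenarios consist of fully connected partitions.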

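I could not find a specification of the algorithm, so my question is: can you provide a more or less formal one? Or, if it already exists in the documentation, can you point me to it?
Thanks in advance.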

Answer #1, by dm7nw8vv

After a lot of digging through the source code, I found an informal explanation of how the algorithm works, written as a comment:

    %% ----------------------------------------------------------------
    %% Prevent Overlapping Partitions Algorithm
    %% ========================================
    %%
    %% 1. When a node lose connection to another node it sends a
    %%    {lost_connection, LostConnNode, OtherNode} message to all
    %%    other nodes that it knows of.
    %% 2. When a lost_connection message is received the receiver
    %%    first checks if it has seen this message before. If so, it
    %%    just ignores it. If it has not seen it before, it sends the
    %%    message to all nodes it knows of. This in order to ensure
    %%    that all connected nodes will receive this message. It then
    %%    sends a {remove_connection, LostConnRecvNode} message (where
    %%    LostConnRecvNode is its own node name) to OtherNode and
    %%    clear all information about OtherNode so OtherNode wont be
    %%    part of ReceiverNode's cluster anymore. When this information
    %%    has been cleared, no lost_connection will be triggered when
    %%    a nodedown message for the connection to OtherNode is
    %%    received.
    %% 3. When a {remove_connection, LostConnRecvNode} message is
    %%    received, the receiver node takes down the connection to
    %%    LostConnRecvNode and clears its information about
    %%    LostConnRecvNode so it is not part of its cluster anymore.
    %%    Both nodes will receive a nodedown message due to the
    %%    connection being closed, but none of them will send
    %%    lost_connection messages since they have cleared information
    %%    about the other node.
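The three steps above can be played through for the A, B, C experiment from the question with a small Python model. This is only a sketch of the comment, not the actual `global` implementation: the `Node` class, the single FIFO queue, and the `simulate` function are hypothetical. It also assumes that a message from a peer the receiver has already removed is dropped, which the comment does not state. Under those assumptions, the survivor next to A is simply whichever endpoint's lost_connection report A handles first, which would match the two observed scenarios.

```python
from collections import deque


class Node:
    def __init__(self, name, peers):
        self.name = name
        self.known = set(peers)  # peers this node is currently connected to
        self.seen = set()        # lost_connection messages already handled


def simulate(detection_order):
    """Fail the B-C link. detection_order is the order in which the two
    endpoints notice the failure, e.g. ("B", "C").
    Returns the final peer list of every node."""
    names = ("A", "B", "C")
    nodes = {n: Node(n, [p for p in names if p != n]) for n in names}
    queue = deque()  # (sender, receiver, message); one global FIFO (assumption)

    def send(sender, receiver, msg):
        queue.append((sender, receiver, msg))

    # Step 1: each endpoint of the failed link reports it to all other
    # nodes it knows of.
    for me in detection_order:
        other = "C" if me == "B" else "B"
        nodes[me].known.discard(other)
        for peer in sorted(nodes[me].known):
            send(me, peer, ("lost_connection", me, other))

    while queue:
        sender, receiver, msg = queue.popleft()
        node = nodes[receiver]
        if sender not in node.known:
            continue  # ASSUMPTION: connection already gone, message is dropped
        if msg[0] == "lost_connection":
            _, lost_node, other = msg
            if (lost_node, other) in node.seen:
                continue  # Step 2: already seen, ignore
            node.seen.add((lost_node, other))
            for peer in sorted(node.known):  # Step 2: forward to known nodes
                send(receiver, peer, msg)
            if other in node.known:
                # Step 2: tell OtherNode to drop us, then forget OtherNode.
                send(receiver, other, ("remove_connection", receiver))
                node.known.discard(other)
        elif msg[0] == "remove_connection":
            # Step 3: take down the connection to the requesting node.
            node.known.discard(msg[1])

    return {n: sorted(nodes[n].known) for n in names}


print(simulate(("B", "C")))  # {'A': ['B'], 'B': ['A'], 'C': []}
print(simulate(("C", "B")))  # {'A': ['C'], 'B': [], 'C': ['A']}
```

In the first run B's report reaches A first, so A removes C and the partitions are {A, B} and {C} (Scenario 1). In the second run C's report wins and the result is {A, C} and {B} (Scenario 2). In a real cluster the order depends on when each node detects the failure and on message timing, so either outcome is possible.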