Distributed Erlang: how does the "prevent overlapping partitions" algorithm work?

Posted by cgfeq70w on 2022-12-08 in Erlang

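Quoting the Erlang documentation:
"As of OTP 25, global will by default prevent overlapping partitions due to network issues by actively disconnecting from nodes that report that they have lost connections to other nodes. This will cause fully connected partitions to form instead of leaving the network in a state with overlapping partitions."
I ran a set of experiments in which three nodes A, B and C formed a fully connected network (for example, nodes() on A evaluates to [B, C]), and then the link between B and C failed.
I observed that in some runs the resulting fully connected network consisted of A and B, while in others it consisted of A and C.
Graphically: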

Fully connected          After fault           Scenario 1            Scenario 2     
     A                        A                     A                     A
   /   \                     / \                   /                       \
  /     \                   /   \                 /                         \
 /       \                 /     \               /                           \
B---------C               B       C             B       C             B       C

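I couldn't find a specification of the algorithm, so my question is: can you give me a more or less formal one? Or, if it already exists in the documentation, can you point me to it?
Thanks in advance.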

Answer #1, by dm7nw8vv

After a lot of digging through the source code of the global module, I found an informal explanation of how the algorithm works:

%% ----------------------------------------------------------------
%% Prevent Overlapping Partitions Algorithm
%% ========================================
%%
%% 1. When a node loses connection to another node it sends a
%%    {lost_connection, LostConnNode, OtherNode} message to all
%%    other nodes that it knows of.
%% 2. When a lost_connection message is received the receiver
%%    first checks if it has seen this message before. If so, it
%%    just ignores it. If it has not seen it before, it sends the
%%    message to all nodes it knows of. This in order to ensure
%%    that all connected nodes will receive this message. It then
%%    sends a {remove_connection, LostConnRecvNode} message (where
%%    LostConnRecvNode is its own node name) to OtherNode and
%%    clears all information about OtherNode so OtherNode won't be
%%    part of ReceiverNode's cluster anymore. When this information
%%    has been cleared, no lost_connection will be triggered when
%%    a nodedown message for the connection to OtherNode is
%%    received.
%% 3. When a {remove_connection, LostConnRecvNode} message is
%%    received, the receiver node takes down the connection to
%%    LostConnRecvNode and clears its information about
%%    LostConnRecvNode so it is not part of its cluster anymore.
%%    Both nodes will receive a nodedown message due to the
%%    connection being closed, but none of them will send
%%    lost_connection messages since they have cleared information
%%    about the other node.
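To make the three steps concrete, here is a minimal Python sketch that replays them on the three-node setup from the question. It is a toy model, not OTP's implementation: the `Cluster` class, its methods and the `run` helper are hypothetical, messages are delivered one at a time in FIFO order, a message sent over a broken link is simply lost, and a node that receives a `lost_connection` naming itself as OtherNode only forwards it (the quoted comment does not say what happens in that case).

```python
from collections import deque

# Toy model of the three steps in the quoted comment (hypothetical, not OTP code).
# Assumptions: FIFO delivery, messages over a broken physical link are lost,
# and a node named as OtherNode in a lost_connection message only forwards it.


class Cluster:
    def __init__(self, nodes):
        self.links = {frozenset((a, b)) for a in nodes for b in nodes if a != b}
        self.known = {n: set(nodes) - {n} for n in nodes}  # each node's cluster view
        self.seen = {n: set() for n in nodes}               # handled lost_connection keys
        self.queue = deque()                                # in-flight (dst, src, msg)

    def send(self, src, dst, msg):
        # A message only gets through if the physical link is still up.
        if frozenset((src, dst)) in self.links:
            self.queue.append((dst, src, msg))

    def break_link(self, a, b):
        self.links.discard(frozenset((a, b)))

    def detect_nodedown(self, node, peer):
        # Step 1: report the lost connection to every node we still know of.
        if peer not in self.known[node]:
            return  # information already cleared: no lost_connection is sent
        self.known[node].discard(peer)
        self.seen[node].add((node, peer))
        for other in sorted(self.known[node]):
            self.send(node, other, ("lost_connection", node, peer))

    def deliver_all(self):
        while self.queue:
            dst, _src, msg = self.queue.popleft()
            if msg[0] == "lost_connection":
                self.on_lost_connection(dst, msg[1], msg[2])
            else:  # ("remove_connection", requester)
                self.on_remove_connection(dst, msg[1])

    def on_lost_connection(self, me, lost_conn_node, other_node):
        # Step 2: ignore duplicates; otherwise forward, then drop OtherNode.
        key = (lost_conn_node, other_node)
        if key in self.seen[me]:
            return
        self.seen[me].add(key)
        for n in sorted(self.known[me]):
            self.send(me, n, ("lost_connection", lost_conn_node, other_node))
        if other_node != me and other_node in self.known[me]:
            self.send(me, other_node, ("remove_connection", me))
            self.known[me].discard(other_node)

    def on_remove_connection(self, me, requester):
        # Step 3: take down the connection to the requester and forget it.
        self.known[me].discard(requester)
        self.break_link(me, requester)


def run(first, second):
    """Break the B-C link; `first` notices it before `second` does."""
    c = Cluster(["A", "B", "C"])
    c.break_link("B", "C")
    c.detect_nodedown(first, second)
    c.deliver_all()
    c.detect_nodedown(second, first)
    c.deliver_all()
    return {n: sorted(v) for n, v in c.known.items()}


if __name__ == "__main__":
    print(run("B", "C"))  # → {'A': ['B'], 'B': ['A'], 'C': []}
    print(run("C", "B"))  # → {'A': ['C'], 'B': [], 'C': ['A']}
```

Under these assumptions the surviving pair is decided by a race: A sides with whichever of B or C reports the lost B-C link first and disconnects the other one, which by the time it notices the failure itself has no peers left to report to. That would be consistent with the two scenarios in the question, but real timing depends on when each side detects the failure (for example, through net_ticktime), which this model does not capture.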
