Erlang net_adm:ping fails in a very strange way

6ljaweal posted on 2022-12-08 in Erlang

Dear all,
I am having an issue with my Erlang cluster. The cluster had been working for a long time, but one day I could no longer connect to a specific node in it (e.g. SickNode@X.X.X.X): net_adm:ping('SickNode@X.X.X.X') returns pang. Even using:

erl -name abc@X.X.X.X -setcookie MYCOOKIE -remsh SickNode@X.X.X.X

fails as well.
The strange thing is that SickNode@X.X.X.X keeps working fine with the other nodes in the cluster. The problem only happens when a new node joins the cluster and pings SickNode.
A firewall is not the cause, because all the nodes are working fine within the cluster. Has anybody run into this situation? Is Erlang not stable for clustering?
PS: I am using Erlang/OTP 20 on CentOS 6.8.
Many thanks!

kkih6yb8


Not a straight answer, but a theory and a way to reproduce your issue. It's a bit involved because it requires multiple nodes, but let's see if you can follow along.
TL;DR: SickNode@X.X.X.X changed its cookie after it was connected to the cluster.
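One way to test this theory on the real cluster, without opening any new connection, is to ask the suspect node for its cookie over a link that still works, i.e. from a node that can still talk to SickNode (node1 in the reproduction below). The module is a minimal sketch; its name and functions (cookie_check, compare/2, check/1) are hypothetical. It only relies on rpc:call/4 and erlang:get_cookie/0, so it compares each node's default cookie and ignores any per-node cookies set with erlang:set_cookie/2.

```erlang
%% Hypothetical helper module (not part of OTP): compares the default cookie
%% of a remote node with the local one, over an already established connection.
-module(cookie_check).
-export([compare/2, check/1]).

%% Pure helper: `same` if both cookies match, otherwise the remote cookie.
compare(Cookie, Cookie) -> same;
compare(_LocalCookie, RemoteCookie) -> {different, RemoteCookie}.

%% Run this on a node that is still connected to Node (e.g. node1).
%% If Node cannot be reached, rpc:call/4 returns {badrpc, Reason}.
check(Node) ->
    case rpc:call(Node, erlang, get_cookie, []) of
        {badrpc, Reason} -> {error, Reason};
        RemoteCookie -> compare(erlang:get_cookie(), RemoteCookie)
    end.
```

In the reproduction, calling cookie_check:check('node2@my.computer') on node1 would be expected to return {different, y}, which is exactly the situation described in the TL;DR.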
So, this is what I did. First, in one terminal I started node1 with cookie x:

$ erl -name node1 -setcookie x
(node1@my.computer)1>

Then, in another terminal I started node2 with cookie x, connected it to node1, and changed its cookie to y:

$ erl -name node2 -setcookie x
(node2@my.computer)1> net_adm:ping('node1@my.computer').
pong
(node2@my.computer)2> erlang:set_cookie(node(), 'y').
true
(node2@my.computer)3>

Then, in yet another terminal I started node3 with cookie x and pinged node1 (which also resulted in a connection attempt to node2, as you will see below), and then explicitly tried to connect to node2:

$ erl -name node3 -setcookie x
(node3@my.computer)1> net_adm:ping('node1@my.computer').
pong
(node3@my.computer)2>
=WARNING REPORT==== 21-Nov-2018::15:09:07 ===
global: 'node3@my.computer' failed to connect to 'node2@my.computer'

=ERROR REPORT==== 21-Nov-2018::15:09:26 ===
** Connection attempt from disallowed node 'node2@my.computer' **
(node3@my.computer)2> net_adm:ping('node2@my.computer').
pang

What happened so far? Well, since node1's cookie was x and node3's cookie was x as well, they could connect. node2 was still connected to node1 but, since its cookie was now y, node3 could not connect to it.
By default, Erlang tries to maintain a fully connected mesh of nodes: when you connect to one of them, it automatically tries to connect you to all the others as well. That is what the global warning reports above are about.
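Because of this mesh behaviour, a node that can reach some members but not others shows up as a difference between node lists. The following minimal sketch (the module mesh_check and its functions are hypothetical) asks a reachable peer for its erlang:nodes/0 list and reports the nodes that the peer is connected to but the local node is not. Run on node3 against node1 in this reproduction, it should single out node2.

```erlang
%% Hypothetical helper (not part of OTP): find cluster members that a peer
%% is connected to but the local node is not.
-module(mesh_check).
-export([missing_links/1, missing_links/2]).

%% Pure helper: members of PeerView that do not appear in OurView,
%% returned sorted and without duplicates.
missing_links(PeerView, OurView) ->
    lists:usort(PeerView) -- OurView.

%% Ask Peer for its connected nodes and compare them with ours. We include
%% node() itself, because Peer's list contains this node too.
missing_links(Peer) ->
    case rpc:call(Peer, erlang, nodes, []) of
        {badrpc, Reason} -> {error, Reason};
        PeerView -> missing_links(PeerView, [node() | nodes()])
    end.
```

On the original cluster, running mesh_check:missing_links/1 on the newly joined node against any healthy member would be expected to list SickNode@X.X.X.X.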
But I wanted to be thorough, so I pinged node2 from node3 and, as expected, I got pang. Also, these messages popped up on node2:

(node2@my.computer)3>
=ERROR REPORT==== 21-Nov-2018::15:09:07 ===
** Connection attempt from disallowed node 'node3@my.computer' **

=WARNING REPORT==== 21-Nov-2018::15:09:07 ===
global: 'node2@my.computer' failed to connect to 'node3@my.computer'

And, of course, the same happened when I tried to ping node3 from node2:

(node2@my.computer)3> net_adm:ping('node3@my.computer').
pang

But if I try to ping node1:

(node2@my.computer)4> net_adm:ping('node1@my.computer').
pong

That's because they're already connected, and Erlang only checks that the cookies match during the initial handshake.
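A quick way to see that the cookie is only checked at handshake time is to drop the existing connection and ping again, which forces a new handshake. This is a minimal sketch with a hypothetical module name (handshake_check), built only on erlang:disconnect_node/1 and net_adm:ping/1. Be careful: running it on node2 against node1 would cut node2's last working link, so only try it in a throwaway setup like this one.

```erlang
%% Hypothetical helper (not part of OTP): force a fresh handshake with Node.
-module(handshake_check).
-export([recheck/1]).

%% Drop any existing connection to Node, then ping it again. The ping has to
%% set up a new connection, so the cookies are compared again during the
%% handshake: if Node is reachable, matching cookies give pong and
%% mismatching ones give pang.
recheck(Node) when is_atom(Node) ->
    %% disconnect_node/1 returns true, false or ignored (local node not
    %% alive); we only care that the old connection is gone.
    _ = erlang:disconnect_node(Node),
    net_adm:ping(Node).
```

In this reproduction, handshake_check:recheck('node1@my.computer') on node2 should then return pang, since node2 now presents cookie y while node1 expects x.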
Finally, if I ping the other nodes from node1, I get the expected results:

(node1@my.computer)1> net_adm:ping('node2@my.computer').
pong
(node1@my.computer)2> net_adm:ping('node3@my.computer').
pong
(node1@my.computer)3>
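To collect the same pong/pang picture from every node without typing each ping by hand, a tiny helper can be loaded and run on each node in turn. This is a sketch with a hypothetical module name (ping_report); it only wraps net_adm:ping/1.

```erlang
%% Hypothetical helper (not part of OTP): ping every node in Nodes and
%% return {Node, pong | pang} pairs, in the same order as Nodes.
-module(ping_report).
-export([run/1]).

run(Nodes) when is_list(Nodes) ->
    [{Node, net_adm:ping(Node)} || Node <- Nodes].
```

Based on the transcripts above, ping_report:run(['node2@my.computer', 'node3@my.computer']) on node1 would report pong for both, while running it on node3 would report pang for node2.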

Hope this helps.
