This question already has an answer here:
a bug in erlang node after an error at another node (1 answer)
Closed 2 years ago.
I created two Erlang nodes on the same Windows machine, in two cmd windows: 'unclient@MYPC' and 'unserveur@MYPC'. The server code is very simple:
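-module(serveur).
-export([start/0, recever/0, inverse/1]).

%% Spawn the server process and register it under the name ownServer.
start() ->
    Pid = spawn(serveur, recever, []),
    register(ownServer, Pid).

%% Server side: wait for one request and reply with the inverse of X.
recever() ->
    receive
        {From, X} -> From ! {ownServer, 1/X}
    end.

%% Client side: send X to the server and wait for its reply;
%% the division happens in recever/0.
inverse(X) ->
    ownServer ! {self(), X},
    receive
        {ownServer, Reply} -> Reply
    end.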
On the server node I compile and start this module:
c(serveur).
serveur:start().
On the client node I used rpc:call to test the connection, and everything works. For example:
rpc:call(unserveur@MYPC,serveur,inverse,[2]).
and I received 0.5. Now I send an atom to the server to cause an error:
rpc:call(unserveur@MYPC,serveur,inverse,[a]).
On the client node I waited for the response from the server, but I received nothing, and the client prompt no longer appears:
unclient@MYPC 1>
I can type, but the shell no longer executes my instructions and no prompt appears.
I searched and found that rpc:call triggers the rex server on the destination node, which spawns and monitors a process that executes the (M,F,A). Is that true? If so, why did I get this bug on the client node?
2 answers
ogsagwnx #1
Yes, I finally resolved this bug and, more importantly, I understood what happens. When I called
rpc:call('unserveur@MYPC',serveur,inverse,[a])
the client shell process sent the request to the rex server of the serveur node. The rex server spawns and monitors a new process that runs apply(serveur, inverse, [a]). That new process runs the function, the serveur process running recever() crashes, and no reply ever reaches the new process, which waits forever. Every process waiting behind it waits forever too, including the main shell process of the client node. That explains why the prompt disappeared while I could still type. This is exactly what Pascal said, so you have answered my question.
I resolved the problem by adding code at the head of the inverse function and at the head of the receive section of the inverse function. Now, when I make the rpc call with an atom, I see sorry in the client node shell and the server automatically goes back to work, so a second rpc call to inverse with an integer gives the right answer. I had to write a lot of code for this call, so maybe rpc:call is not a good choice and spawning processes manually would be better. What do you think?
vfhzx4xs #2
On the unclient side, rpc:call(Node,serveur,inverse,[a]) builds a message for the rpc server of Node and waits for a response.
On the unserveur side, the rpc server receives the message and starts a process to call the function serveur:inverse(a). The inverse function sends a message to serveur:recever(), which executes the instruction 1/a and crashes. Therefore the reply message cannot be sent back to inverse. The inverse function will wait for the answer forever, and so will the rpc:call on the unclient node, since you did not define any timeout.
You could define a timeout in the inverse function:
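As a minimal sketch of what such a timeout might look like (the module name timeout_demo, the 1000 ms limit and the {error, timeout} return value are assumptions for illustration, not taken from the question):

```erlang
-module(timeout_demo).
-export([start/0, recever/0, inverse/1]).

%% Hypothetical limit in milliseconds; pick a value that suits your server.
-define(REPLY_TIMEOUT, 1000).

start() ->
    Pid = spawn(?MODULE, recever, []),
    register(ownServer, Pid).

%% Same unprotected server as in the question, but looping so it can
%% answer more than one request. It still crashes on a bad argument.
recever() ->
    receive
        {From, X} ->
            From ! {ownServer, 1 / X},
            recever()
    end.

inverse(X) ->
    ownServer ! {self(), X},
    receive
        {ownServer, Reply} -> Reply
    after ?REPLY_TIMEOUT ->
        %% No reply arrived in time (for example because the server
        %% crashed): give up instead of blocking forever.
        {error, timeout}
    end.
```

With this version, timeout_demo:inverse(a) still kills the server process, but the caller gets {error, timeout} back after one second instead of hanging.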
In addition, it is a good idea to use a timeout in the remote procedure call using
rpc:call(Node, Module, Function, Args, Timeout)
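A small wrapper can make the timeout case explicit on the caller side. This is only a sketch: the module rpc_timeout_demo and the function safe_call/5 are hypothetical names, while rpc:call/5 and its {badrpc, Reason} return value come from the standard rpc module.

```erlang
-module(rpc_timeout_demo).
-export([safe_call/5]).

%% Call Module:Function(Args) on Node, waiting at most TimeoutMs
%% milliseconds, and turn every rpc failure into a tagged tuple.
safe_call(Node, Module, Function, Args, TimeoutMs) ->
    case rpc:call(Node, Module, Function, Args, TimeoutMs) of
        {badrpc, timeout} -> {error, timeout};
        {badrpc, Reason}  -> {error, Reason};
        Result            -> {ok, Result}
    end.
```

From the client node, something like rpc_timeout_demo:safe_call('unserveur@MYPC', serveur, inverse, [a], 2000) would then return after about two seconds rather than blocking the shell.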
In a previous post you were trying to get a response using the trap_exit flag. There were several mistakes there. First, as explained by @legoscia, in case of an error the exit message is sent to the linked processes. Second, you were expecting your process to continue executing its code. On an error, the process stops immediately and the system issues an exit signal, which either kills each linked process or is delivered to it as a message, depending on the value of its trap_exit flag.
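To illustrate the behavior just described, here is a small sketch; the module trap_exit_demo and the helper divide/2 are hypothetical names. A linked worker crashes, and because the caller traps exits it receives an {'EXIT', Pid, Reason} message instead of being killed:

```erlang
-module(trap_exit_demo).
-export([run/0]).

run() ->
    %% Trap exits so that a crash in a linked process arrives as a message.
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> divide(1, a) end),
    Result =
        receive
            {'EXIT', Pid, Reason} -> {worker_died, Reason}
        after 1000 ->
            no_exit_message
        end,
    %% Restore the caller's previous flag value.
    process_flag(trap_exit, Old),
    Result.

%% Fails with badarith when B is not a number; the worker stops right here.
divide(A, B) ->
    A / B.
```

Note that the worker never runs any code after the failing division; only the linked process that traps exits gets a chance to react.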
I wrote a version that works as you expected:
In fact, this code works more or less like a catch statement, which is exactly what you wanted to do, and it is what you should use there. In Erlang it is a very good idea to let processes crash when something unexpected happens, especially when using the Erlang/OTP mechanisms, but when the error is probable (a user interface, for example) I think it is more appropriate to use catch or try/catch at the right level.
[edit]
It is good that you want to fully understand the behavior of the system. To answer your question: I am sorry, but I never use rpc, and I don't know in which cases it is well suited.
For this case I use the global library, which allows communication between the nodes of a cluster (see the Erlang distribution chapter of learnyousomeerlang, a very good site for learning and understanding Erlang).
As you say, the way you solved the issue uses a lot of code (and I am not sure that it now works locally). In my opinion, this is because the trap_exit flag is not meant for this usage but for OTP supervision trees and all the OTP behaviours (see What is OTP on the same site). In your case, you should use a catch statement and add timeouts to handle the possible errors. Here is code that handles bad arguments and an overloaded server. I have added a few interfaces to simulate the different use cases.
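A minimal sketch along these lines, combining try/catch on the server with a timeout on the caller, could look as follows. It is an illustration under assumptions, not the code originally posted: the module serveur_catch, the registered name catchServer, the 1000 ms limit and the {ok, _} / {error, _} replies are all hypothetical.

```erlang
-module(serveur_catch).
-export([start/0, loop/0, inverse/1]).

start() ->
    register(catchServer, spawn(?MODULE, loop, [])).

%% Server loop: a bad argument yields an error reply instead of a crash,
%% so the server stays alive for the next request.
loop() ->
    receive
        {From, X} ->
            Reply =
                try 1 / X of
                    Value -> {ok, Value}
                catch
                    error:badarith -> {error, badarg}
                end,
            From ! {catchServer, Reply},
            loop()
    end.

%% Client side: never wait longer than one second for the reply.
inverse(X) ->
    catchServer ! {self(), X},
    receive
        {catchServer, Reply} -> Reply
    after 1000 ->
        {error, timeout}
    end.
```

Here serveur_catch:inverse(a) returns {error, badarg} immediately and the server keeps running, while the after clause covers the case where the server is too busy or has gone away.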
The test on local node
and (almost) the same test from the client node