Erlang: error in an rpc call to another node? [duplicate]

lx0bsm1f  posted on 2022-12-08  in  Erlang

This question already has an answer here:

a bug in erlang node after an error at another node (1 answer)
Closed 2 years ago.
I created two Erlang nodes on the same Windows machine, in two cmd windows: 'unclient@MYPC' and 'unserveur@MYPC'. The server code is very simple:

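-module(serveur).
-export([start/0, recever/0, inverse/1]).

%% Spawn the server process and register it as ownServer.
start() ->
    Pid = spawn(serveur, recever, []),
    register(ownServer, Pid).

%% Server: reply with the inverse of X.
recever() ->
    receive
        {From, X} -> From ! {ownServer, 1/X}
    end.

%% Client: send X to the server and wait for the reply.
inverse(X) ->
    ownServer ! {self(), X},
    receive
        {ownServer, Reply} -> Reply
    end.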

On the server node's shell I compile and start this module:

c(serveur). 
serveur:start().

On the client node I used rpc:call to test the connection, and everything works. For example, I tried:

rpc:call(unserveur@MYPC,serveur,inverse,[2]).

and received 0.5. Then I passed an atom to the server to cause an error:

rpc:call(unserveur@MYPC,serveur,inverse,[a]).

On the client node I waited for a response from the server, but nothing came back and the client prompt no longer appeared:

unclient@MYPC 1>

I can still type, but the shell no longer executes my instructions and no prompt is shown.
I searched and found that rpc:call makes the rex server on the destination node spawn and monitor a process that executes the (M,F,A). Is that true? If so, why do I get this bug on the client node?

ogsagwnx #1

Yes, I finally resolved this bug and, more importantly, I understood what happens. When I call rpc:call('unserveur@MYPC',serveur,inverse,[a]), the client process (the main shell process on the client node) sends the request to the rex server on the serveur node. The rex server spawns and monitors a new process that runs apply(serveur, inverse, [a]). That process calls inverse, the server process running recever() crashes, and no reply is ever sent to the new process, which waits forever. Every process waiting on it also waits forever, including the client node's main shell process, which explains why the prompt disappears while typing still works. This is exactly what Pascal said, so you have answered my question. I resolved the problem by adding

process_flag(trap_exit, true),
link(whereis(ownServer)),

at the head of the inverse function, and adding

{'EXIT', _, _} -> start(),
                         sorry;

at the head of the receive expression of the inverse function. Now when I make the rpc call with an atom, I see sorry in the client node shell and the server automatically goes back to work, so a second rpc call to inverse with an integer gives the right answer. That was a lot of code for this one call, so maybe rpc:call is not a good choice and spawning processes manually would be better. What do you think?
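For comparison, here is a minimal sketch of a different way to notice that the server died while a request was pending: a monitor instead of link plus trap_exit. This is not the code from the post. The module name serveur_mon, the looping recever/0, the 1000 ms timeout and the {ok, _} / {error, _} return shapes are all assumptions made for illustration.

```erlang
-module(serveur_mon).
-export([start/0, recever/0, inverse/1]).

%% Spawn the server and register it under the same name as in the post.
start() ->
    Pid = spawn(?MODULE, recever, []),
    register(ownServer, Pid).

%% Looping server (assumption): crashes with badarith if X is not a
%% non-zero number.
recever() ->
    receive
        {From, X} ->
            From ! {ownServer, 1 / X},
            recever()
    end.

%% Client: monitor the server for the duration of one request, so a crash
%% arrives as a 'DOWN' message instead of leaving the caller blocked.
inverse(X) ->
    case whereis(ownServer) of
        undefined ->
            {error, noproc};
        Pid ->
            Ref = erlang:monitor(process, Pid),
            Pid ! {self(), X},
            receive
                {ownServer, Reply} ->
                    erlang:demonitor(Ref, [flush]),
                    {ok, Reply};
                {'DOWN', Ref, process, Pid, Reason} ->
                    {error, Reason}
            after 1000 ->
                erlang:demonitor(Ref, [flush]),
                {error, timeout}
            end
    end.
```

With this variant the caller does not need to trap exits, and restarting the server is left to whoever called start/0 rather than being done inside inverse/1.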

vfhzx4xs #2

On the unclient side, rpc:call(Node,serveur,inverse,[a]) builds a message for the rpc server on Node and waits for a response.
On the unserveur side, the rpc server receives the message and starts a process to call serveur:inverse(a).
The inverse function sends a message to serveur:recever(), which evaluates 1/a and crashes.
Therefore the reply message is never sent back to inverse. The inverse function waits for the answer forever, and so does rpc:call on the unclient node, since you did not define any timeout.
You could define a timeout in the inverse function:

inverse(X) ->
    ownServer ! {self(), X},
    receive
        {ownServer, Reply} -> {ok, Reply}
    after 100 -> % define a timeout of 100 ms
        {error, timeout}
    end.

In addition, it is a good idea to use a timeout in the remote procedure call itself, with rpc:call(Node, Module, Function, Args, Timeout).
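As a small illustration of the five-argument form, the sketch below asks a node to sleep for two seconds but gives up after 100 ms. The module name rpc_timeout_demo and both durations are made up for the example; only rpc:call/5 and timer:sleep/1 are standard library calls.

```erlang
-module(rpc_timeout_demo).
-export([slow_call/1]).

%% Run timer:sleep(2000) on Node through rpc, but wait at most 100 ms
%% for the result. When the timeout expires, rpc:call/5 returns
%% {badrpc, timeout} instead of blocking the caller.
slow_call(Node) ->
    rpc:call(Node, timer, sleep, [2000], 100).
```

Calling rpc_timeout_demo:slow_call(unserveur@MYPC) from the client shell should hand the prompt back after roughly 100 ms, whatever the remote function is doing.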
In a previous post you were trying to get a response using the trap_exit flag. There were several mistakes there. First, as explained by @legoscia, when an error occurs the exit message is sent to every linked process. Second, you were expecting your process to continue executing its code. On error, the process stops immediately and the system issues the exit signal, which either kills each linked process or is received by it as a message, depending on the value of its trap_exit flag.
I wrote a version that works as you expected:

-module(serveur).
-export([start/0,recever/0,inverse/1]). 
-export([do_op/3]).

%%%%
start() -> 
    Pid=spawn(serveur,recever,[]), 
    register(ownServer, Pid). 

%%%%
recever() -> 
    process_flag(trap_exit,true), 
    receive
        stop -> stopped;
        {'EXIT',_,_} -> recever(); % needed to discard the {'EXIT',_,normal} messages
        {From, Op, X} -> 
            spawn_link(serveur, do_op, [self(),Op,X]),
            receive 
                Reply -> From ! {ownServer, Reply}  
            end,
            recever()
    end. 

do_op(From, inverse, X) ->
    From ! {result,1/X}.

%%%%
inverse(X) -> 
    ownServer!{self(), inverse, X}, 
    receive 
        {ownServer, Reply} ->Reply 
    end.

In fact this code works more or less like a catch expression, which is exactly what you wanted to do, and catch is what you should use here. In Erlang it is a very good idea to let processes crash when something unexpected happens, especially with the Erlang/OTP mechanisms, but when the error is likely (a user interface, for example) I think it is more appropriate to use catch or try/catch at the right level.
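To make the try/catch option concrete, here is a minimal sketch of the division guarded at the point where the error can occur. The module name safe_inverse and the {ok, _} / {error, badarith} return shapes are assumptions for illustration, not part of the original module.

```erlang
-module(safe_inverse).
-export([inverse/1]).

%% Compute 1/X and turn an arithmetic error (non-numeric X, or X equal
%% to zero) into a tagged error tuple instead of crashing the caller.
inverse(X) ->
    try 1 / X of
        Result -> {ok, Result}
    catch
        error:badarith -> {error, badarith}
    end.
```

Only badarith is caught here, so any other unexpected error still crashes the process, in line with the let-it-crash advice above.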

[edit]

It is great that you want to fully understand the behavior of the system. To answer your question: I am sorry, but I never use rpc, and I don't know in which cases it is well suited.
For this case I use the global library, which allows communication between the nodes of a cluster (see the Erlang distribution chapter of learnyousomeerlang, a very good site to learn and understand Erlang).
As you say, the way you solved the issue uses a lot of code (and I am not sure that it still works locally). In my opinion, this is because the trap_exit flag is not meant for this usage but for OTP supervision trees and the OTP behaviours (see What is OTP on the same site). In your case, you should use a catch expression and add timeouts to handle the possible errors. Here is code that handles bad arguments and an overloaded server. I have added a few interfaces to simulate the different use cases.

-module(serveur).

-export([start/0,recever/0,inverse/1,lock/0,unlock/0,stop/0,wait10s/0]). 

%%%%
start() ->
    Pid=spawn(serveur,recever,[]), 
    register(ownServer, Pid). 

%%%%
recever() -> 
    receive 
        {From, X} ->
            From ! {ownServer,(catch 1/X)},
            recever();
        waitForUnlock ->
            ok = wait_for_unlock(),
            recever();
        stop -> server_stopped
    end.

%%%%
inverse(X) -> 
    ownServer ! {self(),X},
    receive
        {ownServer, {'EXIT',{Reply,_}}} ->
            {error,Reply};      
        {ownServer, Reply} ->
            {ok,Reply}
    after 100 ->
        {error,timeout}
    end.

%%%% use this interface to simulate an overloaded server 
lock() ->
    ownServer ! waitForUnlock.

%%%% use this interface to unlock the server
unlock() ->
    ownServer ! unlock.

%%%% use this interface to simulate a very long answer from server
wait10s() ->
    timer:sleep(10000),
    iAmAwake.

%%%% use this interface to stop the server
stop() ->
    ownServer ! stop.

%%%% private function used to hang the server
wait_for_unlock() ->
    receive
        unlock -> ok
    end.

The test on the local node:

(unserveur@MyPc)1> c(serveur).
{ok,serveur}
(unserveur@MyPc)2> serveur:start().
true
(unserveur@MyPc)3> serveur:inverse(2).
{ok,0.5}
(unserveur@MyPc)4> serveur:inverse(a).
{error,badarith}
(unserveur@MyPc)5> serveur:lock().   
waitForUnlock
(unserveur@MyPc)6> serveur:inverse(2).
{error,timeout}
(unserveur@MyPc)7> serveur:inverse(a).
{error,timeout}
(unserveur@MyPc)8> serveur:unlock().  
unlock
(unserveur@MyPc)9> serveur:inverse(2).
{ok,0.5}
(unserveur@MyPc)10> serveur:wait10s(). 
iAmAwake
(unserveur@MyPc)11> serveur:stop().   
stop
(unserveur@MyPc)12> serveur:inverse(2).
** exception error: bad argument
     in function  serveur:inverse/1 (serveur.erl, line 60)
(unserveur@MyPc)13>

and (almost) the same test from the client node:

(unclient@MyPc)1> net_adm:ping(unserveur@MyPc).
pong
(unclient@MyPc)2> rpc:call(unserveur@MyPc,serveur,start,[]).
true
(unclient@MyPc)3> rpc:call(unserveur@MyPc,serveur,inverse,[2]).
{ok,0.5}
(unclient@MyPc)4> rpc:call(unserveur@MyPc,serveur,inverse,[a]).
{error,badarith}
(unclient@MyPc)5> rpc:call(unserveur@MyPc,serveur,lock,[]).    
waitForUnlock
(unclient@MyPc)6> rpc:call(unserveur@MyPc,serveur,inverse,[2]).
{error,timeout}
(unclient@MyPc)7> rpc:call(unserveur@MyPc,serveur,inverse,[a]).
{error,timeout}
(unclient@MyPc)8> rpc:call(unserveur@MyPc,serveur,unlock,[]).  
unlock
(unclient@MyPc)9> rpc:call(unserveur@MyPc,serveur,inverse,[2]).
{ok,0.5}
(unclient@MyPc)10> rpc:call(unserveur@MyPc,serveur,wait10s,[]). 
iAmAwake
(unclient@MyPc)11> rpc:call(unserveur@MyPc,serveur,wait10s,[],1000).
{badrpc,timeout}
(unclient@MyPc)12> rpc:call(unserveur@MyPc,serveur,stop,[]).        
stop
(unclient@MyPc)13> rpc:call(unserveur@MyPc,serveur,inverse,[2]).    
{badrpc,{'EXIT',{badarg,[{serveur,inverse,1,
                                  [{file,"serveur.erl"},{line,60}]},
                         {rpc,'-handle_call_call/6-fun-0-',5,
                              [{file,"rpc.erl"},{line,197}]}]}}}
(unclient@MyPc)14>
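The last two kinds of result in this transcript, {badrpc,timeout} and {badrpc,{'EXIT',...}}, can be told apart from normal replies on the client side. Below is a minimal sketch of such a wrapper. The module name rpc_client, the function safe_call/5 and the rpc_failed tag are hypothetical names introduced for the example.

```erlang
-module(rpc_client).
-export([safe_call/5]).

%% Wrap rpc:call/5 so that every {badrpc, Reason} outcome (timeout,
%% node down, remote crash) becomes a tagged error and every other
%% return value becomes {ok, Value}.
%% Caveat: a remote function that itself returns {badrpc, _} cannot be
%% distinguished from a failed call.
safe_call(Node, Module, Function, Args, Timeout) ->
    case rpc:call(Node, Module, Function, Args, Timeout) of
        {badrpc, Reason} -> {error, {rpc_failed, Reason}};
        Value -> {ok, Value}
    end.
```

For instance, rpc_client:safe_call(unserveur@MyPc, serveur, inverse, [2], 1000) would wrap whatever serveur:inverse/1 returns in {ok, ...} and report a stopped or unreachable server as {error, {rpc_failed, ...}}.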
