erlang 为什么我的主管在start_child上使用undef失败?

fv2wmkja  于 2022-12-08  发布在  Erlang
关注(0)|答案(2)|浏览(189)

I am trying to run a simple_one_for_one supervisor where the supervisor and worker are placed in separate modules and i keep getting the below error when using supervisor:start_child :

>A=sup:start_link().
>B=supervisor:start_child(A,[]).
{error,{'EXIT',{undef,[{worker,start_link,[],[]},
                       {supervisor,do_start_child_i,3,
                                   [{file,"supervisor.erl"},{line,379}]},
                       {supervisor,handle_call,3,
                                   [{file,"supervisor.erl"},{line,404}]},
                       {gen_server,try_handle_call,4,
                                   [{file,"gen_server.erl"},{line,661}]},
                       {gen_server,handle_msg,6,
                                   [{file,"gen_server.erl"},{line,690}]},
                       {proc_lib,init_p_do_apply,3,
                                 [{file,"proc_lib.erl"},{line,249}]}]}}}

Supervisor

-module(sup).
-behaviour(supervisor).
-compile([export_all]).

start_link()->
    {ok,Pid}=supervisor:start_link(?MODULE,[]),
    io:format("sugi pl"),
    Pid.


init(_Args) ->
     RestartStrategy = {simple_one_for_one, 10, 60},
     ChildSpec = {
                  worker, 
                  {worker, start_link, []},  //tried adding here a parameter in the A
                  permanent,
                  brutal_kill, 
                  worker,
                  [sup]
                },
    {ok, {RestartStrategy,[ChildSpec]}}.

Worker

-module(worker).
-compile(export_all).

start_link([Arg])->  //tried both [Arg] and Arg
    {ok,Pid}=spawn_link(?MODULE,init,[]),
    Pid.

init([Time])->
    receive->
        {From,Msg}->From !{Time,Msg},
                     init(Time)
    end.

Command

>c("[somepath]/sup.erl"),A=sup:start_link(),B=supervisor:start_child(A,[]).

I can see clearly that the problem is when trying to add a child.Somehow the init function is not called properly but i do not see why.(Badmatch)
I have tried adding a parameter in the A of the MFA of the ChildSpec to no avail

yh2wf1be

yh2wf1be1#

There are a number of problems with your code.

  • The return value of sup:start_link/0 is wrong; you're returning Pid instead of {ok, Pid} .
  • While it's not really incorrect, you're using supervisor:start_link/2 , which doesn't register the supervisor name. Having the name is handy, so it's better to use supervisor:start_link/3 :
supervisor:start_link({local, ?MODULE}, ?MODULE, []),

This associates the module name with its process id, allowing you to use the process name in your shell commands rather than using pid variables.

  • You have an io:format/2 call in sup:start_link/0 , presumably for debugging. A much better way of debugging is to call sys:trace(sup, true) from your shell once you've started the sup supervisor. You can turn it off from the shell as well, by specifying false instead of true as the second argument.

Fixing the above issues leaves use with the following definition of sup:start_link/0 :

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

Let's recompile, start the supervisor, then compile worker (after fixing its syntax errors), and then trace the supervisor as we try to start a child:

1> c(sup).
sup.erl:3: Warning: export_all flag enabled - all functions will be exported
{ok,sup}
2> sup:start_link().
{ok,<0.94.0>}
3> sys:trace(sup, true).
ok
4> c(worker).
worker.erl:2: Warning: export_all flag enabled - all functions will be exported
worker.erl:5: Warning: variable 'Arg' is unused
{ok,worker}
5> supervisor:start_child(sup, []).
*DBG* sup got call {start_child,[]} from <0.81.0>
*DBG* sup sent {error,
                   {'EXIT',
                       {undef,
                           [{worker,start_link,[],[]},
                            {supervisor,do_start_child_i,3,
                                [{file,"supervisor.erl"},{line,379}]},
...

This abbreviated trace output shows that sup died when it tried to start a worker by calling worker:start_link/0 (the [] indicates there are zero arguments). The child specification tells sup to start it this way, since it contains

{worker, start_link, []}

and we started the child via supervisor:start_child(sup, []) . For a simple_one_for_one child, the arguments sent to its start function are composed of the argument list from the child specification combined with the arguments specified in the call to supervisor:start_child/2 ; in this case, that's the equivalent of [] ++ [] which is the same as [] , indicating no arguments. Let's change the worker:start_link/1 function to worker:start_link/0 instead, recompile it, and try again:

6> c(worker).
worker.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,worker}
7> supervisor:start_child(sup, []).
*DBG* sup got call {start_child,[]} from <0.81.0>
*DBG* sup sent {error,
                   {'EXIT',
                       {{badmatch,<0.94.0>},
                        [{worker,start_link,0,[{file,"worker.erl"},{line,6}]},
...

This time, the abbreviated output shows a badmatch . This is because spawn_link/3 returns a pid, but worker:start_link/0 is expecting it to return {ok, Pid} . Let's fix that, and also fix the return value to be {ok, Pid} instead of just Pid :

start_link()->
    Pid = spawn_link(?MODULE,init,[]),
    {ok, Pid}.

Then let's recompile and try again:

8> c(worker).
worker.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,worker}
9> supervisor:start_child(sup, []).
*DBG* sup got call {start_child,[]} from <0.81.0>
*DBG* sup sent {ok,<0.106.0>} to <0.81.0>, new state {state,
                                                      {local,sup},
                                                      simple_one_for_one,
                                                      {[worker],
                                                       #{worker =>
                                                          {child,undefined,
                                                           worker,
                                                           {worker,
                                                            start_link,[]},
                                                           permanent,
                                                           brutal_kill,worker,
                                                           [sup]}}},
                                                      {maps,
                                                       #{<0.106.0> => []}},
                                                      10,60,[],0,sup,[]}
*DBG* sup got {'EXIT',<0.106.0>,{undef,[{worker,init,[],[]}]}}

OK, this time the supervisor actually started the child, but it died immediately because it's trying to call worker:init/0 but only worker:init/1 is defined. Because the child dies immediately, the supervisor tries repeatedly to start it, based on its restart strategy:

RestartStrategy = {simple_one_for_one, 10, 60},

Because this is a hard error, it fails instantly every time, and the supervisor dies after 10 failed restarts in 60 seconds or less, just like it's supposed to:

=SUPERVISOR REPORT==== 20-Apr-2020::10:43:43.557307 ===
    supervisor: {local,sup}
    errorContext: shutdown
    reason: reached_max_restart_intensity
    offender: [{pid,<0.117.0>},
               {id,worker},
               {mfargs,{worker,start_link,[]}},
               {restart_type,permanent},
               {shutdown,brutal_kill},
               {child_type,worker}]
** exception error: shutdown

It appears from your code that you're attempting to pass some sort of Time argument to worker:init/1 , so let's change start_link/0 to pass a timestamp:

start_link()->
    Pid = spawn_link(?MODULE,init,[os:timestamp()]),
    {ok, Pid}.

Let's also fix init/1 to take the argument directly, rather than in a list:

init(Time) ->
    receive
        {From,Msg} ->
            From ! {Time,Msg},
            init(Time)
end.

Let's restart the supervisor, recompile worker , and try again:

10> sup:start_link().
{ok,<0.119.0>}
11> sys:trace(sup, true).
ok
12> c(worker).
worker.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,worker}
13> {ok, Child} = supervisor:start_child(sup, []).
*DBG* sup got call {start_child,[]} from <0.118.0>
*DBG* sup sent {ok,<0.127.0>} to <0.118.0>, new state {state,
                                                       {local,sup},
                                                       simple_one_for_one,
                                                       {[worker],
                                                        #{worker =>
                                                           {child,undefined,
                                                            worker,
                                                            {worker,
                                                             start_link,[]},
                                                            permanent,
                                                            brutal_kill,
                                                            worker,
                                                            [sup]}}},
                                                       {maps,
                                                        #{<0.127.0> => []}},
                                                       10,60,[],0,sup,[]}
{ok,<0.127.0>}

It looks like it worked. Let's see if the supervisor agrees, by asking it how many children it has:

14> supervisor:count_children(sup).
...
[{specs,1},{active,1},{supervisors,0},{workers,1}]

It has one worker, just as we expected. Finally, let's send a message to the worker and see if it responds as expected:

15> Child ! {self(), "are you there?"}.
{<0.118.0>,"are you there?"}
16> flush().
Shell got {{1587,394860,258120},"are you there?"}
ok

Now it all seems to work.
One final fix is to change the modules in your child specification; rather than [sup] it should be the module itself, [worker] . With that change, your revised working modules are below. You might also want to reconsider whether you want to use permanent for the children, since this is a simple_one_for_one supervisor; transient is likely a better choice, but I've left it as originally written. Consider reviewing the Supervisor Behavior documentation for more information.

Supervisor

-module(sup).
-behaviour(supervisor).
-compile([export_all]).

start_link()->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init(_Args) ->
     RestartStrategy = {simple_one_for_one, 10, 60},
     ChildSpec = {
                  worker,
                  {worker, start_link, []},
                  permanent,
                  brutal_kill,
                  worker,
                  [worker]
                },
    {ok, {RestartStrategy,[ChildSpec]}}.

Worker

-module(worker).
-compile(export_all).

start_link()->
    Pid = spawn_link(?MODULE,init,[os:timestamp()]),
    {ok, Pid}.

init(Time) ->
    receive
        {From,Msg} ->
            From ! {Time,Msg},
            init(Time)
    end.
3gtaxfhh

3gtaxfhh2#

In addition to the excellent answer already posted, I would like add a few points to explain the behavior observed in the question.
The start value in childspec is a tuple {Mod, Fun, ArgsList} and child process can be spawned by calling supervisor:start_child(Supervisor, List) . The supervisor start's the child process by calling erlang:apply(Mod, Fun, List++ArgsList) .
In this case, the start value is {worker, start_link, []} and child was spawned by calling supervisor:start_child(A, []) . The supervisor attempted to call erlang:apply(worker, start_link, []) . This means the supervisor was expecting worker:start_link/0 to be defined in the worker module. But the worker module defined worker:start_link/1 . Hence the undef error.
Given the function definition

start_link([Arg]) ->
  %% do stuff

The best was to spawn the child process is

  • let start value in child spec be {worker, start_link, []}
  • call supervisor:start_child(A, [[Value]])

It maybe simpler to define the function as

start_link(Arg) ->
  %% so stuff

and call supervisor:start_child(A, [Value])

相关问题