It is recommended to read this section alongside supervisor in STDLIB.
A supervisor is responsible for starting, stopping, and monitoring its child processes. The basic idea of a supervisor is that it is to keep its child processes alive by restarting them when necessary.
Which child processes to start and monitor is specified by a list of child specifications. The child processes are started in the order specified by this list, and are terminated in the reverse order.
Example
The callback module for a supervisor starting the server from gen_server Behaviour can look as follows:
-module(ch_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(ch_sup, []).

init(_Args) ->
    SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
    ChildSpecs = [#{id => ch3,
                    start => {ch3, start_link, []},
                    restart => permanent,
                    shutdown => brutal_kill,
                    type => worker,
                    modules => [ch3]}],
    {ok, {SupFlags, ChildSpecs}}.
The SupFlags variable in the return value from init/1 represents the supervisor flags. The ChildSpecs variable in the return value from init/1 is a list of child specifications.
This is the type definition for the supervisor flags:
sup_flags() = #{strategy => strategy(),           % optional
                intensity => non_neg_integer(),   % optional
                period => pos_integer(),          % optional
                auto_shutdown => auto_shutdown()} % optional
strategy() = one_for_all
           | one_for_one
           | rest_for_one
           | simple_one_for_one
auto_shutdown() = never
                | any_significant
                | all_significant
strategy specifies the restart strategy.
intensity and period specify the maximum restart intensity.
auto_shutdown specifies whether and when a supervisor should automatically shut itself down.

Restart Strategy
The restart strategy is specified by the strategy key in the supervisor flags map returned by the callback function init:

SupFlags = #{strategy => Strategy, ...}

The strategy key is optional in this map. If it is not given, it defaults to one_for_one.
For simplicity, the diagrams shown in this section display a setup where all the depicted children are assumed to have a restart type of permanent.
one_for_one
If a child process terminates, only that process is restarted.
---
title: One For One Supervision
---
flowchart TD
subgraph Legend
direction LR
t(( )) ~~~ l1[Terminated Process]
p(( )) ~~~ l2[Process Restarted by the Supervisor]
end
subgraph graph[" "]
s[Supervisor]
s --- p1((P1))
s --- p2((P2))
s --- p3((P3))
s --- pn((Pn))
end
classDef term fill:#ff8888,color:black;
classDef restarted stroke:#00aa00,stroke-width:3px;
classDef legend fill-opacity:0,stroke-width:0px;
class p2,t term;
class p2,p restarted;
class l1,l2 legend;
one_for_all
If a child process terminates, all remaining child processes are terminated. Subsequently, all child processes, including the terminated one, are restarted.
---
title: One For All Supervision
---
flowchart TD
subgraph Legend
direction LR
t(( )) ~~~ l1[Terminated Process]
st(( )) ~~~ l2[Process Terminated by the Supervisor]
p(( )) ~~~ l3[Process Restarted by the Supervisor]
l4["Note:
Processes are terminated right to left
Processes are restarted left to right"]
end
subgraph graph[" "]
s[Supervisor]
s --- p1((P1))
s --- p2((P2))
s --- p3((P3))
s --- pn((Pn))
end
classDef term fill:#ff8888,color:black;
classDef sterm fill:#ffaa00,color:black;
classDef restarted stroke:#00aa00,stroke-width:3px;
classDef legend fill-opacity:0,stroke-width:0px;
class p2,t term;
class p1,p3,pn,st sterm;
class p1,p2,p3,pn,p restarted;
class l1,l2,l3,l4 legend;
rest_for_one
If a child process terminates, the child processes after the terminated process in start order are terminated. Subsequently, the terminated child process and the remaining child processes are restarted.
---
title: Rest For One Supervision
---
flowchart TD
subgraph Legend
direction LR
t(( )) ~~~ l1[Terminated Process]
st(( )) ~~~ l2[Process Terminated by the Supervisor]
p(( )) ~~~ l3[Process Restarted by the Supervisor]
l4["Note:
Processes are terminated right to left
Processes are restarted left to right"]
end
subgraph graph[" "]
s[Supervisor]
s --- p1((P1))
s --- p2((P2))
s --- p3((P3))
s --- pn((Pn))
end
classDef term fill:#ff8888,color:black;
classDef sterm fill:#ffaa00,color:black;
classDef restarted stroke:#00aa00,stroke-width:3px;
classDef legend fill-opacity:0,stroke-width:0px;
class p2,t term;
class p3,pn,st sterm;
class p2,p3,pn,p restarted;
class l1,l2,l3,l4 legend;
simple_one_for_one
See Simplified one_for_one Supervisors below.
Maximum Restart Intensity
The supervisors have a built-in mechanism to limit the number of restarts which can occur in a given time interval. This is specified by the two keys intensity and period in the supervisor flags map returned by the callback function init:

SupFlags = #{intensity => MaxR, period => MaxT, ...}
If more than MaxR restarts occur within the last MaxT seconds, the supervisor terminates all the child processes and then itself. The termination reason for the supervisor itself in that case will be shutdown.
When the supervisor terminates, the next higher-level supervisor takes some action: it either restarts the terminated supervisor or terminates itself.
The intention of the restart mechanism is to prevent a situation where a process repeatedly dies for the same reason, only to be restarted again.
The keys intensity and period are optional in the supervisor flags map. If they are not given, they default to 1 and 5, respectively.
The default values were chosen to be safe for most systems, even with deep supervision hierarchies, but you will probably want to tune the settings for your particular use case.
First, the intensity decides how big bursts of restarts you want to tolerate. For example, you might want to accept a burst of at most 5 or 10 attempts, even within the same second, if it results in a successful restart.
Second, you need to consider the sustained failure rate, if crashes keep happening but not often enough to make the supervisor give up. If you set intensity to 10 and set the period as low as 1, the supervisor will allow child processes to keep restarting up to 10 times per second, forever, filling your logs with crash reports until someone intervenes manually.
You should therefore set the period to be long enough that you can accept that the supervisor keeps going at that rate. For example, if an intensity value of 5 is chosen, setting the period to 30 seconds will give you at most one restart per 6 seconds for any longer period of time, which means that your logs will not fill up too quickly, and you will have a chance to observe the failures and apply a fix.
These choices depend a lot on your problem domain. If you do not have real-time monitoring and the ability to fix problems quickly, for example in an embedded system, you might want to accept at most one restart per minute before the supervisor gives up and escalates to the next level to try to clear the error automatically. On the other hand, if it is more important that you keep trying even at a high failure rate, you might want a sustained rate of as much as 1-2 restarts per second.
Avoiding common mistakes:
Do not forget to consider the burst rate. If you set intensity to 1 and period to 6, it gives the same sustained error rate as 5/30 or 10/60, but will not allow even 2 restart attempts in quick succession. This is probably not what you wanted.
Do not set the period to a very high value if you want to tolerate bursts. If you set intensity to 5 and period to 3600 (one hour), the supervisor will allow a short burst of 5 restarts, but then gives up if it sees another single restart almost an hour later. You probably want to regard those crashes as separate incidents, so setting the period to 5 or 10 minutes will be more reasonable.
If your application has multiple levels of supervision, do not set the restart intensities to the same values on all levels. Keep in mind that the total number of restarts (before the top level supervisor gives up and terminates the application) will be the product of the intensity values of all the supervisors above the failing child process.
For example, if the top level allows 10 restarts, and the next level also allows 10, a crashing child below that level will be restarted 100 times, which is probably excessive. Allowing at most 3 restarts for the top level supervisor might be a better choice in this case.
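To make these trade-offs concrete, here is a small sketch (the values are illustrative choices, not recommendations): supervisor flags tolerating a burst of up to 5 restarts within any 30-second window, together with the worst-case restart product for a two-level hierarchy.

```erlang
%% Illustrative supervisor flags: allow a burst of up to 5 restarts,
%% but give up if more than 5 occur within any 30-second window.
SupFlags = #{strategy => one_for_one,
             intensity => 5,
             period => 30},

%% With nested supervisors, the worst-case number of restarts of a
%% failing child is the product of the intensities above it.
TopIntensity = 3,
MidIntensity = 10,
WorstCase = TopIntensity * MidIntensity.  % 30 restarts in the worst case
```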
Automatic Shutdown
A supervisor can be configured to automatically shut itself down when significant children terminate.
This is useful when a supervisor represents a work unit of cooperating children, as opposed to independent workers. When the work unit has finished its work, that is, when any or all significant child processes have terminated, the supervisor should then shut down by terminating all remaining child processes in reverse start order according to the respective shutdown specifications, and then itself.
Automatic shutdown is specified by the auto_shutdown key in the supervisor flags map returned by the callback function init:

SupFlags = #{auto_shutdown => AutoShutdown, ...}

The auto_shutdown key is optional in this map. If it is not given, it defaults to never.
The automatic shutdown facility only applies when significant children terminate by themselves, not when their termination was caused by the supervisor. Specifically, neither the termination of a child as a consequence of a sibling's termination in the one_for_all or rest_for_one strategies, nor the manual termination of a child by supervisor:terminate_child/2, will trigger an automatic shutdown.
never
Automatic shutdown is disabled. In this mode, specifying significant children is not accepted. If the child specs returned from init contain significant children, the supervisor will refuse to start. Attempts to start significant children dynamically will be rejected. This is the default setting.
any_significant
The supervisor will automatically shut itself down when any significant child terminates, that is, when a transient significant child terminates normally or when a temporary significant child terminates normally or abnormally.

all_significant
The supervisor will automatically shut itself down when all significant children have terminated, that is, when the last active significant child terminates. The same rules as for any_significant apply.
The automatic shutdown feature was introduced in OTP 24.0, but applications using this feature will also compile and run with older OTP versions.
However, such applications, when compiled with an OTP version that predates the appearance of the automatic shutdown feature, will leak processes because the automatic shutdowns they rely on will not happen.
It is up to implementors to take proper precautions if they expect that their applications may be compiled with older OTP versions.
Warning
Top supervisors of Applications should not be configured for automatic shutdown, because when the top supervisor exits, the application terminates. If the application is permanent, all other applications and the runtime system are terminated as well.
Supervisors configured for automatic shutdown should not be made permanent children of their respective parent supervisors, as they would be restarted immediately after having automatically shut down, only to shut down automatically again after a while, and may thus exhaust the Maximum Restart Intensity of the parent supervisor.
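As a sketch of the work-unit pattern described above, the following hypothetical callback module (the names work_unit_sup, main_task, and helper are made up for illustration) configures automatic shutdown when its significant child terminates:

```erlang
-module(work_unit_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1]).

start_link() ->
    %% Intended to run as a transient (not permanent) child of its
    %% parent, so it is not restarted after an automatic shutdown.
    supervisor:start_link(work_unit_sup, []).

init(_Args) ->
    %% Shut the whole work unit down when any significant child
    %% terminates by itself.
    SupFlags = #{strategy => one_for_one,
                 auto_shutdown => any_significant},
    ChildSpecs = [#{id => main_task,
                    start => {main_task, start_link, []},
                    restart => transient,   % significant children must
                    significant => true},   % not be permanent
                  #{id => helper,
                    start => {helper, start_link, []},
                    restart => transient}],
    {ok, {SupFlags, ChildSpecs}}.
```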
Child Specification
The type definition for a child specification is as follows:
child_spec() = #{id => child_id(),             % mandatory
                 start => mfargs(),            % mandatory
                 restart => restart(),         % optional
                 significant => significant(), % optional
                 shutdown => shutdown(),       % optional
                 type => worker(),             % optional
                 modules => modules()}         % optional
child_id() = term()
mfargs() = {M :: module(), F :: atom(), A :: [term()]}
modules() = [module()] | dynamic
restart() = permanent | transient | temporary
significant() = boolean()
shutdown() = brutal_kill | timeout()
worker() = worker | supervisor
id is used to identify the child specification internally by the supervisor. The id key is mandatory.
Note that this identifier occasionally has been called "name". As far as possible, the terms "identifier" or "id" are now used but in order to keep backwards compatibility, some occurrences of "name" can still be found, for example in error messages.
start defines the function call used to start the child process. It is a module-function-arguments tuple used as apply(M, F, A).
It is to be (or result in) a call to any of the following:
supervisor:start_link/2,3
gen_server:start_link/3,4
gen_statem:start_link/3,4
gen_event:start_link/0,1,2
A function compliant with these functions; for details, see supervisor.
The start key is mandatory.
restart defines when a terminated child process is to be restarted.
permanent
A permanent child process is always restarted.
temporary
A temporary child process is never restarted (not even when the supervisor restart strategy is rest_for_one or one_for_all and a sibling death causes the temporary process to be terminated).
transient
A transient child process is restarted only if it terminates abnormally, that is, with an exit reason other than normal, shutdown, or {shutdown,Term}.
The restart key is optional. If it is not given, the default value permanent will be used.
significant defines whether a child is considered significant for automatic self-shutdown of the supervisor. It is invalid to set this option to true for a child with restart type permanent, or in a supervisor with auto_shutdown set to never.
The significant key is optional. If it is not given, it defaults to false.
shutdown defines how a child process is to be terminated.
brutal_kill means that the child process is unconditionally terminated using exit(Child, kill).
An integer time-out value means that the supervisor tells the child process to terminate by calling exit(Child, shutdown) and then waits for an exit signal back. If no exit signal is received within the specified time, the child process is unconditionally terminated using exit(Child, kill).
If the child process is another supervisor, the shutdown time is to be set to infinity to give the subtree enough time to shut down. It is also allowed to set it to infinity if the child process is a worker.
Setting the shutdown time to anything other than infinity for a child of type supervisor can cause a race condition where the child in question unlinks its own children, but fails to terminate them before it is killed.
Be careful when setting the shutdown time to infinity when the child process is a worker. In this situation, the termination of the supervision tree depends on the child process; it must be implemented in a safe way and its cleanup procedure must always return.
The shutdown key is optional. If it is not given, and the child is of type worker, the default value 5000 will be used; if the child is of type supervisor, the default value infinity will be used.
type specifies whether the child process is a supervisor or a worker. The type key is optional. If it is not given, the default value worker will be used.
modules has to be a list consisting of a single element. The value of that element depends on the behaviour of the process:
If the process is a gen_event, the element has to be the atom dynamic.
Otherwise, the element has to be Module, where Module is the name of the callback module.
This information is used by the release handler during upgrades and downgrades; see Release Handling.
The modules key is optional. If it is not given, it defaults to [M], where M comes from the child's start {M,F,A}.
Example: The child specification to start the server ch3 in the previous example looks as follows:
#{id => ch3,
  start => {ch3, start_link, []},
  restart => permanent,
  shutdown => brutal_kill,
  type => worker,
  modules => [ch3]}
or simplified, relying on the default values:
#{id => ch3,
  start => {ch3, start_link, []},
  shutdown => brutal_kill}
Example: A child specification to start the event manager from the chapter about gen_event:
#{id => error_man,
  start => {gen_event, start_link, [{local, error_man}]},
  modules => dynamic}
Both the server and the event manager are registered processes which can be expected to be always accessible. Thus they are specified to be permanent.
ch3 does not need to do any cleaning up before termination. Thus, no shutdown time is needed, but brutal_kill is sufficient. error_man can need some time for the event handlers to clean up, thus the shutdown time is set to 5000 ms (which is the default value).
Example: A child specification to start another supervisor:
#{id => sup,
  start => {sup, start_link, []},
  restart => transient,
  type => supervisor} % will cause default shutdown => infinity
Starting a Supervisor
In the previous example, the supervisor is started by calling ch_sup:start_link():
start_link() ->
    supervisor:start_link(ch_sup, []).
ch_sup:start_link calls the function supervisor:start_link/2, which spawns and links to a new process, a supervisor.
The first argument, ch_sup, is the name of the callback module, that is, the module where the init callback function is located.
The second argument, [], is a term that is passed as is to the callback function init. Here, init does not need any data and ignores the argument.
In this case, the supervisor is not registered. Instead its pid must be used. A name can be specified by calling supervisor:start_link({local, Name}, Module, Args) or supervisor:start_link({global, Name}, Module, Args).
The new supervisor process calls the callback function ch_sup:init([]). init has to return {ok, {SupFlags, ChildSpecs}}:
init(_Args) ->
    SupFlags = #{},
    ChildSpecs = [#{id => ch3,
                    start => {ch3, start_link, []},
                    shutdown => brutal_kill}],
    {ok, {SupFlags, ChildSpecs}}.
Subsequently, the supervisor starts its child processes according to the child specifications in the start specification. In this case there is a single child process, ch3.
supervisor:start_link is synchronous. It does not return until all child processes have been started.
Adding a Child Process
In addition to the static supervision tree as defined by the child specifications, dynamic child processes can be added to an existing supervisor by calling supervisor:start_child(Sup, ChildSpec).
Sup is the pid, or name, of the supervisor. ChildSpec is a child specification.
Child processes added using start_child/2
behave in the same way as the other child processes, with one important exception: if a supervisor dies and is recreated, then all child processes that were dynamically added to the supervisor are lost.
Stopping a Child Process
Any child process, static or dynamic, can be stopped in accordance with the shutdown specification by calling supervisor:terminate_child(Sup, Id).
Stopping a significant child of a supervisor configured for automatic shutdown will not trigger an automatic shutdown.
The child specification for a stopped child process is deleted by calling supervisor:delete_child(Sup, Id). Sup is the pid, or name, of the supervisor. Id is the value associated with the id key in the child specification.
As with dynamically added child processes, the effects of deleting a static child process are lost if the supervisor itself restarts.
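Putting the two calls together, a minimal sketch of stopping and then removing the static ch3 child from the earlier example (Sup is assumed to be the pid or registered name of the supervisor):

```erlang
%% First terminate the child according to its shutdown specification,
%% then delete its child specification so it cannot be restarted.
ok = supervisor:terminate_child(Sup, ch3),
ok = supervisor:delete_child(Sup, ch3).
```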
Simplified one_for_one Supervisors
A supervisor with restart strategy simple_one_for_one is a simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process.
The following is an example of a callback module for a simple_one_for_one supervisor:
-module(simple_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(simple_sup, []).

init(_Args) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 0,
                 period => 1},
    ChildSpecs = [#{id => call,
                    start => {call, start_link, []},
                    shutdown => brutal_kill}],
    {ok, {SupFlags, ChildSpecs}}.
When started, the supervisor does not start any child processes. Instead, all child processes need to be added dynamically by calling supervisor:start_child(Sup, List).
Sup is the pid, or name, of the supervisor. List is an arbitrary list of terms, which are added to the list of arguments specified in the child specification. If the start function is specified as {M, F, A}, the child process is started by calling apply(M, F, A ++ List).
For example, adding a child to simple_sup above:
supervisor:start_child(Pid, [id1])
The result is that the child process is started by calling apply(call, start_link, [] ++ [id1]), or actually:
call:start_link(id1)
A child under a simple_one_for_one supervisor can be terminated with the following:

supervisor:terminate_child(Sup, Pid)

Sup is the pid, or name, of the supervisor and Pid is the pid of the child.
Because a simple_one_for_one
supervisor can have many children, it shuts them all down asynchronously. This means that the children will do their cleanup in parallel and therefore the order in which they are stopped is not defined.
Starting, restarting, and manually terminating children are synchronous operations which are executed in the context of the supervisor process. This means that the supervisor process will be blocked while it is performing any of those operations. Child processes are responsible for keeping their start and shutdown phases as short as possible.
Stopping
Since the supervisor is part of a supervision tree, it is automatically terminated by its supervisor. When asked to shut down, a supervisor terminates all child processes in reverse start order according to the respective shutdown specifications before terminating itself.
If the supervisor is configured for automatic shutdown on termination of any or all significant children, it will shut down itself when any or the last active significant child terminates, respectively. The shutdown itself follows the same procedure as described above, that is, the supervisor terminates all remaining child processes in reverse start order before terminating itself.
Manual stopping versus Automatic Shutdown
For several reasons, a supervisor should not be stopped manually via supervisor:terminate_child/2 from a child located in its own tree.
supervisor:terminate_child/2 is a blocking call that will only return after the parent supervisor has finished the shutdown of the supervisor that should be stopped. Unless the call is made from a spawned process, this will result in a deadlock, as the supervisor waits for the child to exit as part of its shutdown procedure, whereas the child waits for the supervisor to shut down. If the child is trapping exits, this deadlock will last until the shutdown timeout for the child expires.
Instead, it is generally a better approach to rely on Automatic Shutdown.