This EEP introduces a way of automatically terminating supervisors based on the termination of specifically marked significant children.
This document is based on the discussion in OTP-PR 4521.
Children under a supervisor often represent a work unit, that is, a group of cooperating processes as opposed to just a single process. Such work unit supervisors (called group supervisors in the context of this document) are themselves typically hosted by a simple_one_for_one supervisor, via which they are started as needed.
At the time of this writing, however, there is no good, canonical way of stopping such group supervisors once the work unit they represent has finished its work and the respective child processes have terminated. As a result, the group supervisors will hang around idle forever unless stopped manually one way or another.
This has been addressed in applications in a variety of ways, none of which can be called truly good, straightforward, or canonical:
Using supervisor:terminate_child/2. As this is a blocking call, a process has to be spawned for it. Also, this will cause the top supervisor to be blocked until the group supervisor has been shut down, so it will not accept other requests until then.
Both of the above approaches suffer from the fact that the children responsible for the shutdown have to know things about their surroundings, namely…:
This may be tackled by having a dedicated overseer child that watches the other children and acts according to their behavior. However, this requires considerable boilerplate code for tasks that would be better handled by the supervisor. There is also the problem that the overseer process must keep the list of children it watches up to date should any of them be restarted, either by enabling the children to register with it on start (for which they in turn must know the overseer process’ pid), or by asking the supervisor for it.
Another approach that is often used is to make the children responsible for the shutdown of the group supervisor permanent and to set the supervisor’s restart intensity to 0. This has the downside that a child which exits abnormally, and could otherwise have been restarted, will not be restarted but will instead cause the supervisor to shut down. Another downside to this approach is that it produces error messages (crash reports), even if the shutdown is intended.
Last but not least, some people have resorted to cloning the OTP supervisor and customizing it to their needs, for reasons outlined here and others.
This EEP provides a means to alleviate the problems outlined in the
motivation by introducing a way to mark specific children as significant
via a new child spec flag, and a way to configure supervisors to shut down
automatically depending on the exit of significant children via a new
supervisor flag.
In order to keep backwards compatibility, the new flags will only be usable in the map forms of child specs and supervisor flags, and for the same reason the default values for the new flags are chosen such that, in their absence, the supervisor behaves the same as it does to date.
The new child spec flag is named significant with possible values true and false, with false being the default.
The new supervisor flag is named auto_shutdown with possible values never, any_significant and all_significant, with never being the default.
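As a rough illustration of how the two flags fit together, the following is a minimal sketch of a group supervisor callback module. The module names group_sup, work_coordinator and work_helper are hypothetical and not part of this proposal; only the auto_shutdown and significant keys are the flags described above.

```erlang
-module(group_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    %% Map form of the supervisor flags; auto_shutdown is the new flag.
    SupFlags = #{strategy => one_for_one,
                 intensity => 1,
                 period => 5,
                 auto_shutdown => any_significant},
    %% Map form of a child spec; significant is the new flag.
    %% work_coordinator is a hypothetical worker module.
    Coordinator = #{id => coordinator,
                    start => {work_coordinator, start_link, []},
                    restart => transient,
                    significant => true},
    %% work_helper is a hypothetical worker module. The significant
    %% flag is omitted here and therefore defaults to false.
    Helper = #{id => helper,
               start => {work_helper, start_link, []},
               restart => transient},
    {ok, {SupFlags, [Coordinator, Helper]}}.
```

In this sketch, only the coordinator is meant to decide when the work unit is finished; the helper can come and go without affecting the lifetime of the supervisor.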
With the supervisor auto_shutdown flag set to never, the child spec flag significant is not allowed to be true. The never value and the restriction on the significant value are intended as a safety measure against unintended automatic shutdowns, for example by the exit of a significant child which was added later via supervisor:start_child/2. As the spec for such a child would not be present in the supervisor:init/1 callback code but somewhere else, debugging such unexplained supervisor shutdowns might be difficult.
Otherwise, the following rules apply when a significant child exits on its own:
A transient child will be restarted (not cause a supervisor shutdown) if it exits abnormally. If it exits normally…
if the supervisor auto_shutdown flag is any_significant, the supervisor will shut down
if the supervisor auto_shutdown flag is all_significant, the supervisor will shut down if the child was the last active significant child
A temporary child will never be restarted. If it exits normally or abnormally, the same rules as for transient children apply, in regard to the supervisor auto_shutdown flag.
If the restart type is permanent, the significant flag is not allowed to be true. This combination does not make sense, as a permanent child is always restarted.
To be clear, the above rules only apply when significant children exit by themselves, that is, not when being terminated manually via supervisor:terminate_child/2, not when other non-significant children exit, and not when being terminated as a consequence of a sibling’s death in the one_for_all or rest_for_one strategies.
The approach proposed here can also be used to achieve “shutdown when empty” behavior by marking all children as significant and setting the supervisor auto_shutdown flag to all_significant.
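A “shutdown when empty” setup could look roughly like the sketch below. The module names empty_sup and work_unit_worker, as well as the child ids, are hypothetical and chosen for illustration only. The one_for_one strategy and the temporary restart type are assumptions made for this example.

```erlang
-module(empty_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    %% all_significant: shut down once the last active significant
    %% child has exited.
    SupFlags = #{strategy => one_for_one,
                 auto_shutdown => all_significant},
    %% Every child is marked significant. work_unit_worker is a
    %% hypothetical worker module.
    Children = [#{id => Id,
                  start => {work_unit_worker, start_link, [Id]},
                  restart => temporary,
                  significant => true}
                || Id <- [worker_a, worker_b, worker_c]],
    {ok, {SupFlags, Children}}.
```

Because the children are temporary, each exit, normal or abnormal, counts towards emptying the supervisor. With transient children, only normal exits would.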
It is worth mentioning that the simple_one_for_one strategy poses a special case, as it can have only a single child spec that applies to all children. That means that either all children are significant ones, or none is.
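For a simple_one_for_one supervisor, the single child spec is where the significant flag would be set, as in the following sketch. The module names session_sup and session_worker are hypothetical, and the choice of any_significant is an assumption made for the example.

```erlang
-module(session_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 auto_shutdown => any_significant},
    %% The one child spec applies to every dynamically started child,
    %% so all of them are significant. session_worker is a
    %% hypothetical worker module.
    ChildSpec = #{id => session,
                  start => {session_worker, start_link, []},
                  restart => temporary,
                  significant => true},
    {ok, {SupFlags, [ChildSpec]}}.
```

Under the rules described in this document, any_significant would here mean that the exit of any one child shuts the supervisor down, while all_significant would defer the shutdown until the last active child has exited.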
Using temporary significant children in one_for_all and rest_for_one supervisors may lead to an edge case scenario in which an intended automatic shutdown will not happen. Temporary children will not be restarted, not even when their termination was caused by a sibling’s death. On the other hand, the automatic shutdown of a supervisor is not triggered when a significant child is terminated as a consequence of a sibling’s death. Thus, a temporary significant child intended to automatically shut down its supervisor will be lost if it is terminated as a consequence of a sibling’s death.
The changes proposed in this document introduce no incompatibilities, as the new child spec and supervisor flags are optional and default to values that result in the current behavior. Also, all the current workarounds outlined in the Motivation will still work.
Although the proposed changes are backwards compatible, applications using this enhancement may not be compatible when compiled with previous OTP versions unless proper care is taken. Such an application compiled with an older OTP version will leak processes, as the automatic supervisor shutdowns it relies on to remove unused parts of its supervision tree will not happen. Taking care of this issue is at the discretion of implementors if they expect an application which uses the significant child behavior to be compiled with an OTP version that predates its appearance.
A reference implementation which will be updated to reflect the state of this document can be found in OTP-PR 4638.
This document has been placed in the public domain.