Slurm uses `su -l` to acquire the identity of the job submitter when executing a batch script on the compute node. By default, `su -l` opens a PAM session, which in turn causes systemd to create a user session. This is bad for two reasons:

- It is pure overhead. The session starts user services, such as `gpg-agent` etc., for which a batch job has no need.
- It potentially interferes with Slurm's cgroup management (although there is no evidence that this in fact happens), as both Slurm and systemd isolate the user's processes in cgroups, essentially nesting them.
The analysis is incorrect. The whole PAM-session-and-systemd-session dance is done to "obtain the user's clean environment" by running a script on the target node, essentially boiling down to either `su - <username> env` or `su -c env <username>` (configurable with `--get-user-env[=...]`, q.v.). But this is not the whole truth: `--get-user-env` is implicit in some cases.
Another option, `--export`, controls which environment variables the batch script inherits from `sbatch`. Its documentation is also ambiguous, but the behavior has not changed in the most recent Slurm release (20.11.1-1), so it's apparently by design. The documentation is:

- correct that `--get-user-env` is respected when the `--export` switch is omitted or given as `--export=ALL` or bare `--export`; all three forms are semantically equivalent. Omitting `--get-user-env` causes the entire batch environment to be set from that of the `sbatch` command. This is the only case in which omitting `--get-user-env` does not result in running the `su` trick.
- incorrect about the case `--export=NONE`, quoth "`--get-user-env` will be ignored." This is not true: if `--get-user-env[=...]` is not specified, the behavior is the same as if `--get-user-env` were, deferring to the default timeout and method of obtaining the "clean" environment.
- silent about the other cases, `--export=[ALL,]{<variable>,...}{<variable>=<value>,...}`. The actual behavior is identical to the `--export=NONE` case above.
All this is evident from the code fragment in `sbatch`. Also notable is the behavior of the `--export-file=...` option, which has a lower precedence w.r.t. the environment setting and is hardly useful in practice (trumped either by the current complete environment in the default case of `--export=ALL`, or by the "clean" environment obtained on the node with the `su` trick).
It almost reads as if the condition `opt.get_user_env_time >= 0` were a bug that should have been `opt.get_user_env_time > 0`, which would at least make the false documentation statements true; but the condition has now been duplicated in the `scrontab` command and in the REST API, both new in version 20, which implies the behavior is intended and the manual is incorrect and incomplete.
The environment variable `SLURM_GET_USER_ENV` is set to 1 and cannot be overridden; it is checked as the sole condition that triggers the whole "obtaining a clean environment" conundrum in slurmctld (the actual `su ... env` invocation happens in the `env_array_user_default` function in the file env.c).
(Parenthetically, why this single bit of information is sent over the wire not as a flag bit in the RPC, as is normally done in many other cases, but rather by stuffing a magic string into the environment of the job, is beyond my understanding. Although not in a performance-critical path, this method increases the RPC message size for no reason: if need be, the environment variable could have been added on the slurmctld side with an identical end result.)
Options:
1. Do nothing: do not specify `--export`, and pass the whole environment to the job over the wire. This is suboptimal, as the whole environment could be on the order of tens of kilobytes. PAM can be configured on compute nodes to avoid creating the systemd session: `su -l` uses the PAM service id `su-l`, which is configured by default in Debian in `/etc/pam.d/su-l`.
2. Patch the setting of the variable out of `sbatch`'s code. Since we are already patching it to increase polling frequency, and it is very unlikely to change, this is probably the preferred solution.
3. Remove all environment variables, except those the user wants to export explicitly, from the environment in the Kaldi slurm.pl driver before invoking `sbatch`. This is a more general solution, suitable for users who want to use the slurm.pl driver in generic Slurm environments and who do not compile their own `sbatch`. However, this optimization likely loses its value and is hardly worth the effort, given the new hardcoded "steady-state delay of 32s between queries": some Kaldi jobs take mere seconds to complete, and are only a few minutes long on average.
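For option 3, the stripping can be sketched with `env -i`, which clears the inherited environment and passes only an explicit allow-list. The variable list and the batch script name below are hypothetical; a real slurm.pl change would build the list from the user's configuration:

```shell
# Invoke sbatch with a minimal, explicit environment instead of the
# caller's full one. PATH must be re-supplied because env -i clears it.
env -i \
    PATH=/usr/bin:/bin \
    HOME="$HOME" \
    USER="$USER" \
    sbatch my_batch.sh
```

`env -i` guarantees that nothing leaks in from the caller: the child process sees exactly the listed variables and nothing else.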