Submit it!

What is submitit?

Submitit is a lightweight tool for submitting Python functions for computation within a Slurm cluster. It wraps job submission and provides access to results, logs and more. Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Submitit lets you switch seamlessly between executing on Slurm and executing locally.

An example is worth a thousand words: performing an addition

From inside an environment with submitit installed:

import submitit

def add(a, b):
    return a + b

# executor is the submission interface (logs are dumped in the folder)
executor = submitit.AutoExecutor(folder="log_test")
# set timeout in min, and partition for running the job
executor.update_parameters(timeout_min=1, slurm_partition="dev")
job = executor.submit(add, 5, 7)  # will compute add(5, 7)
print(job.job_id)  # ID of your job

output = job.result()  # waits for completion and returns output
assert output == 12  # 5 + 7 = 12...  your addition was computed in the cluster

The Job class also provides tools for reading the log files (job.stdout() and job.stderr()).

If what you want to run is a command, turn it into a Python function using submitit.helpers.CommandFunction, then submit it. By default stdout is silenced in CommandFunction, but it can be unsilenced with verbose=True.

Find more examples here!!!

Submitit is a Python 3.6+ toolbox for submitting jobs to Slurm. It aims at running Python functions from Python code.

Install

Quick install, in a virtualenv/conda environment where pip is installed (check which pip):

  • stable release:
    pip install submitit
    
  • stable release using conda:
    conda install -c conda-forge submitit
    
  • main branch:
    pip install git+https://github.com/facebookincubator/submitit@main#egg=submitit
    

You can try running the MNIST example to check that everything is working as expected (requires sklearn).

Documentation

See the following pages for more detailed information:

  • Examples: for a bunch of examples dealing with errors, concurrency, multi-tasking, etc.
  • Structure and main objects: to get a better understanding of how submitit works, which files are created for each job, and the main objects you will interact with.
  • Checkpointing: to understand how you can configure your job to get checkpointed when preempted and/or timed out.
  • Tips and caveats: for a bunch of information that can be handy when working with submitit.
  • Hyperparameter search with nevergrad: basic example of nevergrad usage and how it interfaces with submitit.

Goals

The aim of this Python 3 package is to launch jobs on Slurm painlessly from inside Python, using the same submission and job patterns as the standard library package concurrent.futures.
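For reference, here is a minimal sketch of the concurrent.futures pattern that submitit mirrors, using only the standard library. The `add` function and the single-worker thread pool are illustrative choices, not part of submitit.

```python
from concurrent.futures import ThreadPoolExecutor


def add(a, b):
    return a + b


# submit() schedules the call and immediately returns a future;
# result() blocks until the call has finished and returns its value.
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(add, 5, 7)
    output = future.result()

print(output)
```

The submitit example at the top of this page follows the same two steps: `executor.submit(add, 5, 7)` returns a job, and `job.result()` waits for completion and returns the output.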

Here are a few benefits of using this lightweight package:

  • submit any function, even lambda and script-defined functions.
  • raises an error with stack trace if the job failed.
  • requeue preempted jobs (Slurm only)
  • swap between submitit executor and one of the concurrent.futures executors in a line, so that it is easy to run your code either on Slurm, o