
DAPHNE Deployment

Overview

The scripts in this directory (deploy/) can be used to deploy the Daphne System. With them, one can:

  • build the Daphne System (using build.sh),
  • package it,
  • deliver and install it on a deployment platform (e.g., an HPC system), and
  • utilize the resources of multiple machines/nodes.

The scripts can also be used to simply try out DAPHNE on a single machine.

Once deployed, the Daphne system consists of multiple DistributedWorkers and a single coordinator, which is responsible for managing the distributed execution.

Where to Start

  • deployDistributed.sh deploys DAPHNE manually using only SSH. When executed without parameters, it prints its help message.
  • deploy-distributed-on-slurm.sh is intended for environments managed by the Slurm workload manager. When executed without parameters, it prints its help message.

Deployment Scheme

The DAPHNE deployment scheme encompasses the following:

  • A compilation node (where the Daphne System is compiled)
  • A deployment platform (e.g., an HPC system with SLURM support)
    • Login node (or another type of access)
      • HPC task submission interface (e.g., SLURM)
    • Compute node(s)
      • DAPHNE coordinator
      • DAPHNE DistributedWorkers

                    DAPHNE Deployment Scheme

+--------------------------------------------------------------------------------------+
|                                                                                      |
|   +------------------+                                                               |
|   | Compilation node |                                                               |
|   |                  |                                                               |
|   +------------------+                                                               |
|       |                                                                              |
|       |                                                                              |
|       | (SSH connection)                                                             |
|       |                                                                              |
|       |                                                                              |
| +----------------------------------------------------------------------------------+ |
| | Deployment Platform (e.g. an HPC with SLURM support)                             | |
| |                                                                                  | |
| |  +------------------------------+                                                | |
| |  | Access/Submission/Login Node |                                                | |
| |  |                              |                                                | |
| |  +------------------------------+                                                | |
| |      |                                                                           | |
| |      |                                                                           | |
| |      |   Network connections, e.g. Infiniband, to e.g. SLURM interfaces,         | |
| |      |   also used for communication between the coordinator and DWs.            | |
| |      |-------------------------------------------------------------------+       | |
| |      |                                         |                         |       | |
| |  +--------------------------+     +--------------------------+     +-----------+ | |
| |  | Node 1                   |     | Node 2                   |     | Node n    | | |
| |  | - Resources              | ... |                          | ... |           | | |
| |  |   - CPU/GPU/FPGA         |     | CPU/GPU/FPGAs            |     | Resources | | |
| |  | - Running Tasks          |     |   (e.g. 128+)            |     |           | | |
| |  |   - `coordinator`        |     | {DistributedWorker (DW)} |     | DWs       | | |
| |  |   - (optional: more DWs) |     |   (e.g. DWs 1..128)      |     |           | | |
| |  +--------------------------+     +--------------------------+     +-----------+ | |
| |                                                                                  | |
| +----------------------------------------------------------------------------------+ |
|                                                                                      |
+--------------------------------------------------------------------------------------+
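To make the placement shown in the diagram concrete, the following Python sketch models it as plain data: exactly one coordinator on Node 1 (optionally with a few extra DWs) and DistributedWorkers on the remaining compute nodes. `ComputeNode` and `plan_deployment` are hypothetical names used only for illustration; they are not part of the deploy/ scripts, which perform the actual deployment over SSH or SLURM.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """A compute node and the DAPHNE processes placed on it (illustrative model only)."""
    name: str
    runs_coordinator: bool = False
    workers: list[str] = field(default_factory=list)  # DistributedWorker (DW) labels


def plan_deployment(num_nodes: int, dws_per_node: int,
                    dws_on_coordinator_node: int = 0) -> list[ComputeNode]:
    """Hypothetical helper: one coordinator on Node 1, DWs on Nodes 2..n.

    Node 1 may optionally host extra DWs next to the coordinator.
    With num_nodes=1 this degenerates to a single-machine setup.
    """
    if num_nodes < 1:
        raise ValueError("at least one compute node is required")
    nodes = [ComputeNode(name=f"Node {i}") for i in range(1, num_nodes + 1)]
    nodes[0].runs_coordinator = True  # exactly one coordinator per deployment

    next_dw = 1  # DWs are numbered consecutively across all nodes
    for index, node in enumerate(nodes):
        # Node 1 gets only the optional extra DWs; every other node gets dws_per_node.
        count = dws_on_coordinator_node if index == 0 else dws_per_node
        for _ in range(count):
            node.workers.append(f"DW {next_dw}")
            next_dw += 1
    return nodes


if __name__ == "__main__":
    # Three nodes, two DWs per worker node, one extra DW next to the coordinator.
    for node in plan_deployment(num_nodes=3, dws_per_node=2, dws_on_coordinator_node=1):
        role = "coordinator, " if node.runs_coordinator else ""
        print(f"{node.name}: {role}{', '.join(node.workers) or 'no DWs'}")
```

Calling `plan_deployment(num_nodes=1, dws_per_node=0, dws_on_coordinator_node=2)` corresponds to trying out DAPHNE on a single machine, with the coordinator and its workers sharing one node.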

Deployment Scripts

This directory includes a set of bash scripts providing support for:

  • packaging/virtualization of the deployment (installation) package,
  • containerized packaging,
  • virtualized installation,
  • managed deployment,
  • deployment of the `daphne` executable,
  • starting and managing Daphne processes within containerized environments (scheduling and remotely executing SLURM tasks), and
  • stopping and cleaning of a deployment.

List of Files in this Directory

  1. This short README file, which explains the directory structure and points to more documentation at Deploy.
  2. A script that builds the "daphne.sif" Singularity image from the Docker image daphneeu/daphne-dev.
  3. The deploy-distributed-on-slurm script, which allows the user to deploy DAPHNE with SLURM.
  4. The deployDistributed script, which builds DAPHNE and sends it to remote machines manually via SSH (no tools such as Slurm are needed).
  5. example-time.daphne, a Daphne example script that prints the running time of a simple operation.
  6. The Singularity image configuration file.

More Documentation

  1. Documentation about deployment, including tutorial-style examples of how to package, deploy in a distributed fashion, manage, and execute workloads using DAPHNE.
  2. Getting started guide
  3. Building the Daphne System