NerlPlanner

Ohad Adi edited this page Jul 2, 2024 · 19 revisions

NerlPlanner is a GUI tool of the Nerlnet framework that assists with generating the configuration json files required by Nerlnet.
To run Nerlplanner, execute the script ./NerlnetPlanner.sh located in the repo's main folder.
Nerlplanner is currently supported only on Ubuntu machines with Python 3 and Virtualenv installed.
There are 3 types of configuration files that configure a Nerlnet distributed ML experiment:

1. Distributed Configuration JSON File

The Distributed Configuration File (dc_<name>.json) describes the layout of a Nerlnet distributed machine learning cluster. It consists of:

  • Worker model definitions (multiple kinds of models are supported).
  • Device communication properties (IPv4 and port).
  • Entities of Nerlnet (Client, Router, Source) and their allocation to devices.

The first step of creating a DC file is generating worker models using the Nerlplanner worker model dialog.

Click on create/edit worker .json and the dialog will appear. Then follow steps 1.1 (Worker Model Definition Json File) and 1.2 (Import model to DC file).

1.1 Worker Model Definition Json File

The worker dialog menu assists with generating worker model json files.
An existing json model can also be imported into this dialog, edited, and saved as a new model.

1.1.1 File Section:

  • Load json: Load an existing worker configuration json to restore a previously defined network model.
  • Select Json File Output Directory.
  • Name of worker network model json file.

1.1.2 Model Definition

This section defines the model.

1.1.2.1 Model type

Model type lets the user select between several pre-defined models. A custom neural network is the NN type, where the user should configure all layers of the network. Pre-defined models add layers before and after the hidden layers to achieve the desired functionality (e.g., classification, regression or text prediction network types).

1.1.2.2 Layer Sizes

Declare the number of neurons of each layer. Each layer must have a size definition. Sizes of layers are separated by ",". A simple layer size is a positive integer that defines the number of neurons of a 1D layer. A complex layer size is a string that declares a multi-dimensional layer, e.g., a CNN layer. Example of a CNN that mixes complex and simple layer sizes:
First layer, complex size of CNN: "128x128k3x3s2x2x1p1x1".
Second layer size: "4096".
Third layer size: "128".
Last layer size: "4".
The input string for layer sizes should be: "128x128k3x3s2x2x1p1x1,4096,128,4"
Example of a simple DNN layer sizes list:
32,16,4,2,1.
Example of a simple Autoencoder NN list:
32,16,4,16,32.
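The splitting rule described above can be sketched in Python. This is only an illustration, not Nerlplanner's own parser: the function name `split_layer_sizes` and the "simple"/"complex" labels are hypothetical, and the internal fields of a complex size string are deliberately left uninterpreted because this page does not define them.

```python
from typing import List, Tuple, Union


def split_layer_sizes(layer_sizes: str) -> List[Tuple[str, Union[int, str]]]:
    """Split a comma-separated layer sizes string into (kind, value) pairs.

    A token made only of digits is treated as a simple (1D) layer size.
    Anything else is kept verbatim as a complex layer size string.
    """
    layers: List[Tuple[str, Union[int, str]]] = []
    for token in layer_sizes.split(","):
        token = token.strip()
        if not token:
            # Every layer must have a size definition.
            raise ValueError("empty layer size in: %r" % layer_sizes)
        if token.isascii() and token.isdigit():
            size = int(token)
            if size <= 0:
                raise ValueError("layer size must be positive: %r" % token)
            layers.append(("simple", size))
        else:
            # Assumption: the complex format is passed through unparsed.
            layers.append(("complex", token))
    return layers


if __name__ == "__main__":
    for kind, value in split_layer_sizes("128x128k3x3s2x2x1p1x1,4096,128,4"):
        print(kind, value)
```

A check like this can catch an empty entry (for example "32,,4") before the string is saved into a worker json.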

1.1.3 Optimizer Definitions

Parameters of the optimization process:

  • Learning Rate (float)
  • Epochs (int)
  • Optimizer Type and Optimizer Args
  • Loss Method

1.1.4 Distributed System Configurations

  • Infra Type: currently only OpenNN is supported.
  • Distributed System Type: "none" means independent workers.
    Federated architectures can be selected from the list.
  • Distributed System Token: A token that identifies a cluster of workers that communicate with one another. E.g., in federated learning the parameter server and workers communicate, and the token is the handshake between the server and its workers.
  • Distributed System Arguments: Custom arguments passed to the distributed ML cluster, depending on the system type.
    TODO add link to page that describes distributed system types.

1.2 Import model to DC file

Use the browse button to select the file generated by the worker model definition dialog.
Then give the worker a name and click on add.

A worker can be duplicated easily by selecting an existing model from the list box of workers. Once a worker is selected, click on load, fill in a new and unique name for the worker and click on add. The model graph can be viewed using the "show worker model" button.

1.3 Default Settings and Special Entities Parameters

Complete the default settings and special entities parameters, and click on the save button of each section. Types: Frequency (int), BatchSize (int), Port (int). The port belongs to the machine that the special entity is intended to run on. The entity is bound to the machine in the "Devices" section.

1.4 Add Entities

In this step entities are added through the "Entities" section.
The right pane of this section has 3 list boxes that accumulate entities by their type.
There are 3 types of entities in Nerlnet: Client, Source and Router.
This is a figure of the "Entities" section in Nerlplanner:

1.4.1 Client

The client entity is a communication layer that hosts multiple workers (from 1.1).
A client is generated by filling in the name and port and clicking on add.
Then workers can be bound to a client as follows:

  • Select the client from the clients list box in the right pane.
  • Click on load in the left pane to load the selected client.
  • Choose available workers from the drop-down list and click on add.
  • The client's workers will appear in the workers list box.
    The workers list is refreshed with the client's workers each time a client is loaded.

1.4.2 Source

Source is an entity that streams data toward the workers for training or prediction, depending on the current phase of the Nerlnet cluster.
Currently, only a CSV source type is supported.
A source supports several policies for sending batches to its target workers:

  • Round Robin: a batch is sent to a worker chosen by the round-robin (RR) method.
  • Random: a batch is sent to a randomly chosen worker.
  • Casting: a batch is sent to all workers.
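The three policies can be illustrated with a short Python sketch. It is not Nerlnet's source implementation (this page does not show it): the function name `make_policy` and the policy strings "round_robin", "random" and "casting" are hypothetical names chosen for the example.

```python
import itertools
import random
from typing import Callable, List, Optional, Sequence


def make_policy(policy: str, workers: Sequence[str],
                rng: Optional[random.Random] = None) -> Callable[[], List[str]]:
    """Return a function that picks the target workers for the next batch."""
    if not workers:
        raise ValueError("a source needs at least one target worker")
    rng = rng or random.Random()
    if policy == "round_robin":
        # Walk the workers in order and wrap around at the end.
        cycle = itertools.cycle(workers)
        return lambda: [next(cycle)]
    if policy == "random":
        # One randomly chosen worker per batch.
        return lambda: [rng.choice(workers)]
    if policy == "casting":
        # Every worker receives every batch.
        return lambda: list(workers)
    raise ValueError("unknown policy: %r" % policy)


if __name__ == "__main__":
    next_targets = make_policy("round_robin", ["w1", "w2", "w3"])
    for batch_id in range(4):
        print(batch_id, next_targets())
```

With Round Robin and Random each worker sees only part of the data, while Casting delivers every batch to every worker, so the policy choice affects both network load and what each model trains on.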

1.4.3 Router

Router is an entity that connects entities together in Nerlnet.
The router can help the user form communication graph layouts that determine the path data flows through until it reaches the worker.
Routers collect statistics of messages and batches that are routed.

1.4.4 Devices

A device hosts entities. It is the binding between OS resources and entities.
Each device should run a single instance of NerlnetApp.
NerlnetApp is started using the script ./NerlnetRun.
A device can be a VM, a container or a physical machine. A device is an isolated Erlang environment on top of the OS with a dedicated IPv4 network interface.
The following fields should be filled in for a new device in Nerlplanner:

  • A valid IPv4 address of the computer that hosts NerlnetApp.
  • Name of device.
  • List of entities that are hosted on the device (user should be aware of compute capabilities of the device).
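As a rough illustration of these fields, the Python sketch below checks a device entry before it is added. The function name `validate_device` and the dictionary keys are hypothetical and do not reflect the actual DC file schema; only the standard library `ipaddress` module is used for the IPv4 check.

```python
import ipaddress
from typing import Dict, Iterable, List, Union


def validate_device(name: str, ipv4: str,
                    entities: Iterable[str]) -> Dict[str, Union[str, List[str]]]:
    """Check the three device fields and return them as a plain dict."""
    if not name.strip():
        raise ValueError("device name must not be empty")
    # Raises ipaddress.AddressValueError (a ValueError) for a malformed address.
    address = ipaddress.IPv4Address(ipv4)
    entity_list = list(entities)
    if len(set(entity_list)) != len(entity_list):
        raise ValueError("an entity is listed more than once on device %r" % name)
    # Hypothetical keys, used only for this example.
    return {"name": name, "ipv4": str(address), "entities": entity_list}


if __name__ == "__main__":
    print(validate_device("pc1", "192.168.0.10", ["c1", "r1", "s1"]))
```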

1.5 Graph and Experiment

In this step experiments and communication maps are added through the "Graph and Experiment" section. This section has two buttons: Generate Communication Map and Generate Experiment Flow.

1.5.1 Experiment Flow JSON File

The Experiment Flow JSON File (exp_.json) manages the entire process of conducting experiments using Nerlnet. It defines data to sensors, and the components that are part of the experiment (Training/Prediction).

  • Click on Generate Experiment Flow button to generate an Experiment Flow JSON file.

This is a figure of the "Generate Experiment Flow Json File" window in Nerlplanner:

1.5.1.1 Dataset Settings

In this step dataset settings are defined through the "Dataset Settings" section.

  • Select the CSV file path by clicking on the Browse button.
  • Complete dataset settings to specify the number of labels, features, and header names.

1.5.1.2 Phase

In this step Phases are added through the section of "Phase". In this section, there is a pane featuring three list boxes:

  • Experiment Phases: Displays all the phases of the experiment.
  • Source Pieces per Phase: Shows a list of source pieces hosted within each phase.
  • All Source Pieces: Lists all available source pieces across the experiment.

These list boxes provide an organized view of experiment phases and their associated data sources.

This is a figure of the "Phase" section in Nerlplanner:

In the subsection "Add Experiment Phase", an experiment phase is generated by filling in the experiment phase name and experiment phase type and clicking on add.

  • To select an existing experiment phase, choose one from the 'Experiment Phases' pane list box, then click 'Select' in the "Add Experiment Phase" subsection.
  • To connect a source piece with an experiment phase, choose a source piece from 'All Source Pieces' pane list box and click the "Add Source Piece" button.

The subsection "Source Piece" manages source pieces. A source piece is an object that references a range within a dataset CSV. One source piece is generated for each source entity.

  • To create a source piece, enter the source piece name, starting sample, number of batches, and select the list of workers associated with the source piece, then click 'Add'.
  • To select an existing source piece, choose one from the 'All Source Pieces' pane list box, then click 'Select' in the source piece subsection.
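To make the "range within a dataset CSV" concrete, the Python sketch below computes which samples a source piece would cover. It rests on an assumption that this page does not state: that each batch holds BatchSize consecutive samples, so a piece spans number of batches × batch size rows from its starting sample. The function name `source_piece_rows` is hypothetical.

```python
def source_piece_rows(starting_sample: int, num_of_batches: int,
                      batch_size: int) -> range:
    """Return the half-open sample range [start, stop) of a source piece.

    Assumption: batches are consecutive and each holds batch_size samples.
    """
    if starting_sample < 0:
        raise ValueError("starting sample must be non-negative")
    if num_of_batches <= 0 or batch_size <= 0:
        raise ValueError("number of batches and batch size must be positive")
    return range(starting_sample, starting_sample + num_of_batches * batch_size)


if __name__ == "__main__":
    rows = source_piece_rows(starting_sample=100, num_of_batches=10, batch_size=50)
    print(rows.start, rows.stop, len(rows))
```

Under this assumption, comparing the ranges of two source pieces shows whether they reference overlapping parts of the same CSV.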
1.5.1.3 File

  • Load Json: Load a previously saved experiment flow configuration json.
  • Select Json File Output Directory.
  • Name of experiment flow json file.
