tenzing-core

tenzing-core is the core library of the Tenzing project. It provides facilities for treating CUDA + MPI programs as sequential decision problems, so that those programs can be optimized with sequential decision strategies.

Two solvers are available:

tenzing-mcts: Uses Monte-Carlo tree search
tenzing-dfs: Uses depth-first search
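To make the "sequential decision problem" framing concrete, here is a minimal sketch that does not use tenzing's API. It assumes a toy dependency graph in which each decision is "which ready operation runs next", and it enumerates every valid order depth-first. The names ToyGraph, ready, enumerate_orders, and all_orders are hypothetical and exist only for this illustration.

```cpp
#include <vector>

// Toy problem, for illustration only (not tenzing's types):
// ops are indices 0..n-1, and deps[i] lists the ops that must run before op i.
struct ToyGraph {
  std::vector<std::vector<int>> deps;
};

// True if every dependency of `op` has already been scheduled.
inline bool ready(const ToyGraph &g, const std::vector<bool> &done, int op) {
  for (int d : g.deps[op]) {
    if (!done[d]) return false;
  }
  return true;
}

// Depth-first enumeration. Each recursion level is one decision:
// pick one of the currently ready ops and append it to the order.
inline void enumerate_orders(const ToyGraph &g, std::vector<bool> &done,
                             std::vector<int> &prefix,
                             std::vector<std::vector<int>> &out) {
  if (prefix.size() == g.deps.size()) {
    out.push_back(prefix);  // a complete order
    return;
  }
  for (int op = 0; op < static_cast<int>(g.deps.size()); ++op) {
    if (done[op] || !ready(g, done, op)) continue;
    done[op] = true;
    prefix.push_back(op);
    enumerate_orders(g, done, prefix, out);
    prefix.pop_back();  // undo the decision and try the next candidate
    done[op] = false;
  }
}

// Returns every order of the ops that respects the dependencies.
inline std::vector<std::vector<int>> all_orders(const ToyGraph &g) {
  std::vector<bool> done(g.deps.size(), false);
  std::vector<int> prefix;
  std::vector<std::vector<int>> out;
  enumerate_orders(g, done, prefix, out);
  return out;
}
```

A solver would then score each complete order, for example by measured run time, and keep the best one. Exhaustive enumeration grows quickly with the number of ops, which is one reason a sampling strategy such as Monte-Carlo tree search can be attractive for larger programs.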

Build

On a supported platform:

source load-env.sh

In any case:

mkdir build && cd build
cmake .. -DCMAKE_CUDA_ARCHITECTURES=70
make

Tests

Tests are split into two locations:

unit tests may be defined in source files
tests with a more "integration" flavor are in test/

To run tests, use one of the following:

make test
ctest
tenzing-all
- -ltc: list test cases
- -tc="a,b": only run test cases named a and b

Defining unit tests in source files creates some CMake complexity, because test functions present in static libraries will not be linked into the resulting test binary. Therefore, we use a CMake object library to generate the test binary, and then generate a static library from the object library. Object library properties do not propagate properly (or at all), so we have to redefine what needs to be linked, included, etc.
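The linking problem can be illustrated with a small sketch. It assumes tests register themselves through static initializers, as many C++ unit test frameworks do; the registry below is a hypothetical miniature, not the framework tenzing uses, and the names registry, Registrar, reg_example, and run_all are invented for this example.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature test registry, for illustration only.
using TestFn = std::function<void()>;

// Function-local static avoids initialization-order problems between
// translation units.
inline std::vector<std::pair<std::string, TestFn>> &registry() {
  static std::vector<std::pair<std::string, TestFn>> tests;
  return tests;
}

// Constructing a Registrar at namespace scope adds a test before main() runs.
struct Registrar {
  Registrar(std::string name, TestFn fn) {
    registry().emplace_back(std::move(name), std::move(fn));
  }
};

// Nothing else refers to reg_example. If this translation unit were an
// archive member of a static library, the linker would only pull it in to
// resolve an undefined symbol, so the registration could silently disappear.
static int example_runs = 0;
static Registrar reg_example("example", [] { ++example_runs; });

// Runs every registered test and returns how many were run.
inline int run_all() {
  int n = 0;
  for (auto &t : registry()) {
    t.second();
    ++n;
  }
  return n;
}
```

Linking the object files directly, which is what a CMake object library does, keeps every translation unit in the binary, so registrations like reg_example survive.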

tenzing-core has been tested on the following platforms:

NERSC perlmutter: g++ 10.3 / nvcc 11.4 / Cray MPICH 8.1.13
Sandia vortex (similar to ORNL Lassen and OLCF Summit): g++ 7.5.0 / nvcc 10.1 / IBM Spectrum MPI
Sandia ascicgpu

Documentation

Visit the API documentation in docs/api.md
ascicgpu system documentation in docs/ascicgpu.md
vortex system documentation in docs/vortex.md
perlmutter system documentation in docs/perlmutter.md

Roadmap

python bindings (with pybind11)

Contributing

See CONTRIBUTING.md for contribution guidelines.

Design Issues

enable / disable CUDA / MPI
- isolate Ser/Des
- isolate platform assignments
a BoundOp cannot produce the std::shared_ptr<OpBase> of its unbound self, only OpBase
- can't ask an std::shared_ptr<BoundOp> for std::shared_ptr<OpBase>
- maybe std::enable_shared_from_this (shared_from_this())?
special status of Start and End is a bit clumsy.
- maybe there should be a StartEnd : BoundOp that they both are instead of separate classes
  - in the algs they're probably treated the same (always synced, etc)
Platform is a clumsy abstraction, since it also tracks resources that are only valid for a single order
- e.g., each order requires a certain number of events, which can be reused for the next order
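The shared_from_this idea floated above for BoundOp could look roughly like the following. This is a sketch under assumptions, not tenzing's actual class hierarchy: the classes are simplified stand-ins, and the unbound() member and the NoOp type are hypothetical names chosen for the example.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration; tenzing's real classes differ.
struct OpBase : std::enable_shared_from_this<OpBase> {
  virtual ~OpBase() = default;
  virtual std::string name() const = 0;
};

struct BoundOp : OpBase {
  // An op that is its own unbound form can return an owning pointer to itself.
  // Precondition: the object is already managed by a std::shared_ptr.
  virtual std::shared_ptr<OpBase> unbound() { return shared_from_this(); }
};

struct NoOp : BoundOp {
  std::string name() const override { return "NoOp"; }
};
```

With this layout, calling unbound() on an object created by std::make_shared<NoOp>() returns a std::shared_ptr<OpBase> that shares ownership with the original pointer. The cost is the precondition: calling shared_from_this() on an object that is not owned by a std::shared_ptr throws std::bad_weak_ptr (since C++17), so ops would always have to be created through std::make_shared or an equivalent.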

Copyright and License

Please see NOTICE.md for copyright and license information.
