GitHub - elftausend/custos: A minimal OpenCL, CUDA, Vulkan and host CPU array manipulation engine / framework.

A minimal, extensible OpenCL, Vulkan (with WGSL), CUDA, NNAPI (Android) and host CPU array manipulation engine / framework written in Rust. This crate provides tools for executing custom array and automatic differentiation operations.

Installation

The latest published version is 0.7.x (April 14th, 2023). A lot has changed since then. 0.7.x can be found in the custos-0.7 branch.

Add "custos" as a dependency:

[dependencies]
custos = "0.7.0"

# to disable the default features (cpu, cuda, opencl, static-api, blas, macro) and use your own set of features:
#custos = {version = "0.7.0", default-features=false, features=["opencl", "blas"]}

Available features:

To make specific devices useable, activate the corresponding features:

Feature	Device	Notes
cpu	`CPU`	Uses heap allocations.
stack	`Stack`	Useable in `no-std` environments as it uses stack allocated `Buffer`s without requiring `alloc` or `std`. Practically only supports the `Base` module.
opencl	`OpenCL`	Automatically maps unified memory.
cuda	`CUDA`
vulkan	`Vulkan`	Shaders are written in WGSL. Also maps unified memory.
nnapi	`NnapiDevice`	`Lazy` module is mandatory.
untyped	`Untyped`	Removes the need for `Buffer`'s generic parameters. (CPU and CUDA only for now)

custos ships combinable modules. The selected modules determine how operations behave when they are executed. New modules can be added in user code.

use custos::prelude::*; 
// Autograd, Base = Modules
let device = CPU::<Autograd<Base>>::new();

To make specific modules useable for building a device, activate the corresponding features:

Feature	Module	Description
on by default	`Base`	Default behaviour.
autograd	`Autograd`	Enables running automatic differentiation.
cached	`Cached`	Reuses allocations on demand.
fork	`Fork`	Decides whether the CPU or GPU is faster for an operation. It then uses the faster device for following computations. (unified memory devices)
lazy	`Lazy`	Lazy execution of operations and lazy intermediate allocations. Enables support for CUDA graphs.
graph	`Graph`	Adds a memory-usage-optimizable graph and fusing of unary operations in combination with `Lazy`.
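
To make the idea behind the `Lazy` row more concrete, here is a minimal, self-contained sketch of deferred execution: operations are recorded first and only run when explicitly requested. This is a toy illustration, not how custos implements `Lazy`; `LazyQueue`, `add_op`, `run` and `lazy_demo` are hypothetical names made up for this example.

```rust
// Toy illustration only: `LazyQueue` is a hypothetical type, not part of custos.

/// Records operations instead of executing them immediately.
pub struct LazyQueue {
    ops: Vec<Box<dyn Fn(&mut [f32])>>,
}

impl LazyQueue {
    pub fn new() -> Self {
        LazyQueue { ops: Vec::new() }
    }

    /// Enqueues an operation; nothing is computed at this point.
    pub fn add_op(&mut self, op: impl Fn(&mut [f32]) + 'static) {
        self.ops.push(Box::new(op));
    }

    /// Number of operations recorded so far.
    pub fn len(&self) -> usize {
        self.ops.len()
    }

    /// Executes every recorded operation on `buf`, in insertion order.
    pub fn run(&self, buf: &mut [f32]) {
        for op in &self.ops {
            op(buf);
        }
    }
}

/// Returns the buffer contents before and after the queue was run.
pub fn lazy_demo() -> (Vec<f32>, Vec<f32>) {
    let mut queue = LazyQueue::new();
    let mut buf = vec![1.0, 2.0, 3.0];

    // Two unary operations are recorded, but not yet applied.
    queue.add_op(|b| b.iter_mut().for_each(|x| *x += 1.0));
    queue.add_op(|b| b.iter_mut().for_each(|x| *x *= 2.0));

    // The buffer is still untouched here.
    let before = buf.clone();

    // Only now are the recorded operations executed.
    queue.run(&mut buf);
    (before, buf)
}
```

Because the full list of operations is known before anything runs, a design like this leaves room for replaying or optimizing the recorded sequence, which is the kind of opportunity the `Lazy` and `Graph` descriptions refer to.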

Usage of these modules when writing custom operations: modules.md and modules_usage.rs.

For an operation to be affected by a module, specific custos code must be called in that operation.
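
The following self-contained sketch illustrates this opt-in principle with a toy module stack. It does not use the real custos traits: `Module`, `Base`, `Cached`, `Cpu`, `recycle`, `mul` and `mul_unaware` are simplified, hypothetical stand-ins. The operation `mul` asks the module stack for its output buffer and can therefore be influenced by the caching module, whereas `mul_unaware` allocates on its own and is never affected.

```rust
use std::cell::{Cell, RefCell};

// Toy illustration only: these traits and types are hypothetical stand-ins,
// not the real custos definitions.

/// Hook that operations call when they need an output buffer.
pub trait Module {
    fn retrieve(&self, len: usize) -> Vec<f32>;
}

/// Innermost module: always allocates a fresh buffer.
pub struct Base;

impl Module for Base {
    fn retrieve(&self, len: usize) -> Vec<f32> {
        vec![0.0; len]
    }
}

/// Wraps another module and reuses buffers that were handed back.
pub struct Cached<M> {
    inner: M,
    pool: RefCell<Vec<Vec<f32>>>,
    reused: Cell<usize>,
}

impl<M: Module> Cached<M> {
    pub fn new(inner: M) -> Self {
        Cached { inner, pool: RefCell::new(Vec::new()), reused: Cell::new(0) }
    }

    /// Hands a buffer back so that a later `retrieve` can reuse it.
    pub fn recycle(&self, buf: Vec<f32>) {
        self.pool.borrow_mut().push(buf);
    }

    /// How many `retrieve` calls were served from the pool.
    pub fn reused(&self) -> usize {
        self.reused.get()
    }
}

impl<M: Module> Module for Cached<M> {
    fn retrieve(&self, len: usize) -> Vec<f32> {
        let mut pool = self.pool.borrow_mut();
        let pos = pool.iter().position(|buf| buf.len() == len);
        match pos {
            Some(pos) => {
                self.reused.set(self.reused.get() + 1);
                // Contents are stale; callers are expected to overwrite them.
                pool.swap_remove(pos)
            }
            // Nothing suitable in the pool: defer to the wrapped module.
            None => self.inner.retrieve(len),
        }
    }
}

/// A device parameterized by its module stack.
pub struct Cpu<M> {
    pub modules: M,
}

/// Opts in: the output buffer comes from the module stack.
pub fn mul<M: Module>(device: &Cpu<M>, lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
    let mut out = device.modules.retrieve(lhs.len());
    for ((l, r), o) in lhs.iter().zip(rhs).zip(&mut out) {
        *o = l * r;
    }
    out
}

/// Does not opt in: allocates on its own, so no module can affect it.
pub fn mul_unaware<M>(_device: &Cpu<M>, lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
    lhs.iter().zip(rhs).map(|(l, r)| l * r).collect()
}

/// Runs both kinds of operation and reports the result and the reuse count.
pub fn demo() -> (Vec<f32>, usize) {
    let device = Cpu { modules: Cached::new(Base) };

    let first = mul(&device, &[1.0, 2.0], &[3.0, 4.0]);
    device.modules.recycle(first);

    // Served from the pool, because `mul` calls `retrieve`.
    let second = mul(&device, &[5.0, 6.0], &[7.0, 8.0]);
    device.modules.recycle(second.clone());

    // Never touches the pool, even though a matching buffer is available.
    let _third = mul_unaware(&device, &[1.0, 1.0], &[2.0, 2.0]);

    (second, device.modules.reused())
}
```

In this sketch, nesting the type parameters (`Cached<Base>`) plays the same role as the module composition shown earlier with `Autograd<Base>`: each layer either handles a request itself or passes it on to the module it wraps.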

Remaining features:

Feature	Description
static-api	Enables the creation of `Buffer`s without providing a device.
std	Adds standard library support.
no-std	For `no-std` environments; activates the `stack` feature.
macro	Reexport of custos-macro
blas	Adds gemm functions of the system's (selected) BLAS library.
half	Adds support for half precision floats.
serde	Adds serialization and deserialization support.
json	Adds convenience functions for serialization and deserialization to and from JSON.

Examples

Implement an operation for CPU:

If you want to implement your own operations for all compute devices, see implement_operations.rs or modules_usage.rs.
To see it at a larger scale, look at custos-math (outdated, requires custos 0.7) or sliced (for automatic differentiation examples).

This operation is only affected by the Cached module (and partially Autograd).

use custos::prelude::*;
use std::ops::{Deref, Mul};

pub trait MulBuf<T: Unit, S: Shape = (), D: Device = Self>: Sized + Device {
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, Self, S>;
}

impl<Mods, T, S, D> MulBuf<T, S, D> for CPU<Mods>
where
    Mods: Retrieve<Self, T, S>,
    T: Unit + Mul<Output = T> + Copy + 'static,
    S: Shape,
    D: Device,
    D::Base<T, S>: Deref<Target = [T]>,
{
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, Self, S> {
        // Unwraps for brevity; alternatively, update the trait to return a `Result` and propagate the error.
        let mut out = self.retrieve(lhs.len(), (lhs, rhs)).unwrap();

        for ((lhs, rhs), out) in lhs.iter().zip(rhs.iter()).zip(&mut out) {
            *out = *lhs * *rhs;
        }

        out
    }
}

Many more usage examples can be found in the tests and examples folders, as well as in the unary operation file, custos-math and sliced.
