GitHub - b0xtch/llama3-cake: Distributed Llama3 inference.

llama3-cake is a pure Rust implementation of distributed inference for the Llama3 LLM, based on Candle.

This is experimental code.

The idea is to shard the transformer blocks across multiple devices so that inference can run on models that would not normally fit in the GPU memory of a single device. Inference over contiguous transformer blocks on the same worker is batched in order to minimize the latency caused by data transfer.
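
The batching of contiguous blocks can be illustrated with a minimal sketch. This is not the cake-core implementation: the `Owner` and `Batch` types and the `batch_contiguous` function are hypothetical names introduced only for this example. It assumes each transformer block has exactly one owner and that a run of consecutive blocks with the same owner can be served in a single round trip.

```rust
/// Hypothetical owner of a transformer block (illustrative, not a cake-core type):
/// either the local master process or a named remote worker.
#[derive(Debug, Clone, PartialEq)]
enum Owner {
    Local,
    Worker(String),
}

/// A run of consecutive blocks that all live on the same owner.
#[derive(Debug, PartialEq)]
struct Batch {
    owner: Owner,
    first: usize, // index of the first block in the run
    last: usize,  // index of the last block in the run (inclusive)
}

/// Groups blocks into runs of consecutive indices that share an owner.
/// `owners[i]` is the owner of block `i`.
fn batch_contiguous(owners: &[Owner]) -> Vec<Batch> {
    let mut batches: Vec<Batch> = Vec::new();
    for (idx, owner) in owners.iter().enumerate() {
        // Extend the current run if this block has the same owner as the previous one.
        if let Some(current) = batches.last_mut() {
            if current.owner == *owner {
                current.last = idx;
                continue;
            }
        }
        // Otherwise start a new run at this block.
        batches.push(Batch {
            owner: owner.clone(),
            first: idx,
            last: idx,
        });
    }
    batches
}
```

With 32 blocks split evenly between two workers, `batch_contiguous` yields two batches, so under the assumption above the activations would cross the network once per worker rather than once per block. Interleaving owners block by block would produce one batch per block and lose that benefit.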

Run a worker node:

cake-cli --model /path/to/Meta-Llama-3-8B --mode worker --name worker0 --topology topology.yml --address 0.0.0.0:10128

Run a master node:

cake-cli --model /path/to/Meta-Llama-3-8B --topology topology.yml

Where topology.yml determines which layers are served by which worker:

worker0:
  host: 'linux-server.local:10128'
  description: 'NVIDIA Titan X Pascal (12GB)'
  layers:
    - 'model.layers.0'
    - 'model.layers.1'
    - 'model.layers.2'
    - 'model.layers.3'
    - 'model.layers.4'
    - 'model.layers.5'
    - 'model.layers.6'
    - 'model.layers.7'
    - 'model.layers.8'
    - 'model.layers.9'
    - 'model.layers.10'
    - 'model.layers.11'
    - 'model.layers.12'
    - 'model.layers.13'
    - 'model.layers.14'
    - 'model.layers.15'

worker1:
  host: 'apple-server.local:10128'
  description: 'Apple M1 Max (64GB)'
  layers:
    - 'model.layers.16'
    - 'model.layers.17'
    - 'model.layers.18'
    - 'model.layers.19'
    - 'model.layers.20'
    - 'model.layers.21'
    - 'model.layers.22'
    - 'model.layers.23'
    - 'model.layers.24'
    - 'model.layers.25'
    - 'model.layers.26'
    - 'model.layers.27'
    - 'model.layers.28'
    - 'model.layers.29'
    - 'model.layers.30'
    - 'model.layers.31'
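
Writing out 32 layer names by hand is error-prone, so it can help to sanity-check a topology before starting the nodes. The sketch below is only an illustration under stated assumptions: `Node`, `layer_range`, `layer_owners`, and `example_topology` are hypothetical names, not part of cake-core, and the topology is built in memory rather than parsed from the YAML file. It mirrors the two-worker example above and rejects any layer that is assigned to more than one worker.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of one topology entry (not the cake-core type).
struct Node {
    name: String,
    host: String,
    layers: Vec<String>,
}

/// Builds the layer names `model.layers.<start>` up to `model.layers.<end - 1>`.
fn layer_range(start: usize, end: usize) -> Vec<String> {
    (start..end).map(|i| format!("model.layers.{}", i)).collect()
}

/// Maps every layer name to the name of the worker that serves it.
/// Returns an error if the same layer is listed under two workers.
fn layer_owners(nodes: &[Node]) -> Result<HashMap<String, String>, String> {
    let mut owners: HashMap<String, String> = HashMap::new();
    for node in nodes {
        for layer in &node.layers {
            // `insert` returns the previous value if the key was already present.
            if let Some(previous) = owners.insert(layer.clone(), node.name.clone()) {
                return Err(format!(
                    "{} is assigned to both {} and {}",
                    layer, previous, node.name
                ));
            }
        }
    }
    Ok(owners)
}

/// The two-worker split from the topology above: layers 0-15 and 16-31.
fn example_topology() -> Vec<Node> {
    vec![
        Node {
            name: "worker0".to_string(),
            host: "linux-server.local:10128".to_string(),
            layers: layer_range(0, 16),
        },
        Node {
            name: "worker1".to_string(),
            host: "apple-server.local:10128".to_string(),
            layers: layer_range(16, 32),
        },
    ]
}
```

Calling `layer_owners(&example_topology())` returns a map with 32 entries. A check like this only catches duplicate assignments; whether layers missing from the topology are an error or are served by the master is not stated here, so the sketch does not test for gaps.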

License

Released under the GPL 3 license. To see the licenses of the project dependencies, install cargo-license with cargo install cargo-license and then run cargo license.
