Many deep-learning researchers are not well aware of how many GPU resources they consume, but everything goes wrong once there is no free GPU memory left. To share one GPU across several containers, it is necessary to limit GPU memory usage per container.
We study TensorFlow, the most popular machine-learning and deep-learning framework, and find that it uses CUDA driver API calls such as `cuMemGetInfo_v2` and `cuMemAlloc_v2` to query the total/free memory on the device and then allocate it. If we modify the return values of these calls, we can make TensorFlow believe that the device has less total memory, or less free memory, than it actually does.
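A minimal sketch of the idea, assuming a hypothetical hard-coded 4 GiB quota (the real wrapper reads the quota from an environment variable, shown further below): the preloaded library exports its own `cuMemGetInfo_v2`, forwards to the real driver, and clamps what is reported.

```c
#define _GNU_SOURCE
#include <cuda.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical hard-coded 4 GiB quota, for illustration only. */
static const size_t QUOTA = 4ULL << 30;

CUresult cuMemGetInfo_v2(size_t *free_mem, size_t *total_mem)
{
    /* Forward to the real driver entry point, then clamp what it reports. */
    CUresult (*real)(size_t *, size_t *) =
        (CUresult (*)(size_t *, size_t *))dlsym(RTLD_NEXT, "cuMemGetInfo_v2");
    CUresult rc = real(free_mem, total_mem);
    if (rc != CUDA_SUCCESS)
        return rc;
    if (*total_mem > QUOTA)
        *total_mem = QUOTA;    /* pretend the device is smaller than it is */
    if (*free_mem > QUOTA)
        *free_mem = QUOTA;
    return rc;
}
```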
For other kinds of GPU memory usage, we also maintain a hash table from `CUdeviceptr` to byte size, recording every allocation made by the container by wrapping the `cuMemAlloc` function.
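A minimal sketch of that bookkeeping, using a small chained hash table; the names `alloc_entry`, `table_put`, and `table_remove` are illustrative, not the actual symbols in cuda-wrapper2.c.

```c
#include <cuda.h>
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 4096

/* One tracked allocation: device pointer -> size in bytes. */
struct alloc_entry {
    CUdeviceptr ptr;
    size_t size;
    struct alloc_entry *next;
};

static struct alloc_entry *buckets[TABLE_SIZE];

/* Record a new allocation keyed on its device pointer. */
void table_put(CUdeviceptr ptr, size_t size)
{
    size_t i = (size_t)ptr % TABLE_SIZE;
    struct alloc_entry *e = malloc(sizeof *e);
    e->ptr = ptr;
    e->size = size;
    e->next = buckets[i];
    buckets[i] = e;
}

/* Remove an entry and return its size, or 0 if the pointer is unknown. */
size_t table_remove(CUdeviceptr ptr)
{
    size_t i = (size_t)ptr % TABLE_SIZE;
    for (struct alloc_entry **p = &buckets[i]; *p; p = &(*p)->next) {
        if ((*p)->ptr == ptr) {
            struct alloc_entry *e = *p;
            size_t size = e->size;
            *p = e->next;
            free(e);
            return size;
        }
    }
    return 0;
}
```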
This work wraps the NVIDIA driver API via `dlsym` and captures every alloc and free call by recording its size. It reads an environment variable to set the quota dynamically; when the application hits the quota, we return a `CUDA_ERROR_OUT_OF_MEMORY` error from the failing alloc call. Although each deep-learning framework handles this error in its own way, the approach works for TensorFlow-based deep-learning jobs.
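A minimal sketch of the interception and quota check, assuming the `WRAPPER_MAX_MEMORY` variable from the usage examples below; the table insert and the mutex from the other sketches are omitted here to keep the quota check in focus.

```c
#define _GNU_SOURCE
#include <cuda.h>
#include <dlfcn.h>
#include <stdlib.h>

size_t quota;        /* byte quota, read once from the environment */
size_t used_bytes;   /* bytes currently allocated through this wrapper */

__attribute__((constructor))
static void read_quota(void)
{
    const char *env = getenv("WRAPPER_MAX_MEMORY");
    quota = env ? strtoull(env, NULL, 10) : (size_t)-1;  /* default: unlimited */
}

CUresult cuMemAlloc_v2(CUdeviceptr *dptr, size_t bytesize)
{
    /* Resolve the real driver symbol the first time through. */
    static CUresult (*real)(CUdeviceptr *, size_t);
    if (!real)
        real = (CUresult (*)(CUdeviceptr *, size_t))
                   dlsym(RTLD_NEXT, "cuMemAlloc_v2");

    /* Enforce the quota before touching the driver. */
    if (used_bytes + bytesize > quota)
        return CUDA_ERROR_OUT_OF_MEMORY;

    CUresult rc = real(dptr, bytesize);
    if (rc == CUDA_SUCCESS)
        used_bytes += bytesize;   /* bookkeeping consumed later by the free path */
    return rc;
}
```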
We record every alloc call such as `cuMemAlloc` together with its device pointer (the `CUdeviceptr`) and size. When `cuMemFree` is called, we use the given pointer to look up how much memory that allocation held. We use `pthread_mutex_lock` to avoid race conditions.
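A minimal sketch of the free path, reusing the illustrative `used_bytes` counter and `table_remove` helper from the sketches above; a single mutex guards both the table and the running counter.

```c
#define _GNU_SOURCE
#include <cuda.h>
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

/* Counter and hash-table helper from the sketches above. */
extern size_t used_bytes;
extern size_t table_remove(CUdeviceptr ptr);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

CUresult cuMemFree_v2(CUdeviceptr dptr)
{
    static CUresult (*real)(CUdeviceptr);
    if (!real)
        real = (CUresult (*)(CUdeviceptr))dlsym(RTLD_NEXT, "cuMemFree_v2");

    CUresult rc = real(dptr);
    if (rc == CUDA_SUCCESS) {
        /* Serialize table lookups and counter updates across threads. */
        pthread_mutex_lock(&lock);
        used_bytes -= table_remove(dptr);
        pthread_mutex_unlock(&lock);
    }
    return rc;
}
```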
Build the wrapper as a shared library:

```bash
gcc -I /usr/local/cuda/include/ cuda-wrapper2.c -fPIC -shared -ldl -lcuda -o ./release/libcuda2.so
```
Preload the wrapper when launching a job; `WRAPPER_MAX_MEMORY` sets the quota in bytes:

```bash
LD_PRELOAD=/path/to/libcuda2.so python test.py
LD_PRELOAD=/path/to/libcuda2.so WRAPPER_MAX_MEMORY=4217928960 python test.py
LD_PRELOAD=/cuda-wrapper/release/libcuda2.so.9.2 WRAPPER_MAX_MEMORY=4217928960 python mnist.py
LD_PRELOAD=/cuda-wrapper/release/libcuda3.so.9.2 WRAPPER_MAX_MEMORY=4217928960 python cifar10-pytorch.py
```
TODO:
1. Multi-process support.
   Design: synchronize the accounting data across processes with a process-level lock, implemented with a shared mutex or a file lock (see the sketch below).
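A minimal sketch of one possible approach to this TODO, keeping the shared byte counter in a small file and serializing updates with `flock()`; the file path, layout, and `shared_counter_add` name are illustrative only.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Add `delta` bytes to a counter shared by every process in the container,
 * returning the new total, or -1 if the counter file cannot be opened. */
static long long shared_counter_add(const char *path, long long delta)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    flock(fd, LOCK_EX);                 /* exclusive advisory lock */

    FILE *f = fdopen(fd, "r+");
    long long used = 0;
    if (fscanf(f, "%lld", &used) != 1)
        used = 0;                       /* new or empty file */
    used += delta;

    rewind(f);
    ftruncate(fd, 0);
    fprintf(f, "%lld\n", used);
    fflush(f);

    flock(fd, LOCK_UN);
    fclose(f);                          /* also closes fd */
    return used;
}
```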