CuPBoP: Cuda for Parallelized and Broad-range Processors

Introduction

CuPBoP is a framework which support executing unmodified CUDA source code on non-NVIDIA devices. Currently, CuPBoP support serveral CPU backends, including x86, AArch64, and RISC-V. Supporting Vortex (a RISC-V GPU) is working in progress.

Install

Prerequisites

Linux system
LLVM 14.0.1
CUDA Toolkit

Although CuPBoP does not require NVIDIA GPUs, it needs CUDA to compile the source programs to NVVM/LLVM IRs. CUDA toolkit can be built on machines without NVIDIA GPUs. For building CUDA toolkit, please refer to https://developer.nvidia.com/cuda-downloads.

Installation

Clone from github

git clone --recursive https://github.com/cupbop/CuPBoP
cd CuPBoP
export CuPBoP_PATH=`pwd`
export LD_LIBRARY_PATH=$CuPBoP_PATH/build/runtime:$CuPBoP_PATH/build/runtime/threadPool:$LD_LIBRARY_PATH
export CUDA_PATH=/usr/local/cuda-11.7 # set to your own location

Build CuPBoP

mkdir build && cd build
#set -DDEBUG=ON for debugging
cmake .. \
   -DLLVM_CONFIG_PATH=`which llvm-config` \
   -DCUDA_PATH=$CUDA_PATH
make

(Optional) Use CuPBoP to execute Hetero-mark benchmark for verification
```
make test
```

Run Vector Addition example

In this section, we provide an example of how to use CuPBoP to execute a CUDA program.

cd examples/vecadd
# Compile CUDA source code (both host and kernel) to bitcode files
clang++ -std=c++11 vecadd.cu \
      -I../.. --cuda-path=$CUDA_PATH \
      --cuda-gpu-arch=sm_50 -L$CUDA_PATH/lib64 \
      -lcudart_static -ldl -lrt -pthread -save-temps -v  || true
# Apply compilation transformations on the kernel bitcode file
$CuPBoP_PATH/build/compilation/kernelTranslator \
      vecadd-cuda-nvptx64-nvidia-cuda-sm_50.bc kernel.bc
# Apply compilation transformations on the host bitcode file
$CuPBoP_PATH/build/compilation/hostTranslator \
      vecadd-host-x86_64-unknown-linux-gnu.bc host.bc
# Generate object files
llc --relocation-model=pic --filetype=obj  kernel.bc
llc --relocation-model=pic --filetype=obj  host.bc
# Link with runtime libraries and generate the executable file
g++ -o vecadd -fPIC -no-pie \
      -I$CuPBoP_PATH/runtime/threadPool/include \
      -L$CuPBoP_PATH/build/runtime  \
      -L$CuPBoP_PATH/build/runtime/threadPool \
      host.o kernel.o \
      -I../.. -lc -lCPUruntime -lthreadPool -lpthread
# Execute
./vecadd

How to contribute?

Any kinds of contributions are welcome. Please refer to Contribution.md for more detail.

Related publications

If you want to refer CuPBoP in your projects, please cite the related papers:

Contributors

Ruobing Han
Jun Chen
Bhanu Garg
Xule Zhou
John Lu
Chihyo Ahn
Haotian Sheng
Blaise Tine
Hyesoon Kim

Acknowledgements

POCL is an open-source OpenCL implementations that based on LLVM. We reuse some code from it (e.g., apply optimizations, load/store LLVM IRs).
Hetero-Mark and Rodinia Benchmark are two benchmark suites for heterogeneous system computation. CuPBoP uses them as integrated test to verify the correctness.
moodycamel::ConcurrentQueue is a fast multi-producer, multi-consumer lock-free concurrent queue for C++11. CuPBoP uses it as the task queue for launching and executing kernels.

Name		Name	Last commit message	Last commit date
Latest commit History 90 Commits
.github/workflows		.github/workflows
common		common
compilation		compilation
examples		examples
external		external
runtime		runtime
test		test
.gitignore		.gitignore
.gitmodules		.gitmodules
.pre-commit-config.yaml		.pre-commit-config.yaml
CMakeLists.txt		CMakeLists.txt
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CuPBoP: Cuda for Parallelized and Broad-range Processors

Introduction

Install

Prerequisites

Installation

Run Vector Addition example

How to contribute?

Related publications

Contributors

Acknowledgements

About

Releases

Packages

Contributors 2

Languages

License

cupbop/CuPBoP

Folders and files

Latest commit

History

Repository files navigation

CuPBoP: Cuda for Parallelized and Broad-range Processors

Introduction

Install

Prerequisites

Installation

Run Vector Addition example

How to contribute?

Related publications

Contributors

Acknowledgements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages