camenduru/non-profit-gpu-cluster

🐣 Please follow me for new updates https://twitter.com/camenduru
🔥 Please join our discord server https://discord.gg/k5BwmmvJJU
🥳 Please join my patreon community https://patreon.com/camenduru

Motivation

A non-profit GPU cluster that runs open-source paper demos with a UI for free for everyone.

https://twitter.com/camenduru/status/1747802652182737050

  • If each person receives 24 hours of compute time per week on a 3090 or A5000 GPU, one GPU serves 7 people, 2 GPUs serve 14 people, and 24 GPUs serve 168 people.
  • Operating cost (electricity): about $2 per 24 hours for 2 x 3090 or A5000 GPUs.
  • End-of-the-year goal: 6 servers with a total of 24 x A5000 or 3090 GPUs.
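
The capacity figures in the list above follow from dividing the GPU-hours available in a week by the 24-hour weekly allowance. A minimal shell sketch of that arithmetic (the function name `people_served` is illustrative and not part of this project):

```shell
# One GPU running around the clock provides 24 * 7 = 168 GPU-hours per week.
hours_per_week=$((24 * 7))
# Weekly allowance per person, as stated in the list above.
hours_per_person=24

# people_served GPU_COUNT
# Prints how many people can each receive the weekly allowance.
people_served() {
  echo $(( $1 * hours_per_week / hours_per_person ))
}

people_served 1    # prints 7
people_served 2    # prints 14
people_served 24   # prints 168
```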

Server 1 Parts (Training and Inference)

  • ✔ GPU1: Asus ROG Strix RTX3090 O24G (default 3-slot with Liquid Cooler 2-slot) $747
  • ✔ GPU2: Asus ROG Strix RTX3090 O24G-W (default 3-slot with Liquid Cooler 2-slot) $790
  • ✔ 3 x Case Fan: NZXT AER P - RF-AP140-FP $26
  • ✔ Motherboard: Asus Pro WS C621-64L SAGE (4 GPU Support) $628
  • ✔ CPU: Intel® Xeon® W-3235 Processor (64 Lane PCIe 3.0) (4 GPU Support) $197
  • ✔ CPU Cooler: 4U Active CPU Heat Sink LGA3647 (Narrow) $78
  • ✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $95
  • ✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $95
  • ✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $110
  • ✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $110
  • ✔ SSD: Lexar NM790 4TB M.2 NVMe PCIe Gen 4X4 7400-6500 MB/s $271
  • ✔ Power supply: Corsair AX1500i 1500 Watt 80+ Titanium $249
  • ✔ Case: Antec P20C-W (E-ATX) $89

Server 2 Parts (Training and Inference)

  • ✔ GPU1: Asus ROG Strix RTX3090 O24G (default 3-slot with Liquid Cooler 2-slot) $745
  • GPU2: 3090 OR A5000
  • ✔ 3 x Case Fan: NZXT AER P - RF-AP140-FP $26
  • ✔ Motherboard: Asus Pro WS C621-64L SAGE (4 GPU Support) $634
  • ✔ CPU: Intel® Xeon® W-3235 Processor (64 Lane PCIe 3.0) (4 GPU Support) $190
  • ✔ CPU Cooler: 4U Active CPU Heat Sink LGA3647 (Narrow) $63
  • ✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $107
  • ✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $107
  • ✔ SSD: Patriot Viper VP4300 Lite 4TB Gen 4x4 7400-6000 MB/s $262
  • ✔ Power supply: Corsair AX1500i 1500 Watt 80+ Titanium $227
  • ✔ Case: Antec P20C-W (E-ATX) $89

Server 3 Parts (Database - Web - Dispatcher - Scheduler)

  • ✔ Motherboard: Asus Z97-A $30
  • ✔ CPU: Intel® Core™ i7-4790K Processor $60
  • ✔ CPU Cooler: CPU Heat Sink $5
  • ✔ Ram: 1 x 8 GB G-SKILL DDR3-1333 DIMM $15
  • ✔ Ram: 1 x 8 GB G-SKILL DDR3-1333 DIMM $15
  • ✔ SSD: Kingston 256 GB $30
  • ✔ Power supply: Corsair VS650 Watt $45
  • ✔ Case: ATX $10

Budget & Sponsors

Updates

July 7, 2024

✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $110
✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $110

June 12, 2024

✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $107

June 6, 2024

✔ CPU Cooler: 4U Active CPU Heat Sink LGA3647 (Narrow) $63
✔ Ram: 1 x 32GB Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22 $107
✔ SSD: Patriot Viper VP4300 Lite 4TB Gen 4x4 7400-6000 MB/s $262

May 22, 2024

✔ 3 x Case Fan: NZXT AER P - RF-AP140-FP $26
✔ 3 x Case Fan: NZXT AER P - RF-AP140-FP $26
✔ GPU1: Asus ROG Strix RTX3090 O24G (default 3-slot with Liquid Cooler 2-slot) $745

May 20, 2024

✔ CPU: Intel® Xeon® W-3235 Processor (64 Lane PCIe 3.0) (4 GPU Support) $190

May 17, 2024

✔ Motherboard2: Asus Pro WS C621-64L SAGE (4 GPU Support) $634

January 31, 2024

✔ CPU1: Intel® Xeon® W-3235 Processor (64 Lane PCIe 3.0) (4 GPU Support) $197
✔ CPU2: Intel® Xeon® Silver 4110 Processor (48 Lane PCIe 3.0) (3 GPU Support) $33
✔ RAM1: 1 x 32GB (Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22) $95
✔ SSD1: Lexar NM790 4TB M.2 NVMe PCIe Gen 4X4 7400-6500 MB/s $271

January 27, 2024

✔ Power Supply2: Corsair AX1500i 1500 Watt 80+ Titanium $227
✔ Case2: (Antec P20C-W) $89

January 24, 2024

✔ GPU2: Asus ROG Strix RTX3090 O24G-W (default 3-slot with Liquid Cooler 2-slot) $790

January 23, 2024

✔ 🧿 The non-profit GPU cluster is now running instantid.github.io 🥳 (operating with an old motherboard and CPU because our new CPU has not arrived yet)

January 22, 2024

✔ 🧿 The non-profit GPU cluster is now running photo-maker.github.io 🥳 (operating with an old motherboard and CPU because our new CPU has not arrived yet)

January 20, 2024

✔ GPU1: Asus ROG Strix RTX3090 O24G (default 3-slot with Liquid Cooler 2-slot) $747
✔ Power Supply1: Corsair AX1500i 1500 Watt 80+ Titanium $249
✔ CPU Cooler1: 4U Active CPU Heat Sink LGA3647 (Narrow) $78

January 19, 2024

✔ Motherboard1: (Asus Pro WS C621-64L SAGE) $628
✔ RAM1: 1 x 32GB (Micron 32GB DDR4-3200 ECC-RDIMM 1Rx4 CL22) $95
✔ Case1: (Antec P20C-W) $89

Setup

SSH root login

sudo nano /etc/ssh/sshd_config
# in the editor, change "PermitRootLogin prohibit-password" to "PermitRootLogin yes"
sudo systemctl restart ssh
sudo passwd
sudo ufw allow ssh
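
The manual edit in nano could also be scripted. The sketch below is one possible way to do it with sed; it works on a scratch copy with made-up contents rather than the real /etc/ssh/sshd_config, so the file contents and variable names are assumptions for illustration only.

```shell
# Scratch copy standing in for /etc/ssh/sshd_config (contents are illustrative).
cfg="$(mktemp)"
printf '%s\n' 'Port 22' '#PermitRootLogin prohibit-password' > "$cfg"

# Rewrite the PermitRootLogin line, whether or not it is commented out.
sed -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]]+.*$/PermitRootLogin yes/' "$cfg" > "$cfg.new"
mv "$cfg.new" "$cfg"

# Show the resulting setting.
grep '^PermitRootLogin' "$cfg"
```

To apply the same substitution to the real file, point `cfg` at /etc/ssh/sshd_config, run as root, and restart ssh as shown above.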

Ubuntu 22.04.3 LTS

apt update
apt upgrade -y
apt install build-essential software-properties-common zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev -y
apt install wget nvtop python-is-python3 python3-pip aria2 unrar -y

Cuda 12.1.0_530.30.02

lsmod | grep nouveau
cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
sudo update-initramfs -u
sudo reboot

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sh cuda_12.1.0_530.30.02_linux.run
nvidia-smi
nano /etc/ld.so.conf
ldconfig

nano .bashrc
ldconfig
nvcc --version
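
The notes above do not record what was added to /etc/ld.so.conf and .bashrc. A common choice, assuming the runfile installer used its default prefix /usr/local/cuda-12.1 (an assumption, not confirmed by this README), looks like this:

```shell
# Assumed default install prefix for the CUDA 12.1 runfile installer.
CUDA_HOME=/usr/local/cuda-12.1

# Lines one might append to ~/.bashrc so nvcc and the CUDA libraries are found.
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# The matching /etc/ld.so.conf entry is the single path printed below
# (run ldconfig as root after adding it).
echo "$CUDA_HOME/lib64"
```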

nvidia-smi -q | grep -i bar -A 3
lsmod | grep -i nvidia
rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
https://www.nvidia.com/en-us/geforce/news/geforce-rtx-30-series-resizable-bar-support/
https://forums.developer.nvidia.com/t/enabling-resizable-bar-on-rtx-30-series-gpus-in-linux/239950

Python 3.10.12

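# the torch lines below are alternative version sets: run one torch line (plus the xformers line that follows it, if any)
pip install torch==2.3.0+cu121 torchvision==0.18.0+cu121 torchaudio==2.3.0+cu121 torchtext==0.18.0 torchdata==0.7.1 --extra-index-url https://download.pytorch.org/whl/cu121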
pip install xformers==0.0.26.post1
pip install torch==2.2.1+cu121 torchvision==0.17.1+cu121 torchaudio==2.2.1+cu121 torchtext==0.17.1 torchdata==0.7.1 --extra-index-url https://download.pytorch.org/whl/cu121
pip install xformers==0.0.25
pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 torchtext==0.16.0 torchdata==0.7.0 --extra-index-url https://download.pytorch.org/whl/cu121
pip install notebook
pip show torch notebook

Network

nano /etc/netplan/00-installer-config.yaml
netplan apply
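
The contents of the netplan file are not recorded here. As a rough sketch, a static-address configuration might look like the file written below; the interface name, addresses, gateway, and DNS servers are all hypothetical placeholders, and the file goes to a scratch path rather than /etc/netplan.

```shell
# Write an example netplan file to a scratch path instead of /etc/netplan.
out="$(mktemp)"
cat > "$out" <<'EOF'
network:
  version: 2
  ethernets:
    eno1:                     # hypothetical interface name
      dhcp4: false
      addresses:
        - 192.168.1.50/24     # hypothetical static address
      routes:
        - to: default
          via: 192.168.1.1    # hypothetical gateway
      nameservers:
        addresses: [1.1.1.1, 8.8.8.8]
EOF
cat "$out"
```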

wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
dpkg -i cloudflared-linux-amd64.deb
cloudflared service install TOKEN_HERE

Docker

https://docs.docker.com/engine/install/ubuntu/
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit
systemctl restart docker
https://github.com/jupyter/docker-stacks

https://github.com/camenduru/docker-stacks-foundation
https://github.com/camenduru/base-notebook
https://quay.io/repository/camenduru/docker-stacks-foundation
https://quay.io/repository/camenduru/base-notebook
docker build -t base-notebook .

Ubuntu 22.04 Python 3.10.11
timeout 4h docker container run -it --rm --gpus all -u root -e GRANT_SUDO=yes -p 1000:7860 quay.io/camenduru/base-notebook:latest
timeout 4h docker container run -it --rm --gpus device=0 -u root -e GRANT_SUDO=yes -p 1000:7860 registry.hf.space/camenduru-base-notebook:latest
timeout 4h docker container run -it --rm --gpus device=1 -u root -e GRANT_SUDO=yes -p 1000:7860 camenduru/base-notebook:latest
docker system prune -a

docker cp /content/test.rar 76e35c4a6e8f:/home/jovyan/test.rar
docker exec -it 76e35c4a6e8f bash
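
The run commands above differ only in GPU device and image. A small helper that assembles the command for a given device and host port is sketched below; `gpu_session_cmd` is a hypothetical name, and using a different host port per GPU is an assumption made so that two sessions can run at once (the commands above all map host port 1000).

```shell
# gpu_session_cmd DEVICE HOST_PORT [IMAGE]
# Prints (does not run) the docker command for one time-limited GPU session.
gpu_session_cmd() {
  device="$1"
  host_port="$2"
  image="${3:-quay.io/camenduru/base-notebook:latest}"
  echo "timeout 4h docker container run -it --rm --gpus device=${device} -u root -e GRANT_SUDO=yes -p ${host_port}:7860 ${image}"
}

gpu_session_cmd 0 1000
gpu_session_cmd 1 1001
```

Printing the command first makes it easy to review before running it, for example with `eval "$(gpu_session_cmd 0 1000)"` inside a tmux session.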

Other

tmux ls
tmux a
tmux attach-session -t 0
tmux capture-pane -pS - > ~/tmux-buffer.txt

wget https://openrgb.org/releases/release_0.9/openrgb_0.9_amd64_bookworm_b5f46e3.deb
dpkg -i openrgb_0.9_amd64_bookworm_b5f46e3.deb
apt --fix-broken install
openrgb -m off

find / -type f -exec du -h {} + | sort -rh | head -n 20

git clone https://github.com/aristocratos/btop
cd btop
make
make install PREFIX=/usr

https://github.com/raboof/nethogs

Jupyter

https://github.com/jupyterlab/jupyterlab

mkdir /content
nano /etc/systemd/system/jupyter-lab.service
systemctl daemon-reload
systemctl start jupyter-lab
systemctl enable jupyter-lab
systemctl list-unit-files --type=service --state=enabled
pip install pickleshare ipywidgets
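
The unit file edited with nano above is not shown in these notes. A minimal sketch of what /etc/systemd/system/jupyter-lab.service could contain follows; the binary path, port, and flags are assumptions, it presumes JupyterLab is installed, and the file is written to a scratch path so the snippet is safe to run.

```shell
# Write an example unit to a scratch path instead of /etc/systemd/system.
unit="$(mktemp)"
cat > "$unit" <<'EOF'
[Unit]
Description=JupyterLab
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/content
# Assumed binary location and options; adjust to the actual install.
ExecStart=/usr/local/bin/jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
cat "$unit"
```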

OpenVSCode

https://github.com/coder/code-server

curl -fsSL https://code-server.dev/install.sh | sh
nano /etc/systemd/system/default.target.wants/[email protected]
systemctl enable --now code-server@$USER
systemctl status code-server@$USER
systemctl enable code-server@$USER
systemctl list-unit-files --type=service --state=enabled

nano /root/.config/code-server/config.yaml
bind-addr: 0.0.0.0:8080
auth: none
disable-telemetry: true
cert: false

chrome://flags/#unsafely-treat-insecure-origin-as-secure

Web

apt install openjdk-21-jdk

https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-ubuntu/

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" 

nvm install 18.20.3
npm install -g generator-jhipster
npm install -g yo

./mvnw
npm start

sftp [email protected]
get -r /content/folder /content/folder

mongodump --db=web --out=/content/folder/db
mongorestore --db=web /content/folder/db/web
pip install pymongo

journalctl -u dispatcher-web-name.service
systemctl list-units --type=service --state=running

FFmpeg

https://docs.nvidia.com/video-technologies/index.html

!mkdir /content/ffmpeg
%cd /content/ffmpeg
!git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
%cd nv-codec-headers
!sudo make install
%cd /content/ffmpeg
!git clone https://git.ffmpeg.org/ffmpeg.git /content/ffmpeg/ffmpeg
%cd /content/ffmpeg/ffmpeg
!sudo apt-get install build-essential yasm cmake libtool libc6 libc6-dev unzip wget libnuma1 libnuma-dev pkg-config -y
!./configure --enable-nonfree --enable-cuda-nvcc --enable-nvenc --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --disable-static --enable-shared --enable-libmp3lame
!make -j 24
!make install
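
After the build, one way to confirm that the NVENC encoders were compiled in is to list the encoders and filter for "nvenc". The helper name `nvenc_check` below is illustrative; in a notebook cell the call would need the same `!` prefix as the commands above.

```shell
# nvenc_check
# Prints the NVENC encoders reported by ffmpeg, or a short message if none.
nvenc_check() {
  if command -v ffmpeg >/dev/null 2>&1; then
    ffmpeg -hide_banner -encoders 2>/dev/null | grep -i nvenc \
      || echo "no NVENC encoders found"
  else
    echo "ffmpeg not on PATH"
  fi
}

nvenc_check
```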
