A fast, minimalistic, open-source (Apache 2.0 License) implementation of the 3D Gaussian Splatting rasterizer as forward/backward CUDA kernels. The forward pass is our original work, and the backward pass is based on nerfstudio's gsplat implementation. We use the same API as the vanilla graphdeco-inria 3D Gaussian Splatting implementation, so the original render calls can be replaced simply by swapping the import.
- Get fast open-source forward/backward kernels
- Install from repository
- How to switch to open-source KNN
- Benchmarks
A fast, open-source, easy-to-use replacement for those using the vanilla graphdeco-inria 3D Gaussian Splatting implementation, whose license restricts it to non-commercial use.
- Forward and backward CUDA calls
- Fast, minimalistic and open-source
- Easy to integrate, compatible with vanilla graphdeco-inria API
- Native CPP (Thrust) and Torch I/O API
Follow this step if you already use the vanilla graphdeco-inria 3D Gaussian Splatting implementation in your project and want to replace its forward/backward kernels with open-source ones.
Make sure a CUDA compiler is installed in your environment, then install the package:
pip install ds-splat
You are good to go just by swapping imports:
- from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer
+ from ds_splat import GaussianRasterizationSettings, GaussianRasterizer
After swapping to our code, you will keep 3D Gaussian Splatting functionality (backward and forward passes) and you will use open-source code. If you also want to use open-source code for the KNN step in preprocessing, scroll down!
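If a project has to run in environments where ds-splat may or may not be installed, the import swap can be made conditional. The sketch below is only an illustration: the helper name `first_importable` is hypothetical, and the commented usage assumes that the package is importable as `ds_splat`.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This backend is not installed; try the next candidate.
            continue
    raise ImportError(f"None of these modules could be imported: {list(module_names)}")


# Hypothetical usage: prefer the open-source kernels, fall back to the vanilla ones.
# backend = first_importable(["ds_splat", "diff_gaussian_rasterization"])
# GaussianRasterizationSettings = backend.GaussianRasterizationSettings
# GaussianRasterizer = backend.GaussianRasterizer
```

Because both backends expose the same two class names, the rest of the rendering code does not need to know which one was loaded.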
If you are starting a project from scratch and want an end-to-end environment, we recommend our integration into the gaussian-splatting-lightning repository. That repository is under the MIT License, but submodules such as the vanilla forward/backward kernels and the KNN implementation have licenses that restrict commercial use. You can use deepsense ds-splat as a backend instead and get fast, open-source forward/backward kernel calls.
Instead of installing from PyPI, you can install the ds-splat package directly from this repository.
Run pip install in the project's root directory:
pip install .
This compiles the CUDA and C++ code and installs the ds-splat package.
Building from scratch is more manual, and you can skip it if you installed from PyPI or with the pip install above.
If you prefer to build the project from scratch, follow the instructions here.
This project uses Conan for additional dependencies such as Catch2. To generate the CMake project, run:
cd cuda_rasterizer # make sure you are in the root directory
conan install . -of=conan/x86_reldebug --settings=build_type=RelWithDebInfo --profile=default
mkdir build_cpp; cd build_cpp
cmake -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` -DCMAKE_TOOLCHAIN_FILE=../conan/x86_reldebug/build/RelWithDebInfo/generators/conan_toolchain.cmake -DBUILD_TESTING=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
If you run into runtime exceptions (e.g. std::bad_alloc) or link errors, edit your Conan profile to use a specific ABI.
The following conanfile was tested:
If you are using, for example, the gaussian-splatting-lightning repository, the forward/backward CUDA kernels and the KNN are under the Gaussian-Splatting License. Once you switch to our code following the instructions above, you use our open-source forward and backward calls. Here we explain how to also use an open-source KNN implementation via Faiss. These instructions replace the KNN implementation in the gaussian-splatting-lightning repository. For example, if you are using conda, install in your environment:
conda install -c pytorch -c nvidia -c rapidsai -c conda-forge faiss-gpu-raft=1.8.0
- locate the file that contains class GaussianModel
- import faiss
import faiss
- add method for averaged distances
    def _get_averaged_distances(
        self,
        pcd_points_np: np.ndarray,
        method: str = "CPU_approx",
        device_id: int = 0,
        k: int = 4,
        dim: int = 3,
        nlist: int = 200,
    ) -> np.ndarray:
        """
        Take a numpy array of points and return, for each query point, the
        distance averaged over its k-nearest neighbours (excluding the query
        point itself). Database/reference points and query points are the
        same set. Uses Faiss as a backend.

        Args:
            pcd_points_np: pcd points as a numpy array
            method: how Faiss creates the index and which device does the
                calculation. {"CPU", "GPU", "CPU_approx", "GPU_approx"}
            device_id: GPU device id
            k: k-nearest neighbours (including self)
            dim: dimensionality of the dataset. 3 by default.
            nlist: the number of clusters or cells in the inverted file (IVF)
                structure when using an IndexIVFFlat index. Only relevant for
                approximated methods.

        Returns:
            numpy array with the mean over the k-nearest neighbours
            (except self) for each query point
        """
        valid_index_types = {"CPU", "GPU", "CPU_approx", "GPU_approx"}
        pcd_points_float_32 = pcd_points_np.astype(np.float32)

        if method == "CPU":
            index = faiss.IndexFlatL2(dim)
        elif method == "GPU":
            res = faiss.StandardGpuResources()
            index = faiss.GpuIndexFlatL2(res, dim)
        elif method == "CPU_approx":
            quantizer = faiss.IndexFlatL2(dim)  # the other index
            index = faiss.IndexIVFFlat(quantizer, dim, nlist)
        elif method == "GPU_approx":
            res = faiss.StandardGpuResources()
            # the other index. Must be CPU as nested GPU indexes are not supported
            quantizer = faiss.IndexFlatL2(dim)
            index = faiss.index_cpu_to_gpu(res, device_id, faiss.IndexIVFFlat(quantizer, dim, nlist))
        else:
            raise ValueError(f"Invalid index_type. Expected one of {valid_index_types}, but got {method}.")

        # IVF indexes must be trained before points can be added
        if method in {"CPU_approx", "GPU_approx"}:
            index.train(pcd_points_float_32)

        index.add(pcd_points_float_32)
        # Query the index with the same points; column 0 is the query point itself
        D, _ = index.search(pcd_points_float_32, k)
        D_mean = np.mean(D[:, 1:], axis=1)
        return D_mean
- locate the create_from_pcd(...) method and modify it.
Replace lines:
- dist2 = torch.clamp_min(distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda()), 0.0000001).to(deivce)
+ dist_means_np = self._get_averaged_distances(pcd_points_np=np.asarray(pcd.points), method="CPU_approx")
+ dist2 = torch.clamp_min(torch.tensor(dist_means_np), 0.0000001).to(deivce)
This way the KNN step no longer depends on a submodule with a non-commercial license (the distCUDA2 method) and is fully open source.
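To sanity-check the Faiss result on a small point cloud, a brute-force reference can help. The sketch below is a pure-Python illustration and not part of ds-splat; the function name `mean_knn_sq_distances` is hypothetical. It assumes, as with Faiss's `IndexFlatL2`, that distances are squared L2 and that the query point itself is excluded from the mean.

```python
def mean_knn_sq_distances(points, k=4):
    """For each point, average the squared L2 distance to its k - 1 nearest
    neighbours, excluding the point itself (k counts the point, as above)."""
    if k < 2 or len(points) < 2:
        raise ValueError("Need k >= 2 and at least two points.")
    means = []
    for i, p in enumerate(points):
        # Squared distances from point i to every other point, smallest first.
        sq_dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q))
            for j, q in enumerate(points)
            if j != i
        )
        nearest = sq_dists[: k - 1]
        means.append(sum(nearest) / len(nearest))
    return means


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
reference = mean_knn_sq_distances(points, k=4)
```

This runs in quadratic time, so it is only suitable for checking a few hundred points; the approximate IVF methods may differ slightly from it by design.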
We ran a series of benchmarks comparing the inference runtime of the deepsense implementation (ours) with the vanilla graphdeco-inria 3D Gaussian Splatting implementation and nerfstudio's gsplat implementation (version 0.1.12). All tests were run in the gaussian-splatting-lightning environment.
The plots below show inference time in ms, measured over 120 frames of a fly-through of a scene that zooms out to capture all Gaussians: 6.1M Gaussians rendered at 1920x1080 on an NVIDIA 4070 Laptop GPU, and 5.8M Gaussians rendered at 3840x2160 on an NVIDIA 3090 GPU.
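Per-frame timings like these can be collected with a small harness. The sketch below is an assumption about how such a measurement could look, not the script used for these benchmarks; `time_frames`, `render_frame`, `cameras`, and `sync` are hypothetical names. GPU kernels run asynchronously, so when timing CUDA code a synchronization call such as `torch.cuda.synchronize` should be passed as `sync`.

```python
import time


def time_frames(render_frame, cameras, sync=None):
    """Return the wall-clock time in milliseconds of render_frame(camera)
    for each camera. sync, if given, is called before and after each frame."""
    timings_ms = []
    for camera in cameras:
        if sync is not None:
            sync()  # make sure earlier work has finished before starting the clock
        start = time.perf_counter()
        render_frame(camera)
        if sync is not None:
            sync()  # wait for this frame's work to finish before stopping the clock
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


# Stand-in workload for illustration; a real run would call the rasterizer.
timings = time_frames(lambda camera: sum(range(1000)), range(120))
mean_ms = sum(timings) / len(timings)
```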
For trained scenes, we also compared the PSNR (Peak Signal-to-Noise Ratio) of the deepsense and gsplat implementations, using vanilla as ground truth. With the vanilla graphdeco-inria implementation, we rendered images while flying through a scene and treated them as ground truth. With the deepsense and gsplat implementations, we rendered the scene from the same camera positions and compared the results to the vanilla renders. This test shows how close each implementation is to vanilla. Some details are implementation-specific and lead to slightly different outcomes, but both implementations achieve very good PSNR. Higher PSNR is better.
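For reference, PSNR between a ground-truth render and a test render is 10 * log10(MAX^2 / MSE), where MSE is the mean squared pixel error. The sketch below is a minimal pure-Python illustration on flat lists of pixel values; the function name `psnr` and the assumption that pixel values lie in [0, 1] are ours, not taken from the benchmark code.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("Inputs must be non-empty and have the same length.")
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01, i.e. about 20 dB.
example_db = psnr([0.0, 0.5], [0.1, 0.6])
```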
Download more benchmark plots from GDrive.