This repository contains the code that uses the hardware ray tracing capability provided by Nvidia's RT cores to accelerate neighbor search in low-dimensional space (lower than 3D), which is prevalent in engineering and science fields (e.g., computational fluid dynamics, graphics, vision), because they deal with physical data such as particles and surface samples that inherently reside in the 2D/3D space.
While RT cores are designed (and optimized) for ray tracing, we show how to map neighbor search to the ray tracing hardware to achieve significant speedups (an order of magnitude) over traditional GPU (CUDA) implementations. The code is primarily developed using the OptiX programming interface (for ray tracing) and also uses CUDA to parallelize many non-ray tracing helper functions. The technical aspects of the code are discussed in this PPoPP 2022 paper. The performance of RTNN has been much improved since the publication of the paper.
Two types of neighbor search exist: fixed-radius search (a.k.a., range search) and K nearest neighbor search (KNN). RTNN optimizes for both types. For both types of search we assume a search interface that provides a search radius r
and a maximum neighbor count K
, consistent with the interface of many existing neighbor search libraries. We could emulate an unbounded KNN search by providing a very large r
and emulate an unbounded range search by providing a very large K
.
Why do we need a search radius (even in KNN searches)? In practical applications the returned neighbors are usually bounded by a search radius, beyond which the neighbors are discarded. This is because the significance of a neighbor (e.g., the force that a particle exerts on another) is minimal and of little interest when it is too far away.
Why do we need to bound K
(even in range searches)? In practical applications the maximum amount of returned neighbors is bounded in order to bound the memory consumption and to interface with downstream tasks, which usually expect a fixed amount of neighbors.
- NVCC. Tested on 11.3.109 and 11.4.48.
- CMake. Tested on 3.20.5.
- g++. Tested on 7.5.0.
- Thrust. Tested on v10.11.00.
- A RTX-capable GPU (Turing architecture and later) from Nvidia. Tested on RTX 2080 and RTX 2800 Ti.
You do not have to install the OptiX SDK yourself. The code is developed using the SDK as a template and includes all the necessary headers. For reference, the particular OptiX SDK used is 7.1.0.
include
: headers needed for OptiX. Copied from the OptiX SDK 7.1 without modifications.
src
:
optixNSearch/
: the main source code.sutil/
: the utility library from the OptiX SDK. We keep only those that are actually used in this project.CMakeLists.txt
: the usual cmake file.CMake/
: contains a bunch of.cmake
files that are used by `CMakeLists.txt to find libraries, etc. This is also copied from the OptiX SDK without any change.samplepc.txt
: a sample point cloud file illustrating the input file format.memstatlib/
: this builds a dynamic library that tracks memory allocation in CUDA. It's not needed for the main RTNN code. See the readme file there for more information.