Yang-gon kim
This research modifies the existing ray-tracing (RT) units so that they support more general search algorithms, and it offers a much simpler programming interface. It is a smart way to improve the current NVIDIA and AMD architectures. However, the approach requires detailed information about the baseline RT units. The authors also evaluate it on a broad set of benchmarks.
To understand this paper, we need some broader background, so I read another paper [1].
There are two variants of neighbor search: fixed-radius search and k-nearest neighbor (KNN) search, which returns at most k neighbors. KNN search also limits the search space to some maximum radius, because there is no need to expand the search space too far.
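The difference between the two variants can be shown with a brute-force sketch. This is only an illustration, not the method of either paper: the function names, the sample points and the `max_radius` parameter are assumptions made for the example.

```python
import heapq
import math

def fixed_radius_search(points, query, radius):
    """Return the indices of all points within `radius` of `query`."""
    return [i for i, p in enumerate(points) if math.dist(p, query) <= radius]

def knn_search(points, query, k, max_radius):
    """Return the indices of at most k nearest points within `max_radius`."""
    in_range = [
        (math.dist(p, query), i)
        for i, p in enumerate(points)
        if math.dist(p, query) <= max_radius
    ]
    # nsmallest orders by distance first, then by index on ties.
    return [i for _, i in heapq.nsmallest(k, in_range)]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = (0.0, 0.0)

print(fixed_radius_search(points, query, 1.5))  # [0, 1]
print(knn_search(points, query, 3, 2.5))        # [0, 1, 2]
print(knn_search(points, query, 3, 1.5))        # [0, 1]
```

The last call shows why KNN returns "at most" k neighbors: with a small maximum radius, fewer than k points may qualify.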

The data structure for this search is the Bounding Volume Hierarchy (BVH). The colored objects are primitives, and each primitive is enclosed by an Axis-Aligned Bounding Box (AABB). These bounding boxes are organized into a tree.
The purpose of the algorithm is to find the primitives closest to the ray, which makes it a tree traversal problem. The benefit is pruning: if the ray does not intersect a node's AABB, the entire subtree beneath that node can be skipped.
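The pruning idea can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware algorithm: the `Node` class, the hand-built tree and the `visited` counter are hypothetical, and each leaf's primitive is treated as if it were its own bounding box.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                                   # AABB minimum corner
    hi: tuple                                   # AABB maximum corner
    children: list = field(default_factory=list)
    prim: object = None                         # primitive id, leaves only

def ray_hits_aabb(origin, direction, lo, hi):
    """Slab test: return the entry distance t >= 0, or None on a miss."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < l or o > h:                  # parallel to and outside the slab
                return None
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near

def closest_hit(root, origin, direction):
    """Return ((t, prim) or None, number of nodes visited)."""
    best, visited, stack = None, 0, [root]
    while stack:
        node = stack.pop()
        visited += 1
        t = ray_hits_aabb(origin, direction, node.lo, node.hi)
        if t is None:
            continue                            # miss: skip the whole subtree
        if node.prim is None:
            stack.extend(node.children)
        elif best is None or t < best[0]:
            best = (t, node.prim)
    return best, visited

leaves = [
    Node((1.0, -1.0, -1.0), (2.0, 1.0, 1.0), prim="A"),
    Node((4.0, -1.0, -1.0), (5.0, 1.0, 1.0), prim="B"),
    Node((1.0, 5.0, -1.0), (2.0, 6.0, 1.0), prim="C"),
    Node((4.0, 5.0, -1.0), (5.0, 6.0, 1.0), prim="D"),
]
left = Node((1.0, -1.0, -1.0), (5.0, 1.0, 1.0), leaves[:2])
right = Node((1.0, 5.0, -1.0), (5.0, 6.0, 1.0), leaves[2:])
root = Node((1.0, -1.0, -1.0), (5.0, 6.0, 1.0), [left, right])

print(closest_hit(root, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))  # ((1.0, 'A'), 5)
```

The tree has 7 nodes but only 5 are visited: the ray misses the right child's box, so leaves C and D are never tested.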

In the OptiX programming model, the procedure is as follows: {BVH build, ray generation shader (RG), traversal (TL) and ray-AABB intersection test, intersection shader (IS), any-hit shader (AH), closest-hit shader (CH)}. The yellow boxes are programmable shader programs, which make it possible to use the RT hardware for other purposes.
OptiX provides a “Single Instruction Multiple Rays” execution model. Not all stages are executed on the RT cores: only traversal and the ray-AABB intersection tests run there, and the other stages run on the CUDA cores. This is possible because the RT cores and CUDA cores share the memory hierarchy. The RT cores are therefore simply dedicated execution units inside the SMs.
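The order in which these stages call each other can be modeled with a toy, one-dimensional sketch. This is not the OptiX API: `launch`, `overlaps` and the three user callbacks are hypothetical names, the BVH is replaced by a flat list of intervals, and the rule that primitive 0 is ignored in `any_hit` is only there to show that this stage can reject a hit.

```python
def ray_generation(queries, t_max):
    """RG: turn each query position into a ray (origin, max length) along +x."""
    return [(x, t_max) for x in queries]

def overlaps(ray, box):
    """Stand-in for traversal and the fixed-function ray-AABB test (1D)."""
    origin, t_max = ray
    lo, hi = box
    return hi >= origin and lo <= origin + t_max

def launch(queries, t_max, boxes, intersection, any_hit, closest_hit):
    results = []
    for ray in ray_generation(queries, t_max):
        accepted = []
        for prim_id, box in enumerate(boxes):
            if not overlaps(ray, box):          # RT-core work in the real model
                continue
            t = intersection(ray, prim_id)      # IS: exact primitive test
            if t is not None and any_hit(ray, prim_id, t):  # AH: accept or reject
                accepted.append((t, prim_id))
        # CH runs once per ray, on the nearest accepted hit.
        results.append(closest_hit(ray, *min(accepted)) if accepted else None)
    return results

# User-supplied "shaders": primitives are points wrapped in small boxes.
centers = [2.0, 4.0, 9.0]
boxes = [(c - 0.5, c + 0.5) for c in centers]

def intersection(ray, prim_id):
    origin, t_max = ray
    t = centers[prim_id] - origin
    return t if 0.0 <= t <= t_max else None

def any_hit(ray, prim_id, t):
    return prim_id != 0                         # illustrative filter only

def closest_hit(ray, t, prim_id):
    return prim_id

print(launch([0.0, 5.0], 5.0, boxes, intersection, any_hit, closest_hit))  # [1, 2]
```

Only `overlaps` corresponds to work done on the RT cores; everything passed in as a callback corresponds to code that runs on the CUDA cores.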
In the RT unit hardware, there is a dedicated stack (FIFO) that loads node data sequentially. The RT unit has a 9-stage pipeline, with multiplexers and registers between the stages.
The current RT cores are tied too closely to specific graphics APIs, so they cannot be applied to more general classes of hierarchical search algorithms. As in the early days of graphics programming, programmers have to reformulate their problems in terms of the OptiX API, which limits a program to a fixed set of graphics shaders and data structures. The ray-tracing unit therefore needs to be extended to support additional computations, such as approximate nearest neighbor search and B-tree key-value store indexes with high-dimensional features.