In: Panagiotis Ritsos, Kai Xu (editor). Computer Graphics & Visual Computing (CGVC) 2020. Computer Graphics and Visual Computing (CGVC-2020) September 10-11 London England United Kingdom The Eurographics Association 2020.
Nearest neighbor search algorithms on GPUs have been improving for years. Starting with tree-based approaches in the middle 70’s, state-of-the-art methods use hash-based or grid-based methods. Leveraging high-performance hardware functionality decreases runtime of these search algorithms. Furthermore, memory consumption has been decreased significantly as well using Shared Memory. In the scope of these enhancements, particles have been reordered by different constraints that simplify neighbor processing. However, inspecting the existing algorithms reveals underused capabilities caused by algorithm desing. Exploiting these capabilities in a smart way can increase occupancy and efficiency on GPUs. In this paper, we present a neighbor processing approach that is based on dynamic load balancing. We rely on a lightweight workload-analysis phase that is applied during neighbor processing to distribute work throughout all warps in a thread group on-the-fly. In different domains, the neighbor function is often symmetric and, thus, commutative in each argument. In contrast to prior work, we use this domain knowledge to reduce the number of memory accesses considerably. Measurements of the newly introduced features on our evaluation scenarios show a comparable runtime performance to state-of-the-art methods. Increasing the overall workload by processing million-particle domains leads to significant improvements in terms of runtime. At the same time, we minimize global memory consumption to enable more particles to be processed compared to current approaches.