Eugene Park

Improve ball_query() runtime for large-scale cases (#2006)

Summary:

Overview

The current C++ code for pytorch3d.ops.ball_query() performs a floating-point multiplication for every coordinate of every pair of points (until the maximum number of neighbor points is reached). This PR modifies the code (both the CPU and CUDA versions) to implement the idea presented here: a D-cube is first constructed around the D-ball, and any point pairs falling outside the cube are skipped without explicitly computing their squared distances. This change is especially useful when the dimension D and the number of points P2 are large and the radius is small relative to the extent of the space occupied by the point clouds. A speedup of up to ~2.5x (CPU; ~1.8x for CUDA) is observed when D = 10 and radius = 0.01. In all benchmark cases, points were uniformly randomly distributed inside a unit D-cube.
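The cube pre-check never drops a true neighbor: if |q_d - p_d| > r for any coordinate d, then the squared distance already exceeds r^2, so the point lies outside the ball. The following is a minimal pure-Python sketch of that loop structure, not the PyTorch3D C++/CUDA source. The function name `ball_query_with_cube_check`, the strict `< radius**2` test and the `-1` padding of unused slots are assumptions made for illustration. It only shows the logic. The actual speedup comes from the compiled kernels, where comparisons that exit early replace D multiplications per rejected pair.

```python
from typing import List, Sequence


def ball_query_with_cube_check(
    p1: Sequence[Sequence[float]],
    p2: Sequence[Sequence[float]],
    K: int,
    radius: float,
) -> List[List[int]]:
    """Hypothetical sketch: ball query with a D-cube rejection test.

    For each query point in p1, return up to K indices into p2 of points
    whose squared distance is below radius**2, padded with -1 (assumed).
    """
    radius2 = radius * radius
    result = []
    for p in p1:
        # Axis-aligned D-cube [p - r, p + r] that encloses the D-ball around p.
        lo = [c - radius for c in p]
        hi = [c + radius for c in p]
        idx = []
        for j, q in enumerate(p2):
            if len(idx) == K:
                break  # stop scanning once K neighbors have been found
            # Cheap rejection using comparisons only; all() stops at the
            # first coordinate that falls outside the cube.
            if not all(l <= x <= h for l, x, h in zip(lo, q, hi)):
                continue
            # Only candidates inside the cube pay for the squared distance.
            dist2 = sum((a - b) * (a - b) for a, b in zip(p, q))
            if dist2 < radius2:
                idx.append(j)
        result.append(idx + [-1] * (K - len(idx)))
    return result


# 2-D example: one query at the origin, radius 0.5, K = 3.
p1 = [[0.0, 0.0]]
p2 = [
    [0.1, 0.1],   # inside the ball
    [0.4, 0.4],   # inside the cube but outside the ball (dist2 = 0.32)
    [2.0, 0.0],   # rejected by the cube test, no multiplications done
    [-0.3, 0.2],  # inside the ball
]
print(ball_query_with_cube_check(p1, p2, K=3, radius=0.5))  # [[0, 3, -1]]
```

The second point shows why the exact distance check is still needed after the cube test: the cube is only a conservative filter. In high dimensions the cube is much larger than the ball, but for small radii almost all candidates are still rejected after the first one or two comparisons.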

The benchmark code differs from tests/benchmarks/bm_ball_query.py (only the forward pass is benchmarked, and larger input sizes are used) and is stored in tests/benchmarks/bm_ball_query_large.py.

Average time comparisons

[Figures: average-time plots for CPU and CUDA with D = 3 and D = 10, radius = 0.01 and 0.10]

Peak time comparisons

[Figures: peak-time plots for CPU and CUDA with D = 3 and D = 10, radius = 0.01 and 0.10]

Full benchmark logs

benchmark-before-change.txt, benchmark-after-change.txt

Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/2006

Reviewed By: shapovalov

Differential Revision: D85356394

Pulled By: bottler

fbshipit-source-id: 9b3ce5fc87bb73d4323cc5b4190fc38ae42f41b2
