Performance discrepancy and low Tensor Core utilization on RTX 4090 for Mip-NeRF360 (bicycle) #7

@GitHubTheSun

This is a very impressive piece of work. Leveraging Tensor Cores to alleviate the computation bottleneck of the exp function is a highly effective idea, and it clearly improves the rendering performance of 3D Gaussian Splatting.

I have deployed the project on an RTX 4090 and evaluated it on the bicycle scene from the Mip-NeRF360 dataset, using the .ply file provided by the original 3DGS project. Due to GPU memory constraints, I ran the test with the command-line argument --resolution 2, i.e., rendering at half resolution.
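To make the resolution setting concrete, here is a minimal sketch of how a per-axis downsampling factor changes the number of rendered pixels. It assumes that a factor of N divides both width and height by N. The full-resolution size below is an assumed placeholder for the bicycle images and should be replaced with the real dimensions; `downsampled_size` is a hypothetical helper, not part of the project.

```python
def downsampled_size(width, height, factor):
    """Return (width, height, pixel_count) after dividing each axis by `factor`."""
    w = round(width / factor)
    h = round(height / factor)
    return w, h, w * h


# Assumed full-resolution size (placeholder; substitute the actual image size).
FULL_W, FULL_H = 4946, 3286

for factor in (1, 2, 4):
    w, h, pixels = downsampled_size(FULL_W, FULL_H, factor)
    share = pixels / (FULL_W * FULL_H)
    print(f"factor {factor}: {w}x{h}, {share:.1%} of full-resolution pixels")
```

Under this assumption, a factor of 2 renders about a quarter of the full-resolution pixels and a factor of 4 about a sixteenth, so FPS figures are only comparable when the factor matches.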

Under this configuration, I measured about 312 FPS, roughly a third lower than the ~465 FPS reported in the paper for the Mip-NeRF360 dataset. In addition, Nsight Systems profiling shows that Tensor Core utilization inside the renderCUDA_TC kernel is only about 10%, which seems lower than expected.
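For reference, the size of the gap can be expressed as per-frame time and relative difference. This is a small sketch using only the two FPS figures quoted above; the helper names are hypothetical.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def relative_gap(measured_fps, reference_fps):
    """Fraction by which the measured FPS falls short of the reference FPS."""
    return 1.0 - measured_fps / reference_fps


measured, reported = 312.0, 465.0  # my RTX 4090 run vs. the figure reported in the paper

print(f"measured: {frame_time_ms(measured):.2f} ms/frame")
print(f"reported: {frame_time_ms(reported):.2f} ms/frame")
print(f"shortfall: {relative_gap(measured, reported):.1%}")
```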

To better align the evaluation settings, I would like to clarify the following two points:

  1. Data and resolution settings used during inference
    I noticed that in full_eval.py, during training on the Mip-NeRF360 dataset, outdoor scenes are configured with --image4 (4× downsampling), while during inference no explicit resolution downsampling appears to be applied. Therefore, I would like to confirm:

    • whether the inference results reported in the paper use newly trained .ply files or the .ply files provided by the original 3DGS project;
    • what the actual rendering resolution setting is for the bicycle scene during inference.
  2. Impact of hardware differences
    If my resolution setting is consistent with your inference setup, another possible explanation is hardware differences. The paper reports results obtained on an A800, whereas my experiments were run on an RTX 4090. Could the observed performance gap and low Tensor Core utilization stem mainly from differences in memory bandwidth or other architectural factors between these two GPUs?

Thank you very much for your time and for this excellent work. I would greatly appreciate any insights or guidance.
