[Docs]Rework Bring Your Own Codegen tutorial and add TensorRT example by tlopex · Pull Request #19839 · apache/tvm

tlopex · 2026-06-19T05:47:40Z

To resolve #19682, this PR reworks the BYOC tutorial into two parts driven by one shared model:

- "How BYOC works": run a single conv2d+relu through the same FuseOpsByPattern -> MergeCompositeFunctions -> RunCodegen flow on both the example NPU (a stub, so only the output shape is checked) and TensorRT (real, cross-checked against a CPU build), so the only thing that varies is the backend. partition_for_tensorrt is shown as the one-line wrapper for the first two passes, with the bind_constants / stub-vs-real / shape-vs-value contrasts side by side. Add an FP16 example via the relax.ext.tensorrt.options pass config and a summary table; drop the redundant second NPU section.
- "Deploying a PyTorch model with TensorRT": take a real torch.nn.Module through torch.export -> from_exported_program -> partition_for_tensorrt -> build for CUDA -> run, cross-checking the GPU output against PyTorch.
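The three-pass flow named in the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the tutorial's code: the pattern name `tensorrt.conv2d_relu` and the helper `offload_conv2d_relu` are hypothetical, the value of `bind_constants` appropriate to each backend is what the tutorial contrasts and is not reproduced here, and `RunCodegen` only succeeds on a TVM build that has the matching external codegen enabled.

```python
# Hypothetical composite name; the prefix before "." is assumed to select the codegen.
PATTERN_NAME = "tensorrt.conv2d_relu"


def offload_conv2d_relu(mod, bind_constants=True):
    """Run the three BYOC passes on a Relax IRModule and return the result.

    `bind_constants` is passed straight to FuseOpsByPattern; its default here
    mirrors the pass's own default, not a per-backend recommendation.
    """
    # Imported lazily so this sketch can be loaded without TVM installed.
    from tvm import relax
    from tvm.relax.dpl import is_op, wildcard

    # Match relu(conv2d(data, weight)) with arbitrary inputs.
    conv = is_op("relax.nn.conv2d")(wildcard(), wildcard())
    pattern = is_op("relax.nn.relu")(conv)

    passes = [
        # 1. Wrap each match in a composite function named PATTERN_NAME.
        relax.transform.FuseOpsByPattern(
            [(PATTERN_NAME, pattern)], bind_constants=bind_constants
        ),
        # 2. Group composite functions for the same backend into one region.
        relax.transform.MergeCompositeFunctions(),
        # 3. Hand each region to the backend's registered external codegen.
        relax.transform.RunCodegen(),
    ]
    for run_pass in passes:
        mod = run_pass(mod)
    return mod
```

Keeping the backend choice inside the pattern name is what lets the same three calls serve both the stub and the real backend in a sketch like this.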

This PR also fixes two stale references in the example NPU backend: the README and the runtime's \file docstring pointed at src/runtime/contrib/example_npu/, but the file lives under src/runtime/extra/contrib/example_npu/. It also rewords the README's "Memory constraint checking: Validates tensor sizes" bullet, since _check_npu_memory_constraints and _check_npu_quantization are explicit placeholders that return True.

Validated end-to-end on a CUDA GPU with TensorRT 10: the example NPU, TensorRT, FP16, and PyTorch-deployment cells all run and match their references.

gemini-code-assist

Code Review

This pull request significantly updates the Bring Your Own Codegen (BYOC) tutorial to cover both the mock 'example NPU' backend and a real production backend (NVIDIA TensorRT), including an end-to-end example of deploying a PyTorch model. Related documentation and paths are also updated. The feedback suggests using tempfile.TemporaryDirectory() as a context manager instead of tempfile.mkdtemp() to ensure proper cleanup of temporary files and prevent disk pollution.
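The cleanup behaviour behind this suggestion can be illustrated with the standard library alone. The sketch below is not taken from the tutorial; the helper name and the file name `compiled_module.bin` are hypothetical stand-ins for whatever the tutorial writes to its scratch directory.

```python
import os
import tempfile


def export_with_cleanup(payload):
    """Write payload to a scratch file; return (path, existed_inside_context)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name standing in for an exported artifact.
        lib_path = os.path.join(tmp_dir, "compiled_module.bin")
        with open(lib_path, "wb") as f:
            f.write(payload)
        existed = os.path.exists(lib_path)
    # Leaving the with block deletes the directory and everything in it.
    return lib_path, existed


path, existed = export_with_cleanup(b"placeholder bytes")
print(existed, os.path.exists(path))  # True False
```

With tempfile.mkdtemp() the directory would still exist after the function returns, and removing it would be the caller's job (for example with shutil.rmtree).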


…deployment

The tutorial taught BYOC with the example NPU and tacked TensorRT on as a separate appendix, with a redundant second NPU example and the NPU-vs-TensorRT differences spread across far-apart sections. Rework it into two parts driven by one shared model:

- "How BYOC works": run a single conv2d+relu through the same FuseOpsByPattern -> MergeCompositeFunctions -> RunCodegen flow on both the example NPU (a stub, so check shape) and TensorRT (real, cross-checked against a CPU build), so the only thing that varies is the backend. partition_for_tensorrt is shown as the one-line wrapper for those two passes, with the bind_constants / stub-vs-real / shape-vs-value contrasts side by side. Add an FP16 example via the relax.ext.tensorrt.options pass config and a summary table; drop the redundant second NPU section.
- "Deploying a PyTorch model with TensorRT": take a real torch.nn.Module through torch.export -> from_exported_program -> partition_for_tensorrt -> build for CUDA -> run, cross-checking the GPU output against PyTorch, then export the compiled module and reload it to show the build-once / run-later deployment path.

This adds the end-to-end nn.Module example requested in apache#19682, plus short notes on operator fallback, dynamic shapes, and engine caching.

Also fix two stale references in the example NPU backend (the README and the runtime \file docstring pointed at src/runtime/contrib/example_npu/ rather than .../extra/...) and reword the README's "Memory constraint checking" bullet (those checks are placeholders that return True); and repoint the dangling docs/deploy/tensorrt.rst reference in cmake/config.cmake at the new tutorial.

Validated end-to-end on a CUDA GPU with TensorRT 10: the example NPU, TensorRT, FP16, PyTorch-deployment, and export/reload cells all run and match their references. Each section degrades gracefully when its backend (or PyTorch) is unavailable.
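One way a tutorial section might degrade gracefully is to probe for its Python dependencies before running. The sketch below is an assumption about the general approach, not the tutorial's actual guard: `backend_available` and `run_section` are hypothetical helpers, and the probe only covers Python importability. Whether a TVM build includes a given external codegen would need a separate check that is not shown here.

```python
import importlib.util


def backend_available(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def run_section(title, required_modules, body):
    """Run body() if every required module is present, else report a skip."""
    missing = [m for m in required_modules if not backend_available(m)]
    if missing:
        return "skipped {}: missing {}".format(title, ", ".join(missing))
    return body()


# A section whose dependency is absent is skipped instead of raising.
result = run_section(
    "hypothetical section",
    ["definitely_not_installed_backend_xyz"],
    lambda: "ran",
)
print(result)
```

Returning a message instead of raising keeps the remaining sections runnable when the page is executed top to bottom.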

tlopex changed the title ~~[Docs][TensorRT] Rework Bring Your Own Codegen tutorial and add TensorRT example~~ [Docs]Rework Bring Your Own Codegen tutorial and add TensorRT example Jun 19, 2026

gemini-code-assist Bot reviewed Jun 19, 2026

Comment thread docs/how_to/tutorials/bring_your_own_codegen.py Outdated

tlopex force-pushed the tensorrt-byoc-tutorial branch from ddcba0b to 646c3bb Compare June 19, 2026 06:18

tqchen approved these changes Jun 19, 2026

tqchen merged commit b8211bf into apache:main Jun 19, 2026
9 checks passed

tqchen merged 1 commit into apache:main from tlopex:tensorrt-byoc-tutorial
