[Docs]Rework Bring Your Own Codegen tutorial and add TensorRT example#19839

Merged
tqchen merged 1 commit into apache:main from tlopex:tensorrt-byoc-tutorial
Jun 19, 2026

@tlopex tlopex commented Jun 19, 2026

To address #19682, this PR reworks the BYOC tutorial into two parts driven by one shared model:

  • "How BYOC works": run a single conv2d+relu through the same FuseOpsByPattern -> MergeCompositeFunctions -> RunCodegen flow on both the example NPU (a stub, so only the output shape is checked) and TensorRT (a real backend, cross-checked against a CPU build), so the only thing that varies is the backend. partition_for_tensorrt is shown as the one-line wrapper for the first two passes, with the bind_constants / stub-vs-real / shape-vs-value contrasts side by side. The PR also adds an FP16 example via the relax.ext.tensorrt.options pass config and a summary table, and drops the redundant second NPU section.

  • "Deploying a PyTorch model with TensorRT": take a real torch.nn.Module through torch.export -> from_exported_program -> partition_for_tensorrt -> build for CUDA -> run, cross-checking the GPU output against PyTorch.
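
The three-stage flow named in the first bullet can be pictured with a toy model in plain Python. This is only a minimal sketch and not TVM's API: the `Region` class, the `fuse_by_pattern`, `merge_composites`, and `run_codegen` helpers, and the flat operator list are hypothetical stand-ins for what FuseOpsByPattern, MergeCompositeFunctions, and RunCodegen do on a real module. The toy merges regions purely by adjacency, which is a simplifying assumption.

```python
# Toy model of the partitioning flow; none of these helpers are TVM APIs.
from dataclasses import dataclass


@dataclass
class Region:
    backend: str  # "host" or the name of the external backend
    ops: list     # operator names covered by this region


def fuse_by_pattern(ops, pattern, backend):
    """Stage 1: wrap each match of `pattern` in its own composite region."""
    regions, i, n = [], 0, len(pattern)
    while i < len(ops):
        if ops[i:i + n] == pattern:
            regions.append(Region(backend, list(pattern)))
            i += n
        else:
            regions.append(Region("host", [ops[i]]))
            i += 1
    return regions


def merge_composites(regions):
    """Stage 2: merge adjacent regions that target the same external backend."""
    merged = []
    for r in regions:
        if merged and r.backend != "host" and merged[-1].backend == r.backend:
            merged[-1].ops.extend(r.ops)
        else:
            merged.append(Region(r.backend, list(r.ops)))
    return merged


def run_codegen(regions):
    """Stage 3: hand each offloaded region to its backend; host ops stay as-is."""
    return [
        f"{r.backend}_subgraph({'+'.join(r.ops)})" if r.backend != "host" else r.ops[0]
        for r in regions
    ]


ops = ["conv2d", "relu", "conv2d", "relu", "softmax"]
fused = fuse_by_pattern(ops, ["conv2d", "relu"], backend="example_npu")
plan = run_codegen(merge_composites(fused))
print(plan)  # ['example_npu_subgraph(conv2d+relu+conv2d+relu)', 'softmax']
```

Swapping the `backend` argument is the only change needed to retarget the toy, which mirrors the point that only the backend varies between the two runs.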

This PR also fixes two stale references in the example NPU backend: the README and the runtime's \file docstring pointed at src/runtime/contrib/example_npu/, but the file lives under src/runtime/extra/contrib/example_npu/. It also rewords the README's "Memory constraint checking: Validates tensor sizes" bullet, since _check_npu_memory_constraints and _check_npu_quantization are explicit placeholders that return True.

Validated end-to-end on a CUDA GPU with TensorRT 10: the example NPU, TensorRT, FP16, and PyTorch-deployment cells all run and match their references.
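
The shape-versus-value contrast and the FP16 check can be sketched with NumPy alone. This is an illustrative sketch, not the tutorial's code: the `check_against_reference` helper, the tensor shape, and the tolerances are assumptions, and the FP16 path is imitated by a round trip through `float16` rather than by a real TensorRT run.

```python
import numpy as np


def check_against_reference(actual, expected, *, values=True, rtol=1e-5, atol=1e-5):
    """Compare a backend output with a reference.

    values=False models a stub backend, where only the shape is meaningful.
    """
    actual, expected = np.asarray(actual), np.asarray(expected)
    if actual.shape != expected.shape:
        raise AssertionError(f"shape mismatch: {actual.shape} vs {expected.shape}")
    if values:
        np.testing.assert_allclose(actual, expected, rtol=rtol, atol=atol)


rng = np.random.default_rng(0)
reference = rng.standard_normal((1, 16, 32, 32)).astype("float32")

# Stub backend: the output has the right shape but arbitrary contents.
check_against_reference(np.zeros_like(reference), reference, values=False)

# Reduced precision: round-trip through float16 to mimic an FP16 engine,
# then compare with looser (assumed) tolerances.
fp16_like = reference.astype("float16").astype("float32")
check_against_reference(fp16_like, reference, rtol=1e-2, atol=1e-2)
```

Reusing the FP32 tolerances for the FP16 comparison would fail, which is why the reduced-precision check takes its own `rtol` and `atol`.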

@tlopex tlopex changed the title from "[Docs][TensorRT] Rework Bring Your Own Codegen tutorial and add TensorRT example" to "[Docs]Rework Bring Your Own Codegen tutorial and add TensorRT example" Jun 19, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request significantly updates the Bring Your Own Codegen (BYOC) tutorial to cover both the mock 'example NPU' backend and a real production backend (NVIDIA TensorRT), including an end-to-end example of deploying a PyTorch model. Related documentation and paths are also updated. The feedback suggests using tempfile.TemporaryDirectory() as a context manager instead of tempfile.mkdtemp() to ensure proper cleanup of temporary files and prevent disk pollution.
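
The difference behind that suggestion can be shown with the standard library alone. In this sketch the file name `compiled_model.so` is a hypothetical artifact chosen for illustration, not a name taken from the tutorial.

```python
import os
import tempfile

# mkdtemp() leaves cleanup to the caller; the directory outlives this scope
# unless it is removed explicitly.
leaked_dir = tempfile.mkdtemp()
os.rmdir(leaked_dir)  # manual cleanup, easy to forget

# TemporaryDirectory() removes the directory and its contents on exit.
with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical artifact name, used only for illustration.
    artifact_path = os.path.join(tmp_dir, "compiled_model.so")
    with open(artifact_path, "wb") as f:
        f.write(b"placeholder bytes")
    existed_inside = os.path.exists(artifact_path)

exists_after = os.path.exists(tmp_dir)
print(existed_inside, exists_after)  # True False
```

Any step that needs the exported file, such as reloading it, has to run inside the `with` block, because the directory is gone once the block exits.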

Comment thread docs/how_to/tutorials/bring_your_own_codegen.py Outdated
…deployment

The tutorial taught BYOC with the example NPU and tacked TensorRT on as a
separate appendix, with a redundant second NPU example and the NPU-vs-TensorRT
differences spread across far-apart sections.

Rework it into two parts driven by one shared model:

- "How BYOC works": run a single conv2d+relu through the same
  FuseOpsByPattern -> MergeCompositeFunctions -> RunCodegen flow on both the
  example NPU (a stub, so only the output shape is checked) and TensorRT (real,
  cross-checked against a CPU build), so the only thing that varies is the
  backend.
  partition_for_tensorrt is shown as the one-line wrapper for those two
  passes, with the bind_constants / stub-vs-real / shape-vs-value contrasts
  side by side. Add an FP16 example via the relax.ext.tensorrt.options pass
  config and a summary table; drop the redundant second NPU section.

- "Deploying a PyTorch model with TensorRT": take a real torch.nn.Module
  through torch.export -> from_exported_program -> partition_for_tensorrt ->
  build for CUDA -> run, cross-checking the GPU output against PyTorch, then
  export the compiled module and reload it to show the build-once / run-later
  deployment path. This adds the end-to-end nn.Module example requested in
  apache#19682, plus short notes on operator fallback, dynamic shapes, and engine
  caching.

Also fix two stale references in the example NPU backend (the README and the
runtime \file docstring pointed at src/runtime/contrib/example_npu/ rather than
.../extra/...) and reword the README's "Memory constraint checking" bullet
(those checks are placeholders that return True); and repoint the dangling
docs/deploy/tensorrt.rst reference in cmake/config.cmake at the new tutorial.

Validated end-to-end on a CUDA GPU with TensorRT 10: the example NPU, TensorRT,
FP16, PyTorch-deployment, and export/reload cells all run and match their
references. Each section degrades gracefully when its backend (or PyTorch) is
unavailable.
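
One way a section can degrade gracefully is to probe for its dependencies before running. The sketch below is an assumption about the general approach, not the tutorial's actual guard: `backend_available` and `run_section` are hypothetical helpers, and they only test whether a Python module is importable. A real check for TensorRT would likely also need to confirm that the TVM build has the backend enabled.

```python
import importlib.util


def backend_available(module_name):
    """Return True if the top-level module `module_name` is importable."""
    return importlib.util.find_spec(module_name) is not None


def run_section(title, required_modules, body):
    """Run `body` only if every required module is importable; otherwise skip."""
    missing = [m for m in required_modules if not backend_available(m)]
    if missing:
        print(f"[skip] {title}: missing {', '.join(missing)}")
        return None
    return body()


# "json" always ships with Python, so this section runs.
result = run_section("stdlib demo", ["json"], lambda: "ran")

# A deliberately nonexistent module name, so this section is skipped.
skipped = run_section("hypothetical backend", ["no_such_backend_module_xyz"], lambda: "ran")
```

Returning `None` instead of raising keeps the remaining sections runnable when one dependency is absent.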
@tlopex tlopex force-pushed the tensorrt-byoc-tutorial branch from ddcba0b to 646c3bb Compare June 19, 2026 06:18
@tqchen tqchen merged commit b8211bf into apache:main Jun 19, 2026
9 checks passed