extension/llm/server: document pi integration by mergennachin · Pull Request #19999 · pytorch/executorch

mergennachin · 2026-06-03T23:02:54Z

Add an operational recipe to the server README for pointing pi (or any
OpenAI-compatible harness) at the ExecuTorch server for local tool-use: the
launch command, useful flags (--no-think / --max-context /
--allow-chatml-fallback), client base_url/model/api_key settings, the supported
chat-completions + Hermes/Qwen tool-call contract (only tool_choice
auto/none/unset; response_format/logprobs/top_p!=1/seed rejected), and
reliability guidance. Docs only; no runtime or dependency changes.

Part of #20001

[ghstack-poisoned]

mergennachin · 2026-06-03T23:02:55Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2026-06-03T23:02:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19999

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Unrelated Failure

As of commit 83986e5 with merge base eeb0646 ():

NEW FAILURES - The following jobs have failed:

pull / unittest / linux / linux-job (gh)
RuntimeError: Command docker exec -t 80f721ddf5514de7b00320a4b923b93d630d5d9000648f991dc907d89c22897a /exec failed with exit code 1
pull / unittest / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-editable / linux / linux-job (gh)
RuntimeError: Command docker exec -t 490b345dcbe8627d2ebdc7f14bc635bef5a24d7da00a5add94de5043ffb079d5 /exec failed with exit code 1

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / unittest-editable / macos / macos-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]

Add an operational recipe to the server README for pointing pi (or any OpenAI-compatible harness) at the ExecuTorch server for local tool-use: the launch command, useful flags (--no-think / --enable-prefix-cache / --max-context / --allow-chatml-fallback), client base_url/model/api_key settings, the supported chat-completions + Hermes/Qwen tool-call contract (only tool_choice auto/none/unset; response_format/logprobs/top_p!=1/seed rejected), and reliability guidance. Docs only; no runtime or dependency changes. ghstack-source-id: 672db61 ghstack-comment-id: 4617420672 Pull-Request: #19999

[ghstack-poisoned]

Add an operational recipe to the server README for pointing pi (or any OpenAI-compatible harness) at the ExecuTorch server for local tool-use: the launch command, useful flags (--no-think / --enable-prefix-cache / --max-context / --allow-chatml-fallback), client base_url/model/api_key settings, the supported chat-completions + Hermes/Qwen tool-call contract (only tool_choice auto/none/unset; response_format/logprobs/top_p!=1/seed rejected), and reliability guidance. Docs only; no runtime or dependency changes. ghstack-source-id: 9e816d2 ghstack-comment-id: 4617420672 Pull-Request: #19999

[ghstack-poisoned]

[INITIAL] Update

628e2c4

[ghstack-poisoned]

mergennachin requested a review from larryliu0820 as a code owner June 3, 2026 23:02

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 3, 2026

mergennachin requested review from Gasoonjia, GregoryComer, kirklandsign and psiddh June 3, 2026 23:07

[UPDATE] Update

c6b1c9a

[ghstack-poisoned]

[UPDATE] Update

7c1b405

[ghstack-poisoned]

psiddh approved these changes Jun 4, 2026

View reviewed changes

[UPDATE] Update

c5e466e

[ghstack-poisoned]

mergennachin mentioned this pull request Jun 4, 2026

examples/models/qwen3_5_moe: CUDA Engine/Session adapter + OpenAI serving #20043

Open

mergennachin marked this pull request as draft June 4, 2026 18:51

mergennachin added 3 commits June 4, 2026 15:14

[UPDATE] Update

731782a

[ghstack-poisoned]

[UPDATE] Update

5e5a3a0

[ghstack-poisoned]

[UPDATE] Update

950096c

[ghstack-poisoned]

mergennachin marked this pull request as ready for review June 5, 2026 19:00

[UPDATE] Update

83986e5

[ghstack-poisoned]

mergennachin mentioned this pull request Jun 8, 2026

Qwen3.5-MoE CUDA V2 foundation: one model, many isolated sessions #20117

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

extension/llm/server: document pi integration#19999

extension/llm/server: document pi integration#19999
mergennachin wants to merge 8 commits into
gh/mergennachin/5/headfrom
gh/mergennachin/6/head

mergennachin commented Jun 3, 2026 •

edited

Loading

Uh oh!

mergennachin commented Jun 3, 2026 •

edited

Loading

Uh oh!

pytorch-bot Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mergennachin commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mergennachin commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19999

❌ 3 New Failures, 1 Unrelated Failure

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mergennachin commented Jun 3, 2026 •

edited

Loading

mergennachin commented Jun 3, 2026 •

edited

Loading

pytorch-bot Bot commented Jun 3, 2026 •

edited

Loading