End‑to‑end multimodal chat with document parsing, media uploads, audio recording, and streaming markdown rendering #1316
SignalRT wants to merge 9 commits into SciSharp:master from
Conversation
Initial version
- Reworked MTMD prompt handling to preserve text/media ordering and evaluate multimodal input incrementally.
- Disabled unsupported multimodal features such as session persistence and context shifting.
- Added standalone MTMD media loading and synchronized MTMD weight operations.
- Updated the MTMD example and tests to cover prompt ordering, guards, and opt-in NoCI execution.
- Fixed web model/session defaults for multimodal models, including template-derived stop markers and unspecified pooling.
- Improved the LLama.Web audio attachment/recording flow, Qwen audio prompt handling, and chat composer UX.
- Removed the broken browser script include and added a safe markdown fallback.
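The order-preserving prompt rework described above can be pictured with a small sketch. The names here (`EvalTextAsync`, `EvalMediaAsync`, `MediaEmbed`, and the marker-splitting approach) are hypothetical stand-ins, not the PR's actual internals; only the interleaving idea is the point:

```csharp
// Hypothetical sketch: evaluate text and media chunks in their original
// interleaved order by splitting the prompt on a media marker.
async Task EvalPromptInOrderAsync(string prompt, string mediaMarker, Queue<MediaEmbed> media)
{
    var segments = prompt.Split(new[] { mediaMarker }, StringSplitOptions.None);
    for (var i = 0; i < segments.Length; i++)
    {
        if (segments[i].Length > 0)
            await EvalTextAsync(segments[i]);      // text kept in its original position
        if (i < segments.Length - 1 && media.Count > 0)
            await EvalMediaAsync(media.Dequeue()); // one media chunk per marker
    }
}
```

Evaluating each segment as it is encountered, rather than batching all text first, is what makes the evaluation incremental.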
Some cleanup and documentation changes; only the MTMD doc is updated. I think we should regenerate all the docs, but I'm not sure.
Stop and load the model on change. Solve an issue with the ENTER key.
One thing that I'm not sure about is the media queue in the SafeMtmdModelHandle. Alternatively, if it is necessary for some reason, could it be moved up one layer into the MtmdModel, instead of SafeMtmdModelHandle? That way the SafeHandle remains a minimal wrapper around llama.cpp, with additional behaviour added for convenience at the higher-level wrapper.
Other than that one comment, looks good to me!
Pull request overview
This PR modernizes LLamaSharp’s multimodal support by migrating from LLava to MTMD, and substantially upgrades LLama.Web to support end-to-end multimodal chat (attachments, uploads, streaming markdown rendering) plus automatic model downloads with progress reporting.
Changes:
- Replace LLava types/APIs/docs with MTMD equivalents (MtmdWeights, SafeMtmd* handles, executor multimodal plumbing).
- Add the LLama.Web pipeline: attachment upload + extraction (PDF/DOCX), media embeddings (image/audio), streaming UI rendering with markdown/mermaid, and capability-aware behavior.
- Add a model auto-download service with SignalR progress updates and corresponding UI/status wiring.
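For the PDF half of the attachment extraction step, PdfPig (the dependency added in LLama.Web.csproj) exposes a simple page-text API. A minimal sketch of what an extractor might look like — this is not the PR's actual AttachmentService code, and DOCX would need a separate library:

```csharp
using System.Text;
using UglyToad.PdfPig;

// Minimal PDF text extraction with PdfPig: concatenates the text content of
// every page into one string suitable for prompt ingestion.
static string ExtractPdfText(string path)
{
    var sb = new StringBuilder();
    using var document = PdfDocument.Open(path);
    foreach (var page in document.GetPages())
        sb.AppendLine(page.Text);
    return sb.ToString();
}
```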
Reviewed changes
Copilot reviewed 77 out of 78 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| mkdocs.yml | Updates documentation navigation to MTMD docs (removes LLava entries). |
| docs/xmldocs/llama.statelessexecutor.md | Docs update for MTMD properties (ClipModel, Embeds). |
| docs/xmldocs/llama.native.safemtmdmodelhandle.md | New generated docs for MTMD safe handle API. |
| docs/xmldocs/llama.native.safemtmdinputchunks.md | New generated docs for MTMD input chunks wrapper. |
| docs/xmldocs/llama.native.safemtmdinputchunk.md | New generated docs for MTMD input chunk wrapper. |
| docs/xmldocs/llama.native.safemtmdembed.md | New generated docs for MTMD embed wrapper. |
| docs/xmldocs/llama.native.nativelibraryconfigcontainer.md | Docs: rename LLava params to MTMD, fix AVX wording, update DryRun signature docs. |
| docs/xmldocs/llama.native.mtmdcontextparams.md | New generated docs for MTMD context params. |
| docs/xmldocs/llama.mtmdweights.md | New generated docs for MtmdWeights. |
| docs/xmldocs/llama.interactiveexecutor.md | Docs update: MTMD fields, cancellation tokens, antiprompt processor, state limitations, embeds. |
| docs/xmldocs/llama.instructexecutor.md | Docs update mirroring interactive executor changes for MTMD + cancellation tokens. |
| docs/xmldocs/llama.batched.conversation.md | Docs update: add MTMD prompt overloads and remove LLava image embed overload. |
| docs/xmldocs/llama.batched.batchedexecutor.md | Docs update: add MTMD clip model support. |
| docs/xmldocs/llama.abstractions.illamaexecutor.md | Docs update: ClipModel/Embeds now MTMD types. |
| docs/xmldocs/index.md | Docs index updated for MTMD types and removes LLava references. |
| docs/Tutorials/NativeLibraryConfig.md | Tutorial updated for MTMD library configuration. |
| docs/Tutorials/Executors.md | Tutorial updated for MTMD fields + state persistence limitations for multimodal executors. |
| docs/QuickStart.md | QuickStart updated with MTMD example and embed loading flow. |
| docs/Examples/MtmdInteractiveModeExecute.md | Example docs updated from SafeMtmdWeights/single-brace paths to MtmdWeights/double-brace paths. |
| LLama/Native/SafeMtmdModelHandle.cs | Adds standalone embed creation APIs and refactors load methods to use them. |
| LLama/Native/Load/NativeLibraryConfig.cs | Fixes DryRun out params initialization/behavior and documents outputs. |
| LLama/MtmdWeights.cs | Adds locking and new standalone media load APIs; wraps tokenize/eval calls for thread safety. |
| LLama/LLamaInteractExecutor.cs | MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes. |
| LLama/LLamaInstructExecutor.cs | MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes. |
| LLama/ChatSession.cs | Blocks session persistence APIs for multimodal sessions, refactors stateful executor access. |
| LLama/AntipromptProcessor.cs | Uses StringComparison.Ordinal for antiprompt matching. |
| LLama.Web/wwwroot/js/sessionConnectionChat.js | Adds attachment uploads, download status UI, and streaming markdown rendering. |
| LLama.Web/libman.json | Adds offline web libs for markdown rendering (markdown-it plugins, katex, mermaid). |
| LLama.Web/appsettings.json | Updates model list to downloadable models and adds mmproj paths/URLs + new defaults. |
| LLama.Web/_Imports.razor | New shared imports for Blazor components/services. |
| LLama.Web/Shared/MainLayout.razor | Adds Blazor main layout wrapper. |
| LLama.Web/Services/ModelSessionService.cs | Adds attachment-aware prompt preparation + embeds, capabilities API, history handling. |
| LLama.Web/Services/ModelService.cs | Integrates model download readiness checks and normalizes UBatchSize/BatchSize. |
| LLama.Web/Services/ModelLoaderService.cs | Starts model downloads at startup and loads models after downloads complete. |
| LLama.Web/Services/ModelDownloadService.cs | New background download service with SignalR progress + local storage management. |
| LLama.Web/Services/IModelSessionService.cs | Updates Infer API to PromptRequest and adds capabilities method. |
| LLama.Web/Services/IModelService.cs | Documentation/wording cleanups. |
| LLama.Web/Services/IModelDownloadService.cs | New interface for model download management. |
| LLama.Web/Services/IAttachmentService.cs | New interface for attachment storage/extraction lifecycle. |
| LLama.Web/Services/AttachmentService.cs | New attachment pipeline: validation, storage, PDF/DOCX extraction, cleanup. |
| LLama.Web/README.md | Documents local asset storage, LibMan restore, and attachment/model download locations. |
| LLama.Web/Program.cs | Adds Blazor + controllers, registers new services, maps endpoints, logs storage paths. |
| LLama.Web/Pages/_Host.cshtml | Adds Blazor server host page. |
| LLama.Web/Pages/Shared/_Parameters.cshtml | Updates parameter binding to sampling pipeline fields. |
| LLama.Web/Pages/Shared/_Layout.cshtml | Updates layout to load offline markdown/diagram libs and Blazor runtime. |
| LLama.Web/Pages/Shared/_ChatTemplates.cshtml | Templates updated for markdown styling + attachment display. |
| LLama.Web/Pages/Index.cshtml.cs | Removed legacy Razor Pages index model. |
| LLama.Web/Pages/Index.cshtml | Removed legacy Razor Pages chat UI. |
| LLama.Web/Models/StorageInfo.cs | New model for storage path UI info. |
| LLama.Web/Models/PromptRequest.cs | New prompt request model including attachment IDs. |
| LLama.Web/Models/ModelSession.cs | Major session refactor: template-based prompts, history, multimodal capability exposure, logging. |
| LLama.Web/Models/ModelDownloadStatus.cs | New download snapshot/progress models and enums. |
| LLama.Web/Models/ModelCapabilities.cs | New model capability DTO. |
| LLama.Web/Models/MemoryBrowserFile.cs | In-memory IBrowserFile implementation. |
| LLama.Web/Models/LLamaModel.cs | Loads MTMD mmproj weights when configured and disposes them. |
| LLama.Web/Models/AttachmentInfo.cs | New attachment metadata + upload result models. |
| LLama.Web/LLama.Web.csproj | Adds LibMan build integration and PdfPig dependency. |
| LLama.Web/Hubs/SessionConnectionHub.cs | Adds download snapshot + storage info broadcasts; prompt now accepts PromptRequest; cleans up attachments on disconnect. |
| LLama.Web/Hubs/ISessionClient.cs | Adds SignalR client methods for download progress/snapshots and storage info. |
| LLama.Web/Extensions.cs | Comment/formatting cleanups for CSV/list helpers. |
| LLama.Web/Controllers/AttachmentController.cs | New attachments API endpoints for upload + download. |
| LLama.Web/Common/ModelOptions.cs | Adds model/mmproj download URL fields and default pooling type. |
| LLama.Web/Common/ModelLoadType.cs | Comment cleanup. |
| LLama.Web/Async/AsyncLock.cs | Comment cleanup. |
| LLama.Web/Async/AsyncGuard.cs | Comment cleanup. |
| LLama.Web/App.razor | New Blazor router app shell. |
| LLama.Unittest/NativeLibraryConfigContainerTests.cs | Adds unit test to ensure DryRun preserves loaded library outputs. |
| LLama.Unittest/MtmdWeightsTests.cs | Refactors MTMD tests to use fixture/collection and context-per-test. |
| LLama.Unittest/MtmdNoCiCollection.cs | Adds shared MTMD fixture and disables parallelization for these tests. |
| LLama.Unittest/MtmdExecutorTests.cs | Refactors and adds MTMD executor behavior tests (prompt ordering, chunk handling). |
| LLama.Unittest/MtmdContextGuardTests.cs | Adds MTMD context guard + “no state/session persistence” behavior tests. |
| LLama.Examples/Examples/MtmdInteractiveModeExecute.cs | Updates sample for MTMD standalone embed loads and template marker antiprompt handling. |
| .gitignore | Ignores LLama.Web offline libs and downloaded models directory. |
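The download loop at the heart of a service like the ModelDownloadService above can be sketched with plain HttpClient. The progress callback and URL handling here are assumptions, not the PR's code; in the PR, progress is presumably forwarded to clients over SignalR:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Streams a large model file to disk, invoking a progress callback with the
// percentage completed whenever Content-Length is available.
static async Task DownloadWithProgressAsync(string url, string destination, Action<double> onProgress)
{
    using var http = new HttpClient();
    using var response = await http.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();

    var total = response.Content.Headers.ContentLength;
    await using var source = await response.Content.ReadAsStreamAsync();
    await using var target = File.Create(destination);

    var buffer = new byte[81920];
    long written = 0;
    int read;
    while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        await target.WriteAsync(buffer, 0, read);
        written += read;
        if (total.HasValue)
            onProgress(100.0 * written / total.Value); // e.g. pushed to the UI
    }
}
```

Using `ResponseHeadersRead` avoids buffering the whole model file in memory before writing begins.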
That's a convenience API. So my preference would be:
- Keep explicit media passing as the primary API.
- Treat the implicit queue as optional convenience only.
- Move that convenience up out of SafeMtmdModelHandle.
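A sketch of that layering — the class shapes and method signatures here are assumptions for illustration, not SafeMtmdModelHandle's real API. The SafeHandle only exposes explicit media passing, and the higher-level wrapper hosts the optional queue:

```csharp
using System.Collections.Generic;

// Hypothetical layering: explicit media passing stays the primary API on the
// SafeHandle; the implicit queue lives one level up as a convenience.
public sealed class MtmdWeights
{
    private readonly SafeMtmdModelHandle _handle;
    private readonly Queue<SafeMtmdEmbed> _pendingMedia = new();

    public MtmdWeights(SafeMtmdModelHandle handle) => _handle = handle;

    // Primary API: callers pass media explicitly (signature assumed).
    public void Tokenize(string prompt, IReadOnlyList<SafeMtmdEmbed> media)
        => _handle.Tokenize(prompt, media);

    // Optional convenience: queued media is drained by the next Tokenize call.
    public void EnqueueMedia(SafeMtmdEmbed embed) => _pendingMedia.Enqueue(embed);

    public void Tokenize(string prompt)
    {
        var media = _pendingMedia.ToArray();
        _pendingMedia.Clear();
        Tokenize(prompt, media);
    }
}
```

With this split, the SafeHandle remains a minimal wrapper around llama.cpp, and callers who never touch the queue pay nothing for it.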
Summary:
This PR delivers a full multimodal chat pipeline in LLama.Web: PDF and Word document ingestion with text extraction, image and audio uploads, native in‑browser audio recording (preview/attach/discard), plus streaming response rendering with Markdown support.
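The Infer API's move to PromptRequest (noted in the file summaries) suggests a request shape along these lines. Only the attachment IDs are confirmed by the PR description; the other property names are guesses:

```csharp
using System.Collections.Generic;

// Plausible shape of the new prompt request carrying attachment references.
public sealed class PromptRequest
{
    public string Prompt { get; set; } = string.Empty;

    // IDs of previously uploaded attachments to resolve before inference.
    public IList<string> AttachmentIds { get; set; } = new List<string>();
}
```

Sending IDs rather than raw bytes lets the upload happen once via the attachments API, with the hub resolving stored files when the prompt arrives.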
Key Features:
- Capability to upload images and ask about the images
- Capability to upload files and ask about the files
- Model auto-download
