End‑to‑end multimodal chat with document parsing, media uploads, audio recording, and streaming markdown rendering #1316
SignalRT wants to merge 9 commits into SciSharp:master from
Conversation
Initial version
- Reworked MTMD prompt handling to preserve text/media ordering and evaluate multimodal input incrementally.
- Disabled unsupported multimodal features such as session persistence and context shifting.
- Added standalone MTMD media loading and synchronized MTMD weight operations.
- Updated the MTMD example and tests to cover prompt ordering, guards, and opt-in NoCI execution.
- Fixed web model/session defaults for multimodal models, including template-derived stop markers and unspecified pooling.
- Improved the LLama.Web audio attachment/recording flow, Qwen audio prompt handling, and chat composer UX.
- Removed the broken browser script include and added a safe markdown fallback.
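The order-preserving prompt rework described above can be pictured with a small sketch. The names here (`EvalTextAsync`, `EvalMediaAsync`, `MediaEmbed`, and the marker-splitting approach) are hypothetical stand-ins, not the PR's actual internals; only the interleaving idea is the point:

```csharp
// Hypothetical sketch: evaluate text and media chunks in their original
// interleaved order by splitting the prompt on a media marker.
async Task EvalPromptInOrderAsync(string prompt, string mediaMarker, Queue<MediaEmbed> media)
{
    var segments = prompt.Split(new[] { mediaMarker }, StringSplitOptions.None);
    for (var i = 0; i < segments.Length; i++)
    {
        if (segments[i].Length > 0)
            await EvalTextAsync(segments[i]);      // text kept in its original position
        if (i < segments.Length - 1 && media.Count > 0)
            await EvalMediaAsync(media.Dequeue()); // one media chunk per marker
    }
}
```

Evaluating each segment as it is encountered, rather than batching all text first, is what makes the evaluation incremental.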
Some cleanup and documentation changes; only the MTMD doc is updated. I think we should regenerate all the docs, but I'm not sure.
Stop and load the model on change. Solve an issue with the ENTER key.
One thing that I'm not sure about is the media queue in the SafeMtmdModelHandle. Alternatively, if it is necessary for some reason, could it be moved up one layer into the MtmdModel, instead of SafeMtmdModelHandle? That way the SafeHandle remains a minimal wrapper around llama.cpp, with additional behaviour added for convenience at the higher-level wrapper.
Other than that one comment, looks good to me!
Pull request overview
This PR modernizes LLamaSharp’s multimodal support by migrating from LLava to MTMD, and substantially upgrades LLama.Web to support end-to-end multimodal chat (attachments, uploads, streaming markdown rendering) plus automatic model downloads with progress reporting.
Changes:
- Replace LLava types/APIs/docs with MTMD equivalents (MtmdWeights, SafeMtmd* handles, executor multimodal plumbing).
- Add the LLama.Web pipeline: attachment upload + extraction (PDF/DOCX), media embeddings (image/audio), streaming UI rendering with markdown/mermaid, and capability-aware behavior.
- Add a model auto-download service with SignalR progress updates and corresponding UI/status wiring.
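For the PDF half of the attachment extraction step, PdfPig (the dependency added in LLama.Web.csproj) exposes a simple page-text API. A minimal sketch of what an extractor might look like — this is not the PR's actual AttachmentService code, and DOCX would need a separate library:

```csharp
using System.Text;
using UglyToad.PdfPig;

// Minimal PDF text extraction with PdfPig: concatenates the text content of
// every page into one string suitable for prompt ingestion.
static string ExtractPdfText(string path)
{
    var sb = new StringBuilder();
    using var document = PdfDocument.Open(path);
    foreach (var page in document.GetPages())
        sb.AppendLine(page.Text);
    return sb.ToString();
}
```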
Reviewed changes
Copilot reviewed 77 out of 78 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| mkdocs.yml | Updates documentation navigation to MTMD docs (removes LLava entries). |
| docs/xmldocs/llama.statelessexecutor.md | Docs update for MTMD properties (ClipModel, Embeds). |
| docs/xmldocs/llama.native.safemtmdmodelhandle.md | New generated docs for MTMD safe handle API. |
| docs/xmldocs/llama.native.safemtmdinputchunks.md | New generated docs for MTMD input chunks wrapper. |
| docs/xmldocs/llama.native.safemtmdinputchunk.md | New generated docs for MTMD input chunk wrapper. |
| docs/xmldocs/llama.native.safemtmdembed.md | New generated docs for MTMD embed wrapper. |
| docs/xmldocs/llama.native.nativelibraryconfigcontainer.md | Docs: rename LLava params to MTMD, fix AVX wording, update DryRun signature docs. |
| docs/xmldocs/llama.native.mtmdcontextparams.md | New generated docs for MTMD context params. |
| docs/xmldocs/llama.mtmdweights.md | New generated docs for MtmdWeights. |
| docs/xmldocs/llama.interactiveexecutor.md | Docs update: MTMD fields, cancellation tokens, antiprompt processor, state limitations, embeds. |
| docs/xmldocs/llama.instructexecutor.md | Docs update mirroring interactive executor changes for MTMD + cancellation tokens. |
| docs/xmldocs/llama.batched.conversation.md | Docs update: add MTMD prompt overloads and remove LLava image embed overload. |
| docs/xmldocs/llama.batched.batchedexecutor.md | Docs update: add MTMD clip model support. |
| docs/xmldocs/llama.abstractions.illamaexecutor.md | Docs update: ClipModel/Embeds now MTMD types. |
| docs/xmldocs/index.md | Docs index updated for MTMD types and removes LLava references. |
| docs/Tutorials/NativeLibraryConfig.md | Tutorial updated for MTMD library configuration. |
| docs/Tutorials/Executors.md | Tutorial updated for MTMD fields + state persistence limitations for multimodal executors. |
| docs/QuickStart.md | QuickStart updated with MTMD example and embed loading flow. |
| docs/Examples/MtmdInteractiveModeExecute.md | Example docs updated from SafeMtmdWeights/single-brace paths to MtmdWeights/double-brace paths. |
| LLama/Native/SafeMtmdModelHandle.cs | Adds standalone embed creation APIs and refactors load methods to use them. |
| LLama/Native/Load/NativeLibraryConfig.cs | Fixes DryRun out params initialization/behavior and documents outputs. |
| LLama/MtmdWeights.cs | Adds locking and new standalone media load APIs; wraps tokenize/eval calls for thread safety. |
| LLama/LLamaInteractExecutor.cs | MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes. |
| LLama/LLamaInstructExecutor.cs | MTMD execution changes, state persistence rejection for multimodal, pending prompt logic changes. |
| LLama/ChatSession.cs | Blocks session persistence APIs for multimodal sessions, refactors stateful executor access. |
| LLama/AntipromptProcessor.cs | Uses StringComparison.Ordinal for antiprompt matching. |
| LLama.Web/wwwroot/js/sessionConnectionChat.js | Adds attachment uploads, download status UI, and streaming markdown rendering. |
| LLama.Web/libman.json | Adds offline web libs for markdown rendering (markdown-it plugins, katex, mermaid). |
| LLama.Web/appsettings.json | Updates model list to downloadable models and adds mmproj paths/URLs + new defaults. |
| LLama.Web/_Imports.razor | New shared imports for Blazor components/services. |
| LLama.Web/Shared/MainLayout.razor | Adds Blazor main layout wrapper. |
| LLama.Web/Services/ModelSessionService.cs | Adds attachment-aware prompt preparation + embeds, capabilities API, history handling. |
| LLama.Web/Services/ModelService.cs | Integrates model download readiness checks and normalizes UBatchSize/BatchSize. |
| LLama.Web/Services/ModelLoaderService.cs | Starts model downloads at startup and loads models after downloads complete. |
| LLama.Web/Services/ModelDownloadService.cs | New background download service with SignalR progress + local storage management. |
| LLama.Web/Services/IModelSessionService.cs | Updates Infer API to PromptRequest and adds capabilities method. |
| LLama.Web/Services/IModelService.cs | Documentation/wording cleanups. |
| LLama.Web/Services/IModelDownloadService.cs | New interface for model download management. |
| LLama.Web/Services/IAttachmentService.cs | New interface for attachment storage/extraction lifecycle. |
| LLama.Web/Services/AttachmentService.cs | New attachment pipeline: validation, storage, PDF/DOCX extraction, cleanup. |
| LLama.Web/README.md | Documents local asset storage, LibMan restore, and attachment/model download locations. |
| LLama.Web/Program.cs | Adds Blazor + controllers, registers new services, maps endpoints, logs storage paths. |
| LLama.Web/Pages/_Host.cshtml | Adds Blazor server host page. |
| LLama.Web/Pages/Shared/_Parameters.cshtml | Updates parameter binding to sampling pipeline fields. |
| LLama.Web/Pages/Shared/_Layout.cshtml | Updates layout to load offline markdown/diagram libs and Blazor runtime. |
| LLama.Web/Pages/Shared/_ChatTemplates.cshtml | Templates updated for markdown styling + attachment display. |
| LLama.Web/Pages/Index.cshtml.cs | Removed legacy Razor Pages index model. |
| LLama.Web/Pages/Index.cshtml | Removed legacy Razor Pages chat UI. |
| LLama.Web/Models/StorageInfo.cs | New model for storage path UI info. |
| LLama.Web/Models/PromptRequest.cs | New prompt request model including attachment IDs. |
| LLama.Web/Models/ModelSession.cs | Major session refactor: template-based prompts, history, multimodal capability exposure, logging. |
| LLama.Web/Models/ModelDownloadStatus.cs | New download snapshot/progress models and enums. |
| LLama.Web/Models/ModelCapabilities.cs | New model capability DTO. |
| LLama.Web/Models/MemoryBrowserFile.cs | In-memory IBrowserFile implementation. |
| LLama.Web/Models/LLamaModel.cs | Loads MTMD mmproj weights when configured and disposes them. |
| LLama.Web/Models/AttachmentInfo.cs | New attachment metadata + upload result models. |
| LLama.Web/LLama.Web.csproj | Adds LibMan build integration and PdfPig dependency. |
| LLama.Web/Hubs/SessionConnectionHub.cs | Adds download snapshot + storage info broadcasts; prompt now accepts PromptRequest; cleans up attachments on disconnect. |
| LLama.Web/Hubs/ISessionClient.cs | Adds SignalR client methods for download progress/snapshots and storage info. |
| LLama.Web/Extensions.cs | Comment/formatting cleanups for CSV/list helpers. |
| LLama.Web/Controllers/AttachmentController.cs | New attachments API endpoints for upload + download. |
| LLama.Web/Common/ModelOptions.cs | Adds model/mmproj download URL fields and default pooling type. |
| LLama.Web/Common/ModelLoadType.cs | Comment cleanup. |
| LLama.Web/Async/AsyncLock.cs | Comment cleanup. |
| LLama.Web/Async/AsyncGuard.cs | Comment cleanup. |
| LLama.Web/App.razor | New Blazor router app shell. |
| LLama.Unittest/NativeLibraryConfigContainerTests.cs | Adds unit test to ensure DryRun preserves loaded library outputs. |
| LLama.Unittest/MtmdWeightsTests.cs | Refactors MTMD tests to use fixture/collection and context-per-test. |
| LLama.Unittest/MtmdNoCiCollection.cs | Adds shared MTMD fixture and disables parallelization for these tests. |
| LLama.Unittest/MtmdExecutorTests.cs | Refactors and adds MTMD executor behavior tests (prompt ordering, chunk handling). |
| LLama.Unittest/MtmdContextGuardTests.cs | Adds MTMD context guard + “no state/session persistence” behavior tests. |
| LLama.Examples/Examples/MtmdInteractiveModeExecute.cs | Updates sample for MTMD standalone embed loads and template marker antiprompt handling. |
| .gitignore | Ignores LLama.Web offline libs and downloaded models directory. |
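The download loop at the heart of a service like the ModelDownloadService above can be sketched with plain HttpClient. The progress callback and URL handling here are assumptions, not the PR's code; in the PR, progress is presumably forwarded to clients over SignalR:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Streams a large model file to disk, invoking a progress callback with the
// percentage completed whenever Content-Length is available.
static async Task DownloadWithProgressAsync(string url, string destination, Action<double> onProgress)
{
    using var http = new HttpClient();
    using var response = await http.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();

    var total = response.Content.Headers.ContentLength;
    await using var source = await response.Content.ReadAsStreamAsync();
    await using var target = File.Create(destination);

    var buffer = new byte[81920];
    long written = 0;
    int read;
    while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        await target.WriteAsync(buffer, 0, read);
        written += read;
        if (total.HasValue)
            onProgress(100.0 * written / total.Value); // e.g. pushed to the UI
    }
}
```

Using `ResponseHeadersRead` avoids buffering the whole model file in memory before writing begins.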
That's a convenience API. So my preference would be:
- Keep explicit media passing as the primary API.
- Treat the implicit queue as optional convenience only.
- Move that convenience up out of SafeMtmdModelHandle.
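A sketch of that layering — the class shapes and method signatures here are assumptions for illustration, not SafeMtmdModelHandle's real API. The SafeHandle only exposes explicit media passing, and the higher-level wrapper hosts the optional queue:

```csharp
using System.Collections.Generic;

// Hypothetical layering: explicit media passing stays the primary API on the
// SafeHandle; the implicit queue lives one level up as a convenience.
public sealed class MtmdWeights
{
    private readonly SafeMtmdModelHandle _handle;
    private readonly Queue<SafeMtmdEmbed> _pendingMedia = new();

    public MtmdWeights(SafeMtmdModelHandle handle) => _handle = handle;

    // Primary API: callers pass media explicitly (signature assumed).
    public void Tokenize(string prompt, IReadOnlyList<SafeMtmdEmbed> media)
        => _handle.Tokenize(prompt, media);

    // Optional convenience: queued media is drained by the next Tokenize call.
    public void EnqueueMedia(SafeMtmdEmbed embed) => _pendingMedia.Enqueue(embed);

    public void Tokenize(string prompt)
    {
        var media = _pendingMedia.ToArray();
        _pendingMedia.Clear();
        Tokenize(prompt, media);
    }
}
```

With this split, the SafeHandle remains a minimal wrapper around llama.cpp, and callers who never touch the queue pay nothing for it.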
Summary:
This PR delivers a full multimodal chat pipeline in LLama.Web: PDF and Word document ingestion with text extraction, image and audio uploads, native in‑browser audio recording (preview/attach/discard), plus streaming response rendering with Markdown support.
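The Infer API's move to PromptRequest (noted in the file summaries) suggests a request shape along these lines. Only the attachment IDs are confirmed by the PR description; the other property names are guesses:

```csharp
using System.Collections.Generic;

// Plausible shape of the new prompt request carrying attachment references.
public sealed class PromptRequest
{
    public string Prompt { get; set; } = string.Empty;

    // IDs of previously uploaded attachments to resolve before inference.
    public IList<string> AttachmentIds { get; set; } = new List<string>();
}
```

Sending IDs rather than raw bytes lets the upload happen once via the attachments API, with the hub resolving stored files when the prompt arrives.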
Key Features:
- Capability to upload images and ask about the images
- Capability to upload files and ask about the files
- Model auto-download
