# multimodal-agent

Here are 14 public repositories matching this topic...

Claude Code in Docker. Drop-in OpenAI-compatible API, MCP server, Telegram bot, and CLI — five interfaces, one image. Persistent sessions, file ops, always-on skill injection, and a full dev toolchain (Go, Python, Node, K8s, Terraform, databases) or a minimal image with just the basics.

  • Updated Apr 16, 2026
  • Shell

Build an end-to-end system that ingests inventory report PDFs/images, runs OCR to normalize and extract tabular data, stores the cleaned dataset, and exposes a secure, conversational agent that can answer business queries over the data (aggregation, filtering, joins, trends), returning tables, charts, and exportable results.

  • Updated Dec 5, 2025
  • Python
