A minimal, zero-dependency Agent Service built entirely on the Python standard library. It bootstraps coding agents from project context and connects LLMs, tools, prompt templates, and subagents dynamically at runtime, with no static agent definitions required.
- Zero third-party dependencies: the core runtime uses only the Python standard library
- Self-bootstrapping agents: bootstrap runnable coding agents from project context, model/tool configuration, prompts, skills, and session state
- Multi-protocol support: OpenAI-compatible API, Ollama native `/api/chat`, and Anthropic Messages API
- Three tool types: in-process Function tools, MCP (Model Context Protocol) tools, and Skills
- Direct MCP/function tool invocation: bypass the LLM and call MCP/function tools directly, so the call does not depend on the model deciding to invoke the tool; optionally return parsed JSON with `format: "json"`
- Skill progressive disclosure: the first round exposes only the skill summary; the full `SKILL.md` is injected only when the model selects the skill
- Streaming inference: real-time token streaming with thinking/reasoning content support
- Prompt template inference: user messages can reference a named template by ID; `{{placeholder}}` variables are resolved at runtime from the request's `arguments` dict, so prompts can be adjusted and parameterized for different models and tools without redeploying
- Multi-agent collaboration: delegate subtasks to independent Subagents via the built-in `delegate` tool; each Subagent runs with its own model and toolset and returns its result to the parent agent. Supports streaming output, nested delegation, and automatic session persistence
- Agent management: save the current model, tools, and system prompt configuration as a reusable Agent; quickly switch between saved Agents in the chat interface
- Web UI management console: Svelte 5 SPA for managing models, tools, prompt templates, agents, and chat
- HTTP API server: lightweight REST API built on `http.server`, no FastAPI/uvicorn needed
- Multimodal: supports image (base64) and audio inputs for VLM models
- Multi-task concurrent conversations with real-time status tracking: multiple simultaneous chat sessions with independent streaming states; real-time session status updates via SSE (streaming, success, error, unread); automatic read status management based on user scroll position; session title broadcasting
- Workspace file management: full-featured workspace file manager with directory tree navigation, file listing (list/grid views), search (AND/OR modes via ripgrep/grep), rename, duplicate, delete, download, and chunked/parallel upload with pause/resume/retry support; workspace file references (`<file>path</file>`) in chat prompts are auto-expanded to inline content or attached images at inference time
- Self-install setup script: export the current Agent Service code, built Web UI (`web/dist`), and runtime configuration as a self-extracting installer via `/v1/setup`; install on another machine with `curl -s http://{host}:7988/v1/setup | sh`.
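
To make the progressive disclosure idea from the list above concrete, here is a minimal sketch: round one exposes only a short summary per skill, and the full document is returned once a skill is selected. The `Skill` class, its `summary`/`full_text` fields, and both helper functions are hypothetical names chosen for illustration; this is not the project's `SkillManager` API.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical in-memory view of a skill directory (illustration only)."""
    name: str
    summary: str    # short description shown in the first round
    full_text: str  # complete SKILL.md body, injected only on selection


def first_round_tools(skills):
    """Build the round-one tool list: names and summaries only."""
    return [{"name": s.name, "description": s.summary} for s in skills]


def disclose(skills, selected_name):
    """Return the full SKILL.md text for the skill the model selected."""
    for s in skills:
        if s.name == selected_name:
            return s.full_text
    raise KeyError(f"unknown skill: {selected_name}")


skills = [Skill("report", "Query and summarize recent data.", "# Report skill\nStep 1: ...")]
round_one = first_round_tools(skills)
```

Keeping the full text out of the first round means unused skills cost only a summary's worth of context.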
```
runtime/
├── __init__.py                 # Public API exports
├── models.py                   # Data models: Message, ModelConfig, ToolConfig, etc.
├── registry.py                 # ModelRegistry + ToolRegistry
├── protocols.py                # Protocol adapters: OpenAI / Ollama / Anthropic
├── runtime.py                  # Runtime engine: inference + tool call loop + Skill disclosure
├── tools.py                    # Function tool decorator
├── skill_manager.py            # SkillManager: SKILL.md parsing and progressive disclosure
├── mcp_client.py               # MCP Client: pure stdlib stdio/SSE implementation (StreamReader limit raised to 100 MB for large payloads)
├── builtin_tools.py            # Built-in tools: bash, fetch
├── prompt_template_manager.py  # Prompt template CRUD
├── context_manager.py          # Context manager: session management, rolling summary, memory extraction
├── env_manager.py              # Environment variable manager
├── session_manager.py          # Session index manager
├── workspace_manager.py        # Workspace file manager: listing, search, upload, file refs
└── server.py                   # HTTP API server
web/                            # Svelte 5 management console SPA
examples/                       # Usage examples
```
1. Python API — Function Tool
```python
from runtime import (
    ModelConfig, ModelRegistry,
    ToolConfig, ToolRegistry,
    Runtime, InferenceRequest, Message,
)


def my_search_function(query: str) -> str:
    """Placeholder search backend (not part of the library); replace with your own."""
    return f"No results available for: {query}"


# Register a model (Ollama)
model_registry = ModelRegistry()
model_registry.register(ModelConfig(
    model_id="qwen3-14b",
    api_base="http://localhost:11434",
    model_name="qwen3:14b",
    api_protocol="ollama",
))

# Register a function tool
tool_registry = ToolRegistry()
tool_registry.register(
    ToolConfig(
        tool_id="web_search",
        tool_type="function",
        name="web_search",
        description="Search the internet for information.",
        parameters={
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    ),
    callable_fn=my_search_function,
)

# Run inference
runtime = Runtime(model_registry=model_registry, tool_registry=tool_registry)
result = runtime.infer(InferenceRequest(
    model_id="qwen3-14b",
    tool_ids=["web_search"],
    messages=[Message(role="user", content="What is the latest Python version?")],
))
print(result.messages[-1].content)
```

2. MCP Tools
Since `MCPClientManager` is a singleton, any code running in the same process as the server can call a registered MCP tool directly in one line, without holding a reference to the server or the runtime:

```python
from runtime.mcp_client import MCPClientManager

result = MCPClientManager().call_tool("chrome-devtools", "new_page", {"url": "https://example.com"})
```

To use MCP tools with model inference:
```python
from runtime import ModelRegistry, ToolRegistry, Runtime, InferenceRequest
from runtime.mcp_client import MCPClientManager

mcp = MCPClientManager()
mcp.load_config({
    "mcpServers": {
        "time": {"command": "uvx", "args": ["mcp-server-time"]},
        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    }
})

# Register every tool exposed by each MCP server
tool_registry = ToolRegistry()
all_tools = []
for server_name in ["time", "fetch"]:
    tools = mcp.get_tools(server_name)
    for t in tools:
        tool_registry.register(t)
    all_tools.extend(tools)

# model_registry=... is a placeholder for your own ModelRegistry
runtime = Runtime(model_registry=..., tool_registry=tool_registry, mcp_manager=mcp)
result = runtime.infer(InferenceRequest(
    model_id="my-model",
    tool_ids=[t.tool_id for t in all_tools],
    text="What time is it now?",
))
```

3. Skill with Progressive Disclosure
```python
from runtime import ModelRegistry, ToolRegistry, Runtime, InferenceRequest, SkillManager

tool_registry = ToolRegistry()
skill_manager = SkillManager(tool_registry)
skill_config = skill_manager.load_skill("/path/to/my_skill")  # directory with SKILL.md

# model_registry=... is a placeholder for your own ModelRegistry
runtime = Runtime(
    model_registry=...,
    tool_registry=tool_registry,
    skill_manager=skill_manager,
)

# Stream with progressive disclosure
for msg in runtime.infer_stream(InferenceRequest(
    model_id="my-model",
    tool_ids=[skill_config.tool_id],
    text="Help me query the latest data",
    max_tool_rounds=20,
)):
    if msg.content:
        print(msg.content, end="", flush=True)
    elif msg.thinking:
        print(f"[thinking] {msg.thinking}", end="", flush=True)
```

4. Prompt Template Inference
Prompt templates let you define reusable, parameterized prompts that are resolved at runtime — no redeployment needed when you want to tweak wording or adapt to a different model.
```python
from runtime import Runtime, InferenceRequest, Message
from runtime.prompt_template_manager import PromptTemplateManager

# Create a template with {{placeholder}} variables
pt_manager = PromptTemplateManager()
pt_manager.create(
    name="summarize",
    content="Please summarize the following text in {{language}}:\n\n{{text}}",
)

# model_registry=... and tool_registry=... are placeholders for your own registries
runtime = Runtime(
    model_registry=...,
    tool_registry=...,
    prompt_template_manager=pt_manager,
)

# Reference the template by name; supply variables via arguments
result = runtime.infer(InferenceRequest(
    model_id="qwen3-14b",
    messages=[Message(
        role="user",
        prompt_template="summarize",
        arguments={"language": "English", "text": "...long article..."},
    )],
))
print(result.messages[-1].content)
```

The template content is fetched and all `{{variable}}` placeholders are substituted before the message is sent to the model. Templates can be created, updated, and deleted at runtime via the HTTP API or Web UI, which keeps prompt iteration fast without touching code.
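
The substitution step described above can be approximated with a few lines of standard-library code. This is a sketch under assumptions, not the project's implementation: `render_template` is a hypothetical helper, and leaving unknown placeholders untouched is a choice made here for illustration.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(content, arguments):
    """Replace each {{name}} with arguments[name]; unknown names are left as-is."""
    def substitute(match):
        key = match.group(1)
        return str(arguments[key]) if key in arguments else match.group(0)
    return PLACEHOLDER.sub(substitute, content)


rendered = render_template(
    "Please summarize the following text in {{language}}:\n\n{{text}}",
    {"language": "English", "text": "...long article..."},
)
print(rendered)
```

Because the template is plain text resolved per request, the same inference call can serve different models or tasks by swapping the template or the `arguments` values.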
5. Multi-Agent Collaboration (Delegate Tool)
The built-in delegate tool enables hierarchical task delegation. A parent agent can spawn Subagents with different models and toolsets to handle specialized subtasks:
```python
from runtime import Runtime, InferenceRequest, Message

# model_registry=... and tool_registry=... are placeholders for your own registries
runtime = Runtime(model_registry=..., tool_registry=...)

# The parent agent uses a general-purpose model with the delegate tool
result = runtime.infer(InferenceRequest(
    model_id="qwen3-14b",
    tool_ids=["delegate", "web_search"],  # delegate + other tools
    messages=[Message(
        role="user",
        content="Research the latest AI breakthroughs and write a summary report.",
    )],
))

# The model may call delegate() with:
# - model_id: a specialized model (e.g., a coding model for code generation)
# - tool_names: subset of available tools for the Subagent
# - task: the subtask description
# - context: optional system prompt for the Subagent
```

Key features:
- Streaming output: Subagent responses stream back in real-time via SSE
- Nested delegation: Subagents can further delegate to deeper-level agents
- Tool scoping: Parent agent's tools are automatically listed in a Markdown table and injected into the Subagent's system prompt
- Session persistence: Each Subagent session is saved to `~/.agents_runtime/chat_data/{session_id}/sub_{timestamp}/`
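
The tool scoping behavior listed above, where the parent's tools are rendered as a Markdown table and injected into the Subagent's system prompt, might look roughly like the sketch below. The function names, the column headers, and the `## TOOLS` heading are assumptions made for illustration; the actual prompt layout used by the project may differ.

```python
def tools_markdown_table(tools):
    """Render (name, description) pairs as a Markdown table."""
    lines = ["| Tool | Description |", "|---|---|"]
    for name, description in tools:
        # Escape pipes so a description cannot break the table layout.
        safe = description.replace("|", "\\|")
        lines.append(f"| {name} | {safe} |")
    return "\n".join(lines)


def subagent_system_prompt(context, tools):
    """Append the tool table to the optional context passed to delegate()."""
    return f"{context}\n\n## TOOLS\n\n{tools_markdown_table(tools)}".strip()


prompt = subagent_system_prompt(
    "You are a research assistant.",
    [("web_search", "Search the internet for information.")],
)
print(prompt)
```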
6. Start the HTTP Server
```bash
python app.py                # default: 0.0.0.0:7988
python app.py 7988           # custom port
python app.py 0.0.0.0:9000   # custom host and port
```

| Method | Path | Description |
|---|---|---|
| POST | `/v1/infer` | Non-streaming inference |
| POST | `/v1/infer/stream` | Streaming inference (SSE) |
| POST | `/v1/infer/abort` | Abort an active streaming inference by session ID |
| GET | `/v1/models` | List registered models |
| POST | `/v1/models` | Register a model |
| PUT | `/v1/models/{model_id}` | Update a model |
| DELETE | `/v1/models/{model_id}` | Delete a model |
| GET | `/v1/tools` | List registered tools |
| POST | `/v1/tools` | Register a tool |
| PUT | `/v1/tools/{tool_id}` | Update a tool |
| DELETE | `/v1/tools/{tool_id}` | Delete a tool |
| POST | `/v1/tools/call` | Directly call a tool (bypass LLM) |
| POST | `/v1/tools/mcp` | Register MCP servers |
| POST | `/v1/tools/skill` | Register a skill |
| GET | `/v1/mcp-servers` | List registered MCP servers |
| DELETE | `/v1/mcp-servers/{server_name}` | Delete an MCP server |
| POST | `/v1/sessions/{session_id}/generate-title` | Auto-generate session title |
| POST | `/v1/sessions/{session_id}/revoke` | Revoke a session |
| DELETE | `/v1/tools/batch` | Batch delete tools |
| GET | `/v1/prompt-templates` | List prompt templates |
| POST | `/v1/prompt-templates` | Create a prompt template |
| PUT | `/v1/prompt-templates/{id}` | Update a prompt template |
| DELETE | `/v1/prompt-templates/{id}` | Delete a prompt template |
| GET | `/v1/env` | Get environment variables |
| POST | `/v1/env` | Set environment variable |
| POST | `/v1/env/detect` | Auto-detect environment variables |
| DELETE | `/v1/env/{key}` | Delete environment variable |
| GET | `/v1/sessions` | List all sessions |
| GET | `/v1/sessions/events` | SSE endpoint for real-time session status updates |
| GET | `/v1/sessions/{session_id}` | Get session details |
| DELETE | `/v1/sessions/{session_id}` | Delete session |
| POST | `/v1/sessions/{session_id}/read` | Mark session as read |
| GET | `/v1/agents` | List all agents |
| GET | `/v1/agents/{agent_id}` | Get a single agent |
| POST | `/v1/agents` | Create an agent |
| PUT | `/v1/agents/{agent_id}` | Update an agent |
| DELETE | `/v1/agents/{agent_id}` | Delete an agent |
| GET | `/v1/workspace/list` | List files in a workspace directory (paginated) |
| GET | `/v1/workspace/tree` | Get workspace directory tree structure |
| GET | `/v1/workspace/children` | List child directories of any path (no workspace restriction) |
| GET | `/v1/workspace/search` | Search files in workspace (AND/OR modes) |
| GET | `/v1/workspace/content` | Get file content for preview |
| GET | `/v1/workspace/download` | Download a file |
| GET | `/v1/workspace/thumbnail` | Get image thumbnail |
| POST | `/v1/workspace/rename` | Rename a file or directory |
| POST | `/v1/workspace/duplicate` | Duplicate a file |
| DELETE | `/v1/workspace/delete` | Delete a file or directory |
| POST | `/v1/workspace/upload/init` | Initialize a chunked file upload |
| PUT | `/v1/workspace/upload/{upload_id}/chunk/{chunk_id}` | Upload a file chunk |
| POST | `/v1/workspace/upload/{upload_id}/complete` | Complete a chunked upload |
| DELETE | `/v1/workspace/upload/{upload_id}` | Cancel an upload |
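
The upload routes in the table imply a three-step flow: initialize, send chunks, complete. As a sketch of the client-side bookkeeping only, the snippet below splits a file size into chunks and builds the per-chunk PUT path. The helper names, the zero-based `chunk_id` numbering, and the `demo-upload` ID are assumptions; the request and response bodies of these endpoints are not shown here because the source does not specify them.

```python
def plan_chunks(total_size, chunk_size):
    """Return (chunk_id, offset, length) tuples covering total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    offset = 0
    chunk_id = 0  # assumption: chunk IDs start at 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        chunks.append((chunk_id, offset, length))
        offset += length
        chunk_id += 1
    return chunks


def chunk_path(upload_id, chunk_id):
    """Build the PUT path for one chunk, following the route in the table."""
    return f"/v1/workspace/upload/{upload_id}/chunk/{chunk_id}"


plan = plan_chunks(total_size=10, chunk_size=4)
paths = [chunk_path("demo-upload", cid) for cid, _, _ in plan]
```

Since every chunk carries its own offset and length, chunks can be sent in parallel and a failed chunk can be retried on its own, which is what pause/resume/retry support requires.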
Streaming inference request:

```json
{
  "model_id": "qwen3-14b",
  "tool_ids": ["web_search"],
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Search for the latest AI news."}
  ],
  "stream": true,
  "max_tool_rounds": 10,
  "session_id": "new"
}
```

Note: The `session_id` field is optional. Use `"new"` to create a new session, an existing session ID to resume a conversation, or omit it for stateless inference.
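
On the client side, the response to a streaming request arrives as Server-Sent Events. The sketch below parses generic SSE framing (`data:` lines, blank-line event boundaries, `:` comments) from any iterable of text lines. The `iter_sse_data` helper and the `{"content": ...}` payload shape in the sample are assumptions for illustration; the actual event payloads emitted by `/v1/infer/stream` may carry different fields.

```python
import json


def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:
        # Tolerate a final event that is not followed by a blank line.
        yield "\n".join(buffer)


# Hypothetical payloads; real events may use other field names.
sample = [
    'data: {"content": "Hel"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"content": "lo"}\n',
    "\n",
]
chunks = [json.loads(payload) for payload in iter_sse_data(sample)]
text = "".join(chunk.get("content", "") for chunk in chunks)
```

In practice `lines` could be the decoded lines of an `http.client` or `urllib.request` response body, read incrementally so tokens are shown as they arrive.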
The management console is a Svelte 5 SPA located in `web/`. Build and serve it:

```bash
cd web
npm install
npm run build
```

The built files in `web/dist/` are automatically served by the HTTP server at the root path.
Features:
- Chat with model selection, tool selection, prompt template support, and agent selection
- Multi-task concurrent conversations: each session maintains independent streaming state; switching sessions doesn't interrupt active streams
- Real-time session status indicators in the sidebar (streaming, success-unread, error-unread) via SSE
- Automatic read status management: marks a session as read when the user scrolls to the bottom
- Model management (CRUD): copy an existing model config to quickly create a new one
- Tool management (CRUD)
- Prompt template management with `{{placeholder}}` variable support
- Agent management: save the current configuration as a reusable agent; switch agents in the chat interface
- Markdown rendering with syntax highlighting
- Expandable long JSON string previews that auto-fit the available code block width
- Workspace file manager: directory tree navigation, list/grid views, file search, rename/duplicate/delete, chunked upload with progress tracking, and clipboard paste upload
- Rich text chat input with workspace file reference chips (`<file>path</file>`)
- Multimodal: image upload and microphone recording
- Dark/light theme, responsive layout
- Resizable sidebar with collapse/expand toggle; width persisted to localStorage
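
The file reference chips in the list above end up as `<file>path</file>` markers in the prompt, which are expanded at inference time. A minimal sketch of that expansion for text files follows. `expand_file_refs`, the inline `--- path ---` framing, and the rule of leaving unresolved references untouched are assumptions; image attachments, which the project also supports, are not covered.

```python
import re
import tempfile
from pathlib import Path

FILE_REF = re.compile(r"<file>(.*?)</file>")


def expand_file_refs(prompt, workspace):
    """Replace <file>path</file> with the file's text if it lies inside workspace."""
    root = Path(workspace).resolve()

    def substitute(match):
        target = (root / match.group(1)).resolve()
        # Leave the reference untouched if it escapes the workspace or is missing.
        if not target.is_relative_to(root) or not target.is_file():
            return match.group(0)
        return f"\n--- {match.group(1)} ---\n{target.read_text(encoding='utf-8')}\n"

    return FILE_REF.sub(substitute, prompt)


with tempfile.TemporaryDirectory() as workspace:
    Path(workspace, "notes.txt").write_text("hello", encoding="utf-8")
    expanded = expand_file_refs("Summarize <file>notes.txt</file>", workspace)
    escaped = expand_file_refs("Read <file>../secret</file>", workspace)
```

Resolving the path before the containment check keeps `..` segments and absolute paths from reaching files outside the workspace.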
| File | Description |
|---|---|
| `examples/example_function_register.py` | Register a SearXNG search as a Function Tool; the LLM automatically calls it to answer queries |
| `examples/example_mcp_ollama.py` | Connect Ollama (qwen3:14b) with MCP time and fetch servers; supports the `--stream` flag |
| `examples/example_mcp_openai.py` | Same as above but using the OpenAI-compatible protocol; easily switch to OpenAI, vLLM, LiteLLM, etc. |
| `examples/example_skill.py` | Load a Skill from a directory and run streaming inference with progressive `SKILL.md` disclosure |
| `examples/example_vlm_tool_call.py` | VLM reads an image, understands the instruction in it, and calls built-in bash/fetch tools to execute |
| `examples/example_browser_use.py` | Client/server split: server registers chrome-devtools MCP; client calls `/v1/tools/call` to open a page directly, then `/v1/infer/stream` to let the LLM inspect and interact with the browser |
| `examples/example_stream_as_infer.py` | Use `/v1/infer/stream` (SSE) to receive streaming tokens and reassemble them into the same JSON structure as `/v1/infer`; avoids gateway/proxy idle-timeout disconnections on long-running inference; supports a `--compare` flag that calls both endpoints and compares the results |
| `examples/example_multi_agents.py` | Multi-Agent collaboration: PlanAgent delegates tasks to MainAgent via the `delegate` tool. Demonstrates prompt templates, MCP tools, and hierarchical task delegation with automatic TOOLS markdown generation |
All configuration is persisted to ~/.agents_runtime/:
```
~/.agents_runtime/
├── models.json
├── tools.json
├── mcp_servers.json
├── prompt_templates.json
├── env.json
├── agents/                 # Agent data directory
│   └── {agent_id}.json     # Individual agent configuration files
└── chat_data/              # Session data directory
    └── {session_id}/
        ├── conversation.json
        ├── summary.md
        └── memory.md
```
`env.json` is a flat key-value map of environment variables loaded at server startup, useful for injecting API keys and other secrets without modifying the system environment:

```json
{
  "OPENAI_API_KEY": "sk-...",
  "SOME_SERVICE_TOKEN": "abc123"
}
```

- Python 3.10+
- No third-party Python packages required for the core runtime
- For the web UI: Node.js 18+ and npm
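
Returning to `env.json` from the configuration section: loading a flat JSON map into the process environment at startup can be sketched as follows. `load_env_file` and its `override` flag are hypothetical; whether the project lets file values overwrite variables that are already set is not stated in the source, so the sketch defaults to not overwriting.

```python
import json
import os
import tempfile
from pathlib import Path


def load_env_file(path, environ=os.environ, override=False):
    """Copy a flat JSON key-value map into environ; return the keys applied."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("env file must contain a JSON object")
    applied = []
    for key, value in data.items():
        # Assumption: existing variables win unless override=True.
        if override or key not in environ:
            environ[key] = str(value)
            applied.append(key)
    return applied


with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp, "env.json")
    env_path.write_text(json.dumps({"SOME_SERVICE_TOKEN": "abc123"}), encoding="utf-8")
    target = {}  # a plain dict stands in for os.environ in this demo
    applied = load_env_file(env_path, environ=target)
```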
This project was born out of frustrations encountered while using Qwen-Agent. Several pain points drove the decision to build a new Agent Service from scratch:
- MCP tools are registered per-agent, so different agents each spin up their own local MCP process instances — unnecessary overhead since most MCP servers can be shared as stateless services.
- The combinatorial explosion of models × tools makes static pre-definitions impractical.
- Function tools cannot be dynamically defined and loaded at runtime.
- MCP/function tools cannot be called directly — every invocation must go through the LLM, making deterministic automation unreliable.
- No support for Skills.
- Hard-coded OpenAI protocol causes abnormal inference behavior when connecting to local Ollama models for VLM tasks.
- The Web UI and a clean HTTP server API cannot run in the same process simultaneously.
- Models, tools, and prompt templates need to be added, updated, and removed at runtime — especially prompt templates, which require frequent iteration. The author added CRUD support to the official Qwen-Agent GUI (fork here), but the Gradio-based UI is sluggish and the experience is poor.
These issues made building a dedicated Agent Service worthwhile. Built from scratch with AI-assisted development, this project addresses all of the above. It intentionally avoids third-party dependencies so it can be embedded into any existing project, either as an SDK or as a standalone HTTP service.
The project is under active development. Next steps include enhancing the multi-agent collaboration framework with more orchestration patterns and the closely related topic of secure user data management.
MIT License — see LICENSE



