Agent Service


A minimal, zero-dependency Agent Service built entirely on the Python standard library. It bootstraps coding agents from project context and connects LLMs, tools, prompt templates, and subagents dynamically at runtime, so no static agent definitions are required.

Features

Zero third-party dependencies: the core runtime uses only the Python standard library
Self-bootstrapping agents: build runnable coding agents from project context, model/tool configuration, prompts, skills, and session state
Multi-protocol support: OpenAI-compatible API, Ollama native /api/chat, and Anthropic Messages API
Three tool types: in-process Function tools, MCP (Model Context Protocol) tools, and Skills
Direct MCP/function tool invocation: call MCP or function tools directly without going through the LLM, so invocation is deterministic; pass format: "json" to get the result back as parsed JSON
Skill progressive disclosure: the first round exposes only each skill's summary; the full SKILL.md is injected only after the model selects the skill
Streaming inference: real-time token streaming, including thinking/reasoning content
Prompt template inference: a user message can reference a named template by ID, and its {{placeholder}} variables are filled in at runtime from the request's arguments dict, so prompts can be adjusted and parameterized per model or toolset without redeploying
Multi-agent collaboration: delegate subtasks to independent Subagents via the built-in delegate tool; each Subagent runs with its own model and toolset and returns its result to the parent agent. Supports streaming output, nested delegation, and automatic session persistence
Agent management: save the current model, tools, and system prompt as a reusable Agent, and switch between saved Agents in the chat interface
Web UI management console: a Svelte 5 SPA for managing models, tools, prompt templates, and agents, and for chatting
HTTP API server: a lightweight REST API built on http.server; no FastAPI/uvicorn needed
Multimodal: image (base64) and audio inputs for VLMs
Concurrent multi-session chat with real-time status tracking: several chat sessions can run at once, each with its own streaming state; session status (streaming, success, error, unread) is pushed in real time over SSE; read status is updated automatically from the user's scroll position; session titles are broadcast as they change
Workspace file management: a full workspace file manager with directory tree navigation, file listing (list/grid views), search (AND/OR modes via ripgrep/grep), rename, duplicate, delete, download, and chunked/parallel upload with pause/resume/retry; workspace file references (<file>path</file>) in chat prompts are expanded at inference time into inline content or attached images
Self-install setup script: export the current Agent Service code, the built Web UI (web/dist), and the runtime configuration as a self-extracting installer via /v1/setup, then install it on another machine with curl -s http://{host}:7988/v1/setup | sh
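The workspace feature expands <file>path</file> references before the prompt reaches the model. The project's expansion code is not shown, so the following is only a minimal sketch of the idea, assuming paths are relative to a workspace root, files are UTF-8 text, and unresolvable references are left untouched. The function name expand_file_refs and the inline wrapper format are hypothetical, and image attachment is omitted.

```python
import re
from pathlib import Path

# Hypothetical sketch: matches <file>path</file> references (non-greedy, single line).
FILE_REF = re.compile(r"<file>(.*?)</file>")


def expand_file_refs(prompt: str, workspace: Path) -> str:
    """Replace each <file>path</file> with the text content of that file.

    Assumptions (not the project's documented behavior): paths are relative to
    `workspace`, files are UTF-8 text, and references that escape the
    workspace or do not exist are left unchanged.
    """
    root = workspace.resolve()

    def _inline(match: re.Match) -> str:
        rel = match.group(1).strip()
        target = (root / rel).resolve()
        # Refuse paths that resolve outside the workspace (e.g. "../secret").
        if not target.is_relative_to(root) or not target.is_file():
            return match.group(0)
        body = target.read_text(encoding="utf-8")
        return f"[{rel}]\n{body}\n[end of {rel}]"

    return FILE_REF.sub(_inline, prompt)
```

Resolving the path before checking it against the root is what blocks `..` traversal; Path.is_relative_to requires Python 3.9+, which the 3.10+ requirement covers.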

Architecture

```text
runtime/
├── __init__.py              # Public API exports
├── models.py                # Data models: Message, ModelConfig, ToolConfig, etc.
├── registry.py              # ModelRegistry + ToolRegistry
├── protocols.py             # Protocol adapters: OpenAI / Ollama / Anthropic
├── runtime.py               # Runtime engine: inference + tool call loop + Skill disclosure
├── tools.py                 # Function tool decorator
├── skill_manager.py         # SkillManager: SKILL.md parsing and progressive disclosure
├── mcp_client.py            # MCP Client: pure stdlib stdio/SSE implementation (StreamReader limit raised to 100 MB for large payloads)
├── builtin_tools.py         # Built-in tools: bash, fetch
├── prompt_template_manager.py  # Prompt template CRUD
├── context_manager.py       # Context manager: session management, rolling summary, memory extraction
├── env_manager.py           # Environment variable manager
├── session_manager.py       # Session index manager
├── workspace_manager.py     # Workspace file manager: listing, search, upload, file refs
└── server.py                # HTTP API server

web/                         # Svelte 5 management console SPA
examples/                    # Usage examples
```
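runtime.py is described as "inference + tool call loop". Its code is not reproduced here; the sketch below only illustrates the general pattern, under assumptions: messages are plain dicts, the model is any callable returning a dict that may carry a hypothetical `tool_calls` list, and `max_tool_rounds` caps the iterations like the InferenceRequest field of the same name.

```python
from typing import Callable

# Hypothetical shapes for illustration; the project's Message/ToolConfig
# classes and protocol adapters are not used here.
Model = Callable[[list[dict]], dict]


def run_tool_loop(model: Model, tools: dict[str, Callable[..., str]],
                  messages: list[dict], max_tool_rounds: int = 10) -> list[dict]:
    """Call the model, execute requested tools, feed results back, repeat."""
    for _ in range(max_tool_rounds):
        reply = model(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:  # a plain answer ends the loop
            return messages
        for call in calls:
            fn = tools.get(call["name"])
            result = fn(**call["arguments"]) if fn else f"unknown tool {call['name']}"
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return messages  # round limit reached


# Usage with a fake model that asks for one tool call, then answers.
def fake_model(msgs: list[dict]) -> dict:
    if msgs[-1]["role"] == "user":
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "add", "arguments": {"a": 2, "b": 3}}]}
    return {"role": "assistant", "content": f"The sum is {msgs[-1]['content']}"}


out = run_tool_loop(fake_model, {"add": lambda a, b: str(a + b)},
                    [{"role": "user", "content": "2 + 3?"}])
print(out[-1]["content"])  # The sum is 5
```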

Quick Start

1. Python API — Function Tool

```python
from runtime import (
    ModelConfig, ModelRegistry,
    ToolConfig, ToolRegistry,
    Runtime, InferenceRequest, Message,
)


def my_search_function(query: str) -> str:
    # Placeholder: replace with a real search backend (e.g. SearXNG).
    # Assumption: the runtime passes the tool arguments as keyword arguments.
    return f"No results for {query!r} (stub)"


# Register a model (Ollama)
model_registry = ModelRegistry()
model_registry.register(ModelConfig(
    model_id="qwen3-14b",
    api_base="http://localhost:11434",
    model_name="qwen3:14b",
    api_protocol="ollama",
))

# Register a function tool
tool_registry = ToolRegistry()
tool_registry.register(
    ToolConfig(
        tool_id="web_search",
        tool_type="function",
        name="web_search",
        description="Search the internet for information.",
        parameters={
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    ),
    callable_fn=my_search_function,
)

# Run inference
runtime = Runtime(model_registry=model_registry, tool_registry=tool_registry)
result = runtime.infer(InferenceRequest(
    model_id="qwen3-14b",
    tool_ids=["web_search"],
    messages=[Message(role="user", content="What is the latest Python version?")],
))
print(result.messages[-1].content)
```
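The `parameters` dict above is written by hand. tools.py provides a function tool decorator whose exact API is not documented here; as an illustration of how such a schema can be derived from a Python signature instead, here is a sketch using `inspect`. `function_schema` and its type mapping are hypothetical and cover only a few basic annotations.

```python
import inspect
from typing import Callable

# Hypothetical mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_schema(fn: Callable) -> dict:
    """Build a name/description/parameters dict shaped like ToolConfig's fields."""
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string".
        props[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the model must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": props, "required": required},
    }


def web_search(query: str, limit: int = 5) -> str:
    """Search the internet for information."""
    return f"results for {query!r} (stub)"


schema = function_schema(web_search)
```

Deriving the schema from the signature keeps the description and the callable from drifting apart, at the cost of less control over per-parameter descriptions.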

2. MCP Tools

Because MCPClientManager is a singleton, any code running in the same process as the server can call a registered MCP tool directly in one line, without holding a reference to the server or the runtime:

```python
from runtime.mcp_client import MCPClientManager
result = MCPClientManager().call_tool("chrome-devtools", "new_page", {"url": "https://example.com"})
```
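The one-line call works because constructing MCPClientManager() again returns the same instance. How the project implements this is not shown; a common stdlib pattern is to override `__new__`, sketched here with a hypothetical class name.

```python
import threading


class ProcessSingleton:
    """Illustrative singleton: every ProcessSingleton() call returns one instance."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:  # double-checked locking for threads
                    cls._instance = super().__new__(cls)
                    cls._instance.servers = {}  # shared state lives on the one instance
        return cls._instance


a = ProcessSingleton()
a.servers["time"] = "running"
b = ProcessSingleton()
print(b is a, b.servers)  # True {'time': 'running'}
```

The lock only matters for the very first construction; later calls return early without acquiring it.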

To use MCP tools with model inference:

```python
from runtime import ModelRegistry, ToolRegistry, Runtime, InferenceRequest
from runtime.mcp_client import MCPClientManager

mcp = MCPClientManager()
mcp.load_config({
    "mcpServers": {
        "time": {"command": "uvx", "args": ["mcp-server-time"]},
        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    }
})

tool_registry = ToolRegistry()
all_tools = []
for server_name in ["time", "fetch"]:
    tools = mcp.get_tools(server_name)
    for t in tools:
        tool_registry.register(t)
    all_tools.extend(tools)

# model_registry=... stands for your ModelRegistry (see example 1)
runtime = Runtime(model_registry=..., tool_registry=tool_registry, mcp_manager=mcp)
result = runtime.infer(InferenceRequest(
    model_id="my-model",
    tool_ids=[t.tool_id for t in all_tools],
    text="What time is it now?",
))
```
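mcp_client.py is a pure-stdlib stdio/SSE client. Under the MCP specification's stdio transport, client and server exchange newline-delimited JSON-RPC 2.0 messages, and a tool is invoked with the `tools/call` method after the `initialize` handshake. The sketch below shows only that framing, not the project's client; the helper names are hypothetical, and `get_current_time` with a `timezone` argument is the tool exposed by the reference `mcp-server-time`.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def encode_tools_call(name: str, arguments: dict) -> bytes:
    """One newline-delimited JSON-RPC 2.0 request, as written to the server's stdin."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/call",
           "params": {"name": name, "arguments": arguments}}
    # json.dumps escapes newlines inside strings, so the line framing stays intact.
    return (json.dumps(msg) + "\n").encode("utf-8")


def decode_result(line: bytes) -> dict:
    """Parse one response line from stdout; raise if it carries a JSON-RPC error."""
    msg = json.loads(line)
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "MCP error"))
    return msg["result"]


request_line = encode_tools_call("get_current_time", {"timezone": "UTC"})
```

A tool result typically carries a `content` list of typed items (for example `{"type": "text", "text": ...}`), which is what the runtime ultimately hands back to the model.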

3. Skill with Progressive Disclosure

```python
from runtime import ModelRegistry, ToolRegistry, Runtime, InferenceRequest, SkillManager

tool_registry = ToolRegistry()
skill_manager = SkillManager(tool_registry)
skill_config = skill_manager.load_skill("/path/to/my_skill")  # directory with SKILL.md

runtime = Runtime(
    model_registry=...,
    tool_registry=tool_registry,
    skill_manager=skill_manager,
)

# Stream with progressive disclosure
for msg in runtime.infer_stream(InferenceRequest(
    model_id="my-model",
    tool_ids=[skill_config.tool_id],
    text="Help me query the latest data",
    max_tool_rounds=20,
)):
    if msg.content:
        print(msg.content, end="", flush=True)
    elif msg.thinking:
        print(f"[thinking] {msg.thinking}", end="", flush=True)
```
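Progressive disclosure means the first round shows the model only a short summary of each skill and the full SKILL.md body is injected later. As a sketch of that split, assuming SKILL.md starts with a `---` delimited header of simple `key: value` lines containing `name` and `description` (the layout used by Agent Skills), the following separates summary from body. It is not SkillManager's parser and does not handle full YAML.

```python
def split_skill_md(text: str) -> tuple[dict, str]:
    """Split SKILL.md into (header fields, body).

    Assumes a '---' delimited header of simple 'key: value' lines followed by
    Markdown instructions. Real YAML (lists, quoting, multi-line values) is not parsed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).lstrip("\n")
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated header: treat everything as body


def skill_summary(meta: dict) -> str:
    """What the model would see in round one: name and description only."""
    return f"{meta.get('name', 'unnamed')}: {meta.get('description', '')}"
```

Only `skill_summary(meta)` would go into the first prompt; the body is kept aside until the model selects the skill.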

4. Prompt Template Inference

Prompt templates let you define reusable, parameterized prompts that are resolved at runtime — no redeployment needed when you want to tweak wording or adapt to a different model.

```python
from runtime import Runtime, InferenceRequest, Message
from runtime.prompt_template_manager import PromptTemplateManager

# Create a template with {{placeholder}} variables
pt_manager = PromptTemplateManager()
pt_manager.create(
    name="summarize",
    content="Please summarize the following text in {{language}}:\n\n{{text}}",
)

runtime = Runtime(
    model_registry=...,
    tool_registry=...,
    prompt_template_manager=pt_manager,
)

# Reference the template by name; supply variables via arguments
result = runtime.infer(InferenceRequest(
    model_id="qwen3-14b",
    messages=[Message(
        role="user",
        prompt_template="summarize",
        arguments={"language": "English", "text": "...long article..."},
    )],
))
print(result.messages[-1].content)
```

The template content is fetched and all {{variable}} placeholders are substituted before the message is sent to the model. Templates can be created, updated, and deleted at runtime via the HTTP API or Web UI — making prompt iteration fast without touching code.
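The substitution step can be pictured with a small regex-based renderer. This is a sketch, not PromptTemplateManager's code: `render_template` is hypothetical, whitespace inside the braces is tolerated, and leaving unknown placeholders untouched (rather than raising an error) is an assumption.

```python
import re

# Matches {{name}} or {{ name }}; group 1 is the variable name.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(content: str, arguments: dict) -> str:
    """Replace {{name}} with str(arguments[name]); unknown names are left as-is."""
    def _sub(m: re.Match) -> str:
        key = m.group(1)
        return str(arguments[key]) if key in arguments else m.group(0)
    # A function replacement avoids re's backslash processing of argument values.
    return _PLACEHOLDER.sub(_sub, content)


prompt = render_template(
    "Please summarize the following text in {{language}}:\n\n{{text}}",
    {"language": "English", "text": "...long article..."},
)
```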

5. Multi-Agent Collaboration (Delegate Tool)

The built-in delegate tool enables hierarchical task delegation. A parent agent can spawn Subagents with different models and toolsets to handle specialized subtasks:

```python
from runtime import Runtime, InferenceRequest, Message

runtime = Runtime(model_registry=..., tool_registry=...)

# The parent agent uses a general-purpose model with the delegate tool
result = runtime.infer(InferenceRequest(
    model_id="qwen3-14b",
    tool_ids=["delegate", "web_search"],  # delegate + other tools
    messages=[Message(
        role="user",
        content="Research the latest AI breakthroughs and write a summary report.",
    )],
))

# The model may call delegate() with:
# - model_id: a specialized model (e.g., a coding model for code generation)
# - tool_names: subset of available tools for the Subagent
# - task: the subtask description
# - context: optional system prompt for the Subagent
```

Key features:

Streaming output: Subagent responses stream back in real time via SSE
Nested delegation: Subagents can delegate further to deeper-level agents
Tool scoping: the parent agent's tools are listed in a Markdown table that is automatically injected into the Subagent's system prompt
Session persistence: each Subagent session is saved to ~/.agents_runtime/chat_data/{session_id}/sub_{timestamp}/
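Tool scoping injects the parent's tools into the Subagent's system prompt as a Markdown table. The exact columns the project generates are not documented here; this sketch assumes a two-column name/description layout and a hypothetical `tools_markdown_table` helper. Pipes and newlines are escaped because either would break a table row.

```python
def tools_markdown_table(tools: list[dict]) -> str:
    """Render {'name', 'description'} dicts as a Markdown table for a system prompt."""
    def cell(text: str) -> str:
        # A raw '|' would start a new column and a newline would end the row.
        return text.replace("|", "\\|").replace("\n", " ")

    rows = ["| Tool | Description |", "| --- | --- |"]
    rows += [f"| {cell(t['name'])} | {cell(t.get('description', ''))} |" for t in tools]
    return "\n".join(rows)


table = tools_markdown_table([
    {"name": "web_search", "description": "Search the internet for information."},
])
```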

6. Start the HTTP Server

```shell
python app.py              # default: 0.0.0.0:7988
python app.py 7988         # custom port
python app.py 0.0.0.0:9000 # custom host and port
```

HTTP API Reference

| Method | Path | Description |
| --- | --- | --- |
| POST | `/v1/infer` | Non-streaming inference |
| POST | `/v1/infer/stream` | Streaming inference (SSE) |
| POST | `/v1/infer/abort` | Abort an active streaming inference by session ID |
| GET | `/v1/models` | List registered models |
| POST | `/v1/models` | Register a model |
| PUT | `/v1/models/{model_id}` | Update a model |
| DELETE | `/v1/models/{model_id}` | Delete a model |
| GET | `/v1/tools` | List registered tools |
| POST | `/v1/tools` | Register a tool |
| PUT | `/v1/tools/{tool_id}` | Update a tool |
| DELETE | `/v1/tools/{tool_id}` | Delete a tool |
| DELETE | `/v1/tools/batch` | Delete tools in batch |
| POST | `/v1/tools/call` | Call a tool directly (bypassing the LLM) |
| POST | `/v1/tools/mcp` | Register MCP servers |
| POST | `/v1/tools/skill` | Register a skill |
| GET | `/v1/mcp-servers` | List registered MCP servers |
| DELETE | `/v1/mcp-servers/{server_name}` | Delete an MCP server |
| GET | `/v1/prompt-templates` | List prompt templates |
| POST | `/v1/prompt-templates` | Create a prompt template |
| PUT | `/v1/prompt-templates/{id}` | Update a prompt template |
| DELETE | `/v1/prompt-templates/{id}` | Delete a prompt template |
| GET | `/v1/env` | Get environment variables |
| POST | `/v1/env` | Set an environment variable |
| POST | `/v1/env/detect` | Auto-detect environment variables |
| DELETE | `/v1/env/{key}` | Delete an environment variable |
| GET | `/v1/sessions` | List all sessions |
| GET | `/v1/sessions/events` | SSE endpoint for real-time session status updates |
| GET | `/v1/sessions/{session_id}` | Get session details |
| DELETE | `/v1/sessions/{session_id}` | Delete a session |
| POST | `/v1/sessions/{session_id}/read` | Mark a session as read |
| POST | `/v1/sessions/{session_id}/generate-title` | Auto-generate a session title |
| POST | `/v1/sessions/{session_id}/revoke` | Revoke (cancel) a session |
| GET | `/v1/agents` | List all agents |
| GET | `/v1/agents/{agent_id}` | Get a single agent |
| POST | `/v1/agents` | Create an agent |
| PUT | `/v1/agents/{agent_id}` | Update an agent |
| DELETE | `/v1/agents/{agent_id}` | Delete an agent |
| GET | `/v1/workspace/list` | List files in a workspace directory (paginated) |
| GET | `/v1/workspace/tree` | Get the workspace directory tree |
| GET | `/v1/workspace/children` | List child directories of any path (not restricted to the workspace) |
| GET | `/v1/workspace/search` | Search files in the workspace (AND/OR modes) |
| GET | `/v1/workspace/content` | Get file content for preview |
| GET | `/v1/workspace/download` | Download a file |
| GET | `/v1/workspace/thumbnail` | Get an image thumbnail |
| POST | `/v1/workspace/rename` | Rename a file or directory |
| POST | `/v1/workspace/duplicate` | Duplicate a file |
| DELETE | `/v1/workspace/delete` | Delete a file or directory |
| POST | `/v1/workspace/upload/init` | Initialize a chunked file upload |
| PUT | `/v1/workspace/upload/{upload_id}/chunk/{chunk_id}` | Upload a file chunk |
| POST | `/v1/workspace/upload/{upload_id}/complete` | Complete a chunked upload |
| DELETE | `/v1/workspace/upload/{upload_id}` | Cancel an upload |
| GET | `/v1/setup` | Download the self-extracting installer script |
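The chunked-upload endpoints split one file into independently uploaded pieces, which is what makes parallel upload, pause/resume, and per-chunk retry possible. The request bodies for `init` and `complete` are not documented here, so this sketch stops at planning: it computes byte ranges and the per-chunk URL path from the table. The 5 MiB default chunk size and 0-based chunk IDs are assumptions.

```python
def plan_chunks(file_size: int, chunk_size: int = 5 * 1024 * 1024) -> list[tuple[int, int, int]]:
    """Return (chunk_id, start, end) byte ranges covering the file; end is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    count = max(1, -(-file_size // chunk_size))  # integer ceiling division
    return [(i, i * chunk_size, min((i + 1) * chunk_size, file_size)) for i in range(count)]


def chunk_path(upload_id: str, chunk_id: int) -> str:
    """Path for PUT /v1/workspace/upload/{upload_id}/chunk/{chunk_id}."""
    return f"/v1/workspace/upload/{upload_id}/chunk/{chunk_id}"
```

Because every range is self-contained, a client can upload several chunks concurrently and, after a failure or pause, resend only the chunk IDs that were not acknowledged.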

Example request body for POST `/v1/infer/stream`:

```json
{
  "model_id": "qwen3-14b",
  "tool_ids": ["web_search"],
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Search for the latest AI news."}
  ],
  "stream": true,
  "max_tool_rounds": 10,
  "session_id": "new"
}
```

Note: The session_id field is optional. Use "new" to create a new session, an existing session ID to resume a conversation, or omit it for stateless inference.
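A stdlib-only client for the streaming endpoint needs to read the SSE response and split it into events. The event payload format is not documented in this README, so the sketch below yields each event's raw `data` string rather than assuming a JSON shape; it follows the SSE rules of joining multiple `data:` lines with newlines, ending events at blank lines, and ignoring comment lines. `iter_sse_data` and `stream_infer` are hypothetical names.

```python
import json
import urllib.request
from collections.abc import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each SSE event (multi-line data joined by '\\n')."""
    buf: list[str] = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line:  # a blank line terminates the current event
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            buf.append(line[5:].removeprefix(" "))  # drop one optional leading space
        # other fields (event:, id:) and ':' comments are ignored in this sketch
    if buf:
        yield "\n".join(buf)


def stream_infer(base_url: str, payload: dict) -> Iterator[str]:
    """POST to /v1/infer/stream and yield raw event data strings."""
    req = urllib.request.Request(
        f"{base_url}/v1/infer/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # iterating the response yields lines
        yield from iter_sse_data(resp)
```

With the server running, `for data in stream_infer("http://localhost:7988", body): print(data)` would print each event as it arrives, where `body` is a request like the one above.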

Web UI

The management console is a Svelte 5 SPA located in web/. To build it:

```shell
cd web
npm install
npm run build
```

The built files in web/dist/ are automatically served by the HTTP server at the root path.

Features:

Chat with model selection, tool selection, prompt template support, and agent selection
Multi-task concurrent conversations — each session maintains independent streaming state; switching sessions doesn't interrupt active streams
Real-time session status indicators in sidebar (streaming, success-unread, error-unread) via SSE
Automatic read status management — marks session as read when user scrolls to bottom
Model management (CRUD) — copy an existing model config to quickly create a new one
Tool management (CRUD)
Prompt template management with {{placeholder}} variable support
Agent management — save current configuration as a reusable agent; switch agents in the chat interface
Markdown rendering with syntax highlighting
Expandable long JSON string previews that auto-fit the available code block width
Workspace file manager — directory tree navigation, list/grid views, file search, rename/duplicate/delete, chunked upload with progress tracking, and clipboard paste upload
Rich text chat input with workspace file reference chips (<file>path</file>)
Multimodal: image upload and microphone recording
Dark/light theme, responsive layout
Resizable sidebar with collapse/expand toggle; width persisted to localStorage

Examples

| File | Description |
| --- | --- |
| `examples/example_function_register.py` | Register a SearXNG search as a Function Tool; the LLM calls it automatically to answer queries |
| `examples/example_mcp_ollama.py` | Connect Ollama (qwen3:14b) to the MCP `time` and `fetch` servers; supports a `--stream` flag |
| `examples/example_mcp_openai.py` | Same as above but over the OpenAI-compatible protocol, so you can switch to OpenAI, vLLM, LiteLLM, etc. |
| `examples/example_skill.py` | Load a Skill from a directory and run streaming inference with progressive SKILL.md disclosure |
| `examples/example_vlm_tool_call.py` | A VLM reads an image, understands the instruction in it, and calls the built-in `bash`/`fetch` tools to carry it out |
| `examples/example_browser_use.py` | Client/server split: the server registers the chrome-devtools MCP; the client calls `/v1/tools/call` to open a page directly, then `/v1/infer/stream` to let the LLM inspect and interact with the browser |
| `examples/example_stream_as_infer.py` | Use `/v1/infer/stream` (SSE) to receive streaming tokens and reassemble them into the same JSON structure as `/v1/infer`, which avoids gateway/proxy idle-timeout disconnections on long-running inference; the `--compare` flag calls both endpoints and compares the results |
| `examples/example_multi_agents.py` | Multi-agent collaboration: PlanAgent delegates tasks to MainAgent via the `delegate` tool. Demonstrates prompt templates, MCP tools, and hierarchical task delegation with automatic TOOLS Markdown table generation |

Data Persistence

All configuration is persisted to ~/.agents_runtime/:

```text
~/.agents_runtime/
├── models.json
├── tools.json
├── mcp_servers.json
├── prompt_templates.json
├── env.json
├── agents/                  # Agent data directory
│   └── {agent_id}.json      # Individual agent configuration files
└── chat_data/               # Session data directory
    └── {session_id}/
        ├── conversation.json
        ├── summary.md
        └── memory.md
```

env.json is a flat key-value map of environment variables loaded at server startup, useful for injecting API keys and other secrets without modifying the system environment:

```json
{
  "OPENAI_API_KEY": "sk-...",
  "SOME_SERVICE_TOKEN": "abc123"
}
```
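Loading a flat key-value file like env.json into the process environment takes only a few stdlib calls. This is a sketch, not env_manager.py: `load_env_json` is hypothetical, only string values are accepted, and whether the real server lets env.json override variables already set in the environment is not stated, so the sketch keeps existing values unless `override=True`.

```python
import json
import os
from pathlib import Path


def load_env_json(path: Path, override: bool = False) -> dict[str, str]:
    """Load a flat {"KEY": "value"} JSON file into os.environ; return what was applied."""
    if not path.exists():
        return {}
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path} must contain a JSON object")
    applied = {}
    for key, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"{key}: only string values are supported")
        if override or key not in os.environ:
            os.environ[key] = value  # visible to this process and its child processes
            applied[key] = value
    return applied


# Default location per the layout above:
ENV_FILE = Path.home() / ".agents_runtime" / "env.json"
```

Setting `os.environ` affects only the running process and the subprocesses it starts (for example MCP servers launched over stdio), which is why this approach leaves the system environment untouched.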

Requirements

Python 3.10+
No third-party Python packages required for the core runtime
For the web UI: Node.js 18+ and npm

Background & Motivation

This project was born out of frustrations encountered while using Qwen-Agent. Several pain points drove the decision to build a new Agent Service from scratch:

MCP tools are registered per-agent, so different agents each spin up their own local MCP process instances — unnecessary overhead since most MCP servers can be shared as stateless services.
The combinatorial explosion of models × tools makes static pre-definitions impractical.
Function tools cannot be dynamically defined and loaded at runtime.
MCP/function tools cannot be called directly — every invocation must go through the LLM, making deterministic automation unreliable.
No support for Skills.
Hard-coded OpenAI protocol causes abnormal inference behavior when connecting to local Ollama models for VLM tasks.
The Web UI and a clean HTTP server API cannot run in the same process simultaneously.
Models, tools, and prompt templates need to be added, updated, and removed at runtime, especially prompt templates, which require frequent iteration. The author added CRUD support to the official Qwen-Agent GUI in a fork, but the Gradio-based UI is sluggish and the experience is poor.

These issues made building a dedicated Agent Service worthwhile. Using AI-assisted development, this project was built from scratch to address all of the above. It intentionally avoids third-party dependencies so it can be embedded into any existing project, either as an SDK or as a standalone HTTP service.

The project is under active development. Next steps include enhancing the multi-agent collaboration framework with more orchestration patterns and the closely related topic of secure user data management.

License

MIT License — see LICENSE

