Commit 241ea2d

SonAIengine and claude committed
docs: update README (Workflow Planning, SSE transport, 1068-tool benchmark)

## Added documentation (English + Korean)

### New feature documentation
- plan_workflow() API: automatic multi-step workflow generation plus manual editing
- Visual Workflow Editor: browser-based drag-and-drop editor
- SSE/Streamable-HTTP transport: support for remote MCP deployment
- plan.open_editor(): visual editing in the browser

### Benchmark result updates
- Top summary: added the 1068-tool scale test
- Competitive benchmark: comparison table of 6 retrieval strategies
- Scale test: results for the full GitHub API (1068 tools)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent aa5f572 commit 241ea2d

2 files changed

Lines changed: 164 additions & 2 deletions

README-ko.md

Lines changed: 85 additions & 1 deletion
@@ -13,6 +13,19 @@ After organizing tool relationships into a graph, **exactly the needed tools
 [![CI](https://github.com/SonAIengine/graph-tool-call/actions/workflows/ci.yml/badge.svg)](https://github.com/SonAIengine/graph-tool-call/actions/workflows/ci.yml)
 [![Zero Dependencies](https://img.shields.io/badge/dependencies-0-brightgreen.svg)](https://pypi.org/project/graph-tool-call/)
 
+<br>
+
+| | Baseline (all tools) | graph-tool-call |
+|---|:---:|:---:|
+| **248 tools (K8s API)** | 12% accuracy | **82% accuracy** |
+| **1068 tools (full GitHub API)** | not possible (context overflow) | **78% Recall@5** |
+| **Token usage** | 8,192 tokens | 1,699 tokens (**79% reduction**) |
+| **Latency (no embedding)** | | **2.7ms avg** |
+
+<sub>Measured with qwen3:4b (4-bit); <a href="#벤치마크">full benchmark below</a></sub>
+
+<br>
+
 [English](README.md) · 한국어 · [中文](README-zh_CN.md) · [日本語](README-ja.md)
 
 </div>
@@ -214,6 +227,34 @@ for t in tools:
 
 On this spec, it recorded **Recall@5 98.3%** with `top_k=5`.
 
+### Workflow Planning
+
+Vector search returns individual tools, whereas `plan_workflow()` returns an ordered execution chain that includes prerequisites. This reduces LLM agent round-trips from 3-4 to 1.
+
+```python
+from graph_tool_call import ToolGraph
+
+tg = ToolGraph.from_url("https://api.example.com/openapi.json")
+
+# Auto-generate a multi-step workflow
+plan = tg.plan_workflow("process a refund")
+for step in plan.steps:
+    print(f"{step.order}. {step.tool.name} - {step.reason}")
+# 1. getOrder - prerequisite for requestRefund
+# 2. requestRefund - primary action
+
+# Edit the workflow
+plan.remove_step("listOrders")
+plan.insert_step(0, "getOrder", tools=tg.tools, reason="need order ID")
+plan.set_param_mapping("requestRefund", "order_id", "getOrder.response.id")
+
+# Visual editor (opens in browser)
+plan.open_editor(tools=tg.tools)
+
+# Save / Load
+plan.save("refund_workflow.json")
+```
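The mapping string `getOrder.response.id` passed to `set_param_mapping()` suggests that a step's parameter is filled from an earlier step's response. Below is a minimal sketch of how such a dotted reference could be resolved, assuming responses are plain dicts keyed by tool name. `resolve_mapping` and the sample data are hypothetical and are not part of graph-tool-call.

```python
from typing import Any


def resolve_mapping(reference: str, responses: dict[str, dict[str, Any]]) -> Any:
    """Resolve a reference like 'getOrder.response.id' against earlier responses.

    Hypothetical helper: the first segment names the tool, the second must be
    the literal 'response', and the rest is a key path into that response.
    """
    tool_name, marker, *path = reference.split(".")
    if marker != "response" or not path:
        raise ValueError(f"unsupported reference: {reference!r}")
    value: Any = responses[tool_name]
    for key in path:
        value = value[key]
    return value


# Made-up response data, for illustration only.
earlier_responses = {"getOrder": {"id": "ord_123", "status": "paid"}}
refund_args = {"order_id": resolve_mapping("getOrder.response.id", earlier_responses)}
print(refund_args)
```

Keeping the reference as a string makes a saved plan easy to serialize to JSON; the actual lookup only happens once the earlier step has produced a response.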
+
 ### MCP Server (Claude Code, Cursor, Windsurf, etc.)
 
 Run as an MCP server, and any agent that supports MCP can use tool search with a single line of configuration:
@@ -231,7 +272,29 @@ Run as an MCP server, and any agent that supports MCP, with a single line of
 }
 ```
 
-The server exposes 5 tools: `search_tools`, `get_tool_schema`, `list_categories`, `graph_info`, `load_source`.
+The MCP server also supports remote deployment over the SSE and Streamable HTTP transports:
+
+```bash
+# Remote deployment (SSE transport)
+graph-tool-call serve --source api.json --transport sse --host 0.0.0.0 --port 8000
+
+# Streamable HTTP
+graph-tool-call serve --source api.json --transport streamable-http --port 8000
+```
+
+Client configuration for a remote MCP server:
+
+```json
+{
+  "mcpServers": {
+    "tool-search": {
+      "url": "http://tool-search.internal:8000/sse"
+    }
+  }
+}
+```
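When several environments need the same client entry, the JSON can be generated rather than hand-edited. This is a small sketch only: `remote_client_config` is a hypothetical helper, and the `/sse` path default simply mirrors the example URL rather than any documented rule.

```python
import json


def remote_client_config(name: str, host: str, port: int, path: str = "/sse") -> dict:
    """Build an `mcpServers` entry that points at a remote server URL."""
    return {"mcpServers": {name: {"url": f"http://{host}:{port}{path}"}}}


# Reproduces the client configuration shown in the README example.
config = remote_client_config("tool-search", "tool-search.internal", 8000)
print(json.dumps(config, indent=2))
```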
+
+The server exposes 6 tools: `search_tools`, `get_tool_schema`, `execute_tool`, `list_categories`, `graph_info`, `load_source`.
 
 ### MCP Proxy (unifying multiple MCP servers)

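@@ -561,6 +624,27 @@ graph-tool-call validates two things.
 - **Ontology** **expands the searchable representation itself** when tool descriptions are short or non-standard.
 - Using both together may yield limited gains in end-to-end accuracy, but **the ability to include the correct tool in the candidate set becomes strongest**.
 
+### Competitive Benchmark (retrieval strategy comparison)
+
+Six retrieval strategies were compared across 9 datasets (19 to 1068 tools); four of them are shown here:
+
+| Strategy | Recall@5 | MRR | Latency |
+|---|:---:|:---:|:---:|
+| Vector Only (≈bigtool) | 96.8% | 0.897 | 176ms |
+| BM25 Only | 91.6% | 0.819 | 1.5ms |
+| BM25 + Graph (default) | 91.6% | 0.819 | 14ms |
+| Full Pipeline (with embedding) | 96.8% | 0.897 | 172ms |
+
+**Key finding**: Even without embeddings, BM25+Graph reaches 91.6% Recall@5, which is competitive with vector search at a fraction of the latency. The quoted 65x speedup is consistent with 176ms against the 2.7ms no-embedding average in the summary table; against the 14ms listed in this table the ratio is about 12.6x. With embeddings enabled, the full pipeline matches pure vector search.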
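The headline ratios can be re-derived from the figures quoted in this README. The snippet below is only an arithmetic check; the variable names are illustrative, and which latency the 65x claim is measured against is an assumption inferred from the numbers.

```python
# Figures copied from the tables in this README.
baseline_tokens = 8192
retrieved_tokens = 1699
vector_only_ms = 176.0
bm25_graph_table_ms = 14.0  # "BM25 + Graph" row of the comparison table
no_embedding_avg_ms = 2.7   # no-embedding average latency in the summary table

token_reduction = 1 - retrieved_tokens / baseline_tokens
speedup_vs_table = vector_only_ms / bm25_graph_table_ms
speedup_vs_summary = vector_only_ms / no_embedding_avg_ms

print(f"token reduction:   {token_reduction:.1%}")
print(f"speedup vs 14 ms:  {speedup_vs_table:.1f}x")
print(f"speedup vs 2.7 ms: {speedup_vs_summary:.1f}x")
```

The token figure comes out at roughly 79%, matching the summary table. The speedup is about 65x only when measured against the 2.7 ms average, and about 12.6x against the 14 ms row.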
+
+### Scale Test: 1068 tools (full GitHub API)
+
+| Strategy | Recall@5 | MRR | Miss% |
+|---|:---:|:---:|:---:|
+| Vector Only | 88.0% | 0.761 | 12.0% |
+| BM25 + Graph | 78.0% | 0.643 | 22.0% |
+| Full Pipeline | 88.0% | 0.761 | 12.0% |
+
 ### Reproduce it
 
 ```bash

README.md

Lines changed: 79 additions & 1 deletion
@@ -11,8 +11,9 @@ Vector search finds *similar* tools, but misses the *workflow* they belong to.<b
 | | Baseline (all tools) | graph-tool-call |
 |---|:---:|:---:|
 | **248 tools (K8s API)** | 12% accuracy | **82% accuracy** |
+| **1068 tools (GitHub full API)** | impossible (context overflow) | **78% Recall@5** |
 | **Token usage** | 8,192 tokens | 1,699 tokens (**79% reduction**) |
-| **50 tools (GitHub API)** | 100% accuracy | 90% accuracy, **88% fewer tokens** |
+| **Latency (no embedding)** | | **2.7ms avg** |
 
 <sub>Measured with qwen3:4b (4-bit); <a href="#benchmark">full benchmark below</a></sub>

@@ -202,6 +203,34 @@ result = tg.execute(
 )
 ```
 
+### Workflow Planning
+
+Unlike vector search, which returns individual tools, `plan_workflow()` returns an ordered execution chain that includes prerequisites, reducing LLM agent round-trips from 3-4 to 1.
+
+```python
+from graph_tool_call import ToolGraph
+
+tg = ToolGraph.from_url("https://api.example.com/openapi.json")
+
+# Auto-generate a multi-step workflow
+plan = tg.plan_workflow("process a refund")
+for step in plan.steps:
+    print(f"{step.order}. {step.tool.name} - {step.reason}")
+# 1. getOrder - prerequisite for requestRefund
+# 2. requestRefund - primary action
+
+# Edit the workflow
+plan.remove_step("listOrders")
+plan.insert_step(0, "getOrder", tools=tg.tools, reason="need order ID")
+plan.set_param_mapping("requestRefund", "order_id", "getOrder.response.id")
+
+# Visual editor (opens in browser)
+plan.open_editor(tools=tg.tools)
+
+# Save / Load
+plan.save("refund_workflow.json")
+```
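To illustrate what an ordered execution chain with prerequisites means, here is a minimal sketch that orders tools with a topological sort from the standard library. It is not graph-tool-call's implementation: the `PREREQUISITES` map and the `order_steps` helper are hypothetical, and only the tool names come from the example above.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: tool -> tools that must run before it.
PREREQUISITES = {
    "requestRefund": {"getOrder"},
    "getOrder": set(),
}


def order_steps(target: str, prerequisites: dict[str, set[str]]) -> list[str]:
    """Return the target tool preceded by all of its transitive prerequisites."""
    needed: dict[str, set[str]] = {}
    stack = [target]
    while stack:
        tool = stack.pop()
        if tool in needed:
            continue
        needed[tool] = set(prerequisites.get(tool, set()))
        stack.extend(needed[tool])
    # static_order() yields each node only after all of its predecessors.
    return list(TopologicalSorter(needed).static_order())


for position, tool in enumerate(order_steps("requestRefund", PREREQUISITES), start=1):
    print(f"{position}. {tool}")
```

Returning the whole chain in one call is what lets an agent skip the intermediate search round-trips: it receives `getOrder` and `requestRefund` together, already in execution order.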
+
 ### MCP Server (Claude Code, Cursor, Windsurf, etc.)
 
 Run as an MCP server, and any MCP-compatible agent can use tool search with just a config entry:
@@ -219,6 +248,28 @@ Run as an MCP server, and any MCP-compatible agent can use tool search with just
 }
 ```
 
+The MCP server also supports remote deployment via SSE and Streamable HTTP transports:
+
+```bash
+# Remote deployment (SSE transport)
+graph-tool-call serve --source api.json --transport sse --host 0.0.0.0 --port 8000
+
+# Streamable HTTP
+graph-tool-call serve --source api.json --transport streamable-http --port 8000
+```
+
+Client configuration for remote MCP servers:
+
+```json
+{
+  "mcpServers": {
+    "tool-search": {
+      "url": "http://tool-search.internal:8000/sse"
+    }
+  }
+}
+```
+
 The server exposes 6 tools: `search_tools`, `get_tool_schema`, `execute_tool`, `list_categories`, `graph_info`, `load_source`.
 
 Search results include **workflow guidance**: relations between tools and suggested execution order:
@@ -290,6 +341,12 @@ claude mcp list
 # (individual servers should be gone)
 ```
 
+The proxy also supports remote transport:
+
+```bash
+graph-tool-call proxy --config backends.json --transport sse --port 8000
+```
+
 That's it. The proxy exposes `search_tools`, `get_tool_schema`, and `call_backend_tool`. After searching, matched tools are **dynamically injected** for 1-hop direct calling.
 
 <details>
@@ -563,6 +620,27 @@ On the largest dataset, **Kubernetes core/v1 (248 tools)**, we compared adding e
 - **Ontology** **expands the searchable representation itself** when tool descriptions are short or non-standard.
 - Using both together may show limited additional gains in end-to-end accuracy, but **the ability to include correct tools in the candidate set becomes strongest**.
 
+### Competitive Benchmark (retrieval strategies)
+
+Six retrieval strategies were compared across 9 datasets (19 to 1068 tools); four of them are shown here:
+
+| Strategy | Recall@5 | MRR | Latency |
+|---|:---:|:---:|:---:|
+| Vector Only (≈bigtool) | 96.8% | 0.897 | 176ms |
+| BM25 Only | 91.6% | 0.819 | 1.5ms |
+| BM25 + Graph (default) | 91.6% | 0.819 | 14ms |
+| Full Pipeline (with embedding) | 96.8% | 0.897 | 172ms |
+
+**Key finding**: Without embedding, BM25+Graph achieves 91.6% Recall@5, which is competitive with vector search at a fraction of the latency. The quoted 65x speedup is consistent with 176ms against the 2.7ms no-embedding average in the summary table; against the 14ms listed in this table the ratio is about 12.6x. With embedding enabled, performance matches pure vector search.
+
+### Scale Test: 1068 tools (GitHub full API)
+
+| Strategy | Recall@5 | MRR | Miss% |
+|---|:---:|:---:|:---:|
+| Vector Only | 88.0% | 0.761 | 12.0% |
+| BM25 + Graph | 78.0% | 0.643 | 22.0% |
+| Full Pipeline | 88.0% | 0.761 | 12.0% |
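The tables report Recall@5, MRR, and Miss%. Below is a minimal sketch of how these metrics are commonly computed, assuming exactly one correct tool per query. The function names and sample rankings are hypothetical, and the benchmark's own definitions may differ, for example when a query has several relevant tools. In the scale table Miss% equals 100% minus Recall@5, which agrees with the single-answer definition used here.

```python
def recall_at_k(rankings: list[list[str]], expected: list[str], k: int = 5) -> float:
    """Fraction of queries whose expected tool appears in the top-k results."""
    hits = sum(1 for ranked, gold in zip(rankings, expected) if gold in ranked[:k])
    return hits / len(expected)


def mean_reciprocal_rank(rankings: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the expected tool, counting 0 when it is absent."""
    total = 0.0
    for ranked, gold in zip(rankings, expected):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(expected)


# Made-up rankings for three queries (not benchmark data).
rankings = [
    ["getOrder", "listOrders", "requestRefund"],
    ["listOrders", "requestRefund", "getOrder"],
    ["listOrders", "getOrder"],
]
expected = ["getOrder", "requestRefund", "cancelOrder"]

recall = recall_at_k(rankings, expected, k=5)
mrr = mean_reciprocal_rank(rankings, expected)
print(f"Recall@5={recall:.3f} MRR={mrr:.3f} Miss={1 - recall:.3f}")
```

Recall@5 only asks whether the right tool made it into the candidate set, while MRR also rewards ranking it near the top, which is why two strategies can share a recall value yet differ in MRR.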
+
 ### Reproduce it
 
 ```bash
