
Commit a30aa06

Merge pull request #64 from PlanExeOrg/mcp-improvements
MCP improvements
2 parents 9ae15d3 + 2f8e664 commit a30aa06

32 files changed

Lines changed: 1907 additions & 329 deletions

README.md

Lines changed: 7 additions & 3 deletions
@@ -51,9 +51,13 @@ Assuming you have an MCP-compatible client (OpenClaw, Cursor, Codex, LM Studio,
 The Tool workflow (tools-only, not MCP tasks protocol)
 
 1. `prompt_examples`
-2. `task_create`
-3. `task_status` (poll every 5 minutes until done)
-4. download the result via `task_download` or via `task_file_info`
+2. `model_profiles` (optional, helps choose `model_profile`)
+3. non-tool step: draft/approve prompt
+4. `task_create`
+5. `task_status` (poll every 5 minutes until done)
+6. download the result via `task_download` or via `task_file_info`
+
+Concurrency note: each `task_create` call returns a new `task_id`; server-side global per-client concurrency is not capped, so clients should track their own parallel tasks.
 
 ### Option A: Remote MCP (fastest path)
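The revised six-step workflow above can be sketched as a client loop. This is a minimal sketch, not PlanExe code: `call_tool(name, args)` is a hypothetical MCP-client helper, and the stub below fakes the server responses so the flow is runnable end to end.

```python
import time

def call_tool(name, args, _state={"polls": 0}):
    # Hypothetical stand-in for a real MCP client call; responses are fakes
    # shaped after the tool names in the README, not a live PlanExe server.
    if name == "prompt_examples":
        return {"samples": ["Example prompt ..."], "message": "ok"}
    if name == "model_profiles":
        return {"default_profile": "baseline", "profiles": []}
    if name == "task_create":
        return {"task_id": "task-123"}
    if name == "task_status":
        _state["polls"] += 1
        done = _state["polls"] >= 2
        return {"task_id": args["task_id"], "state": "completed" if done else "processing"}
    if name == "task_download":
        return {"saved_path": f"{args['task_id']}-report.html"}
    raise ValueError(f"unknown tool: {name}")

def run_workflow(prompt, poll_seconds=0):
    call_tool("prompt_examples", {})      # step 1: study example prompts
    call_tool("model_profiles", {})       # step 2: optional profile lookup
    # step 3 (non-tool): draft the prompt and get user approval
    task_id = call_tool("task_create", {"prompt": prompt})["task_id"]  # step 4
    while True:                           # step 5: poll until a terminal state
        status = call_tool("task_status", {"task_id": task_id})
        if status["state"] in ("completed", "failed"):
            break
        time.sleep(poll_seconds)          # README suggests 5-minute polls
    # step 6: download the result
    return call_tool("task_download", {"task_id": task_id})["saved_path"]
```

Against a real server the loop would sleep 300 seconds between `task_status` calls; the stub completes on the second poll.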

database_api/model_taskitem.py

Lines changed: 32 additions & 0 deletions
@@ -5,6 +5,31 @@
 from sqlalchemy_utils import UUIDType
 from sqlalchemy import JSON
 from sqlalchemy.orm import column_property
+from sqlalchemy import event
+
+
+def _sanitize_utf8_text(value):
+    """Normalize values into valid UTF-8-safe text for persistence."""
+    if value is None:
+        return None
+
+    if isinstance(value, str):
+        text = value
+    elif isinstance(value, (bytes, bytearray, memoryview)):
+        text = bytes(value).decode("utf-8", errors="replace")
+    else:
+        text = str(value)
+
+    # Postgres text does not support embedded NULL bytes.
+    if "\x00" in text:
+        text = text.replace("\x00", "")
+
+    # Replace unpaired surrogates or other non-encodable code points.
+    try:
+        text.encode("utf-8", errors="strict")
+    except UnicodeEncodeError:
+        text = text.encode("utf-8", errors="replace").decode("utf-8")
+    return text
 
 class TaskState(enum.Enum):
     pending = 1
@@ -113,3 +138,10 @@ def demo_items(cls) -> list['TaskItem']:
         }
     )
     return [task1, task2, task3]
+
+
+@event.listens_for(TaskItem, "before_insert")
+@event.listens_for(TaskItem, "before_update")
+def _sanitize_taskitem_fields(_mapper, _connection, target):
+    # Enforce valid UTF-8-safe prompt text regardless of writer path.
+    target.prompt = _sanitize_utf8_text(target.prompt)

database_api/tests/test_taskitem_model.py

Lines changed: 33 additions & 0 deletions
@@ -39,3 +39,36 @@ def test_stop_request_fields_default(self):
         self.assertTrue(hasattr(fetched, "run_activity_overview_json"))
         self.assertTrue(hasattr(fetched, "run_artifact_layout_version"))
         self.assertFalse(bool(fetched.stop_requested))
+
+    def test_prompt_invalid_bytes_are_sanitized(self):
+        with self.app.app_context():
+            bad_bytes = b"Hello \xe2\x80 world"
+            task = TaskItem(
+                state=TaskState.pending,
+                prompt=bad_bytes,
+                user_id="test_user",
+            )
+            db.session.add(task)
+            db.session.commit()
+
+            fetched = db.session.get(TaskItem, task.id)
+            self.assertIsInstance(fetched.prompt, str)
+            # Must be encodable after sanitization.
+            fetched.prompt.encode("utf-8")
+            self.assertIn("Hello", fetched.prompt)
+            self.assertIn("world", fetched.prompt)
+
+    def test_prompt_surrogates_are_sanitized(self):
+        with self.app.app_context():
+            task = TaskItem(
+                state=TaskState.pending,
+                prompt="prefix \ud800 suffix",
+                user_id="test_user",
+            )
+            db.session.add(task)
+            db.session.commit()
+
+            fetched = db.session.get(TaskItem, task.id)
+            self.assertIsInstance(fetched.prompt, str)
+            fetched.prompt.encode("utf-8")
+            self.assertFalse(any(0xD800 <= ord(ch) <= 0xDFFF for ch in fetched.prompt))

docker-compose.yml

Lines changed: 2 additions & 0 deletions
@@ -242,6 +242,8 @@ services:
       PLANEXE_WORKER_PLAN_URL: ${PLANEXE_WORKER_PLAN_URL:-http://worker_plan:8000}
     ports:
       - "${PLANEXE_MCP_HTTP_PORT:-8001}:8001"
+    volumes:
+      - ./llm_config:/app/llm_config:ro
     restart: unless-stopped
     healthcheck:
       test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8001/healthcheck').read()"]

docs/mcp/antigravity.md

Lines changed: 6 additions & 8 deletions
@@ -18,15 +18,13 @@ My interaction history:
 4. I didn't meant outbreak, I meant vulcanic
 5. your prompt is a bit shorter than the example prompts
 6. go ahead create the plan
-7. stop that plan you are creating.
-8. now create the plan again, this time with ALL details. Last time you had FAST selected that would leave out most details.
-9. check status
+7. check status
+8. status
+9. status
 10. status
-11. status
-12. status
-13. download the report
-14. summarize the report
-15. does it correspond to your expectations?
+11. download the report
+12. summarize the report
+13. does it correspond to your expectations?
 
 I had to manually ask about `check status` to get details how the plan creation was going. It's not something that Antigravity can do.

docs/mcp/cursor.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ My interaction with Cursor for creating a plan is like this:
 2. I want you to come up with a good prompt
 3. I want something ala winter olympics in Italy 2026
 4. Slightly different idea. I want Denmark to switch from DKK to EUR. Use the persona of a person representing Denmark's ministers.
-5. go ahead create plan with all details
+5. go ahead create the plan
 6. *wait for 18 minutes until the plan has been created*
 7. download the plan

docs/mcp/inspector.md

Lines changed: 7 additions & 2 deletions
@@ -68,18 +68,23 @@ When connected follow these steps:
 Now there should be a list with tool names and descriptions:
 ```
 prompt_examples
+model_profiles
 task_create
 task_status
 task_stop
 task_file_info
 ```
 
+When you inspect `task_create`, the visible input schema includes `prompt` and optional `model_profile`.
+The `speed_vs_detail` parameter is intentionally hidden and only set via tool-specific metadata, since it confuses AI agents.
+
 Follow these steps:
 ![screenshot of mcp inspector invoke tool](inspector_step5_mcp_planexe_org.webp)
 
 1. In the `Tools` panel; Click on the `prompt_examples` tool.
-2. In the `prompt_examples` right sidepanel; Click on `Run Tool`.
-3. The MCP server should respond with a list of list of example prompts.
+2. In the `prompt_examples` right sidepanel; Click on `Run Tool`.
+3. The MCP server should respond with a list of example prompts.
+4. Optionally run `model_profiles` to inspect available `model_profile` choices before `task_create`.
 
 ## Approach 2. MCP server inside docker

docs/mcp/mcp_details.md

Lines changed: 157 additions & 7 deletions
@@ -10,12 +10,13 @@ This document lists the MCP tools exposed by PlanExe and example prompts for age
 - The primary MCP server runs in the cloud (see `mcp_cloud`).
 - The local MCP proxy (`mcp_local`) forwards calls to the server and adds a local download helper.
 - Tool responses return JSON in both `content.text` and `structuredContent`.
+- Workflow note: drafting and user approval of the prompt is a non-tool step between setup tools and `task_create`.
 
 ## Tool Catalog, `mcp_cloud`
 
 ### prompt_examples
 
-Returns around five example prompts that show what good prompts look like. Each sample is typically 300–800 words: detailed context, requirements, and success criteria. Usually the AI does the heavy lifting: the user has a vague idea, the agent calls `prompt_examples`, then expands that idea into a high-quality prompt (300–800 words). The prompt is shown to the user, who can ask for further changes or confirm it’s good to go. When the user confirms, the agent then calls `task_create`. Shorter or vaguer prompts produce lower-quality plans.
+Returns around five example prompts that show what good prompts look like. Each sample is typically 300-800 words. Usually the AI does the heavy lifting: the user has a vague idea, the agent calls `prompt_examples`, then expands that idea into a high-quality prompt (300-800 words). A compact prompt shape works best: objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria. The prompt is shown to the user, who can ask for further changes or confirm it’s good to go. When the user confirms, the agent then calls `task_create`. Shorter or vaguer prompts produce lower-quality plans.
 
 Example prompt:
 ```
@@ -27,7 +28,33 @@ Example call:
 {}
 ```
 
-Response includes `samples` (array of prompt strings, each 300–800 words) and `message`.
+Response includes `samples` (array of prompt strings, each ~300-800 words) and `message`.
+
+### model_profiles
+
+Returns profile guidance and model availability for `task_create.model_profile`.
+This helps agents pick a profile without knowing internal `llm_config/*.json` details.
+Profiles with zero models are omitted from the `profiles` list.
+If no models are available in any profile, `model_profiles` returns `isError=true` with `error.code = MODEL_PROFILES_UNAVAILABLE`.
+
+Example prompt:
+```
+List available model profiles and models.
+```
+
+Example call:
+```json
+{}
+```
+
+Response includes:
+- `default_profile`
+- `profiles[]` with:
+  - `profile`
+  - `title`
+  - `summary`
+  - `model_count`
+  - `models[]` (`key`, `provider_class`, `model`, `priority`)
 
 ### task_create

@@ -41,11 +68,71 @@ Example call:
 {"prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes..."}
 ```
 
-Optional argument:
+Optional visible argument:
+```text
+model_profile: "baseline" | "premium" | "frontier" | "custom"
 ```
+
+Developer-only hidden metadata (not part of visible tool schema shown to agents):
+```text
 speed_vs_detail: "ping" | "fast" | "all"
 ```
 
+Example with visible `model_profile`:
+```json
+{"prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes...", "model_profile": "premium"}
+```
+
+Example with hidden metadata override. The `ping` only checks if the LLMs are connected and doesn't trigger a full plan to be created:
+```json
+{
+  "prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes...",
+  "metadata": {
+    "task_create": {
+      "speed_vs_detail": "ping"
+    }
+  }
+}
+```
+
+Example with hidden metadata override. The `fast` triggers a plan to be created, where the entire Luigi pipeline gets exercised, while skipping as much detail as possible:
+```json
+{
+  "prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes...",
+  "metadata": {
+    "task_create": {
+      "speed_vs_detail": "fast"
+    }
+  }
+}
+```
+
+Example with hidden metadata override. The `all` is the default setting. Creates a plan with **ALL** details:
+```json
+{
+  "prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes...",
+  "metadata": {
+    "task_create": {
+      "speed_vs_detail": "all"
+    }
+  }
+}
+```
+
+Counterexamples (do NOT use PlanExe for these):
+
+- "Give me a 5-point checklist for X."
+- "Summarize this paragraph in 6 bullets."
+- "Rewrite this email."
+- "Identify the risks of this project."
+- "Make a SWOT for this document."
+
+What to do instead:
+
+- For one-shot outputs, use a normal LLM response directly.
+- For PlanExe, send a substantial multi-phase project prompt with scope, constraints, timeline, budget, stakeholders, and success criteria.
+- PlanExe always runs a fixed end-to-end pipeline; it does not support selecting only internal pipeline subsets.
+
 ### task_status
 
 Fetch status/progress and recent files for a task.
@@ -60,6 +147,13 @@ Example call:
 {"task_id": "2d57a448-1b09-45aa-ad37-e69891ff6ec7"}
 ```
 
+State contract:
+
+- `pending`: queued and waiting for a worker, keep polling.
+- `processing`: picked up by a worker, keep polling.
+- `completed`: terminal success, proceed to download.
+- `failed`: terminal error.
+
 ### task_stop
 
 Request an active task to stop.
@@ -135,11 +229,51 @@ Example call:
 {"task_id": "2d57a448-1b09-45aa-ad37-e69891ff6ec7", "artifact": "report"}
 ```
 
+`PLANEXE_PATH` behavior for `task_download`:
+- Save directory is `PLANEXE_PATH`, or current working directory if unset.
+- Non-existing directories are created automatically.
+- If `PLANEXE_PATH` points to a file, download fails.
+- Filename is prefixed with task id (for example `<task_id>-030-report.html`).
+- Response includes `saved_path` with the exact local file location.
+
+## Minimal error-handling contract
+
+Error payload shape:
+```json
+{"error": {"code": "SOME_CODE", "message": "Human readable message", "details": {}}}
+```
+
+Common cloud/core error codes:
+- `TASK_NOT_FOUND`
+- `INVALID_USER_API_KEY`
+- `USER_API_KEY_REQUIRED`
+- `INSUFFICIENT_CREDITS`
+- `INTERNAL_ERROR`
+- `MODEL_PROFILES_UNAVAILABLE`
+- `generation_failed`
+- `content_unavailable`
+
+Common local proxy error codes:
+- `REMOTE_ERROR`
+- `DOWNLOAD_FAILED`
+
+Special case:
+- `task_file_info` may return `{}` while the artifact is not ready yet (not an error).
+
+## Concurrency semantics (practical)
+
+- Each `task_create` call creates a new task with a new `task_id`.
+- The server does not enforce a global “one active task per client” cap.
+- Parallelism is a client orchestration concern:
+  - start with 1 task
+  - scale to 2 in parallel if needed
+  - avoid more than 4 unless you have strong task-tracking UX
+
 ## Typical Flow
 
 ### 1. Get example prompts
 
-The user often starts with a vague idea. The AI calls `prompt_examples` first to see what good prompts look like (around five samples, 300–800 words each), then expands the user’s idea into a high-quality prompt and shows it to the user.
+The user often starts with a vague idea. The AI calls `prompt_examples` first to see what good prompts look like (around five samples, typically 300-800 words each), then expands the user’s idea into a high-quality prompt using this compact shape: objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria.
 
 Prompt:
 ```
@@ -151,7 +285,23 @@ Tool call:
 {}
 ```
 
-### 2. Create a plan
+### 2. Inspect model profiles (optional but recommended)
+
+Prompt:
+```
+Show model profile options and available models.
+```
+
+Tool call:
+```json
+{}
+```
+
+### 3. Draft and approve the prompt (non-tool step)
+
+At this step, the agent writes a high-quality prompt draft (typically 300-800 words, with objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria), shows it to the user, and waits for approval.
+
+### 4. Create a plan
 
 The user reviews the prompt and either asks for further changes or confirms it’s good to go. When the user confirms, the agent calls `task_create` with that prompt.

@@ -160,7 +310,7 @@ Tool call:
 {"prompt": "..."}
 ```
 
-### 3. Get status
+### 5. Get status
 
 Prompt:
 ```
@@ -172,7 +322,7 @@ Tool call:
 {"task_id": "2d57a448-1b09-45aa-ad37-e69891ff6ec7"}
 ```
 
-### 4. Download the report
+### 6. Download the report
 
 Prompt:
 ```
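The error payload shape and `task_status` state contract in `docs/mcp/mcp_details.md` can be mirrored client-side. A minimal sketch under the payload shapes documented in this commit; `ToolError`, `check_payload`, and `next_action` are illustrative names, not PlanExe APIs:

```python
# Terminal states per the documented state contract.
TERMINAL_STATES = {"completed", "failed"}

class ToolError(Exception):
    """Raised when a tool response carries the documented error payload."""
    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}

def check_payload(payload):
    # Error shape: {"error": {"code": ..., "message": ..., "details": {}}}
    err = payload.get("error")
    if err:
        raise ToolError(err.get("code", "INTERNAL_ERROR"),
                        err.get("message", ""),
                        err.get("details"))
    return payload

def next_action(status_payload):
    # Map the documented task states onto a client decision.
    state = check_payload(status_payload)["state"]
    if state in ("pending", "processing"):
        return "keep_polling"
    if state == "completed":
        return "download"
    if state == "failed":
        return "report_failure"
    raise ValueError(f"unknown state: {state}")
```

A client would call `next_action` on each poll result and retry only on `keep_polling`, treating `TASK_NOT_FOUND` and friends as `ToolError` exceptions rather than states.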
