Merged

38 commits
a72feb0
Removed the confusing speed_vs_detail parameter. And made "all" the d…
neoneye Feb 23, 2026
875773a
Counterexamples showcasing how NOT to use PlanExe.
neoneye Feb 23, 2026
26fcc48
Align MCP instructions with exposed tools
neoneye Feb 23, 2026
d92519b
Document task_status state contract for callers
neoneye Feb 23, 2026
035e848
Error scenarios.
neoneye Feb 23, 2026
8cf4224
Align MCP public states to pending processing completed failed
neoneye Feb 23, 2026
f7798d2
Clarify MCP state contract and local/cloud agent guidance
neoneye Feb 23, 2026
8b74659
Use X-API-Key for mcp_local auth header
neoneye Feb 23, 2026
dd6bb0f
Clarify MCP download path and URL environment behavior
neoneye Feb 23, 2026
1e99b2d
Add model_profiles tool for runtime profile selection
neoneye Feb 23, 2026
ce1d782
Document minimal MCP error-handling contract
neoneye Feb 23, 2026
70ebb0f
Clarify MCP flow wording and non-tool approval step
neoneye Feb 23, 2026
64cc1cf
Clarify MCP concurrency semantics and client responsibilities
neoneye Feb 23, 2026
4424425
Hide internal whitelist details from model_profiles UX
neoneye Feb 23, 2026
5c56800
Removed "available" field.
neoneye Feb 23, 2026
ea62e72
task_file_info had PLANEXE_MCP_PUBLIC_BASE_URL in its description. Th…
neoneye Feb 23, 2026
6370d01
No model_profiles shown in the mcp model_profiles tool
neoneye Feb 23, 2026
08b9edf
prompt lengths
neoneye Feb 23, 2026
d703f4f
prompt shape
neoneye Feb 23, 2026
e5e7ace
Docs cleanups
neoneye Feb 23, 2026
f98d4f8
mcp docs cleanups
neoneye Feb 23, 2026
33a9f24
mcp tests
neoneye Feb 23, 2026
8619f48
mcp: document task_stop terminal state and edge cases
neoneye Feb 23, 2026
0d594b3
mcp: clarify task_file_info empty-object behavior and add artifact/do…
neoneye Feb 23, 2026
ea558e8
mcp: remove developer-facing speed_vs_detail mention from task_create…
neoneye Feb 23, 2026
58c621f
mcp: reword MODEL_PROFILES_UNAVAILABLE guidance for LLM callers
neoneye Feb 23, 2026
c5f01b5
mcp: warn that task_id cannot be recovered once lost
neoneye Feb 23, 2026
6d6be46
mcp: document progress_percentage range (0-100) and files array purpose
neoneye Feb 23, 2026
06bda98
mcp_local: add user_api_key to task_create schema and sync tool descr…
neoneye Feb 23, 2026
aaf20f1
mcp_local: update AGENTS.md to include user_api_key in visible schema
neoneye Feb 23, 2026
7d23af8
mcp: add tests verifying all tools have output_schema and task_create…
neoneye Feb 23, 2026
0f50268
mcp: change speed_vs_detail default from ping_llm to all_details_but_…
neoneye Feb 23, 2026
1e7f053
mcp: improve tool descriptions to accurately reflect output breadth
neoneye Feb 23, 2026
f25dedf
mcp task_create default to the "admin" user
neoneye Feb 23, 2026
0ff45c5
documentation tweaks
neoneye Feb 23, 2026
d7af301
Prevent the UI from showing Internal Server Error, when there is garb…
neoneye Feb 23, 2026
874d429
Edit TaskItem rows, by using file upload/download. So the textfield d…
neoneye Feb 24, 2026
2f8e664
CI typecheck error
neoneye Feb 24, 2026
10 changes: 7 additions & 3 deletions README.md
@@ -51,9 +51,13 @@ Assuming you have an MCP-compatible client (OpenClaw, Cursor, Codex, LM Studio,
The Tool workflow (tools-only, not MCP tasks protocol)

1. `prompt_examples`
2. `task_create`
3. `task_status` (poll every 5 minutes until done)
4. download the result via `task_download` or via `task_file_info`
2. `model_profiles` (optional, helps choose `model_profile`)
3. non-tool step: draft/approve prompt
4. `task_create`
5. `task_status` (poll every 5 minutes until done)
6. download the result via `task_download` or via `task_file_info`

Concurrency note: each `task_create` call returns a new `task_id`; server-side global per-client concurrency is not capped, so clients should track their own parallel tasks.

### Option A: Remote MCP (fastest path)

32 changes: 32 additions & 0 deletions database_api/model_taskitem.py
@@ -5,6 +5,31 @@
from sqlalchemy_utils import UUIDType
from sqlalchemy import JSON
from sqlalchemy.orm import column_property
from sqlalchemy import event


def _sanitize_utf8_text(value):
"""Normalize values into valid UTF-8-safe text for persistence."""
if value is None:
return None

if isinstance(value, str):
text = value
elif isinstance(value, (bytes, bytearray, memoryview)):
text = bytes(value).decode("utf-8", errors="replace")
else:
text = str(value)

# Postgres text does not support embedded NULL bytes.
if "\x00" in text:
text = text.replace("\x00", "")

# Replace unpaired surrogates or other non-encodable code points.
try:
text.encode("utf-8", errors="strict")
except UnicodeEncodeError:
text = text.encode("utf-8", errors="replace").decode("utf-8")
return text

class TaskState(enum.Enum):
pending = 1
@@ -113,3 +138,10 @@ def demo_items(cls) -> list['TaskItem']:
}
)
return [task1, task2, task3]


@event.listens_for(TaskItem, "before_insert")
@event.listens_for(TaskItem, "before_update")
def _sanitize_taskitem_fields(_mapper, _connection, target):
# Enforce valid UTF-8-safe prompt text regardless of writer path.
target.prompt = _sanitize_utf8_text(target.prompt)
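The listener above routes every insert and update through `_sanitize_utf8_text`. A standalone copy of the same normalization, useful for experimenting outside SQLAlchemy (the function name here is a local mirror, not the module's private symbol):

```python
def sanitize_utf8_text(value):
    """Standalone mirror of _sanitize_utf8_text from model_taskitem.py."""
    if value is None:
        return None
    if isinstance(value, str):
        text = value
    elif isinstance(value, (bytes, bytearray, memoryview)):
        # Invalid byte sequences become U+FFFD replacement characters.
        text = bytes(value).decode("utf-8", errors="replace")
    else:
        text = str(value)
    # Postgres text columns reject embedded NUL bytes.
    text = text.replace("\x00", "")
    try:
        text.encode("utf-8", errors="strict")
    except UnicodeEncodeError:
        # Unpaired surrogates and other non-encodable code points.
        text = text.encode("utf-8", errors="replace").decode("utf-8")
    return text
```

The function is idempotent, so it is safe to run on both `before_insert` and `before_update` regardless of which writer path produced the value.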
33 changes: 33 additions & 0 deletions database_api/tests/test_taskitem_model.py
@@ -39,3 +39,36 @@ def test_stop_request_fields_default(self):
self.assertTrue(hasattr(fetched, "run_activity_overview_json"))
self.assertTrue(hasattr(fetched, "run_artifact_layout_version"))
self.assertFalse(bool(fetched.stop_requested))

def test_prompt_invalid_bytes_are_sanitized(self):
with self.app.app_context():
bad_bytes = b"Hello \xe2\x80 world"
task = TaskItem(
state=TaskState.pending,
prompt=bad_bytes,
user_id="test_user",
)
db.session.add(task)
db.session.commit()

fetched = db.session.get(TaskItem, task.id)
self.assertIsInstance(fetched.prompt, str)
# Must be encodable after sanitization.
fetched.prompt.encode("utf-8")
self.assertIn("Hello", fetched.prompt)
self.assertIn("world", fetched.prompt)

def test_prompt_surrogates_are_sanitized(self):
with self.app.app_context():
task = TaskItem(
state=TaskState.pending,
prompt="prefix \ud800 suffix",
user_id="test_user",
)
db.session.add(task)
db.session.commit()

fetched = db.session.get(TaskItem, task.id)
self.assertIsInstance(fetched.prompt, str)
fetched.prompt.encode("utf-8")
self.assertFalse(any(0xD800 <= ord(ch) <= 0xDFFF for ch in fetched.prompt))
2 changes: 2 additions & 0 deletions docker-compose.yml
@@ -242,6 +242,8 @@ services:
PLANEXE_WORKER_PLAN_URL: ${PLANEXE_WORKER_PLAN_URL:-http://worker_plan:8000}
ports:
- "${PLANEXE_MCP_HTTP_PORT:-8001}:8001"
volumes:
- ./llm_config:/app/llm_config:ro
restart: unless-stopped
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8001/healthcheck').read()"]
14 changes: 6 additions & 8 deletions docs/mcp/antigravity.md
@@ -18,15 +18,13 @@ My interaction history:
4. I didn't meant outbreak, I meant vulcanic
5. your prompt is a bit shorter than the example prompts
6. go ahead create the plan
7. stop that plan you are creating.
8. now create the plan again, this time with ALL details. Last time you had FAST selected that would leave out most details.
9. check status
7. check status
8. status
9. status
10. status
11. status
12. status
13. download the report
14. summarize the report
15. does it correspond to your expectations?
11. download the report
12. summarize the report
13. does it correspond to your expectations?

I had to manually type `check status` to get details about how the plan creation was going. Polling on its own is not something that Antigravity can do.

2 changes: 1 addition & 1 deletion docs/mcp/cursor.md
@@ -51,7 +51,7 @@ My interaction with Cursor for creating a plan is like this:
2. I want you to come up with a good prompt
3. I want something ala winter olympics in Italy 2026
4. Slightly different idea. I want Denmark to switch from DKK to EUR. Use the persona of a person representing Denmark's ministers.
5. go ahead create plan with all details
5. go ahead create the plan
6. *wait for 18 minutes until the plan has been created*
7. download the plan

9 changes: 7 additions & 2 deletions docs/mcp/inspector.md
@@ -68,18 +68,23 @@ When connected follow these steps:
Now there should be a list with tool names and descriptions:
```
prompt_examples
model_profiles
task_create
task_status
task_stop
task_file_info
```

When you inspect `task_create`, the visible input schema includes `prompt` and optional `model_profile`.
The `speed_vs_detail` parameter is intentionally hidden and only set via tool-specific metadata, since it confuses AI agents.

Follow these steps:
![screenshot of mcp inspector invoke tool](inspector_step5_mcp_planexe_org.webp)

1. In the `Tools` panel; Click on the `prompt_examples` tool.
2. In the `prompt_examples` right sidepanel; Click on `Run Tool`.
3. The MCP server should respond with a list of list of example prompts.
2. In the `prompt_examples` right sidepanel; Click on `Run Tool`.
3. The MCP server should respond with a list of example prompts.
4. Optionally run `model_profiles` to inspect available `model_profile` choices before `task_create`.

## Approach 2. MCP server inside docker

164 changes: 157 additions & 7 deletions docs/mcp/mcp_details.md
@@ -10,12 +10,13 @@ This document lists the MCP tools exposed by PlanExe and example prompts for agents.
- The primary MCP server runs in the cloud (see `mcp_cloud`).
- The local MCP proxy (`mcp_local`) forwards calls to the server and adds a local download helper.
- Tool responses return JSON in both `content.text` and `structuredContent`.
- Workflow note: drafting and user approval of the prompt is a non-tool step between setup tools and `task_create`.

## Tool Catalog, `mcp_cloud`

### prompt_examples

Returns around five example prompts that show what good prompts look like. Each sample is typically 300800 words: detailed context, requirements, and success criteria. Usually the AI does the heavy lifting: the user has a vague idea, the agent calls `prompt_examples`, then expands that idea into a high-quality prompt (300800 words). The prompt is shown to the user, who can ask for further changes or confirm it’s good to go. When the user confirms, the agent then calls `task_create`. Shorter or vaguer prompts produce lower-quality plans.
Returns around five example prompts that show what good prompts look like. Each sample is typically 300-800 words. Usually the AI does the heavy lifting: the user has a vague idea, the agent calls `prompt_examples`, then expands that idea into a high-quality prompt (300-800 words). A compact prompt shape works best: objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria. The prompt is shown to the user, who can ask for further changes or confirm it’s good to go. When the user confirms, the agent then calls `task_create`. Shorter or vaguer prompts produce lower-quality plans.

@@ -27,7 +28,33 @@ Example call:
```json
{}
```

Response includes `samples` (array of prompt strings, each 300–800 words) and `message`.
Response includes `samples` (array of prompt strings, each ~300-800 words) and `message`.

### model_profiles

Returns profile guidance and model availability for `task_create.model_profile`.
This helps agents pick a profile without knowing internal `llm_config/*.json` details.
Profiles with zero models are omitted from the `profiles` list.
If no models are available in any profile, `model_profiles` returns `isError=true` with `error.code = MODEL_PROFILES_UNAVAILABLE`.

Example prompt:
```
List available model profiles and models.
```

Example call:
```json
{}
```

Response includes:
- `default_profile`
- `profiles[]` with:
- `profile`
- `title`
- `summary`
- `model_count`
- `models[]` (`key`, `provider_class`, `model`, `priority`)
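A client can pick a profile from this response without knowing any `llm_config` internals. A minimal sketch, using the field names from the contract above (the sample profile and model values are hypothetical placeholders):

```python
def choose_profile(response, preferred=None):
    # Fall back to default_profile when the preferred profile is absent
    # (profiles with zero models are omitted from the list entirely).
    profiles = {p["profile"]: p for p in response.get("profiles", [])}
    if preferred in profiles:
        return preferred
    return response.get("default_profile")

# Hypothetical model_profiles response for illustration only.
sample = {
    "default_profile": "baseline",
    "profiles": [
        {
            "profile": "baseline",
            "title": "Baseline",
            "summary": "General-purpose profile.",
            "model_count": 1,
            "models": [
                {"key": "m1", "provider_class": "ExampleProvider",
                 "model": "example/model", "priority": 1},
            ],
        },
    ],
}
```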

### task_create

@@ -41,11 +68,71 @@ Example call:
```json
{"prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes..."}
```

Optional argument:
Optional visible argument:
```text
model_profile: "baseline" | "premium" | "frontier" | "custom"
```

Developer-only hidden metadata (not part of visible tool schema shown to agents):
```text
speed_vs_detail: "ping" | "fast" | "all"
```

Example with visible `model_profile`:
```json
{"prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes...", "model_profile": "premium"}
```

Example with hidden metadata override. The `ping` value only checks that the LLMs are reachable; it does not trigger creation of a full plan:
```json
{
"prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes...",
"metadata": {
"task_create": {
"speed_vs_detail": "ping"
}
}
}
```

Example with hidden metadata override. The `fast` value creates a plan that exercises the entire Luigi pipeline while skipping as much detail as possible:
```json
{
"prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes...",
"metadata": {
"task_create": {
"speed_vs_detail": "fast"
}
}
}
```

Example with hidden metadata override. The `all` value is the default and creates a plan with **ALL** details:
```json
{
"prompt": "Weekly meetup for humans where participants are randomly paired every 5 minutes...",
"metadata": {
"task_create": {
"speed_vs_detail": "all"
}
}
}
```
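The three payload shapes above differ only in the hidden `metadata` block. A helper that assembles `task_create` arguments can make that explicit; this is a sketch of client-side payload construction, not part of the PlanExe API itself:

```python
def build_task_create_args(prompt, model_profile=None, speed_vs_detail=None):
    """Assemble task_create arguments; speed_vs_detail is developer-only."""
    args = {"prompt": prompt}
    if model_profile:
        # Visible optional argument.
        args["model_profile"] = model_profile
    if speed_vs_detail:
        # Hidden metadata override; omit entirely to get the "all" default.
        args["metadata"] = {"task_create": {"speed_vs_detail": speed_vs_detail}}
    return args
```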

Counterexamples (do NOT use PlanExe for these):

- "Give me a 5-point checklist for X."
- "Summarize this paragraph in 6 bullets."
- "Rewrite this email."
- "Identify the risks of this project."
- "Make a SWOT for this document."

What to do instead:

- For one-shot outputs, use a normal LLM response directly.
- For PlanExe, send a substantial multi-phase project prompt with scope, constraints, timeline, budget, stakeholders, and success criteria.
- PlanExe always runs a fixed end-to-end pipeline; it does not support selecting only internal pipeline subsets.

### task_status

Fetch status/progress and recent files for a task.
@@ -60,6 +147,13 @@ Example call:
```json
{"task_id": "2d57a448-1b09-45aa-ad37-e69891ff6ec7"}
```

State contract:

- `pending`: queued and waiting for a worker, keep polling.
- `processing`: picked up by a worker, keep polling.
- `completed`: terminal success, proceed to download.
- `failed`: terminal error.
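The state contract above implies a simple polling loop: keep calling `task_status` while the state is non-terminal. A sketch with an injectable status function and sleep, so the loop itself stays testable (`get_status` stands in for the actual tool call):

```python
import time

TERMINAL_STATES = {"completed", "failed"}

def poll_until_done(get_status, task_id, interval_seconds=300, sleep=time.sleep):
    """Poll task_status until a terminal state; default interval ~5 minutes."""
    while True:
        status = get_status(task_id)
        if status["state"] in TERMINAL_STATES:
            return status
        # pending/processing: keep polling.
        sleep(interval_seconds)
```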

### task_stop

Request an active task to stop.
@@ -135,11 +229,51 @@ Example call:
```json
{"task_id": "2d57a448-1b09-45aa-ad37-e69891ff6ec7", "artifact": "report"}
```

`PLANEXE_PATH` behavior for `task_download`:
- Save directory is `PLANEXE_PATH`, or current working directory if unset.
- Non-existing directories are created automatically.
- If `PLANEXE_PATH` points to a file, download fails.
- Filename is prefixed with task id (for example `<task_id>-030-report.html`).
- Response includes `saved_path` with the exact local file location.
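The documented `PLANEXE_PATH` rules can be mirrored client-side before downloading. A sketch of the path resolution only (directory fallback, auto-creation, file check, task-id prefix); the real tool performs the download as well:

```python
import os
import tempfile

def resolve_save_path(filename, task_id, planexe_path=None):
    """Compute the local save path following the task_download rules above."""
    base = planexe_path or os.getcwd()
    if os.path.isfile(base):
        raise ValueError("PLANEXE_PATH points to a file, not a directory")
    # Non-existing directories are created automatically.
    os.makedirs(base, exist_ok=True)
    # Filename is prefixed with the task id.
    return os.path.join(base, f"{task_id}-{filename}")

demo_dir = os.path.join(tempfile.gettempdir(), "planexe_demo")
path = resolve_save_path("030-report.html", "2d57a448", planexe_path=demo_dir)
```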

## Minimal error-handling contract

Error payload shape:
```json
{"error": {"code": "SOME_CODE", "message": "Human readable message", "details": {}}}
```

Common cloud/core error codes:
- `TASK_NOT_FOUND`
- `INVALID_USER_API_KEY`
- `USER_API_KEY_REQUIRED`
- `INSUFFICIENT_CREDITS`
- `INTERNAL_ERROR`
- `MODEL_PROFILES_UNAVAILABLE`
- `generation_failed`
- `content_unavailable`

Common local proxy error codes:
- `REMOTE_ERROR`
- `DOWNLOAD_FAILED`

Special case:
- `task_file_info` may return `{}` while the artifact is not ready yet (not an error).
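A caller can dispatch on this contract with a small amount of glue. A sketch; which codes are worth retrying is a client policy choice, not something the contract prescribes:

```python
# Assumption: transient/infrastructure codes are retryable, the rest are not.
RETRYABLE_CODES = {"INTERNAL_ERROR", "REMOTE_ERROR", "DOWNLOAD_FAILED"}

def classify_error(payload):
    """Return (code, retryable) from an error payload of the shape above."""
    error = payload.get("error") or {}
    code = error.get("code", "UNKNOWN")
    return code, code in RETRYABLE_CODES

def is_artifact_ready(file_info):
    # task_file_info returns {} while the artifact is not ready (not an error).
    return bool(file_info)
```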

## Concurrency semantics (practical)

- Each `task_create` call creates a new task with a new `task_id`.
- The server does not enforce a global “one active task per client” cap.
- Parallelism is a client orchestration concern:
- start with 1 task
- scale to 2 in parallel if needed
- avoid more than 4 unless you have strong task-tracking UX
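Since the server imposes no cap, the client owns the bookkeeping. A minimal tracker enforcing the suggested limit (the cap of 4 comes from the guidance above, not from the server):

```python
class TaskTracker:
    """Client-side cap on parallel PlanExe tasks; the server does not enforce one."""

    def __init__(self, max_parallel=4):
        self.max_parallel = max_parallel
        self.active = set()

    def can_start(self):
        return len(self.active) < self.max_parallel

    def started(self, task_id):
        # Record the task_id immediately: it cannot be recovered once lost.
        self.active.add(task_id)

    def finished(self, task_id):
        self.active.discard(task_id)
```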

## Typical Flow

### 1. Get example prompts

The user often starts with a vague idea. The AI calls `prompt_examples` first to see what good prompts look like (around five samples, 300800 words each), then expands the user’s idea into a high-quality prompt and shows it to the user.
The user often starts with a vague idea. The AI calls `prompt_examples` first to see what good prompts look like (around five samples, typically 300-800 words each), then expands the user’s idea into a high-quality prompt using this compact shape: objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria.

@@ -151,7 +285,23 @@ Tool call:
```json
{}
```

### 2. Create a plan
### 2. Inspect model profiles (optional but recommended)

Prompt:
```
Show model profile options and available models.
```

Tool call:
```json
{}
```

### 3. Draft and approve the prompt (non-tool step)

At this step, the agent writes a high-quality prompt draft (typically 300-800 words, with objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria), shows it to the user, and waits for approval.
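An agent can sanity-check its draft against the recommended shape before showing it to the user. A rough heuristic sketch; the keyword checks are an assumption for illustration, not a PlanExe validation rule:

```python
PROMPT_SECTIONS = ("objective", "scope", "constraints", "timeline",
                   "stakeholders", "budget", "success")

def check_prompt_draft(draft):
    """Return a list of issues; empty means the draft passes the heuristics."""
    issues = []
    words = len(draft.split())
    if not 300 <= words <= 800:
        issues.append(f"length {words} words, target 300-800")
    lowered = draft.lower()
    for section in PROMPT_SECTIONS:
        if section not in lowered:
            issues.append(f"missing section keyword: {section}")
    return issues
```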

### 4. Create a plan

The user reviews the prompt and either asks for further changes or confirms it’s good to go. When the user confirms, the agent calls `task_create` with that prompt.

@@ -160,7 +310,7 @@ Tool call:
```json
{"prompt": "..."}
```

### 3. Get status
### 5. Get status

@@ -172,7 +322,7 @@ Tool call:
```json
{"task_id": "<task_id_from_task_create>"}
```

### 4. Download the report
### 6. Download the report
