Add UI-equivalent slide editing to headless API so an agent can do:
- "Make slide 3 more like X"
- update only that slide image
- rebuild PPTX
- return updated deck artifact
This should mirror the current in-UI edit experience, but over API.
Implement Milestone 2A:
- Edit a single slide in an existing job
- Persist edited image in job outputs
- Re-export PPTX using updated slides
- Return updated status + artifact URLs
No major architecture rewrite required.
- Job storage exists at server/storage/jobs/<jobId>/
- manifest.json tracks slides + status
- Slide images exist in outputs/ (e.g. slide-1.png)
- pptx export service already works in headless mode
Support either:
- application/json (recommended)
- or multipart/form-data (if optionally uploading additional refs)
{
"instruction": "Make this slide cleaner and more executive, reduce text density, emphasize 3 bullet points.",
"provider": "gemini",
"rebuildPptx": true
}
- provider: gemini | openai (default: job provider or gemini)
- rebuildPptx: boolean (default: true)
- preserveAspect: boolean (default: true, forced 16:9 behavior)
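The defaults above could be applied in one small normalization step before the edit is queued. This is a minimal sketch, not the existing server code: `normalizeEditRequest`, `EditOptions`, and the `jobProvider` argument are hypothetical names.

```typescript
// Hypothetical names throughout; only the field names and defaults come from the spec.
type Provider = "gemini" | "openai";

interface EditOptions {
  instruction: string;
  provider: Provider;
  rebuildPptx: boolean;
  preserveAspect: boolean;
}

// Apply the documented defaults to a parsed JSON body.
function normalizeEditRequest(
  body: Record<string, unknown>,
  jobProvider?: Provider
): EditOptions {
  const rawInstruction = body["instruction"];
  const instruction = typeof rawInstruction === "string" ? rawInstruction.trim() : "";
  if (instruction === "") {
    throw new Error("instruction is required");
  }

  // provider: explicit value, else the job's provider, else gemini.
  const rawProvider = body["provider"];
  const provider: Provider =
    rawProvider === "gemini" || rawProvider === "openai"
      ? rawProvider
      : jobProvider ?? "gemini";

  // Both booleans default to true when absent or not a boolean.
  const rawRebuild = body["rebuildPptx"];
  const rawAspect = body["preserveAspect"];
  return {
    instruction,
    provider,
    rebuildPptx: typeof rawRebuild === "boolean" ? rawRebuild : true,
    preserveAspect: typeof rawAspect === "boolean" ? rawAspect : true,
  };
}
```

Unknown provider strings fall back to the default here; rejecting them with a 400 would be an equally reasonable choice.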
{
"jobId": "deck_...",
"slideNumber": 3,
"editJob": {
"id": "edit_2026...",
"status": "queued",
"statusUrl": "/api/deck/jobs/deck_.../edits/edit_2026..."
}
}
Note: Can also run synchronously and return final URLs, but async is safer.
{
"jobId": "deck_...",
"editId": "edit_...",
"status": "queued|editing|reexporting|done|error",
"slideNumber": 3,
"error": null,
"updatedAt": "..."
}
Rebuild deck from current slide images + manifest.
{
"force": true
}
{
"jobId": "deck_...",
"status": "done",
"pptxUrl": "/api/deck/jobs/deck_.../pptx"
}
Extend manifest.json to track edit history:
{
"jobId": "deck_...",
"status": "done",
"slides": [
{
"slideNumber": 3,
"title": "...",
"visualPrompt": "...",
"imagePath": "outputs/slide-3.png",
"versions": [
{
"version": 1,
"imagePath": "outputs/slide-3.v1.png",
"source": "initial"
},
{
"version": 2,
"imagePath": "outputs/slide-3.v2.png",
"source": "edit",
"instruction": "Make this slide cleaner...",
"editedAt": "2026-..."
}
]
}
],
"edits": [
{
"editId": "edit_...",
"slideNumber": 3,
"instruction": "...",
"status": "done",
"createdAt": "...",
"completedAt": "..."
}
]
}
Minimum requirement:
- preserve previous image version (slide-3.v1.png)
- write edited output (slide-3.v2.png)
- update active imagePath for slide 3 to latest version
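The manifest side of the minimum requirement can be sketched as a pure function. The field names follow the manifest example; `recordEditedVersion` and the interface names are hypothetical. The sketch assumes the caller copies the current image to the `.v1` path on the first edit and writes the edited image to the returned `imagePath`.

```typescript
// Field names mirror the manifest example; type and function names are hypothetical.
interface SlideVersion {
  version: number;
  imagePath: string;
  source: "initial" | "edit";
  instruction?: string;
  editedAt?: string;
}

interface SlideEntry {
  slideNumber: number;
  imagePath: string;
  versions?: SlideVersion[];
}

// Returns a new slide entry with one more version; the input is not mutated.
function recordEditedVersion(
  slide: SlideEntry,
  instruction: string,
  editedAt: string
): SlideEntry {
  // First edit: register the existing image as version 1.
  const versions: SlideVersion[] =
    slide.versions && slide.versions.length > 0
      ? [...slide.versions]
      : [
          {
            version: 1,
            imagePath: `outputs/slide-${slide.slideNumber}.v1.png`,
            source: "initial",
          },
        ];

  // Next version number is one past the highest recorded version.
  const next = Math.max(...versions.map((v) => v.version)) + 1;
  const imagePath = `outputs/slide-${slide.slideNumber}.v${next}.png`;
  versions.push({ version: next, imagePath, source: "edit", instruction, editedAt });

  // The active imagePath now points at the latest version.
  return { ...slide, imagePath, versions };
}
```

Keeping this step free of file I/O makes it easy to unit test separately from the provider call.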
server/services/slideEdit.ts
Functions:
editSlide(jobId, slideNumber, instruction, options)
- load current slide image (base64)
- call provider edit model:
  - Gemini: existing image edit flow with prompt + inline image
- write new version file
- update manifest slide entry
- return new image metadata
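The steps above can be sketched as a single orchestration function. Everything behind `EditDeps` is a hypothetical stand-in for the real storage and provider modules, which are not shown here, and the extra `deps` parameter is added only so the flow can be exercised without real files or API calls. As a simplification, the original file is left at its existing path and recorded as version 1 rather than being copied to a `.v1` file.

```typescript
// Hypothetical stand-ins: the real storage and provider code is not shown in this document.
interface SlideVersion {
  version: number;
  imagePath: string;
  source: "initial" | "edit";
  instruction?: string;
}
interface Slide { slideNumber: number; imagePath: string; versions?: SlideVersion[] }
interface Manifest { jobId: string; slides: Slide[] }

interface EditDeps {
  loadManifest(jobId: string): Promise<Manifest>;
  saveManifest(manifest: Manifest): Promise<void>;
  readImageBase64(jobId: string, imagePath: string): Promise<string>;
  writeImageBase64(jobId: string, imagePath: string, base64: string): Promise<void>;
  // Wraps the provider's image edit call; resolves to the edited image as base64.
  editImage(imageBase64: string, instruction: string): Promise<string>;
}

async function editSlide(
  deps: EditDeps,
  jobId: string,
  slideNumber: number,
  instruction: string
): Promise<{ slideNumber: number; version: number; imagePath: string }> {
  const manifest = await deps.loadManifest(jobId);
  const slide = manifest.slides.find((s) => s.slideNumber === slideNumber);
  if (!slide) {
    throw new Error(`Slide ${slideNumber} not found for job ${jobId}`);
  }

  // 1. Load the current slide image.
  const current = await deps.readImageBase64(jobId, slide.imagePath);

  // 2. Call the provider edit model.
  const edited = await deps.editImage(current, instruction);

  // 3. Write the result to a new version file; the previous file is untouched.
  const versions: SlideVersion[] =
    slide.versions && slide.versions.length > 0
      ? slide.versions
      : [{ version: 1, imagePath: slide.imagePath, source: "initial" }];
  const version = versions.length + 1;
  const imagePath = `outputs/slide-${slideNumber}.v${version}.png`;
  await deps.writeImageBase64(jobId, imagePath, edited);

  // 4. Update the manifest slide entry.
  slide.versions = [...versions, { version, imagePath, source: "edit", instruction }];
  slide.imagePath = imagePath;
  await deps.saveManifest(manifest);

  // 5. Return the new image metadata.
  return { slideNumber, version, imagePath };
}
```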
After successful edit (if rebuildPptx=true):
- call existing pptx export service
- overwrite outputs/deck.pptx
- update manifest timestamps
- Use existing p-queue
- edit tasks should queue per job to avoid concurrent write conflicts
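Per-job serialization can be illustrated without any dependency. The sketch below is a stand-in for the existing p-queue setup, not a description of it: it chains promises per job id, which behaves like one queue with concurrency 1 per job. `enqueueForJob` and `jobChains` are hypothetical names.

```typescript
// Dependency-free stand-in for "one queue with concurrency 1 per job".
// Maps a job id to the tail of that job's task chain.
const jobChains = new Map<string, Promise<unknown>>();

function enqueueForJob<T>(jobId: string, task: () => Promise<T>): Promise<T> {
  // Tails never reject, so a failed edit does not block later edits.
  const previous = jobChains.get(jobId) ?? Promise.resolve();
  const run = previous.then(() => task());

  const tail = run.catch(() => undefined);
  jobChains.set(jobId, tail);

  // Drop the map entry once this job has no more pending work.
  tail.then(() => {
    if (jobChains.get(jobId) === tail) {
      jobChains.delete(jobId);
    }
  });

  // The caller still sees the task's own result or error.
  return run;
}
```

Edits for the same job run strictly one after another, so manifest and image writes never interleave, while edits for different jobs can still overlap. With p-queue, the equivalent would be to keep one `PQueue` created with `concurrency: 1` per job id and call `add` on it.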
Use a deterministic edit wrapper prompt:
Edit this existing slide image according to the instruction.
Instruction: "{instruction}"
Constraints:
- Preserve 16:9 aspect ratio
- Keep overall brand style coherent with deck
- Do not alter unrelated regions unnecessarily
- Maintain legibility of visible text unless instruction requests text changes
If you support brand profile style reinforcement, append profile style prompt.
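The wrapper prompt is plain string assembly, so it can be built by a small pure function. A minimal sketch, assuming the brand profile exposes its style prompt as a plain string; `buildEditPrompt` and `brandStylePrompt` are hypothetical names.

```typescript
// Builds the edit wrapper prompt; same inputs always give the same string.
function buildEditPrompt(instruction: string, brandStylePrompt?: string): string {
  const lines = [
    "Edit this existing slide image according to the instruction.",
    `Instruction: "${instruction.trim()}"`,
    "Constraints:",
    "- Preserve 16:9 aspect ratio",
    "- Keep overall brand style coherent with deck",
    "- Do not alter unrelated regions unnecessarily",
    "- Maintain legibility of visible text unless instruction requests text changes",
  ];

  // Optional brand profile reinforcement goes last.
  if (brandStylePrompt && brandStylePrompt.trim() !== "") {
    lines.push(brandStylePrompt.trim());
  }
  return lines.join("\n");
}
```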
Return explicit errors:
- 404 job not found
- 404 slide number not found in manifest
- 400 missing/empty instruction
- 409 job currently in incompatible state (optional)
- 500 provider/edit/export failures with safe message
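One way to keep these responses consistent is to compute them in a single pre-check before queuing the edit. This is a sketch under assumptions: `checkEditRequest`, `toSafeServerError`, and `JobView` are hypothetical names, and treating every job status other than "done" as incompatible is a guess at what the optional 409 should cover.

```typescript
// Hypothetical helper types; only the status codes and messages follow the list above.
interface ApiError {
  status: 400 | 404 | 409 | 500;
  body: { error: string };
}

interface JobView {
  jobId: string;
  status: string;
  slideNumbers: number[];
}

// Returns the error response to send, or null when the edit may proceed.
function checkEditRequest(
  job: JobView | undefined,
  jobId: string,
  slideNumber: number,
  instruction: unknown
): ApiError | null {
  if (!job) {
    return { status: 404, body: { error: `Job ${jobId} not found` } };
  }
  if (!job.slideNumbers.some((n) => n === slideNumber)) {
    return { status: 404, body: { error: `Slide ${slideNumber} not found for job ${jobId}` } };
  }
  if (typeof instruction !== "string" || instruction.trim() === "") {
    return { status: 400, body: { error: "instruction is required" } };
  }
  // Assumption: a job that is not "done" cannot be edited yet.
  if (job.status !== "done") {
    return { status: 409, body: { error: `Job ${jobId} is ${job.status}, cannot edit yet` } };
  }
  return null;
}

// Provider, edit, and export failures: hide internals, return a safe message.
function toSafeServerError(_cause: unknown): ApiError {
  return { status: 500, body: { error: "Slide edit failed" } };
}
```

The raw cause passed to `toSafeServerError` would normally be logged server-side rather than returned to the caller.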
Examples:
{ "error": "Slide 3 not found for job deck_..." }
{ "error": "instruction is required" }
- Generate a deck via /api/deck/generate
- Call slide edit endpoint for slide N
- Poll edit status to done
- Fetch /api/deck/jobs/:jobId/result and confirm slide N image URL changed
- Download /api/deck/jobs/:jobId/pptx and confirm updated slide present
- Manifest contains edit history and prior version retained
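The polling step in the checks above could be scripted along these lines. It is a sketch, not part of the API: `pollEditUntilDone` and its options are hypothetical, and the status fetch is injected so the loop can be tested without a running server.

```typescript
// Status values follow the edit status response; everything else is hypothetical.
interface EditStatus {
  status: "queued" | "editing" | "reexporting" | "done" | "error";
  error: string | null;
}

// Polls until the edit reaches "done", throws on "error" or after maxAttempts.
async function pollEditUntilDone(
  getStatus: () => Promise<EditStatus>,
  opts: { intervalMs: number; maxAttempts: number }
): Promise<EditStatus> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const current = await getStatus();
    if (current.status === "done") {
      return current;
    }
    if (current.status === "error") {
      throw new Error(current.error ?? "edit failed");
    }
    // Still queued, editing, or reexporting: wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  throw new Error("timed out waiting for edit to finish");
}
```

In a real check, `getStatus` would fetch the `statusUrl` returned by the edit call and parse the JSON body.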
curl -X POST http://<HOST>:8787/api/deck/jobs/<JOB_ID>/slides/3/edit \
-H "Content-Type: application/json" \
-d '{
"instruction": "Make this slide cleaner, higher contrast, fewer words, stronger hierarchy.",
"provider": "gemini",
"rebuildPptx": true
}'
curl http://<HOST>:8787/api/deck/jobs/<JOB_ID>/edits/<EDIT_ID>
curl -o updated.pptx http://<HOST>:8787/api/deck/jobs/<JOB_ID>/pptx
- POST /api/deck/jobs/:jobId/slides/:slideNumber/regenerate (use original visualPrompt, no edit image)
- POST /api/deck/jobs/:jobId/slides/:slideNumber/revert (switch active image to prior version)
- PATCH /api/deck/jobs/:jobId/slides/:slideNumber/prompt (update visualPrompt then regenerate)
- webhook callback when edit done
- Core implementation: ~2-4 hours
- With versioning + robust tests: ~1 day
This is the smallest addition that unlocks true conversational slide refinement from OpenClaw/agents.