Agents: read this whole page. It is everything you need to create UGC videos with agent-media — no other docs required.
agent-media turns a short description (or a photo) + a script into a finished, captioned, lip-synced vertical UGC video. Works in Claude Code, Cursor, or any MCP / HTTP agent. One Bearer token authenticates everything.
- One-liner (recommended): `npx skills add gitroomhq/agent-media` installs all of agent-media's skills into your agent (Claude Code, Cursor, etc.).
- Claude Code plugin (skills + MCP tools): inside a Claude Code session, run `/plugin marketplace add gitroomhq/agent-media`, then `/plugin install agent-media@agent-media`.
- Any MCP agent: run the MCP server with `npx -y -p @agentmedia/mcp-server@latest agent-media-mcp` and the environment variable `AGENT_MEDIA_API_KEY=ma_...`. All skills self-describe via `tools/list`.
- Plain HTTP: call the REST API directly (see below).
Get a Bearer token with `npm i -g agent-media-cli && agent-media login` (which stores it at `~/.agent-media/credentials.json`), or copy the `ma_*` token from the dashboard. Every call sends `Authorization: Bearer ma_...`. The account also needs credits (buy them at agent-media.ai).
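A minimal Python sketch of the headers every call sends. It assumes the token is exported in the `AGENT_MEDIA_API_KEY` environment variable (the variable the MCP server reads); the layout of `~/.agent-media/credentials.json` is not documented here, so the sketch does not parse it, and the `ma_` prefix check is only a sanity check based on the token format shown above.

```python
import os
from typing import Dict, Optional


def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Return the headers for an agent-media JSON request.

    Uses `token` if given, otherwise the AGENT_MEDIA_API_KEY environment
    variable (assumption: you exported the same ma_... token there).
    """
    token = token or os.environ.get("AGENT_MEDIA_API_KEY", "")
    # Sanity check only: the documented tokens look like "ma_...".
    if not token.startswith("ma_"):
        raise ValueError("expected an agent-media token starting with 'ma_'")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```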
`make_ugc_video` runs the whole pipeline (portrait → character sheet → lip-synced talking head → captions) in a single request:
```bash
curl -X POST https://api.agent-media.ai/v1/skills/make_ugc_video/run \
  -H "Authorization: Bearer ma_..." -H "Content-Type: application/json" \
  -d '{ "description": "a friendly 28-year-old woman, soft daylight",
        "script": "Okay, this changed my whole morning routine — you have to try it.",
        "duration": 5, "subtitles": true }'
# -> 202 { "skill_run_id": "..." }, then poll:
curl https://api.agent-media.ai/v1/skills/runs/<skill_run_id> -H "Authorization: Bearer ma_..."
# when status == "succeeded", final_output.video_url is your MP4.
```

In Claude Code or Cursor, just say it in words, e.g. "Make a 10s UGC video of a friendly woman saying '…' with TikTok captions.", and the agent picks the skill.
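The curl calls above can be scripted as a submit-then-poll loop. This is a Python sketch using only the standard library: the endpoints, `skill_run_id`, `status == "succeeded"` and `final_output.video_url` come from the example, while the failure status names, poll interval and poll limit are assumptions. `send` is injectable so the loop can be exercised without the network.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

API_BASE = "https://api.agent-media.ai"
# Assumption: terminal failure states; only "succeeded" is documented.
FAILED_STATUSES = {"failed", "canceled", "cancelled"}

Send = Callable[[str, str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def http_json(method: str, url: str, token: str,
              body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send one Bearer-authenticated JSON request and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
        return json.loads(resp.read().decode("utf-8"))


def run_make_ugc_video(token: str, body: Dict[str, Any], send: Send = http_json,
                       sleep: Callable[[float], None] = time.sleep,
                       poll_every: float = 5.0, max_polls: int = 180) -> str:
    """Start make_ugc_video, poll until it succeeds, and return the MP4 URL."""
    started = send("POST", f"{API_BASE}/v1/skills/make_ugc_video/run", token, body)
    run_id = started["skill_run_id"]
    for _ in range(max_polls):
        run = send("GET", f"{API_BASE}/v1/skills/runs/{run_id}", token, None)
        status = run.get("status")
        if status == "succeeded":
            return run["final_output"]["video_url"]
        if status in FAILED_STATUSES:
            raise RuntimeError(f"run {run_id} ended with status {status!r}")
        sleep(poll_every)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

A call might look like `run_make_ugc_video("ma_...", {"description": "a friendly 28-year-old woman, soft daylight", "script": "...", "duration": 5, "subtitles": True})`.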
- REST: `POST https://api.agent-media.ai/v1/skills/<slug>/run` (Bearer auth, JSON body) → `202` with a `run_id` (or `skill_run_id` for `make_ugc_video`).
- Poll: for a composed skill, `GET /v1/skills/runs/<skill_run_id>`; for a single primitive, `GET /v1/primitives/runs/<run_id>`. The output is at `final_output.video_url` / `artifacts[].url`.
- MCP: call the tool of the same name; its arguments are the skill's input fields.
- Exact input schema (always current): `GET https://api.agent-media.ai/v1/public/skills` or MCP `tools/list`. Trust that over any hand-written list, including the one below.
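The polling conventions above fit in two small helpers. This Python sketch assumes that the 202 body tells you which kind of run you started (a `skill_run_id` means a composed skill, a `run_id` a single primitive) and that `artifacts` sits at the top level of the run object; confirm both against a real response.

```python
from typing import Any, Dict, List

API_BASE = "https://api.agent-media.ai"


def poll_url(start_response: Dict[str, Any]) -> str:
    """Pick the status endpoint from the 202 body of POST /v1/skills/<slug>/run."""
    if "skill_run_id" in start_response:  # composed skill, e.g. make_ugc_video
        return f"{API_BASE}/v1/skills/runs/{start_response['skill_run_id']}"
    if "run_id" in start_response:  # single primitive
        return f"{API_BASE}/v1/primitives/runs/{start_response['run_id']}"
    raise KeyError("202 body has neither 'skill_run_id' nor 'run_id'")


def output_urls(run: Dict[str, Any]) -> List[str]:
    """Collect media URLs from a finished run, final_output.video_url first."""
    urls: List[str] = []
    video = (run.get("final_output") or {}).get("video_url")
    if video:
        urls.append(video)
    # Assumption: artifacts is a top-level list of {"url": ...} objects.
    for artifact in run.get("artifacts") or []:
        url = artifact.get("url")
        if url and url not in urls:
            urls.append(url)
    return urls
```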
- `make_portrait` (v1.0.0): Generate one photoreal portrait. Optionally takes a reference photo (R2-hosted) and a realism preset. When a reference image is provided, identity is locked from it.
- `make_character_sheet` (v1.0.0): Generate a magazine-style character sheet from a portrait. Provide EITHER `portrait_url` (must be R2-hosted) OR `portrait_image_base64` (PNG/JPEG, ≤10 MB; the API uploads it to R2 first). An optional description (≤10 words) gives name/age/vibe hints.
- `make_simple_selfie` (v1.0.0): Generate a 5/10/15-second vertical UGC selfie video from a character sheet. Two modes: provide a `script` for a lip-synced talking head (2–4 words/sec), OR provide `scene_action` for a non-speech clip (dancing, b-roll, vibes) with optional `background_music` and no dialogue. The subject is framed waist-up with hands free, in a TikTok aesthetic.
- `make_product_in_hands` (v1.0.0): Generate a 5/10/15s vertical UGC video in which your character holds, wears, and shows a product. Provide a `character_sheet_url` (R2-hosted) and the product image (`product_image_url`, which may be any https URL, OR `product_image_base64`; either is re-hosted to R2 automatically). Two modes: `script` for a lip-synced talking-head product review (2–4 words/sec), OR `scene_action` for a silent demo / b-roll. Set `subject` (e.g. "a young woman") to lock the person's gender and appearance so that a gendered product can't make them drift. `framing`: `"close_up"` (chest-up, default) or `"full_body"` (head-to-toe, for turn-arounds or showing the whole outfit). Both the person and the exact product are locked from the reference images.
- `make_subtitles` (v1.0.0): Burn TikTok / Hormozi-style captions onto any vNext video (R2-hosted). Auto-transcribes via Whisper when `transcript` is omitted. Styles: `hormozi` (default), `tiktok`, `minimal`.
- `make_wireframe` (v1.0.0): Generate a photographic storyboard / wireframe board from a character sheet (R2-hosted) + script. The result is a multi-panel grid (4 / 6 / 8 / 10 numbered panels) showing the same person performing the action's progression.
- `make_lip_sync` (v1.0.0): Bring-your-own-audio lip sync. Lip-sync a face (an R2-hosted image / character sheet, OR an existing clip) to a provided audio track. There is no text-to-speech or voice cloning; the character speaks your uploaded recording. The output is a 9:16 talking-head video.
- `make_ugc_video` (v1.0.0): End-to-end UGC video in one call. Provide EITHER a text description of the person, OR a portrait URL (R2-hosted), OR an uploaded image. The pipeline auto-generates the missing portrait, builds a character sheet, and produces a 5/10/15s vertical selfie video with native lip-synced audio of your script.
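To illustrate the input rules for `make_product_in_hands`, here is a hypothetical Python helper that assembles a request body. `character_sheet_url`, `product_image_url` / `product_image_base64`, `script` / `scene_action`, `subject` and `framing` come from the description above; the `duration` field name is borrowed from the `make_ugc_video` example and is an assumption, so check `GET /v1/public/skills` for the exact schema.

```python
import base64
from pathlib import Path
from typing import Any, Dict, Optional

DURATIONS = (5, 10, 15)
FRAMINGS = ("close_up", "full_body")


def product_in_hands_body(character_sheet_url: str, product: str, duration: int, *,
                          script: Optional[str] = None,
                          scene_action: Optional[str] = None,
                          subject: Optional[str] = None,
                          framing: str = "close_up") -> Dict[str, Any]:
    """Build a JSON body for make_product_in_hands (hypothetical helper).

    `product` is either an https URL or a path to a local image file.
    """
    # Exactly one mode: lip-synced review (script) or silent demo (scene_action).
    if (script is None) == (scene_action is None):
        raise ValueError("pass exactly one of script or scene_action")
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")
    if framing not in FRAMINGS:
        raise ValueError(f"framing must be one of {FRAMINGS}")

    body: Dict[str, Any] = {
        "character_sheet_url": character_sheet_url,  # must be R2-hosted
        "duration": duration,  # assumed field name
        "framing": framing,
    }
    if product.startswith("https://"):
        body["product_image_url"] = product  # any https URL; re-hosted to R2
    else:
        # Local file: send the bytes as base64 and let the API re-host them.
        raw = Path(product).read_bytes()
        body["product_image_base64"] = base64.b64encode(raw).decode("ascii")
    if script is not None:
        body["script"] = script
    else:
        body["scene_action"] = scene_action
    if subject:
        body["subject"] = subject  # e.g. "a young woman"; locks gender/appearance
    return body
```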
Rules every skill follows: scripts must be paced at 2–4 words per second of duration (a 10 s clip takes 20–40 words), or you can omit the script and pass `scene_action` for a non-speech / dancing / b-roll clip; all media URLs you pass in must be R2-hosted (the API uploads base64 images for you, and `make_product_in_hands` also accepts any https `product_image_url`); each run costs credits. To reuse a character, keep its `character_sheet_url` and feed it to `make_simple_selfie` for each new script.
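The pacing rule is cheap to check before spending credits. A minimal Python sketch, assuming words are whitespace-separated tokens containing a letter or digit and that both bounds are inclusive; `simple_selfie_body` is a hypothetical helper for the reuse pattern, and its `character_sheet_url` / `duration` field names are assumptions (check `GET /v1/public/skills`).

```python
from typing import Any, Dict, Tuple

MIN_WORDS_PER_SEC = 2
MAX_WORDS_PER_SEC = 4
DURATIONS = (5, 10, 15)


def count_words(script: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit,
    so punctuation-only tokens such as a dash are not counted."""
    return sum(1 for tok in script.split() if any(ch.isalnum() for ch in tok))


def word_budget(duration: int) -> Tuple[int, int]:
    """Inclusive (min, max) word count for a clip of `duration` seconds."""
    return MIN_WORDS_PER_SEC * duration, MAX_WORDS_PER_SEC * duration


def check_pacing(script: str, duration: int) -> None:
    """Raise ValueError if the script is too short or too long for the clip."""
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")
    low, high = word_budget(duration)
    words = count_words(script)
    if not low <= words <= high:
        raise ValueError(f"{words} words for {duration}s; aim for {low}-{high}")


def simple_selfie_body(character_sheet_url: str, script: str,
                       duration: int) -> Dict[str, Any]:
    """Reuse a saved character sheet for a new talking-head clip (hypothetical)."""
    check_pacing(script, duration)
    return {"character_sheet_url": character_sheet_url,
            "script": script, "duration": duration}
```

The 12-word script in the `make_ugc_video` example above gets a budget of `word_budget(5) == (10, 20)`, so it fits a 5 s clip; a 10 s clip would need 20 to 40 words.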
Post a generated video to the user's TikTok / Instagram / X via REST, the CLI, or MCP tools:
- `POST /v1/social/connect { provider }` returns an OAuth `url` that the user opens to authorize (agents can't complete OAuth for them). CLI: `agent-media social connect x`. MCP: `social_connect`.
- `GET /v1/social/channels` returns the user's connected channels as `[{ id, name, provider, profile }]`. CLI: `agent-media social channels`. MCP: `social_channels`.
- `POST /v1/social/publish { video_url, channel_ids, caption, type: "now" | "schedule", date? }` re-hosts the R2 video on the network and posts or schedules it, returning `{ success, media_id, post_ids }`. CLI: `agent-media social publish`. MCP: `social_publish`.
See skills/publish-to-social/SKILL.md for the full flow.
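A Python sketch of the last two publishing steps: pick channel ids from the `GET /v1/social/channels` response and build the `POST /v1/social/publish` body. The helper names are hypothetical, and so are provider strings other than `x` (the one used in the CLI example); the sketch also assumes `date` is sent only for `type: "schedule"` and leaves its format unchecked, since the accepted format should be confirmed in skills/publish-to-social/SKILL.md.

```python
from typing import Any, Dict, Iterable, List, Optional


def pick_channels(channels: List[Dict[str, Any]], providers: Iterable[str]) -> List[str]:
    """Ids of the user's connected channels whose provider is wanted."""
    wanted = set(providers)
    return [ch["id"] for ch in channels if ch.get("provider") in wanted]


def publish_body(video_url: str, channel_ids: List[str], caption: str,
                 date: Optional[str] = None) -> Dict[str, Any]:
    """Body for POST /v1/social/publish: post now, or schedule when `date` is set."""
    if not channel_ids:
        raise ValueError("no channels: connect one first via POST /v1/social/connect")
    body: Dict[str, Any] = {
        "video_url": video_url,  # the R2 URL returned by a finished run
        "channel_ids": list(channel_ids),
        "caption": caption,
        "type": "schedule" if date else "now",
    }
    if date:
        body["date"] = date  # accepted format is an assumption; see SKILL.md
    return body
```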
- reference/auth.md — first-time setup
- reference/pacing.md — the 2–4 words-per-second script rule
- reference/realism-rubric.md — realism props baked into every prompt
This repo is generated. The source of truth is the agent-media private monorepo. A GitHub Action mirrors the public-skill/ subtree here on every push. Do not commit hand-edits — they will be overwritten.
License: Apache-2.0.