Build a small TypeScript CLI that takes a Figma URL and API token, fetches the selected node and required related data via the Figma REST API, and emits a deterministic artifact bundle optimized for code agents.
The main value is not raw retrieval. It is the normalization layer that converts noisy Figma JSON into a compact, legible, implementation-oriented representation of structural truth.
The CLI is the primary interface. A thin MCP wrapper may be added later for Claude Desktop, but only as a transport layer over the same core library.
Figma’s REST API supports exactly the primitives needed here: scoped file/node JSON retrieval using ids, optional shallow traversal using depth, optional vector export data via geometry=paths, and rendered images or SVGs for specific nodes. Figma’s own MCP implementation guidance also follows the same pattern: parse URL, fetch structured design context, fetch a screenshot, and fall back to sparse metadata plus targeted child fetches when responses are too large. (developers.figma.com)
Given a Figma node URL, emit:
- one visual reference artifact,
- one normalized structural artifact,
- one sparse outline artifact,
- optional asset exports,
- one small markdown context file intended for direct agent consumption.
The output should be stable, deterministic, diffable, and token-efficient.
Do not build a full replacement for the Figma MCP server.
Specifically out of scope for v1:
- writing back to Figma,
- generalized conversational design exploration across many nodes,
- full-fidelity vector reconstruction for every node,
- design generation,
- Code Connect integration,
- plugin authoring,
- image processing beyond download/export,
- full-file ingestion by default.
The system exists to help agents reliably infer:
- hierarchy,
- layout,
- text semantics,
- component boundaries,
- reusable assets,
- design-token bindings,
- implementation-relevant styling.
Raw Figma JSON is an input format, not a useful output format.
Every fetch must produce a screenshot or render of the target node, because structural interpretation needs a visual backstop. Figma’s own implementation workflow treats the screenshot as the source of truth for validation. (Figma Developer Docs)
Large or deeply nested frames should be handled with a staged retrieval strategy:
- shallow fetch,
- outline generation,
- targeted expansion of relevant children.
This mirrors Figma MCP’s documented fallback when full design context is too large. (Figma Developer Docs)
Prefer compact, opinionated schemas over exhaustive mirroring of the REST API.
Heuristics are acceptable; opaque inference is not. Every normalization rule should be explicit, testable, and stable.
figma-fetch "<figma-url>" --token "$FIGMA_TOKEN" --out ./artifacts

Optional flags:
--format json
--depth 2
--include-geometry
--include-plugin-data shared
--include-svg
--include-assets
--debug

artifacts/
manifest.json
context.md
visual/
frame.png
frame.svg # optional
structure/
raw-node.json
normalized-node.json
outline.xml
outline.json
tokens/
variables.json
colors.json
typography.json
spacing.json
assets/
...
logs/
fetch-metadata.json # optional debug
A Figma URL containing:
- file key
- node ID
Examples:
https://www.figma.com/design/:fileKey/:fileName?node-id=123-456
- equivalent share URLs that still contain node-id
Figma’s docs confirm the URL shape and that the file key and node ID can be parsed directly from it. (Figma Developer Docs)
Implement a dedicated parser that returns:
type ParsedFigmaUrl = {
fileKey: string
nodeId: string
originalUrl: string
}

Normalization rule:
- convert 123-456 from URL form into 123:456 for API requests where required.
If URL parsing fails, exit with a precise error.
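A minimal sketch of such a parser, assuming the /design/:fileKey path shape documented above (the /file prefix handling for older share links is a guess):

```typescript
type ParsedFigmaUrl = {
  fileKey: string
  nodeId: string
  originalUrl: string
}

// Illustrative sketch; handles /design/ links and, speculatively, older /file/ links.
function parseFigmaUrl(url: string): ParsedFigmaUrl {
  const u = new URL(url)
  const match = u.pathname.match(/^\/(?:design|file)\/([^/]+)/)
  const rawNodeId = u.searchParams.get("node-id")
  if (!match || !rawNodeId) {
    throw new Error(`Unparseable Figma URL (expected /design/:fileKey with node-id param): ${url}`)
  }
  return {
    fileKey: match[1],
    // Normalization rule: URL form "123-456" becomes API form "123:456".
    nodeId: rawNodeId.replace(/-/g, ":"),
    originalUrl: url,
  }
}
```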
For the selected node:
- Fetch node subtree JSON.
- Fetch rendered PNG.
- Optionally fetch SVG if the root appears vector-friendly.
- Build sparse outline from returned JSON.
- Normalize root and immediate descendants.
- Decide whether deeper expansion is needed.
Use:
GET /v1/files/:key/nodes?ids=:nodeId
GET /v1/images/:key?ids=:nodeId&format=png&scale=2
The nodes endpoint is the most direct fit for selected-node retrieval. It supports depth, geometry=paths, and plugin data inclusion. The image endpoint supports PNG/SVG/PDF/JPG exports, scaling, and SVG metadata options such as including element IDs or node IDs. (Figma Developer Docs)
For first pass:
- depth=2 by default,
- no geometry=paths,
- no plugin data unless explicitly requested,
- format=png, scale=2.
Rationale:
- keep payload small,
- get enough hierarchy for structural inference,
- avoid vector noise unless needed.
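The two default requests can be assembled like this (endpoint paths follow the REST API; the helper names are illustrative):

```typescript
const API = "https://api.figma.com"

// Default first-pass node fetch: shallow traversal, no geometry, no plugin data.
function buildNodeUrl(fileKey: string, nodeId: string, depth = 2): string {
  const q = new URLSearchParams({ ids: nodeId, depth: String(depth) })
  return `${API}/v1/files/${fileKey}/nodes?${q}`
}

// Default render: PNG at 2x for a crisp visual backstop.
function buildImageUrl(fileKey: string, nodeId: string): string {
  const q = new URLSearchParams({ ids: nodeId, format: "png", scale: "2" })
  return `${API}/v1/images/${fileKey}?${q}`
}
```

Note that URLSearchParams percent-encodes the colon in node IDs (123:456 → 123%3A456), which is fine for the API.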
If normalization detects ambiguity or truncation risk, refetch specific child nodes selectively.
Expansion triggers:
- node has many descendants,
- important layout container has incomplete child information,
- text nodes require deeper inspection,
- component instances require referenced component metadata,
- vector/icon nodes need better export data,
- image fills need extraction,
- variable bindings exist but context is incomplete.
This follows the same basic idea as Figma MCP’s “metadata first, then child node fetches” fallback. (Figma Developer Docs)
Support personal access tokens via the X-Figma-Token header.
Design auth so it can later support OAuth without changing the core fetch pipeline.
Auth module contract:
type FigmaAuth = {
token: string
header(): Record<string, string>
}

src/
cli.ts
config.ts
errors.ts
figma/
auth.ts
client.ts
url.ts
endpoints.ts
fetch-node.ts
fetch-image.ts
fetch-assets.ts
types-raw.ts
normalize/
index.ts
classify.ts
node.ts
layout.ts
style.ts
text.ts
components.ts
variables.ts
assets.ts
outline.ts
heuristics.ts
output/
manifest.ts
context-md.ts
write.ts
schemas/
raw.ts
normalized.ts
outline.ts
manifest.ts
util/
fs.ts
hash.ts
log.ts
retry.ts
- TypeScript
- Zod
- native fetch or undici
- commander or yargs
- p-limit
- tsx for local dev
- vitest
Do not add heavy framework dependencies.
This is the most important part of the system.
The normalized model should be:
- much smaller than raw Figma JSON,
- recursively structured,
- implementation-oriented,
- explicit about uncertainty.
type NormalizedNode = {
id: string
name: string
type: NormalizedNodeType
role: NormalizedRole | null
visible: boolean
bounds: Bounds | null
rotation: number | null
opacity: number | null
hierarchy: {
parentId: string | null
depth: number
childCount: number
path: string[]
}
layout: NormalizedLayout | null
appearance: NormalizedAppearance | null
text: NormalizedText | null
component: NormalizedComponentInfo | null
variables: NormalizedVariableBindings | null
asset: NormalizedAssetInfo | null
semantics: {
likelyInteractive: boolean
likelyTextInput: boolean
likelyIcon: boolean
likelyImage: boolean
likelyMask: boolean
likelyReusableComponent: boolean
}
children: NormalizedNode[]
diagnostics: {
sourceNodeType: string
omittedFields: string[]
warnings: string[]
confidence: "high" | "medium" | "low"
}
}

The normalization layer must answer the questions an agent actually has:
- What is this thing?
- How is it laid out?
- What are its children?
- Which nodes matter for implementation?
- Which values are literal versus token-bound?
- What can be ignored safely?
- What should become an asset versus code?
Map Figma node types into a smaller working set.
Example:
type NormalizedNodeType =
| "document"
| "page"
| "frame"
| "group"
| "component"
| "instance"
| "variant-set"
| "text"
| "shape"
| "vector"
| "image"
| "line"
| "boolean-operation"
| "mask"
| "section"
| "unknown"

Use raw Figma node type as input, but normalize to categories that matter for implementation.
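A sketch of that mapping; the raw type names are Figma's documented node types, but the table is deliberately incomplete, with unknown as the explicit fallback:

```typescript
// Illustrative, intentionally incomplete mapping from raw Figma node types
// to the smaller working set. Anything unmapped falls back to "unknown".
const TYPE_MAP: Record<string, string> = {
  DOCUMENT: "document",
  CANVAS: "page",
  FRAME: "frame",
  GROUP: "group",
  COMPONENT: "component",
  COMPONENT_SET: "variant-set",
  INSTANCE: "instance",
  TEXT: "text",
  RECTANGLE: "shape",
  ELLIPSE: "shape",
  VECTOR: "vector",
  LINE: "line",
  BOOLEAN_OPERATION: "boolean-operation",
  SECTION: "section",
}

function classifyNodeType(rawType: string): string {
  return TYPE_MAP[rawType] ?? "unknown"
}
```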
Agents care about likely UI semantics more than literal Figma layer labels.
Infer a best-effort role:
type NormalizedRole =
| "screen"
| "container"
| "stack"
| "grid"
| "card"
| "button"
| "icon-button"
| "label"
| "heading"
| "body-text"
| "input"
| "image"
| "icon"
| "divider"
| "badge"
| "avatar"
| "list"
| "list-item"
| "modal"
| "navigation"
| "unknown"

Inference sources:
- node type,
- name,
- text content,
- auto-layout configuration,
- dimensions,
- child structure,
- component metadata.
Rules should be conservative. Prefer unknown over overconfident bad guesses.
Layout is the highest-value extraction.
Represent only implementation-relevant properties.
type NormalizedLayout = {
mode: "none" | "horizontal" | "vertical" | "absolute"
sizing: {
horizontal: "fixed" | "fill" | "hug" | "unknown"
vertical: "fixed" | "fill" | "hug" | "unknown"
}
align: {
main: "start" | "center" | "end" | "space-between" | "unknown"
cross: "start" | "center" | "end" | "stretch" | "baseline" | "unknown"
}
padding: {
top: number
right: number
bottom: number
left: number
} | null
gap: number | null
wrap: boolean | null
constraints: {
horizontal: string | null
vertical: string | null
} | null
position: {
x: number
y: number
positioning: "flow" | "absolute"
} | null
clipsContent: boolean | null
}

Translate Figma-specific concepts into implementation-facing terms:
- auto layout horizontal → mode: "horizontal"
- auto layout vertical → mode: "vertical"
- no auto layout and positioned children → mode: "absolute"
- Figma hug/fill/fixed semantics → explicit sizing enum
- collapse redundant defaults where possible
Important: retain raw-to-normalized traceability in diagnostics for debugging.
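A sketch of the mode and sizing mappings; raw field names follow the Figma REST API (layoutMode, layoutSizingHorizontal), but the input shape here is simplified and hasPositionedChildren is assumed to be derived upstream rather than read directly:

```typescript
// Simplified raw-field shape for illustration only.
type RawLayoutFields = {
  layoutMode?: "NONE" | "HORIZONTAL" | "VERTICAL"
  hasPositionedChildren?: boolean // derived upstream, not a raw Figma field
}

function mapLayoutMode(raw: RawLayoutFields): "none" | "horizontal" | "vertical" | "absolute" {
  if (raw.layoutMode === "HORIZONTAL") return "horizontal"
  if (raw.layoutMode === "VERTICAL") return "vertical"
  // No auto layout but positioned children → treat as absolute.
  if (raw.hasPositionedChildren) return "absolute"
  return "none"
}

function mapSizing(raw?: "FIXED" | "HUG" | "FILL"): "fixed" | "hug" | "fill" | "unknown" {
  switch (raw) {
    case "FIXED": return "fixed"
    case "HUG": return "hug"
    case "FILL": return "fill"
    default: return "unknown"
  }
}
```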
Only include appearance fields that materially affect implementation.
type NormalizedAppearance = {
fills: NormalizedPaint[]
strokes: NormalizedStroke[]
cornerRadius: CornerRadius | null
effects: NormalizedEffect[]
blendMode: string | null
opacity: number | null
}

Each fill/stroke should distinguish:
- solid literal,
- gradient,
- image fill,
- token-bound value.
type NormalizedPaint = {
kind: "solid" | "gradient" | "image" | "video" | "unknown"
visible: boolean
color: string | null
opacity: number | null
tokenRef: string | null
imageRef: string | null
}

Do not carry every raw paint property through unless needed.
Text nodes should be flattened into a highly legible representation.
type NormalizedText = {
content: string
charactersLength: number
style: {
fontFamily: string | null
fontWeight: number | null
fontSize: number | null
lineHeight: string | number | null
letterSpacing: string | number | null
textCase: string | null
textAlignHorizontal: string | null
textAlignVertical: string | null
}
color: string | null
tokenRefs: string[]
semanticKind: "heading" | "label" | "body" | "caption" | "button" | "unknown"
truncation: {
maxLines: number | null
ellipsis: boolean | null
} | null
}

- preserve actual text content,
- strip irrelevant raw text metadata,
- identify typography token references where possible,
- infer semantic kind conservatively from size, weight, name, and context.
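A sketch of conservative semantic-kind inference; the thresholds are illustrative starting points, not calibrated values, and insideButtonLike is assumed to be computed by the role heuristics:

```typescript
type TextSignals = {
  content: string
  fontSize: number | null
  fontWeight: number | null
  name: string
  insideButtonLike: boolean // supplied by parent-context heuristics
}

function inferSemanticKind(
  t: TextSignals
): "heading" | "label" | "body" | "caption" | "button" | "unknown" {
  if (t.insideButtonLike) return "button"
  if (/\blabel\b/i.test(t.name)) return "label"
  const size = t.fontSize ?? 0
  const weight = t.fontWeight ?? 400
  // Large, bold, short → heading. Thresholds are guesses to be tuned on fixtures.
  if (size >= 20 && weight >= 600 && t.content.length <= 60) return "heading"
  if (size > 0 && size <= 11) return "caption"
  if (size >= 13 && size <= 18 && t.content.length > 60) return "body"
  return "unknown"
}
```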
Figma exposes component mappings and component-related metadata in file responses, and nodes also expose componentPropertyReferences. That should be surfaced directly because it is high-value implementation context. (Figma Developer Docs)
type NormalizedComponentInfo = {
kind: "none" | "component" | "instance" | "component-set"
componentId: string | null
componentName: string | null
componentSetId: string | null
variantProperties: Record<string, string>
propertyReferences: Record<string, string>
isExposedToAgentAsReusable: boolean
}

Help the agent answer:
- is this a reusable component?
- is this an instance of an existing component?
- what variant dimensions matter?
- should this be implemented using an existing local component?
Figma nodes expose boundVariables, explicitVariableModes, and plugin/shared plugin data when requested. These fields are core to the output because they distinguish literal styling from token-driven styling. (Figma Developer Docs)
type NormalizedVariableBindings = {
bindings: Array<{
field: string
tokenId: string
tokenName: string | null
collectionId: string | null
modeId: string | null
resolvedType: "color" | "number" | "string" | "boolean" | "unknown"
}>
explicitModes: Record<string, string>
}

- always distinguish literal values from variable-bound values,
- where possible, expose both:
- resolved literal value,
- token reference.
Example:
{
"fills": [
{
"kind": "solid",
"color": "#FFFFFF",
"tokenRef": "color.bg.surface"
}
]
}

This is much more useful than raw alias objects.
Agents need clear answers about whether a thing should be code or asset.
type NormalizedAssetInfo = {
kind: "none" | "svg" | "bitmap" | "mixed"
exportSuggested: boolean
reason: string | null
exportNodeIds: string[]
imageRefs: string[]
}

Mark as likely asset when:
- image fill is present,
- vector complexity exceeds threshold,
- node is named like icon/logo/illustration,
- boolean/vector composition is unlikely to be worth reconstructing in code.
Do not overfit. The goal is “reasonable default export advice.”
The outline is the cheap navigation layer.
Emit both JSON and XML.
type OutlineNode = {
id: string
name: string
type: string
role: string | null
visible: boolean
bounds: Bounds | null
childCount: number
children?: OutlineNode[]
}

<frame id="123:456" name="Login Card" role="card" w="320" h="240">
<text id="123:457" name="Title" role="heading" />
<frame id="123:458" name="Form Fields" role="stack" />
<instance id="123:459" name="Primary Button" role="button" />
</frame>

The outline exists so an agent can decide what to expand without reading full normalized JSON.
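A sketch of the XML serializer, matching the attribute set in the example above (the node shape is the OutlineNode type minus fields the XML form omits):

```typescript
type Bounds = { x: number; y: number; width: number; height: number }
type OutlineNode = {
  id: string
  name: string
  type: string
  role: string | null
  bounds: Bounds | null
  children?: OutlineNode[]
}

function escapeAttr(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;")
}

function outlineToXml(node: OutlineNode, indent = ""): string {
  const attrs = [
    `id="${escapeAttr(node.id)}"`,
    `name="${escapeAttr(node.name)}"`,
    node.role ? `role="${escapeAttr(node.role)}"` : null,
    node.bounds ? `w="${node.bounds.width}" h="${node.bounds.height}"` : null,
  ].filter(Boolean).join(" ")
  const kids = node.children ?? []
  if (kids.length === 0) return `${indent}<${node.type} ${attrs} />`
  const body = kids.map((c) => outlineToXml(c, indent + "  ")).join("\n")
  return `${indent}<${node.type} ${attrs}>\n${body}\n${indent}</${node.type}>`
}
```

Deterministic attribute ordering keeps the XML diffable across runs.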
Emit a machine-readable manifest describing produced artifacts.
type Manifest = {
source: {
fileKey: string
nodeId: string
fileName?: string
version?: string
lastModified?: string
}
outputs: {
png?: string
svg?: string
rawNodeJson: string
normalizedNodeJson: string
outlineJson: string
outlineXml: string
contextMd: string
variablesJson?: string
colorsJson?: string
typographyJson?: string
spacingJson?: string
assets: string[]
}
}

The context.md file is the primary entrypoint for Claude Code.
It should be short, stable, and explicit.
# Figma context
## Source
- File key: ...
- Node ID: ...
- File version: ...
## Visual reference
- PNG: ./visual/frame.png
## Structural summary
- Root type: frame
- Role: card
- Size: 320x240
- Layout: vertical auto-layout
- Padding: 24/24/24/24
- Gap: 16
## Important children
1. Heading text
2. Email input container
3. Primary button instance
## Tokens used
- color.bg.surface
- color.text.primary
- spacing.24
- radius.md
## Assets
- logo.svg
- hero-illustration.png
## Notes for implementation
- Prefer stack layout
- Root appears reusable
- Button is a component instance
- Use exported asset for icon cluster rather than recreating in code

Keep this concise. It should summarize, not duplicate the JSON artifacts.
These heuristics are the heart of the project.
Implement them as isolated pure functions with tests.
A node is likely a container if:
- frame/group/component/instance with children,
- nontrivial bounds,
- layout or padding properties present.
A node is likely a stack if:
- auto layout exists,
- children are arranged along one axis,
- gap/padding properties are meaningful.
Heuristic inputs:
- font size,
- font weight,
- node name,
- text length,
- parent context.
Example:
- large bold short text near top of card → heading
- small text adjacent to field → label
- text inside button-like instance → button label
Signals:
- instance/component/frame,
- one short text child,
- fixed or hug sizing,
- padded auto-layout,
- visible fill,
- name contains button/cta/primary/secondary.
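The button signals above can be sketched as a pure, testable function over a simplified node shape; the length threshold and name pattern are illustrative and meant to be tuned against fixtures:

```typescript
type HeuristicNode = {
  type: string
  name: string
  hasAutoLayout: boolean
  hasPadding: boolean
  hasVisibleFill: boolean
  textChildren: string[]
}

const BUTTON_NAME = /\b(button|cta|primary|secondary)\b/i

function looksLikeButton(n: HeuristicNode): boolean {
  const structural =
    ["instance", "component", "frame"].includes(n.type) &&
    n.textChildren.length === 1 &&
    n.textChildren[0].length <= 30 // one short label; threshold is a guess
  const styled = n.hasAutoLayout && n.hasPadding && n.hasVisibleFill
  // Conservative: a name match alone never suffices; structure is required.
  return structural && (styled || BUTTON_NAME.test(n.name))
}
```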
Signals:
- label + text region + border/fill,
- rectangular container,
- width > height,
- names like input/field/search/email/password.
Signals:
- small vector group,
- square-ish bounds,
- no text,
- simple naming.
Signals:
- image fills,
- many vector descendants,
- masks,
- illustration-like names,
- complex boolean operations.
Collapse or de-emphasize:
- invisible nodes unless structurally important,
- zero-size nodes,
- decorative micro-layers that don’t affect implementation decisions,
- deeply nested wrappers with no distinct style/layout semantics.
Do not fully delete noisy nodes from raw output. Instead mark them as low-priority or omit from normalized children with diagnostics.
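A sketch of that pass: children are filtered from the normalized tree but their IDs are returned for the parent's diagnostics, never deleted from raw output. The node shape and the "has no children" proxy for "not structurally important" are simplifications:

```typescript
type Lite = {
  id: string
  visible: boolean
  width: number
  height: number
  children: Lite[]
}

function pruneChildren(node: Lite): { kept: Lite[]; omitted: string[] } {
  const kept: Lite[] = []
  const omitted: string[] = []
  for (const child of node.children) {
    const zeroSize = child.width === 0 || child.height === 0
    // Invisible or zero-size leaves are de-emphasized; their IDs surface
    // in the parent's diagnostics rather than vanishing silently.
    if ((!child.visible || zeroSize) && child.children.length === 0) {
      omitted.push(child.id)
    } else {
      kept.push(child)
    }
  }
  return { kept, omitted }
}
```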
Every normalized node should preserve enough provenance to debug bad inferences.
Include:
- source node type,
- omitted raw fields,
- warnings,
- confidence score.
Examples:
"warning": "vector-heavy node simplified as asset candidate"
"warning": "layout inferred from child bounds; no explicit auto-layout metadata"
"confidence": "medium"
This matters because heuristic systems will be wrong sometimes, and the operator needs to see why.
Return explicit, typed errors for:
- invalid Figma URL,
- missing node ID,
- auth failure,
- file not found,
- node not found,
- null node response,
- render failure,
- rate limit,
- unsupported node shape,
- normalization failure.
If image export fails but JSON fetch succeeds, continue and mark render failure in manifest.
If normalization partially fails for a subtree, preserve best-effort parent output and record subtree diagnostics.
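The error taxonomy above might be modeled as a typed error class with a fatality flag, so render failure can degrade gracefully while auth failure aborts (names here are illustrative):

```typescript
type FetchErrorKind =
  | "invalid-url"
  | "missing-node-id"
  | "auth-failure"
  | "file-not-found"
  | "node-not-found"
  | "null-node-response"
  | "render-failure"
  | "rate-limit"
  | "unsupported-node-shape"
  | "normalization-failure"

class FigmaFetchError extends Error {
  constructor(public kind: FetchErrorKind, message: string, public fatal = true) {
    super(message)
    this.name = "FigmaFetchError"
  }
}

// Unknown errors are treated as fatal; a non-fatal FigmaFetchError
// (e.g. render failure) is recorded in the manifest and the run continues.
function isFatal(e: unknown): boolean {
  return e instanceof FigmaFetchError ? e.fatal : true
}
```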
Add simple local caching from day one.
Cache key should include:
- file key,
- node ID,
- requested depth,
- version if pinned,
- relevant fetch flags.
Use a file-based cache under .cache/figma-fetch.
Reason:
- lowers token and API cost during iterative work,
- improves determinism while debugging normalization rules.
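Deriving the cache key from the listed inputs might look like this sketch; sorting flags and canonicalizing the absent-version case keeps equivalent invocations hashing identically:

```typescript
import { createHash } from "node:crypto"

type CacheKeyInput = {
  fileKey: string
  nodeId: string
  depth: number
  version?: string
  flags: string[] // e.g. ["svg", "assets"]; names are illustrative
}

function cacheKey(input: CacheKeyInput): string {
  // Explicit key order plus sorted flags → deterministic serialization.
  const canonical = JSON.stringify({
    fileKey: input.fileKey,
    nodeId: input.nodeId,
    depth: input.depth,
    version: input.version ?? null,
    flags: [...input.flags].sort(),
  })
  return createHash("sha256").update(canonical).digest("hex").slice(0, 16)
}
```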
Focus heavily on normalization.
Test modules:
- URL parsing
- layout mapping
- role inference
- text normalization
- component/instance extraction
- token binding extraction
- asset classification
- outline generation
Create a set of raw Figma JSON fixtures representing:
- simple card
- form with inputs
- table row
- nav bar
- modal
- icon-only button
- illustration-heavy marketing block
- component instance with variants
- token-bound theme example
For each fixture, assert normalized output snapshots.
For representative real frames:
- raw input fixture,
- expected normalized JSON,
- expected outline XML,
- expected context.md.
The normalization layer should be developed like a compiler pass, not a casual mapper.
- CLI
- URL parsing
- auth/client
- node fetch
- image fetch
- file writing
- manifest
- bounds
- hierarchy
- layout
- text
- appearance
- outline
- context.md
- role inference
- component/instance info
- token/variable extraction
- asset classification
- diagnostics
- child-node targeted refetch
- shallow-to-deep escalation
- better handling of large frames
Expose a minimal tool surface over the same library:
- figma_get_outline
- figma_get_node
- figma_get_render
- figma_get_assets
No business logic in the MCP layer.
Given a valid Figma URL and token, the CLI should:
- parse file key and node ID,
- fetch selected node subtree,
- export PNG render,
- emit normalized structural JSON,
- emit sparse outline JSON and XML,
- emit a concise context.md,
- extract variable/token references where available,
- classify likely assets,
- preserve diagnostics for uncertain inferences.
Success criterion:
an agent should usually be able to implement a typical UI frame from context.md + normalized-node.json + frame.png without needing raw Figma JSON.
Implement the project as a small library plus CLI, not a single script.
Prioritize:
- schema design,
- normalization correctness,
- test fixtures,
- deterministic output.
Deprioritize:
- fancy CLI UX,
- premature abstraction,
- full API coverage,
- MCP support.
The normalization layer is the product. Everything else is plumbing.
- GET /v1/files/:key and GET /v1/files/:key/nodes both support scoped retrieval via ids; depth limits traversal; geometry=paths enables vector data; and plugin_data includes plugin/shared plugin data when requested. (Figma Developer Docs)
- GET /v1/images/:key renders node images and supports output formats including png and svg, scaling, and SVG options such as including layer IDs or node IDs. (Figma Developer Docs)
- Figma node global properties include pluginData, sharedPluginData, componentPropertyReferences, boundVariables, and explicitVariableModes, which should be surfaced selectively in normalized outputs because they carry high implementation value. (Figma Developer Docs)
- Figma’s own MCP implementation guidance recommends parsing the URL, fetching structured design context, fetching a screenshot, and falling back to sparse metadata plus targeted child fetches when the response is too large. (Figma Developer Docs)