This document describes the internal architecture and design patterns of LayerD.
LayerD is a layer decomposition method that extracts editable layers from raster graphic design images. The system uses a two-stage iterative approach:
- Top-layer matting: Extracts the alpha matte of the topmost layer using BiRefNet
- Inpainting: Fills in the removed content using LaMa to reconstruct the background
The main LayerD class orchestrates this pipeline iteratively to decompose an image into multiple layers (background + foreground layers).
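The two-stage loop can be sketched as follows. The stubs below stand in for BiRefNet and LaMa, and names such as `min_area` are illustrative assumptions, not the real API:

```python
import numpy as np

def matting_stub(image: np.ndarray) -> np.ndarray:
    # Toy "matting": treat bright pixels as the topmost layer.
    return (image[..., 0] > 200).astype(np.float64)

def inpaint_stub(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Toy "inpainting": fill the removed region with the image mean.
    out = image.copy()
    out[mask > 0.5] = image.mean(axis=(0, 1))
    return out

def decompose(image: np.ndarray, max_iters: int = 3, min_area: float = 1e-3) -> list:
    """Iteratively peel top layers: matting, then inpainting the remainder."""
    layers = []
    current = image.astype(np.float64)
    for _ in range(max_iters):
        alpha = matting_stub(current)               # 1) top-layer matting
        if alpha.sum() < min_area * alpha.size:
            break                                   # nothing left to extract
        layers.append(np.dstack([current, alpha]))  # store RGBA layer
        current = inpaint_stub(current, alpha)      # 2) inpaint the background
    return [current] + layers                       # background first
```

The real models replace the stubs, but the control flow (matting, stop check, inpaint, repeat) is the essence of the iterative design.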
The main pipeline is implemented in src/layerd/models/layerd.py:
- `LayerD` class: Main interface for layer decomposition
- `decompose()`: Iteratively extracts layers (max 3 iterations by default)
- `_decompose_step()`: Single iteration of matting + inpainting
- Uses helper functions from `helpers.py` for refinement operations
LayerD uses a registry pattern to support multiple model backends:
- Base classes: `BaseMatting` and `BaseInpaint` define model interfaces
- Registry pattern: Implemented in `models/matting/__init__.py` and `models/inpaint/__init__.py`
- Factory functions: Use `build_matting()` and `build_inpaint()` to instantiate models
- Current implementations:
- Matting: BiRefNet (HuggingFace model)
- Inpainting: LaMa (from simple-lama-inpainting)
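A minimal sketch of how such a registry and factory might fit together. The decorator name `register_matting` and the stub class body are assumptions; only `build_matting()` and the `"birefnet"` identifier come from the docs:

```python
# Registry maps string identifiers to model classes.
MATTING_REGISTRY: dict[str, type] = {}

def register_matting(name: str):
    """Class decorator that adds a model class to the registry."""
    def wrap(cls: type) -> type:
        MATTING_REGISTRY[name] = cls
        return cls
    return wrap

def build_matting(name: str, **kwargs):
    """Factory: look up a registered class and instantiate it."""
    if name not in MATTING_REGISTRY:
        raise KeyError(f"unknown matting model: {name!r}")
    return MATTING_REGISTRY[name](**kwargs)

@register_matting("birefnet")
class BiRefNetMatting:
    # Stub body; the real class wraps the HuggingFace model.
    def __init__(self, device: str = "cpu") -> None:
        self.device = device
```

Adding a new backend then only requires defining a class and decorating it; callers keep using `build_matting("new-name")`.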
The decomposition includes optional refinement steps controlled by flags:
- `use_unblend`: Estimates foreground color by unblending (subtracting the background)
- `fg_refine`: Refines foreground alpha and colors using flat-color region detection
- `bg_refine`: Refines the background with palette-based color assignment
These refinement steps help improve the quality of the extracted layers, especially for text and flat-color graphics.
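The unblending step inverts the standard alpha-compositing equation I = αF + (1 − α)B to recover the foreground color F. A minimal numpy sketch (the function name and `eps` guard are illustrative, not the real helper):

```python
import numpy as np

def unblend_foreground(composite: np.ndarray, background: np.ndarray,
                       alpha: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Recover F from I = alpha*F + (1 - alpha)*B, guarding alpha == 0."""
    a = alpha[..., None]  # broadcast (H, W) alpha over RGB channels
    fg = (composite - (1.0 - a) * background) / np.maximum(a, eps)
    return np.clip(fg, 0.0, 1.0)
```

Pixels with near-zero alpha carry almost no foreground information, which is why the clip and `eps` guard matter in practice.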
Evaluation components are in src/layerd/evaluation/:
- `LayersEditDist`: Main metric for layer decomposition quality
- Dynamic Time Warping (DTW): Aligns predicted and ground-truth layers
- Edit distance: Computes edit operations (insert, delete, modify) between layer sequences
- Per-layer metrics: RGBL1 (color accuracy), AlphaIoU (alpha mask accuracy)
See evaluation.md for usage details.
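As a rough illustration of what the per-layer metrics compute, here is a thresholded sketch of RGBL1 and AlphaIoU; the real implementations in `evaluation/metrics.py` may handle soft alphas and normalization differently:

```python
import numpy as np

def rgb_l1(pred: np.ndarray, gt: np.ndarray, alpha_gt: np.ndarray) -> float:
    """Mean L1 color error over pixels where the GT layer is visible."""
    visible = alpha_gt > 0.5
    if not visible.any():
        return 0.0
    return float(np.abs(pred[visible] - gt[visible]).mean())

def alpha_iou(pred_alpha: np.ndarray, gt_alpha: np.ndarray, thr: float = 0.5) -> float:
    """Intersection-over-union of thresholded alpha masks."""
    p, g = pred_alpha > thr, gt_alpha > thr
    union = (p | g).sum()
    return float((p & g).sum() / union) if union else 1.0
```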
```
src/layerd/
├── pipeline.py # LayerDPipeline (high-level orchestration)
├── types.py # Common types (BoundingBox, Element)
├── cli.py # CLI entry point
├── models/
│ ├── layerd.py # LayerD class (low-level decomposition)
│ ├── helpers.py # Refinement utilities (unblend, mask ops, color estimation)
│ ├── matting/ # Matting model implementations
│ │ ├── base.py # BaseMatting abstract class
│ │ ├── birefnet_matting.py
│ │ └── __init__.py # Registry with build_matting()
│ └── inpaint/ # Inpainting model implementations
│ ├── base.py # BaseInpaint abstract class
│ ├── lama_inpaint.py
│ └── __init__.py # Registry with build_inpaint()
├── classification/ # Element type labeling
│ ├── base.py # ElementLabeler abstract class
│ ├── entropy.py # EntropyLabeler implementation
│ ├── gradient.py # GradientAwareLabeler implementation
│ └── utils.py # Classification utilities
├── postprocess/ # Layer organization
│ └── organizer.py # LayerOrganizer for element extraction
├── export/ # Format exporters
│ ├── base.py # BaseExporter abstract class
│ ├── svg.py # SVGBuilder and SVGParser
│ ├── psd.py # PSDBuilder
│ └── __init__.py # Registry with build_exporter()
├── ocr/ # Optional text detection/recognition
│ ├── base.py # BaseOCR abstract class
│ ├── east_backend.py # EAST detector (lightweight, CPU-compatible)
│ ├── transformers_backend.py # GOT-OCR2 (full OCR with recognition)
│ ├── types.py # OCR types and data structures
│ ├── __init__.py # OCR registry
│ └── README.md # OCR backend documentation
├── matting/birefnet/ # BiRefNet training code
│ ├── train.py # Training loop
│ ├── dataset.py # Dataset implementation
│ ├── loss.py # Loss functions
│ └── image_proc.py # Image preprocessing
├── data/ # Dataset utilities
│ ├── crello.py # Crello dataset handling
│ └── renderer.py # Rendering utilities
├── evaluation/ # Evaluation metrics
│ ├── edit_distance.py # LayersEditDist metric
│ ├── dtw.py # Dynamic Time Warping
│ ├── edits.py # Edit operations
│ └── metrics.py # Per-layer metrics (RGBL1, AlphaIoU)
├── configs/ # Hydra configuration files
│ └── train.yaml # Training hyperparameters
└── _vendor/ # Bundled dependencies (see below)
├── simple_lama_inpainting/
└── cr_renderer/
```
The high-level `LayerDPipeline` orchestrates the complete workflow:

```
Image Input
    ↓
[LayerD.decompose()]
    ↓
RGBA Layers (list of PIL Images)
    ↓
[Optional: OCR Detection] ← EAST or GOT-OCR2 backend
    ↓
[LayerOrganizer.organize()]
    ↓
Connected Components (per-layer)
    ↓
[ElementLabeler.label()]
    ↓
Classified Elements (text/vector/image)
    ↓
[SVGBuilder / PSDBuilder]
    ↓
SVG / PSD Output
```
For low-level API users, only the first step (`LayerD.decompose()`) is executed, returning raw RGBA layers for custom processing.
- Factory Pattern: Models are created via factory functions with string identifiers:
  - `build_matting()`: Creates matting models (e.g., `"birefnet"`)
  - `build_inpaint()`: Creates inpainting models (e.g., `"lama"`)
  - `build_exporter()`: Creates exporters (e.g., `"svg"`, `"psd"`)
  - `build_ocr()`: Creates OCR backends (e.g., `"east"`, `"got-ocr2"`)
- Abstract Base Classes: All models inherit from base classes with validation:
  - `BaseMatting`: Matting model interface
  - `BaseInpaint`: Inpainting model interface
  - `BaseExporter`: Exporter interface
  - `BaseOCR`: OCR backend interface (with fsspec support)
  - `ElementLabeler`: Classification interface
- Iterative Decomposition: `decompose()` runs `_decompose_step()` until no more layers remain or the maximum number of iterations is reached
- PIL Image Interface: The main API uses PIL Images; internal processing uses numpy arrays
- Pluggable Components: Classification, OCR, and export modules use strategy pattern for extensibility
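The abstract-base-class-with-validation idea might look like this sketch. The `predict_alpha`/`__call__` split and the `ThresholdMatting` strategy are invented for illustration; only the float64-in-[0, 1] alpha contract comes from the docs:

```python
import abc
import numpy as np

class BaseMatting(abc.ABC):
    """Abstract interface: subclasses implement predict_alpha(),
    and the base class validates the documented output contract."""

    @abc.abstractmethod
    def predict_alpha(self, image: np.ndarray) -> np.ndarray: ...

    def __call__(self, image: np.ndarray) -> np.ndarray:
        alpha = self.predict_alpha(image)
        # Enforce the documented contract: float64 alpha in [0, 1].
        assert alpha.dtype == np.float64
        assert alpha.min() >= 0.0 and alpha.max() <= 1.0
        return alpha

class ThresholdMatting(BaseMatting):
    # Trivial concrete strategy: bright pixels become the matte.
    def predict_alpha(self, image: np.ndarray) -> np.ndarray:
        return (image.mean(axis=-1) > 0.5).astype(np.float64)
```

Because validation lives in `__call__`, every pluggable backend gets the same contract check for free.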
- Input Requirements: Prefer PNG images to avoid compression artifacts around text edges
- PIL Image Interface: The main API uses PIL Images in RGBA format
- Internal Processing: Uses numpy arrays (float64) for computation
- Alpha Format: Matting models output float64 alpha in [0, 1] range
- Mask Expansion: Uses a `kernel_scale` parameter (default 0.015) to expand masks based on image dimensions
  - This helps capture anti-aliased edges properly
  - Scale is relative to image size: `kernel_size = int(min(H, W) * kernel_scale)`
- Layer Order: `decompose()` returns `[background, topmost_fg, ..., bottommost_fg]`
  - The background is always the first layer
- Subsequent layers are ordered from top to bottom as they were extracted
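The kernel-size formula and the documented layer order can be exercised with a short sketch. `composite()` is a hypothetical helper, not part of LayerD: it re-assembles the layers with standard alpha-over, applying foregrounds bottom-up so the topmost layer ends up on top:

```python
import numpy as np

def kernel_size(h: int, w: int, kernel_scale: float = 0.015) -> int:
    # Mask-expansion kernel scales with the shorter image side.
    return int(min(h, w) * kernel_scale)

def composite(layers: list) -> np.ndarray:
    """Re-composite decompose() output: [background, topmost, ..., bottommost]."""
    out = layers[0][..., :3].copy()        # background
    for layer in reversed(layers[1:]):     # bottommost foreground first
        alpha = layer[..., 3:4]
        out = alpha * layer[..., :3] + (1.0 - alpha) * out
    return out
```

Note the reversal: since foreground layers are stored top-to-bottom, they must be painted in reverse so later (higher) layers overwrite earlier ones.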
- First run: Downloads models from remote sources
- BiRefNet: ~1GB from HuggingFace (cyberagent/layerd-birefnet)
- LaMa: ~200MB from GitHub (simple-lama-inpainting)
- Caching: Models are cached locally for subsequent runs
- Device placement: Models can run on CPU or CUDA devices
LayerD bundles two dependencies under layerd._vendor to enable numpy 2.0 compatibility:
- Original: https://github.com/enesmsahin/simple-lama-inpainting
- PyPI: https://pypi.org/project/simple-lama-inpainting/ (outdated, numpy 1.x)
- Purpose: LaMa inpainting model wrapper
- License: Apache-2.0
- Reason for bundling: PyPI version uses numpy 1.x (incompatible with LayerD's numpy 2.0 requirement)
- Original: https://github.com/CyberAgentAILab/cr-renderer
- Revision: a17e1fb
- Purpose: Crello dataset rendering
- License: Apache-2.0
- Reason for bundling: Not available on PyPI, patched for numpy 2.0 compatibility
LayerD maintains vendored dependencies in two locations:
- `vendor/`: Source of truth for git subtree operations (tracked in git)
- `src/layerd/_vendor/`: Bundled copy for distribution (tracked in git)
Both directories are committed to git to ensure that `pip install git+...` and editable installs work correctly.
The `_vendor` prefix indicates these are internal dependencies and should not be imported directly by users.
See development.md#vendored-dependencies for syncing instructions.
LayerD uses strict mypy type checking:
- `disallow_untyped_defs = true`
- `disallow_incomplete_defs = true`
- `no_implicit_optional = true`
All functions must have complete type annotations. This helps catch bugs early and provides better IDE support.
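Expressed as a config fragment, the listed flags would look like this (shown in `mypy.ini` syntax; the project may keep them in `pyproject.toml` under `[tool.mypy]` instead):

```ini
[mypy]
disallow_untyped_defs = true
disallow_incomplete_defs = true
no_implicit_optional = true
```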
- Development Guide - Development setup and workflows
- Training Guide - Training the matting module
- Evaluation Guide - Evaluation metrics and usage