Architecture

This document describes the internal architecture and design patterns of LayerD.

Overview

LayerD is a layer decomposition method that extracts editable layers from raster graphic design images. The system uses a two-stage iterative approach:

  1. Top-layer matting: Extracts the alpha matte of the topmost layer using BiRefNet
  2. Inpainting: Fills in the removed content using LaMa to reconstruct the background

The main LayerD class orchestrates this pipeline iteratively to decompose an image into multiple layers (background + foreground layers).
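The iterative loop described above can be sketched roughly as follows. This is a minimal illustration rather than the actual LayerD class: `matte_fn` and `inpaint_fn` are hypothetical stand-ins for BiRefNet and LaMa, the "image" is a toy list of numbers, and the stopping rule (an all-zero matte) is an assumption.

```python
from typing import Callable, List, Tuple

# Toy stand-in types (assumptions): an image and a matte are flat lists of floats.
Image = List[float]
Matte = List[float]


def decompose_sketch(
    image: Image,
    matte_fn: Callable[[Image], Matte],
    inpaint_fn: Callable[[Image, Matte], Image],
    max_iterations: int = 3,
) -> Tuple[Image, List[Tuple[Image, Matte]]]:
    """Peel off the top layer repeatedly; return (background, foregrounds top-first)."""
    foregrounds: List[Tuple[Image, Matte]] = []
    current = image
    for _ in range(max_iterations):
        alpha = matte_fn(current)              # stage 1: top-layer matting
        if max(alpha, default=0.0) <= 0.0:     # assumed stopping rule: nothing left to extract
            break
        foregrounds.append((current, alpha))   # keep the pixels together with their matte
        current = inpaint_fn(current, alpha)   # stage 2: fill the region the layer covered
    return current, foregrounds


# Toy models: each value is a "stack height"; the top layer is wherever the height is maximal.
def toy_matte(img: Image) -> Matte:
    top = max(img)
    return [1.0 if top > 0 and v == top else 0.0 for v in img]


def toy_inpaint(img: Image, alpha: Matte) -> Image:
    return [v - 1 if a > 0 else v for v, a in zip(img, alpha)]


background, layers = decompose_sketch([0, 1, 2, 2], toy_matte, toy_inpaint)
```

In this toy run the loop stops on its third pass because the matte comes back empty, leaving a flat background and two foreground layers.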

Core Components

Main Pipeline

The main pipeline is implemented in src/layerd/models/layerd.py:

  • LayerD class: Main interface for layer decomposition
  • decompose(): Iteratively extracts layers (max 3 iterations by default)
  • _decompose_step(): Single iteration of matting + inpainting
  • Uses helper functions from helpers.py for refinement operations

Model Abstraction

LayerD uses a registry pattern to support multiple model backends:

  • Base classes: BaseMatting and BaseInpaint define model interfaces
  • Registry pattern: Implemented in models/matting/__init__.py and models/inpaint/__init__.py
  • Factory functions: Use build_matting() and build_inpaint() to instantiate models
  • Current implementations:
    • Matting: BiRefNet (HuggingFace model)
    • Inpainting: LaMa (from simple-lama-inpainting)
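A registry with a factory function might look like the sketch below. The names `register_matting`, `build_matting_sketch`, and `ThresholdMatting` are hypothetical and chosen only for illustration; the real `build_matting()` signature and registration mechanism are not shown in this document.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Type


class BaseMattingSketch(ABC):
    """Hypothetical stand-in for a matting interface."""

    @abstractmethod
    def predict(self, pixels: List[float]) -> List[float]:
        """Return one alpha value per input pixel."""


# Maps a string identifier to the class that implements it.
_MATTING_REGISTRY: Dict[str, Type[BaseMattingSketch]] = {}


def register_matting(
    name: str,
) -> Callable[[Type[BaseMattingSketch]], Type[BaseMattingSketch]]:
    """Class decorator that records an implementation under `name`."""

    def decorator(cls: Type[BaseMattingSketch]) -> Type[BaseMattingSketch]:
        if name in _MATTING_REGISTRY:
            raise KeyError(f"matting model {name!r} is already registered")
        _MATTING_REGISTRY[name] = cls
        return cls

    return decorator


def build_matting_sketch(name: str, **kwargs: Any) -> BaseMattingSketch:
    """Factory: look up the class by name and instantiate it with `kwargs`."""
    try:
        cls = _MATTING_REGISTRY[name]
    except KeyError:
        known = sorted(_MATTING_REGISTRY)
        raise ValueError(f"unknown matting model {name!r}; available: {known}") from None
    return cls(**kwargs)


@register_matting("threshold")
class ThresholdMatting(BaseMattingSketch):
    """Toy backend: alpha is 1 where the pixel value reaches the threshold."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def predict(self, pixels: List[float]) -> List[float]:
        return [1.0 if p >= self.threshold else 0.0 for p in pixels]


model = build_matting_sketch("threshold", threshold=0.3)
```

Callers depend only on the string identifier and the base-class interface, so adding a backend means registering one more class without touching the call sites.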

Refinement Pipeline

The decomposition includes optional refinement steps controlled by flags:

  • use_unblend: Estimates the foreground color by unblending, i.e. inverting alpha blending to remove the background's contribution from the observed pixels
  • fg_refine: Refines foreground alpha and colors using flat color region detection
  • bg_refine: Refines background with palette-based color assignment

These refinement steps help improve the quality of the extracted layers, especially for text and flat-color graphics.
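To make the unblending idea concrete: under standard alpha compositing an observed pixel is C = alpha * F + (1 - alpha) * B, so the foreground color can be recovered as F = (C - (1 - alpha) * B) / alpha wherever alpha > 0. The sketch below applies that formula with NumPy. It is a generic illustration, not the code in helpers.py; the function name, the `eps` guard, and the choice to fall back to the observed color where alpha is zero are all assumptions.

```python
import numpy as np


def unblend_sketch(
    composite: np.ndarray,   # (H, W, 3) observed colors in [0, 1]
    background: np.ndarray,  # (H, W, 3) estimated background colors in [0, 1]
    alpha: np.ndarray,       # (H, W) matte in [0, 1]
    eps: float = 1e-6,
) -> np.ndarray:
    """Recover F from C = alpha * F + (1 - alpha) * B."""
    a = alpha[..., None]                  # broadcast alpha across the color channels
    safe_a = np.maximum(a, eps)           # avoid division by zero
    fg = (composite - (1.0 - a) * background) / safe_a
    fg = np.where(a > eps, fg, composite)  # F is undefined where alpha is 0 (assumed fallback)
    return np.clip(fg, 0.0, 1.0)


# A half-transparent red pixel over a blue background.
true_fg = np.array([[[1.0, 0.0, 0.0]]])
bg = np.array([[[0.0, 0.0, 1.0]]])
matte = np.array([[0.5]])
observed = matte[..., None] * true_fg + (1.0 - matte[..., None]) * bg
recovered = unblend_sketch(observed, bg, matte)
```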

Evaluation System

Evaluation components are in src/layerd/evaluation/:

  • LayersEditDist: Main metric for layer decomposition quality
  • Dynamic Time Warping (DTW): Aligns predicted and ground truth layers
  • Edit distance: Computes edit operations (insert, delete, modify) between layer sequences
  • Per-layer metrics: RGBL1 (color accuracy), AlphaIoU (alpha mask accuracy)

See evaluation.md for usage details.
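For readers unfamiliar with DTW, the following sketch shows the generic dynamic-programming alignment of two sequences under an arbitrary pairwise cost. It is textbook DTW, not the code in dtw.py; how LayersEditDist defines the per-layer cost and combines it with edit operations is not covered here, and `dtw_align` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def dtw_align(
    pred: Sequence[T],
    truth: Sequence[T],
    cost: Callable[[T, T], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total cost, list of matched index pairs) for the cheapest monotonic alignment."""
    n, m = len(pred), len(truth)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning pred[:i] with truth[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(pred[i - 1], truth[j - 1])
            acc[i][j] = step + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the end to recover which elements were matched.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return acc[n][m], path


# Three predicted "layers" against two ground-truth ones, using absolute difference as cost.
total, pairs = dtw_align([0.0, 1.0, 1.0], [0.0, 1.0], lambda a, b: abs(a - b))
```

Here the last two predicted items both map onto the second ground-truth item, which is the kind of many-to-one matching that makes DTW useful when the predicted and reference layer counts differ.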

Module Organization

src/layerd/
├── pipeline.py            # LayerDPipeline (high-level orchestration)
├── types.py               # Common types (BoundingBox, Element)
├── cli.py                 # CLI entry point
├── models/
│   ├── layerd.py          # LayerD class (low-level decomposition)
│   ├── helpers.py         # Refinement utilities (unblend, mask ops, color estimation)
│   ├── matting/           # Matting model implementations
│   │   ├── base.py        # BaseMatting abstract class
│   │   ├── birefnet_matting.py
│   │   └── __init__.py    # Registry with build_matting()
│   └── inpaint/           # Inpainting model implementations
│       ├── base.py        # BaseInpaint abstract class
│       ├── lama_inpaint.py
│       └── __init__.py    # Registry with build_inpaint()
├── classification/        # Element type labeling
│   ├── base.py            # ElementLabeler abstract class
│   ├── entropy.py         # EntropyLabeler implementation
│   ├── gradient.py        # GradientAwareLabeler implementation
│   └── utils.py           # Classification utilities
├── postprocess/           # Layer organization
│   └── organizer.py       # LayerOrganizer for element extraction
├── export/                # Format exporters
│   ├── base.py            # BaseExporter abstract class
│   ├── svg.py             # SVGBuilder and SVGParser
│   ├── psd.py             # PSDBuilder
│   └── __init__.py        # Registry with build_exporter()
├── ocr/                   # Optional text detection/recognition
│   ├── base.py            # BaseOCR abstract class
│   ├── east_backend.py    # EAST detector (lightweight, CPU-compatible)
│   ├── transformers_backend.py  # GOT-OCR2 (full OCR with recognition)
│   ├── types.py           # OCR types and data structures
│   ├── __init__.py        # OCR registry
│   └── README.md          # OCR backend documentation
├── matting/birefnet/      # BiRefNet training code
│   ├── train.py           # Training loop
│   ├── dataset.py         # Dataset implementation
│   ├── loss.py            # Loss functions
│   └── image_proc.py      # Image preprocessing
├── data/                  # Dataset utilities
│   ├── crello.py          # Crello dataset handling
│   └── renderer.py        # Rendering utilities
├── evaluation/            # Evaluation metrics
│   ├── edit_distance.py   # LayersEditDist metric
│   ├── dtw.py             # Dynamic Time Warping
│   ├── edits.py           # Edit operations
│   └── metrics.py         # Per-layer metrics (RGBL1, AlphaIoU)
├── configs/               # Hydra configuration files
│   └── train.yaml         # Training hyperparameters
└── _vendor/               # Bundled dependencies (see below)
    ├── simple_lama_inpainting/
    └── cr_renderer/

Pipeline Architecture Flow

The high-level LayerDPipeline orchestrates the complete workflow:

Image Input
    ↓
[LayerD.decompose()]
    ↓
RGBA Layers (list of PIL Images)
    ↓
[Optional: OCR Detection] ← EAST or GOT-OCR2 backend
    ↓
[LayerOrganizer.organize()]
    ↓
Connected Components (per-layer)
    ↓
[ElementLabeler.label()]
    ↓
Classified Elements (text/vector/image)
    ↓
[SVGBuilder / PSDBuilder]
    ↓
SVG / PSD Output

Users of the low-level API run only the first step, LayerD.decompose(), which returns the raw RGBA layers for custom processing.
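The "Connected Components (per-layer)" stage in the flow above can be pictured with a generic flood-fill over a layer's binary alpha mask. The sketch below is an assumption-laden illustration: the algorithm LayerOrganizer actually uses, its connectivity, and the real BoundingBox layout are not described in this document, so `component_boxes` and the `(left, top, right, bottom)` tuple are hypothetical.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical box layout: inclusive pixel coordinates (left, top, right, bottom).
BBox = Tuple[int, int, int, int]


def component_boxes(mask: List[List[bool]]) -> List[BBox]:
    """Return one bounding box per 4-connected region of True pixels, in scan order."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[BBox] = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search over the region containing (x0, y0).
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


layer_mask = [
    [True, True, False, False],
    [False, False, False, True],
    [False, False, False, True],
]
elements = component_boxes(layer_mask)
```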

Key Design Patterns

  1. Factory Pattern: Models are created via factory functions with string identifiers:
    • build_matting(): Creates matting models (e.g., "birefnet")
    • build_inpaint(): Creates inpainting models (e.g., "lama")
    • build_exporter(): Creates exporters (e.g., "svg", "psd")
    • build_ocr(): Creates OCR backends (e.g., "east", "got-ocr2")
  2. Abstract Base Classes: Models, exporters, OCR backends, and labelers inherit from base classes with validation:
    • BaseMatting: Matting model interface
    • BaseInpaint: Inpainting model interface
    • BaseExporter: Exporter interface
    • BaseOCR: OCR backend interface (with fsspec support)
    • ElementLabeler: Classification interface
  3. Iterative Decomposition: decompose() repeats _decompose_step() until no further layer is found or the maximum number of iterations is reached
  4. PIL Image Interface: Main API uses PIL Images; internal processing uses numpy arrays
  5. Pluggable Components: Classification, OCR, and export modules use the strategy pattern for extensibility
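One common way to combine an abstract base class with validation is the template-method style sketched below: the public method checks inputs and outputs, and subclasses implement only the backend-specific hook. Whether LayerD's base classes are structured this way is an assumption; `BaseAlphaModel`, `_predict`, and the specific checks are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseAlphaModel(ABC):
    """Hypothetical base class: `predict` validates, subclasses implement `_predict`."""

    def predict(self, pixels: List[float]) -> List[float]:
        if not pixels:
            raise ValueError("input must not be empty")
        alpha = self._predict(pixels)
        if len(alpha) != len(pixels):
            raise ValueError("alpha must have one value per input pixel")
        if any(not 0.0 <= a <= 1.0 for a in alpha):
            raise ValueError("alpha values must lie in [0, 1]")
        return alpha

    @abstractmethod
    def _predict(self, pixels: List[float]) -> List[float]:
        """Backend-specific inference."""


class IdentityModel(BaseAlphaModel):
    """Toy backend that returns its input unchanged."""

    def _predict(self, pixels: List[float]) -> List[float]:
        return list(pixels)


valid = IdentityModel().predict([0.2, 1.0])
```

Because the checks live in the base class, every backend gets the same guarantees and a misbehaving implementation fails at the interface boundary rather than deep inside the pipeline.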

Important Implementation Details

Input and Output Formats

  • Input Recommendation: Prefer PNG images; lossy formats such as JPEG introduce compression artifacts around text edges
  • PIL Image Interface: The main API uses PIL Images in RGBA format
  • Internal Processing: Uses numpy arrays (float64) for computation
  • Alpha Format: Matting models output float64 alpha in [0, 1] range
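The boundary between the 8-bit RGBA interface and float64 internals can be illustrated with a pair of conversion helpers. This is a sketch under assumptions: the helper names are hypothetical, and scaling all four channels to [0, 1] is assumed here (the document only states that range for alpha). An array of the expected shape can be obtained from a PIL image with `np.asarray(image.convert("RGBA"))`.

```python
import numpy as np


def rgba_uint8_to_float(rgba: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 4) uint8 array to float64 with every channel in [0, 1]."""
    if rgba.dtype != np.uint8 or rgba.ndim != 3 or rgba.shape[2] != 4:
        raise ValueError("expected an (H, W, 4) uint8 RGBA array")
    return rgba.astype(np.float64) / 255.0


def rgba_float_to_uint8(rgba: np.ndarray) -> np.ndarray:
    """Convert float RGBA in [0, 1] back to uint8, clipping out-of-range values."""
    return np.round(np.clip(rgba, 0.0, 1.0) * 255.0).astype(np.uint8)


pixels = np.array([[[0, 128, 255, 255], [10, 20, 30, 0]]], dtype=np.uint8)
as_float = rgba_uint8_to_float(pixels)
round_trip = rgba_float_to_uint8(as_float)
```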

Layer Extraction Process

  • Mask Expansion: Uses kernel_scale parameter (default 0.015) to expand masks based on image dimensions
    • This helps capture anti-aliased edges properly
    • Scale is relative to image size: kernel_size = int(min(H, W) * kernel_scale)
  • Layer Order: decompose() returns [background, topmost_fg, ..., bottommost_fg]
    • The background is always the first layer
    • Subsequent layers are ordered from top to bottom as they were extracted
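The two conventions above lend themselves to a short sketch: the kernel-size formula as stated, and a helper that reorders the returned list for back-to-front compositing. Both function names are hypothetical. How the library handles very small images, where the formula yields 0, is not specified in this document.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def kernel_size_for(height: int, width: int, kernel_scale: float = 0.015) -> int:
    """kernel_size = int(min(H, W) * kernel_scale), as stated above."""
    return int(min(height, width) * kernel_scale)


def back_to_front(layers: Sequence[T]) -> List[T]:
    """Reorder [background, topmost_fg, ..., bottommost_fg] so it can be painted in order."""
    if not layers:
        return []
    background, foregrounds = layers[0], layers[1:]
    # The bottommost foreground must be painted first, so reverse the foreground part.
    return [background, *reversed(foregrounds)]


size = kernel_size_for(768, 1024)  # min side 768 -> int(11.52) -> 11
paint_order = back_to_front(["background", "top", "middle", "bottom"])
```

Reversing only the foreground part matters when recompositing: painting the list as returned would draw the bottommost foreground over the topmost one.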

Model Loading

  • First run: Downloads models from remote sources
  • Caching: Models are cached locally for subsequent runs
  • Device placement: Models can run on CPU or CUDA devices

Bundled Dependencies

LayerD bundles two dependencies under layerd._vendor to enable NumPy 2.0 compatibility:

1. simple-lama-inpainting (layerd._vendor.simple_lama_inpainting)

2. cr-renderer (layerd._vendor.cr_renderer)

Dual Directory Structure

LayerD maintains vendored dependencies in two locations:

  • vendor/: Source of truth for git subtree operations (tracked in git)
  • src/layerd/_vendor/: Bundled copy for distribution (tracked in git)

Both directories are committed to git to ensure pip install git+... and editable installs work correctly.

The _vendor prefix indicates these are internal dependencies and should not be imported directly by users.

See development.md#vendored-dependencies for syncing instructions.

Type System

LayerD uses strict mypy type checking:

  • disallow_untyped_defs=true
  • disallow_incomplete_defs=true
  • no_implicit_optional=true

All functions must have complete type annotations. This helps catch bugs early and provides better IDE support.
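As an illustration of what these flags require, the function below is fully annotated and spells out `Optional` explicitly. The function itself is hypothetical; the commented-out variants show the kinds of definitions mypy reports under each flag.

```python
from typing import Optional


def scale_kernel(size: int, factor: Optional[float] = None) -> int:
    """Fully annotated, with an explicit Optional for the None default."""
    if factor is None:
        return size
    return int(size * factor)


# Definitions mypy would report under the flags above (kept as comments):
#   def scale(size, factor): ...                             # disallow_untyped_defs
#   def scale(size: int, factor): ...                        # disallow_incomplete_defs
#   def scale(size: int, factor: float = None) -> int: ...   # no_implicit_optional

unchanged = scale_kernel(10)
scaled = scale_kernel(10, 1.5)
```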

Related Documentation