Description
Our current SD2.1+LoRA pipeline is great at texture (paper, ink grain) but not at ductus (the tiny, local shape quirks that make real Pecha glyphs look hand-cut/hand-pressed). Global warps (affine/TPS) don’t generalize across volumes: each Pecha has its own end-of-line squeeze, micro-erosion, and stroke breathing. We need a lightweight, learnable module that can nudge clean renders into realistic glyph shapes before diffusion, without breaking legibility.
Proposed Solution / Scope
Introduce a Neural Ductus Stylizer (NDS): a tiny U-Net that, conditioned on 1–3 style patches from the target Pecha, predicts a bounded displacement field D (±1–2 px) plus a soft mask M concentrated around stroke edges. We apply Y_shape = warp(X, D ⊙ M) with bilinear sampling, then run our existing SD2.1+LoRA pass for texture. Training is unpaired.
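A minimal sketch of the warp step, assuming PyTorch: the raw displacement is bounded with tanh, gated by the edge mask (D ⊙ M), converted to normalized grid offsets, and applied with bilinear `grid_sample`. The function name `apply_ductus_warp` and the pixel-unit convention for D are illustrative assumptions, not a fixed spec.

```python
import torch
import torch.nn.functional as F

def apply_ductus_warp(x, disp, mask, max_px=2.0):
    """Warp a clean render x with a bounded, masked displacement field.

    x:    (B, 1, H, W) clean grayscale render
    disp: (B, 2, H, W) raw displacement prediction (unbounded)
    mask: (B, 1, H, W) soft mask in [0, 1], concentrated at stroke edges
    """
    B, _, H, W = x.shape
    # Bound the displacement to +/- max_px pixels and gate it with the mask (D ⊙ M).
    d = torch.tanh(disp) * max_px * mask                      # (B, 2, H, W), in pixels

    # Base sampling grid in normalized [-1, 1] coordinates, as expected by grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=x.device),
        torch.linspace(-1, 1, W, device=x.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)   # (B, H, W, 2)

    # Convert pixel displacements to normalized offsets (x axis scaled by W, y by H).
    dx = d[:, 0] * 2.0 / max(W - 1, 1)
    dy = d[:, 1] * 2.0 / max(H - 1, 1)
    grid = base + torch.stack((dx, dy), dim=-1)

    # Bilinear warp; border padding keeps page margins clean.
    return F.grid_sample(x, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

The warped Y_shape would then be passed to the existing SD2.1+LoRA stage for texture.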
Architecture
Two-stream architecture inspired by recent font-generation work (e.g. Fu et al., CVPR 2023): a content encoder takes our clean rendered Tibetan text (grayscale + edge maps), while a lightweight style encoder extracts an embedding from a few real Pecha patches. Both streams feed into a small U-Net that predicts a bounded displacement field (±1–2 px) and a mask concentrated at stroke edges. Applying this field gently warps the clean render to mimic the ductus variation of blockprints, while preserving skeleton and stroke topology. Unlike prior font-transfer models, we explicitly add OCR-aware constraints (skeleton/edge alignment, stroke-width regularization, magnitude penalties) so the deformations stay realistic but safe for OCR. Texture (paper, ink, bleed-through) is then handled separately by Stable Diffusion + LoRA, keeping ductus and material generation decoupled.
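A minimal sketch of the two-stream module as described above; channel widths, FiLM-style style injection, and the class/function names (`StyleEncoder`, `NeuralDuctusStylizer`, `conv_block`) are illustrative assumptions, and the OCR-aware constraint losses are left out for brevity.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.GroupNorm(8, c_out), nn.SiLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.GroupNorm(8, c_out), nn.SiLU(),
    )

class StyleEncoder(nn.Module):
    """Embeds 1-3 real Pecha patches into a single style vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, patches):          # (B, K, 1, h, w) with K = 1..3 style patches
        B, K = patches.shape[:2]
        z = self.net(patches.flatten(0, 1)).flatten(1)   # (B*K, dim)
        return z.view(B, K, -1).mean(dim=1)              # average over patches -> (B, dim)

class NeuralDuctusStylizer(nn.Module):
    """Content stream (clean render + edge map) + style embedding -> (D, M)."""
    def __init__(self, style_dim=128):
        super().__init__()
        self.style_enc = StyleEncoder(style_dim)
        self.enc1 = conv_block(2, 32)                     # grayscale render + edge map
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.mid = conv_block(64, 64)
        self.film = nn.Linear(style_dim, 64 * 2)          # FiLM-style scale/shift from style
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = conv_block(64 + 64, 64)
        self.dec2 = conv_block(64 + 32, 32)
        self.head = nn.Conv2d(32, 3, 1)                   # 2 displacement channels + 1 mask logit

    def forward(self, render, edges, style_patches):
        s = self.style_enc(style_patches)                 # (B, style_dim)
        e1 = self.enc1(torch.cat([render, edges], dim=1)) # full resolution
        e2 = self.enc2(self.pool(e1))                     # 1/2 resolution
        m = self.mid(self.pool(e2))                       # 1/4 resolution bottleneck
        gamma, beta = self.film(s).chunk(2, dim=1)        # inject style at the bottleneck
        m = m * (1 + gamma[..., None, None]) + beta[..., None, None]
        d = self.dec1(torch.cat([self.up(m), e2], dim=1))
        d = self.dec2(torch.cat([self.up(d), e1], dim=1))
        out = self.head(d)
        disp = out[:, :2]                                 # raw displacement, bounded later by tanh
        mask = torch.sigmoid(out[:, 2:3])                 # soft edge-focused mask M
        return disp, mask
```

In use, the predicted (disp, mask) pair would feed the `apply_ductus_warp` sketch above to produce Y_shape, while the OCR-aware terms (skeleton/edge alignment, stroke-width regularization, magnitude penalties) would be added as training losses on the warped output.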