Color + metadata: field-level retention + cross-codec color-emission policy #17

Open
lilith wants to merge 14 commits into main from feat/metadata-policy


@lilith lilith commented May 29, 2026

One PR for the metadata + color work (was two stacked PRs — #21's color commit is now folded in here; #19/#20 superseded and closed).

Two commits

1. Field-level metadata retention (MetadataPolicy + Metadata::filtered + hardened structured EXIF)

  • MetadataPolicy { PreserveExact, Preserve, Web (default), ColorAndRotation, Custom(MetadataFields) }; per-category EXIF pruning via ExifPolicy; IccRetention three-way; exif::Retention.
  • Exif borrowing IFD parser + Exif::to_bytes re-serialization (offset-recomputed); exif::retain is Cow (zero-copy when nothing is dropped).
  • Every record/disposition is #[non_exhaustive] with builders, so the surface grows additively.
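A minimal sketch of how the policy enum, the per-channel record, and `filtered` could fit together. All types below are simplified, hypothetical stand-ins, not the real zencodec definitions: only `PreserveExact` and `Custom` are modeled, and EXIF is treated as a single keep/discard channel rather than per-category pruning.

```rust
// Simplified, hypothetical stand-ins for the PR's types; not the real zencodec API.

/// Keep-or-discard decision for one metadata channel.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[non_exhaustive]
pub enum Retention {
    Keep,
    Discard,
}

/// Per-channel record. Because it is #[non_exhaustive], downstream crates must
/// go through the constructor and builders, so new fields can land additively.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[non_exhaustive]
pub struct MetadataFields {
    pub icc: Retention,
    pub exif: Retention,
    pub xmp: Retention,
}

impl MetadataFields {
    pub fn keep_all() -> Self {
        Self { icc: Retention::Keep, exif: Retention::Keep, xmp: Retention::Keep }
    }
    pub fn with_exif(mut self, r: Retention) -> Self { self.exif = r; self }
    pub fn with_xmp(mut self, r: Retention) -> Self { self.xmp = r; self }
}

/// Only two of the PR's variants are modeled here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[non_exhaustive]
pub enum MetadataPolicy {
    PreserveExact,
    Custom(MetadataFields),
}

#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Metadata {
    pub icc: Option<Vec<u8>>,
    pub exif: Option<Vec<u8>>,
    pub xmp: Option<Vec<u8>>,
}

/// Apply one channel's decision to its blob.
fn apply(r: Retention, blob: &Option<Vec<u8>>) -> Option<Vec<u8>> {
    match r {
        Retention::Keep => blob.clone(),
        Retention::Discard => None,
    }
}

impl Metadata {
    /// Return a copy with each channel kept or dropped per the policy.
    pub fn filtered(&self, policy: &MetadataPolicy) -> Metadata {
        let fields = match policy {
            MetadataPolicy::PreserveExact => MetadataFields::keep_all(),
            MetadataPolicy::Custom(f) => *f,
        };
        Metadata {
            icc: apply(fields.icc, &self.icc),
            exif: apply(fields.exif, &self.exif),
            xmp: apply(fields.xmp, &self.xmp),
        }
    }
}
```

A caller would build a custom policy with something like `MetadataPolicy::Custom(MetadataFields::keep_all().with_xmp(Retention::Discard))` and pass it to `filtered`.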

2. Cross-codec color-emission policy (zencodec::color)

  • resolve_color_emit(&SourceColor, &EncodeCapabilities, ColorPolicy) -> ColorPlan: pure no_std, no CMS, no codec dependencies; decides which color carriers (ICC vs CICP) to write for a target. Lowers to zenpixels_convert::finalize_for_output_with (SynthesizeFrom → icc_profile_for_primaries const-fn table; no CMS, no silent drop; sRGB→None).
  • ColorPolicy { Compatibility, Balanced (default), Compact, Verbatim, Custom(ColorFields) }; IccDisposition { KeepSource, SynthesizeFrom(Cicp), Drop }. ColorFields::new makes Custom constructible.
  • EncodeCapabilities gains cicp_is_valid_carrier (standardized carrier incl. PNG cICP) + cicp_safe_sole_carrier (JXL); IccRetention gains the two target-aware variants used by the resolver (the metadata filter treats them as a conservative Keep).
  • helpers::set_exif_orientation closes the double-rotation hazard (offset-preserving inline tag rewrite, applied by the pipeline).
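To make the shape of the resolver concrete, here is a rough sketch of a pure carrier-selection function. The type and field names follow the description above, but the decision rules are assumptions made for illustration (the real rules live in docs/color-emit-model.md), and only two policies are modeled.

```rust
// Hypothetical, simplified model. The decision rules below are assumed,
// not taken from the zencodec implementation.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Cicp {
    pub primaries: u8,
    pub transfer: u8,
    pub matrix: u8,
    pub full_range: bool,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IccDisposition {
    KeepSource,
    SynthesizeFrom(Cicp),
    Drop,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ColorPolicy {
    Compatibility,
    Compact,
}

pub struct SourceColor {
    pub has_icc: bool,
    pub cicp: Option<Cicp>,
}

pub struct EncodeCapabilities {
    /// Target has a standardized CICP carrier.
    pub cicp_is_valid_carrier: bool,
    /// Target readers can be trusted with CICP as the only carrier.
    pub cicp_safe_sole_carrier: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ColorPlan {
    pub icc: IccDisposition,
    pub write_cicp: Option<Cicp>,
}

/// Pure function: no I/O, no CMS, no codec types.
pub fn resolve_color_emit(
    src: &SourceColor,
    caps: &EncodeCapabilities,
    policy: ColorPolicy,
) -> ColorPlan {
    // Write CICP whenever the target has a valid carrier for it.
    let write_cicp = if caps.cicp_is_valid_carrier { src.cicp } else { None };
    let cicp_alone_ok = caps.cicp_is_valid_carrier && caps.cicp_safe_sole_carrier;

    let icc = match (policy, src.has_icc, src.cicp) {
        // Assumed rule: Compact drops ICC only when CICP alone is safe.
        (ColorPolicy::Compact, _, Some(_)) if cicp_alone_ok => IccDisposition::Drop,
        // Otherwise prefer the source profile when there is one.
        (_, true, _) => IccDisposition::KeepSource,
        // No source ICC: synthesize one from CICP so color is never silently lost.
        (_, false, Some(c)) => IccDisposition::SynthesizeFrom(c),
        // Nothing to carry.
        (_, false, None) => IccDisposition::Drop,
    };
    ColorPlan { icc, write_cicp }
}
```

Keeping the resolver a pure function of three plain values is what lets it stay free of codec dependencies: each codec supplies its own `EncodeCapabilities` and interprets the returned plan.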

The color side is the validated, grounded surface: an over-built EmitFacts/EmitIntent/EmitPlan scenario model + a TranscodeEncoder trait were dogfooded into 5 codecs and adversarially reviewed, then rejected (nothing produces their inputs; the trait would force every codec to depend on every other). Design + rejected alternatives: docs/color-emit-model.md. The zenpipe transcode-only dispatcher is a separate later piece.

472 lib tests + 16 doctests pass, clippy + fmt clean.

…tured EXIF)

A shared, structured metadata filter for re-encode / recompress pipelines so
codec crates stop hand-rolling EXIF stripping.

- Metadata::filtered(&MetadataPolicy): PreserveExact / Preserve / Web (default)
  / ColorAndRotation / Custom(MetadataFields). IccRetention; MetadataFields
  {icc, exif: ExifPolicy, xmp/cicp/hdr: Retention} with with_* builders.
- zencodec::exif::Exif<'a> — borrowing parse → filtered → to_bytes. ExifPolicy:
  7 keep/discard categories. exif::retain -> Cow.
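The `Cow` return type is what makes the no-op case free. The sketch below shows the pattern on a toy format (fixed 4-byte records whose first two bytes are a little-endian tag); the record layout and the `retain` signature here are illustrative assumptions, not the real EXIF parser.

```rust
use std::borrow::Cow;

/// Toy record size: 2-byte little-endian tag + 2 bytes of payload (assumed layout).
const RECORD_LEN: usize = 4;

fn tag_of(rec: &[u8]) -> u16 {
    u16::from_le_bytes([rec[0], rec[1]])
}

/// Keep only records whose tag passes `keep`.
/// Borrows the input when nothing is dropped; allocates only on a real rewrite.
pub fn retain<'a>(blob: &'a [u8], keep: impl Fn(u16) -> bool) -> Cow<'a, [u8]> {
    let whole_records = blob.len() % RECORD_LEN == 0;
    // Fast path: every record survives, so hand the caller's bytes back untouched.
    if whole_records && blob.chunks_exact(RECORD_LEN).all(|rec| keep(tag_of(rec))) {
        return Cow::Borrowed(blob);
    }
    // Slow path: copy the surviving records (a trailing partial record is dropped).
    let mut out = Vec::with_capacity(blob.len());
    for rec in blob.chunks_exact(RECORD_LEN) {
        if keep(tag_of(rec)) {
            out.extend_from_slice(rec);
        }
    }
    Cow::Owned(out)
}
```

Callers that only read the result can treat both cases uniformly through `Deref<Target = [u8]>`, and a preserve-everything policy never pays for a copy.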

Verified against CIPA DC-008-2026 (Exif 3.0). Text/attribution semantics:
- Copyright (0x8298) / Artist (0x013B) are 'ASCII or UTF-8' (UTF-8 = type 129,
  the conformant Unicode form). zencodec reads BOTH (a UTF-8-typed field was
  previously dropped); copyright/artist give a lossy-UTF-8 view, *_bytes the
  exact bytes; rewrite preserves value bytes AND TIFF type verbatim. Non-ASCII
  in a type-2 field is decoded as UTF-8, not stripped (pre-129, undeclared UTF-8
  was the de-facto practice — stripping would corrupt it).
- Copyright is the rights *notice*; creator/owner *names* live in Artist and the
  Exif-IFD CameraOwnerName/Photographer/ImageEditor tags — now classified as
  'rights' (kept by a copyright policy; previously stripped as 'other'). Device/
  software identity (serials, firmware, editing-software, unique-ID) → 'camera'.
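The two-accessor split (lossy text view versus exact bytes) can be sketched as follows. `TextField` is a hypothetical stand-in for a parsed Copyright or Artist entry; the point is that a type-2 field holding undeclared UTF-8 is decoded rather than stripped, while the raw bytes stay available for a verbatim rewrite.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a parsed text entry (Copyright, Artist, ...).
pub struct TextField<'a> {
    /// TIFF field type: 2 = ASCII, 129 = UTF-8 (Exif 3.0).
    pub tiff_type: u16,
    /// Value bytes exactly as stored, including any NUL terminator.
    pub bytes: &'a [u8],
}

impl<'a> TextField<'a> {
    /// Exact stored bytes, for byte-for-byte round-tripping.
    pub fn as_bytes(&self) -> &'a [u8] {
        self.bytes
    }

    /// Lossy text view: stops at the first NUL and decodes as UTF-8 for both
    /// field types, substituting U+FFFD for invalid sequences.
    pub fn text(&self) -> Cow<'a, str> {
        let bytes: &'a [u8] = self.bytes;
        let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
        String::from_utf8_lossy(&bytes[..end])
    }
}
```

`String::from_utf8_lossy` borrows when the input is already valid UTF-8, so the common case stays zero-copy.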

Hardening (adversarial review + ~80M fuzz executions across 4 targets, 0
crashes): serializer dedups aliased values (anti-amplification DoS) and is
CANONICAL (to_bytes byte-exact fixpoint → idempotent filtering, a fuzz-found
bug); parse skips unreadable/unknown-type/OOB entries + salvages truncated
tables; retain FAILS SAFE on unparseable EXIF under a stripping policy; >4 GiB
blobs pass through; thumbnail length SHORT-or-LONG; short sub-IFD pointers
preserved. HDR/gain-map consistency documented; partial XMP a future field.

Tests: 46 structured-EXIF unit tests, differential vs kamadak-exif, 4 libFuzzer
targets + stable regression harness (crash seed), zero-copy benchmark.
Gates: clippy/fmt/wasm32-wasip1/MSRV 1.93/semver (additive). Bump 0.1.20 -> 0.1.21.

Forward compatibility: every disposition enum (MetadataPolicy, IccRetention,
Retention) and record (Metadata, MetadataFields, ExifPolicy) is
#[non_exhaustive] + builder-constructed, with the guarantee documented in the
metadata module. New policies, ICC modes, EXIF categories, retention fields,
and Metadata fields land additively. The four cross-codec carrier gaps from
imazen/zenpipe#36 (Metadata::orientation emission, decode-side EXIF-orientation
normalization, CICP wiring for native-carrier formats, AVIF EXIF-blob
preservation) are fixable as behavioral changes in the codec adapters —
Metadata already models every value they produce, so none need a type, field,
or signature change here. Query Retention via keeps()/discards().
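Since `Retention` is `#[non_exhaustive]`, a downstream `match` on it needs a wildcard arm, which is presumably why the query methods exist. A minimal sketch of such an enum follows; the variant set is assumed and the real enum may carry more.

```rust
/// Hypothetical two-variant model of a non-exhaustive disposition enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[non_exhaustive]
pub enum Retention {
    Keep,
    Discard,
}

impl Retention {
    /// True when the channel should be carried into the output.
    pub const fn keeps(self) -> bool {
        matches!(self, Retention::Keep)
    }

    /// True when the channel should be removed from the output.
    pub const fn discards(self) -> bool {
        !self.keeps()
    }
}
```

Code written as `if policy.keeps() { ... }` keeps compiling if a variant is added later, whereas an exhaustive `match` outside the defining crate would not compile at all.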
@lilith lilith force-pushed the feat/metadata-policy branch from 8453317 to df8da6b Compare May 30, 2026 16:19
@lilith lilith changed the title Field-level metadata retention: MetadataPolicy + Metadata::filtered Color + metadata: field-level retention + cross-codec color-emission policy Jun 2, 2026
@lilith lilith force-pushed the feat/metadata-policy branch 2 times, most recently from 00c8bc9 to 11ad1ff Compare June 2, 2026 03:24
lilith added 2 commits June 1, 2026 23:04
…econcile

Color carrier policy: ColorEmitPolicy / ColorEmitPlan / ColorEmitFields
(emit-directional names, distinct from decode-side SourceColor) + the
resolve_color_emit resolver + IccDisposition / CicpEmission. ICC retention
shared with MetadataPolicy via the unified IccRetention enum.

ByteOrder stays exif-scoped (dropped from the crate-root re-export; it is a
TIFF/EXIF header detail, reach it as exif::ByteOrder).

EncodePolicy re-documented as a coarse, best-effort, codec-honored per-channel
embed gate (with_policy default is a no-op) — explicitly NOT the retention
mechanism, which is MetadataPolicy + Metadata::filtered (field-level, always
honored). Resolves the EncodePolicy/MetadataPolicy overlap by responsibility,
not removal.

EXIF orientation reconcile in Metadata::filtered closes the double-rotation
hazard.
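One way to reconcile the tag without moving any offsets is to overwrite the Orientation value inside its IFD0 entry. The sketch below does that on a raw TIFF blob (no Exif\0\0 prefix). It is an illustration under simplifying assumptions (IFD0 only, conformant SHORT/count-1 entry), not the crate's helper; `set_orientation_inline` is a hypothetical name.

```rust
const TAG_ORIENTATION: u16 = 0x0112;
const TYPE_SHORT: u16 = 3;

fn rd16(b: &[u8], o: usize, le: bool) -> u16 {
    let a = [b[o], b[o + 1]];
    if le { u16::from_le_bytes(a) } else { u16::from_be_bytes(a) }
}

fn rd32(b: &[u8], o: usize, le: bool) -> u32 {
    let a = [b[o], b[o + 1], b[o + 2], b[o + 3]];
    if le { u32::from_le_bytes(a) } else { u32::from_be_bytes(a) }
}

/// Overwrite the Orientation value in IFD0 in place.
/// Returns true if the tag was found and rewritten; the blob length and every
/// offset in it are unchanged either way.
pub fn set_orientation_inline(tiff: &mut [u8], orientation: u16) -> bool {
    if tiff.len() < 8 {
        return false;
    }
    // Byte-order mark: "II" = little-endian, "MM" = big-endian.
    let le = match [tiff[0], tiff[1]] {
        [b'I', b'I'] => true,
        [b'M', b'M'] => false,
        _ => return false,
    };
    if rd16(tiff, 2, le) != 42 {
        return false; // not a TIFF header
    }
    let ifd0 = rd32(tiff, 4, le) as usize;
    if ifd0.saturating_add(2) > tiff.len() {
        return false;
    }
    let count = rd16(tiff, ifd0, le) as usize;
    for i in 0..count {
        // Each IFD entry is 12 bytes: tag(2) type(2) count(4) value/offset(4).
        let e = ifd0 + 2 + i * 12;
        if e.saturating_add(12) > tiff.len() {
            return false; // truncated table
        }
        if rd16(tiff, e, le) != TAG_ORIENTATION {
            continue;
        }
        // Only the conformant shape is handled: one SHORT stored inline.
        if rd16(tiff, e + 2, le) != TYPE_SHORT || rd32(tiff, e + 4, le) != 1 {
            return false;
        }
        let v = if le { orientation.to_le_bytes() } else { orientation.to_be_bytes() };
        tiff[e + 8..e + 10].copy_from_slice(&v);
        return true;
    }
    false
}
```

A pipeline that has already rotated the pixels would call this with `1` (normal) so a downstream viewer does not rotate a second time.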
EncodePolicy gains color: Option<ColorEmitPolicy> and metadata: Option<MetadataPolicy>
(+ with_color / with_metadata_policy / resolve_color / resolve_metadata). Gives
ColorEmitPolicy a home in the encode API (codecs previously hardcoded Balanced) and
one output-policy object for both encode and transcode: the codec reads .color
(resolve_color -> resolve_color_emit), the pipeline applies .metadata via
Metadata::filtered. embed_* stay as the coarse best-effort codec gate. MetadataPolicy
is now Copy so it bundles by value; the size_of==3 guard relaxed to <=32.
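The bundling described above might look roughly like this. The field and method names come from the commit message, but the bodies and return types are assumptions: in particular, falling back to `Balanced` in `resolve_color` is inferred from "codecs previously hardcoded Balanced", and the enums are reduced to unit variants.

```rust
// Hypothetical, reduced stand-ins; the real enums carry more variants and data.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ColorEmitPolicy {
    Compatibility,
    Balanced,
    Compact,
    Verbatim,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MetadataPolicy {
    PreserveExact,
    Web,
}

/// One output-policy object carrying both decisions by value.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct EncodePolicy {
    pub color: Option<ColorEmitPolicy>,
    pub metadata: Option<MetadataPolicy>,
}

impl EncodePolicy {
    pub fn with_color(mut self, c: ColorEmitPolicy) -> Self {
        self.color = Some(c);
        self
    }

    pub fn with_metadata_policy(mut self, m: MetadataPolicy) -> Self {
        self.metadata = Some(m);
        self
    }

    /// What the codec reads. Assumed fallback: Balanced when unset.
    pub fn resolve_color(&self) -> ColorEmitPolicy {
        self.color.unwrap_or(ColorEmitPolicy::Balanced)
    }

    /// What the pipeline reads before calling Metadata::filtered.
    pub fn resolve_metadata(&self) -> Option<MetadataPolicy> {
        self.metadata
    }
}
```

Because both policy enums are `Copy`, the whole `EncodePolicy` can be passed by value between the pipeline and the codec.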

color and icc are now private modules (re-exported at the crate root), so the single
public path is zencodec::ColorEmitPolicy (matches the metadata/orientation/format
convention; drops the redundant color:: paths). The color module's per-format overview
moved onto ColorEmitPolicy's doc so it stays in public rustdoc.

Verified: 473 lib tests + doctests pass, clippy + fmt clean, rustdoc back to the 7
pre-existing warnings (none new), public-api confirms color::/icc:: paths gone.
@lilith lilith force-pushed the feat/metadata-policy branch from 11ad1ff to b7fe81a Compare June 2, 2026 06:38
lilith added 9 commits June 3, 2026 21:08
…ight/Artist editing

API refinements to the unreleased structured EXIF surface (new in PR #17,
not on any published version — latest crates.io is 0.1.20):

- Rename ExifPolicy::datetime → datetimes (field + with_datetime →
  with_datetimes) since the category covers many tags (DateTime,
  DateTimeOriginal/Digitized, OffsetTime*, SubSecTime*). Private
  Category::Datetime → Datetimes to match.

- Replace bare hex in classify()/type_size() with named TAG_*/TIFF_*
  constants (TIFF 6.0 + CIPA DC-008). One source of truth for each tag
  number; the categorization reads as names, not magic numbers.

- Add editing: Exif::set_copyright / set_artist take a new exif::TextEncoding
  { Ascii = Exif 2.x type 2, Utf8 = Exif 3.0 type 129 }. Entry.value is now
  Cow<'a,[u8]> so parsed entries stay borrowed (zero-copy) while injected
  ones are owned; set_* insert-or-replace in IFD0 and serialize via the
  existing canonical to_bytes (offsets recomputed, fixpoint preserved).
  Ascii carries UTF-8 bytes de-facto (max compatibility); Utf8 is conformant
  but thinly supported. copyright()/artist()/(_bytes) now borrow &self (they
  can return an owned entry's bytes). TextEncoding re-exported at crate root.
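The borrowed-versus-owned split for entry values can be illustrated with a small model. `Ifd0` and `Entry` here are hypothetical simplifications (no offsets, no serialization), and the NUL terminator on both encodings is an assumption of this sketch; the part it demonstrates is insert-or-replace with `Cow` values.

```rust
use std::borrow::Cow;

const TAG_COPYRIGHT: u16 = 0x8298;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TextEncoding {
    /// Exif 2.x ASCII, TIFF type 2.
    Ascii,
    /// Exif 3.0 UTF-8, TIFF type 129.
    Utf8,
}

impl TextEncoding {
    pub const fn tiff_type(self) -> u16 {
        match self {
            TextEncoding::Ascii => 2,
            TextEncoding::Utf8 => 129,
        }
    }
}

/// Parsed entries borrow from the input blob; injected ones own their bytes.
pub struct Entry<'a> {
    pub tag: u16,
    pub tiff_type: u16,
    pub value: Cow<'a, [u8]>,
}

pub struct Ifd0<'a> {
    pub entries: Vec<Entry<'a>>,
}

impl<'a> Ifd0<'a> {
    /// Insert or replace the Copyright entry with the given encoding.
    pub fn set_copyright(&mut self, text: &str, enc: TextEncoding) {
        let mut value = text.as_bytes().to_vec();
        value.push(0); // NUL terminator (assumed for both encodings)
        let entry = Entry {
            tag: TAG_COPYRIGHT,
            tiff_type: enc.tiff_type(),
            value: Cow::Owned(value),
        };
        let pos = self.entries.iter().position(|e| e.tag == TAG_COPYRIGHT);
        match pos {
            Some(i) => self.entries[i] = entry,
            None => {
                self.entries.push(entry);
                // TIFF requires IFD entries in ascending tag order.
                self.entries.sort_by_key(|e| e.tag);
            }
        }
    }
}
```

Untouched entries remain `Cow::Borrowed`, so editing one field does not copy the rest of the parsed tree.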

480 lib + 100 integration + 16 doctests pass; clippy -D warnings clean; fmt
clean; native + wasm32-wasip1 (default & no-default) build.
…tetimes rename

- spec.md: turn the 'Planned: EXIF write/edit path' section into the
  implemented API; document set_copyright/set_artist + TextEncoding and the
  explicit-enum-over-auto-pick rationale; datetime→datetimes; note Cow values
  and &self-borrowing accessors. Fix type-129 attribution (Exif 3.0, not 2.32).
- README.md: datetime→datetimes; mention the parser is now also an editor.
- CHANGELOG.md: add the editing entry under [Unreleased]; fix the unreleased
  0.1.21 EXIF entry to say datetimes and attribute type 129 to Exif 3.0.
…ltering hook

Privacy-by-default without breaking any released API (Metadata is
#[non_exhaustive]; MetadataPolicy was already #[non_exhaustive] + Copy).

- Metadata gains `policy: MetadataPolicy` (default Web). Carries embed-time
  intent only — the raw exif/xmp/icc bytes are untouched until filtering, so the
  bring-your-own-EXIF-library round-trip still sees original bytes.
- Metadata::with_policy(policy) builder; Metadata::for_embedding() = self
  .filtered(&self.policy) — the hook a codec calls inside its existing
  with_metadata so embedding honors the caller's policy with zero EXIF logic in
  the codec (works for every codec, even ones that implement nothing special;
  no trait/signature change).
- filtered() marks its output PreserveExact so for_embedding can't double-strip.
- From<&ImageInfo> sets policy = Web (decoded metadata is privacy-safe by
  default; opt out with with_policy(PreserveExact) for a verbatim transcode).
- EncodePolicy::strip_all()/preserve_all() now carry a real MetadataPolicy
  through the reliable resolve_metadata channel (Custom(DISCARD_ALL) /
  PreserveExact), so a strip can't silently no-op via the advisory embed_* flags
  on codecs that don't implement with_policy.
- size_of::<Metadata>() 104 → 120 (12-byte MetadataPolicy, padded). 51 metadata
  + 28 policy lib tests pass.

Audit findings (all fire only on Custom/partial policies that keep a category;
the Web/ColorAndRotation presets were already safe). New in this unreleased PR.

- MakerNote (0x927C): drop whenever GPS *or* camera is stripped, not only camera.
  MakerNote is opaque and routinely embeds GPS coordinates/serials, so a
  keep-camera + drop-gps policy could leak location through it.
- SubIFDs (0x014A): drop on a rewrite. It's an unmodeled sub-IFD offset pointer
  (only Exif/GPS/Interop are modeled); keeping it emitted a dangling offset.
- IFD1 (thumbnail dir): filter its entries by the same per-category rules as
  IFD0. It carries its own Make/Model/DateTime, which a keep-thumbnail policy
  previously left intact while dropping them from IFD0.
- retain(): a >4 GiB blob under a stripping policy now fails SAFE (None/drop)
  instead of passing the original through unfiltered — the prior fail-open
  contradicted the module's own fail-safe doctrine.
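The MakerNote and SubIFDs rules above reduce to a small override that runs before per-category classification. The sketch below models just those two rules; `ExifPolicy` is cut down to two hypothetical booleans and `survives_rewrite` is an illustrative name, not the crate's function.

```rust
const TAG_MAKERNOTE: u16 = 0x927C;
const TAG_SUBIFDS: u16 = 0x014A;

/// Reduced, hypothetical policy: true means the category is kept.
pub struct ExifPolicy {
    pub keep_gps: bool,
    pub keep_camera: bool,
}

/// Special-case decisions applied when the blob is being rewritten.
/// Returns None for tags that fall through to ordinary category rules
/// (not modeled here).
pub fn survives_rewrite(tag: u16, policy: &ExifPolicy) -> Option<bool> {
    match tag {
        // Opaque vendor blob that may embed GPS data or serials:
        // keep only if neither GPS nor camera data is being stripped.
        TAG_MAKERNOTE => Some(policy.keep_gps && policy.keep_camera),
        // Pointer to an unmodeled sub-IFD; keeping it would dangle.
        TAG_SUBIFDS => Some(false),
        _ => None,
    }
}
```

With a keep-camera, drop-GPS policy, `survives_rewrite(TAG_MAKERNOTE, ...)` yields `Some(false)`, which is the leak the audit closed.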

3 new tests (makernote/subifds/ifd1) + full exif suite pass.

- spec.md: document Metadata.policy / with_policy / for_embedding and the
  embed-time privacy model; rewrite the exif Hardening/Limitation notes (retain
  now fails safe on oversize, not pass-through) and add a Privacy paragraph
  (MakerNote/SubIFDs/IFD1, XMP cross-carrier caveat).
- README.md: 'Privacy by default' note — for_embedding hook + bring-your-own-EXIF
  seam.
- CHANGELOG.md: [Unreleased] entries for the policy-on-Metadata feature (b832cdc)
  and the EXIF privacy hardening (d8a2fae).

Pre-existing unresolved rustdoc links that failed cargo doc -D warnings (doc
comments only — no code/behavior change):
- exif.rs module doc: [`retain`] ×2 → qualified crate::exif::retain.
- gainmap.rs: bare [`JpegApp2BodyWithUrn`]/[`AvifTmap`]/[`JpegApp2`] → the
  Iso21496Format:: variant paths; stale [`parse_iso21496_with_urn`] → the real
  fn parse_iso21496_fmt (which emits UrnMismatch).
- limits.rs: [`rayon::ThreadPool::install()`] → plain code span (rayon is an
  optional dep, so the cross-crate link can't resolve in the default doc build).

cargo doc -D warnings now clean (both --no-deps and full); fmt + clippy clean.

Completes the parse/new → edit → to_bytes symmetry: Exif had no from-scratch
constructor, so set_copyright/set_artist only worked on a parsed blob. Exif::new()
returns an empty little-endian tree (no Exif\0\0 prefix); + Default impl.

    let mut e = Exif::new();
    e.set_copyright("© 2026 Lilith", TextEncoding::Utf8);
    let blob = e.to_bytes();   // raw TIFF; codec adds APP1 framing

Additive, non-breaking. Doctest on new() + 2 unit tests; 61 exif lib + 17 doctests.

Higher-level sugar over Exif::new + set_copyright/set_artist: parse the existing
EXIF blob (or start fresh) and merge the field, re-serializing into self.exif.

    let meta = Metadata::none().with_copyright("© 2026 Lilith");

Written ASCII (Exif 2.x, most compatible); for UTF-8/Exif 3.0 or other tags use
exif::Exif + with_exif. Merges into a parseable existing blob (keeps other tags),
replaces an unparseable one. 3 tests; 49 metadata lib tests pass.

spec.md (Metadata methods + exif Exif/new), README (one-line stamping note),
CHANGELOG ([Unreleased]: from-scratch construction b7acd9f, rights sugar 1051288).
lilith added 2 commits June 5, 2026 16:24
…(no silent defaults)

Both unreleased (new this PR); no published API affected. Privacy/compat
decisions must be made explicitly rather than silently defaulted.

EXIF text encoding (Exif 2.x ASCII vs Exif 3.0 UTF-8):
- Exif::new(TextEncoding) now *requires* the compat choice; it's a blob property
  used by set_copyright/set_artist (which drop their per-call encoding arg).
  Default impl uses Ascii. Type 129 (UTF-8) is read by almost nothing today
  (research: Pillow drops it, piexif crashes, kamadak-exif returns Unknown), so
  forcing the choice prevents accidentally shipping unreadable copyright.

Metadata policy:
- MetadataPolicy loses Default (no implicit Web). Callers must name a policy.
- Metadata.policy is now Option<MetadataPolicy> (None = unchosen); for_embedding()
  returns Option<Metadata> — None when no policy is set, so a codec embeds nothing
  (fail-safe: a forgotten policy strips, never leaks). filtered() output carries
  Some(PreserveExact) so re-embedding can't double-strip. From<&ImageInfo> = None.
- size_of::<Metadata>() unchanged at 120 (Option niche-optimized).
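The fail-safe contract can be shown with a reduced model: no policy means nothing is embedded, and filtered output is stamped so a second pass is a no-op. The types are hypothetical stand-ins, and `StripExif` is an invented variant used only to give the filter something to do.

```rust
// Hypothetical, reduced model of the policy-on-Metadata contract.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MetadataPolicy {
    PreserveExact,
    /// Invented for this sketch; stands in for any stripping policy.
    StripExif,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Metadata {
    pub exif: Option<Vec<u8>>,
    pub icc: Option<Vec<u8>>,
    /// None = the caller has not chosen a policy yet.
    pub policy: Option<MetadataPolicy>,
}

impl Metadata {
    /// Apply a policy and stamp the result PreserveExact so that filtering
    /// it again cannot strip anything further.
    pub fn filtered(&self, policy: &MetadataPolicy) -> Metadata {
        let exif = match policy {
            MetadataPolicy::PreserveExact => self.exif.clone(),
            MetadataPolicy::StripExif => None,
        };
        Metadata {
            exif,
            icc: self.icc.clone(),
            policy: Some(MetadataPolicy::PreserveExact),
        }
    }

    /// What a codec embeds. None when no policy was chosen, so a forgotten
    /// policy results in no metadata rather than a leak.
    pub fn for_embedding(&self) -> Option<Metadata> {
        self.policy.map(|p| self.filtered(&p))
    }
}
```

A codec's `with_metadata` would call `for_embedding()` and write whatever comes back, treating `None` as "embed nothing".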

493 lib + 100 integration + 17 doctests pass; clippy -D warnings + fmt clean.

spec.md / README / CHANGELOG: Metadata.policy is Option<MetadataPolicy> (no
default), for_embedding() -> Option<Metadata> (None ⇒ embed nothing, fail-safe),
MetadataPolicy has no Default, Exif::new(TextEncoding) required.