Color + metadata: field-level retention + cross-codec color-emission policy#17
Open
lilith wants to merge 14 commits into
…tured EXIF)
A shared, structured metadata filter for re-encode / recompress pipelines so
codec crates stop hand-rolling EXIF stripping.
- Metadata::filtered(&MetadataPolicy): PreserveExact / Preserve / Web (default)
/ ColorAndRotation / Custom(MetadataFields). IccRetention; MetadataFields
{icc, exif: ExifPolicy, xmp/cicp/hdr: Retention} with with_* builders.
- zencodec::exif::Exif<'a> — borrowing parse → filtered → to_bytes. ExifPolicy:
7 keep/discard categories. exif::retain -> Cow.
Verified against CIPA DC-008-2026 (Exif 3.0). Text/attribution semantics:
- Copyright (0x8298) / Artist (0x013B) are 'ASCII or UTF-8' (UTF-8 = type 129,
the conformant Unicode form). zencodec reads BOTH (a UTF-8-typed field was
previously dropped); copyright/artist give a lossy-UTF-8 view, *_bytes the
exact bytes; rewrite preserves value bytes AND TIFF type verbatim. Non-ASCII
in a type-2 field is decoded as UTF-8, not stripped (pre-129, undeclared UTF-8
was the de-facto practice — stripping would corrupt it).
- Copyright is the rights *notice*; creator/owner *names* live in Artist and the
Exif-IFD CameraOwnerName/Photographer/ImageEditor tags — now classified as
'rights' (kept by a copyright policy; previously stripped as 'other'). Device/
software identity (serials, firmware, editing-software, unique-ID) → 'camera'.
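The text handling described above (read both type 2 and type 129, decode non-ASCII bytes as UTF-8 rather than stripping them) can be sketched in isolation. This is a minimal illustration, not the crate's code: `text_lossy` is a hypothetical helper, and trimming trailing NUL bytes before decoding is an assumption about how the lossy view is produced.

```rust
use std::borrow::Cow;

// TIFF field types named in the commit message: ASCII = 2, UTF-8 = 129.
const TIFF_ASCII: u16 = 2;
const TIFF_UTF8: u16 = 129;

/// Hypothetical helper: a lossy UTF-8 view of a Copyright/Artist value.
/// Both text types are accepted, and non-ASCII bytes in a type-2 field are
/// decoded as UTF-8 instead of being stripped.
/// Assumption: trailing NUL terminators are trimmed before decoding.
fn text_lossy(field_type: u16, value: &[u8]) -> Option<Cow<'_, str>> {
    if field_type != TIFF_ASCII && field_type != TIFF_UTF8 {
        return None; // not a text-typed field
    }
    // Index one past the last non-NUL byte (0 if the value is all NULs).
    let end = value.iter().rposition(|&b| b != 0).map_or(0, |i| i + 1);
    Some(String::from_utf8_lossy(&value[..end]))
}
```

Invalid sequences become U+FFFD in the view, which is why the exact-bytes accessors remain the lossless path.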
Hardening (adversarial review + ~80M fuzz executions across 4 targets, 0
crashes): serializer dedups aliased values (anti-amplification DoS) and is
CANONICAL (to_bytes byte-exact fixpoint → idempotent filtering, a fuzz-found
bug); parse skips unreadable/unknown-type/OOB entries + salvages truncated
tables; retain FAILS SAFE on unparseable EXIF under a stripping policy; >4 GiB
blobs pass through; thumbnail length SHORT-or-LONG; short sub-IFD pointers
preserved. HDR/gain-map consistency documented; partial XMP a future field.
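As an illustration of the "thumbnail length SHORT-or-LONG" point, here is a minimal sketch of reading a count-1 value from an IFD entry's 4-byte inline field. The function name and the `ByteOrder` variant names are assumptions made for this example; only the TIFF 6.0 type codes (SHORT = 3, LONG = 4) are taken from the spec.

```rust
// TIFF 6.0 field type codes: SHORT is a 16-bit and LONG a 32-bit unsigned int.
const TIFF_SHORT: u16 = 3;
const TIFF_LONG: u16 = 4;

// Variant names are assumptions for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ByteOrder {
    LittleEndian,
    BigEndian,
}

/// Hypothetical helper: read a count-1 SHORT or LONG from the inline value
/// field. A SHORT occupies the first two bytes (values are left-justified).
/// Any other type yields None so the caller can skip the entry.
fn read_short_or_long(field_type: u16, inline: [u8; 4], order: ByteOrder) -> Option<u32> {
    let short = [inline[0], inline[1]];
    match (field_type, order) {
        (TIFF_SHORT, ByteOrder::LittleEndian) => Some(u16::from_le_bytes(short) as u32),
        (TIFF_SHORT, ByteOrder::BigEndian) => Some(u16::from_be_bytes(short) as u32),
        (TIFF_LONG, ByteOrder::LittleEndian) => Some(u32::from_le_bytes(inline)),
        (TIFF_LONG, ByteOrder::BigEndian) => Some(u32::from_be_bytes(inline)),
        _ => None,
    }
}
```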
Tests: 46 structured-EXIF unit tests, differential vs kamadak-exif, 4 libFuzzer
targets + stable regression harness (crash seed), zero-copy benchmark.
Gates: clippy/fmt/wasm32-wasip1/MSRV 1.93/semver (additive). Bump 0.1.20 -> 0.1.21.
Forward compatibility: every disposition enum (MetadataPolicy, IccRetention,
Retention) and record (Metadata, MetadataFields, ExifPolicy) is
#[non_exhaustive] + builder-constructed, with the guarantee documented in the
metadata module. New policies, ICC modes, EXIF categories, retention fields,
and Metadata fields land additively. The four cross-codec carrier gaps from
imazen/zenpipe#36 (Metadata::orientation emission, decode-side EXIF-orientation
normalization, CICP wiring for native-carrier formats, AVIF EXIF-blob
preservation) are fixable as behavioral changes in the codec adapters —
Metadata already models every value they produce, so none need a type, field,
or signature change here. Query Retention via keeps()/discards().
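The forward-compatibility pattern described above (`#[non_exhaustive]` types, builder construction, `keeps()`/`discards()` queries) might look roughly like this. It is a stand-alone sketch: the `Keep`/`Discard` variant names, the field subset, and `discard_all` are assumptions, and `#[non_exhaustive]` only restricts code outside the defining crate.

```rust
// Sketch only: variant names are assumptions, not the crate's definitions.
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Retention {
    Keep,
    Discard,
}

impl Retention {
    /// Query helpers let callers avoid matching on variants directly,
    /// so new variants can be added without breaking them.
    pub fn keeps(self) -> bool {
        matches!(self, Retention::Keep)
    }
    pub fn discards(self) -> bool {
        !self.keeps()
    }
}

// A subset of the fields named in the commit message (xmp/cicp/hdr).
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MetadataFields {
    pub xmp: Retention,
    pub cicp: Retention,
    pub hdr: Retention,
}

impl MetadataFields {
    /// Hypothetical starting point; downstream crates cannot use a struct
    /// literal on a non_exhaustive type, so a constructor is required.
    pub fn discard_all() -> Self {
        Self { xmp: Retention::Discard, cicp: Retention::Discard, hdr: Retention::Discard }
    }
    pub fn with_xmp(mut self, r: Retention) -> Self {
        self.xmp = r;
        self
    }
    pub fn with_cicp(mut self, r: Retention) -> Self {
        self.cicp = r;
        self
    }
    pub fn with_hdr(mut self, r: Retention) -> Self {
        self.hdr = r;
        self
    }
}
```

Because every field is set through a builder, adding a new field later only requires a default inside the constructor.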
This was referenced Jun 1, 2026
…econcile

Color carrier policy: ColorEmitPolicy / ColorEmitPlan / ColorEmitFields (emit-directional names, distinct from decode-side SourceColor) + the resolve_color_emit resolver + IccDisposition / CicpEmission. ICC retention shared with MetadataPolicy via the unified IccRetention enum.

ByteOrder stays exif-scoped (dropped from the crate-root re-export; it is a TIFF/EXIF header detail, reach it as exif::ByteOrder).

EncodePolicy re-documented as a coarse, best-effort, codec-honored per-channel embed gate (with_policy default is a no-op), explicitly NOT the retention mechanism, which is MetadataPolicy + Metadata::filtered (field-level, always honored). Resolves the EncodePolicy/MetadataPolicy overlap by responsibility, not removal. EXIF orientation reconcile in Metadata::filtered closes the double-rotation hazard.

EncodePolicy gains color: Option<ColorEmitPolicy> and metadata: Option<MetadataPolicy> (+ with_color / with_metadata_policy / resolve_color / resolve_metadata). Gives ColorEmitPolicy a home in the encode API (codecs previously hardcoded Balanced) and one output-policy object for both encode and transcode: the codec reads .color (resolve_color -> resolve_color_emit), the pipeline applies .metadata via Metadata::filtered. embed_* stay as the coarse best-effort codec gate. MetadataPolicy is now Copy so it bundles by value; the size_of==3 guard relaxed to <=32.

color and icc are now private modules (re-exported at the crate root), so the single public path is zencodec::ColorEmitPolicy (matches the metadata/orientation/format convention; drops the redundant color:: paths). The color module's per-format overview moved onto ColorEmitPolicy's doc so it stays in public rustdoc.

Verified: 473 lib tests + doctests pass, clippy + fmt clean, rustdoc back to the 7 pre-existing warnings (none new), public-api confirms color::/icc:: paths gone.
…ight/Artist editing

API refinements to the unreleased structured EXIF surface (new in PR #17, not on any published version; latest crates.io is 0.1.20):
- Rename ExifPolicy::datetime → datetimes (field + with_datetime → with_datetimes) since the category covers many tags (DateTime, DateTimeOriginal/Digitized, OffsetTime*, SubSecTime*). Private Category::Datetime → Datetimes to match.
- Replace bare hex in classify()/type_size() with named TAG_*/TIFF_* constants (TIFF 6.0 + CIPA DC-008). One source of truth for each tag number; the categorization reads as names, not magic numbers.
- Add editing: Exif::set_copyright / set_artist take a new exif::TextEncoding { Ascii = Exif 2.x type 2, Utf8 = Exif 3.0 type 129 }. Entry.value is now Cow<'a,[u8]> so parsed entries stay borrowed (zero-copy) while injected ones are owned; set_* insert-or-replace in IFD0 and serialize via the existing canonical to_bytes (offsets recomputed, fixpoint preserved). Ascii carries UTF-8 bytes de-facto (max compatibility); Utf8 is conformant but thinly supported. copyright()/artist()/(_bytes) now borrow &self (they can return an owned entry's bytes). TextEncoding re-exported at crate root.

480 lib + 100 integration + 16 doctests pass; clippy -D warnings clean; fmt clean; native + wasm32-wasip1 (default & no-default) build.
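The borrowed-versus-owned entry values and the insert-or-replace behaviour described in this commit can be sketched with a toy IFD0. None of this is the crate's implementation: `Entry`, `Ifd0`, and `set_text` are hypothetical names, and appending a NUL terminator and re-sorting by tag are assumptions about how a conformant writer would behave.

```rust
use std::borrow::Cow;

const TAG_ARTIST: u16 = 0x013B;
const TAG_COPYRIGHT: u16 = 0x8298;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TextEncoding {
    Ascii, // Exif 2.x, TIFF type 2
    Utf8,  // Exif 3.0, TIFF type 129
}

impl TextEncoding {
    fn tiff_type(self) -> u16 {
        match self {
            TextEncoding::Ascii => 2,
            TextEncoding::Utf8 => 129,
        }
    }
}

/// Hypothetical entry: parsed values borrow from the input blob,
/// injected values own their bytes.
#[derive(Clone, Debug, PartialEq)]
struct Entry<'a> {
    tag: u16,
    field_type: u16,
    value: Cow<'a, [u8]>,
}

#[derive(Debug, Default)]
struct Ifd0<'a> {
    entries: Vec<Entry<'a>>,
}

impl<'a> Ifd0<'a> {
    /// Insert-or-replace a text tag with an owned value.
    fn set_text(&mut self, tag: u16, text: &str, enc: TextEncoding) {
        let mut bytes = text.as_bytes().to_vec();
        bytes.push(0); // assumption: text values are written NUL-terminated
        let new = Entry { tag, field_type: enc.tiff_type(), value: Cow::Owned(bytes) };
        match self.entries.iter().position(|e| e.tag == tag) {
            Some(i) => self.entries[i] = new,
            None => {
                self.entries.push(new);
                // TIFF expects IFD entries in ascending tag order.
                self.entries.sort_by_key(|e| e.tag);
            }
        }
    }

    fn set_copyright(&mut self, text: &str, enc: TextEncoding) {
        self.set_text(TAG_COPYRIGHT, text, enc);
    }

    fn set_artist(&mut self, text: &str, enc: TextEncoding) {
        self.set_text(TAG_ARTIST, text, enc);
    }
}
```

Untouched parsed entries keep their `Cow::Borrowed` values, so editing one tag does not copy the rest of the blob until serialization.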
…tetimes rename

- spec.md: turn the 'Planned: EXIF write/edit path' section into the implemented API; document set_copyright/set_artist + TextEncoding and the explicit-enum-over-auto-pick rationale; datetime→datetimes; note Cow values and &self-borrowing accessors. Fix type-129 attribution (Exif 3.0, not 2.32).
- README.md: datetime→datetimes; mention the parser is now also an editor.
- CHANGELOG.md: add the editing entry under [Unreleased]; fix the unreleased 0.1.21 EXIF entry to say datetimes and attribute type 129 to Exif 3.0.
…ltering hook

Privacy-by-default without breaking any released API (Metadata is #[non_exhaustive]; MetadataPolicy was already #[non_exhaustive] + Copy).
- Metadata gains `policy: MetadataPolicy` (default Web). Carries embed-time intent only: the raw exif/xmp/icc bytes are untouched until filtering, so the bring-your-own-EXIF-library round-trip still sees original bytes.
- Metadata::with_policy(policy) builder; Metadata::for_embedding() = self.filtered(&self.policy), the hook a codec calls inside its existing with_metadata so embedding honors the caller's policy with zero EXIF logic in the codec (works for every codec, even ones that implement nothing special; no trait/signature change).
- filtered() marks its output PreserveExact so for_embedding can't double-strip.
- From<&ImageInfo> sets policy = Web (decoded metadata is privacy-safe by default; opt out with with_policy(PreserveExact) for a verbatim transcode).
- EncodePolicy::strip_all()/preserve_all() now carry a real MetadataPolicy through the reliable resolve_metadata channel (Custom(DISCARD_ALL) / PreserveExact), so a strip can't silently no-op via the advisory embed_* flags on codecs that don't implement with_policy.
- size_of::<Metadata>() 104 → 120 (12-byte MetadataPolicy, padded).

51 metadata + 28 policy lib tests pass.
Audit findings (all fire only on Custom/partial policies that keep a category; the Web/ColorAndRotation presets were already safe). New in this unreleased PR.
- MakerNote (0x927C): drop whenever GPS *or* camera is stripped, not only camera. MakerNote is opaque and routinely embeds GPS coordinates/serials, so a keep-camera + drop-gps policy could leak location through it.
- SubIFDs (0x014A): drop on a rewrite. It's an unmodeled sub-IFD offset pointer (only Exif/GPS/Interop are modeled); keeping it emitted a dangling offset.
- IFD1 (thumbnail dir): filter its entries by the same per-category rules as IFD0. It carries its own Make/Model/DateTime, which a keep-thumbnail policy previously left intact while dropping them from IFD0.
- retain(): a >4 GiB blob under a stripping policy now fails SAFE (None/drop) instead of passing the original through unfiltered; the prior fail-open contradicted the module's own fail-safe doctrine.

3 new tests (makernote/subifds/ifd1) + full exif suite pass.
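The MakerNote and SubIFDs rules from the audit reduce to a small predicate. The sketch below is illustrative only: `ExifPolicy` here is a hypothetical two-field subset of the real seven-category policy, and `keep_opaque_tag` and its `rewriting` flag are names invented for this example.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Retention {
    Keep,
    Discard,
}

/// Hypothetical subset of the per-category policy (the real one has 7).
struct ExifPolicy {
    gps: Retention,
    camera: Retention,
}

const TAG_SUBIFDS: u16 = 0x014A;
const TAG_MAKERNOTE: u16 = 0x927C;

/// Decide whether an opaque or unmodeled tag survives filtering.
/// `rewriting` is true when the blob is being re-serialized, so offsets move.
fn keep_opaque_tag(tag: u16, policy: &ExifPolicy, rewriting: bool) -> bool {
    match tag {
        // MakerNote may embed coordinates or serials, so it survives only
        // when BOTH the gps and camera categories are kept.
        TAG_MAKERNOTE => policy.gps == Retention::Keep && policy.camera == Retention::Keep,
        // SubIFDs is an unmodeled offset pointer; it would dangle after a rewrite.
        TAG_SUBIFDS => !rewriting,
        _ => true,
    }
}
```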
- spec.md: document Metadata.policy / with_policy / for_embedding and the embed-time privacy model; rewrite the exif Hardening/Limitation notes (retain now fails safe on oversize, not pass-through) and add a Privacy paragraph (MakerNote/SubIFDs/IFD1, XMP cross-carrier caveat).
- README.md: 'Privacy by default' note: for_embedding hook + bring-your-own-EXIF seam.
- CHANGELOG.md: [Unreleased] entries for the policy-on-Metadata feature (b832cdc) and the EXIF privacy hardening (d8a2fae).
Pre-existing unresolved rustdoc links that failed cargo doc -D warnings (doc comments only; no code/behavior change):
- exif.rs module doc: [`retain`] ×2 → qualified crate::exif::retain.
- gainmap.rs: bare [`JpegApp2BodyWithUrn`]/[`AvifTmap`]/[`JpegApp2`] → the Iso21496Format:: variant paths; stale [`parse_iso21496_with_urn`] → the real fn parse_iso21496_fmt (which emits UrnMismatch).
- limits.rs: [`rayon::ThreadPool::install()`] → plain code span (rayon is an optional dep, so the cross-crate link can't resolve in the default doc build).

cargo doc -D warnings now clean (both --no-deps and full); fmt + clippy clean.
Completes the parse/new → edit → to_bytes symmetry: Exif had no from-scratch
constructor, so set_copyright/set_artist only worked on a parsed blob. Exif::new()
returns an empty little-endian tree (no Exif\0\0 prefix); + Default impl.
    use zencodec::exif::{Exif, TextEncoding};

    let mut e = Exif::new();
    e.set_copyright("© 2026 Lilith", TextEncoding::Utf8);
    let blob = e.to_bytes(); // raw TIFF; codec adds APP1 framing
Additive, non-breaking. Doctest on new() + 2 unit tests; 61 exif lib + 17 doctests.
Higher-level sugar over Exif::new + set_copyright/set_artist: parse the existing
EXIF blob (or start fresh) and merge the field, re-serializing into self.exif.
let meta = Metadata::none().with_copyright("© 2026 Lilith");
Written ASCII (Exif 2.x, most compatible); for UTF-8/Exif 3.0 or other tags use
exif::Exif + with_exif. Merges into a parseable existing blob (keeps other tags),
replaces an unparseable one. 3 tests; 49 metadata lib tests pass.
…(no silent defaults)

Both unreleased (new this PR); no published API affected. Privacy/compat decisions must be made explicitly rather than silently defaulted.

EXIF text encoding (Exif 2.x ASCII vs Exif 3.0 UTF-8):
- Exif::new(TextEncoding) now *requires* the compat choice; it's a blob property used by set_copyright/set_artist (which drop their per-call encoding arg). Default impl uses Ascii. Type 129 (UTF-8) is read by almost nothing today (research: Pillow drops it, piexif crashes, kamadak-exif returns Unknown), so forcing the choice prevents accidentally shipping unreadable copyright.

Metadata policy:
- MetadataPolicy loses Default (no implicit Web). Callers must name a policy.
- Metadata.policy is now Option<MetadataPolicy> (None = unchosen); for_embedding() returns Option<Metadata>: None when no policy is set, so a codec embeds nothing (fail-safe: a forgotten policy strips, never leaks). filtered() output carries Some(PreserveExact) so re-embedding can't double-strip. From<&ImageInfo> = None.
- size_of::<Metadata>() unchanged at 120 (Option niche-optimized).

493 lib + 100 integration + 17 doctests pass; clippy -D warnings + fmt clean.
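The fail-safe embedding contract described here (no policy means nothing is embedded, and filtered output cannot be stripped twice) can be modelled in a few lines. This is a toy model under stated assumptions: the two-variant `MetadataPolicy`, the `Vec<u8>` fields, and especially the body of `filtered` (which drops EXIF wholesale under `Web`) are placeholders, not the crate's per-field logic.

```rust
// Toy stand-ins; the real enum has more variants and is #[non_exhaustive].
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MetadataPolicy {
    PreserveExact,
    Web,
}

#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct Metadata {
    exif: Option<Vec<u8>>,
    icc: Option<Vec<u8>>,
    policy: Option<MetadataPolicy>, // None = caller has not chosen
}

impl Metadata {
    fn with_policy(mut self, policy: MetadataPolicy) -> Self {
        self.policy = Some(policy);
        self
    }

    /// Placeholder filter: under Web it drops EXIF entirely. The real
    /// filter prunes per field and per EXIF category.
    fn filtered(&self, policy: &MetadataPolicy) -> Metadata {
        let mut out = self.clone();
        if *policy == MetadataPolicy::Web {
            out.exif = None;
        }
        // Mark the output so a second pass is a no-op (no double-strip).
        out.policy = Some(MetadataPolicy::PreserveExact);
        out
    }

    /// None when no policy was chosen, so a codec embeds nothing.
    fn for_embedding(&self) -> Option<Metadata> {
        self.policy.map(|p| self.filtered(&p))
    }
}
```

A codec's `with_metadata` would then embed only what `for_embedding()` returns, and embed nothing on `None`.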
spec.md / README / CHANGELOG: Metadata.policy is Option<MetadataPolicy> (no default), for_embedding() -> Option<Metadata> (None ⇒ embed nothing, fail-safe), MetadataPolicy has no Default, Exif::new(TextEncoding) required.
One PR for the metadata + color work (was two stacked PRs: #21's color commit is now folded in here; #19/#20 superseded and closed).

Two commits

1. Field-level metadata retention (`MetadataPolicy` + `Metadata::filtered` + hardened structured EXIF)
   - `MetadataPolicy { PreserveExact, Preserve, Web (default), ColorAndRotation, Custom(MetadataFields) }`; per-category EXIF pruning via `ExifPolicy`; `IccRetention` three-way; `exif::Retention`.
   - `Exif` borrowing IFD parser + `Exif::to_bytes` re-serialization (offset-recomputed); `exif::retain` is `Cow` (zero-copy when nothing is dropped).
   - `#[non_exhaustive]` with builders, so the surface grows additively.
2. Cross-codec color-emission policy (`zencodec::color`)
   - `resolve_color_emit(&SourceColor, &EncodeCapabilities, ColorPolicy) -> ColorPlan`: pure `no_std`, no CMS, no codec dependencies; decides which color carriers (ICC vs CICP) to write for a target. Lowers to `zenpixels_convert::finalize_for_output_with` (`SynthesizeFrom` → `icc_profile_for_primaries` const-fn table; no CMS, no silent drop; sRGB → `None`).
   - `ColorPolicy { Compatibility, Balanced (default), Compact, Verbatim, Custom(ColorFields) }`; `IccDisposition { KeepSource, SynthesizeFrom(Cicp), Drop }`. `ColorFields::new` makes `Custom` constructible.
   - `EncodeCapabilities` gains `cicp_is_valid_carrier` (standardized carrier incl. PNG `cICP`) + `cicp_safe_sole_carrier` (JXL); `IccRetention` gains the two target-aware variants used by the resolver (the metadata filter treats them as a conservative `Keep`).
   - `helpers::set_exif_orientation` closes the double-rotation hazard (offset-preserving inline tag rewrite, applied by the pipeline).

The color side is the validated, grounded surface: an over-built `EmitFacts`/`EmitIntent`/`EmitPlan` scenario model + a `TranscodeEncoder` trait were dogfooded into 5 codecs and adversarially reviewed, then rejected (nothing produces their inputs; the trait would force every codec to depend on every other). Design + rejected alternatives: `docs/color-emit-model.md`. The zenpipe transcode-only dispatcher is a separate later piece.

472 lib tests + 16 doctests pass, clippy + fmt clean.
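To make the shape of a pure carrier resolver concrete, here is a toy sketch of choosing between ICC and CICP from source facts, target capabilities, and a policy. The type names echo the description, but every field, the omission of `Balanced`/`Custom`, and above all the per-policy rules are guesses from the variant names, not the crate's actual decision table.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Cicp {
    primaries: u8,
    transfer: u8,
    matrix: u8,
    full_range: bool,
}

// Subset of the policies; Balanced and Custom are left out of this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ColorPolicy {
    Compatibility,
    Compact,
    Verbatim,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IccDisposition {
    KeepSource,
    SynthesizeFrom(Cicp),
    Drop,
}

// Hypothetical field layouts.
struct SourceColor {
    has_icc: bool,
    cicp: Option<Cicp>,
}

struct EncodeCapabilities {
    cicp_is_valid_carrier: bool,
    cicp_safe_sole_carrier: bool,
}

#[derive(Debug, PartialEq, Eq)]
struct ColorPlan {
    icc: IccDisposition,
    emit_cicp: Option<Cicp>,
}

/// Pure function: no I/O, no CMS, no codec types. Rules are assumptions.
fn resolve_sketch(src: &SourceColor, caps: &EncodeCapabilities, policy: ColorPolicy) -> ColorPlan {
    // CICP is written only where the target format standardizes it.
    let emit_cicp = src.cicp.filter(|_| caps.cicp_is_valid_carrier);
    let icc = match (policy, src.has_icc, src.cicp) {
        (ColorPolicy::Verbatim, true, _) => IccDisposition::KeepSource,
        (ColorPolicy::Verbatim, false, _) => IccDisposition::Drop,
        // Drop ICC only when CICP alone is safe AND will actually be written.
        (ColorPolicy::Compact, _, Some(_))
            if caps.cicp_safe_sole_carrier && emit_cicp.is_some() =>
        {
            IccDisposition::Drop
        }
        (_, true, _) => IccDisposition::KeepSource,
        // No source ICC: synthesize one rather than silently losing color info.
        (_, false, Some(c)) => IccDisposition::SynthesizeFrom(c),
        (_, false, None) => IccDisposition::Drop,
    };
    ColorPlan { icc, emit_cicp }
}
```

Keeping the resolver free of codec dependencies means each codec only has to describe itself through its capabilities and then act on the returned plan.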