3 changes: 3 additions & 0 deletions .gitignore
@@ -58,3 +58,6 @@ Cargo.toml.original.txt
*.profraw
*.profdata
fuzz-*.log

# Local-only dev dependency overrides (build against in-tree zen sources)
.cargo/
151 changes: 151 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,157 @@ All notable changes to zencodec are documented here.

### Added

- **Cross-codec color-emission policy** —
`resolve_color_emit(&SourceColor, &EncodeCapabilities, ColorEmitPolicy) -> ColorEmitPlan`,
a pure `no_std` decision of which color carriers (ICC vs CICP) to write for a
target, with no CMS and no codec dependencies. The `color` module is private;
the types are re-exported at the crate root (`zencodec::ColorEmitPolicy`, …).
- `ColorEmitPolicy { Compatibility, Balanced (default), Compact, Verbatim, Custom(ColorEmitFields) }`;
`ColorEmitPlan { cicp: Option<Cicp>, icc: IccDisposition }`;
`IccDisposition { KeepSource, SynthesizeFrom(Cicp), Drop }`. Handles the
grayscale/CMYK terminal states and never emits a redundant `SynthesizeFrom(sRGB)`.
(Names carry the emit direction so they can't be confused with the decode-side
`SourceColor`.)
- `ColorEmitFields::new` makes `ColorEmitPolicy::Custom` constructible downstream.
- `EncodeCapabilities` gains `cicp_is_valid_carrier` (standardized carrier —
JXL/AVIF/HEIC `nclx`, PNG `cICP`) and `cicp_safe_sole_carrier` (safe CICP-only,
JXL) (+ `with_*`); `IccRetention` gains `DropIfCicpRepresentable`,
`DropIfCicpSafeSoleCarrier`. The plan lowers to `zenpixels_convert`'s
`finalize_for_output_with` (`icc_profile_for_primaries` materializes a
`SynthesizeFrom` from a `const fn` table — no CMS, never a silent drop).
- `EncodePolicy` now bundles the output-emission policy: `color:
Option<ColorEmitPolicy>` and `metadata: Option<MetadataPolicy>` (+ `with_color`,
`with_metadata_policy`, `resolve_color`, `resolve_metadata`), so encode and
transcode select the color carrier and metadata retention through one object —
the codec reads `color`, the pipeline applies `metadata` via `Metadata::filtered`.
Its docs reframe the legacy `embed_*` flags as a coarse best-effort codec gate.
`MetadataPolicy` is now `Copy` so it can be bundled by value.
- `helpers::set_exif_orientation` rewrites a blob's EXIF orientation tag inline
(offset-preserving) so a baked-upright pixel buffer and its embedded tag can't
disagree (the double-rotation hazard). Applied by the pipeline, not by the
color resolver.
- `exif::ByteOrder` is module-scoped (a TIFF/EXIF header detail), not re-exported
at the crate root.
- Design + rejected alternatives: `docs/color-emit-model.md`.
- **EXIF string-field editing** — `Exif::set_copyright` / `set_artist` set (insert
or replace) the IFD0 rights tags, materialized through the existing canonical
`Exif::to_bytes` (offsets recomputed, byte-exact fixpoint preserved). The new
`exif::TextEncoding` (re-exported at the crate root) lets the caller pick the
TIFF field type explicitly: `Ascii` (Exif 2.x, type 2 — carries UTF-8 bytes
de-facto, most compatible) or `Utf8` (Exif 3.0 / CIPA DC-008-2023, type 129 —
spec-conformant Unicode, thinly read). Explicit over auto-upgrade because
auto-promoting non-ASCII to type 129 would silently produce strings most
readers can't parse. `Entry` value bytes are now `Cow` so parsed entries stay
zero-copy while edited ones are owned; the `copyright()` / `artist()` /
`*_bytes()` accessors now borrow `&self`. EXIF tag/type numbers in the parser
are named constants (no bare hex), and the `ExifPolicy` timestamps category is
`datetimes` (plural — it covers DateTime / Original / Digitized / OffsetTime* /
SubSecTime*). (f4b9f1b)
- **Explicit embed-time metadata policy on `Metadata`** — `Metadata` gains
`policy: Option<MetadataPolicy>` (**no implicit default** — privacy is an
explicit choice) and `with_policy()`. `Metadata::for_embedding()` returns
`Option<Metadata>`: the policy-filtered metadata once a policy is set, else
`None` — a codec embeds nothing (fail-safe: a forgotten policy strips, never
leaks). It's the hook a codec calls inside its existing `EncodeJob::with_metadata`
with no EXIF logic and no trait/signature change. Carried bytes stay untouched
until then (bring-your-own-EXIF-library round-trips still see originals);
`From<&ImageInfo>` sets `None`; `filtered()`'s output is `Some(PreserveExact)` so
re-embedding can't double-strip. `MetadataPolicy` has **no `Default`** — callers
name a policy explicitly (`Web` recommended). `EncodePolicy::strip_all` /
`preserve_all` carry a real `MetadataPolicy` through the reliable
`resolve_metadata` channel (`Custom(DISCARD_ALL)` / `PreserveExact`) instead of
the advisory `embed_*` flags that no-op on codecs without `with_policy`.
`Metadata` is `#[non_exhaustive]`; `size_of` 104 → 120 on 64-bit. (b832cdc, 73c5799)
- **EXIF privacy hardening for partial-strip policies** — `MakerNote` (0x927C) is
dropped whenever `gps` **or** `camera` is stripped (it can embed GPS/serials and
can't be selectively scrubbed); `SubIFDs` (0x014A, an unmodeled sub-IFD pointer)
is dropped on a rewrite rather than left dangling; IFD1 (thumbnail-directory)
entries are filtered by the same per-category rules as IFD0 (a keep-thumbnail
policy previously kept their Make/Model/DateTime); and `exif::retain` now fails
**safe** for a >4 GiB blob under a stripping policy (drop, not pass-through). The
`Web`/`ColorAndRotation` presets were already safe — these close gaps for
hand-rolled `Custom` policies. (d8a2fae)
- **From-scratch EXIF construction** — `Exif::new(TextEncoding)` (+ `Default`,
which uses `Ascii`) starts an empty little-endian tree, completing the
`parse`/`new` → edit → `to_bytes` flow so you can build a blob with no source:
`Exif::new(TextEncoding::Ascii)` → `set_copyright(…)` → `to_bytes()` (raw TIFF;
the codec adds the APP1 `Exif\0\0` framing). The `TextEncoding` is required — the
Exif 2.x ASCII (type 2) vs Exif 3.0 UTF-8 (type 129) choice is a blob property
used by `set_copyright`/`set_artist` (type 129 is read by almost nothing, so it
can't be a silent default). (b7acd9f, 73c5799)
- **`Metadata::with_copyright(&str)` / `with_artist(&str)`** — one-liner rights
stamping that builds an EXIF blob if there is none and merges into a parseable
existing one (keeping other tags), replacing an unparseable one. Written ASCII
(Exif 2.x, most compatible); for UTF-8/Exif 3.0 or other tags, build via
`exif::Exif` + `with_exif`. (1051288)
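
The offset-preserving orientation rewrite described for `helpers::set_exif_orientation` can be illustrated with a minimal, self-contained sketch. This is not the crate's implementation: the function name `set_orientation_in_place` and its helpers are hypothetical, and the sketch assumes a raw TIFF blob (no `Exif\0\0` prefix) whose IFD0 stores Orientation (tag 0x0112) as a single SHORT. Because the value sits inline in the 12-byte entry, overwriting it moves no offsets and leaves the blob length unchanged.

```rust
// Hypothetical sketch, not zencodec's `helpers::set_exif_orientation`.

/// Reads a u16 at `at` in the blob's byte order, or None if out of bounds.
fn read_u16(b: &[u8], at: usize, little: bool) -> Option<u16> {
    let s = b.get(at..at.checked_add(2)?)?;
    Some(if little {
        u16::from_le_bytes([s[0], s[1]])
    } else {
        u16::from_be_bytes([s[0], s[1]])
    })
}

/// Reads a u32 at `at` in the blob's byte order, or None if out of bounds.
fn read_u32(b: &[u8], at: usize, little: bool) -> Option<u32> {
    let s = b.get(at..at.checked_add(4)?)?;
    let raw = [s[0], s[1], s[2], s[3]];
    Some(if little { u32::from_le_bytes(raw) } else { u32::from_be_bytes(raw) })
}

/// Overwrites the IFD0 Orientation value in place.
/// Returns true only if a SHORT, count-1 Orientation entry was found and rewritten.
fn set_orientation_in_place(tiff: &mut [u8], orientation: u16) -> bool {
    if !(1..=8).contains(&orientation) {
        return false;
    }
    // Byte-order mark: "II" = little-endian, "MM" = big-endian.
    let little = if tiff.starts_with(b"II") {
        true
    } else if tiff.starts_with(b"MM") {
        false
    } else {
        return false;
    };
    if read_u16(tiff, 2, little) != Some(42) {
        return false; // not a TIFF header
    }
    let ifd0 = match read_u32(tiff, 4, little) {
        Some(offset) => offset as usize,
        None => return false,
    };
    if ifd0 >= tiff.len() {
        return false;
    }
    let count = match read_u16(tiff, ifd0, little) {
        Some(n) => n as usize,
        None => return false,
    };
    for i in 0..count {
        // Entry layout: tag u16, type u16, count u32, 4-byte value-or-offset.
        let entry = ifd0 + 2 + i * 12;
        let fields = (
            read_u16(tiff, entry, little),
            read_u16(tiff, entry + 2, little),
            read_u32(tiff, entry + 4, little),
        );
        let (tag, ty, n) = match fields {
            (Some(tag), Some(ty), Some(n)) => (tag, ty, n),
            _ => return false, // truncated entry table
        };
        if tag == 0x0112 && ty == 3 && n == 1 {
            let bytes = if little {
                orientation.to_le_bytes()
            } else {
                orientation.to_be_bytes()
            };
            return match tiff.get_mut(entry + 8..entry + 10) {
                Some(slot) => {
                    slot.copy_from_slice(&bytes);
                    true
                }
                None => false,
            };
        }
    }
    false
}
```

A caller that has baked the pixels upright would pass `1` (the identity orientation), so the embedded tag and the pixel buffer agree.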

## [0.1.21] - 2026-05-29

### Added

- **Field-level metadata retention** — `Metadata::filtered(&MetadataPolicy)`,
the shared filter for re-encode / recompress pipelines: keep what a
downstream image needs, strip the rest, without callers hand-parsing EXIF.
- `MetadataPolicy`: `PreserveExact` (keep all, incl. a redundant sRGB ICC),
`Preserve` (keep all but drop a redundant sRGB ICC), `Web` (**default** —
ICC non-sRGB + EXIF orientation/rights + CICP/HDR; drop the rest of EXIF
and all XMP), `ColorAndRotation` (only what places pixels: ICC non-sRGB +
CICP/HDR + orientation), and `Custom(MetadataFields)`.
- `MetadataFields` (`#[non_exhaustive]`, `with_*` builders): `icc:
IccRetention` (`#[non_exhaustive]`; `Drop` / `KeepNonSrgb` / `Keep` —
three-way sRGB handling), `exif: ExifPolicy`, and `xmp` / `cicp` / `hdr:
Retention`.
- `exif::Retention` (`#[non_exhaustive]`; `Keep` / `Discard`, query via
`keeps`/`discards`) — explicit per-field intent, no `bool`-direction
ambiguity.
- Every disposition type (`MetadataPolicy`, `IccRetention`, `Retention`) and
every record (`Metadata`, `MetadataFields`, `ExifPolicy`) is
`#[non_exhaustive]` with builder construction, so new policies, ICC modes,
EXIF categories, retention fields, and `Metadata` fields land additively —
the surface never needs a semver-major break (see the module's *Forward
compatibility* docs).
- **Structured EXIF** (`zencodec::exif`) — `Exif<'a>` parses a TIFF/EXIF blob
into a borrowing IFD tree (zero-copy; thumbnails/values are never copied),
`Exif::filtered(&ExifPolicy)` prunes by category, and `Exif::to_bytes`
re-serializes a valid TIFF with recomputed offsets. `ExifPolicy`
(`#[non_exhaustive]`, `with_*` builders) has seven categories: `orientation`,
`rights`, `thumbnail`, `gps`, `datetimes`, `camera`, `other` — so e.g.
"drop only the thumbnail" or "strip GPS" is one field. `exif::retain` is the
`Cow` entry point: borrows the source unchanged when nothing is dropped
(so `Metadata::filtered` is a cheap `Arc` clone), allocates only on a real
rewrite. Bounds-checked, no panics on untrusted input; preserves byte order
and `Exif\0\0` framing. (`helpers::parse_exif_orientation` now delegates
here.)
- Hardened (adversarial review + 80M+ fuzz executions across four targets):
the serializer **deduplicates aliased out-of-line values** so a malformed
IFD pointing many entries at one blob can't amplify the rewrite ~1000×
(DoS); Copyright/Artist accessors read both **ASCII (type 2) and UTF-8
(type 129, Exif 3.0)** per CIPA DC-008 (a UTF-8-typed field was previously
dropped as unknown), expose raw bytes (`copyright_bytes` / `artist_bytes`)
alongside the lossy-UTF-8 text view, and a pruning rewrite preserves field
bytes **and TIFF type** verbatim (never transcoded — neither corrupted nor
"corrected"); EXIF categories were corrected per the spec's tag tables —
the Exif-IFD creator/owner *name* tags (CameraOwnerName 0xA430, Photographer
0xA437, ImageEditor 0xA438) are attribution (`rights`, kept by a copyright
policy — they were previously stripped as "other"), and firmware / editing-
software / unique-ID tags are device identity (`camera`); the thumbnail
length tag is read as SHORT *or* LONG (real cameras use SHORT — was silently
dropping valid thumbnails);
structural sub-IFD pointers too short to hold an offset are preserved
(peek-before-remove) instead of dropping the sub-IFD; and `retain` passes a
>4 GiB blob through untouched rather than risk `u32` offset truncation.
- Robust error model: `Exif::parse` returns `None` on structural failure but
**gracefully skips** an individual unreadable / unknown-type / out-of-bounds
entry (and salvages a truncated entry table) — one bad or future-typed
entry no longer discards the whole IFD; `retain` **fails safe** (drops EXIF
it can't parse under a stripping policy rather than leaking it through); and
`to_bytes` is **canonical** (a byte-exact fixpoint), so filtering is
idempotent (a fuzz-found non-idempotence, now a regression seed).
- Test infrastructure: differential tests against `kamadak-exif`
(`tests/exif_differential.rs`), four libFuzzer targets (`fuzz/` — parse,
roundtrip, filter, and `Metadata::filtered`), a stable regression harness
with a committed crash seed (`tests/fuzz_regression.rs`), and a zero-copy
benchmark over 1 KiB–1 MiB thumbnails (`benches/exif_filter.rs`).
- `ThreadingPolicy::resolve_thread_count()` — cross-codec shared helper that
translates a [`ThreadingPolicy`] to the integer thread count that
native-threaded encoder libraries (rav1e/ravif, dav1d/rav1d, libwebp, etc.)
25 changes: 24 additions & 1 deletion CLAUDE.md
@@ -66,4 +66,27 @@ Tiny, stable crate defining the common interface that all zen* codecs implement:

## Known Issues

(none)
Three bugs verified during the cross-codec color/metadata scenario-matrix
research (2026-06-01). The first is in this crate; the other two are recorded
here as cross-repo findings (do NOT edit those repos from here — flag to the
owner). Full design context: [`docs/color-emit-model.md`](docs/color-emit-model.md).

1. **Double-rotation hazard — FIXED (this crate, `src/metadata.rs`).** When a
decoder bakes orientation upright it sets `Metadata::orientation = Identity`
while the EXIF blob still carries the original `Orientation` tag (e.g. `6`); a
consumer that re-applied the tag would rotate twice. `Metadata::filtered` now
reconciles them — it rewrites the embedded tag to match the authoritative
`orientation` field via `helpers::set_exif_orientation` (offset-preserving,
fires only on a mismatch so the matched case keeps the zero-copy `Arc` clone).
Regression: `filtered_reconciles_baked_orientation_tag`.

2. **AVIF descriptor-CICP override (zenavif, `src/codec.rs:824-831`).**
`apply_descriptor_color` overrides a metadata-set CICP unconditionally,
ignoring a CICP explicitly provided via `Metadata`. It should check for a
caller-supplied CICP before overriding from the pixel descriptor.

3. **Missing signal-range conversion kernels (zenpixels-convert).** No
`Narrow <-> Full` range conversion kernels exist, so a range mismatch refuses
the zero-copy path but can still be relabeled without rescaling the samples,
which is a black-crush risk. Needs `ConvertStep::{Expand,Contract}NarrowToFull`.
Until then, range must be preserved verbatim, never relabeled.
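
For context on item 3, the arithmetic such a kernel would perform can be sketched for 8-bit luma. This is illustrative only: the function names are hypothetical and are not part of zenpixels-convert. It assumes the conventional 8-bit narrow-range luma levels (black at 16, white at 235) and covers luma only, since chroma uses a different scale. Relabeling without this rescale leaves the sample values where they were, so black and white land at the wrong levels.

```rust
// Hypothetical sketch of 8-bit luma range conversion; not a zenpixels-convert kernel.

/// Expands a narrow-range luma sample (16..=235) to full range (0..=255).
/// Out-of-range inputs are clamped to the nominal narrow range first.
fn expand_narrow_luma_u8(y: u8) -> u8 {
    let k = u32::from(y.clamp(16, 235)) - 16; // 0..=219
    // Round to nearest: add half the divisor (rounded down) before dividing.
    ((k * 255 + 109) / 219) as u8
}

/// Contracts a full-range luma sample (0..=255) to narrow range (16..=235).
fn contract_full_luma_u8(y: u8) -> u8 {
    let scaled = (u32::from(y) * 219 + 127) / 255; // 0..=219
    (scaled + 16) as u8
}
```

Expanding and then contracting returns the original narrow-range sample, because the rounding error introduced by the expansion shrinks below half a code value when scaled back down.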
15 changes: 13 additions & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "zencodec"
version = "0.1.20"
version = "0.1.21"
edition = "2024"
rust-version = "1.88"
license = "Apache-2.0 OR MIT"
@@ -21,13 +21,24 @@ include = [
name = "zencodec"

[dependencies]
# Published manifest uses crates.io versions (CI + `cargo publish` build the
# real artifact). Local development builds against the in-tree zen sources via
# a gitignored `.cargo/config.toml` `paths` override — see CONTRIBUTING note.
zenpixels = { version = "0.2.10", features = ["icc"] }
almost-enough = { version = "0.4.4", default-features = false, features = ["alloc"] }
enough = "0.4.4"
whereat = { version = "0.1.5"}
whereat = { version = "0.1.5" }

[dev-dependencies]
# Differential-test oracle for the EXIF parser (tests/exif_differential.rs).
# Pure-Rust, BSD-2-Clause; only built for tests, never shipped.
kamadak-exif = "0.6.1"
zenbench = "0.1.8"
thiserror = "2"
walkdir = "2.5.0"
rayon = "1.10.0"
moxcms = { version = "0.8.1", features = ["options"] }

[[bench]]
name = "exif_filter"
harness = false
34 changes: 32 additions & 2 deletions README.md
@@ -77,15 +77,45 @@ let pixels = decoded.into_buffer();

**Pixel types from `zenpixels`.** All pixel interchange types (`PixelSlice`, `PixelBuffer`, `PixelDescriptor`, etc.) are defined in the `zenpixels` crate. All zen\* crates depend on `zenpixels` directly.

## Metadata Retention

Re-encode and recompress pipelines need to decide what metadata survives. `Metadata::filtered` applies a `MetadataPolicy`, so callers never hand-parse EXIF:

```rust,ignore
use zencodec::{MetadataPolicy, MetadataFields, IccRetention, exif::{ExifPolicy, Retention}};

// Decode → filter → re-encode. `Web` (the recommended preset) keeps the ICC profile
// (unless a redundant sRGB), EXIF orientation + rights, and CICP/HDR color
// signaling — and strips GPS, timestamps, camera info, thumbnail, and XMP.
let kept = decoded_meta.filtered(&MetadataPolicy::Web);

// Presets: PreserveExact (keep all, incl. duplicate sRGB), Preserve (drop dup
// sRGB), Web, ColorAndRotation (only what places pixels), Custom.
let minimal = decoded_meta.filtered(&MetadataPolicy::ColorAndRotation);

// Per-field control — drop only the thumbnail, keep everything else:
let policy = MetadataPolicy::Custom(
MetadataFields::KEEP_ALL.with_exif(ExifPolicy::KEEP_ALL.with_thumbnail(Retention::Discard)),
);
let no_thumb = decoded_meta.filtered(&policy);
```

`MetadataFields` encapsulates EXIF in an `ExifPolicy` with seven keep/discard categories — `orientation`, `rights`, `thumbnail`, `gps`, `datetimes`, `camera`, `other` — and three-way ICC handling (`IccRetention::{Drop, KeepNonSrgb, Keep}`). EXIF passes through byte-unchanged (zero-copy) when no category is dropped, and is rewritten — offsets recomputed — only when pruning. CICP/HDR are color *signaling* (dropping them changes displayed pixels), so the presets keep them; a `Custom` policy can drop them. The structured parser/editor is public as [`zencodec::exif::Exif`](https://docs.rs/zencodec) (`parse` → `filtered`/edit → `to_bytes`) for direct EXIF work — including setting Copyright/Artist (`set_copyright` / `set_artist`, with a `TextEncoding` choice of Exif 2.x ASCII or Exif 3.0 UTF-8).

**Privacy is an explicit choice.** `Metadata` carries an `Option<MetadataPolicy>` — there's **no implicit default**: you choose retention with `with_policy(MetadataPolicy::Web)` (privacy-safe) or `PreserveExact` (verbatim). `Metadata::for_embedding()` returns `Option<Metadata>` — the filtered metadata a codec embeds, or `None` when no policy was chosen, which a codec treats as "embed nothing." So a forgotten policy **strips, never leaks**. A codec calls it inside its existing `with_metadata` (no trait change). The carried bytes stay untouched until embed, so you can still pull `metadata.exif` out, edit it with any EXIF library, and put it back via `with_exif`.
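
The fail-safe behaviour described above can be modelled with a toy, self-contained sketch. The types `ToyPolicy` and `ToyMetadata` are hypothetical stand-ins, not zencodec's `MetadataPolicy` or `Metadata`; the sketch only mirrors the stated contract: no policy means `None` (embed nothing), the carried bytes are never mutated, and the filtered result is marked preserve-exact so a second pass cannot strip twice.

```rust
// Toy model of the embed-time contract; names are hypothetical, not zencodec's API.

#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyPolicy {
    PreserveExact,
    StripAll,
}

#[derive(Clone, Debug, PartialEq)]
struct ToyMetadata {
    exif: Option<Vec<u8>>,
    policy: Option<ToyPolicy>, // no implicit default: starts as None
}

impl ToyMetadata {
    fn new(exif: Option<Vec<u8>>) -> Self {
        ToyMetadata { exif, policy: None }
    }

    fn with_policy(mut self, policy: ToyPolicy) -> Self {
        self.policy = Some(policy);
        self
    }

    /// Returns what a codec should embed, or None when no policy was chosen.
    fn for_embedding(&self) -> Option<ToyMetadata> {
        let policy = self.policy?; // forgotten policy: embed nothing
        let exif = match policy {
            ToyPolicy::PreserveExact => self.exif.clone(),
            ToyPolicy::StripAll => None,
        };
        // The output is already filtered, so mark it preserve-exact.
        Some(ToyMetadata { exif, policy: Some(ToyPolicy::PreserveExact) })
    }
}
```

A codec following this shape would treat a `None` result as "write no metadata", which is what makes a forgotten policy strip rather than leak.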

To **stamp** rights in one line, call `Metadata::none().with_copyright("© 2026 You")`, which builds (or merges into) an ASCII EXIF blob. To build the blob directly, use `Exif::new(TextEncoding::Ascii).set_copyright(…)` → `to_bytes()`. `Exif::new` requires the Exif 2.x-vs-3.0 field-type choice because type 129 is read by almost nothing, so it is never a silent default.
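
To make the "raw TIFF" output concrete, here is a minimal sketch of the bytes such a blob contains for a single Copyright tag. It is an illustration under stated assumptions, not the output of `Exif::to_bytes`: the function `build_copyright_tiff` is hypothetical, and the layout assumes a little-endian header, one IFD0 entry (tag 0x8298, type 2 ASCII, NUL-terminated), and no `Exif\0\0` framing.

```rust
// Hypothetical sketch of a one-tag TIFF blob; not zencodec's serializer.

/// Builds a little-endian TIFF whose IFD0 holds only a Copyright (0x8298) ASCII entry.
fn build_copyright_tiff(text: &str) -> Vec<u8> {
    // ASCII (type 2) values are NUL-terminated; the count includes the NUL.
    let mut value = text.as_bytes().to_vec();
    value.push(0);
    let count = value.len() as u32;

    let mut out = Vec::new();
    out.extend_from_slice(b"II"); // little-endian byte-order mark
    out.extend_from_slice(&42u16.to_le_bytes()); // TIFF magic
    out.extend_from_slice(&8u32.to_le_bytes()); // IFD0 starts right after the header
    out.extend_from_slice(&1u16.to_le_bytes()); // one entry
    out.extend_from_slice(&0x8298u16.to_le_bytes()); // tag: Copyright
    out.extend_from_slice(&2u16.to_le_bytes()); // type: ASCII
    out.extend_from_slice(&count.to_le_bytes());

    if value.len() <= 4 {
        // Values of four bytes or fewer are stored inline, zero-padded.
        let mut inline = [0u8; 4];
        inline[..value.len()].copy_from_slice(&value);
        out.extend_from_slice(&inline);
        out.extend_from_slice(&0u32.to_le_bytes()); // no next IFD
    } else {
        // Header (8) + entry count (2) + entry (12) + next-IFD pointer (4) = 26.
        out.extend_from_slice(&26u32.to_le_bytes());
        out.extend_from_slice(&0u32.to_le_bytes()); // no next IFD
        out.extend_from_slice(&value);
    }
    out
}
```

The sketch writes whatever bytes the `&str` holds into a type 2 field, which matches the note that ASCII fields carry UTF-8 bytes de facto.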

## What's in this crate

| Module | Contents |
|--------|----------|
| `zencodec::encode` | `EncoderConfig`, `EncodeJob`, `Encoder`, `AnimationFrameEncoder`, `EncodeOutput`, `EncodeCapabilities`, `EncodePolicy`, `best_encode_format`, dyn dispatch traits (`DynEncoderConfig`, `DynEncodeJob`, `DynEncoder`, `DynAnimationFrameEncoder`) |
| `zencodec::decode` | `DecoderConfig`, `DecodeJob`, `Decode`, `StreamingDecode`, `AnimationFrameDecoder`, `DecodeOutput`, `DecodeCapabilities`, `DecodePolicy`, `DecodeRowSink`, `SinkError`, `OutputInfo`, `SourceEncodingDetails`, `negotiate_pixel_format`, `is_format_available`, dyn dispatch traits (`DynDecoderConfig`, `DynDecodeJob`, `DynDecoder`, `DynStreamingDecoder`, `DynAnimationFrameDecoder`) |
| `zencodec::gainmap` | `GainMapInfo`, `GainMapParams`, `GainMapChannel`, `GainMapDirection`, `GainMapPresence`, `Iso21496Format` (with variants `JxlJhgm`, `AvifTmap`, `JpegApp2BodyWithUrn`), `ISO_21496_1_URN`, `ISO_21496_1_PRIMARY_APP2_BODY`, `serialize_iso21496_fmt` / `serialize_iso21496_fmt_into` / `parse_iso21496_fmt`, `GainMapParseError` — cross-codec gain map types and wire-format helpers (ISO 21496-1) |
| `zencodec::helpers` | Codec implementation helpers (not consumer API) — shared boilerplate for trait implementors |
| root | `ImageFormat`, `ImageFormatDefinition`, `ImageFormatRegistry` (format detection via `ImageFormatRegistry::detect()`), `ImageInfo`, `Metadata`, `Orientation`, `OrientationHint`, `ResourceLimits`, `LimitExceeded`, `ThreadingPolicy`, `UnsupportedOperation`, `CodecErrorExt`, `find_cause`, `Unsupported`, `Extensions`, `AnimationFrame`, `OwnedAnimationFrame`, `Cicp`, `ContentLightLevel`, `MasteringDisplay`, `StopToken`, `Unstoppable` |
| `zencodec::exif` | Structured EXIF/TIFF: `Exif` (borrowing parse → prune → serialize), `ExifPolicy` (7 keep/discard categories), `Retention`, `ByteOrder`, `retain` |
| `zencodec::helpers` | Codec implementation helpers (not consumer API) — shared boilerplate for trait implementors, plus the lightweight `parse_exif_orientation` accessor |
| root | `ImageFormat`, `ImageFormatDefinition`, `ImageFormatRegistry` (format detection via `ImageFormatRegistry::detect()`), `ImageInfo`, `Metadata`, `MetadataPolicy`, `MetadataFields`, `IccRetention`, `Exif`, `ExifPolicy`, `Retention`, `ByteOrder`, `Orientation`, `OrientationHint`, `ResourceLimits`, `LimitExceeded`, `ThreadingPolicy`, `UnsupportedOperation`, `CodecErrorExt`, `find_cause`, `Unsupported`, `Extensions`, `AnimationFrame`, `OwnedAnimationFrame`, `Cicp`, `ContentLightLevel`, `MasteringDisplay`, `StopToken`, `Unstoppable` |

zencodec has no feature flags. The full API is always available.
