iOS Media Toolkit

A command-line tool for processing iPhone photos and videos with Dolby Vision preservation, multiple encoding pipelines, and smart sync.

Features

  • Album-based processing: Process entire folders with a single command
  • Dolby Vision preservation: Re-encode videos while keeping DV metadata intact
  • Multiple encoding pipelines: x265 (CPU) and NVENC (GPU) with configurable quality presets
  • Smart sync: SHA256 checksum comparison to skip identical files
  • Favorites detection: Reads XMP sidecar metadata to identify favorites
  • Live Photo grouping: Automatically pairs photo + video components
  • iPhone compatibility verification: Validates codec tags, HDR metadata, and container boxes
  • Metadata preservation: GPS location, creation date, device info
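
The smart-sync check is conceptually simple; here is a minimal sketch in Python (the helper names are illustrative, not the toolkit's actual API):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large videos never load fully into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_identical(src: Path, dst: Path) -> bool:
    """Skip re-processing when the destination already matches the source byte-for-byte."""
    return dst.exists() and sha256_of(src) == sha256_of(dst)
```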

Terminology

Before diving in, here are key terms used throughout this project:

| Term | Description |
|------|-------------|
| HDR | High Dynamic Range: wider brightness range than standard (SDR) video |
| Dolby Vision (DV) | Premium HDR format with frame-by-frame dynamic metadata |
| HLG | Hybrid Log-Gamma: HDR standard used by iPhone as the DV base layer |
| RPU | Reference Processing Unit: Dolby Vision's dynamic metadata |
| HEVC/H.265 | Video codec used by iPhone for 4K HDR recording |
| x265 | Software (CPU) encoder for HEVC; slow but high quality |
| NVENC | NVIDIA hardware (GPU) encoder; fast but requires an NVIDIA GPU |
| CRF | Constant Rate Factor: quality-based encoding (lower = better quality, larger file) |
| VBR | Variable Bit Rate: target-bitrate encoding with quality fluctuation |
| Codec tag | Container metadata identifying the video format (hvc1 required for iPhone) |

Background: The Dolby Vision Re-encoding Problem

Motivation

iPhone 12 and later record 4K video in Dolby Vision Profile 8.4, a dual-layer HDR format using HLG as the base layer with dynamic metadata overlay. At bitrates around 70-80 Mbps, this results in large files: a typical 30-second clip produces 200+ MB. For archival and storage optimization, re-encoding with x265 can achieve 80-90% compression with minimal perceptual quality loss.

However, standard re-encoding pipelines strip the Dolby Vision metadata. The resulting files play back as standard HDR (HLG), losing the dynamic tone mapping capabilities that distinguish Dolby Vision from static HDR formats. This is a known limitation - while FFmpeg has made progress on DV support, it still cannot generate the required container signaling when re-encoding.

Major platforms face similar challenges. Meta's engineering team documented their work on Instagram's HDR pipeline, noting the complexity of preserving DV metadata through transcoding workflows. Apple's developer documentation on Dolby Vision describes the format architecture but offers no re-encoding solution.

Understanding the Two-Layer Metadata Problem

Initial testing with ffmpeg revealed that re-encoded files displayed "HDR" rather than "Dolby Vision" on iOS devices. Analysis of the container structure showed two distinct metadata components that must both be preserved - a challenge discussed extensively in the dovi_tool community:

Stream-level metadata (RPU): Dolby Vision embeds Reference Processing Unit data as supplemental enhancement information (SEI) NAL units within the HEVC bitstream. These contain per-frame tone mapping parameters, color volume transforms, and content-adaptive metadata. When the video is re-encoded, these NAL units are discarded since they reference the original frame data.

Container-level signaling: MP4 files use dvcC and dvvC configuration boxes to signal Dolby Vision presence to decoders. ffmpeg can copy these boxes when remuxing (-c:v copy), but cannot generate them when re-encoding because the stream-level metadata no longer exists.
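
One way to see this distinction is to look for the configuration boxes directly. The sketch below walks an MP4 box tree in pure Python and reports whether a dvcC/dvvC box is present anywhere. It is a simplified illustration (no 64-bit box sizes, only a handful of container types), not the toolkit's verification code:

```python
import struct

# Boxes that contain child boxes, mapped to the number of bytes to skip past
# the 8-byte (size, type) header before children begin. Simplified: real files
# have more container types, 64-bit sizes, uuid boxes, etc.
CONTAINERS = {
    b"moov": 0, b"trak": 0, b"mdia": 0, b"minf": 0, b"stbl": 0,
    b"stsd": 8,                              # version/flags + entry count
    b"hvc1": 78, b"dvh1": 78, b"hev1": 78,   # visual sample entry fields
}

def find_boxes(data, wanted, offset=0, end=None):
    """Recursively walk MP4 boxes in `data`, returning which `wanted` types appear."""
    found = set()
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, = struct.unpack_from(">I", data, offset)
        box_type = data[offset + 4 : offset + 8]
        if size < 8:                         # malformed or 64-bit size: stop this level
            break
        if box_type in wanted:
            found.add(box_type)
        skip = CONTAINERS.get(box_type)
        if skip is not None:
            found |= find_boxes(data, wanted, offset + 8 + skip, offset + size)
        offset += size
    return found

def has_dv_signaling(mp4_bytes):
    """True if a dvcC or dvvC configuration box exists in the container."""
    return bool(find_boxes(mp4_bytes, {b"dvcC", b"dvvC"}))
```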

Solution Architecture

The preservation workflow requires three tools working in sequence:

  1. dovi_tool (quietvoid/dovi_tool): Extracts RPU data from the original HEVC stream before re-encoding, then injects it into the re-encoded stream. This works because RPU metadata describes relative adjustments rather than absolute values—the tone mapping parameters remain valid for re-encoded content with similar characteristics.

  2. mp4muxer (DolbyLaboratories/dlb_mp4base): Dolby's reference MP4 muxer that generates proper dvcC/dvvC container boxes by parsing the RPU data in the input HEVC stream.

  3. ffmpeg: Handles the actual video re-encoding (x265 or NVENC) and final audio/metadata multiplexing.

Implementation Details

The complete pipeline executes six operations:

  1. Extract raw HEVC bitstream from source MOV container
  2. Extract RPU binary data using dovi_tool
  3. Re-encode video with x265/NVENC (compression occurs here)
  4. Inject original RPU into re-encoded bitstream
  5. Mux with mp4muxer to generate DV container boxes
  6. Combine with audio track and copy metadata
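
The six operations map to tool invocations roughly as follows. This sketch builds the commands as argv lists; the flags are assumptions based on each tool's documented CLI (e.g. `dovi_tool extract-rpu`/`inject-rpu`, `mp4muxer --dv-profile`), not the toolkit's verbatim command lines:

```python
from pathlib import Path

def dv_pipeline_commands(src: Path, out: Path, workdir: Path, crf: int = 25):
    """Return the six pipeline commands as argv lists, in execution order."""
    bl = workdir / "bl.hevc"          # raw HEVC base layer
    rpu = workdir / "rpu.bin"         # extracted Dolby Vision RPU
    enc = workdir / "encoded.hevc"    # re-encoded stream, no RPU yet
    inj = workdir / "injected.hevc"   # re-encoded stream with RPU restored
    dv = workdir / "dv.mp4"           # muxed with dvcC/dvvC boxes
    return [
        # 1. Extract the raw HEVC bitstream (Annex B) from the MOV container
        ["ffmpeg", "-i", str(src), "-c:v", "copy", "-bsf:v", "hevc_mp4toannexb",
         "-f", "hevc", str(bl)],
        # 2. Pull the per-frame RPU metadata out of the stream
        ["dovi_tool", "extract-rpu", str(bl), "-o", str(rpu)],
        # 3. Re-encode the video (the only lossy step); x265 shown here
        ["ffmpeg", "-i", str(src), "-c:v", "libx265", "-crf", str(crf),
         "-f", "hevc", str(enc)],
        # 4. Inject the original RPU into the re-encoded bitstream
        ["dovi_tool", "inject-rpu", str(enc), "--rpu-in", str(rpu), "-o", str(inj)],
        # 5. Mux with Dolby's reference muxer to generate dvcC/dvvC boxes
        ["mp4muxer", "-i", str(inj), "-o", str(dv), "--dv-profile", "8",
         "--hvc1flag", "0"],
        # 6. Add the original audio track and copy metadata, forcing the hvc1 tag
        ["ffmpeg", "-i", str(dv), "-i", str(src), "-map", "0:v", "-map", "1:a",
         "-c", "copy", "-tag:v", "hvc1", str(out)],
    ]
```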

Testing confirmed 85-91% size reduction while preserving the Dolby Vision signaling chain. Output files display "Dolby Vision" on iOS and trigger the appropriate HDR processing path on compatible displays.

Notable Findings

Codec tag compatibility: HEVC streams can use either hvc1 or hev1 codec tags in MP4 containers. The difference lies in parameter set storage: hvc1 stores them in the sample entry (out-of-band), while hev1 stores them inline. iOS requires hvc1 for playback - a common pitfall when transcoding. The mp4muxer --hvc1flag 0 parameter and ffmpeg -tag:v hvc1 flag ensure correct tagging.
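
To confirm the tag on a finished file, ffprobe's JSON output exposes it as `codec_tag_string`; the helper below is a sketch around that field (the wrapper function and its behavior are assumptions, not the toolkit's verify implementation):

```python
import json
import subprocess

IOS_SAFE_TAGS = {"hvc1", "dvh1"}  # hev1 will not play on iPhone

def codec_tag(ffprobe_json: str) -> str:
    """Return the codec tag of the first video stream from ffprobe JSON output."""
    streams = json.loads(ffprobe_json)["streams"]
    video = next(s for s in streams if s.get("codec_type") == "video")
    return video["codec_tag_string"]

def check_file(path: str) -> bool:
    """Run ffprobe and verify the tag is iPhone-compatible."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return codec_tag(out) in IOS_SAFE_TAGS
```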

RPU validity after re-encoding: The RPU metadata remains functionally valid after re-encoding because it describes relative tone mapping adjustments. The re-encoded stream maintains similar scene brightness and color characteristics, allowing the original dynamic metadata to apply correctly. This is the key insight that makes the preservation workflow possible.

NVENC compatibility: The workflow supports both CPU (x265) and GPU (NVENC) encoding. NVENC provides 10-20x faster encoding with acceptable quality for most use cases.

Requirements

  • Python 3.14+
  • ffmpeg (with NVENC support for GPU encoding)
  • exiftool

For Dolby Vision preservation (all pipelines use these when the source has DV):

  • dovi_tool
  • mp4muxer

Installation

git clone https://github.com/jgorostegui/ios-media-toolkit.git
cd ios-media-toolkit
uv sync

Installing DV Tools

# dovi_tool (pre-built binary)
wget https://github.com/quietvoid/dovi_tool/releases/download/2.3.1/dovi_tool-2.3.1-x86_64-unknown-linux-musl.tar.gz
tar -xzf dovi_tool-*.tar.gz && sudo mv dovi_tool /usr/local/bin/

# mp4muxer (build from source)
git clone https://github.com/DolbyLaboratories/dlb_mp4base.git
cd dlb_mp4base/make/mp4muxer/linux_amd64 && make
sudo cp mp4muxer_release /usr/local/bin/mp4muxer

See docs/dolby-vision-tools.md for detailed instructions.

Docker

The Docker image includes all dependencies (ffmpeg, dovi_tool, mp4muxer, exiftool) pre-configured.

Quick Start

# Pull from Docker Hub
docker pull jgorostegui/ios-media-toolkit

# Or build locally
docker build -t imt .

CPU Encoding (x265)

docker run --rm \
  -v /path/to/media:/media \
  jgorostegui/ios-media-toolkit \
  transcode /media/input.MOV -o /media/output --profile balanced

GPU Encoding (NVENC)

Requires NVIDIA GPU with nvidia-container-toolkit installed on the host.

docker run --rm --gpus all \
  -v /path/to/media:/media \
  jgorostegui/ios-media-toolkit \
  transcode /media/input.MOV -o /media/output --profile nvenc_4k

Process Albums

docker run --rm --gpus all \
  -v /mnt/nas_photos:/media \
  jgorostegui/ios-media-toolkit \
  process /media/2025_Thailand/iPhone -o /media/output --profile nvenc_4k

Verify Output

docker run --rm \
  -v /path/to/media:/media \
  jgorostegui/ios-media-toolkit \
  verify /media/output.mp4 -r /media/original.MOV

Check Dependencies

docker run --rm jgorostegui/ios-media-toolkit check

Usage

Process Albums

imt process <album-name>            # Process a single album
imt process --all                   # Process all albums
imt process <album-name> --dry-run  # Preview without changes

Compare Encoding Pipelines

imt compare video.MOV                          # Run all pipelines
imt compare video.MOV -p nvenc_4k -p balanced  # Run specific pipelines
imt list-pipelines                             # List available pipelines

Verify Output Files

imt verify output.mp4                  # Check iPhone compatibility
imt verify output.mp4 -r original.MOV  # Compare against original

Other Commands

imt albums              # List available albums
imt status <album>      # Show processing status
imt favorites <album>   # List favorite files
imt check               # Check system dependencies
imt transcode video.MOV --run  # Transcode single file

Encoding Pipelines

All pipelines preserve Dolby Vision metadata when the source has DV.

| Pipeline | Encoder | Resolution | Quality | Speed | Use Case |
|---|---|---|---|---|---|
| archival | x265 | 4K | CRF 20 | 0.02x | Maximum quality archival |
| balanced | x265 | 4K | CRF 25 | 0.24x | Good quality, reasonable time |
| nvenc_4k | NVENC | 4K | VBR 15M | 0.42x | Fast GPU encoding |
| nvenc_1080p | NVENC | 1080p | VBR 8M | 0.86x | Fast + smallest files |
| compact | x265 | 4K | CRF 28 | ~0.2x | Maximum compression |
| preview | NVENC | 1080p | VBR 4M | ~1x | Quick preview |

Speed is relative to video duration (1x = realtime, 0.5x = 2x slower than realtime).

Choosing a Pipeline

  • Archival: Use when storage isn't a concern and you want best quality
  • Balanced: Good default for most use cases
  • NVENC 4K: When you have an NVIDIA GPU and want fast encoding
  • NVENC 1080p: Maximum compression with acceptable quality for sharing
  • Compact: When storage is critical and you accept quality loss
  • Preview: Quick check before committing to a slow encode

Why Dolby Vision Matters

The Problem

iPhone records video in Dolby Vision Profile 8, which provides:

  • Dynamic HDR: Frame-by-frame brightness/contrast optimization
  • Better highlights: Preserves detail in bright areas (sun, lights)
  • Better shadows: Maintains detail in dark areas
  • Scene adaptation: Adjusts tone mapping per-scene

When you re-encode with standard tools like ffmpeg:

ffmpeg -i input.MOV -c:v libx265 -crf 25 output.mp4  # DV is LOST!

The output plays as basic "HDR" - you lose the dynamic metadata.

The Solution

This tool implements a 6-step workflow that preserves Dolby Vision:

```mermaid
flowchart TD
    A[iPhone Video<br/>DV Profile 8] --> B[Extract HEVC stream]
    B --> C[Extract RPU<br/>dovi_tool]
    B --> D[Re-encode<br/>x265 or NVENC]
    C --> E[Inject RPU<br/>dovi_tool]
    D --> E
    E --> F[Create DV container<br/>mp4muxer]
    F --> G[Add audio/metadata<br/>ffmpeg]
    G --> H[Output with DV preserved]
```

Two Levels of Dolby Vision

DV requires metadata at two levels, which is why standard tools fail:

| Level | What | Tool | Without It |
|---|---|---|---|
| Stream | RPU NAL units in HEVC bitstream | dovi_tool | No DV data in video |
| Container | dvcC/dvvC boxes in MP4 | mp4muxer | iPhone shows "HDR", not "Dolby Vision" |

ffmpeg can copy existing container boxes but cannot generate them when re-encoding.

Configuration

Edit config/global.yaml:

paths:
  source_base: "/path/to/input/albums"
  output_base: "/path/to/output"

tools:
  dovi_tool: "/usr/local/bin/dovi_tool"
  mp4muxer: "/usr/local/bin/mp4muxer"

transcode:
  enabled: true
  default_pipeline: "balanced"

Verification

The imt verify command checks output files:

imt verify output.mp4 -r original.MOV

Checks performed:

  • Codec tag: Must be hvc1 (not hev1) for iPhone
  • DV container boxes: dvcC/dvvC for "Dolby Vision" badge
  • HDR metadata: BT.2020 primaries, HLG/PQ transfer
  • Metadata: GPS, creation date, device info

iPhone Compatibility

| Codec Tag | Status |
|---|---|
| hvc1 | ✓ Compatible |
| dvh1 | ✓ Compatible (DV-specific) |
| hev1 | ✗ Won't play on iPhone |

Test Results

Tested on 12.4 MB iPhone video (4K, DV Profile 8, ~4 seconds):

| Pipeline | Output Size | Compression | Time | DV Preserved |
|---|---|---|---|---|
| archival | 17.2 MB | -38%* | 251s | ✓ |
| balanced | 10.1 MB | 19% | 16s | ✓ |
| nvenc_4k | 8.5 MB | 31% | 9s | ✓ |
| nvenc_1080p | 4.7 MB | 62% | 4.5s | ✓ |

*Archival at CRF 20 can produce larger files than the source if the source was heavily compressed.

Complete Processing Pipeline

Recommended Settings by Device

| Media Type | Device | Profile | Command |
|---|---|---|---|
| Video | Any iPhone | nvenc_4k | `imt transcode video.MOV --profile nvenc_4k` |
| ProRAW | iPhone 17+ | balanced | `imt dng compress photo.DNG --profile balanced` |
| ProRAW | iPhone 12-16 | jpeg | `imt dng compress photo.DNG --profile jpeg` |

Batch Processing Example

# Process all videos in album with GPU encoding
imt process /media/2025_Trip/iPhone --profile nvenc_4k

# Process all DNGs in a folder
for f in *.DNG; do
  imt dng compress "$f" --profile balanced
done

Expected Results

| Input | Output | Reduction |
|---|---|---|
| 4K Dolby Vision video (70 Mbps) | HEVC 15 Mbps | ~80% |
| iPhone 17 ProRAW (50 MB JXL) | DNG balanced | ~75% |
| iPhone 12 ProRAW (32 MB LJPEG) | JPEG | ~88% |

ProRAW DNG Processing

New DNG Commands

imt dng info photo.DNG              # Show DNG type (JXL/LJPEG) and metadata
imt dng compress photo.DNG          # Compress with default profile (balanced)
imt dng compress photo.DNG -p jpeg  # Extract Apple JPEG
imt dng list-profiles               # List available profiles

DNG Compression Profiles

| Profile | Method | For | Size Reduction | RAW Editing |
|---|---|---|---|---|
| balanced | JXL recompress | iPhone 17+ JXL | ~75-85% | ✓ Preserved |
| lossless | JXL lossless | iPhone 17+ JXL | ~5-12% | ✓ Preserved |
| jpeg | Apple Preview | Any iPhone | ~85-90% | ✗ Lost |
| jpeg_max | Apple Preview (q100) | Any iPhone | ~80-85% | ✗ Lost |

Recommended Processing Pipeline

For iPhone 17 Pro Max (JXL DNGs):

# Best: JXL recompress - keeps RAW editing, 75-85% smaller
imt dng compress photo.DNG --profile balanced

# Alternative: Apple JPEG if RAW not needed
imt dng compress photo.DNG --profile jpeg

For iPhone 12-16 Pro Max (LJPEG DNGs):

# Only option: Apple JPEG extraction (JXL recompress not supported)
imt dng compress photo.DNG --profile jpeg

Metadata Preservation

DNG outputs (balanced/lossless):

  • ✓ ColorMatrix1/2 (RAW color processing)
  • ✓ BaselineExposure (exposure compensation)
  • ✓ WhiteLevel/BlackLevel (sensor calibration)
  • ✓ GPS coordinates
  • ✓ Date/time, camera info

JPEG outputs:

  • ✓ GPS coordinates
  • ✓ Date/time, camera info
  • ✓ Display P3 color profile
  • ✗ RAW-specific metadata (not applicable)

Why JXL Recompress Only Works on iPhone 17+

iPhone 17 uses JPEG XL compression inside DNG. We can:

  1. Decode JXL tiles with djxl (lossless decode)
  2. Re-encode with lossy JXL at d=1.0 (visually lossless)
  3. Replace tiles in DNG container
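
Steps 1-2 can be sketched with libjxl's CLI tools (step 3, rewriting the tiles inside the DNG container, is format surgery internal to this toolkit and omitted here). The command shapes are assumptions based on standard `djxl`/`cjxl` usage:

```python
def jxl_recompress_commands(tile: str, workdir: str = "/tmp"):
    """Build the per-tile decode and re-encode commands as argv lists."""
    png = f"{workdir}/tile.png"
    out = f"{workdir}/tile_recompressed.jxl"
    return [
        ["djxl", tile, png],              # 1. lossless decode of the JXL tile to PNG
        ["cjxl", png, out, "-d", "1.0"],  # 2. lossy re-encode at distance 1.0 (visually lossless)
    ]
```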

iPhone 12-16 uses LJPEG compression. The only decoder (dcraw) applies color matrix during decode, making the pixel values incompatible with the DNG metadata. Result: wrong colors when viewed.

Solution for LJPEG: Extract Apple's embedded preview JPEG, which has correct HDR tone mapping applied at capture time.


Background: iPhone ProRAW (DNG) Technical Details

DNG Format and LJPEG Compression

iPhone 12 Pro and later capture photos in Apple ProRAW, using Adobe's DNG 1.6 specification. The raw Bayer data is compressed using Lossless JPEG (ITU-T.81 Annex H), specifically with predictor mode 7.

LJPEG uses Differential Pulse-Code Modulation (DPCM) with seven predictor modes. Given neighboring pixels A (left), B (above), and C (diagonal):

| Mode | Formula | Type |
|---|---|---|
| 1 | A | 1D horizontal |
| 2 | B | 1D vertical |
| 3 | C | 1D diagonal |
| 4 | A + B − C | 2D |
| 5 | A + (B − C)/2 | 2D |
| 6 | B + (A − C)/2 | 2D |
| 7 | (A + B)/2 | 2D average |

Apple uses mode 7 (`Px = (A + B) / 2`) for ProRAW compression. This 2D predictor averages horizontal and vertical neighbors, achieving ~2:1 compression on the 12-bit Bayer data.
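
A toy round-trip of the mode-7 predictor in Python (edge handling is simplified; real LJPEG special-cases the first sample and first line differently):

```python
HALF = 1 << 11  # midpoint for 12-bit samples

def predict(img, r, c):
    """Mode-7 predictor with simplified edges: midpoint at origin, A or B along borders."""
    if r == 0 and c == 0:
        return HALF
    if r == 0:
        return img[r][c - 1]                          # left neighbor A
    if c == 0:
        return img[r - 1][c]                          # above neighbor B
    return (img[r][c - 1] + img[r - 1][c]) // 2       # (A + B) / 2

def encode(img):
    """Replace each sample with its prediction residual (what gets entropy-coded)."""
    return [[img[r][c] - predict(img, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]

def decode(res):
    """Rebuild samples in raster order; predictions use already-decoded neighbors."""
    img = [[0] * len(res[0]) for _ in res]
    for r in range(len(res)):
        for c in range(len(res[0])):
            img[r][c] = res[r][c] + predict(img, r, c)
    return img
```

Because the predictor only references the left and above neighbors, which the decoder has already reconstructed exactly, the round trip is lossless.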

Decoder Compatibility

The rawspeed library (used by darktable and ImageMagick 7's delegate) only implements predictors 1 and 6. Attempting to decode iPhone ProRAW produces:

LJpegDecompressor.cpp:91 decodeScan(): Unsupported predictor mode: 7

See rawspeed#258 for status; PR#334, which adds modes 2-7, exists but remains unmerged.

| Decoder | Predictor 7 | Notes |
|---|---|---|
| LibRaw (dcraw_emu) | ✓ | Full LJ92 implementation |
| ImageMagick 6 (ufraw delegate) | ✓ | Uses dcraw internally |
| ImageMagick 7 (darktable delegate) | ✗ | rawspeed limitation |
| darktable-cli | ✗ | rawspeed limitation |
| RawTherapee | ✓ | LibRaw-based decoder |

Dynamic Range Analysis

16-bit linear output from a 4032×3024 ProRAW file (iPhone 12 Pro Max):

| Pipeline | Max pixel value | % of 16-bit range |
|---|---|---|
| LibRaw `-6 -T` | 65,535 | 100% |
| ImageMagick 6 `-depth 16` | 65,535 | 100% |
| RawTherapee (default profile) | 34,322 | 52% |

RawTherapee applies a tone curve and gamut mapping by default, clipping ~48% of highlight data. For linear output, use -p neutral.pp3 with a flat profile or LibRaw directly.

Conversion Pipeline

DNG to HEIC via libheif (heif-enc):

# LibRaw: 16-bit linear TIFF → PNG → HEIC
dcraw_emu -T -6 -W input.DNG          # -6: 16-bit, -T: TIFF, -W: no auto-bright
convert input.tiff -depth 16 temp.png
heif-enc -L temp.png -o output.heic   # -L: lossless

# ImageMagick 6: direct DNG → PNG → HEIC
convert input.DNG -depth 16 temp.png
heif-enc -q 95 temp.png -o output.heic

# Metadata (not preserved by conversion)
exiftool -TagsFromFile input.DNG -all:all output.heic

Size Comparison

Source: IMG_1854.DNG (28.9 MB, 4032×3024, iPhone 12 Pro Max)

| Output | Pipeline | Size | vs DNG |
|---|---|---|---|
| 01_im6_lossless_FULL_DR.heic | IM6 → heif-enc -L | 8.9 MB | 31% |
| 07_libraw_lossless_FULL_DR.heic | dcraw_emu → heif-enc -L | 9.2 MB | 32% |
| 08_libraw_q95_FULL_DR.heic | dcraw_emu → heif-enc -q95 | 5.2 MB | 18% |
| 03_libraw_q85.heic | dcraw_emu → heif-enc -q85 | 5.5 MB | 19% |
| 04_im6_q85.heic | IM6 → heif-enc -q85 | 4.6 MB | 16% |
| 02_rt_lossless_CLIPPED.heic | RawTherapee → heif-enc -L | 6.5 MB | 23% |

LibRaw produces slightly larger lossless output than IM6 due to differences in intermediate PNG encoding (gamma handling). Both preserve full dynamic range. RawTherapee files are smaller because the tone curve discards highlight data.

References

Tools

Dolby Vision:

  • dovi_tool - Dolby Vision RPU extraction and injection
  • dlb_mp4base - Dolby's official MP4 muxer for DV container boxes
  • DoViMuxer - Automated DV muxing wrapper

RAW/DNG Processing:

  • LibRaw - RAW image decoder library with iPhone ProRAW support
  • libheif - HEIF/HEIC encoder/decoder (heif-enc)
  • ImageMagick - Image conversion (v6 uses ufraw delegate for DNG)
  • rawspeed - darktable's RAW decoder (lacks ProRAW support)

Documentation

Dolby Vision:

DNG/ProRAW:

Community Discussions

License

MIT
