iOS Media Toolkit

A command-line tool for processing iPhone photos and videos with Dolby Vision preservation, multiple encoding pipelines, and smart sync.

Features

  • Album-based processing: Process entire folders with a single command
  • Dolby Vision preservation: Re-encode videos while keeping DV metadata intact
  • Multiple encoding pipelines: x265 (CPU) and NVENC (GPU) with configurable quality presets
  • Smart sync: SHA256 checksum comparison to skip identical files
  • Favorites detection: Reads XMP sidecar metadata to identify favorites
  • Live Photo grouping: Automatically pairs photo + video components
  • iPhone compatibility verification: Validates codec tags, HDR metadata, and container boxes
  • Metadata preservation: GPS location, creation date, device info
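
The smart-sync check is conceptually simple; here is a minimal sketch in Python (the helper names are illustrative, not the toolkit's actual API):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large videos never load fully into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_identical(src: Path, dst: Path) -> bool:
    """Skip re-processing when the destination already matches the source byte-for-byte."""
    return dst.exists() and sha256_of(src) == sha256_of(dst)
```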

Terminology

Before diving in, here are key terms used throughout this project:

| Term | Description |
|------|-------------|
| HDR | High Dynamic Range: wider brightness range than standard (SDR) video |
| Dolby Vision (DV) | Premium HDR format with frame-by-frame dynamic metadata |
| HLG | Hybrid Log-Gamma: HDR standard used by iPhone as the DV base layer |
| RPU | Reference Processing Unit: Dolby Vision's dynamic metadata |
| HEVC/H.265 | Video codec used by iPhone for 4K HDR recording |
| x265 | Software (CPU) encoder for HEVC; slow but high quality |
| NVENC | NVIDIA hardware (GPU) encoder; fast but requires an NVIDIA GPU |
| CRF | Constant Rate Factor: quality-based encoding (lower = better quality, larger file) |
| VBR | Variable Bit Rate: target-bitrate encoding with quality fluctuation |
| Codec tag | Container metadata identifying the video format (hvc1 required for iPhone) |

Background: The Dolby Vision Re-encoding Problem

Motivation

iPhone 12 and later record 4K video in Dolby Vision Profile 8.4, a dual-layer HDR format using HLG as the base layer with dynamic metadata overlay. At bitrates around 70-80 Mbps, this results in large files: a typical 30-second clip produces 200+ MB. For archival and storage optimization, re-encoding with x265 can achieve 80-90% compression with minimal perceptual quality loss.

However, standard re-encoding pipelines strip the Dolby Vision metadata. The resulting files play back as standard HDR (HLG), losing the dynamic tone mapping capabilities that distinguish Dolby Vision from static HDR formats. This is a known limitation - while FFmpeg has made progress on DV support, it still cannot generate the required container signaling when re-encoding.

Major platforms face similar challenges. Meta's engineering team documented their work on Instagram's HDR pipeline, noting the complexity of preserving DV metadata through transcoding workflows. Apple's developer documentation on Dolby Vision describes the format architecture but offers no re-encoding solution.

Understanding the Two-Layer Metadata Problem

Initial testing with ffmpeg revealed that re-encoded files displayed "HDR" rather than "Dolby Vision" on iOS devices. Analysis of the container structure showed two distinct metadata components that must both be preserved - a challenge discussed extensively in the dovi_tool community:

Stream-level metadata (RPU): Dolby Vision embeds Reference Processing Unit data as supplemental enhancement information (SEI) NAL units within the HEVC bitstream. These contain per-frame tone mapping parameters, color volume transforms, and content-adaptive metadata. When the video is re-encoded, these NAL units are discarded since they reference the original frame data.

Container-level signaling: MP4 files use dvcC and dvvC configuration boxes to signal Dolby Vision presence to decoders. ffmpeg can copy these boxes when remuxing (-c:v copy), but cannot generate them when re-encoding because the stream-level metadata no longer exists.
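
One way to see this distinction is to look for the configuration boxes directly. The sketch below walks an MP4 box tree in pure Python and reports whether a dvcC/dvvC box is present anywhere. It is a simplified illustration (no 64-bit box sizes, only a handful of container types), not the toolkit's verification code:

```python
import struct

# Boxes that contain child boxes, mapped to the number of bytes to skip past
# the 8-byte (size, type) header before children begin. Simplified: real files
# have more container types, 64-bit sizes, uuid boxes, etc.
CONTAINERS = {
    b"moov": 0, b"trak": 0, b"mdia": 0, b"minf": 0, b"stbl": 0,
    b"stsd": 8,                              # version/flags + entry count
    b"hvc1": 78, b"dvh1": 78, b"hev1": 78,   # visual sample entry fields
}

def find_boxes(data, wanted, offset=0, end=None):
    """Recursively walk MP4 boxes in `data`, returning which `wanted` types appear."""
    found = set()
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, = struct.unpack_from(">I", data, offset)
        box_type = data[offset + 4 : offset + 8]
        if size < 8:                         # malformed or 64-bit size: stop this level
            break
        if box_type in wanted:
            found.add(box_type)
        skip = CONTAINERS.get(box_type)
        if skip is not None:
            found |= find_boxes(data, wanted, offset + 8 + skip, offset + size)
        offset += size
    return found

def has_dv_signaling(mp4_bytes):
    """True if a dvcC or dvvC configuration box exists in the container."""
    return bool(find_boxes(mp4_bytes, {b"dvcC", b"dvvC"}))
```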

Solution Architecture

The preservation workflow requires three tools working in sequence:

  1. dovi_tool (quietvoid/dovi_tool): Extracts RPU data from the original HEVC stream before re-encoding, then injects it into the re-encoded stream. This works because RPU metadata describes relative adjustments rather than absolute values—the tone mapping parameters remain valid for re-encoded content with similar characteristics.

  2. mp4muxer (DolbyLaboratories/dlb_mp4base): Dolby's reference MP4 muxer that generates proper dvcC/dvvC container boxes by parsing the RPU data in the input HEVC stream.

  3. ffmpeg: Handles the actual video re-encoding (x265 or NVENC) and final audio/metadata multiplexing.

Implementation Details

The complete pipeline executes six operations:

  1. Extract raw HEVC bitstream from source MOV container
  2. Extract RPU binary data using dovi_tool
  3. Re-encode video with x265/NVENC (compression occurs here)
  4. Inject original RPU into re-encoded bitstream
  5. Mux with mp4muxer to generate DV container boxes
  6. Combine with audio track and copy metadata
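
The six operations map to tool invocations roughly as follows. This sketch builds the commands as argv lists; the flags are assumptions based on each tool's documented CLI (e.g. `dovi_tool extract-rpu`/`inject-rpu`, `mp4muxer --dv-profile`), not the toolkit's verbatim command lines:

```python
from pathlib import Path

def dv_pipeline_commands(src: Path, out: Path, workdir: Path, crf: int = 25):
    """Return the six pipeline commands as argv lists, in execution order."""
    bl = workdir / "bl.hevc"          # raw HEVC base layer
    rpu = workdir / "rpu.bin"         # extracted Dolby Vision RPU
    enc = workdir / "encoded.hevc"    # re-encoded stream, no RPU yet
    inj = workdir / "injected.hevc"   # re-encoded stream with RPU restored
    dv = workdir / "dv.mp4"           # muxed with dvcC/dvvC boxes
    return [
        # 1. Extract the raw HEVC bitstream (Annex B) from the MOV container
        ["ffmpeg", "-i", str(src), "-c:v", "copy", "-bsf:v", "hevc_mp4toannexb",
         "-f", "hevc", str(bl)],
        # 2. Pull the per-frame RPU metadata out of the stream
        ["dovi_tool", "extract-rpu", str(bl), "-o", str(rpu)],
        # 3. Re-encode the video (the only lossy step); x265 shown here
        ["ffmpeg", "-i", str(src), "-c:v", "libx265", "-crf", str(crf),
         "-f", "hevc", str(enc)],
        # 4. Inject the original RPU into the re-encoded bitstream
        ["dovi_tool", "inject-rpu", str(enc), "--rpu-in", str(rpu), "-o", str(inj)],
        # 5. Mux with Dolby's reference muxer to generate dvcC/dvvC boxes
        ["mp4muxer", "-i", str(inj), "-o", str(dv), "--dv-profile", "8",
         "--hvc1flag", "0"],
        # 6. Add the original audio track and copy metadata, forcing the hvc1 tag
        ["ffmpeg", "-i", str(dv), "-i", str(src), "-map", "0:v", "-map", "1:a",
         "-c", "copy", "-tag:v", "hvc1", str(out)],
    ]
```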

Testing confirmed 85-91% size reduction while preserving the Dolby Vision signaling chain. Output files display "Dolby Vision" on iOS and trigger the appropriate HDR processing path on compatible displays.

Notable Findings

Codec tag compatibility: HEVC streams can use either hvc1 or hev1 codec tags in MP4 containers. The difference lies in parameter set storage: hvc1 stores them in the sample entry (out-of-band), while hev1 stores them inline. iOS requires hvc1 for playback - a common pitfall when transcoding. The mp4muxer --hvc1flag 0 parameter and ffmpeg -tag:v hvc1 flag ensure correct tagging.
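
To confirm the tag on a finished file, ffprobe's JSON output exposes it as `codec_tag_string`; the helper below is a sketch around that field (the wrapper function and its behavior are assumptions, not the toolkit's verify implementation):

```python
import json
import subprocess

IOS_SAFE_TAGS = {"hvc1", "dvh1"}  # hev1 will not play on iPhone

def codec_tag(ffprobe_json: str) -> str:
    """Return the codec tag of the first video stream from ffprobe JSON output."""
    streams = json.loads(ffprobe_json)["streams"]
    video = next(s for s in streams if s.get("codec_type") == "video")
    return video["codec_tag_string"]

def check_file(path: str) -> bool:
    """Run ffprobe and verify the tag is iPhone-compatible."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return codec_tag(out) in IOS_SAFE_TAGS
```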

RPU validity after re-encoding: The RPU metadata remains functionally valid after re-encoding because it describes relative tone mapping adjustments. The re-encoded stream maintains similar scene brightness and color characteristics, allowing the original dynamic metadata to apply correctly. This is the key insight that makes the preservation workflow possible.

NVENC compatibility: The workflow supports both CPU (x265) and GPU (NVENC) encoding. NVENC provides 10-20x faster encoding with acceptable quality for most use cases.

Requirements

  • Python 3.14+
  • ffmpeg (with NVENC support for GPU encoding)
  • exiftool

For Dolby Vision preservation (all pipelines use these when the source has DV):

  • dovi_tool
  • mp4muxer

Installation

git clone https://github.com/jgorostegui/ios-media-toolkit.git
cd ios-media-toolkit
uv sync

Installing DV Tools

# dovi_tool (pre-built binary)
wget https://github.com/quietvoid/dovi_tool/releases/download/2.3.1/dovi_tool-2.3.1-x86_64-unknown-linux-musl.tar.gz
tar -xzf dovi_tool-*.tar.gz && sudo mv dovi_tool /usr/local/bin/

# mp4muxer (build from source)
git clone https://github.com/DolbyLaboratories/dlb_mp4base.git
cd dlb_mp4base/make/mp4muxer/linux_amd64 && make
sudo cp mp4muxer_release /usr/local/bin/mp4muxer

See docs/dolby-vision-tools.md for detailed instructions.

Docker

The Docker image includes all dependencies (ffmpeg, dovi_tool, mp4muxer, exiftool) pre-configured.

Quick Start

# Pull from Docker Hub
docker pull jgorostegui/ios-media-toolkit

# Or build locally
docker build -t imt .

CPU Encoding (x265)

docker run --rm \
  -v /path/to/media:/media \
  jgorostegui/ios-media-toolkit \
  transcode /media/input.MOV -o /media/output --profile balanced

GPU Encoding (NVENC)

Requires NVIDIA GPU with nvidia-container-toolkit installed on the host.

docker run --rm --gpus all \
  -v /path/to/media:/media \
  jgorostegui/ios-media-toolkit \
  transcode /media/input.MOV -o /media/output --profile nvenc_4k

Process Albums

docker run --rm --gpus all \
  -v /mnt/nas_photos:/media \
  jgorostegui/ios-media-toolkit \
  process /media/2025_Thailand/iPhone -o /media/output --profile nvenc_4k

Verify Output

docker run --rm \
  -v /path/to/media:/media \
  jgorostegui/ios-media-toolkit \
  verify /media/output.mp4 -r /media/original.MOV

Check Dependencies

docker run --rm jgorostegui/ios-media-toolkit check

Usage

Process Albums

imt process <album-name>            # Process a single album
imt process --all                   # Process all albums
imt process <album-name> --dry-run  # Preview without changes

Compare Encoding Pipelines

imt compare video.MOV                          # Run all pipelines
imt compare video.MOV -p nvenc_4k -p balanced  # Run specific pipelines
imt list-pipelines                             # List available pipelines

Verify Output Files

imt verify output.mp4                  # Check iPhone compatibility
imt verify output.mp4 -r original.MOV  # Compare against original

Other Commands

imt albums              # List available albums
imt status <album>      # Show processing status
imt favorites <album>   # List favorite files
imt check               # Check system dependencies
imt transcode video.MOV --run  # Transcode single file

Encoding Pipelines

All pipelines preserve Dolby Vision metadata when the source has DV.

| Pipeline | Encoder | Resolution | Quality | Speed | Use Case |
|---|---|---|---|---|---|
| archival | x265 | 4K | CRF 20 | 0.02x | Maximum quality archival |
| balanced | x265 | 4K | CRF 25 | 0.24x | Good quality, reasonable time |
| nvenc_4k | NVENC | 4K | VBR 15M | 0.42x | Fast GPU encoding |
| nvenc_1080p | NVENC | 1080p | VBR 8M | 0.86x | Fast + smallest files |
| compact | x265 | 4K | CRF 28 | ~0.2x | Maximum compression |
| preview | NVENC | 1080p | VBR 4M | ~1x | Quick preview |

Speed is relative to video duration (1x = realtime, 0.5x = 2x slower than realtime).

Choosing a Pipeline

  • Archival: Use when storage isn't a concern and you want best quality
  • Balanced: Good default for most use cases
  • NVENC 4K: When you have an NVIDIA GPU and want fast encoding
  • NVENC 1080p: Maximum compression with acceptable quality for sharing
  • Compact: When storage is critical and you accept quality loss
  • Preview: Quick check before committing to a slow encode

Why Dolby Vision Matters

The Problem

iPhone records video in Dolby Vision Profile 8, which provides:

  • Dynamic HDR: Frame-by-frame brightness/contrast optimization
  • Better highlights: Preserves detail in bright areas (sun, lights)
  • Better shadows: Maintains detail in dark areas
  • Scene adaptation: Adjusts tone mapping per-scene

When you re-encode with standard tools like ffmpeg:

ffmpeg -i input.MOV -c:v libx265 -crf 25 output.mp4  # DV is LOST!

The output plays as basic "HDR" - you lose the dynamic metadata.

The Solution

This tool implements a 6-step workflow that preserves Dolby Vision:

```mermaid
flowchart TD
    A[iPhone Video<br/>DV Profile 8] --> B[Extract HEVC stream]
    B --> C[Extract RPU<br/>dovi_tool]
    B --> D[Re-encode<br/>x265 or NVENC]
    C --> E[Inject RPU<br/>dovi_tool]
    D --> E
    E --> F[Create DV container<br/>mp4muxer]
    F --> G[Add audio/metadata<br/>ffmpeg]
    G --> H[Output with DV preserved]
```

Two Levels of Dolby Vision

DV requires metadata at two levels, which is why standard tools fail:

| Level | What | Tool | Without It |
|---|---|---|---|
| Stream | RPU NAL units in HEVC bitstream | dovi_tool | No DV data in video |
| Container | dvcC/dvvC boxes in MP4 | mp4muxer | iPhone shows "HDR", not "Dolby Vision" |

ffmpeg can copy existing container boxes but cannot generate them when re-encoding.

Configuration

Edit config/global.yaml:

paths:
  source_base: "/path/to/input/albums"
  output_base: "/path/to/output"

tools:
  dovi_tool: "/usr/local/bin/dovi_tool"
  mp4muxer: "/usr/local/bin/mp4muxer"

transcode:
  enabled: true
  default_pipeline: "balanced"

Verification

The imt verify command checks output files:

imt verify output.mp4 -r original.MOV

Checks performed:

  • Codec tag: Must be hvc1 (not hev1) for iPhone
  • DV container boxes: dvcC/dvvC for "Dolby Vision" badge
  • HDR metadata: BT.2020 primaries, HLG/PQ transfer
  • Metadata: GPS, creation date, device info

iPhone Compatibility

| Codec Tag | Status |
|---|---|
| hvc1 | ✓ Compatible |
| dvh1 | ✓ Compatible (DV-specific) |
| hev1 | ✗ Won't play on iPhone |

Test Results

Tested on 12.4 MB iPhone video (4K, DV Profile 8, ~4 seconds):

| Pipeline | Output Size | Compression | Time | DV Preserved |
|---|---|---|---|---|
| archival | 17.2 MB | -38%* | 251s | ✓ |
| balanced | 10.1 MB | 19% | 16s | ✓ |
| nvenc_4k | 8.5 MB | 31% | 9s | ✓ |
| nvenc_1080p | 4.7 MB | 62% | 4.5s | ✓ |

*Archival at CRF 20 can produce larger files than the source if the source was heavily compressed.

Complete Processing Pipeline

Recommended Settings by Device

| Media Type | Device | Profile | Command |
|---|---|---|---|
| Video | Any iPhone | nvenc_4k | `imt transcode video.MOV --profile nvenc_4k` |
| ProRAW | iPhone 17+ | balanced | `imt dng compress photo.DNG --profile balanced` |
| ProRAW | iPhone 12-16 | jpeg | `imt dng compress photo.DNG --profile jpeg` |

Batch Processing Example

# Process all videos in album with GPU encoding
imt process /media/2025_Trip/iPhone --profile nvenc_4k

# Process all DNGs in a folder
for f in *.DNG; do
  imt dng compress "$f" --profile balanced
done

Expected Results

| Input | Output | Reduction |
|---|---|---|
| 4K Dolby Vision video (70 Mbps) | HEVC 15 Mbps | ~80% |
| iPhone 17 ProRAW (50 MB JXL) | DNG balanced | ~75% |
| iPhone 12 ProRAW (32 MB LJPEG) | JPEG | ~88% |

ProRAW DNG Processing

New DNG Commands

imt dng info photo.DNG              # Show DNG type (JXL/LJPEG) and metadata
imt dng compress photo.DNG          # Compress with default profile (balanced)
imt dng compress photo.DNG -p jpeg  # Extract Apple JPEG
imt dng list-profiles               # List available profiles

DNG Compression Profiles

| Profile | Method | For | Size Reduction | RAW Editing |
|---|---|---|---|---|
| balanced | JXL recompress | iPhone 17+ JXL | ~75-85% | ✓ Preserved |
| lossless | JXL lossless | iPhone 17+ JXL | ~5-12% | ✓ Preserved |
| jpeg | Apple Preview | Any iPhone | ~85-90% | ✗ Lost |
| jpeg_max | Apple Preview (q100) | Any iPhone | ~80-85% | ✗ Lost |

Recommended Processing Pipeline

For iPhone 17 Pro Max (JXL DNGs):

# Best: JXL recompress - keeps RAW editing, 75-85% smaller
imt dng compress photo.DNG --profile balanced

# Alternative: Apple JPEG if RAW not needed
imt dng compress photo.DNG --profile jpeg

For iPhone 12-16 Pro Max (LJPEG DNGs):

# Only option: Apple JPEG extraction (JXL recompress not supported)
imt dng compress photo.DNG --profile jpeg

Metadata Preservation

DNG outputs (balanced/lossless):

  • ✓ ColorMatrix1/2 (RAW color processing)
  • ✓ BaselineExposure (exposure compensation)
  • ✓ WhiteLevel/BlackLevel (sensor calibration)
  • ✓ GPS coordinates
  • ✓ Date/time, camera info

JPEG outputs:

  • ✓ GPS coordinates
  • ✓ Date/time, camera info
  • ✓ Display P3 color profile
  • ✗ RAW-specific metadata (not applicable)

Why JXL Recompress Only Works on iPhone 17+

iPhone 17 uses JPEG XL compression inside DNG. We can:

  1. Decode JXL tiles with djxl (lossless decode)
  2. Re-encode with lossy JXL at d=1.0 (visually lossless)
  3. Replace tiles in DNG container
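
Steps 1-2 can be sketched with libjxl's CLI tools (step 3, rewriting the tiles inside the DNG container, is format surgery internal to this toolkit and omitted here). The command shapes are assumptions based on standard `djxl`/`cjxl` usage:

```python
def jxl_recompress_commands(tile: str, workdir: str = "/tmp"):
    """Build the per-tile decode and re-encode commands as argv lists."""
    png = f"{workdir}/tile.png"
    out = f"{workdir}/tile_recompressed.jxl"
    return [
        ["djxl", tile, png],              # 1. lossless decode of the JXL tile to PNG
        ["cjxl", png, out, "-d", "1.0"],  # 2. lossy re-encode at distance 1.0 (visually lossless)
    ]
```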

iPhone 12-16 uses LJPEG compression. The only decoder (dcraw) applies color matrix during decode, making the pixel values incompatible with the DNG metadata. Result: wrong colors when viewed.

Solution for LJPEG: Extract Apple's embedded preview JPEG, which has correct HDR tone mapping applied at capture time.


Background: iPhone ProRAW (DNG) Technical Details

DNG Format and LJPEG Compression

iPhone 12 Pro and later capture photos in Apple ProRAW, using Adobe's DNG 1.6 specification. The raw Bayer data is compressed using Lossless JPEG (ITU-T.81 Annex H), specifically with predictor mode 7.

LJPEG uses Differential Pulse-Code Modulation (DPCM) with seven predictor modes. Given neighboring pixels A (left), B (above), and C (diagonal):

| Mode | Formula | Type |
|---|---|---|
| 1 | A | 1D horizontal |
| 2 | B | 1D vertical |
| 3 | C | 1D diagonal |
| 4 | A + B − C | 2D |
| 5 | A + (B − C)/2 | 2D |
| 6 | B + (A − C)/2 | 2D |
| 7 | (A + B)/2 | 2D average |

Apple uses mode 7 (`Px = (A + B) / 2`) for ProRAW compression. This 2D predictor averages horizontal and vertical neighbors, achieving ~2:1 compression on the 12-bit Bayer data.
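
A toy round-trip of the mode-7 predictor in Python (edge handling is simplified; real LJPEG special-cases the first sample and first line differently):

```python
HALF = 1 << 11  # midpoint for 12-bit samples

def predict(img, r, c):
    """Mode-7 predictor with simplified edges: midpoint at origin, A or B along borders."""
    if r == 0 and c == 0:
        return HALF
    if r == 0:
        return img[r][c - 1]                          # left neighbor A
    if c == 0:
        return img[r - 1][c]                          # above neighbor B
    return (img[r][c - 1] + img[r - 1][c]) // 2       # (A + B) / 2

def encode(img):
    """Replace each sample with its prediction residual (what gets entropy-coded)."""
    return [[img[r][c] - predict(img, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]

def decode(res):
    """Rebuild samples in raster order; predictions use already-decoded neighbors."""
    img = [[0] * len(res[0]) for _ in res]
    for r in range(len(res)):
        for c in range(len(res[0])):
            img[r][c] = res[r][c] + predict(img, r, c)
    return img
```

Because the predictor only references the left and above neighbors, which the decoder has already reconstructed exactly, the round trip is lossless.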

Decoder Compatibility

The rawspeed library (used by darktable and ImageMagick 7's delegate) only implements predictors 1 and 6. Attempting to decode iPhone ProRAW produces:

LJpegDecompressor.cpp:91 decodeScan(): Unsupported predictor mode: 7

See rawspeed#258 for status; PR#334, which adds modes 2-7, exists but remains unmerged.

| Decoder | Predictor 7 | Notes |
|---|---|---|
| LibRaw (dcraw_emu) | ✓ | Full LJ92 implementation |
| ImageMagick 6 (ufraw delegate) | ✓ | Uses dcraw internally |
| ImageMagick 7 (darktable delegate) | ✗ | rawspeed limitation |
| darktable-cli | ✗ | rawspeed limitation |
| RawTherapee | ✓ | LibRaw-based decoder |

Dynamic Range Analysis

16-bit linear output from a 4032×3024 ProRAW file (iPhone 12 Pro Max):

| Pipeline | Max pixel value | % of 16-bit range |
|---|---|---|
| LibRaw `-6 -T` | 65,535 | 100% |
| ImageMagick 6 `-depth 16` | 65,535 | 100% |
| RawTherapee (default profile) | 34,322 | 52% |

RawTherapee applies a tone curve and gamut mapping by default, clipping ~48% of highlight data. For linear output, use -p neutral.pp3 with a flat profile or LibRaw directly.

Conversion Pipeline

DNG to HEIC via libheif (heif-enc):

# LibRaw: 16-bit linear TIFF → PNG → HEIC
dcraw_emu -T -6 -W input.DNG          # -6: 16-bit, -T: TIFF, -W: no auto-bright
convert input.tiff -depth 16 temp.png
heif-enc -L temp.png -o output.heic   # -L: lossless

# ImageMagick 6: direct DNG → PNG → HEIC
convert input.DNG -depth 16 temp.png
heif-enc -q 95 temp.png -o output.heic

# Metadata (not preserved by conversion)
exiftool -TagsFromFile input.DNG -all:all output.heic

Size Comparison

Source: IMG_1854.DNG (28.9 MB, 4032×3024, iPhone 12 Pro Max)

| Output | Pipeline | Size | vs DNG |
|---|---|---|---|
| 01_im6_lossless_FULL_DR.heic | IM6 → heif-enc -L | 8.9 MB | 31% |
| 07_libraw_lossless_FULL_DR.heic | dcraw_emu → heif-enc -L | 9.2 MB | 32% |
| 08_libraw_q95_FULL_DR.heic | dcraw_emu → heif-enc -q95 | 5.2 MB | 18% |
| 03_libraw_q85.heic | dcraw_emu → heif-enc -q85 | 5.5 MB | 19% |
| 04_im6_q85.heic | IM6 → heif-enc -q85 | 4.6 MB | 16% |
| 02_rt_lossless_CLIPPED.heic | RawTherapee → heif-enc -L | 6.5 MB | 23% |

LibRaw produces slightly larger lossless output than IM6 due to differences in intermediate PNG encoding (gamma handling). Both preserve full dynamic range. RawTherapee files are smaller because the tone curve discards highlight data.

References

Tools

Dolby Vision:

  • dovi_tool - Dolby Vision RPU extraction and injection
  • dlb_mp4base - Dolby's official MP4 muxer for DV container boxes
  • DoViMuxer - Automated DV muxing wrapper

RAW/DNG Processing:

  • LibRaw - RAW image decoder library with iPhone ProRAW support
  • libheif - HEIF/HEIC encoder/decoder (heif-enc)
  • ImageMagick - Image conversion (v6 uses ufraw delegate for DNG)
  • rawspeed - darktable's RAW decoder (lacks ProRAW support)

Documentation

Dolby Vision:

DNG/ProRAW:

Community Discussions

License

MIT
