A command-line tool for processing iPhone photos and videos with Dolby Vision preservation, multiple encoding pipelines, and smart sync.
- Album-based processing: Process entire folders with a single command
- Dolby Vision preservation: Re-encode videos while keeping DV metadata intact
- Multiple encoding pipelines: x265 (CPU) and NVENC (GPU) with configurable quality presets
- Smart sync: SHA256 checksum comparison to skip identical files
- Favorites detection: Reads XMP sidecar metadata to identify favorites
- Live Photo grouping: Automatically pairs photo + video components
- iPhone compatibility verification: Validates codec tags, HDR metadata, and container boxes
- Metadata preservation: GPS location, creation date, device info
Before diving in, here are key terms used throughout this project:
| Term | Description |
|---|---|
| HDR | High Dynamic Range - wider brightness range than standard video (SDR) |
| Dolby Vision (DV) | Premium HDR format with frame-by-frame dynamic metadata |
| HLG | Hybrid Log-Gamma - HDR standard used by iPhone as DV base layer |
| RPU | Reference Processing Unit - Dolby Vision's dynamic metadata |
| HEVC/H.265 | Video codec used by iPhone for 4K HDR recording |
| x265 | Software (CPU) encoder for HEVC - slow but high quality |
| NVENC | NVIDIA hardware (GPU) encoder - fast but requires NVIDIA GPU |
| CRF | Constant Rate Factor - quality-based encoding (lower = better quality, larger file) |
| VBR | Variable Bit Rate - target bitrate encoding with quality fluctuation |
| Codec tag | Container metadata identifying video format (hvc1 required for iPhone) |
iPhone 12 and later record 4K video in Dolby Vision Profile 8.4, a dual-layer HDR format using HLG as the base layer with dynamic metadata overlay. At bitrates around 70-80 Mbps, this results in large files: a typical 30-second clip produces 200+ MB. For archival and storage optimization, re-encoding with x265 can achieve 80-90% compression with minimal perceptual quality loss.
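A quick back-of-the-envelope check on those figures (the 75 Mbps midpoint and the 10 Mbps re-encode target are assumptions, not measured values):

```python
# iPhone 12+ records DV Profile 8.4 at roughly 70-80 Mbps; take the midpoint.
bitrate_mbps = 75
clip_seconds = 30

# megabits/s × seconds ÷ 8 → megabytes
size_mb = bitrate_mbps * clip_seconds / 8
print(round(size_mb))  # ~281 MB, consistent with the "200+ MB" figure

# An x265 target around 10 Mbps (an assumed setting) lands in the quoted band:
reduction = 1 - 10 / bitrate_mbps
print(f"{reduction:.0%}")  # ~87%, inside the 80-90% range
```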
However, standard re-encoding pipelines strip the Dolby Vision metadata. The resulting files play back as standard HDR (HLG), losing the dynamic tone mapping capabilities that distinguish Dolby Vision from static HDR formats. This is a known limitation - while FFmpeg has made progress on DV support, it still cannot generate the required container signaling when re-encoding.
Major platforms face similar challenges. Meta's engineering team documented their work on Instagram's HDR pipeline, noting the complexity of preserving DV metadata through transcoding workflows. Apple's developer documentation on Dolby Vision describes the format architecture but offers no re-encoding solution.
Initial testing with ffmpeg revealed that re-encoded files displayed "HDR" rather than "Dolby Vision" on iOS devices. Analysis of the container structure showed two distinct metadata components that must both be preserved - a challenge discussed extensively in the dovi_tool community:
Stream-level metadata (RPU): Dolby Vision embeds Reference Processing Unit data as supplemental enhancement information (SEI) NAL units within the HEVC bitstream. These contain per-frame tone mapping parameters, color volume transforms, and content-adaptive metadata. When the video is re-encoded, these NAL units are discarded since they reference the original frame data.
Container-level signaling: MP4 files use dvcC and dvvC configuration boxes to signal Dolby Vision presence to decoders. ffmpeg can copy these boxes when remuxing (-c:v copy), but cannot generate them when re-encoding because the stream-level metadata no longer exists.
The preservation workflow requires three tools working in sequence:
- dovi_tool (quietvoid/dovi_tool): Extracts RPU data from the original HEVC stream before re-encoding, then injects it into the re-encoded stream. This works because RPU metadata describes relative adjustments rather than absolute values—the tone mapping parameters remain valid for re-encoded content with similar characteristics.
- mp4muxer (DolbyLaboratories/dlb_mp4base): Dolby's reference MP4 muxer that generates proper `dvcC`/`dvvC` container boxes by parsing the RPU data in the input HEVC stream.
- ffmpeg: Handles the actual video re-encoding (x265 or NVENC) and final audio/metadata multiplexing.
The complete pipeline executes six operations:
- Extract raw HEVC bitstream from source MOV container
- Extract RPU binary data using dovi_tool
- Re-encode video with x265/NVENC (compression occurs here)
- Inject original RPU into re-encoded bitstream
- Mux with mp4muxer to generate DV container boxes
- Combine with audio track and copy metadata
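The six operations above can be sketched as tool invocations. The subcommands follow the tools' documented CLIs, but the exact flags per pipeline are assumptions here, and the helper only builds argv lists rather than executing anything:

```python
from pathlib import Path

def dv_pipeline_commands(src: Path, workdir: Path, out: Path) -> list[list[str]]:
    """Build the argv for each of the six steps; nothing is executed here."""
    raw = workdir / "source.hevc"
    rpu = workdir / "rpu.bin"
    enc = workdir / "encoded.hevc"
    inj = workdir / "injected.hevc"
    mux = workdir / "video.mp4"
    return [
        # 1. Extract the raw Annex B HEVC bitstream from the MOV container
        ["ffmpeg", "-i", str(src), "-c:v", "copy",
         "-bsf:v", "hevc_mp4toannexb", "-f", "hevc", str(raw)],
        # 2. Pull the Dolby Vision RPU out of the original stream
        ["dovi_tool", "extract-rpu", "-i", str(raw), "-o", str(rpu)],
        # 3. Re-encode (x265 shown; NVENC pipelines swap the encoder)
        ["ffmpeg", "-i", str(raw), "-c:v", "libx265", "-crf", "25", str(enc)],
        # 4. Inject the original RPU into the re-encoded bitstream
        ["dovi_tool", "inject-rpu", "-i", str(enc), "--rpu-in", str(rpu),
         "-o", str(inj)],
        # 5. Mux with Dolby's reference muxer to get dvcC/dvvC boxes
        ["mp4muxer", "-i", str(inj), "--dv-profile", "8", "--hvc1flag", "0",
         "-o", str(mux)],
        # 6. Marry the new video with the original audio and metadata
        ["ffmpeg", "-i", str(mux), "-i", str(src), "-map", "0:v", "-map", "1:a",
         "-c", "copy", "-tag:v", "hvc1", str(out)],
    ]
```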
Testing confirmed 85-91% size reduction while preserving the Dolby Vision signaling chain. Output files display "Dolby Vision" on iOS and trigger the appropriate HDR processing path on compatible displays.
Codec tag compatibility: HEVC streams can use either hvc1 or hev1 codec tags in MP4 containers. The difference lies in parameter set storage: hvc1 stores them in the sample entry (out-of-band), while hev1 stores them inline. iOS requires hvc1 for playback - a common pitfall when transcoding. The mp4muxer --hvc1flag 0 parameter and ffmpeg -tag:v hvc1 flag ensure correct tagging.
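The tag check can be scripted against ffprobe's JSON output; `tag_is_ios_compatible` and `probe_file` are hypothetical helpers for illustration, not part of imt:

```python
import json
import subprocess

IOS_COMPATIBLE_TAGS = {"hvc1", "dvh1"}   # hev1 will not play on iPhone

def tag_is_ios_compatible(probe: dict) -> bool:
    """Check the first video stream's codec tag from ffprobe's parsed JSON."""
    return probe["streams"][0]["codec_tag_string"] in IOS_COMPATIBLE_TAGS

def probe_file(path: str) -> dict:
    """Run ffprobe and return its parsed JSON (requires ffprobe on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_streams", "-select_streams", "v:0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```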
RPU validity after re-encoding: The RPU metadata remains functionally valid after re-encoding because it describes relative tone mapping adjustments. The re-encoded stream maintains similar scene brightness and color characteristics, allowing the original dynamic metadata to apply correctly. This is the key insight that makes the preservation workflow possible.
NVENC compatibility: The workflow supports both CPU (x265) and GPU (NVENC) encoding. NVENC provides 10-20x faster encoding with acceptable quality for most use cases.
- Python 3.14+
- ffmpeg (with NVENC support for GPU encoding)
- exiftool
For Dolby Vision preservation (all pipelines use these):
```bash
git clone https://github.com/jgorostegui/ios-media-toolkit.git
cd ios-media-toolkit
uv sync

# dovi_tool (pre-built binary)
wget https://github.com/quietvoid/dovi_tool/releases/download/2.3.1/dovi_tool-2.3.1-x86_64-unknown-linux-musl.tar.gz
tar -xzf dovi_tool-*.tar.gz && sudo mv dovi_tool /usr/local/bin/

# mp4muxer (build from source)
git clone https://github.com/DolbyLaboratories/dlb_mp4base.git
cd dlb_mp4base/make/mp4muxer/linux_amd64 && make
sudo cp mp4muxer_release /usr/local/bin/mp4muxer
```

See docs/dolby-vision-tools.md for detailed instructions.
The Docker image includes all dependencies (ffmpeg, dovi_tool, mp4muxer, exiftool) pre-configured.
```bash
# Pull from Docker Hub
docker pull jgorostegui/ios-media-toolkit

# Or build locally
docker build -t imt .
```

```bash
docker run --rm \
  -v /path/to/media:/media \
  jgorostegui/ios-media-toolkit \
  transcode /media/input.MOV -o /media/output --profile balanced
```

Requires NVIDIA GPU with nvidia-container-toolkit installed on the host.

```bash
docker run --rm --gpus all \
  -v /path/to/media:/media \
  jgorostegui/ios-media-toolkit \
  transcode /media/input.MOV -o /media/output --profile nvenc_4k
```

```bash
docker run --rm --gpus all \
  -v /mnt/nas_photos:/media \
  jgorostegui/ios-media-toolkit \
  process /media/2025_Thailand/iPhone -o /media/output --profile nvenc_4k
```

```bash
docker run --rm \
  -v /path/to/media:/media \
  jgorostegui/ios-media-toolkit \
  verify /media/output.mp4 -r /media/original.MOV
```

```bash
docker run --rm jgorostegui/ios-media-toolkit check
```

```bash
imt process <album-name>                       # Process a single album
imt process --all                              # Process all albums
imt process <album-name> --dry-run             # Preview without changes
```

```bash
imt compare video.MOV                          # Run all pipelines
imt compare video.MOV -p nvenc_4k -p balanced  # Run specific pipelines
imt list-pipelines                             # List available pipelines
```

```bash
imt verify output.mp4                          # Check iPhone compatibility
imt verify output.mp4 -r original.MOV          # Compare against original
```

```bash
imt albums                                     # List available albums
imt status <album>                             # Show processing status
imt favorites <album>                          # List favorite files
imt check                                      # Check system dependencies
imt transcode video.MOV --run                  # Transcode single file
```

All pipelines preserve Dolby Vision metadata when source has DV.
| Pipeline | Encoder | Resolution | Quality | Speed | Use Case |
|---|---|---|---|---|---|
| `archival` | x265 | 4K | CRF 20 | 0.02x | Maximum quality archival |
| `balanced` | x265 | 4K | CRF 25 | 0.24x | Good quality, reasonable time |
| `nvenc_4k` | NVENC | 4K | VBR 15M | 0.42x | Fast GPU encoding |
| `nvenc_1080p` | NVENC | 1080p | VBR 8M | 0.86x | Fast + smallest files |
| `compact` | x265 | 4K | CRF 28 | ~0.2x | Maximum compression |
| `preview` | NVENC | 1080p | VBR 4M | ~1x | Quick preview |
Speed is relative to video duration (1x = realtime, 0.5x = 2x slower than realtime).
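To turn a speed factor into a wall-clock estimate (hypothetical helper, not an imt command):

```python
def encode_minutes(clip_seconds: float, speed_factor: float) -> float:
    """Wall-clock estimate: a 0.02x pipeline runs 50x slower than realtime."""
    return clip_seconds / speed_factor / 60

# A one-minute clip through the two extremes:
print(round(encode_minutes(60, 0.02)))      # archival: ~50 minutes
print(round(encode_minutes(60, 0.86), 1))   # nvenc_1080p: ~1.2 minutes
```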
- Archival: Use when storage isn't a concern and you want best quality
- Balanced: Good default for most use cases
- NVENC 4K: When you have an NVIDIA GPU and want fast encoding
- NVENC 1080p: Maximum compression with acceptable quality for sharing
- Compact: When storage is critical and you accept quality loss
- Preview: Quick check before committing to a slow encode
iPhone records video in Dolby Vision Profile 8, which provides:
- Dynamic HDR: Frame-by-frame brightness/contrast optimization
- Better highlights: Preserves detail in bright areas (sun, lights)
- Better shadows: Maintains detail in dark areas
- Scene adaptation: Adjusts tone mapping per-scene
When you re-encode with standard tools like ffmpeg:
```bash
ffmpeg -i input.MOV -c:v libx265 -crf 25 output.mp4   # DV is LOST!
```

The output plays as basic "HDR" - you lose the dynamic metadata.
This tool implements a 6-step workflow that preserves Dolby Vision:
```mermaid
flowchart TD
    A[iPhone Video<br/>DV Profile 8] --> B[Extract HEVC stream]
    B --> C[Extract RPU<br/>dovi_tool]
    B --> D[Re-encode<br/>x265 or NVENC]
    C --> E[Inject RPU<br/>dovi_tool]
    D --> E
    E --> F[Create DV container<br/>mp4muxer]
    F --> G[Add audio/metadata<br/>ffmpeg]
    G --> H[Output with DV preserved]
```
DV requires metadata at two levels, which is why standard tools fail:
| Level | What | Tool | Without It |
|---|---|---|---|
| Stream | RPU NAL units in HEVC bitstream | dovi_tool | No DV data in video |
| Container | `dvcC`/`dvvC` boxes in MP4 | mp4muxer | iPhone shows "HDR" not "Dolby Vision" |
ffmpeg can copy existing container boxes but cannot generate them when re-encoding.
Edit config/global.yaml:
```yaml
paths:
  source_base: "/path/to/input/albums"
  output_base: "/path/to/output"
tools:
  dovi_tool: "/usr/local/bin/dovi_tool"
  mp4muxer: "/usr/local/bin/mp4muxer"
transcode:
  enabled: true
  default_pipeline: "balanced"
```

The imt verify command checks output files:

```bash
imt verify output.mp4 -r original.MOV
```

Checks performed:

- Codec tag: Must be `hvc1` (not `hev1`) for iPhone
- DV container boxes: `dvcC`/`dvvC` for "Dolby Vision" badge
- HDR metadata: BT.2020 primaries, HLG/PQ transfer
- Metadata: GPS, creation date, device info
| Codec Tag | Status |
|---|---|
| `hvc1` | ✓ Compatible |
| `dvh1` | ✓ Compatible (DV-specific) |
| `hev1` | ✗ Won't play on iPhone |
Tested on 12.4 MB iPhone video (4K, DV Profile 8, ~4 seconds):
| Pipeline | Output Size | Compression | Time | DV Preserved |
|---|---|---|---|---|
| archival | 17.2 MB | -38%* | 251s | ✓ |
| balanced | 10.1 MB | 19% | 16s | ✓ |
| nvenc_4k | 8.5 MB | 31% | 9s | ✓ |
| nvenc_1080p | 4.7 MB | 62% | 4.5s | ✓ |
*Archival at CRF 20 can produce larger files than source if source was heavily compressed.
| Media Type | Device | Profile | Command |
|---|---|---|---|
| Video | Any iPhone | `nvenc_4k` | `imt transcode video.MOV --profile nvenc_4k` |
| ProRAW | iPhone 17+ | `balanced` | `imt dng compress photo.DNG --profile balanced` |
| ProRAW | iPhone 12-16 | `jpeg` | `imt dng compress photo.DNG --profile jpeg` |
```bash
# Process all videos in album with GPU encoding
imt process /media/2025_Trip/iPhone --profile nvenc_4k

# Process all DNGs in a folder
for f in *.DNG; do
  imt dng compress "$f" --profile balanced
done
```

| Input | Output | Reduction |
|---|---|---|
| 4K Dolby Vision video (70Mbps) | HEVC 15Mbps | ~80% |
| iPhone 17 ProRAW (50MB JXL) | DNG balanced | ~75% |
| iPhone 12 ProRAW (32MB LJPEG) | JPEG | ~88% |
```bash
imt dng info photo.DNG               # Show DNG type (JXL/LJPEG) and metadata
imt dng compress photo.DNG           # Compress with default profile (balanced)
imt dng compress photo.DNG -p jpeg   # Extract Apple JPEG
imt dng list-profiles                # List available profiles
```

| Profile | Method | For | Size Reduction | RAW Editing |
|---|---|---|---|---|
| `balanced` | JXL recompress | iPhone 17+ JXL | ~75-85% | ✓ Preserved |
| `lossless` | JXL lossless | iPhone 17+ JXL | ~5-12% | ✓ Preserved |
| `jpeg` | Apple Preview | Any iPhone | ~85-90% | ✗ Lost |
| `jpeg_max` | Apple Preview (q100) | Any iPhone | ~80-85% | ✗ Lost |
For iPhone 17 Pro Max (JXL DNGs):
```bash
# Best: JXL recompress - keeps RAW editing, 75-85% smaller
imt dng compress photo.DNG --profile balanced

# Alternative: Apple JPEG if RAW not needed
imt dng compress photo.DNG --profile jpeg
```

For iPhone 12-16 Pro Max (LJPEG DNGs):

```bash
# Only option: Apple JPEG extraction (JXL recompress not supported)
imt dng compress photo.DNG --profile jpeg
```

DNG outputs (balanced/lossless):
- ✓ ColorMatrix1/2 (RAW color processing)
- ✓ BaselineExposure (exposure compensation)
- ✓ WhiteLevel/BlackLevel (sensor calibration)
- ✓ GPS coordinates
- ✓ Date/time, camera info
JPEG outputs:
- ✓ GPS coordinates
- ✓ Date/time, camera info
- ✓ Display P3 color profile
- ✗ RAW-specific metadata (not applicable)
iPhone 17 uses JPEG XL compression inside DNG. We can:
- Decode JXL tiles with `djxl` (lossless decode)
- Re-encode with lossy JXL at d=1.0 (visually lossless)
- Replace tiles in DNG container
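Sketched as per-tile commands (`djxl`/`cjxl` are libjxl's CLI tools; the helper name and file paths are illustrative, and the DNG tile extraction/replacement itself is not shown):

```python
def jxl_recompress_commands(tile_in: str, tile_png: str,
                            tile_out: str) -> list[list[str]]:
    """Per-tile argv for the decode/re-encode round trip (nothing executed).

    Pulling tiles out of the DNG and writing them back is the toolkit's own
    container logic and is omitted here.
    """
    return [
        ["djxl", tile_in, tile_png],                # lossless decode to PNG
        ["cjxl", tile_png, tile_out, "-d", "1.0"],  # visually lossless re-encode
    ]
```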
iPhone 12-16 uses LJPEG compression. The only decoder (dcraw) applies color matrix during decode, making the pixel values incompatible with the DNG metadata. Result: wrong colors when viewed.
Solution for LJPEG: Extract Apple's embedded preview JPEG, which has correct HDR tone mapping applied at capture time.
iPhone 12 Pro and later capture photos in Apple ProRAW, using Adobe's DNG 1.6 specification. The raw Bayer data is compressed using Lossless JPEG (ITU-T.81 Annex H), specifically with predictor mode 7.
LJPEG uses Differential Pulse-Code Modulation (DPCM) with seven predictor modes. Given neighboring pixels A (left), B (above), and C (diagonal):
| Mode | Formula | Type |
|---|---|---|
| 1 | A | 1D horizontal |
| 2 | B | 1D vertical |
| 3 | C | 1D diagonal |
| 4 | A + B − C | 2D |
| 5 | A + (B − C)/2 | 2D |
| 6 | B + (A − C)/2 | 2D |
| 7 | (A + B)/2 | 2D average |
Apple uses mode 7 Px = (A + B) / 2 for ProRAW compression. This 2D predictor averages horizontal and vertical neighbors, achieving ~2:1 compression on the 12-bit Bayer data.
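The seven predictors are simple to express in code; a sketch of the DPCM step using mode 7 (helper names are illustrative):

```python
def predict(mode: int, a: int, b: int, c: int) -> int:
    """LJPEG predictors per ITU-T.81 Annex H (a=left, b=above, c=diagonal).

    Floor division on non-negative samples matches the spec's truncation.
    """
    return {
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,
        6: b + (a - c) // 2,
        7: (a + b) // 2,          # Apple ProRAW uses this 2D average
    }[mode]

def dpcm_residual(pixel: int, a: int, b: int, c: int, mode: int = 7) -> int:
    """The encoder entropy-codes this (usually small) difference, not the pixel."""
    return pixel - predict(mode, a, b, c)
```

The residuals cluster near zero for smooth image regions, which is where the ~2:1 compression comes from.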
The rawspeed library (used by darktable and ImageMagick 7's delegate) only implements predictors 1 and 6. Attempting to decode iPhone ProRAW produces:
```
LJpegDecompressor.cpp:91 decodeScan(): Unsupported predictor mode: 7
```
See rawspeed#258 for status. PR #334, which adds modes 2-7, exists but remains unmerged.
| Decoder | Predictor 7 | Notes |
|---|---|---|
| LibRaw (dcraw_emu) | ✓ | Full LJ92 implementation |
| ImageMagick 6 (ufraw delegate) | ✓ | Uses dcraw internally |
| ImageMagick 7 (darktable delegate) | ✗ | rawspeed limitation |
| darktable-cli | ✗ | rawspeed limitation |
| RawTherapee | ✓ | LibRaw-based decoder |
16-bit linear output from a 4032×3024 ProRAW file (iPhone 12 Pro Max):
| Pipeline | Max pixel value | % of 16-bit range |
|---|---|---|
| `LibRaw -6 -T` | 65,535 | 100% |
| `ImageMagick 6 -depth 16` | 65,535 | 100% |
| RawTherapee (default profile) | 34,322 | 52% |
RawTherapee applies a tone curve and gamut mapping by default, clipping ~48% of highlight data. For linear output, use -p neutral.pp3 with a flat profile or LibRaw directly.
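The clipping percentage follows directly from the table's numbers:

```python
full_range = 65535    # maximum of 16-bit linear output
rt_max = 34322        # RawTherapee default-profile maximum from the table

coverage = rt_max / full_range
print(round(coverage * 100))        # 52 → only ~52% of the range is used
print(round((1 - coverage) * 100))  # 48 → ~48% of highlight headroom clipped
```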
DNG to HEIC via libheif (heif-enc):
```bash
# LibRaw: 16-bit linear TIFF → PNG → HEIC
dcraw_emu -T -6 -W input.DNG            # -6: 16-bit, -T: TIFF, -W: no auto-bright
convert input.tiff -depth 16 temp.png
heif-enc -L temp.png -o output.heic     # -L: lossless

# ImageMagick 6: direct DNG → PNG → HEIC
convert input.DNG -depth 16 temp.png
heif-enc -q 95 temp.png -o output.heic

# Metadata (not preserved by conversion)
exiftool -TagsFromFile input.DNG -all:all output.heic
```

Source: IMG_1854.DNG (28.9 MB, 4032×3024, iPhone 12 Pro Max)
| Output | Pipeline | Size | vs DNG |
|---|---|---|---|
| `01_im6_lossless_FULL_DR.heic` | IM6 → heif-enc -L | 8.9 MB | 31% |
| `07_libraw_lossless_FULL_DR.heic` | dcraw_emu → heif-enc -L | 9.2 MB | 32% |
| `08_libraw_q95_FULL_DR.heic` | dcraw_emu → heif-enc -q95 | 5.2 MB | 18% |
| `03_libraw_q85.heic` | dcraw_emu → heif-enc -q85 | 5.5 MB | 19% |
| `04_im6_q85.heic` | IM6 → heif-enc -q85 | 4.6 MB | 16% |
| `02_rt_lossless_CLIPPED.heic` | RawTherapee → heif-enc -L | 6.5 MB | 23% |
LibRaw produces slightly larger lossless output than IM6 due to differences in intermediate PNG encoding (gamma handling). Both preserve full dynamic range. RawTherapee files are smaller because the tone curve discards highlight data.
Dolby Vision:
- dovi_tool - Dolby Vision RPU extraction and injection
- dlb_mp4base - Dolby's official MP4 muxer for DV container boxes
- DoViMuxer - Automated DV muxing wrapper
RAW/DNG Processing:
- LibRaw - RAW image decoder library with iPhone ProRAW support
- libheif - HEIF/HEIC encoder/decoder (`heif-enc`)
- ImageMagick - Image conversion (v6 uses ufraw delegate for DNG)
- rawspeed - darktable's RAW decoder (lacks ProRAW support)
Dolby Vision:
- Apple: Incorporating HDR video with Dolby Vision - Official Apple developer guide
- Dolby: iPhone 12 as DV source - Profile 8.4 specifications
- Bitmovin: hvc1 vs hev1 - Codec tag differences explained
DNG/ProRAW:
- Adobe DNG Specification 1.7 - Official format specification
- Lossless JPEG in DNG - LJ92 compression technical details
- ITU-T.81 (JPEG) - Annex H defines lossless mode predictors
- dovi_tool: Re-encoding with RPU preservation - Workflow discussion
- HandBrake: Apple HEVC compatibility - hvc1 requirement for iOS
- FFmpeg Dolby Vision progress - Current FFmpeg DV support status
- rawspeed#258: Predictor mode 7 - iPhone ProRAW decoding issue
MIT