This repository contains a minimal reproducible example demonstrating a severe CoreML inference latency issue when the Neural Engine runs inference on frames sourced directly from AVCaptureSession.
When running over a synthetic CVPixelBuffer, inference for SCI_output_image_int8.mlpackage takes ~1.0ms.
When pulling 640x480 frames from the camera, the exact same inference takes ~17.0ms, despite preprocessing/resizing taking <0.5ms combined.
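For readers who want to reproduce the comparison outside the benchmark tool, the following is a minimal sketch of how a synthetic 640x480 buffer could be allocated and how a single call could be timed. The helper names `makeSyntheticPixelBuffer` and `measureMilliseconds` are hypothetical and are not taken from camera_coreml_benchmark.swift; the BGRA format and IOSurface backing are assumptions about what a "synthetic" buffer looks like.

```swift
import CoreVideo
import Foundation

/// Hypothetical helper: allocates an IOSurface-backed 32BGRA pixel buffer
/// and fills it with a constant gray value. The pixel format is an assumption.
func makeSyntheticPixelBuffer(width: Int, height: Int) -> CVPixelBuffer? {
    let attrs: [String: Any] = [
        // An empty dictionary requests default IOSurface backing.
        kCVPixelBufferIOSurfacePropertiesKey as String: [String: Any]()
    ]
    var buffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                                     kCVPixelFormatType_32BGRA,
                                     attrs as CFDictionary, &buffer)
    guard status == kCVReturnSuccess, let pb = buffer else { return nil }

    CVPixelBufferLockBaseAddress(pb, [])
    defer { CVPixelBufferUnlockBaseAddress(pb, []) }
    if let base = CVPixelBufferGetBaseAddress(pb) {
        // bytesPerRow may include padding, so use it rather than width * 4.
        memset(base, 0x80, CVPixelBufferGetBytesPerRow(pb) * height)
    }
    return pb
}

/// Hypothetical helper: runs `body` once and returns its result together
/// with the elapsed wall-clock time in milliseconds.
func measureMilliseconds<T>(_ body: () throws -> T) rethrows -> (result: T, ms: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = try body()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000)
}
```

Wrapping only the prediction call in `measureMilliseconds`, and timing preprocessing separately, keeps the two figures comparable between the synthetic and camera runs.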
- macOS 14.3+ (Tested on M2 Max)
- Apple Silicon Mac with Neural Engine
- Access to the onboard built-in camera
1. Compile the tool (ensure optimizations are on)
xcrun swiftc -O camera_coreml_benchmark.swift -o camera_coreml_benchmark
2. Run Synthetic Benchmark (Expected: Fast)
./camera_coreml_benchmark \
--model SCI_output_image_int8.mlpackage \
--source synthetic \
--compute cpuAndNeuralEngine \
--warmup 20 \
--iterations 100
(Notice that the Inference (ms) mean is normally ~1.0 ms.)
3. Run Camera Benchmark (Issue: Slow)
./camera_coreml_benchmark \
--model SCI_output_image_int8.mlpackage \
--source camera \
--preset vga \
--fps 30 \
--compute cpuAndNeuralEngine \
--warmup 20 \
--iterations 100
(Notice that the Inference (ms) mean balloons to ~17-20 ms even after warmup, a large penalty that appears specifically when the Neural Engine is used on live AVFoundation buffers.)
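As a rough illustration of what the --compute flag plausibly controls, the sketch below maps a flag value onto CoreML's `MLComputeUnits` and stores it in an `MLModelConfiguration`. This is an assumption about the tool's behavior, not its actual source: the function names are hypothetical, and only `cpuAndNeuralEngine` and `cpuOnly` appear in this document; the other accepted strings are guesses.

```swift
import CoreML

/// Hypothetical mapping from a --compute flag value to MLComputeUnits.
/// Returns nil for unrecognized values.
func parseComputeUnits(_ name: String) -> MLComputeUnits? {
    switch name {
    case "cpuOnly": return .cpuOnly
    case "cpuAndNeuralEngine": return .cpuAndNeuralEngine  // macOS 13+
    case "cpuAndGPU": return .cpuAndGPU                    // assumed flag value
    case "all": return .all                                // assumed flag value
    default: return nil
    }
}

/// Builds a model configuration restricted to the requested compute units.
func makeConfiguration(compute: String) -> MLModelConfiguration? {
    guard let units = parseComputeUnits(compute) else { return nil }
    let config = MLModelConfiguration()
    config.computeUnits = units
    return config
}
```

Note that `.cpuAndNeuralEngine` is a request rather than a guarantee: CoreML may still schedule individual layers on the CPU.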
Restricting compute to .cpuOnly dramatically reduces the camera-source penalty compared to .cpuAndNeuralEngine. The ANE specifically appears to impose a large overhead when processing camera-sourced CoreVideo buffers (possibly due to unoptimized format conversions, implicit copies that break zero-copy, or slow ANE power-state wake-up between 30 fps frames).
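One way to start narrowing down the format-conversion and zero-copy hypotheses is to compare the properties of the synthetic buffer against those of a camera buffer. The sketch below is a hypothetical diagnostic (the names `fourCCString`, `PixelBufferInfo`, and `describe` are not part of the benchmark tool); it only reports what CoreVideo exposes and does not prove which conversion path the ANE takes.

```swift
import CoreVideo
import Foundation

/// Renders a FourCC pixel format code (for example 0x42475241) as text ("BGRA").
/// Codes that are not printable ASCII are returned as a decimal number.
func fourCCString(_ code: OSType) -> String {
    let shifts: [UInt32] = [24, 16, 8, 0]
    let bytes = shifts.map { UInt8((code >> $0) & 0xFF) }
    let printable = bytes.allSatisfy { $0 >= 0x20 && $0 < 0x7F }
    return printable ? String(decoding: bytes, as: UTF8.self) : String(code)
}

/// Hypothetical summary of the buffer properties relevant to the hypotheses above.
struct PixelBufferInfo {
    let format: String
    let width: Int
    let height: Int
    let bytesPerRow: Int
    let isPlanar: Bool
    let isIOSurfaceBacked: Bool
}

/// Collects format, geometry, and backing information for a pixel buffer.
func describe(_ pb: CVPixelBuffer) -> PixelBufferInfo {
    PixelBufferInfo(
        format: fourCCString(CVPixelBufferGetPixelFormatType(pb)),
        width: CVPixelBufferGetWidth(pb),
        height: CVPixelBufferGetHeight(pb),
        bytesPerRow: CVPixelBufferGetBytesPerRow(pb),
        isPlanar: CVPixelBufferIsPlanar(pb),
        // A nil IOSurface means the buffer is plain memory, which rules out zero-copy sharing.
        isIOSurfaceBacked: CVPixelBufferGetIOSurface(pb) != nil
    )
}
```

If the camera buffers turn out to be a planar YUV format while the synthetic buffer is BGRA, requesting a matching format through the `videoSettings` of `AVCaptureVideoDataOutput` would be a reasonable next experiment.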