Yeti is a face recognition proof of concept with three layers in one repo:
- A Python engine that runs detection, quality checks, optional liveness, alignment, embedding, and matching
- A FastAPI service that exposes that engine over HTTP
- A Next.js web app for camera-based inspection, enrollment, identification, and comparison
The project is designed to be configurable, local-first, and easy to inspect. Models are loaded from disk, runtime behavior is controlled by YAML, and enrolled templates are stored in a simple local numpy/JSON store under .yeti-store/.
Yeti supports four core workflows:
- inspect: analyze one image and return detections, quality metrics, optional liveness, and timing breakdowns
- onboard: enroll a subject by extracting and storing a face embedding with optional metadata
- identify: compare a probe image against enrolled templates and return top matches
- compare: compare two images directly without using the enrollment store
In practice, the typical flow is:
- Detect a face in an image
- Reject low-quality captures
- Optionally reject spoof attacks
- Align the face to a canonical pose
- Extract a 512-dimensional embedding
- Match that embedding against stored templates using cosine similarity
.
├── apps/
│ ├── api/ # FastAPI wrapper around the engine
│ └── web/ # Next.js camera-first demo app
├── docs/ # Earlier project notes and reference docs
├── examples/ # Example engine config and model manifest
├── models/ # Local ONNX models downloaded at setup time
├── scripts/ # Small repository utilities
├── src/yeti/ # Core engine implementation
├── tests/ # Python tests
├── pyproject.toml # Python package config
└── uv.lock # Locked Python dependency graph for uv users
- Python 3.12+
- FastAPI
- OpenCV
- ONNX Runtime
- InsightFace
- Next.js 15
- React 18
- TypeScript
Model binaries are not meant to be committed. Download them from the checked-in manifest at examples/model-manifest.yaml into a local models/ directory.
The checked-in config at examples/config.yaml already points to the default filenames that the manifest downloads.
This repo is set up for uv and targets Python 3.12+ via requires-python = ">=3.12" in pyproject.toml.
See the official docs: https://docs.astral.sh/uv/
On macOS and Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
If you are creating a brand new project from scratch, the usual bootstrap flow is:
uv init --python 3.12
That step is not needed for this repo because pyproject.toml and uv.lock already exist.
uv python install 3.12
uv sync --extra dev
uv run yeti-download-models --manifest examples/model-manifest.yaml --output-dir models
What this does:
- uv python install 3.12 ensures a compatible interpreter is available
- uv sync --extra dev creates the project environment if needed, resolves against uv.lock, and installs the dev dependencies
- uv run yeti-download-models ... downloads the ONNX model files into your local models/ directory
The downloader reads a YAML manifest with this structure:
models:
  model-name:
    url: https://example.com/path/to/model.onnx
The output filename defaults to the URL basename. You can override it per entry with filename: your-model.onnx.
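The filename rule can be illustrated with a small sketch. This is not the downloader's actual code: the `manifest` dictionary stands in for the parsed YAML, and `resolve_filename` is a hypothetical helper name chosen for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical parsed manifest with the same shape as the YAML structure above.
manifest = {
    "models": {
        "detector": {"url": "https://example.com/path/to/model.onnx"},
        "embedder": {
            "url": "https://example.com/path/to/other.onnx",
            "filename": "your-model.onnx",  # explicit per-entry override
        },
    }
}


def resolve_filename(entry: dict) -> str:
    """Return the explicit filename if present, otherwise the URL basename."""
    if entry.get("filename"):
        return entry["filename"]
    return PurePosixPath(urlparse(entry["url"]).path).name


targets = {name: resolve_filename(entry) for name, entry in manifest["models"].items()}
print(targets)
```

The first entry falls back to the last path segment of its URL, while the second uses its `filename` override.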
About the virtual environment:
- You do not need to create a virtual environment manually when using uv
- uv sync, uv run, and similar project commands will create and manage the project environment automatically, typically at .venv/
- You also do not need to activate it to use the project; uv run ... is usually the better default
If you want an activated shell anyway:
source .venv/bin/activate
If you prefer the script path instead of the console entrypoint:
uv run python scripts/download_models.py --manifest examples/model-manifest.yaml --output-dir models
Install the web app dependencies:
cd apps/web
npm install
cd ../..
Run commands through uv so you do not depend on shell activation.
Inspect an image:
uv run yeti inspect --config examples/config.yaml --image path/to/image.jpg
Enroll a subject:
uv run yeti onboard --config examples/config.yaml --subject-id alice --image path/to/alice.jpg
Enroll with metadata:
uv run yeti onboard \
  --config examples/config.yaml \
  --subject-id alice \
  --image path/to/alice.jpg \
  --metadata '{"team":"demo","role":"admin"}'
Identify a subject:
uv run yeti identify --config examples/config.yaml --image path/to/query.jpg --top-k 3
Enable liveness explicitly on any command:
uv run yeti inspect --config examples/config.yaml --image path/to/image.jpg --liveness
Start the API:
uv run uvicorn apps.api.main:app --reload
By default the API reads examples/config.yaml. Override that with:
export YETI_CONFIG=/absolute/path/to/config.yaml
Health check:
curl -s http://127.0.0.1:8000/api/v1/health
Inspect:
curl -s -X POST \
  -F image=@path/to/image.jpg \
  -F run_liveness=false \
  http://127.0.0.1:8000/api/v1/inspect
Onboard:
curl -s -X POST \
  -F image=@path/to/alice.jpg \
  -F subject_id=alice \
  -F 'metadata={"team":"demo"}' \
  http://127.0.0.1:8000/api/v1/onboard
Identify:
curl -s -X POST \
  -F image=@path/to/query.jpg \
  -F top_k=3 \
  -F run_liveness=true \
  http://127.0.0.1:8000/api/v1/identify
Compare:
curl -s -X POST \
  -F source_image=@path/to/source.jpg \
  -F target_image=@path/to/target.jpg \
  -F run_liveness=false \
  http://127.0.0.1:8000/api/v1/compare
List enrolled subjects:
curl -s http://127.0.0.1:8000/api/v1/subjects
Start the frontend:
cd apps/web
npm run dev
Open http://localhost:3000.
If the API is not running on http://127.0.0.1:8000, set:
export NEXT_PUBLIC_YETI_API_BASE=http://127.0.0.1:8000
The web app is a camera-first Next.js app that lets you:
- inspect a live frame or uploaded image
- capture burst frames for enrollment
- identify a live or uploaded face
- compare two images side by side
- browse enrolled subject records
All engine behavior comes from examples/config.yaml. The main sections are:
- runtime: execution providers and runtime logging
- models: local ONNX paths
- detection: detector thresholds, input size, face-selection rule
- quality: gating thresholds for image usability
- liveness: whether liveness runs by default and what spoof threshold is used
- alignment: aligned crop size, fixed at 112
- embedding: embedding dimension and normalization behavior
- matcher: similarity backend, threshold, and top_k
- storage: local template and metadata persistence paths
The default matcher uses cosine similarity over normalized embeddings and stores data locally in:
- .yeti-store/templates.npz
- .yeti-store/metadata.json
The current matcher is LocalNumpyMatcher, which keeps enrollment data in two files:
- .yeti-store/templates.npz: a numpy archive containing one embeddings array
- .yeti-store/metadata.json: subject records, metadata, and storage version
The embeddings array is stored as float32 with shape:
(N, 512)
Where:
- N is the number of successfully onboarded templates
- 512 is the embedding dimension produced by glintr100.onnx
Each successful onboard call appends one row to that array and one matching record to metadata.json. The matcher validates on load that the number of embedding rows matches the number of metadata records.
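The append-and-validate behavior described above can be sketched in memory with NumPy. The helper names `append_template` and `validate_store` are hypothetical and this is not the LocalNumpyMatcher source; it only illustrates the invariant that embedding rows and metadata records stay in lockstep.

```python
import numpy as np

EMBEDDING_DIM = 512  # embedding dimension stated in the text


def append_template(embeddings, records, vector, record):
    """Append one embedding row and its matching metadata record."""
    row = np.asarray(vector, dtype=np.float32).reshape(1, EMBEDDING_DIM)
    return np.vstack([embeddings, row]), records + [record]


def validate_store(embeddings, records):
    """Raise if the number of embedding rows and metadata records differ."""
    if embeddings.shape[0] != len(records):
        raise ValueError(
            f"{embeddings.shape[0]} embedding rows but {len(records)} metadata records"
        )


# Start from an empty gallery of shape (0, 512) and enroll one subject.
embeddings = np.empty((0, EMBEDDING_DIM), dtype=np.float32)
records = []
embeddings, records = append_template(
    embeddings, records, np.ones(EMBEDDING_DIM), {"subject_id": "alice"}
)
validate_store(embeddings, records)
print(embeddings.shape, embeddings.dtype, len(records))
```

Persisting the array with `np.savez` and the records with `json.dump` would be one plausible way to produce the two files, but the exact on-disk layout is defined by the repo, not by this sketch.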
Embeddings are normalized during extraction when embedding.normalize: true, which is the default config behavior.
That means each embedding vector is scaled to unit length:
v_normalized = v / ||v||
Once vectors are unit-normalized, cosine similarity becomes the same as a dot product:
cosine(a, b) = a · b
That is exactly how matching is implemented:
- identification scores all enrolled templates with self._embeddings @ query
- pairwise comparison scores two images with source @ target
This is efficient because the enrolled embeddings are already stacked in one numpy matrix, so one query can be scored against the full gallery in a single matrix-vector multiply.
The match decision is then:
- sort candidates by descending score
- take the top k
- accept only if the best score is greater than or equal to matcher.threshold
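The normalization, scoring, and decision steps above can be sketched with NumPy as follows. The function names, the random gallery, and the threshold of 0.5 are assumptions made for illustration; they are not taken from the repo or its config.

```python
import numpy as np


def l2_normalize(v):
    """Scale vectors to unit length along the last axis."""
    v = np.asarray(v, dtype=np.float32)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def identify(gallery, query, top_k, threshold):
    """Score one unit-length query against a gallery of unit-length rows."""
    scores = gallery @ query              # cosine similarity as a dot product
    order = np.argsort(-scores)[:top_k]   # indices sorted by descending score
    candidates = [(int(i), float(scores[i])) for i in order]
    accepted = bool(candidates) and candidates[0][1] >= threshold
    return candidates, accepted


rng = np.random.default_rng(0)
gallery = l2_normalize(rng.normal(size=(5, 512)))  # stand-in enrolled templates
# A query built as a slightly noisy copy of gallery row 3.
query = l2_normalize(gallery[3] + 0.01 * rng.normal(size=512))

candidates, accepted = identify(gallery, query, top_k=3, threshold=0.5)
print(candidates, accepted)
```

Because every row is unit length, the single matrix-vector product yields all cosine scores at once, and the acceptance test only needs to look at the first candidate.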
Yeti uses YuNet through OpenCV FaceDetectorYN. For each detected face it returns:
- bounding box
- confidence score
- five landmarks: left eye, right eye, nose, left mouth corner, right mouth corner
If multiple faces are found, the pipeline keeps one face for downstream processing based on detection.face_selection, which supports:
- largest_face
- highest_score
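A minimal sketch of the two selection rules, assuming each detection carries a bounding box and a confidence score. The `Detection` dataclass and `select_face` function are hypothetical names for this example, and reading "largest" as largest bounding-box area is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Detection:
    x: float
    y: float
    w: float
    h: float
    score: float


def select_face(detections: Sequence[Detection], rule: str) -> Optional[Detection]:
    """Keep one detection according to the configured rule."""
    if not detections:
        return None
    if rule == "largest_face":
        return max(detections, key=lambda d: d.w * d.h)  # assumed: largest box area
    if rule == "highest_score":
        return max(detections, key=lambda d: d.score)
    raise ValueError(f"unknown face selection rule: {rule}")


faces = [
    Detection(x=10, y=10, w=120, h=140, score=0.81),
    Detection(x=300, y=40, w=60, h=70, score=0.97),
]
print(select_face(faces, "largest_face"))
print(select_face(faces, "highest_score"))
```

The two rules can disagree, as in this sample: the larger face has the lower detector confidence.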
Before Yeti spends compute on embedding or matching, it checks whether the face is usable. The quality gate measures:
- face size
- brightness
- sharpness
- inter-eye distance
- yaw
- pitch
- roll
It can reject captures for reasons such as:
- face_too_small
- underexposed
- overexposed
- blurry
- eyes_too_close
- yaw_too_large
- pitch_too_large
- roll_too_large
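The gate can be pictured as a set of threshold checks that map measurements to the rejection reasons listed above. In the sketch below, the limit names, the limit values, and the metric units are all hypothetical; the real thresholds live in the quality section of the YAML config.

```python
# Hypothetical limits for illustration only; not the values from examples/config.yaml.
LIMITS = {
    "min_face_size": 80,     # pixels (assumed unit)
    "min_brightness": 40,    # mean intensity on a 0-255 scale (assumed)
    "max_brightness": 220,
    "min_sharpness": 100,    # arbitrary sharpness score (assumed)
    "min_inter_eye": 30,     # pixels (assumed unit)
    "max_yaw": 25,           # degrees (assumed unit)
    "max_pitch": 20,
    "max_roll": 20,
}


def rejection_reasons(metrics, limits=LIMITS):
    """Return every reason a capture fails the gate; an empty list means it passes."""
    reasons = []
    if metrics["face_size"] < limits["min_face_size"]:
        reasons.append("face_too_small")
    if metrics["brightness"] < limits["min_brightness"]:
        reasons.append("underexposed")
    elif metrics["brightness"] > limits["max_brightness"]:
        reasons.append("overexposed")
    if metrics["sharpness"] < limits["min_sharpness"]:
        reasons.append("blurry")
    if metrics["inter_eye"] < limits["min_inter_eye"]:
        reasons.append("eyes_too_close")
    for axis in ("yaw", "pitch", "roll"):
        if abs(metrics[axis]) > limits[f"max_{axis}"]:
            reasons.append(f"{axis}_too_large")
    return reasons


good = {"face_size": 160, "brightness": 128, "sharpness": 250,
        "inter_eye": 60, "yaw": 5, "pitch": -3, "roll": 2}
bad = dict(good, sharpness=10, yaw=-40)
print(rejection_reasons(good))
print(rejection_reasons(bad))
```

Collecting all reasons instead of stopping at the first one makes it easier to tell a user how to fix a capture.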
This is a practical face recognition (FR) lesson: recognition errors often start with poor input quality, not with the matcher.
Yeti can optionally run anti-spoofing with 2.7_80x80_MiniFASNetV2.onnx. That model is downloaded by the manifest and used to distinguish a real face from a replay or print attack.
Current implementation details:
- the liveness crop is larger than the tight face crop
- input is converted from BGR to RGB
- the crop is resized to 80x80
- pixel values are normalized to [0, 1]
- class index 2 is treated as the live class
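A rough sketch of the preprocessing and class lookup listed above, using only NumPy. Several details are assumptions not confirmed by the text: the crop is taken as already resized to 80x80, the model input layout is assumed to be NCHW, and a softmax over three output values is assumed for turning raw outputs into a probability. How the engine derives its reported spoof score is not shown here.

```python
import numpy as np

LIVE_CLASS_INDEX = 2  # from the text: class index 2 is the live class


def preprocess(crop_bgr):
    """Convert an 80x80 BGR uint8 crop into a float32 model input blob."""
    rgb = crop_bgr[..., ::-1]                  # BGR -> RGB by reversing channels
    scaled = rgb.astype(np.float32) / 255.0    # normalize pixel values to [0, 1]
    # Assumed layout: (batch, channels, height, width).
    return np.transpose(scaled, (2, 0, 1))[np.newaxis, ...]


def live_probability(outputs):
    """Assumed post-processing: softmax, then read the live class."""
    outputs = np.asarray(outputs, dtype=np.float64)
    shifted = outputs - np.max(outputs)        # shift for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return float(probs[LIVE_CLASS_INDEX])


# A synthetic pure-blue crop in BGR order, standing in for a real face crop.
crop = np.zeros((80, 80, 3), dtype=np.uint8)
crop[..., 0] = 255
blob = preprocess(crop)
print(blob.shape, blob.dtype)
print(live_probability([0.0, 0.0, 10.0]))
```

In a real pipeline the resize would typically come from an image library before this step, and the blob would be passed to an ONNX Runtime session.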
The engine reports:
- spoof_score
- passed
- label as live or spoof
Liveness is disabled by default in the example config for inspect, onboard, and identify, but it can be enabled per request or via config.
Recognition models are sensitive to pose. Yeti aligns each accepted face to a standard 112x112 view using the five landmarks and an affine warp. This is what makes embeddings from two different captures comparable.
Conceptually, alignment reduces variation from:
- head rotation
- scale
- translation
- small pose differences
Without alignment, even a strong embedding model becomes much less stable.
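To make the landmark-driven warp concrete, here is a sketch that estimates a similarity transform (rotation, uniform scale, translation, which is a restricted form of affine transform) from five landmarks by least squares. This is a swapped-in illustration, not Yeti's alignment code. The reference coordinates are the 112x112 template commonly used with ArcFace-style models; whether Yeti uses these exact values is an assumption.

```python
import numpy as np

# Reference landmark positions in a 112x112 crop, commonly used with
# ArcFace-style models. Assumed here; Yeti's template may differ.
REFERENCE_112 = np.array([
    [38.2946, 51.6963],   # left eye
    [73.5318, 51.5014],   # right eye
    [56.0252, 71.7366],   # nose
    [41.5493, 92.3655],   # left mouth corner
    [70.7299, 92.2041],   # right mouth corner
], dtype=np.float64)


def estimate_similarity(src, dst):
    """Least-squares fit of u = a*x - b*y + tx, v = b*x + a*y + ty."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n = src.shape[0]
    design = np.zeros((2 * n, 4))
    design[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    design[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    a, b, tx, ty = np.linalg.lstsq(design, dst.reshape(-1), rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])  # 2x3 matrix


def apply_transform(matrix, points):
    """Map (N, 2) points through a 2x3 transform matrix."""
    points = np.asarray(points, dtype=np.float64)
    return points @ matrix[:, :2].T + matrix[:, 2]


# Synthetic landmarks: the template scaled by 2 and shifted by (10, 20).
landmarks = REFERENCE_112 * 2.0 + np.array([10.0, 20.0])
matrix = estimate_similarity(landmarks, REFERENCE_112)
print(np.round(matrix, 4))
```

The resulting 2x3 matrix has the shape that `cv2.warpAffine` expects, so it could be used to resample the image into the 112x112 crop.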
Yeti uses glintr100.onnx to convert the aligned face crop into a 512-dimensional embedding vector.
This is the core FR idea: the model does not classify people directly. Instead, it maps each face into a vector space where:
- same-person vectors should be close together
- different-person vectors should be farther apart
The implementation normalizes embeddings so cosine similarity can be used directly as a dot product.
The local matcher stores embeddings in numpy format and compares a query vector against all stored templates.
Two matching modes exist:
- identification: one query against the enrolled gallery
- verification/comparison: one image against one other image
For identification, Yeti:
- scores the query against all stored embeddings
- sorts by descending score
- returns the top k candidates
- accepts the match only if the best score meets matcher.threshold
If you want to understand face recognition beyond “run the model,” these are the core concepts this repo exposes directly.
Finding a face in an image is a different task from telling whose face it is. A system can detect well and still recognize poorly.
The five landmark points are not just visualization data. They drive pose estimation, quality checks, and alignment.
Recognition systems degrade quickly on:
- tiny faces
- blur
- harsh brightness
- strong yaw, pitch, or roll
- low inter-eye resolution
Good gating often improves production behavior more than lowering a match threshold.
A recognition embedding is built to ignore nuisance factors and preserve identity. A liveness model is built to notice texture and presentation artifacts. One does not replace the other.
Modern FR systems usually do not identify users by classification at inference time. They compare embeddings by similarity. That is why threshold tuning matters.
The matcher threshold controls the balance between false accepts and false rejects. Lower thresholds are more permissive; higher thresholds are stricter. There is no universally correct value without evaluation data.
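The tradeoff can be seen by sweeping a threshold over labeled similarity scores. The scores below are made up purely for illustration and do not come from any Yeti evaluation; `error_rates` is a hypothetical helper.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false accept rate, false reject rate) at a given threshold."""
    false_accepts = sum(score >= threshold for score in impostor)
    false_rejects = sum(score < threshold for score in genuine)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Made-up scores: same-person pairs and different-person pairs.
genuine = [0.82, 0.74, 0.66, 0.58, 0.41]
impostor = [0.12, 0.25, 0.33, 0.47, 0.52]

for threshold in (0.3, 0.5, 0.7):
    far, frr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.1f} false_accept_rate={far:.1f} false_reject_rate={frr:.1f}")
```

On this toy data the lowest threshold accepts most impostor pairs, and the highest rejects most genuine pairs, which is why a threshold should be chosen from evaluation data that resembles the deployment.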
If you enroll bad templates, the gallery becomes noisy. A recognition system cannot recover from poor enrollment examples.
- verification asks: “do these two images belong to the same person?”
- identification asks: “which enrolled person is this?”
The risk profile is different because identification compares against a full gallery, not a single claimed identity.
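One way to see the difference in risk is to estimate how the chance of at least one false match grows with gallery size. The sketch below assumes each comparison is independent and has the same per-comparison false match rate, which is a simplification; the function name and the numbers are illustrative only.

```python
def gallery_false_match_probability(false_match_rate: float, gallery_size: int) -> float:
    """Probability that at least one of N independent comparisons falsely matches."""
    return 1.0 - (1.0 - false_match_rate) ** gallery_size


# Illustrative per-comparison false match rate of 0.1%.
for size in (1, 10, 100, 1000):
    probability = gallery_false_match_probability(0.001, size)
    print(f"gallery_size={size} probability={probability:.3f}")
```

With a gallery of one, the result equals the verification false match rate; as the gallery grows, the same threshold produces false matches more often.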
This repo stores one embedding per successful onboard call. That keeps the design simple, but it means:
- duplicate enrollments create multiple templates
- no clustering or template consolidation happens automatically
- metadata is descriptive only; it is not used for scoring
Run Python tests:
uv run pytest
Run frontend checks:
cd apps/web
npm run typecheck
npm run build
- If the models directory is empty, rerun uv run yeti-download-models --manifest examples/model-manifest.yaml --output-dir models.
- If the detector fails to initialize, verify that models/face_detection_yunet_2023mar.onnx is a real ONNX model file and not a bad download.
- If glintr100.onnx fails to load, verify the file is complete and not truncated.
- If every liveness result looks wrong, check the chosen model file, the configured crop contract, and whether you actually enabled liveness for that request.
- If the web app cannot reach the API, confirm the API is serving at http://127.0.0.1:8000 or set NEXT_PUBLIC_YETI_API_BASE.
- If onboarding appears to work but later identification finds nothing, inspect .yeti-store/templates.npz and .yeti-store/metadata.json to confirm templates were written where you expect.