Yeti

Yeti is a face recognition proof of concept with three layers in one repo:

A Python engine that runs detection, quality checks, optional liveness, alignment, embedding, and matching
A FastAPI service that exposes that engine over HTTP
A Next.js web app for camera-based inspection, enrollment, identification, and comparison

The project is designed to be configurable, local-first, and easy to inspect. Models are loaded from disk, runtime behavior is controlled by YAML, and enrolled templates are stored in a simple local numpy/JSON store under .yeti-store/.

What This Project Does

Yeti supports four core workflows:

inspect: analyze one image and return detections, quality metrics, optional liveness, and timing breakdowns
onboard: enroll a subject by extracting and storing a face embedding with optional metadata
identify: compare a probe image against enrolled templates and return top matches
compare: compare two images directly without using the enrollment store

In practice, the typical flow is:

Detect a face in an image
Reject low-quality captures
Optionally reject spoof attacks
Align the face to a canonical pose
Extract a 512-dimensional embedding
Match that embedding against stored templates using cosine similarity

Repo Layout

.
├── apps/
│   ├── api/        # FastAPI wrapper around the engine
│   └── web/        # Next.js camera-first demo app
├── docs/           # Earlier project notes and reference docs
├── examples/       # Example engine config and model manifest
├── models/         # Local ONNX models downloaded at setup time
├── scripts/        # Small repository utilities
├── src/yeti/       # Core engine implementation
├── tests/          # Python tests
├── pyproject.toml  # Python package config
└── uv.lock         # Locked Python dependency graph for uv users

Stack

Python 3.12+
FastAPI
OpenCV
ONNX Runtime
InsightFace
Next.js 15
React 18
TypeScript

Requirements

Model binaries are not meant to be committed. Download them from the checked-in manifest at examples/model-manifest.yaml into a local models/ directory.

The checked-in config at examples/config.yaml already points to the default filenames that the manifest downloads.

Setup

This repo is set up for uv and targets Python 3.12+ via requires-python = ">=3.12" in pyproject.toml.

Install `uv`

See the official docs: https://docs.astral.sh/uv/

On macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

`uv` project basics

If you are creating a brand new project from scratch, the usual bootstrap flow is:

uv init --python 3.12

That step is not needed for this repo because pyproject.toml and uv.lock already exist.

Python engine and API with `uv`

uv python install 3.12
uv sync --extra dev
uv run yeti-download-models --manifest examples/model-manifest.yaml --output-dir models

What this does:

uv python install 3.12 ensures a compatible interpreter is available
uv sync --extra dev creates the project environment if needed, resolves against uv.lock, and installs the dev dependencies
uv run yeti-download-models ... downloads the ONNX model files into your local models/ directory

The downloader reads a YAML manifest with this structure:

models:
  model-name:
    url: https://example.com/path/to/model.onnx

The output filename defaults to the URL basename. You can override it per entry with filename: your-model.onnx.
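The manifest-to-filename rule above can be sketched in a few lines. This is a hypothetical illustration, not the code behind yeti-download-models. It takes the already-parsed manifest as a plain dict, for example what PyYAML's yaml.safe_load would return. It also assumes two behaviors the README does not confirm: existing files are skipped, and query strings are dropped before taking the URL basename.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def resolve_targets(manifest: dict, output_dir: Path) -> dict[str, Path]:
    """Map each manifest entry to the local path it would be saved under."""
    targets = {}
    for name, entry in manifest["models"].items():
        # Default: last segment of the URL path (ignoring the query string is an assumption).
        default_name = PurePosixPath(urlparse(entry["url"]).path).name
        # An explicit `filename:` in the entry overrides the default.
        targets[name] = output_dir / entry.get("filename", default_name)
    return targets


def download_all(manifest: dict, output_dir: Path) -> None:
    """Fetch each model that is not already on disk (skip-if-present is an assumption)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, path in resolve_targets(manifest, output_dir).items():
        if not path.exists():
            urlretrieve(manifest["models"][name]["url"], path)
```

For an entry with only url: https://example.com/path/to/model.onnx, resolve_targets maps it to models/model.onnx. Adding filename: your-model.onnx to the entry maps it to models/your-model.onnx instead.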

About the virtual environment:

You do not need to create a virtual environment manually when using uv
uv sync, uv run, and similar project commands will create and manage the project environment automatically, typically at .venv/
You also do not need to activate it to use the project; uv run ... is usually the better default

If you want an activated shell anyway:

source .venv/bin/activate

If you prefer the script path instead of the console entrypoint:

uv run python scripts/download_models.py --manifest examples/model-manifest.yaml --output-dir models

Web app

cd apps/web
npm install
cd ../..

How To Run

CLI

Run commands through uv so you do not depend on shell activation.

Inspect an image:

uv run yeti inspect --config examples/config.yaml --image path/to/image.jpg

Enroll a subject:

uv run yeti onboard --config examples/config.yaml --subject-id alice --image path/to/alice.jpg

Enroll with metadata:

uv run yeti onboard \
  --config examples/config.yaml \
  --subject-id alice \
  --image path/to/alice.jpg \
  --metadata '{"team":"demo","role":"admin"}'

Identify a subject:

uv run yeti identify --config examples/config.yaml --image path/to/query.jpg --top-k 3

Enable liveness explicitly on any command:

uv run yeti inspect --config examples/config.yaml --image path/to/image.jpg --liveness

API

Start the API:

uv run uvicorn apps.api.main:app --reload

By default the API reads examples/config.yaml. Override that with:

export YETI_CONFIG=/absolute/path/to/config.yaml

Health check:

curl -s http://127.0.0.1:8000/api/v1/health

Inspect:

curl -s -X POST \
  -F image=@path/to/image.jpg \
  -F run_liveness=false \
  http://127.0.0.1:8000/api/v1/inspect

Onboard:

curl -s -X POST \
  -F image=@path/to/alice.jpg \
  -F subject_id=alice \
  -F 'metadata={"team":"demo"}' \
  http://127.0.0.1:8000/api/v1/onboard

Identify:

curl -s -X POST \
  -F image=@path/to/query.jpg \
  -F top_k=3 \
  -F run_liveness=true \
  http://127.0.0.1:8000/api/v1/identify

Compare:

curl -s -X POST \
  -F source_image=@path/to/source.jpg \
  -F target_image=@path/to/target.jpg \
  -F run_liveness=false \
  http://127.0.0.1:8000/api/v1/compare

List enrolled subjects:

curl -s http://127.0.0.1:8000/api/v1/subjects

Web app

Start the frontend:

cd apps/web
npm run dev

Open http://localhost:3000.

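If the web app cannot reach the API at its expected address, set NEXT_PUBLIC_YETI_API_BASE to the API's base URL before running npm run dev. Next.js reads NEXT_PUBLIC_ variables when the dev server starts or the app is built. For the default local API, use: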

export NEXT_PUBLIC_YETI_API_BASE=http://127.0.0.1:8000

The web app is a camera-first Next.js app that lets you:

inspect a live frame or uploaded image
capture burst frames for enrollment
identify a live or uploaded face
compare two images side by side
browse enrolled subject records

Configuration Model

All engine behavior comes from a YAML config file. The CLI takes it via --config, and the API reads examples/config.yaml unless YETI_CONFIG is set. The main sections are:

runtime: execution providers and runtime logging
models: local ONNX paths
detection: detector thresholds, input size, face-selection rule
quality: gating thresholds for image usability
liveness: whether liveness runs by default and what spoof threshold is used
alignment: aligned crop size, fixed at 112
embedding: embedding dimension and normalization behavior
matcher: similarity backend, threshold, and top_k
storage: local template and metadata persistence paths

The default matcher uses cosine similarity over normalized embeddings and stores data locally in:

.yeti-store/templates.npz
.yeti-store/metadata.json

How embeddings are stored

The current matcher is LocalNumpyMatcher, which keeps enrollment data in two files:

.yeti-store/templates.npz: a numpy archive containing one embeddings array
.yeti-store/metadata.json: subject records, metadata, and storage version

The embeddings array is stored as float32 with shape:

(N, 512)

Where:

N is the number of successful onboarded templates
512 is the embedding dimension produced by glintr100.onnx

Each successful onboard call appends one row to that array and one matching record to metadata.json. The matcher validates on load that the number of embedding rows matches the number of metadata records.
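A minimal sketch of that two-file layout follows. It shows the append-one-row-plus-one-record pattern and the row/record count check on load. The key names are hypothetical: "embeddings" in the archive, and "version", "records", "subject_id" and "metadata" in metadata.json. The real LocalNumpyMatcher may name and version things differently.

```python
from __future__ import annotations

import json
from pathlib import Path

import numpy as np


class TinyTemplateStore:
    """Hypothetical stand-in for LocalNumpyMatcher's persistence, not its actual code."""

    def __init__(self, root: Path, dim: int = 512):
        self.templates_path = root / "templates.npz"
        self.metadata_path = root / "metadata.json"
        self.dim = dim
        self.embeddings = np.zeros((0, dim), dtype=np.float32)  # (N, dim)
        self.records: list[dict] = []

    def load(self) -> None:
        if not self.templates_path.exists():
            return  # empty store
        with np.load(self.templates_path) as archive:
            self.embeddings = archive["embeddings"].astype(np.float32)
        self.records = json.loads(self.metadata_path.read_text())["records"]
        # The invariant described above: one metadata record per embedding row.
        if len(self.records) != self.embeddings.shape[0]:
            raise ValueError("templates.npz and metadata.json are out of sync")

    def append(self, subject_id: str, embedding: np.ndarray, metadata: dict | None = None) -> None:
        row = np.asarray(embedding, dtype=np.float32).reshape(1, self.dim)
        self.embeddings = np.vstack([self.embeddings, row])
        self.records.append({"subject_id": subject_id, "metadata": metadata or {}})
        self.save()

    def save(self) -> None:
        self.templates_path.parent.mkdir(parents=True, exist_ok=True)
        np.savez(self.templates_path, embeddings=self.embeddings)
        payload = {"version": 1, "records": self.records}
        self.metadata_path.write_text(json.dumps(payload, indent=2))
```

Rewriting both files on every append keeps the sketch simple and suits a small local demo store. For a large gallery, that rewrite would be the first thing to change.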

How cosine similarity is calculated

Embeddings are normalized during extraction when embedding.normalize: true, which is the default config behavior.

That means each embedding vector is scaled to unit length:

v_normalized = v / ||v||

Once vectors are unit-normalized, cosine similarity becomes the same as a dot product:

cosine(a, b) = a · b

That is exactly how matching is implemented:

identification scores all enrolled templates with self._embeddings @ query
pairwise comparison scores two images with source @ target

This is efficient because the enrolled embeddings are already stacked in one numpy matrix, so one query can be scored against the full gallery in a single matrix-vector multiply.
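The sketch below shows why the stacked-matrix approach yields cosine scores. It normalizes a few random stand-in vectors (not real embeddings), then checks that a single gallery @ query product matches the textbook cosine formula row by row. Here gallery and query mirror the README's self._embeddings @ query. The helpers l2_normalize and cosine are illustrative names, not names from the repo.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector, or each row of a matrix, to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)  # eps guards against a zero vector


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Textbook cosine similarity on raw (unnormalized) vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
raw_gallery = rng.normal(size=(4, 512))  # 4 made-up templates
raw_query = rng.normal(size=512)

gallery = l2_normalize(raw_gallery).astype(np.float32)  # (N, 512), as stored on disk
query = l2_normalize(raw_query).astype(np.float32)

scores = gallery @ query  # one matrix-vector multiply, shape (N,)
expected = [cosine(row, raw_query) for row in raw_gallery]
assert np.allclose(scores, expected, atol=1e-5)
```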

The match decision is then:

sort candidates by descending score
take the top k
accept only if the best score is greater than or equal to matcher.threshold
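The decision rule fits in a short function. This sketch assumes the gallery rows and the query are already unit length. The names identify and verify are hypothetical, and threshold is passed in explicitly rather than read from matcher.threshold.

```python
import numpy as np


def identify(gallery: np.ndarray, query: np.ndarray, subject_ids: list[str],
             *, top_k: int, threshold: float):
    """Rank an (N, D) unit-length gallery against one unit-length query."""
    scores = gallery @ query                 # cosine scores, shape (N,)
    order = np.argsort(-scores)[:top_k]      # descending score, keep top k
    candidates = [(subject_ids[i], float(scores[i])) for i in order]
    # Accept only if the best candidate reaches the threshold.
    accepted = bool(candidates) and candidates[0][1] >= threshold
    return candidates, accepted


def verify(source: np.ndarray, target: np.ndarray, *, threshold: float) -> bool:
    """Pairwise comparison: one dot product against the same kind of threshold."""
    return float(source @ target) >= threshold
```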

How The Pipeline Works

1. Detection

Yeti uses YuNet through OpenCV FaceDetectorYN. For each detected face it returns:

bounding box
confidence score
five landmarks: left eye, right eye, nose, left mouth corner, right mouth corner

If multiple faces are found, the pipeline keeps one face for downstream processing based on detection.face_selection, which supports:

largest_face
highest_score
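OpenCV's FaceDetectorYN.detect returns detections as the second value of its result. This is an (N, 15) array, or None when nothing is found. Each row holds the box x, y, w, h, then five landmark (x, y) pairs, then the score. The sketch below applies the two selection rules to that layout. The helper names select_face and landmarks are hypothetical, and Yeti's own code may reorder or relabel the landmarks.

```python
from __future__ import annotations

import numpy as np


def select_face(faces: np.ndarray | None, rule: str) -> np.ndarray | None:
    """Pick one row from a FaceDetectorYN-style (N, 15) detection array."""
    if faces is None or len(faces) == 0:
        return None
    if rule == "largest_face":
        areas = faces[:, 2] * faces[:, 3]  # w * h
        return faces[int(np.argmax(areas))]
    if rule == "highest_score":
        return faces[int(np.argmax(faces[:, 14]))]  # column 14 is the score
    raise ValueError(f"unknown face_selection rule: {rule}")


def landmarks(face: np.ndarray) -> np.ndarray:
    """Columns 4-13 hold five (x, y) landmark pairs."""
    return face[4:14].reshape(5, 2)
```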

2. Quality Gate

Before Yeti spends compute on embedding or matching, it checks whether the face is usable. The quality gate measures:

face size
brightness
sharpness
inter-eye distance
yaw
pitch
roll

It can reject captures for reasons such as:

face_too_small
underexposed
overexposed
blurry
eyes_too_close
yaw_too_large
pitch_too_large
roll_too_large

This is a practical face recognition (FR) lesson: most recognition errors start with poor input quality, not with the matcher.
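Several of these checks are simple image statistics. The sketch below covers a subset (size, brightness, sharpness, inter-eye distance, roll) and rests on these assumptions:

- Brightness is the mean gray level.
- Sharpness is the variance of a 4-neighbour Laplacian, a common blur heuristic and not necessarily Yeti's metric.
- Roll is the angle of the line between the eyes, with left_eye being the eye with the smaller x.
- Every threshold is a caller-supplied placeholder, not a value from examples/config.yaml.

Yaw and pitch need a head-pose estimate and are left out.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian over the interior; low values suggest blur."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def quality_reasons(gray_face, left_eye, right_eye, *, min_size, min_brightness,
                    max_brightness, min_sharpness, min_eye_distance, max_roll_deg):
    """Return rejection reasons for a grayscale face crop (empty list means usable)."""
    reasons = []
    h, w = gray_face.shape
    if min(h, w) < min_size:
        reasons.append("face_too_small")
    brightness = float(gray_face.mean())
    if brightness < min_brightness:
        reasons.append("underexposed")
    elif brightness > max_brightness:
        reasons.append("overexposed")
    if laplacian_variance(gray_face) < min_sharpness:
        reasons.append("blurry")
    dx, dy = np.subtract(right_eye, left_eye)
    if np.hypot(dx, dy) < min_eye_distance:
        reasons.append("eyes_too_close")
    roll = np.degrees(np.arctan2(dy, dx))  # 0 degrees when the eyes are level
    if abs(roll) > max_roll_deg:
        reasons.append("roll_too_large")
    return reasons
```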

3. Liveness

Yeti can optionally run anti-spoofing with 2.7_80x80_MiniFASNetV2.onnx. That model is downloaded via the model manifest and is used to distinguish a real face from a replay or print attack.

Current implementation details:

the liveness crop is larger than the tight face crop
input is converted from BGR to RGB
the crop is resized to 80x80
pixel values are normalized to [0, 1]
class index 2 is treated as the live class

The engine reports:

spoof_score
passed
label as live or spoof

Liveness is disabled by default in the example config for inspect, onboard, and identify, but it can be enabled per request or via config.
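The preprocessing steps listed above can be sketched with NumPy alone. Only class index 2 as the live class comes from the description above. The rest are assumptions, each labeled in the code:

- The crop scale of 2.7 is read off the model filename. The 2.7_80x80 prefix appears to follow the Silent-Face-Anti-Spoofing naming convention.
- Nearest-neighbour indexing stands in for cv2.resize.
- The logits go through a softmax.
- spoof_score is 1 minus the live-class probability, and passed means spoof_score < threshold.

```python
import numpy as np


def expand_box(x, y, w, h, scale, img_w, img_h):
    """Grow a face box around its centre by `scale`, clipped to the image bounds."""
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * scale, h * scale
    x0 = int(max(0, round(cx - new_w / 2.0)))
    y0 = int(max(0, round(cy - new_h / 2.0)))
    x1 = int(min(img_w, round(cx + new_w / 2.0)))
    y1 = int(min(img_h, round(cy + new_h / 2.0)))
    return x0, y0, x1, y1


def liveness_input(image_bgr, box, scale=2.7, size=80):
    """Build a (1, 3, size, size) float32 tensor; scale=2.7 is an assumption."""
    img_h, img_w = image_bgr.shape[:2]
    x0, y0, x1, y1 = expand_box(*box, scale, img_w, img_h)
    crop = image_bgr[y0:y1, x0:x1, ::-1]  # larger-than-face crop, BGR -> RGB
    rows = np.linspace(0, crop.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size).round().astype(int)
    resized = crop[rows][:, cols]  # nearest-neighbour stand-in for cv2.resize
    chw = resized.astype(np.float32).transpose(2, 0, 1) / 255.0  # [0, 1], CHW
    return np.ascontiguousarray(chw[np.newaxis])


def liveness_result(logits, threshold, live_index=2):
    """Softmax the logits; spoof_score = 1 - P(live) is an assumed definition."""
    z = np.asarray(logits, dtype=np.float64).ravel()
    e = np.exp(z - z.max())
    probs = e / e.sum()
    spoof_score = float(1.0 - probs[live_index])
    passed = spoof_score < threshold
    return {"spoof_score": spoof_score, "passed": passed,
            "label": "live" if passed else "spoof"}
```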

4. Alignment

Recognition models are sensitive to pose. Yeti aligns each accepted face to a standard 112x112 view using the five landmarks and an affine warp. This is what makes embeddings from two different captures comparable.

Conceptually, alignment reduces variation from:

head rotation
scale
translation
small pose differences

Without alignment, even a strong embedding model becomes much less stable.
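A common way to compute that warp, and the one InsightFace's alignment helpers use, is a similarity transform. This is a restricted affine made of rotation, uniform scale and translation, fitted from the five detected landmarks to a fixed 112x112 reference layout. The sketch uses Umeyama's least-squares method with InsightFace's ArcFace reference points. Whether Yeti uses exactly this estimator is an assumption. The resulting 2x3 matrix is what cv2.warpAffine(image, M, (112, 112)) would consume.

```python
import numpy as np

# Reference five-point layout for 112x112 crops used by InsightFace's ArcFace models.
ARCFACE_112 = np.array([
    [38.2946, 51.6963],  # left eye (smaller x in the image)
    [73.5318, 51.5014],  # right eye
    [56.0252, 71.7366],  # nose tip
    [41.5493, 92.3655],  # left mouth corner
    [70.7299, 92.2041],  # right mouth corner
], dtype=np.float64)


def similarity_transform(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares rotation + uniform scale + translation (Umeyama's method).

    Returns a 2x3 matrix M with dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d
    cov = dc.T @ sc / len(src)  # cross-covariance between target and source
    u, s, vt = np.linalg.svd(cov)
    d = np.ones(2)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        d[-1] = -1.0  # forbid a reflection
    rot = u @ np.diag(d) @ vt
    scale = (s * d).sum() / (sc ** 2).sum(axis=1).mean()
    trans = mu_d - scale * rot @ mu_s
    return np.hstack([scale * rot, trans[:, None]])
```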

5. Embedding

Yeti uses glintr100.onnx to convert the aligned face crop into a 512-dimensional embedding vector.

This is the core FR idea: the model does not classify people directly. Instead, it maps each face into a vector space where:

same-person vectors should be close together
different-person vectors should be farther apart

The implementation normalizes embeddings so cosine similarity can be used directly as a dot product.
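Before the ONNX call, the aligned crop has to become a model input tensor. The sketch assumes the conventions InsightFace's ArcFaceONNX wrapper commonly applies: RGB channel order, (x - 127.5) / 127.5 scaling and NCHW layout. Treat those constants as assumptions for glintr100.onnx rather than confirmed Yeti behavior. The output step flattens to 512 values and L2-normalizes, matching embedding.normalize: true.

```python
import numpy as np


def embedding_blob(aligned_bgr: np.ndarray, mean: float = 127.5, std: float = 127.5) -> np.ndarray:
    """Turn a 112x112x3 BGR uint8 crop into a (1, 3, 112, 112) float32 input (assumed scaling)."""
    if aligned_bgr.shape != (112, 112, 3):
        raise ValueError(f"expected a 112x112x3 crop, got {aligned_bgr.shape}")
    rgb = aligned_bgr[:, :, ::-1].astype(np.float32)  # BGR -> RGB
    chw = ((rgb - mean) / std).transpose(2, 0, 1)     # HWC -> CHW, roughly [-1, 1]
    return np.ascontiguousarray(chw[np.newaxis])


def finalize_embedding(raw: np.ndarray) -> np.ndarray:
    """Flatten the model output to a vector and scale it to unit length."""
    v = np.asarray(raw, dtype=np.float32).reshape(-1)
    return v / max(float(np.linalg.norm(v)), 1e-12)
```

With onnxruntime, the blob would be fed as session.run(None, {session.get_inputs()[0].name: blob})[0]. That output is then passed to finalize_embedding.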

6. Matching

The local matcher stores embeddings in numpy format and compares a query vector against all stored templates.

Two matching modes exist:

identification: one query against the enrolled gallery
verification/comparison: one image against one other image

For identification, Yeti:

scores the query against all stored embeddings
sorts by descending score
returns the top k candidates
accepts the match only if the best score meets matcher.threshold

Face Recognition Concepts You Need To Master

If you want to understand face recognition beyond “run the model,” these are the core concepts this repo exposes directly.

Detection is not recognition

Finding a face in an image is a different task from telling whose face it is. A system can detect well and still recognize poorly.

Landmarks are structural anchors

The five landmark points are not just visualization data. They drive pose estimation, quality checks, and alignment.

Input quality dominates outcomes

Recognition systems degrade quickly on:

tiny faces
blur
harsh brightness
strong yaw, pitch, or roll
low inter-eye resolution

Good gating often improves production behavior more than lowering a match threshold.

Liveness and recognition solve different problems

A recognition embedding is built to ignore nuisance factors and preserve identity. A liveness model is built to notice texture and presentation artifacts. One does not replace the other.

Embeddings are metric-space representations

Modern FR systems usually do not identify users by classification at inference time. They compare embeddings by similarity. That is why threshold tuning matters.

Thresholds define the operating point

The matcher threshold controls the balance between false accepts and false rejects. Lower thresholds are more permissive; higher thresholds are stricter. There is no universally correct value without evaluation data.
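The trade-off becomes concrete when you count errors on labeled score sets. In this sketch, genuine holds similarity scores for same-person pairs and impostor holds scores for different-person pairs. The numbers are made up for illustration and are not measurements from Yeti. A pair is accepted when score >= threshold, the same rule as matcher.threshold.

```python
import numpy as np


def error_rates(genuine, impostor, threshold: float) -> tuple[float, float]:
    """Return (FAR, FRR) at `threshold`, accepting a pair when score >= threshold."""
    impostor = np.asarray(impostor, dtype=np.float64)
    genuine = np.asarray(genuine, dtype=np.float64)
    far = float(np.mean(impostor >= threshold))  # different people wrongly accepted
    frr = float(np.mean(genuine < threshold))    # same person wrongly rejected
    return far, frr


# Made-up toy scores, for illustration only.
genuine = [0.62, 0.55, 0.71, 0.38, 0.66]
impostor = [0.12, 0.31, 0.45, 0.08, 0.22]
print(error_rates(genuine, impostor, 0.3))  # (0.4, 0.0): permissive
print(error_rates(genuine, impostor, 0.5))  # (0.0, 0.2): strict
```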

Enrollment quality matters as much as query quality

If you enroll poor templates, the gallery becomes noisy. A recognition system cannot reliably recover from poor enrollment examples, because every later query is scored against them.

Identification and verification are different tasks

verification asks: “do these two images belong to the same person?”
identification asks: “which enrolled person is this?”

The risk profile is different because identification compares against a full gallery, not a single claimed identity.
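A rough way to see the different risk profile is to model false accepts across a gallery. Suppose each non-matching comparison falsely accepts with probability FAR, and treat comparisons as independent (a simplifying assumption, since real score errors are correlated). Then the chance that a probe from someone not enrolled clears the threshold against at least one of N templates is 1 - (1 - FAR)^N.

```python
def gallery_false_match_probability(far: float, gallery_size: int) -> float:
    """P(at least one false accept) over `gallery_size` independent non-mated comparisons."""
    if not 0.0 <= far <= 1.0:
        raise ValueError("far must be a probability")
    return 1.0 - (1.0 - far) ** gallery_size


for n in (1, 100, 1_000, 10_000):
    print(n, round(gallery_false_match_probability(0.001, n), 3))
```

Under this model, a FAR of 0.001 that looks strict for a single verification gives about a 0.63 chance of some false match against 1,000 templates.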

Local storage design affects product behavior

This repo stores one embedding per successful onboard call. That keeps the design simple, but it means:

duplicate enrollments create multiple templates
no clustering or template consolidation happens automatically
metadata is descriptive only; it is not used for scoring
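Because duplicate enrollments produce multiple templates, a raw top-k list can name the same subject more than once. The sketch below shows a hypothetical post-processing step that keeps each subject's best score; the README does not say Yeti does this. To still return k distinct subjects, the step would run over the full sorted score list before truncating to k.

```python
def best_per_subject(candidates: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Collapse (subject_id, score) pairs to each subject's best score, highest first."""
    best: dict[str, float] = {}
    for subject_id, score in candidates:
        if score > best.get(subject_id, float("-inf")):
            best[subject_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```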

Testing

Run Python tests:

uv run pytest

Run frontend checks:

cd apps/web
npm run typecheck
npm run build

Troubleshooting

If the models directory is empty, rerun uv run yeti-download-models --manifest examples/model-manifest.yaml --output-dir models.
If the detector fails to initialize, verify that models/face_detection_yunet_2023mar.onnx is a real ONNX model file and not a bad download.
If glintr100.onnx fails to load, verify the file is complete and not truncated.
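If every liveness result looks wrong, check three things. First, check the chosen model file. Second, check the liveness preprocessing: the enlarged crop, BGR-to-RGB conversion, 80x80 resize, [0, 1] scaling, and class index 2 as live. Third, check whether liveness was actually enabled for that request.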
If the web app cannot reach the API, confirm the API is serving at http://127.0.0.1:8000 or set NEXT_PUBLIC_YETI_API_BASE.
If onboarding appears to work but later identification finds nothing, inspect .yeti-store/templates.npz and .yeti-store/metadata.json to confirm templates were written where you expect.
