Skip to content

Face Quality and Adaptive Density-Aware Face Clustering#1290

Open
rohan-pandeyy wants to merge 6 commits into
AOSSIE-Org:mainfrom
rohan-pandeyy:feat/adaptive-density-aware-face-clustering
Open

Face Quality and Adaptive Density-Aware Face Clustering#1290
rohan-pandeyy wants to merge 6 commits into
AOSSIE-Org:mainfrom
rohan-pandeyy:feat/adaptive-density-aware-face-clustering

Conversation

@rohan-pandeyy
Copy link
Copy Markdown
Contributor

@rohan-pandeyy rohan-pandeyy commented May 25, 2026

Fixes #1271

Summary

This pull request introduces a comprehensive quality gate for face detection and clustering, making the clustering process more robust by filtering out low-quality faces based on configurable thresholds. It also tracks and reports the number of faces skipped due to these quality checks throughout the pipeline, from detection to clustering and API responses. Additionally, the clustering logic now adapts the DBSCAN eps parameter based on the data distribution for improved clustering accuracy.

Key improvements and features:

Face Quality Filtering

  • Added a face_passes_quality_gate utility in face_quality.py to filter detected faces based on confidence, size, and blur, using configurable thresholds (PICTO_CLUSTERING_CONF_THRESHOLD, PICTO_CLUSTERING_BLUR_THRESHOLD, PICTO_CLUSTERING_MIN_FACE_SIZE). This is now integrated into the face detection pipeline, ensuring only high-quality faces are processed. [1] [2] [3] [4]

Configurability

  • Introduced new clustering and quality thresholds in settings.py (e.g., PICTO_CLUSTERING_EPS, PICTO_CLUSTERING_SIMILARITY_THRESHOLD, etc.) to allow runtime tuning of clustering and filtering behavior.

Clustering Logic Enhancements

  • The clustering function now adaptively estimates the DBSCAN eps parameter based on the data using a new estimate_eps function, leading to more data-driven clustering. [1] [2]
  • All clustering and cluster assignment functions now return the number of faces skipped due to quality or embedding validation, and this is propagated through the pipeline. [1] [2] [3] [4] [5] [6] [7]

API and Schema Updates

  • The /recluster endpoint and corresponding response schema now report the number of faces skipped during clustering, giving users visibility into data quality issues. [1] [2]

Pipeline Integration

  • The image processing pipeline and face detection now accumulate and return the total number of faces skipped due to quality checks, ensuring accurate reporting and traceability. [1] [2] [3]

These changes collectively improve clustering reliability, transparency, and configurability, and provide better diagnostics for skipped/filtered faces.

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Check one of the checkboxes below:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: TODO

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • New Features

    • Face detection adds automated quality checks and per-face skip counting.
    • Reclustering now reports skipped faces alongside clusters created.
    • Clustering gains adaptive parameter estimation and configurable clustering thresholds.
    • New global alert UI notifies when reclustering skips faces.
  • Tests

    • Unit tests added for clustering behavior, adaptive estimation, and face-quality checks.
  • Docs

    • OpenAPI schema updated to include skipped-face info in recluster responses.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a72e8e0-d0a4-472e-8810-9e2f61f842b5

📥 Commits

Reviewing files that changed from the base of the PR and between 7ad239f and 31b5b59.

📒 Files selected for processing (2)
  • backend/app/config/settings.py
  • backend/tests/test_face_clusters.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/tests/test_face_clusters.py

Walkthrough

Centralizes clustering config, adds a pre-embedding face quality gate, implements adaptive eps estimation, propagates skipped-face counts through clustering utilities and API, and surfaces skipped counts in the frontend via a global alert.

Changes

Adaptive Clustering and Quality Gating

Layer / File(s) Summary
Clustering Configuration Foundation
backend/app/config/settings.py
Introduces _get_env_float/_get_env_int helpers and module-level PICTO_CLUSTERING_* constants populated from env vars with defaults and bounds.
Face Quality Gate & Detection Integration
backend/app/utils/face_quality.py, backend/app/models/FaceDetector.py, backend/app/utils/images.py
Adds face_passes_quality_gate (confidence, bbox size, Laplacian blur) and integrates it into FaceDetector; detection path now counts and returns faces_skipped. Images utility accumulates skipped counts.
Adaptive EPS and Clustering Utilities
backend/app/utils/face_clusters.py
Adds estimate_eps() using k‑NN distances; updates clustering functions to validate embeddings, propagate total_faces_skipped, return (results, total_faces_skipped), and use config merge threshold when needed; adaptive eps falls back to config when insufficient data.
API Response Contract Extension
backend/app/schemas/face_clusters.py, backend/app/routes/face_clusters.py, docs/backend/backend_python/openapi.json
Extends GlobalReclusterData with optional faces_skipped. Route unpacks (result, total_faces_skipped) and populates both clusters_created and faces_skipped in responses. OpenAPI updated (schema relocation/reorder observed).
Frontend Type Safety and Redux State
frontend/src/api/api-functions/face_clusters.ts, frontend/src/features/globalAlertSlice.ts, frontend/src/app/store.ts
Adds GlobalReclusterData TS interface and typed API return; introduces globalAlert slice and wires its reducer into the store.
Global Alert UI and Integration
frontend/src/components/GlobalAlert/GlobalAlert.tsx, frontend/src/App.tsx, frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx
Adds GlobalAlert component (auto-dismiss, manual close), mounts it in App, and dispatches showGlobalAlert from Settings page when faces_skipped > 0.
Clustering and Quality Gate Tests
backend/tests/test_face_clusters.py
Adds synthetic embedding generators, mock DB fixtures, and tests verifying clustering regression, adaptive- eps stability/fallback, and quality-gate pass/fail cases.

Sequence Diagram(s)

sequenceDiagram
  participant FaceDetector
  participant QualityGate as face_passes_quality_gate
  participant EmbeddingSvc as FaceNetEmbedding
  participant EpsEstimator as estimate_eps
  participant DBSCAN
  participant Assigner as ClusterAssigner
  participant API as trigger_global_reclustering
  participant Client

  FaceDetector->>QualityGate: crop, bbox, conf_score
  QualityGate-->>FaceDetector: pass / fail (increment faces_skipped if fail)
  FaceDetector->>EmbeddingSvc: preprocess face crop -> embedding
  EmbeddingSvc->>EpsEstimator: embeddings (k = min_samples)
  EpsEstimator-->>DBSCAN: eps (or None -> use config)
  DBSCAN->>Assigner: cluster labels
  Assigner-->>API: (clusters_created, total_faces_skipped)
  API-->>Client: GlobalReclusterResponse{clusters_created, faces_skipped}
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • AOSSIE-Org/PictoPy#560: Related global reclustering endpoint changes; this PR extends the response to include faces_skipped.
  • AOSSIE-Org/PictoPy#771: Overlaps clustering logic changes in backend/app/utils/face_clusters.py.

Suggested labels

Python, TypeScript/JavaScript, Documentation

Suggested reviewers

  • rahulharpal1603

Poem

🐰 I hop through embeddings late at night,
Skipping the blurred and the tiny in sight,
DBSCAN learns the local beat,
Clusters settle tidy and neat,
Alerts chirp gently — count the faces skipped right.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Face Quality and Adaptive Density-Aware Face Clustering' clearly summarizes the main changes: introduction of a face quality gate and adaptive DBSCAN eps estimation, which are the primary objectives of the linked issue #1271.
Linked Issues check ✅ Passed All coding requirements from issue #1271 are met: face quality gate implemented with blur/size/confidence checks [face_quality.py], adaptive eps estimation via k-NN distances added [estimate_eps], clustering parameters centralized in settings.py with env-var overrides, faces_skipped counts propagated through backend pipeline and API responses, frontend TypeScript typing added, and UI toast alert implemented for skipped faces.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #1271 objectives: face quality gate, adaptive eps estimation, settings centralization, faces_skipped propagation, API schema/response updates, and frontend UI integration. The GlobalAlert component addition supports the required UI toast feature for reporting skipped faces.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/utils/face_clusters.py (1)

232-233: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix tuple contract on empty-input return path.

At Line 233, cluster_util_cluster_all_face_embeddings returns [], but cluster_util_face_clusters_sync (Line 112) unpacks two values. This raises unpacking errors when there are no faces.

Suggested fix
     if not faces_data:
-        return []
+        return [], 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/utils/face_clusters.py` around lines 232 - 233, The early-return
in cluster_util_cluster_all_face_embeddings currently returns [] but
cluster_util_face_clusters_sync expects two values; change the empty-input
return to match the tuple contract (e.g., return an empty list and an empty
dict) so callers can safely unpack: update
cluster_util_cluster_all_face_embeddings to return ([], {}) when faces_data is
falsy, ensuring the shape matches what cluster_util_face_clusters_sync unpacks.
🧹 Nitpick comments (1)
backend/tests/test_face_clusters.py (1)

468-614: ⚡ Quick win

Add an empty-dataset regression test for the new tuple return contract.

Please add a test that patches db_get_all_faces_with_cluster_names to [] and asserts cluster_util_cluster_all_face_embeddings() returns ([], 0). This would catch the current unpacking failure path early.

As per coding guidelines "Verify that all critical functionality is covered by tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/tests/test_face_clusters.py` around lines 468 - 614, Add a unit test
that patches db_get_all_faces_with_cluster_names to return an empty list and
asserts cluster_util_cluster_all_face_embeddings() returns ([], 0);
specifically, create a new test method (e.g.,
test_empty_dataset_returns_empty_and_zero) in TestFaceClusteringAlgo, use
`@patch`("app.utils.face_clusters.db_get_all_faces_with_cluster_names") to set
mock_db_get.return_value = [], call cluster_util_cluster_all_face_embeddings()
with default args, and assert the returned tuple equals ([], 0) to prevent the
current unpacking failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/config/settings.py`:
- Around line 40-56: The clustering env var parsing (PICTO_CLUSTERING_EPS,
PICTO_CLUSTERING_MIN_SAMPLES, PICTO_CLUSTERING_SIMILARITY_THRESHOLD,
PICTO_CLUSTERING_MERGE_THRESHOLD, PICTO_CLUSTERING_CONF_THRESHOLD,
PICTO_CLUSTERING_BLUR_THRESHOLD, PICTO_CLUSTERING_MIN_FACE_SIZE) needs robust
validation: replace the direct int/float casts with guarded parsing (try/except)
or helper functions that return defaults on parse failure, then clamp numeric
results to safe ranges (e.g., eps/similarity/merge/conf in [0.0,1.0],
min_samples >= 1, blur_threshold >= 0, min_face_size >= 1) and log or raise a
clear configuration error if values are out-of-bounds; update the constants
initialization to use these validators so malformed env vars won't crash startup
or produce invalid clustering behavior.

In `@backend/app/utils/face_quality.py`:
- Around line 27-35: In face_passes_quality_gate(), add an early-return guard
for empty/invalid face crops before the grayscale conversion: check that
face_crop is not None and face_crop.size > 0 (and that shape dimensions are
valid) right before the existing "if len(face_crop.shape) == 3:" block; if the
crop is empty or invalid, return False to avoid calling cv2.cvtColor /
cv2.Laplacian and crashing (this protects the variance = cv2.Laplacian(...) line
that uses blur_threshold).

In `@frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx`:
- Line 45: The alert message currently states that skipped faces were due to
"invalid embeddings" but facesSkipped includes other filters (e.g.,
quality-gated faces); update the text in ApplicationControlsCard (where
facesSkipped is used) to reflect the broader reason (e.g., "were skipped during
clustering due to filtering or invalid embeddings" or "due to filtering (e.g.,
low quality) or invalid embeddings") so the alert is accurate and not
misleading.

---

Outside diff comments:
In `@backend/app/utils/face_clusters.py`:
- Around line 232-233: The early-return in
cluster_util_cluster_all_face_embeddings currently returns [] but
cluster_util_face_clusters_sync expects two values; change the empty-input
return to match the tuple contract (e.g., return an empty list and an empty
dict) so callers can safely unpack: update
cluster_util_cluster_all_face_embeddings to return ([], {}) when faces_data is
falsy, ensuring the shape matches what cluster_util_face_clusters_sync unpacks.

---

Nitpick comments:
In `@backend/tests/test_face_clusters.py`:
- Around line 468-614: Add a unit test that patches
db_get_all_faces_with_cluster_names to return an empty list and asserts
cluster_util_cluster_all_face_embeddings() returns ([], 0); specifically, create
a new test method (e.g., test_empty_dataset_returns_empty_and_zero) in
TestFaceClusteringAlgo, use
`@patch`("app.utils.face_clusters.db_get_all_faces_with_cluster_names") to set
mock_db_get.return_value = [], call cluster_util_cluster_all_face_embeddings()
with default args, and assert the returned tuple equals ([], 0) to prevent the
current unpacking failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07fa7eb3-5305-4d3e-ab5e-e93fb9da28fd

📥 Commits

Reviewing files that changed from the base of the PR and between 0528751 and e11cc68.

📒 Files selected for processing (15)
  • backend/app/config/settings.py
  • backend/app/models/FaceDetector.py
  • backend/app/routes/face_clusters.py
  • backend/app/schemas/face_clusters.py
  • backend/app/utils/face_clusters.py
  • backend/app/utils/face_quality.py
  • backend/app/utils/images.py
  • backend/tests/test_face_clusters.py
  • docs/backend/backend_python/openapi.json
  • frontend/src/App.tsx
  • frontend/src/api/api-functions/face_clusters.ts
  • frontend/src/app/store.ts
  • frontend/src/components/GlobalAlert/GlobalAlert.tsx
  • frontend/src/features/globalAlertSlice.ts
  • frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx

Comment thread backend/app/config/settings.py Outdated
Comment thread backend/app/utils/face_quality.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/config/settings.py`:
- Around line 115-117: The PICTO_CLUSTERING_MIN_SAMPLES configuration currently
allows a value of 1 which can reintroduce the chaining regression; update the
call to _get_env_int that defines PICTO_CLUSTERING_MIN_SAMPLES so its min_value
is 2 (i.e., change the min_value argument from 1 to 2) to enforce min_samples >=
2 and prevent the chaining issue.

In `@backend/tests/test_face_clusters.py`:
- Around line 615-627: The test currently uses bbox=(0, 0, 0, 0) so the
min-face-size check triggers before empty-crop handling; change the bbox to a
non-zero, sufficiently large box (for example bbox=(0, 0, 500, 500)) so the
min_face_size gate won't fail first and the call to
face_passes_quality_gate(face_crop=empty_crop, bbox=..., ...) will exercise the
empty-crop branch; keep the other args (conf_score, conf_threshold,
blur_threshold, min_face_size) the same and assert the result is False.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c21ef6ee-8549-43b7-b361-5a8946270f3e

📥 Commits

Reviewing files that changed from the base of the PR and between e11cc68 and 7ad239f.

📒 Files selected for processing (3)
  • backend/app/config/settings.py
  • backend/app/utils/face_quality.py
  • backend/tests/test_face_clusters.py

Comment thread backend/app/config/settings.py
Comment thread backend/tests/test_face_clusters.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adaptive Density-Aware Face Clustering

1 participant