Face Quality and Adaptive Density-Aware Face Clustering by rohan-pandeyy · Pull Request #1290 · AOSSIE-Org/PictoPy

rohan-pandeyy · 2026-05-25T11:24:22Z

Summary

This pull request introduces a comprehensive quality gate for face detection and clustering, making the clustering process more robust by filtering out low-quality faces based on configurable thresholds. It also tracks and reports the number of faces skipped due to these quality checks throughout the pipeline, from detection to clustering and API responses. Additionally, the clustering logic now adapts the DBSCAN eps parameter based on the data distribution for improved clustering accuracy.

Key improvements and features:

Face Quality Filtering

Added a face_passes_quality_gate utility in face_quality.py to filter detected faces based on confidence, size, and blur, using configurable thresholds (PICTO_CLUSTERING_CONF_THRESHOLD, PICTO_CLUSTERING_BLUR_THRESHOLD, PICTO_CLUSTERING_MIN_FACE_SIZE). This is now integrated into the face detection pipeline, ensuring only high-quality faces are processed. [1] [2] [3] [4]

Configurability

Introduced new clustering and quality thresholds in settings.py (e.g., PICTO_CLUSTERING_EPS, PICTO_CLUSTERING_SIMILARITY_THRESHOLD, etc.) to allow runtime tuning of clustering and filtering behavior.

Clustering Logic Enhancements

The clustering function now adaptively estimates the DBSCAN eps parameter based on the data using a new estimate_eps function, leading to more data-driven clustering. [1] [2]
All clustering and cluster assignment functions now return the number of faces skipped due to quality or embedding validation, and this is propagated through the pipeline. [1] [2] [3] [4] [5] [6] [7]

API and Schema Updates

The /recluster endpoint and corresponding response schema now report the number of faces skipped during clustering, giving users visibility into data quality issues. [1] [2]

Pipeline Integration

The image processing pipeline and face detection now accumulate and return the total number of faces skipped due to quality checks, ensuring accurate reporting and traceability. [1] [2] [3]

These changes collectively improve clustering reliability, transparency, and configurability, and provide better diagnostics for skipped/filtered faces.

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Check one of the checkboxes below:

This PR does not contain AI-generated code at all.
This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: TODO

Checklist

My PR addresses a single issue, fixes a single bug or makes a single improvement.
My code follows the project's code style and conventions
If applicable, I have made corresponding changes or additions to the documentation
If applicable, I have made corresponding changes or additions to tests
My changes generate no new warnings or errors
I have joined the Discord server and I will share a link to this PR with the project maintainers there
I have read the Contribution Guidelines
Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

New Features
- Face detection adds automated quality checks and per-face skip counting.
- Reclustering now reports skipped faces alongside clusters created.
- Clustering gains adaptive parameter estimation and configurable clustering thresholds.
- New global alert UI notifies when reclustering skips faces.
Tests
- Unit tests added for clustering behavior, adaptive estimation, and face-quality checks.
Docs
- OpenAPI schema updated to include skipped-face info in recluster responses.

…ize clustering parameters

…ontend

…ps stability

coderabbitai · 2026-05-25T11:24:35Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a72e8e0-d0a4-472e-8810-9e2f61f842b5

📥 Commits

Reviewing files that changed from the base of the PR and between 7ad239f and 31b5b59.

📒 Files selected for processing (2)

backend/app/config/settings.py
backend/tests/test_face_clusters.py

🚧 Files skipped from review as they are similar to previous changes (1)

backend/tests/test_face_clusters.py

Walkthrough

Centralizes clustering config, adds a pre-embedding face quality gate, implements adaptive eps estimation, propagates skipped-face counts through clustering utilities and API, and surfaces skipped counts in the frontend via a global alert.

Changes

Adaptive Clustering and Quality Gating

Layer / File(s)	Summary
Clustering Configuration Foundation `backend/app/config/settings.py`	Introduces `_get_env_float`/`_get_env_int` helpers and module-level `PICTO_CLUSTERING_*` constants populated from env vars with defaults and bounds.
Face Quality Gate & Detection Integration `backend/app/utils/face_quality.py`, `backend/app/models/FaceDetector.py`, `backend/app/utils/images.py`	Adds `face_passes_quality_gate` (confidence, bbox size, Laplacian blur) and integrates it into FaceDetector; detection path now counts and returns `faces_skipped`. Images utility accumulates skipped counts.
Adaptive EPS and Clustering Utilities `backend/app/utils/face_clusters.py`	Adds `estimate_eps()` using k‑NN distances; updates clustering functions to validate embeddings, propagate `total_faces_skipped`, return `(results, total_faces_skipped)`, and use config merge threshold when needed; adaptive eps falls back to config when insufficient data.
API Response Contract Extension `backend/app/schemas/face_clusters.py`, `backend/app/routes/face_clusters.py`, `docs/backend/backend_python/openapi.json`	Extends `GlobalReclusterData` with optional `faces_skipped`. Route unpacks `(result, total_faces_skipped)` and populates both `clusters_created` and `faces_skipped` in responses. OpenAPI updated (schema relocation/reorder observed).
Frontend Type Safety and Redux State `frontend/src/api/api-functions/face_clusters.ts`, `frontend/src/features/globalAlertSlice.ts`, `frontend/src/app/store.ts`	Adds `GlobalReclusterData` TS interface and typed API return; introduces `globalAlert` slice and wires its reducer into the store.
Global Alert UI and Integration `frontend/src/components/GlobalAlert/GlobalAlert.tsx`, `frontend/src/App.tsx`, `frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx`	Adds `GlobalAlert` component (auto-dismiss, manual close), mounts it in App, and dispatches `showGlobalAlert` from Settings page when `faces_skipped > 0`.
Clustering and Quality Gate Tests `backend/tests/test_face_clusters.py`	Adds synthetic embedding generators, mock DB fixtures, and tests verifying clustering regression, adaptive- eps stability/fallback, and quality-gate pass/fail cases.

Sequence Diagram(s)

sequenceDiagram
  participant FaceDetector
  participant QualityGate as face_passes_quality_gate
  participant EmbeddingSvc as FaceNetEmbedding
  participant EpsEstimator as estimate_eps
  participant DBSCAN
  participant Assigner as ClusterAssigner
  participant API as trigger_global_reclustering
  participant Client

  FaceDetector->>QualityGate: crop, bbox, conf_score
  QualityGate-->>FaceDetector: pass / fail (increment faces_skipped if fail)
  FaceDetector->>EmbeddingSvc: preprocess face crop -> embedding
  EmbeddingSvc->>EpsEstimator: embeddings (k = min_samples)
  EpsEstimator-->>DBSCAN: eps (or None -> use config)
  DBSCAN->>Assigner: cluster labels
  Assigner-->>API: (clusters_created, total_faces_skipped)
  API-->>Client: GlobalReclusterResponse{clusters_created, faces_skipped}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

AOSSIE-Org/PictoPy#560: Related global reclustering endpoint changes; this PR extends the response to include faces_skipped.
AOSSIE-Org/PictoPy#771: Overlaps clustering logic changes in backend/app/utils/face_clusters.py.

Suggested labels

Python, TypeScript/JavaScript, Documentation

Suggested reviewers

rahulharpal1603

Poem

🐰 I hop through embeddings late at night,
Skipping the blurred and the tiny in sight,
DBSCAN learns the local beat,
Clusters settle tidy and neat,
Alerts chirp gently — count the faces skipped right.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Face Quality and Adaptive Density-Aware Face Clustering' clearly summarizes the main changes: introduction of a face quality gate and adaptive DBSCAN eps estimation, which are the primary objectives of the linked issue `#1271`.
Linked Issues check	✅ Passed	All coding requirements from issue `#1271` are met: face quality gate implemented with blur/size/confidence checks [face_quality.py], adaptive eps estimation via k-NN distances added [estimate_eps], clustering parameters centralized in settings.py with env-var overrides, faces_skipped counts propagated through backend pipeline and API responses, frontend TypeScript typing added, and UI toast alert implemented for skipped faces.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#1271` objectives: face quality gate, adaptive eps estimation, settings centralization, faces_skipped propagation, API schema/response updates, and frontend UI integration. The GlobalAlert component addition supports the required UI toast feature for reporting skipped faces.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/app/utils/face_clusters.py (1)
232-233: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix tuple contract on empty-input return path.

At Line 233, cluster_util_cluster_all_face_embeddings returns [], but cluster_util_face_clusters_sync (Line 112) unpacks two values. This raises unpacking errors when there are no faces.
Suggested fix
     if not faces_data:
-        return []
+        return [], 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/utils/face_clusters.py` around lines 232 - 233, The early-return
in cluster_util_cluster_all_face_embeddings currently returns [] but
cluster_util_face_clusters_sync expects two values; change the empty-input
return to match the tuple contract (e.g., return an empty list and an empty
dict) so callers can safely unpack: update
cluster_util_cluster_all_face_embeddings to return ([], {}) when faces_data is
falsy, ensuring the shape matches what cluster_util_face_clusters_sync unpacks.

🧹 Nitpick comments (1)

backend/tests/test_face_clusters.py (1)
468-614: ⚡ Quick win

Add an empty-dataset regression test for the new tuple return contract.

Please add a test that patches db_get_all_faces_with_cluster_names to [] and asserts cluster_util_cluster_all_face_embeddings() returns ([], 0). This would catch the current unpacking failure path early.

As per coding guidelines "Verify that all critical functionality is covered by tests".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/tests/test_face_clusters.py` around lines 468 - 614, Add a unit test
that patches db_get_all_faces_with_cluster_names to return an empty list and
asserts cluster_util_cluster_all_face_embeddings() returns ([], 0);
specifically, create a new test method (e.g.,
test_empty_dataset_returns_empty_and_zero) in TestFaceClusteringAlgo, use
`@patch`("app.utils.face_clusters.db_get_all_faces_with_cluster_names") to set
mock_db_get.return_value = [], call cluster_util_cluster_all_face_embeddings()
with default args, and assert the returned tuple equals ([], 0) to prevent the
current unpacking failure.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/config/settings.py`:
- Around line 40-56: The clustering env var parsing (PICTO_CLUSTERING_EPS,
PICTO_CLUSTERING_MIN_SAMPLES, PICTO_CLUSTERING_SIMILARITY_THRESHOLD,
PICTO_CLUSTERING_MERGE_THRESHOLD, PICTO_CLUSTERING_CONF_THRESHOLD,
PICTO_CLUSTERING_BLUR_THRESHOLD, PICTO_CLUSTERING_MIN_FACE_SIZE) needs robust
validation: replace the direct int/float casts with guarded parsing (try/except)
or helper functions that return defaults on parse failure, then clamp numeric
results to safe ranges (e.g., eps/similarity/merge/conf in [0.0,1.0],
min_samples >= 1, blur_threshold >= 0, min_face_size >= 1) and log or raise a
clear configuration error if values are out-of-bounds; update the constants
initialization to use these validators so malformed env vars won't crash startup
or produce invalid clustering behavior.

In `@backend/app/utils/face_quality.py`:
- Around line 27-35: In face_passes_quality_gate(), add an early-return guard
for empty/invalid face crops before the grayscale conversion: check that
face_crop is not None and face_crop.size > 0 (and that shape dimensions are
valid) right before the existing "if len(face_crop.shape) == 3:" block; if the
crop is empty or invalid, return False to avoid calling cv2.cvtColor /
cv2.Laplacian and crashing (this protects the variance = cv2.Laplacian(...) line
that uses blur_threshold).

In `@frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx`:
- Line 45: The alert message currently states that skipped faces were due to
"invalid embeddings" but facesSkipped includes other filters (e.g.,
quality-gated faces); update the text in ApplicationControlsCard (where
facesSkipped is used) to reflect the broader reason (e.g., "were skipped during
clustering due to filtering or invalid embeddings" or "due to filtering (e.g.,
low quality) or invalid embeddings") so the alert is accurate and not
misleading.

---

Outside diff comments:
In `@backend/app/utils/face_clusters.py`:
- Around line 232-233: The early-return in
cluster_util_cluster_all_face_embeddings currently returns [] but
cluster_util_face_clusters_sync expects two values; change the empty-input
return to match the tuple contract (e.g., return an empty list and an empty
dict) so callers can safely unpack: update
cluster_util_cluster_all_face_embeddings to return ([], {}) when faces_data is
falsy, ensuring the shape matches what cluster_util_face_clusters_sync unpacks.

---

Nitpick comments:
In `@backend/tests/test_face_clusters.py`:
- Around line 468-614: Add a unit test that patches
db_get_all_faces_with_cluster_names to return an empty list and asserts
cluster_util_cluster_all_face_embeddings() returns ([], 0); specifically, create
a new test method (e.g., test_empty_dataset_returns_empty_and_zero) in
TestFaceClusteringAlgo, use
`@patch`("app.utils.face_clusters.db_get_all_faces_with_cluster_names") to set
mock_db_get.return_value = [], call cluster_util_cluster_all_face_embeddings()
with default args, and assert the returned tuple equals ([], 0) to prevent the
current unpacking failure.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07fa7eb3-5305-4d3e-ab5e-e93fb9da28fd

📥 Commits

Reviewing files that changed from the base of the PR and between 0528751 and e11cc68.

📒 Files selected for processing (15)

backend/app/config/settings.py
backend/app/models/FaceDetector.py
backend/app/routes/face_clusters.py
backend/app/schemas/face_clusters.py
backend/app/utils/face_clusters.py
backend/app/utils/face_quality.py
backend/app/utils/images.py
backend/tests/test_face_clusters.py
docs/backend/backend_python/openapi.json
frontend/src/App.tsx
frontend/src/api/api-functions/face_clusters.ts
frontend/src/app/store.ts
frontend/src/components/GlobalAlert/GlobalAlert.tsx
frontend/src/features/globalAlertSlice.ts
frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx

…rameters

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/config/settings.py`:
- Around line 115-117: The PICTO_CLUSTERING_MIN_SAMPLES configuration currently
allows a value of 1 which can reintroduce the chaining regression; update the
call to _get_env_int that defines PICTO_CLUSTERING_MIN_SAMPLES so its min_value
is 2 (i.e., change the min_value argument from 1 to 2) to enforce min_samples >=
2 and prevent the chaining issue.

In `@backend/tests/test_face_clusters.py`:
- Around line 615-627: The test currently uses bbox=(0, 0, 0, 0) so the
min-face-size check triggers before empty-crop handling; change the bbox to a
non-zero, sufficiently large box (for example bbox=(0, 0, 500, 500)) so the
min_face_size gate won't fail first and the call to
face_passes_quality_gate(face_crop=empty_crop, bbox=..., ...) will exercise the
empty-crop branch; keep the other args (conf_score, conf_threshold,
blur_threshold, min_face_size) the same and assert the result is False.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c21ef6ee-8549-43b7-b361-5a8946270f3e

📥 Commits

Reviewing files that changed from the base of the PR and between e11cc68 and 7ad239f.

📒 Files selected for processing (3)

backend/app/config/settings.py
backend/app/utils/face_quality.py
backend/tests/test_face_clusters.py

rohan-pandeyy added 4 commits May 21, 2026 23:48

feat(backend): implement face quality parsing with DBSCAN and central…

bebafe6

…ize clustering parameters

feat(backend): Include skipped face clusters images logic to backend

fdccbbe

feat(frontend): add GlobalAlert and skipped face_clusters logic to fr…

3156996

…ontend

tests(backend): add algo check tests for quality check and adaptive e…

e11cc68

…ps stability

github-actions Bot added GSoC 2026 backend enhancement New feature or request frontend labels May 25, 2026

coderabbitai Bot requested changes May 25, 2026

View reviewed changes

Comment thread backend/app/config/settings.py Outdated

Comment thread backend/app/utils/face_quality.py

Comment thread frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx

feat: add face quality validation utility and configure clustering pa…

7ad239f

…rameters

coderabbitai Bot requested changes May 26, 2026

View reviewed changes

Comment thread backend/app/config/settings.py

Comment thread backend/tests/test_face_clusters.py

update face clustering configuration settings and endpoint tests

31b5b59

coderabbitai Bot approved these changes May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Face Quality and Adaptive Density-Aware Face Clustering#1290

Face Quality and Adaptive Density-Aware Face Clustering#1290
rohan-pandeyy wants to merge 6 commits into
AOSSIE-Org:mainfrom
rohan-pandeyy:feat/adaptive-density-aware-face-clustering

rohan-pandeyy commented May 25, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 25, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

rohan-pandeyy commented May 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Face Quality Filtering

Configurability

Clustering Logic Enhancements

API and Schema Updates

Pipeline Integration

AI Usage Disclosure:

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rohan-pandeyy commented May 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading