Face Quality and Adaptive Density-Aware Face Clustering#1290
Face Quality and Adaptive Density-Aware Face Clustering#1290rohan-pandeyy wants to merge 6 commits into
Conversation
…ize clustering parameters
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughCentralizes clustering config, adds a pre-embedding face quality gate, implements adaptive eps estimation, propagates skipped-face counts through clustering utilities and API, and surfaces skipped counts in the frontend via a global alert. ChangesAdaptive Clustering and Quality Gating
Sequence Diagram(s) sequenceDiagram
participant FaceDetector
participant QualityGate as face_passes_quality_gate
participant EmbeddingSvc as FaceNetEmbedding
participant EpsEstimator as estimate_eps
participant DBSCAN
participant Assigner as ClusterAssigner
participant API as trigger_global_reclustering
participant Client
FaceDetector->>QualityGate: crop, bbox, conf_score
QualityGate-->>FaceDetector: pass / fail (increment faces_skipped if fail)
FaceDetector->>EmbeddingSvc: preprocess face crop -> embedding
EmbeddingSvc->>EpsEstimator: embeddings (k = min_samples)
EpsEstimator-->>DBSCAN: eps (or None -> use config)
DBSCAN->>Assigner: cluster labels
Assigner-->>API: (clusters_created, total_faces_skipped)
API-->>Client: GlobalReclusterResponse{clusters_created, faces_skipped}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/utils/face_clusters.py (1)
232-233:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFix tuple contract on empty-input return path.
At Line 233,
cluster_util_cluster_all_face_embeddingsreturns[], butcluster_util_face_clusters_sync(Line 112) unpacks two values. This raises unpacking errors when there are no faces.Suggested fix
if not faces_data: - return [] + return [], 0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/utils/face_clusters.py` around lines 232 - 233, The early-return in cluster_util_cluster_all_face_embeddings currently returns [] but cluster_util_face_clusters_sync expects two values; change the empty-input return to match the tuple contract (e.g., return an empty list and an empty dict) so callers can safely unpack: update cluster_util_cluster_all_face_embeddings to return ([], {}) when faces_data is falsy, ensuring the shape matches what cluster_util_face_clusters_sync unpacks.
🧹 Nitpick comments (1)
backend/tests/test_face_clusters.py (1)
468-614: ⚡ Quick winAdd an empty-dataset regression test for the new tuple return contract.
Please add a test that patches
db_get_all_faces_with_cluster_namesto[]and assertscluster_util_cluster_all_face_embeddings()returns([], 0). This would catch the current unpacking failure path early.As per coding guidelines "Verify that all critical functionality is covered by tests".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/tests/test_face_clusters.py` around lines 468 - 614, Add a unit test that patches db_get_all_faces_with_cluster_names to return an empty list and asserts cluster_util_cluster_all_face_embeddings() returns ([], 0); specifically, create a new test method (e.g., test_empty_dataset_returns_empty_and_zero) in TestFaceClusteringAlgo, use `@patch`("app.utils.face_clusters.db_get_all_faces_with_cluster_names") to set mock_db_get.return_value = [], call cluster_util_cluster_all_face_embeddings() with default args, and assert the returned tuple equals ([], 0) to prevent the current unpacking failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/config/settings.py`:
- Around line 40-56: The clustering env var parsing (PICTO_CLUSTERING_EPS,
PICTO_CLUSTERING_MIN_SAMPLES, PICTO_CLUSTERING_SIMILARITY_THRESHOLD,
PICTO_CLUSTERING_MERGE_THRESHOLD, PICTO_CLUSTERING_CONF_THRESHOLD,
PICTO_CLUSTERING_BLUR_THRESHOLD, PICTO_CLUSTERING_MIN_FACE_SIZE) needs robust
validation: replace the direct int/float casts with guarded parsing (try/except)
or helper functions that return defaults on parse failure, then clamp numeric
results to safe ranges (e.g., eps/similarity/merge/conf in [0.0,1.0],
min_samples >= 1, blur_threshold >= 0, min_face_size >= 1) and log or raise a
clear configuration error if values are out-of-bounds; update the constants
initialization to use these validators so malformed env vars won't crash startup
or produce invalid clustering behavior.
In `@backend/app/utils/face_quality.py`:
- Around line 27-35: In face_passes_quality_gate(), add an early-return guard
for empty/invalid face crops before the grayscale conversion: check that
face_crop is not None and face_crop.size > 0 (and that shape dimensions are
valid) right before the existing "if len(face_crop.shape) == 3:" block; if the
crop is empty or invalid, return False to avoid calling cv2.cvtColor /
cv2.Laplacian and crashing (this protects the variance = cv2.Laplacian(...) line
that uses blur_threshold).
In `@frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx`:
- Line 45: The alert message currently states that skipped faces were due to
"invalid embeddings" but facesSkipped includes other filters (e.g.,
quality-gated faces); update the text in ApplicationControlsCard (where
facesSkipped is used) to reflect the broader reason (e.g., "were skipped during
clustering due to filtering or invalid embeddings" or "due to filtering (e.g.,
low quality) or invalid embeddings") so the alert is accurate and not
misleading.
---
Outside diff comments:
In `@backend/app/utils/face_clusters.py`:
- Around line 232-233: The early-return in
cluster_util_cluster_all_face_embeddings currently returns [] but
cluster_util_face_clusters_sync expects two values; change the empty-input
return to match the tuple contract (e.g., return an empty list and an empty
dict) so callers can safely unpack: update
cluster_util_cluster_all_face_embeddings to return ([], {}) when faces_data is
falsy, ensuring the shape matches what cluster_util_face_clusters_sync unpacks.
---
Nitpick comments:
In `@backend/tests/test_face_clusters.py`:
- Around line 468-614: Add a unit test that patches
db_get_all_faces_with_cluster_names to return an empty list and asserts
cluster_util_cluster_all_face_embeddings() returns ([], 0); specifically, create
a new test method (e.g., test_empty_dataset_returns_empty_and_zero) in
TestFaceClusteringAlgo, use
`@patch`("app.utils.face_clusters.db_get_all_faces_with_cluster_names") to set
mock_db_get.return_value = [], call cluster_util_cluster_all_face_embeddings()
with default args, and assert the returned tuple equals ([], 0) to prevent the
current unpacking failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 07fa7eb3-5305-4d3e-ab5e-e93fb9da28fd
📒 Files selected for processing (15)
backend/app/config/settings.pybackend/app/models/FaceDetector.pybackend/app/routes/face_clusters.pybackend/app/schemas/face_clusters.pybackend/app/utils/face_clusters.pybackend/app/utils/face_quality.pybackend/app/utils/images.pybackend/tests/test_face_clusters.pydocs/backend/backend_python/openapi.jsonfrontend/src/App.tsxfrontend/src/api/api-functions/face_clusters.tsfrontend/src/app/store.tsfrontend/src/components/GlobalAlert/GlobalAlert.tsxfrontend/src/features/globalAlertSlice.tsfrontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/config/settings.py`:
- Around line 115-117: The PICTO_CLUSTERING_MIN_SAMPLES configuration currently
allows a value of 1 which can reintroduce the chaining regression; update the
call to _get_env_int that defines PICTO_CLUSTERING_MIN_SAMPLES so its min_value
is 2 (i.e., change the min_value argument from 1 to 2) to enforce min_samples >=
2 and prevent the chaining issue.
In `@backend/tests/test_face_clusters.py`:
- Around line 615-627: The test currently uses bbox=(0, 0, 0, 0) so the
min-face-size check triggers before empty-crop handling; change the bbox to a
non-zero, sufficiently large box (for example bbox=(0, 0, 500, 500)) so the
min_face_size gate won't fail first and the call to
face_passes_quality_gate(face_crop=empty_crop, bbox=..., ...) will exercise the
empty-crop branch; keep the other args (conf_score, conf_threshold,
blur_threshold, min_face_size) the same and assert the result is False.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c21ef6ee-8549-43b7-b361-5a8946270f3e
📒 Files selected for processing (3)
backend/app/config/settings.pybackend/app/utils/face_quality.pybackend/tests/test_face_clusters.py
Fixes #1271
Summary
This pull request introduces a comprehensive quality gate for face detection and clustering, making the clustering process more robust by filtering out low-quality faces based on configurable thresholds. It also tracks and reports the number of faces skipped due to these quality checks throughout the pipeline, from detection to clustering and API responses. Additionally, the clustering logic now adapts the DBSCAN
epsparameter based on the data distribution for improved clustering accuracy.Key improvements and features:
Face Quality Filtering
face_passes_quality_gateutility inface_quality.pyto filter detected faces based on confidence, size, and blur, using configurable thresholds (PICTO_CLUSTERING_CONF_THRESHOLD,PICTO_CLUSTERING_BLUR_THRESHOLD,PICTO_CLUSTERING_MIN_FACE_SIZE). This is now integrated into the face detection pipeline, ensuring only high-quality faces are processed. [1] [2] [3] [4]Configurability
settings.py(e.g.,PICTO_CLUSTERING_EPS,PICTO_CLUSTERING_SIMILARITY_THRESHOLD, etc.) to allow runtime tuning of clustering and filtering behavior.Clustering Logic Enhancements
epsparameter based on the data using a newestimate_epsfunction, leading to more data-driven clustering. [1] [2]API and Schema Updates
/reclusterendpoint and corresponding response schema now report the number of faces skipped during clustering, giving users visibility into data quality issues. [1] [2]Pipeline Integration
These changes collectively improve clustering reliability, transparency, and configurability, and provide better diagnostics for skipped/filtered faces.
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Check one of the checkboxes below:
I have used the following AI models and tools: TODO
Checklist
Summary by CodeRabbit
New Features
Tests
Docs