@atharrva01 atharrva01 commented Jan 16, 2026

Summary

This PR fixes a critical failure mode where a single missing or corrupt image could halt all AI image processing in PictoPy.
The root issue was an unchecked None return from the object classification pipeline, combined with batch-level failure behavior and missing recovery for orphaned database records.

With this change, AI tagging becomes resilient and self-healing, allowing processing to complete even when individual images are unreadable or removed from disk.


Impact

Before this fix:

  • One unreadable image could stop the entire AI tagging pipeline.
  • Remaining images in the batch were never processed.
  • The problematic image stayed permanently untagged in the database, causing repeated failures.
  • From the user’s perspective, AI tagging appeared to “never finish” with no visible error.

After this fix:

  • AI processing continues even if individual images fail.
  • Orphaned or corrupt images no longer permanently block the pipeline.
  • Face clustering reliably executes after image processing.
  • Failures are logged clearly and isolated to the affected image only.

This significantly improves reliability for users with large libraries, external storage, or frequent filesystem changes.


Steps to Reproduce (Before Fix)

  1. Add a folder containing many images (e.g. 100+).

  2. Enable AI tagging and let background processing start.

  3. While processing is running:

    • Delete or move one image from disk, or
    • Include a corrupt image file (e.g. 0-byte .jpg).
  4. Observe:

    • AI tagging stops partway through.
    • Remaining images are never processed.
    • Logs show a TypeError related to len(None).
    • Re-running AI tagging repeatedly fails on the same image.

Root Cause

  • ObjectClassifier.get_classes() explicitly returns None when an image cannot be read.
  • The caller assumes a list is always returned and calls len(classes) unconditionally.
  • This raises a runtime exception that aborts the entire batch.
  • There is no cleanup or state update for images that no longer exist on disk.
  • The design assumes filesystem stability between scanning and AI processing, which does not hold in real-world usage.
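
The failure mode listed above can be reproduced in isolation. The following is a minimal sketch, not PictoPy's actual code: `get_classes_stub` is a hypothetical stand-in for the classifier that returns None for an unreadable path, and `process_batch_unguarded` mimics a caller that calls len() unconditionally.

```python
from typing import List, Optional


def get_classes_stub(image_path: str) -> Optional[List[int]]:
    """Hypothetical stand-in for the classifier: None means 'unreadable'."""
    if image_path.endswith("corrupt.jpg"):
        return None
    return [0]  # pretend a single detection with class id 0


def process_batch_unguarded(paths: List[str], processed: List[str]) -> None:
    """Mimics the pre-fix caller, which assumes a list is always returned."""
    for path in paths:
        classes = get_classes_stub(path)
        if len(classes) > 0:  # raises TypeError when classes is None
            processed.append(path)


processed: List[str] = []
error_message = ""
try:
    process_batch_unguarded(["a.jpg", "corrupt.jpg", "b.jpg"], processed)
except TypeError as exc:
    # The exception escapes the loop, so "b.jpg" is never reached.
    error_message = str(exc)
```

In this sketch only the image before the unreadable one is processed; everything after it in the batch is skipped because the exception propagates out of the loop.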

Fix Overview

This PR makes the AI pipeline defensive at image boundaries instead of failing the entire batch:

  • Missing files are detected early and cleaned up from the database.
  • None returns from the classifier are handled explicitly.
  • Face detection failures are logged but do not abort processing.
  • Errors are isolated per image so the batch can continue.
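
The per-image isolation described above can be sketched as a loop that wraps each image in its own try/except. This is an illustrative sketch only: `process_images_isolated` and `flaky_handler` are hypothetical names, and the real change lives in backend/app/utils/images.py.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("tagging_sketch")


def process_images_isolated(
    images: Iterable[Tuple[str, str]],
    handle_one: Callable[[str, str], None],
) -> Tuple[List[str], List[str]]:
    """Run handle_one per image; a failure affects only that image."""
    succeeded: List[str] = []
    failed: List[str] = []
    for image_id, image_path in images:
        try:
            handle_one(image_id, image_path)
            succeeded.append(image_id)
        except Exception as exc:
            # Log and move on so the rest of the batch still runs.
            logger.error("Error processing image %s: %s", image_path, exc)
            failed.append(image_id)
    return succeeded, failed


def flaky_handler(image_id: str, image_path: str) -> None:
    """Hypothetical per-image step that fails for one specific image."""
    if image_id == "2":
        raise ValueError("face detection failed")


ok_ids, failed_ids = process_images_isolated(
    [("1", "a.jpg"), ("2", "b.jpg"), ("3", "c.jpg")], flaky_handler
)
```

Catching a broad `Exception` here is a deliberate trade-off in the sketch: it keeps the batch moving at the cost of hiding the failure from the caller, so the log line is the only record of what went wrong.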

Key Example

Handling unreadable images safely:

classes = object_classifier.get_classes(image_path)

if classes is None:
    logger.warning(f"Skipping image {image_path}: file not readable")
    db_update_image_tagged_status(image_id, True)
    continue

Ensuring filesystem consistency:

if not os.path.exists(image_path):
    logger.warning(f"Image file no longer exists: {image_path}")
    db_delete_images_by_ids([image_id])
    continue
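
To show how the two guards above behave together, here is a self-contained sketch under stated assumptions. `FAKE_DB` is a hypothetical in-memory stand-in for the database helpers, and `fake_get_classes` is a hypothetical classifier that treats 0-byte files as unreadable; neither reflects PictoPy's real schema or model.

```python
import logging
import os
import tempfile
from typing import Dict, List, Optional

logger = logging.getLogger("guard_sketch")

# Hypothetical in-memory "database": image_id -> record dict.
FAKE_DB: Dict[str, dict] = {}


def fake_get_classes(image_path: str) -> Optional[List[int]]:
    """Stand-in classifier: a 0-byte file counts as unreadable."""
    if os.path.getsize(image_path) == 0:
        return None
    return [0]


def tag_untagged_images() -> None:
    # Iterate over a copy because records may be deleted during the loop.
    for image_id, record in list(FAKE_DB.items()):
        image_path = record["path"]
        if not os.path.exists(image_path):
            logger.warning("Image file no longer exists: %s", image_path)
            del FAKE_DB[image_id]  # plays the role of the DB delete
            continue
        classes = fake_get_classes(image_path)
        if classes is None:
            logger.warning("Skipping image %s: file not readable", image_path)
            record["tagged"] = True  # mark as handled so it is not retried
            continue
        record["classes"] = classes
        record["tagged"] = True


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.jpg")
    empty = os.path.join(tmp, "empty.jpg")
    with open(good, "wb") as fh:
        fh.write(b"non-empty placeholder bytes")
    open(empty, "wb").close()
    FAKE_DB.update({
        "1": {"path": good, "tagged": False},
        "2": {"path": empty, "tagged": False},
        "3": {"path": os.path.join(tmp, "missing.jpg"), "tagged": False},
    })
    tag_untagged_images()
```

After the run, the missing image's record is gone, the unreadable image is marked as tagged without any classes, and the valid image is classified as usual.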

Result

  • AI tagging completes reliably even with corrupt or missing files.
  • Infinite retry loops are eliminated.
  • Face clustering runs consistently after processing.
  • Errors are visible in logs but no longer break core functionality.

This change preserves existing behavior for valid images while preventing a single bad file from breaking the entire system.

Summary by CodeRabbit

  • Bug Fixes
    • Improved robustness when handling missing or corrupted image files
    • Enhanced error handling for classification and face detection failures
    • Added comprehensive error logging for better diagnostics and troubleshooting


@github-actions
Contributor

⚠️ No issue was linked in the PR description.
Please make sure to link an issue (e.g., 'Fixes #issue_number')

@coderabbitai
Contributor

coderabbitai bot commented Jan 16, 2026

📝 Walkthrough

Enhanced image processing robustness in the utility module by adding file existence validation, conditional classification handling with None checks, optimized face detection (only for valid person classes), and per-image error handling with granular logging and conditional status updates.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Image Processing Error Handling & Validation: backend/app/utils/images.py | Added pre-processing file existence guard with database deletion for missing images; introduced None-handling for classification results; implemented conditional face detection only when person class (id 0) present with count 1-6; wrapped per-image processing in try/except with per-image error logging; made image tagging updates conditional on successful processing outcomes rather than unconditional |
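
The conditional face detection rule in the summary above can be written as a small predicate. This is a sketch under an explicit assumption: the summary does not say whether "count 1-6" refers to the total number of detections or only to person detections, and the code below reads it as the total number of detected classes. `should_detect_faces` is a hypothetical helper name.

```python
from typing import List, Optional

PERSON_CLASS_ID = 0  # class id for "person", as stated in the summary
MIN_DETECTIONS = 1   # assumed lower bound on total detections
MAX_DETECTIONS = 6   # assumed upper bound on total detections


def should_detect_faces(classes: Optional[List[int]]) -> bool:
    """Return True only when a person is present and the count is in range."""
    if not classes:
        # Covers both None (unreadable image) and an empty result.
        return False
    if PERSON_CLASS_ID not in classes:
        return False
    return MIN_DETECTIONS <= len(classes) <= MAX_DETECTIONS
```

Keeping the rule in one predicate makes the None case explicit, so face detection is never attempted on an image the classifier could not read.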

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppity-hop, the images now stand tall,
With guards that check before they fall,
No missing files shall pass our test,
Face detection works its very best—
Only when the people appear so clear! 🖼️✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately describes the main fix: handling TypeErrors that occur when image files are missing or corrupt during processing. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/app/utils/images.py`:
- Around line 115-123: When detecting a missing source file in the block that
checks os.path.exists(image_path), also remove the associated thumbnail file
referenced on the image record (thumbnailPath) before deleting the DB record;
locate the check using image_path and image_id and call
os.path.exists(thumbnailPath) then os.remove(thumbnailPath) (wrap in try/except
to log any unlink errors) and only then call db_delete_images_by_ids([image_id])
so orphaned thumbnails are cleaned up.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3af96c and 7e90942.

📒 Files selected for processing (1)
  • backend/app/utils/images.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/utils/images.py (3)
backend/app/database/images.py (3)
  • db_delete_images_by_ids (364-396)
  • db_update_image_tagged_status (264-290)
  • db_insert_image_classes_batch (293-324)
backend/app/models/ObjectClassifier.py (1)
  • get_classes (15-25)
backend/app/models/FaceDetector.py (1)
  • detect_faces (26-67)


Comment on lines +115 to +123
try:
    # Step 0: Check if image file still exists on disk
    if not os.path.exists(image_path):
        logger.warning(
            f"Image file no longer exists, removing from database: {image_path}"
        )
        db_delete_images_by_ids([image_id])
        continue

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "def db_get_untagged_images" -A 15

Repository: AOSSIE-Org/PictoPy

Length of output: 1071


🏁 Script executed:

rg -n "def db_delete_images_by_ids" -A 5

Repository: AOSSIE-Org/PictoPy

Length of output: 490


🏁 Script executed:

fd -t f "images.py" | head -20

Repository: AOSSIE-Org/PictoPy

Length of output: 179


🏁 Script executed:

rg -n "UntaggedImageRecord" -A 5 -B 2 | head -40

Repository: AOSSIE-Org/PictoPy

Length of output: 1010


🏁 Script executed:

sed -n '100,140p' backend/app/utils/images.py

Repository: AOSSIE-Org/PictoPy

Length of output: 1748


🏁 Script executed:

sed -n '364,390p' backend/app/database/images.py

Repository: AOSSIE-Org/PictoPy

Length of output: 857


🏁 Script executed:

sed -n '32,45p' backend/app/database/images.py

Repository: AOSSIE-Org/PictoPy

Length of output: 357


🏁 Script executed:

rg -n "thumbnailPath\|thumbnail_path" backend/app/utils/images.py | head -10

Repository: AOSSIE-Org/PictoPy

Length of output: 44


🏁 Script executed:

rg -n "os.remove.*thumbnail\|thumbnail.*os.remove" backend/app/

Repository: AOSSIE-Org/PictoPy

Length of output: 44


🏁 Script executed:

sed -n '217,250p' backend/app/database/images.py

Repository: AOSSIE-Org/PictoPy

Length of output: 1120


🏁 Script executed:

sed -n '95,115p' backend/app/utils/images.py

Repository: AOSSIE-Org/PictoPy

Length of output: 695


🏁 Script executed:

rg -n "image_util_classify_and_face_detect_images" backend/app/ -B 2 -A 2

Repository: AOSSIE-Org/PictoPy

Length of output: 619


Clean up thumbnails when source file is missing.

When the source image is deleted from disk, the code removes the DB record but leaves any thumbnail file orphaned. The image record includes thumbnailPath, so consider deleting it to prevent disk growth.

♻️ Suggested fix
                 if not os.path.exists(image_path):
                     logger.warning(
                         f"Image file no longer exists, removing from database: {image_path}"
                     )
+                    thumbnail_path = image.get("thumbnailPath")
+                    if thumbnail_path and os.path.exists(thumbnail_path):
+                        try:
+                            os.remove(thumbnail_path)
+                            logger.info(f"Removed orphaned thumbnail: {thumbnail_path}")
+                        except OSError as e:
+                            logger.warning(f"Error removing thumbnail {thumbnail_path}: {e}")
                     db_delete_images_by_ids([image_id])
                     continue
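
The suggested cleanup can also be factored into a small helper. The sketch below is a variant, not the reviewer's exact code: `remove_thumbnail_if_present` is a hypothetical name, and instead of checking `os.path.exists` first it attempts the removal and handles `FileNotFoundError`, which avoids a race between the check and the delete.

```python
import logging
import os
import tempfile
from typing import Optional

logger = logging.getLogger("thumbnail_sketch")


def remove_thumbnail_if_present(thumbnail_path: Optional[str]) -> bool:
    """Delete a thumbnail file; return True only if a file was removed."""
    if not thumbnail_path:
        return False
    try:
        os.remove(thumbnail_path)
    except FileNotFoundError:
        # Already gone: nothing to clean up, and not worth a warning.
        return False
    except OSError as exc:
        logger.warning("Error removing thumbnail %s: %s", thumbnail_path, exc)
        return False
    logger.info("Removed orphaned thumbnail: %s", thumbnail_path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    thumb = os.path.join(tmp, "thumb.jpg")
    open(thumb, "wb").close()
    first_call = remove_thumbnail_if_present(thumb)   # file exists
    second_call = remove_thumbnail_if_present(thumb)  # file already removed
```

Because the helper never raises, it can be called before the database delete without risking that a thumbnail problem prevents the record from being removed.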
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- try:
-     # Step 0: Check if image file still exists on disk
-     if not os.path.exists(image_path):
-         logger.warning(
-             f"Image file no longer exists, removing from database: {image_path}"
-         )
-         db_delete_images_by_ids([image_id])
-         continue
+ try:
+     # Step 0: Check if image file still exists on disk
+     if not os.path.exists(image_path):
+         logger.warning(
+             f"Image file no longer exists, removing from database: {image_path}"
+         )
+         thumbnail_path = image.get("thumbnailPath")
+         if thumbnail_path and os.path.exists(thumbnail_path):
+             try:
+                 os.remove(thumbnail_path)
+                 logger.info(f"Removed orphaned thumbnail: {thumbnail_path}")
+             except OSError as e:
+                 logger.warning(f"Error removing thumbnail {thumbnail_path}: {e}")
+         db_delete_images_by_ids([image_id])
+         continue

@atharrva01
Author

hi @rahulharpal1603,
This PR fixes a critical failure where a single missing or corrupt image could halt all AI processing. It makes image tagging resilient by safely skipping problematic files and preventing infinite retry loops, so the rest of the batch can complete successfully.
