Feat: No content-based duplicate image detection (same image added multiple times in DB) #1069

@harsh1519

Description

Describe the feature

Currently, PictoPy identifies images only by their file path. If the same image file is:

Copied to another folder

Downloaded multiple times

Renamed

Or exists in backups

…it is stored as a new independent image in the database.

This causes:

Duplicate thumbnails

Duplicate metadata

Duplicate face processing

Duplicate tagging

Waste of storage and processing

No way to detect or manage duplicates

Add ScreenShots

[Screenshot: the same image, Harsh_Shah.jpg, exists in two folders]

🔍 Current Behavior:

Images are uniquely identified by path

Same image in different folders = separate DB rows

No content hash or duplicate detection exists
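
To illustrate the gap: a content hash depends only on a file's bytes, so a copied or renamed image produces the same digest even though its path differs. Below is a minimal sketch in Python, assuming SHA-256 from the standard library. The helper name `compute_image_hash` and the chunk size are hypothetical and not taken from the PictoPy codebase.

```python
import hashlib
from pathlib import Path


def compute_image_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes (hypothetical helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in fixed-size chunks so large images are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Note that this only catches byte-identical files. A re-encoded or resized copy would hash differently and would need a perceptual hash, which is outside what this issue asks for.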

✅ Expected Behavior:

The system should compute a content hash (SHA-256 or similar)

Store it in the DB as image_hash

Allow:

Finding all images with same content

Showing duplicate groups

Letting the user decide what to do with them (keep/delete/merge)
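
The expected behavior above could be sketched roughly as follows, assuming an SQLite database. Only the column name `image_hash` comes from this issue. The table name `images`, the `path` column, the index name, and both function names are assumptions made for illustration and may not match PictoPy's actual schema.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a hypothetical schema for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (path TEXT PRIMARY KEY, image_hash TEXT NOT NULL)"
    )
    # An index on the hash keeps duplicate lookups cheap as the library grows.
    conn.execute("CREATE INDEX idx_images_hash ON images(image_hash)")
    return conn


def find_duplicate_groups(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each image_hash shared by two or more rows to its sorted paths."""
    rows = conn.execute(
        """
        SELECT image_hash, path
        FROM images
        WHERE image_hash IN (
            SELECT image_hash FROM images
            GROUP BY image_hash
            HAVING COUNT(*) > 1
        )
        ORDER BY image_hash, path
        """
    ).fetchall()
    groups: dict[str, list[str]] = {}
    for image_hash, path in rows:
        groups.setdefault(image_hash, []).append(path)
    return groups
```

Each returned group could then be shown to the user as one duplicate set, leaving the keep/delete/merge decision to them rather than removing anything automatically.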

Record

  • I agree to follow this project's Code of Conduct
  • I want to work on this issue
