Al1Nad1/CBIR-Project

🔍 Enterprise Content-Based Image Retrieval (CBIR) System

Python PyTorch Transformers Tkinter

An advanced, multimodal Content-Based Image Retrieval (CBIR) system featuring Deep Learning embeddings, Locality Sensitive Hashing (LSH), and a Thread-Safe Persistent Vector Database.

✨ Key Features

  • Multimodal Search Options:

    • Image-to-Image: Uses ResNet18 (transfer learning) to extract 512-dimensional visual features.

    • Text-to-Image: Uses OpenAI's CLIP model to allow users to search the database using natural language (e.g., "A red sports car").

  • Sub-linear Search (LSH): Implements Locality Sensitive Hashing via random projections for approximate nearest neighbor (ANN) search, so a query is compared only against the candidates in its hash bucket instead of scanning all $N$ stored vectors in $O(N)$ time.

  • Thread-Safe CRUD Database: Full Create, Read, Update, and Delete capabilities. The database persists to disk (.npz) and uses threading.Lock() to serialize concurrent read and write operations.

  • Distance Metrics: Switch between Cosine Similarity (angular distance) and Euclidean Distance (L2 Norm).

  • Interactive PCA Visualization: Automatically projects the 512-dimensional vector space into a 2D interactive Matplotlib scatter plot to visualize the clustering of database images relative to the query.
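The LSH and cosine-similarity features listed above can be illustrated with a minimal sketch. This is not the repository's implementation: the class name `RandomProjectionLSH`, the `n_bits` and `seed` parameters, and the single-table design are assumptions made for illustration. Each vector is hashed by the side of several random hyperplanes it falls on, and only vectors in the query's bucket are re-ranked by cosine similarity.

```python
import numpy as np
from collections import defaultdict


class RandomProjectionLSH:
    """Hypothetical sign-random-projection LSH index (illustrative only)."""

    def __init__(self, dim=512, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        # Each row is the normal vector of one random hyperplane through the origin.
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)  # hash key -> list of item ids
        self.vectors = {}                 # item id -> unit-normalised vector

    def _key(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        bits = (self.planes @ vec) > 0
        return bits.tobytes()

    def add(self, item_id, vec):
        vec = np.asarray(vec, dtype=np.float64)
        vec = vec / np.linalg.norm(vec)
        self.vectors[item_id] = vec
        self.buckets[self._key(vec)].append(item_id)

    def query(self, vec, k=5):
        vec = np.asarray(vec, dtype=np.float64)
        vec = vec / np.linalg.norm(vec)
        candidates = self.buckets.get(self._key(vec), [])
        # Re-rank only the bucket's candidates by cosine similarity
        # (a plain dot product, since all vectors are unit length).
        scored = [(float(self.vectors[i] @ vec), i) for i in candidates]
        scored.sort(reverse=True)
        return [(i, s) for s, i in scored[:k]]
```

A single hash table like this one can miss true neighbours that land in an adjacent bucket. Practical indexes usually query several independent tables or probe nearby buckets, which trades speed for recall.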

🧠 Architectural Highlights

1. The "Modality Gap" Fix in LSH

While LSH is fast for single-modality searches (Image-to-Image), cross-modal searches (Text-to-Image) often suffer from the Modality Gap: text and image vectors share one embedding space but sit at a distinct angular offset (often ~75 degrees). Because random-projection LSH assigns buckets by angle, a text query rarely hashes into the same bucket as the images it should match. To preserve accuracy, the system disables LSH and falls back to Exact k-NN Search when a text query is detected, and uses LSH for image queries.
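To make the effect of the angular offset concrete: for sign-random-projection hashing, two vectors separated by an angle θ agree on any single hash bit with probability 1 − θ/π, so they share a full k-bit bucket with probability (1 − θ/π)^k. The sketch below evaluates that formula and mirrors the routing rule described above. The function names and the choice of 8 bits are assumptions for illustration, not values taken from the repository.

```python
import math


def bucket_collision_probability(angle_degrees, n_bits):
    """Chance that two vectors at the given angle share all n_bits hash bits."""
    per_bit = 1.0 - math.radians(angle_degrees) / math.pi
    return per_bit ** n_bits


def choose_search_mode(query_modality):
    """Hypothetical router: exact k-NN for text queries, LSH for image queries."""
    return "exact_knn" if query_modality == "text" else "lsh"


# With a hypothetical 8-bit hash, compare a near-duplicate image pair (10 degrees)
# with a text/image pair at the ~75 degree offset mentioned above.
print(bucket_collision_probability(10, 8))
print(bucket_collision_probability(75, 8))
print(choose_search_mode("text"), choose_search_mode("image"))
```

Under these assumptions the 10 degree pair shares a bucket roughly 63% of the time, while the 75 degree pair does so only about 1.3% of the time, which is why an exact scan is the safer choice for text queries.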

2. Concurrency & UI Responsiveness

Deep learning feature extraction and database indexing are computationally heavy. This application isolates the UI loop from the mathematical backend:

  • Feature extraction runs on the main thread (to comply with macOS Metal/MPS restrictions).
  • Database querying, LSH bucketing, and Disk I/O run on background threads with state locks.
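The split between the UI thread and background workers could look roughly like the sketch below. It is a simplified assumption of the design, not the project's code: `VectorStore`, `search_in_background`, and the file name `db.npz` are hypothetical, and the store runs an exact cosine scan instead of LSH to keep the example short. The lock guards every read and write of the in-memory lists, and the save writes to a temporary file before swapping it into place.

```python
import os
import tempfile
import threading

import numpy as np


class VectorStore:
    """Hypothetical lock-guarded vector store persisted as .npz."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()
        self._ids = []
        self._vecs = []

    def add(self, item_id, vec):
        with self._lock:
            self._ids.append(item_id)
            self._vecs.append(np.asarray(vec, dtype=np.float32))

    def save(self):
        # Assumes at least one vector is stored (np.stack rejects an empty list).
        with self._lock:
            tmp_path = self.path + ".tmp"
            # Write through a file object so NumPy does not append ".npz" to the name.
            with open(tmp_path, "wb") as fh:
                np.savez(fh, ids=np.array(self._ids), vecs=np.stack(self._vecs))
            os.replace(tmp_path, self.path)  # swap the finished file into place

    def nearest(self, query, k=3):
        # Copy under the lock, then compute outside it to keep the lock short.
        with self._lock:
            ids, vecs = list(self._ids), np.stack(self._vecs)
        q = np.asarray(query, dtype=np.float32)
        sims = (vecs @ q) / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
        order = np.argsort(-sims)[:k]
        return [(ids[i], float(sims[i])) for i in order]


def search_in_background(store, query, on_done):
    """Run the query on a worker thread and pass the results to on_done."""
    worker = threading.Thread(target=lambda: on_done(store.nearest(query)), daemon=True)
    worker.start()
    return worker


# Usage: the callback runs on the worker thread and only appends to a list here.
store = VectorStore(os.path.join(tempfile.mkdtemp(), "db.npz"))
store.add("cat.jpg", [1.0, 0.0])
store.add("car.jpg", [0.0, 1.0])
results = []
search_in_background(store, [0.9, 0.1], results.append).join()
store.save()
```

In a Tkinter application the callback would normally hand the results back to the main loop (for example through a queue polled with `after`) rather than touching widgets from the worker thread.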

🚀 Installation & Setup

1. Clone the repository

git clone https://github.com/Al1Nad1/CBIR_Project.git
cd CBIR_Project
