An advanced, multimodal Content-Based Image Retrieval (CBIR) system featuring Deep Learning embeddings, Locality Sensitive Hashing (LSH), and a Thread-Safe Persistent Vector Database.
- Multimodal Search Options:
  - Image-to-Image: Uses ResNet18 (transfer learning) to extract 512-dimensional visual features.
  - Text-to-Image: Uses OpenAI's CLIP model to allow users to search the database using natural language (e.g., "A red sports car").
- Sub-linear Search (LSH): Implements Locality Sensitive Hashing via Random Projections to achieve approximate nearest neighbor (ANN) search, bypassing $O(N)$ brute-force limitations.
- Thread-Safe CRUD Database: Full Create, Read, Update, and Delete capabilities. The database persists to disk (`.npz`) and utilizes `threading.Lock()` to ensure atomicity during concurrent read/write operations.
- Distance Metrics: Toggle seamlessly between Cosine Similarity (angular distance) and Euclidean Distance (L2 Norm).
- Interactive PCA Visualization: Automatically projects the 512-dimensional vector space into a 2D interactive Matplotlib scatter plot to visualize the clustering of database images relative to the query.
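To make the LSH feature concrete, here is a minimal sketch of sign-based random projection hashing with NumPy. It is an illustration under stated assumptions, not this project's actual code: the class name `RandomProjectionLSH`, the single hash table, and the `n_bits` and `seed` parameters are all hypothetical.

```python
from collections import defaultdict

import numpy as np


class RandomProjectionLSH:
    """Hypothetical single-table LSH index using random hyperplanes."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        # Each row is the normal vector of a random hyperplane through the origin.
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)  # hash key -> list of item ids
        self.vectors = {}                 # item id -> unit-length vector

    def _key(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        bits = (self.planes @ vec) >= 0
        return bits.tobytes()

    @staticmethod
    def _unit(vec):
        # Assumes a non-zero input vector.
        vec = np.asarray(vec, dtype=np.float64)
        return vec / np.linalg.norm(vec)

    def add(self, item_id, vec):
        vec = self._unit(vec)
        self.vectors[item_id] = vec
        self.buckets[self._key(vec)].append(item_id)

    def query(self, vec, k=5):
        vec = self._unit(vec)
        # Only items in the query's own bucket are compared, not all N items.
        candidates = self.buckets.get(self._key(vec), [])
        scored = [(float(self.vectors[i] @ vec), i) for i in candidates]
        scored.sort(reverse=True)  # highest cosine similarity first
        return [i for _, i in scored[:k]]
```

Vectors separated by a small angle are likely to fall on the same side of every hyperplane and therefore share a bucket. More bits give smaller buckets and faster queries, at the cost of more missed neighbors; practical indexes usually combine several tables to recover recall.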
While LSH is exceptionally fast for single-modality searches (Image-to-Image), cross-modal searches (Text-to-Image) often suffer from the Modality Gap, a phenomenon where text and image vectors share the same space but sit at a distinct angular offset (often ~75 degrees). Because LSH buckets vectors by angle, a text query rarely lands in the same bucket as the images it should match. To preserve accuracy, this system disables LSH and falls back to Exact k-NN Search when a text query is detected, and uses LSH for image queries.
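The routing rule above, together with the cosine/Euclidean toggle, could be sketched as follows. The function names `exact_knn` and `search`, the `query_kind` string, and the `lsh_candidates` shortlist argument are assumptions made for illustration; the project's real interface may differ.

```python
import numpy as np


def exact_knn(query, vectors, k=5, metric="cosine"):
    """Brute-force k-NN over every row; returns row indices, best match first."""
    query = np.asarray(query, dtype=np.float64)
    vectors = np.asarray(vectors, dtype=np.float64)
    if metric == "cosine":
        # Assumes non-zero vectors. Higher similarity is better, so negate
        # the scores to rank them in ascending order.
        norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
        scores = -(vectors @ query) / norms
    elif metric == "euclidean":
        scores = np.linalg.norm(vectors - query, axis=1)
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    return np.argsort(scores)[:k].tolist()


def search(query, vectors, query_kind, lsh_candidates=None, k=5, metric="cosine"):
    """Text queries always use exact search; image queries re-rank an LSH shortlist."""
    vectors = np.asarray(vectors, dtype=np.float64)
    if query_kind == "text" or lsh_candidates is None:
        return exact_knn(query, vectors, k, metric)
    # Image query: compare only against the shortlisted row indices.
    shortlist = np.asarray(lsh_candidates, dtype=int)
    if shortlist.size == 0:
        return []
    local = exact_knn(query, vectors[shortlist], k, metric)
    return shortlist[local].tolist()
```

In this sketch a text query ignores any LSH shortlist it is given, so its results never depend on bucket membership.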
Deep learning feature extraction and database indexing are computationally heavy. This application isolates the UI loop from the mathematical backend:
- Feature extraction runs on the main thread (to comply with macOS Metal/MPS restrictions).
- Database querying, LSH bucketing, and disk I/O run on background threads, with locks guarding shared state.
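As a rough sketch of the locking pattern described above, the store below guards every read and write with one `threading.Lock` and persists to an `.npz` file. The `VectorStore` class, its method names, and the temp-file-then-rename save strategy are assumptions for illustration, not the project's confirmed design.

```python
import os
import threading

import numpy as np


class VectorStore:
    """Hypothetical vector table; every access is serialized by one lock."""

    def __init__(self, path):
        self.path = path  # assumed to end in ".npz"
        self._lock = threading.Lock()
        self._items = {}  # item id -> 1-D feature vector
        if os.path.exists(path):
            with np.load(path) as data:
                ids = data["ids"].tolist()
                self._items = dict(zip(ids, data["vectors"]))

    def upsert(self, item_id, vec):
        # Covers both Create and Update.
        with self._lock:
            self._items[item_id] = np.asarray(vec, dtype=np.float32)

    def get(self, item_id):
        with self._lock:
            vec = self._items.get(item_id)
            return None if vec is None else vec.copy()

    def delete(self, item_id):
        with self._lock:
            self._items.pop(item_id, None)

    def save(self):
        with self._lock:
            ids = np.array(list(self._items.keys()))
            vectors = np.array(list(self._items.values()))
            # Write to a temporary file, then swap it in, so a reader never
            # sees a half-written database file.
            tmp = self.path + ".tmp"
            with open(tmp, "wb") as f:
                np.savez(f, ids=ids, vectors=vectors)
            os.replace(tmp, self.path)


def run_in_background(fn, *args):
    """Start fn(*args) on a daemon thread so the UI loop is not blocked."""
    thread = threading.Thread(target=fn, args=args, daemon=True)
    thread.start()
    return thread
```

A caller on the UI thread could, for example, run `run_in_background(store.save)` and continue handling events. Holding the lock during the disk write keeps the sketch simple, at the cost of blocking other operations until the write finishes.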
1. Clone the repository
git clone https://github.com/Al1Nad1/CBIR_Project.git
cd CBIR_Project