Version: 1.0
Last Updated: November 23, 2024
Platform: macOS
Language: Python 3.10+
smugVision is a command-line tool that uses local AI vision models to automatically generate descriptive captions and relevant keyword tags for photos stored in SmugMug galleries. The tool processes images locally, generates metadata using AI, and updates SmugMug galleries directly via their API without requiring image re-uploads.
- Improve SmugMug photo searchability through AI-generated metadata
- Process images locally for privacy and cost efficiency
- Support batch processing of entire galleries or folders
- Provide extensibility for future enhancements (face detection, custom prompts, etc.)
- Maintain a clean, modular architecture
- Language: Python 3.10+
- AI Model: Llama 3.2 Vision 11B (via Ollama) - Default, but modular for future model support
- SmugMug Integration: Python library or custom API wrapper
- Configuration: YAML or JSON config file
- Logging: Python standard logging library with timestamps and module identification
Llama 3.2 Vision 11B was chosen as the default model because:
- Supports images up to 1120x1120 pixels resolution
- Runs efficiently on Apple Silicon (M4 Pro) with at least 8GB VRAM
- Provides strong image understanding and captioning capabilities
- Free and runs locally without API costs
- Works well via Ollama for easy management
Image Size Recommendation: Download medium-sized images (approximately 1024-1200px on longest edge) from SmugMug. This balances:
- Quality sufficient for accurate AI analysis
- Bandwidth efficiency
- Processing speed
- Staying within the model's optimal resolution range (1120x1120px)
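The sizing recommendation above can be sketched as a small helper that computes target dimensions before an image is handed to the model. This is only an illustration: the function name `fit_within` and the default of 1120 pixels on the longest edge are assumptions drawn from the recommendation, not part of the smugVision codebase.

```python
def fit_within(width: int, height: int, max_edge: int = 1120) -> tuple[int, int]:
    """Scale (width, height) down so the longest edge is at most max_edge.

    Images that already fit are returned unchanged; aspect ratio is preserved.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    # Guard against a zero-sized edge for extremely narrow images.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(4000, 3000))
    print(fit_within(800, 600))
```

The resulting pair could be passed to an image library's resize call; which library performs the resize is left open here.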
While Llama 3.2 Vision is the initial choice, the architecture should support:
- OpenAI GPT-4o/GPT-4o-mini (cloud-based, best quality)
- Google Gemini Vision (cloud-based)
- Microsoft Florence-2 (lightweight, can run locally)
- RAM/RAM++ (specialized for tagging, open-source)
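One way to keep the model layer swappable is an abstract base class plus a small registry-based factory. The sketch below is a minimal illustration under assumptions: the method names `generate_caption` and `generate_tags`, the `register_model` decorator, and the `StubModel` backend are all hypothetical and may differ from what vision/base.py and vision/factory.py actually define.

```python
from abc import ABC, abstractmethod


class VisionModel(ABC):
    """Interface every vision backend would implement (hypothetical names)."""

    @abstractmethod
    def generate_caption(self, image_path: str, prompt: str) -> str:
        """Return a caption for the image."""

    @abstractmethod
    def generate_tags(self, image_path: str, prompt: str) -> list[str]:
        """Return keyword tags for the image."""


# Maps a config value such as "llama3.2-vision" to an implementation class.
_REGISTRY: dict[str, type[VisionModel]] = {}


def register_model(name: str):
    """Class decorator that registers a backend under a config name."""
    def decorator(cls: type[VisionModel]) -> type[VisionModel]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register_model("stub")
class StubModel(VisionModel):
    """Stand-in backend used only to demonstrate the pattern."""

    def generate_caption(self, image_path: str, prompt: str) -> str:
        return f"Caption for {image_path}"

    def generate_tags(self, image_path: str, prompt: str) -> list[str]:
        return ["stub", "example"]


def create_model(name: str, **kwargs) -> VisionModel:
    """Instantiate the backend registered under name."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown vision model: {name!r}") from None
    return cls(**kwargs)
```

With this shape, adding a cloud or lightweight local backend would mean writing one new class and registering it, without touching the processing code.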
┌─────────────────┐
│  CLI Interface  │
│  (arg parsing)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Config Manager  │
│  (config.yaml)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐       ┌──────────────────┐
│ SmugMug Manager │◄─────►│   Image Cache    │
│  (API wrapper)  │       │   (local temp)   │
└────────┬────────┘       └──────────────────┘
         │
         ▼
┌─────────────────┐
│  Vision Model   │
│  (Ollama/LLM)   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Metadata Gen   │
│ (tags/captions) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   SmugMug API   │
│   (push data)   │
└─────────────────┘
smugvision/
├── __init__.py
├── __main__.py # CLI entry point
├── config/ # ✅ IMPLEMENTED
│ ├── __init__.py
│ ├── manager.py # Configuration loading and validation
│ └── defaults.py # Default configuration values
├── face/ # ✅ IMPLEMENTED
│ ├── __init__.py
│ └── recognizer.py # Face detection and recognition
├── smugmug/ # ✅ IMPLEMENTED
│ ├── __init__.py
│ ├── client.py # SmugMug API client wrapper
│ ├── models.py # Data models for Album, AlbumImage
│ └── exceptions.py # Custom exceptions
├── vision/ # ✅ IMPLEMENTED
│ ├── __init__.py
│ ├── base.py # Abstract base class for vision models
│ ├── llama.py # Llama 3.2 Vision implementation
│ ├── factory.py # Factory pattern for model selection
│ └── exceptions.py # Custom exceptions for vision models
├── cache/ # ✅ IMPLEMENTED
│ ├── __init__.py
│ └── manager.py # Image cache management
├── processing/
│ ├── __init__.py
│ ├── processor.py # Main processing orchestration
│ └── metadata.py # Metadata generation utilities
└── utils/ # ✅ PARTIALLY IMPLEMENTED
    ├── __init__.py
    ├── exif.py # EXIF data extraction and geocoding
    ├── locations.py # Custom location resolution (✅ IMPLEMENTED)
    ├── relationships.py # Person relationship management
    └── helpers.py # Helper functions
config.yaml # User configuration file
config.yaml.example # Example configuration (✅ CREATED)
locations.yaml.example # Example custom locations file (✅ CREATED)
requirements.txt # Python dependencies
setup.py # Package installation
README.md # User documentation
DESIGN.md # This document
config.yaml example:
# SmugMug API Configuration
smugmug:
  api_key: "YOUR_API_KEY"
  api_secret: "YOUR_API_SECRET"
  user_token: "YOUR_USER_TOKEN"
  user_secret: "YOUR_USER_SECRET"
# Vision Model Configuration
vision:
  model: "llama3.2-vision" # Options: llama3.2-vision, gpt-4o, etc.
  endpoint: "http://localhost:11434" # Ollama endpoint
  temperature: 0.7
  max_tokens: 150
# Processing Configuration
processing:
  marker_tag: "smugvision" # Tag to mark processed images
  generate_captions: true # Enable caption generation
  generate_tags: true # Enable tag generation
  preserve_existing: true # Keep existing captions/tags
  image_size: "medium" # Download size from SmugMug
# Location Resolution Configuration
location:
  custom_locations_file: "~/.smugvision/locations.yaml"
  check_custom_first: true # Check custom locations before geocoding
  use_aliases_as_tags: true # Add location aliases as keyword tags
# Prompt Configuration
prompts:
  caption: |
    Analyze this image and provide a concise, descriptive caption (1-2 sentences)
    that describes the main subject, setting, and any notable activities or features.
    If EXIF location data is available, incorporate the location naturally.
  tags: |
    Generate 5-10 relevant keyword tags for this image. Focus on:
    - Main subjects and objects
    - Activities or actions
    - Setting and location
    - Colors and mood
    - Time of day or season (if apparent)
    Provide tags as a comma-separated list.
# Cache Configuration
cache:
  directory: "~/.smugvision/cache"
  clear_on_exit: false # Will be implemented later
  preserve_structure: true # Mirror gallery/folder structure
# Logging Configuration
logging:
  level: "INFO" # DEBUG, INFO, WARNING, ERROR
  file: "~/.smugvision/smugvision.log"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
- Each processed image receives a configurable marker tag (default: "smugvision")
- Before processing, check whether the marker tag exists to avoid duplicate processing
- The command-line flag --force-reprocess bypasses this check
- The marker tag is added to the SmugMug keywords array
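The marker-tag check described above can be sketched with three small functions. This is an assumption-laden illustration: the function names are hypothetical, and the case-insensitive comparison is a choice made here, not something the design specifies.

```python
def has_marker(keywords: list[str], marker: str = "smugvision") -> bool:
    """Return True if the marker tag is already among the keywords."""
    wanted = marker.strip().lower()
    return any(k.strip().lower() == wanted for k in keywords)


def should_process(
    keywords: list[str],
    marker: str = "smugvision",
    force_reprocess: bool = False,
) -> bool:
    """Decide whether an image needs processing.

    force_reprocess mirrors the --force-reprocess flag and bypasses the check.
    """
    return force_reprocess or not has_marker(keywords, marker)


def add_marker(keywords: list[str], marker: str = "smugvision") -> list[str]:
    """Return a new keyword list that includes the marker exactly once."""
    if has_marker(keywords, marker):
        return list(keywords)
    return [*keywords, marker]
```

Returning a new list from `add_marker` rather than mutating the input keeps the original keywords available for a dry-run comparison.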
- Extract EXIF data from images (GPS coordinates, camera model, date/time)
- If GPS coordinates exist, attempt reverse geocoding to location name
- Include location information naturally in prompts sent to vision model
- Example: "A sunset over the Golden Gate Bridge in San Francisco, California"
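EXIF stores GPS coordinates as degrees, minutes, and seconds plus a hemisphere reference, so a conversion to signed decimal degrees is usually needed before reverse geocoding. The helper below is a minimal sketch; the name `dms_to_decimal` is hypothetical, and it assumes the EXIF rationals have already been converted to plain floats.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds with a hemisphere ref to decimal degrees.

    ref is one of "N", "S", "E", "W"; south and west yield negative values.
    """
    ref = ref.strip().upper()
    if ref not in {"N", "S", "E", "W"}:
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if ref in {"S", "W"} else value


if __name__ == "__main__":
    lat = dms_to_decimal(37, 49, 11.0, "N")
    lon = dms_to_decimal(122, 28, 43.0, "W")
    print(f"{lat:.6f}, {lon:.6f}")
```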
Custom locations allow users to define friendly names for places like their home, relatives' houses, or frequently visited locations. These override reverse geocoding results.
Use Cases:
- Get "Eric's House" instead of a street address
- Ensure consistent naming across all photos at the same location
- Faster processing (no API calls needed for custom locations)
- Add searchable aliases as tags
locations.yaml example:
locations:
  - name: "Eric Pullen's House"
    latitude: 38.123456
    longitude: -85.654321
    radius: 50 # Match radius in meters
    address: "5311 Montfort Lane, Louisville, KY"
    aliases:
      - "Home"
      - "Pullen Residence"
  - name: "Louisville Slugger Field"
    latitude: 38.256510
    longitude: -85.747476
    radius: 200 # Larger radius for a stadium
    aliases:
      - "Bats Game"
      - "Baseball Stadium"
Resolution Priority:
- Check custom locations file first (closest match within radius)
- If no match, fall back to Overpass API / Nominatim reverse geocoding
- If geocoding fails, return coordinates as string
Key Classes:
- LocationResolver: Loads and manages custom locations from YAML
- CustomLocation: Data class for a single location definition
- LocationMatch: Result of a coordinate match including distance
- When generating new metadata, preserve existing captions and keywords
- Append new captions after existing ones (with separator if needed)
- Merge new keywords with existing ones (avoiding duplicates)
- Configuration option to control this behavior
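The preservation rules above can be sketched as two merge helpers. The function names, the " | " separator, and the case-insensitive duplicate check are assumptions for illustration; the design only states that captions are appended and keywords merged without duplicates.

```python
from typing import Optional


def merge_keywords(existing: list[str], new: list[str]) -> list[str]:
    """Append new keywords to existing ones, skipping case-insensitive duplicates."""
    merged: list[str] = []
    seen: set[str] = set()
    for keyword in [*existing, *new]:
        key = keyword.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(keyword.strip())
    return merged


def merge_caption(existing: Optional[str], new: str, separator: str = " | ") -> str:
    """Append the new caption after the existing one, if there is one."""
    existing = (existing or "").strip()
    new = new.strip()
    if not existing:
        return new
    if not new or new == existing:
        return existing
    return f"{existing}{separator}{new}"
```

Existing keywords keep their original order and spelling, so a gallery owner's hand-written tags are never reordered or recased by the merge.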
~/.smugvision/cache/
└── [user_nickname]/
    └── [folder_name]/
        └── [gallery_name]/
            ├── image1.jpg
            ├── image2.jpg
            └── ...
- Download images to local cache with structure mirroring SmugMug
- Check cache before downloading (skip if already exists)
- The command-line flag --clear-cache removes cached images
- Future: Auto-cleanup after successful processing (configurable)
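The cache layout and the check-before-download rule can be sketched with pathlib. The helper names `cache_path` and `needs_download` are hypothetical, and replacing path separators inside names is an assumption added here so that a gallery name cannot escape its directory.

```python
from pathlib import Path


def _safe(segment: str) -> str:
    """Replace path separators so a name stays a single directory level."""
    return segment.replace("/", "_").replace("\\", "_")


def cache_path(
    cache_root: Path,
    user_nickname: str,
    folder_name: str,
    gallery_name: str,
    filename: str,
) -> Path:
    """Build the cache location that mirrors the SmugMug hierarchy."""
    return (
        cache_root
        / _safe(user_nickname)
        / _safe(folder_name)
        / _safe(gallery_name)
        / _safe(filename)
    )


def needs_download(path: Path) -> bool:
    """Skip the download when the file is already cached."""
    return not path.is_file()


if __name__ == "__main__":
    root = Path("~/.smugvision/cache").expanduser()
    target = cache_path(root, "nickname", "Family", "Beach 2024", "image1.jpg")
    print(target, needs_download(target))
```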
Strategy: Fail-fast for initial implementation
- Stop processing on first API error
- Log detailed error information including:
- Image filename/URL
- Error type and message
- Stack trace
- Gallery/folder context
- Save processing state for potential resume (future enhancement)
Error Types:
- SmugMug API errors (auth, rate limit, network)
- Vision model errors (timeout, invalid response)
- File I/O errors (cache write failures)
- Configuration errors (missing/invalid settings)
Initial Implementation:
- Single gallery processing via --gallery <gallery_id>
- Gallery ID obtained from SmugMug URL or API
Future Enhancement:
- Folder processing via --folder <folder_id>
- Recursive processing of all galleries in folder
- Progress tracking across multiple galleries
# Process a single gallery
smugvision --gallery "abc123"
# Force reprocessing of already-processed images
smugvision --gallery "abc123" --force-reprocess
# Clear cache before processing
smugvision --gallery "abc123" --clear-cache
# Dry run (don't update SmugMug, just show what would happen)
smugvision --gallery "abc123" --dry-run
# Use alternate config file
smugvision --gallery "abc123" --config /path/to/config.yaml
# Verbose logging
smugvision --gallery "abc123" --verbose
Required:
--gallery GALLERY_ID SmugMug gallery ID to process
Optional:
--folder FOLDER_ID Process all galleries in folder (future)
--config PATH Path to config file (default: ./config.yaml)
--force-reprocess Reprocess images even if already tagged
--clear-cache Clear image cache before processing
--dry-run Preview actions without updating SmugMug
--verbose, -v Enable verbose DEBUG logging
--help, -h Show help message
--version Show version information
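The argument list above maps directly onto argparse. The following is a minimal sketch of such a parser, not the actual contents of __main__.py; the `build_parser` name is hypothetical, and the future --folder option is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the options listed in the CLI reference."""
    parser = argparse.ArgumentParser(
        prog="smugvision",
        description="Generate captions and tags for SmugMug photos.",
    )
    parser.add_argument("--gallery", metavar="GALLERY_ID", required=True,
                        help="SmugMug gallery ID to process")
    parser.add_argument("--config", metavar="PATH", default="./config.yaml",
                        help="Path to config file (default: ./config.yaml)")
    parser.add_argument("--force-reprocess", action="store_true",
                        help="Reprocess images even if already tagged")
    parser.add_argument("--clear-cache", action="store_true",
                        help="Clear image cache before processing")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview actions without updating SmugMug")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable verbose DEBUG logging")
    parser.add_argument("--version", action="version", version="%(prog)s 1.0")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

argparse supplies --help and -h automatically, so they need no explicit definition.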
- PEP 8 compliance for all Python code
- Type hints for all function signatures
- Docstrings for all classes and public methods (Google style)
- Line length: 100 characters maximum
- Formatter: Black (with line-length=100)
- Linter: Pylint with custom configuration
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


class SmugMugClient:
    """Client for interacting with SmugMug API.

    This class handles authentication, API requests, and data retrieval
    from SmugMug galleries and images.

    Attributes:
        api_key: SmugMug API key
        api_secret: SmugMug API secret
        access_token: OAuth access token
    """

    def __init__(self, api_key: str, api_secret: str) -> None:
        """Initialize SmugMug client with credentials.

        Args:
            api_key: SmugMug API key
            api_secret: SmugMug API secret

        Raises:
            ValueError: If credentials are empty or invalid
        """
        if not api_key or not api_secret:
            raise ValueError("api_key and api_secret must be non-empty")
        self.api_key = api_key
        self.api_secret = api_secret
        self.access_token: Optional[str] = None  # set after authentication
        self._authenticated = False
        logger.info("SmugMugClient initialized")

    def get_gallery_images(
        self,
        gallery_id: str,
        include_processed: bool = False
    ) -> List[dict]:
        """Retrieve all images from specified gallery.

        Args:
            gallery_id: Unique identifier for the gallery
            include_processed: If False, skip images with marker tag

        Returns:
            List of image dictionaries containing metadata

        Raises:
            SmugMugAPIError: If API request fails
            AuthenticationError: If not authenticated
        """
        logger.info(f"Fetching images from gallery {gallery_id}")
        # Implementation here
        pass

import sys

from smugvision.smugmug.exceptions import SmugMugAPIError

try:
    images = smugmug_client.get_gallery_images(gallery_id)
except SmugMugAPIError as e:
    # Log the full traceback, then show a short message to the user
    logger.error(f"Failed to fetch gallery images: {e}", exc_info=True)
    print(f"Error: Unable to access gallery {gallery_id}")
    print(f"Details: {str(e)}")
    sys.exit(1)
- Unit tests for individual modules
- Integration tests for API interactions (with mocking)
- Fixture data for consistent test inputs
- Test coverage target: 80% minimum
- Framework: pytest
# requirements.txt
ollama>=0.1.0
pyyaml>=6.0
requests>=2.31.0
pillow>=10.0.0
python-dateutil>=2.8.2
pytest>=7.4.0
black>=23.0.0
pylint>=3.0.0
- Branch naming: feature/description, bugfix/description
- Commit messages: Conventional Commits format
  - feat: add support for folder processing
  - fix: correct EXIF coordinate parsing
  - docs: update configuration examples
- Pull requests: Required for all changes
- Version tags: Semantic versioning (v1.0.0, v1.1.0, etc.)
All log messages must include:
- Timestamp (ISO 8601 format)
- Module name (automatically via __name__)
- Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Message with relevant context
import logging
logger = logging.getLogger(__name__)
# INFO: General progress
logger.info(f"Processing image: {image_filename}")
# DEBUG: Detailed information
logger.debug(f"Vision model response: {response[:100]}...")
# WARNING: Recoverable issues
logger.warning(f"Image {image_id} already has marker tag, skipping")
# ERROR: Failures requiring attention
logger.error(f"Failed to update image {image_id}: {error}", exc_info=True)
2024-11-23 14:32:15,123 - smugvision.smugmug.client - INFO - Authenticating with SmugMug API
2024-11-23 14:32:16,456 - smugvision.processing.processor - INFO - Processing gallery abc123
2024-11-23 14:32:17,789 - smugvision.vision.llama - DEBUG - Sending prompt to Llama 3.2 Vision
2024-11-23 14:32:19,012 - smugvision.smugmug.client - ERROR - API request failed: 401 Unauthorized
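Output in the shape shown above can be produced by configuring the root logger once at startup with the format string from config.yaml. This is a sketch under assumptions: `configure_logging` is a hypothetical helper, and creating the log directory on demand is a choice made here.

```python
import logging
from pathlib import Path
from typing import Optional

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def configure_logging(level: str = "INFO", log_file: Optional[str] = None) -> None:
    """Configure the root logger with console output and an optional log file."""
    handlers: list[logging.Handler] = [logging.StreamHandler()]
    if log_file:
        path = Path(log_file).expanduser()
        path.parent.mkdir(parents=True, exist_ok=True)
        handlers.append(logging.FileHandler(path))
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format=LOG_FORMAT,
        handlers=handlers,
        force=True,  # replace any handlers installed earlier
    )


if __name__ == "__main__":
    configure_logging("DEBUG")
    logging.getLogger("smugvision.example").info("Logging configured")
```

Because each module obtains its logger with `logging.getLogger(__name__)`, the `%(name)s` field carries the module path without further work.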
- Face detection and recognition using provided reference faces
- Folder-level processing (recursive gallery processing)
- Resume capability for interrupted processing
- Parallel processing for faster throughput
- Progress bar with ETA
- Web UI for easier configuration and monitoring
- Support for additional vision models (GPT-4o, Gemini, etc.)
- Batch prompt customization per gallery
- Advanced filtering (by date range, existing tags, etc.)
- Export metadata to local database for analytics
- SmugMug to SmugMug gallery migration with metadata
- Integration with other photo services (Google Photos, iCloud)
- Custom model fine-tuning for specific photo collections
- Automated tagging based on learned patterns
- Production CLI interface (__main__.py)
- Config file loading and validation
- Configuration manager with interactive setup
- YAML configuration support
- SmugMug API authentication (OAuth 1.0a)
- SmugMug API client wrapper
- Single gallery image retrieval
- Album and image data models
- Image cache management (download, organize, skip existing)
- Llama 3.2 Vision integration via Ollama
- Caption and tag generation
- Metadata update to SmugMug (PATCH endpoint)
- Marker tag system (check and add tags)
- Image download from SmugMug (multiple sizes)
- Video download support with LargestVideo endpoint
- Video filtering (skip by default, optional include)
- Basic error handling and logging
- Vision model factory pattern
- Abstract base class for vision models
- Custom exceptions for vision models
- Custom exceptions for SmugMug API
- Album resolution from URLs, node IDs, and names
- Recursive album search within folder structures
- URL path resolution for folder navigation
- Pagination support for large result sets
- ImageProcessor orchestration class
- MetadataFormatter for combining AI and EXIF metadata
- End-to-end processing pipeline with statistics
- Test utilities (test_smugmug.py, test_processor.py, test_vision.py, debug_face_recognition.py)
- Force reprocessing flag
- Preserve existing metadata
- EXIF data extraction and integration
- EXIF orientation handling
- HEIC/HEIF image format support
- Reverse geocoding for location names
- Improved error messages
- Unit tests for core modules
- Face detection and recognition system
- Reference faces management
- Person name identification (with proper formatting)
- Relationship context integration
- Dry-run mode
- Progress indicators and statistics
- Folder processing support (planned for 1.0)
- Cache cleanup functionality (planned for 1.0)
- Integration tests
- Complete documentation
- Installation via pip
- Comprehensive test coverage
- Production-ready error handling
- Performance optimizations
- Advanced EXIF location extraction with venue search
- Overpass API integration for POI discovery
- Configurable geocoding with exclusion filters
- Interactive venue selection
- Face encoding with multiple reference images per person
- Confidence-based face matching
- Person relationship management system
- Context-aware caption generation with person names
- Processing time tracking and metrics
- Multi-format image support (JPEG, PNG, HEIC)
- Image scaling for performance optimization
- Comprehensive logging with module identification
- SmugMug URL parsing and album resolution
- Recursive folder navigation and album discovery
- Cache folder structure mirroring SmugMug hierarchy
- Multiple image size support (Thumb through X3Large, Original)
- Video file detection and separate handling
- Video download via LargestVideo API endpoint
- Configurable video inclusion/exclusion
- SmugMug API pagination for large datasets
- Node-based folder hierarchy navigation
- Test utilities for SmugMug integration (test_smugmug.py)
- Test utilities for full processing pipeline (test_processor.py)
- OAuth token acquisition helper (get_smugmug_tokens.py)
- Album discovery tool (find_album_key.py)
- Production-ready CLI with rich output formatting
- Batch processing statistics and reporting
- Dry-run mode with detailed preview output
- Person name formatting (converting underscores to spaces)
- Pip-installable package with pyproject.toml
- Console scripts: smugvision, smugvision-config, smugvision-get-tokens, smugvision-optimize-faces
- Organized project structure (tests/, scripts/ directories)
- Python 3.10+
- Ollama installed on macOS
- SmugMug API credentials (API key, secret, OAuth tokens)
- Llama 3.2 Vision model downloaded via Ollama
# 1. Install Ollama
brew install ollama
# 2. Start Ollama service
ollama serve
# 3. Download Llama 3.2 Vision model
ollama pull llama3.2-vision
# 4. Clone repository
git clone https://github.com/yourusername/smugvision.git
cd smugvision
# 5. Create virtual environment
python3 -m venv venv
source venv/bin/activate
# 6. Install dependencies
pip install -r requirements.txt
# 7. Copy and configure config file
cp config.yaml.example config.yaml
# Edit config.yaml with your SmugMug credentials
# 8. Run smugVision
python -m smugvision --gallery "abc123"
SmugMug uses OAuth 1.0a for authentication. The process:
- Application credentials (API key + secret) - obtained from SmugMug
- Request token - temporary token for authorization
- User authorization - user approves access via web browser
- Access token - permanent token for API requests
These tokens should be stored in config.yaml after initial setup.
# Get gallery details
GET /api/v2/album/{gallery_id}
# Get gallery images
GET /api/v2/album/{gallery_id}!images
# Get image details
GET /api/v2/image/{image_key}
# Update image metadata
PATCH /api/v2/image/{image_key}
{
"Caption": "New caption text",
"Keywords": ["tag1", "tag2", "tag3"]
}
# Download image
GET {image_url}?size={size} # size: Medium, Large, X2Large, etc.
- SmugMug API rate limits should be respected
- Implement exponential backoff for rate limit errors
- Consider batch operations where possible
- Log rate limit warnings
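Exponential backoff for rate-limit errors can be sketched as a retry wrapper. Everything named here is an assumption for illustration: `RateLimitError` stands in for whatever exception the SmugMug client raises on a rate-limit response, and the delay values and 10% jitter are arbitrary defaults.

```python
import logging
import random
import time
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception raised when the API reports rate limiting."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay * 0.1)  # small jitter
            logger.warning(
                "Rate limited; retrying in %.1fs (attempt %d/%d)",
                delay, attempt + 1, max_attempts,
            )
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter exists so tests can substitute a recorder instead of actually waiting.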
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gallery:
    """Represents a SmugMug gallery."""
    gallery_id: str
    name: str
    url: str
    image_count: int
    uri: str


@dataclass
class Image:
    """Represents a SmugMug image with metadata."""
    image_key: str
    filename: str
    uri: str
    caption: Optional[str]
    keywords: List[str]
    download_url: str
    date_uploaded: str
    exif: Optional[dict]
    has_marker: bool  # Whether smugvision tag exists


@dataclass
class MetadataResult:
    """Generated metadata for an image."""
    caption: str
    tags: List[str]
    confidence: float  # 0.0 to 1.0
    model_used: str
    processing_time: float  # seconds

Infrastructure Layer:
- ✅ Configuration system with defaults, validation, and interactive setup
- ✅ SmugMug API client with OAuth 1.0a authentication
- ✅ Album/image retrieval with pagination and filtering
- ✅ Image and video download with multiple size options
- ✅ Local cache management with folder structure preservation
- ✅ URL/path-based album resolution and folder navigation
AI/ML Layer:
- ✅ Vision model abstraction (factory pattern with base class)
- ✅ Llama 3.2 Vision integration via Ollama
- ✅ Caption and tag generation
- ✅ EXIF data extraction with GPS and reverse geocoding
- ✅ Face detection and recognition with configurable confidence
- ✅ Person relationship management for context-aware captions
Data Layer:
- ✅ Album and AlbumImage data models
- ✅ Metadata update to SmugMug (PATCH endpoint)
- ✅ Marker tag system for tracking processed images
- ✅ Custom exceptions for error handling
All core processing components are now implemented and tested:
Completed Components:
- ✅ processing/processor.py - Main ImageProcessor orchestrator:
  - Accepts album key/URL and processes all unprocessed images
  - Downloads images to cache (using CacheManager)
  - Extracts EXIF data and identifies faces
  - Generates captions and tags (using VisionModel)
  - Updates SmugMug with new metadata
  - Adds marker tag to processed images
  - Tracks progress and reports detailed statistics
- ✅ processing/metadata.py - MetadataFormatter utilities:
  - Combines vision-generated captions with EXIF location data
  - Merges person names from face recognition
  - Handles metadata preservation (append vs replace)
  - Formats tags and captions for SmugMug API
- ✅ __main__.py - Production CLI entry point:
  - Parses command-line arguments
  - Initializes configuration
  - Creates processor instance
  - Runs processing and displays rich formatted results
  - Supports dry-run, force-reprocess, and video filtering
Phase 1: Testing & Documentation ✅ COMPLETE
- ✅ Test with real albums (validated)
- ✅ Document CLI usage in README.md
- ✅ Create comprehensive documentation
- ✅ Add usage examples
Phase 2: Future Enhancements (Version 1.0+)
- Folder batch processing (process entire folder trees)
- Cache cleanup utilities
- Unit and integration tests
- Performance optimizations (parallel downloads, batch API calls)
Phase 2: CLI Interface
- Create __main__.py with argument parsing
- Add commands: process, list, status
- Support for --gallery, --url, --node, --force-reprocess
- Add --dry-run mode for preview
- Implement verbose logging flag
Phase 3: Testing & Refinement
- Test with real SmugMug galleries
- Handle edge cases (no faces, no EXIF, processing errors)
- Optimize for performance (parallel downloads, batch updates)
- Add progress bars and ETA
- Write unit tests for processor
Phase 4: Documentation & Packaging
- Complete README with usage examples
- Add troubleshooting guide
- Create setup.py for pip installation
- Add example configurations
Processing Flow:
For each image in album:
1. Check if already processed (marker tag) → skip if yes
2. Download to cache (skip if cached)
3. Extract EXIF data (GPS, camera info, date)
4. Detect and identify faces (if enabled)
5. Generate caption with vision model
6. Generate tags with vision model
7. Format metadata (merge person names, location)
8. Update SmugMug via PATCH API
9. Add marker tag
10. Log results and metrics
Error Handling Strategy:
- Continue processing on single image failure
- Log errors with full context
- Collect statistics (success/skip/error counts)
- Display summary at end
- Option for --stop-on-error for strict mode
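The continue-on-failure strategy with summary statistics can be sketched as a loop around a per-image callback. The names `ProcessingStats` and `process_all` are hypothetical, as is the convention that the callback returns True when an image was processed and False when it was skipped.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


@dataclass
class ProcessingStats:
    processed: int = 0
    skipped: int = 0
    errors: int = 0
    failures: list[str] = field(default_factory=list)


def process_all(
    images: Iterable[dict],
    process_one: Callable[[dict], bool],
    stop_on_error: bool = False,
) -> ProcessingStats:
    """Process every image, collecting counts instead of aborting on failure."""
    stats = ProcessingStats()
    for image in images:
        name = image.get("filename", "<unknown>")
        try:
            if process_one(image):
                stats.processed += 1
            else:
                stats.skipped += 1
        except Exception as exc:  # one bad image should not end the run
            stats.errors += 1
            stats.failures.append(f"{name}: {exc}")
            logger.error("Failed to process %s: %s", name, exc, exc_info=True)
            if stop_on_error:  # strict mode, as with --stop-on-error
                break
    return stats
```

The returned stats object carries everything needed for the end-of-run summary, including the list of failed filenames.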
Performance Optimizations:
- Cache downloaded images (already implemented)
- Reuse face encodings across images
- Batch SmugMug updates where possible
- Show progress with ETA
- Metadata backup: Should we maintain local backup of original metadata before modification?
- Prompt templates: Should we support per-gallery custom prompts?
- Batch size: What's the optimal number of images to process before syncing to SmugMug?
- Model switching: Should we support multiple models simultaneously for comparison?
- Undo functionality: How to implement rollback of metadata changes?
A local web-based interface for smugVision that provides a visual preview of AI-generated metadata before committing changes to SmugMug. The UI defaults to dry-run mode, showing proposed changes alongside thumbnails, and requires explicit user confirmation to commit.
- Provide visual feedback for processing decisions
- Show side-by-side comparison of current vs. proposed metadata
- Default to safe dry-run mode (no changes without explicit commit)
- Display detected faces and location information
- Surface reference face and relationship data for transparency
- Backend: Flask (Python) - Simple, integrates directly with existing smugVision modules
- Frontend: Vanilla HTML/CSS/JavaScript with a simple CSS framework (e.g., Pico CSS or similar minimal framework)
- Communication: REST API + Server-Sent Events (SSE) for progress updates
- Deployment: Localhost only (e.g., http://localhost:5050)
┌─────────────────────────────────────────────────────────────────┐
│ Web Browser │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ smugVision Web UI │ │
│ │ ┌─────────────┐ ┌────────────────────────────────────┐ │ │
│ │ │ Album Input │ │ Preview Grid │ │ │
│ │ │ (URL paste) │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │ │
│ │ └─────────────┘ │ │thumb│ │thumb│ │thumb│ │thumb│ │ │ │
│ │ ┌─────────────┐ │ │+diff│ │+diff│ │+diff│ │+diff│ │ │ │
│ │ │ Actions │ │ └─────┘ └─────┘ └─────┘ └─────┘ │ │ │
│ │ │ [Preview] │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │ │
│ │ │ [Commit] │ │ │thumb│ │thumb│ │thumb│ │thumb│ │ │ │
│ │ └─────────────┘ │ │+diff│ │+diff│ │+diff│ │+diff│ │ │ │
│ │ ┌─────────────┐ │ └─────┘ └─────┘ └─────┘ └─────┘ │ │ │
│ │ │ Progress │ │ (infinite scroll) │ │ │
│ │ │ [███░░] 60%│ └────────────────────────────────────┘ │ │
│ │ └─────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
│ HTTP/SSE
▼
┌─────────────────────────────────────────────────────────────────┐
│ Flask Backend (localhost:5050) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ REST API Routes │ │
│ │ POST /api/preview - Start preview processing │ │
│ │ GET /api/preview/status - SSE stream for progress │ │
│ │ GET /api/preview/results - Get preview results │ │
│ │ POST /api/commit - Commit changes to SmugMug │ │
│ │ GET /api/faces - List known reference faces │ │
│ │ GET /api/relationships - Get relationship graph data │ │
│ │ GET /api/thumbnail/<key> - Proxy thumbnail from SmugMug │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Existing smugVision Modules │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │ │
│ │ │ SmugMug │ │ Image │ │ Face │ │ │
│ │ │ Client │ │ Processor │ │ Recognizer │ │ │
│ │ └────────────┘ └────────────┘ └────────────────────┘ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │ │
│ │ │ Vision │ │ Cache │ │ Relationship │ │ │
│ │ │ Model │ │ Manager │ │ Manager │ │ │
│ │ └────────────┘ └────────────┘ └────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
smugvision/
├── web/ # NEW: Web UI module
│ ├── __init__.py
│ ├── app.py # Flask application factory
│ ├── routes/
│ │ ├── __init__.py
│ │ ├── api.py # REST API endpoints
│ │ └── pages.py # HTML page routes
│ ├── services/
│ │ ├── __init__.py
│ │ └── preview.py # Preview processing service
│ ├── static/
│ │ ├── css/
│ │ │ └── style.css # Custom styles
│ │ └── js/
│ │ └── app.js # Frontend JavaScript
│ └── templates/
│ ├── base.html # Base template
│ ├── index.html # Main page
│ ├── preview.html # Preview results
│ ├── faces.html # Known faces display
│ └── relationships.html # Relationship graph
Start a preview (dry-run) processing job for an album.
Request:
{
"url": "https://site.smugmug.com/.../n-XXXXX/album-name",
"force_reprocess": false
}
Response:
{
"job_id": "abc123",
"album_key": "XXXXX",
"album_name": "Album Name",
"total_images": 42,
"status": "processing"
}
Server-Sent Events stream for progress updates.
SSE Events:
event: progress
data: {"current": 5, "total": 42, "filename": "IMG_1234.jpg", "percent": 12}
event: image_complete
data: {"image_key": "xxx", "filename": "IMG_1234.jpg", "success": true}
event: complete
data: {"processed": 40, "skipped": 2, "errors": 0}
event: error
data: {"message": "Failed to process IMG_5678.jpg: timeout"}
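Each event in the stream above is an `event:` line and a `data:` line followed by a blank line. The helpers below sketch how a backend might serialize them; `format_sse` and `progress_payload` are hypothetical names, and rounding the percentage is an assumption chosen to match the sample event (5 of 42 shown as 12).

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event with a named event type and JSON data."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def progress_payload(current: int, total: int, filename: str) -> dict:
    """Build the payload for a progress event."""
    percent = round(100 * current / total) if total else 0
    return {
        "current": current,
        "total": total,
        "filename": filename,
        "percent": percent,
    }


if __name__ == "__main__":
    print(format_sse("progress", progress_payload(5, 42, "IMG_1234.jpg")), end="")
```

A streaming response would yield these strings one at a time with the `text/event-stream` content type.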
Get the full preview results after processing completes.
Response:
{
"job_id": "abc123",
"album_key": "XXXXX",
"album_name": "Album Name",
"status": "complete",
"stats": {
"total": 42,
"processed": 40,
"skipped": 2,
"errors": 0
},
"images": [
{
"image_key": "img123",
"filename": "IMG_1234.jpg",
"thumbnail_url": "/api/thumbnail/img123",
"web_uri": "https://site.smugmug.com/...",
"status": "processed",
"current": {
"caption": "Existing caption or null",
"keywords": ["tag1", "tag2"]
},
"proposed": {
"caption": "AI-generated caption with location and people",
"keywords": ["tag1", "tag2", "newtag1", "newtag2", "smugvision"]
},
"details": {
"faces_detected": ["John Doe", "Jane Smith"],
"location": "Golden Gate Bridge, San Francisco, CA",
"exif_date": "2024-06-15T14:30:00"
}
},
{
"image_key": "img456",
"filename": "IMG_5678.jpg",
"thumbnail_url": "/api/thumbnail/img456",
"status": "skipped",
"reason": "Already has smugvision marker tag"
},
{
"image_key": "img789",
"filename": "IMG_9012.jpg",
"thumbnail_url": "/api/thumbnail/img789",
"status": "error",
"error": "Vision model timeout"
}
]
}
Commit the previewed changes to SmugMug.
Request:
{
"job_id": "abc123"
}
Response:
{
"status": "success",
"committed": 40,
"errors": 0
}
Get list of known reference faces.
Response:
{
"faces": [
{
"name": "John Doe",
"display_name": "John Doe",
"reference_count": 3,
"sample_image": "/api/face-sample/John_Doe"
},
{
"name": "Jane Smith",
"display_name": "Jane Smith",
"reference_count": 2,
"sample_image": "/api/face-sample/Jane_Smith"
}
],
"total": 2
}
Get relationship graph data for visualization.
Response:
{
"nodes": [
{"id": "John_Doe", "label": "John Doe"},
{"id": "Jane_Smith", "label": "Jane Smith"},
{"id": "Junior_Doe", "label": "Junior Doe"}
],
"edges": [
{"from": "John_Doe", "to": "Jane_Smith", "label": "spouse"},
{"from": "John_Doe", "to": "Junior_Doe", "label": "parent"},
{"from": "Jane_Smith", "to": "Junior_Doe", "label": "parent"}
],
"groups": [
{
"members": ["John_Doe", "Jane_Smith", "Junior_Doe"],
"description": "The Doe Family"
}
]
}
Proxy thumbnail image from SmugMug (avoids CORS issues).
Response: Image binary (JPEG)
Get a sample reference face image for display.
Response: Image binary (JPEG)
- URL input field for SmugMug album URL
- "Preview" button to start dry-run processing
- Navigation to Faces and Relationships pages
- Status indicator for Ollama/vision model availability
- Album info header (name, image count)
- Progress bar (during processing)
- Infinite-scroll grid of image cards:
- Thumbnail image
- Status indicator (processed/skipped/error)
- Current vs. Proposed metadata diff view
- Detected faces chips
- Location badge
- "Commit All Changes" button (disabled during processing, enabled after)
- Summary statistics
┌─────────────────────────────────────────────────────────┐
│ ┌─────────────┐ IMG_1234.jpg [✓] │
│ │ │ ─────────────────────────────────────│
│ │ thumbnail │ Caption: │
│ │ │ - "Family at the beach" │
│ │ │ + "John and Jane enjoying sunset at │
│ └─────────────┘ Golden Gate Bridge, San Francisco" │
│ │
│ Keywords: │
│ [beach] [vacation] + [Golden Gate] + [sunset] │
│ + [John Doe] + [Jane Smith] + [smugvision] │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ 👤 John Doe, Jane Smith 📍 San Francisco, CA │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Skipped Card (grayed out):
┌─────────────────────────────────────────────────────────┐
│ ┌─────────────┐ IMG_5678.jpg [SKIPPED] │
│ │ │ ─────────────────────────────────────│
│ │ thumbnail │ Already processed (has smugvision │
│ │ (grayed) │ marker tag) │
│ │ │ │
│ └─────────────┘ Current: "Existing caption..." │
└─────────────────────────────────────────────────────────┘
- Grid of known people with sample face images
- Count of reference images per person
- Simple display (no add/remove functionality for now)
┌──────────────────────────────────────────────────────────────┐
│ Known Faces (5 people) │
│ ────────────────────────────────────────────────────────────│
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ [face] │ │ [face] │ │ [face] │ │
│ │ John Doe │ │ Jane Smith │ │ Junior Doe │ │
│ │ 3 refs │ │ 2 refs │ │ 1 ref │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└──────────────────────────────────────────────────────────────┘
- Visual graph of relationships (using a simple JS graph library like vis.js or cytoscape.js)
- List view of defined groups
- Shows relationship types (spouse, parent, sibling, etc.)
┌──────────────────────────────────────────────────────────────┐
│ Relationship Graph │
│ ────────────────────────────────────────────────────────────│
│ │
│ ┌──────────┐ │
│ │ John Doe │ │
│ └────┬─────┘ │
│ spouse │ │
│ ┌────┴─────┐ │
│ │Jane Smith│ │
│ └────┬─────┘ │
│ parent │ │
│ ┌────┴─────┐ │
│ │Junior Doe│ │
│ └──────────┘ │
│ │
│ Groups: │
│ • The Doe Family: John Doe, Jane Smith, Junior Doe │
└──────────────────────────────────────────────────────────────┘
1. User enters SmugMug URL → clicks "Preview"
│
2. POST /api/preview
│
3. Backend:
├── Parse URL → extract album key
├── Fetch album info from SmugMug
├── Return job_id immediately
│
4. Frontend connects to GET /api/preview/status?job_id=xxx (SSE)
│
5. Backend processes each image (dry_run=True):
│ For each image:
│ ├── Download thumbnail/medium image
│ ├── Extract EXIF location
│ ├── Detect/identify faces
│ ├── Generate caption & tags via vision model
│ ├── Format proposed metadata
│ ├── Store result in memory (job results dict)
│ └── Send SSE progress event
│
6. Frontend receives SSE events → updates progress bar
│
7. Processing complete → SSE "complete" event
│
8. Frontend calls GET /api/preview/results
│
9. Frontend renders image grid with diff views
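The progress reporting in steps 5 to 7 relies on Server-Sent Events, where each message is an `event:` line, a `data:` line, and a terminating blank line. The sketch below shows one way to produce such a stream using only the standard library. The "complete" event name comes from the flow above; the "progress" event name, the payload fields, and the function names are assumptions.

```python
import json
from typing import Any, Dict, Iterator, List


def format_sse(event: str, data: Dict[str, Any]) -> str:
    """Serialize one Server-Sent Event: event name, JSON data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def progress_stream(filenames: List[str]) -> Iterator[str]:
    """Yield one hypothetical "progress" event per image, then a "complete" event."""
    total = len(filenames)
    for index, filename in enumerate(filenames, start=1):
        # A real handler would process the image here (dry_run=True) before reporting.
        yield format_sse("progress", {"current": index, "total": total, "filename": filename})
    yield format_sse("complete", {"total": total})


events = list(progress_stream(["IMG_1234.jpg", "IMG_5678.jpg"]))
```

In Flask, a generator like this can be wrapped in a `Response` with `mimetype="text/event-stream"`, and the browser side can consume it with `EventSource`.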
1. User reviews preview → clicks "Commit All Changes"
│
2. POST /api/commit {job_id: "xxx"}
│
3. Backend:
│ For each processed image in job results:
│ ├── Call SmugMug PATCH API with proposed metadata
│ └── Track success/failure
│
4. Return commit results
│
5. Frontend shows success message with statistics
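Step 3 of the commit flow can be sketched as a loop that writes only the processed results and records each outcome. This is an illustration under stated assumptions: `patch_image` is a hypothetical stand-in for whatever function issues the SmugMug PATCH request, `CommitSummary` is an invented container, and the results are plain dicts that reuse the field names of the preview results.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CommitSummary:
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # image_key -> error message


def commit_results(
    results: List[dict],
    patch_image: Callable[[str, str, List[str]], None],
) -> CommitSummary:
    """Apply proposed metadata for every result whose status is "processed"."""
    summary = CommitSummary()
    for result in results:
        if result["status"] != "processed":
            continue  # skipped and errored images are never written
        try:
            patch_image(result["image_key"], result["proposed_caption"], result["proposed_keywords"])
            summary.succeeded.append(result["image_key"])
        except Exception as exc:  # one failure should not abort the whole commit
            summary.failed[result["image_key"]] = str(exc)
    return summary


def fake_patch(image_key: str, caption: str, keywords: List[str]) -> None:
    """Test double for the SmugMug call; fails for one specific key."""
    if image_key == "bad":
        raise RuntimeError("simulated API failure")


sample = [
    {"image_key": "ok1", "status": "processed", "proposed_caption": "A", "proposed_keywords": ["x"]},
    {"image_key": "skip1", "status": "skipped", "proposed_caption": None, "proposed_keywords": []},
    {"image_key": "bad", "status": "processed", "proposed_caption": "B", "proposed_keywords": ["y"]},
]
summary = commit_results(sample, fake_patch)
```

The counts in `summary.succeeded` and `summary.failed` are the kind of statistics the frontend could show in step 5.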
The backend maintains in-memory state for active preview jobs:
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class ImagePreviewResult:
    """Preview outcome for a single image."""
    image_key: str
    filename: str
    thumbnail_url: str
    web_uri: str
    status: str  # "processed", "skipped", "error"
    current_caption: Optional[str]
    current_keywords: List[str]
    proposed_caption: Optional[str]
    proposed_keywords: List[str]
    faces_detected: List[str]
    location: Optional[str]
    error: Optional[str]


@dataclass
class PreviewJob:
    """State of one preview run, keyed by job_id in preview_jobs."""
    job_id: str
    album_key: str
    album_name: str
    status: str  # "processing", "complete", "error"
    total_images: int
    current_image: int
    results: List[ImagePreviewResult]
    created_at: datetime


# In-memory job storage (simple dict for localhost use).
# Defined after the classes so the annotation resolves at import time.
preview_jobs: Dict[str, PreviewJob] = {}
Add a new command to start the web server:
# Start the web UI server
smugvision-web
# Or with options
smugvision-web --port 5050 --debug
This will be a new console script entry point in pyproject.toml.
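The argument handling behind `smugvision-web` might look like the sketch below. The `--port` and `--debug` flags come from the usage shown above; the default port of 5000, the function names, and the `LOCALHOST` constant are assumptions, and the server startup itself is left out so the snippet runs without Flask installed.

```python
import argparse
from typing import List, Optional

LOCALHOST = "127.0.0.1"  # the design binds to localhost only, never 0.0.0.0


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for the hypothetical smugvision-web entry point."""
    parser = argparse.ArgumentParser(
        prog="smugvision-web",
        description="Start the smugVision web UI server",
    )
    # The default port is an assumption; the design only shows --port 5050 as an example.
    parser.add_argument("--port", type=int, default=5000, help="port to listen on")
    parser.add_argument("--debug", action="store_true", help="enable debug mode")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the given arguments, or sys.argv when argv is None."""
    return build_parser().parse_args(argv)


args = parse_args(["--port", "5050", "--debug"])
```

A `main()` function built on `parse_args()` could then be registered under `[project.scripts]` in pyproject.toml; the module path used there would depend on the package layout, which this document does not fix.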
- Create Flask app structure with routes
- Implement /api/preview endpoint (leverages existing ImageProcessor with dry_run=True)
- Implement SSE progress streaming
- Implement /api/preview/results endpoint
- Create basic HTML templates with URL input and progress display
- Implement thumbnail proxying
- Build image card component with diff view
- Implement infinite scroll for results
- Style processed/skipped/error states
- Add faces and location display to cards
- Implement /api/commit endpoint
- Add commit button with confirmation
- Show commit results/statistics
- Implement /api/faces endpoint
- Build faces gallery page
- Implement /api/relationships endpoint
- Build relationship graph visualization (using vis.js or similar)
# Add to requirements.txt
flask>=3.0.0
No heavy frontend framework needed - vanilla JS with fetch API and EventSource for SSE.
- Localhost only: Server binds to 127.0.0.1, not 0.0.0.0
- No authentication: Assumes trusted local environment
- No persistent storage: Job data is in-memory only, cleared on restart
- SmugMug credentials: Read from existing config.yaml, never exposed via API
- Multiple gallery processing queue
- Gallery browser (tree view of SmugMug folders)
- Selective commit (checkbox per image)
- Edit proposed metadata before commit
- Reference face management (add/remove)
- Cache management UI
- Processing history/logs view
- Dark mode
This is currently a personal project. Contribution guidelines will be added once the MVP is complete.
To be determined (likely MIT or Apache 2.0)
Developer: Eric
Repository: https://github.com/yourusername/smugvision (update when created)
Issues: Use GitHub Issues for bug reports and feature requests
Document History:
- v1.0 (2024-11-23): Initial design document created