smugVision Design Document

Version: 1.0
Last Updated: November 23, 2024
Platform: macOS
Language: Python 3.10+

Project Overview

smugVision is a command-line tool that uses local AI vision models to automatically generate descriptive captions and relevant keyword tags for photos stored in SmugMug galleries. The tool processes images locally, generates metadata using AI, and updates SmugMug galleries directly via their API without requiring image re-uploads.

Core Goals

Improve SmugMug photo searchability through AI-generated metadata
Process images locally for privacy and cost efficiency
Support batch processing of entire galleries or folders
Provide extensibility for future enhancements (face detection, custom prompts, etc.)
Maintain a clean, modular architecture

Technical Stack

Primary Components

Language: Python 3.10+
AI Model: Llama 3.2 Vision 11B (via Ollama) - Default, but modular for future model support
SmugMug Integration: Python library or custom API wrapper
Configuration: YAML or JSON config file
Logging: Python standard logging library with timestamps and module identification

Model Selection Rationale

Llama 3.2 Vision 11B was chosen as the default model because:

Supports images up to 1120x1120 pixels resolution
Runs efficiently on Apple Silicon (M4 Pro) with at least 8GB VRAM
Provides strong image understanding and captioning capabilities
Free and runs locally without API costs
Works well via Ollama for easy management

Image Size Recommendation: Download medium-sized images (approximately 1024-1200px on longest edge) from SmugMug. This balances:

Quality sufficient for accurate AI analysis
Bandwidth efficiency
Processing speed
Staying within the model's optimal resolution range (1120x1120px)

Alternative Models (Future Consideration)

While Llama 3.2 Vision is the initial choice, the architecture should support:

OpenAI GPT-4o/GPT-4o-mini (cloud-based, best quality)
Google Gemini Vision (cloud-based)
Microsoft Florence-2 (lightweight, can run locally)
RAM/RAM++ (specialized for tagging, open-source)

Architecture

High-Level Design

┌─────────────────┐
│   CLI Interface │
│  (arg parsing)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Config Manager  │
│  (config.yaml)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐       ┌──────────────────┐
│ SmugMug Manager │◄─────►│   Image Cache    │
│  (API wrapper)  │       │   (local temp)   │
└────────┬────────┘       └──────────────────┘
         │
         ▼
┌─────────────────┐
│  Vision Model   │
│   (Ollama/LLM)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Metadata Gen   │
│ (tags/captions) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  SmugMug API    │
│   (push data)   │
└─────────────────┘

Module Structure

smugvision/
├── __init__.py
├── __main__.py              # CLI entry point
├── config/                  # ✅ IMPLEMENTED
│   ├── __init__.py
│   ├── manager.py           # Configuration loading and validation
│   └── defaults.py          # Default configuration values
├── face/                    # ✅ IMPLEMENTED
│   ├── __init__.py
│   └── recognizer.py        # Face detection and recognition
├── smugmug/                 # ✅ IMPLEMENTED
│   ├── __init__.py
│   ├── client.py            # SmugMug API client wrapper
│   ├── models.py            # Data models for Album, AlbumImage
│   └── exceptions.py        # Custom exceptions
├── vision/                  # ✅ IMPLEMENTED
│   ├── __init__.py
│   ├── base.py              # Abstract base class for vision models
│   ├── llama.py             # Llama 3.2 Vision implementation
│   ├── factory.py           # Factory pattern for model selection
│   └── exceptions.py        # Custom exceptions for vision models
├── cache/                  # ✅ IMPLEMENTED
│   ├── __init__.py
│   └── manager.py           # Image cache management
├── processing/
│   ├── __init__.py
│   ├── processor.py         # Main processing orchestration
│   └── metadata.py          # Metadata generation utilities
└── utils/                   # ✅ PARTIALLY IMPLEMENTED
    ├── __init__.py
    ├── exif.py              # EXIF data extraction and geocoding
    ├── locations.py         # Custom location resolution (✅ IMPLEMENTED)
    ├── relationships.py     # Person relationship management
    └── helpers.py           # Helper functions

config.yaml                  # User configuration file
config.yaml.example          # Example configuration (✅ CREATED)
locations.yaml.example       # Example custom locations file (✅ CREATED)
requirements.txt             # Python dependencies
setup.py                     # Package installation
README.md                    # User documentation
DESIGN.md                    # This document

Configuration File Structure

config.yaml example:

# SmugMug API Configuration
smugmug:
  api_key: "YOUR_API_KEY"
  api_secret: "YOUR_API_SECRET"
  user_token: "YOUR_USER_TOKEN"
  user_secret: "YOUR_USER_SECRET"

# Vision Model Configuration
vision:
  model: "llama3.2-vision"     # Options: llama3.2-vision, gpt-4o, etc.
  endpoint: "http://localhost:11434"  # Ollama endpoint
  temperature: 0.7
  max_tokens: 150

# Processing Configuration
processing:
  marker_tag: "smugvision"      # Tag to mark processed images
  generate_captions: true       # Enable caption generation
  generate_tags: true           # Enable tag generation
  preserve_existing: true       # Keep existing captions/tags
  image_size: "medium"          # Download size from SmugMug

# Location Resolution Configuration
location:
  custom_locations_file: "~/.smugvision/locations.yaml"
  check_custom_first: true      # Check custom locations before geocoding
  use_aliases_as_tags: true     # Add location aliases as keyword tags

# Prompt Configuration
prompts:
  caption: |
    Analyze this image and provide a concise, descriptive caption (1-2 sentences) 
    that describes the main subject, setting, and any notable activities or features.
    If EXIF location data is available, incorporate the location naturally.
  
  tags: |
    Generate 5-10 relevant keyword tags for this image. Focus on:
    - Main subjects and objects
    - Activities or actions
    - Setting and location
    - Colors and mood
    - Time of day or season (if apparent)
    Provide tags as a comma-separated list.

# Cache Configuration
cache:
  directory: "~/.smugvision/cache"
  clear_on_exit: false          # Will be implemented later
  preserve_structure: true      # Mirror gallery/folder structure

# Logging Configuration
logging:
  level: "INFO"                 # DEBUG, INFO, WARNING, ERROR
  file: "~/.smugvision/smugvision.log"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

Key Features & Implementation Details

1. Marker Tag System

Each processed image receives a configurable marker tag (default: "smugvision")
Before processing, check if marker tag exists to avoid duplicate processing
Command-line flag --force-reprocess bypasses this check
Marker tag is added to SmugMug keywords array

2. EXIF-Aware Captions

Extract EXIF data from images (GPS coordinates, camera model, date/time)
If GPS coordinates exist, attempt reverse geocoding to location name
Include location information naturally in prompts sent to vision model
Example: "A sunset over the Golden Gate Bridge in San Francisco, California"

2.1 Custom Location Reference (✅ IMPLEMENTED)

Custom locations allow users to define friendly names for places like their home, relatives' houses, or frequently visited locations. These override reverse geocoding results.

Use Cases:

Get "Eric's House" instead of a street address
Ensure consistent naming across all photos at the same location
Faster processing (no API calls needed for custom locations)
Add searchable aliases as tags

locations.yaml example:

locations:
  - name: "Eric Pullen's House"
    latitude: 38.123456
    longitude: -85.654321
    radius: 50                    # Match radius in meters
    address: "5311 Montfort Lane, Louisville, KY"
    aliases:
      - "Home"
      - "Pullen Residence"
  
  - name: "Louisville Slugger Field"
    latitude: 38.256510
    longitude: -85.747476
    radius: 200                   # Larger radius for a stadium
    aliases:
      - "Bats Game"
      - "Baseball Stadium"

Resolution Priority:

Check custom locations file first (closest match within radius)
If no match, fall back to Overpass API / Nominatim reverse geocoding
If geocoding fails, return coordinates as string

Key Classes:

LocationResolver: Loads and manages custom locations from YAML
CustomLocation: Data class for a single location definition
LocationMatch: Result of a coordinate match including distance

3. Metadata Preservation

When generating new metadata, preserve existing captions and keywords
Append new captions after existing ones (with separator if needed)
Merge new keywords with existing ones (avoiding duplicates)
Configuration option to control this behavior

4. Image Cache Management

~/.smugvision/cache/
└── [user_nickname]/
    └── [folder_name]/
        └── [gallery_name]/
            ├── image1.jpg
            ├── image2.jpg
            └── ...

Download images to local cache with structure mirroring SmugMug
Check cache before downloading (skip if already exists)
Command-line flag --clear-cache to remove cached images
Future: Auto-cleanup after successful processing (configurable)

5. Error Handling

Strategy: Fail-fast for initial implementation

Stop processing on first API error
Log detailed error information including:
- Image filename/URL
- Error type and message
- Stack trace
- Gallery/folder context
Save processing state for potential resume (future enhancement)

Error Types:

SmugMug API errors (auth, rate limit, network)
Vision model errors (timeout, invalid response)
File I/O errors (cache write failures)
Configuration errors (missing/invalid settings)

6. Processing Scope

Initial Implementation:

Single gallery processing via --gallery <gallery_id>
Gallery ID obtained from SmugMug URL or API

Future Enhancement:

Folder processing via --folder <folder_id>
Recursive processing of all galleries in folder
Progress tracking across multiple galleries

Command-Line Interface

Basic Usage

# Process a single gallery
smugvision --gallery "abc123"

# Force reprocessing of already-processed images
smugvision --gallery "abc123" --force-reprocess

# Clear cache before processing
smugvision --gallery "abc123" --clear-cache

# Dry run (don't update SmugMug, just show what would happen)
smugvision --gallery "abc123" --dry-run

# Use alternate config file
smugvision --gallery "abc123" --config /path/to/config.yaml

# Verbose logging
smugvision --gallery "abc123" --verbose

CLI Arguments

Required:
  --gallery GALLERY_ID       SmugMug gallery ID to process

Optional:
  --folder FOLDER_ID         Process all galleries in folder (future)
  --config PATH              Path to config file (default: ./config.yaml)
  --force-reprocess          Reprocess images even if already tagged
  --clear-cache              Clear image cache before processing
  --dry-run                  Preview actions without updating SmugMug
  --verbose, -v              Enable verbose DEBUG logging
  --help, -h                 Show help message
  --version                  Show version information

Coding Practices

Code Style

PEP 8 compliance for all Python code
Type hints for all function signatures
Docstrings for all classes and public methods (Google style)
Line length: 100 characters maximum
Formatter: Black (with line-length=100)
Linter: Pylint with custom configuration

Example Code Style

from typing import List, Optional
import logging

logger = logging.getLogger(__name__)


class SmugMugClient:
    """Client for interacting with SmugMug API.
    
    This class handles authentication, API requests, and data retrieval
    from SmugMug galleries and images.
    
    Attributes:
        api_key: SmugMug API key
        api_secret: SmugMug API secret
        access_token: OAuth access token
    """
    
    def __init__(self, api_key: str, api_secret: str) -> None:
        """Initialize SmugMug client with credentials.
        
        Args:
            api_key: SmugMug API key
            api_secret: SmugMug API secret
            
        Raises:
            ValueError: If credentials are empty or invalid
        """
        self.api_key = api_key
        self.api_secret = api_secret
        self._authenticated = False
        logger.info("SmugMugClient initialized")
    
    def get_gallery_images(
        self, 
        gallery_id: str, 
        include_processed: bool = False
    ) -> List[dict]:
        """Retrieve all images from specified gallery.
        
        Args:
            gallery_id: Unique identifier for the gallery
            include_processed: If False, skip images with marker tag
            
        Returns:
            List of image dictionaries containing metadata
            
        Raises:
            SmugMugAPIError: If API request fails
            AuthenticationError: If not authenticated
        """
        logger.info(f"Fetching images from gallery {gallery_id}")
        # Implementation here
        pass

Error Handling Pattern

from smugvision.smugmug.exceptions import SmugMugAPIError

try:
    images = smugmug_client.get_gallery_images(gallery_id)
except SmugMugAPIError as e:
    logger.error(f"Failed to fetch gallery images: {e}", exc_info=True)
    print(f"Error: Unable to access gallery {gallery_id}")
    print(f"Details: {str(e)}")
    sys.exit(1)

Testing Strategy

Unit tests for individual modules
Integration tests for API interactions (with mocking)
Fixture data for consistent test inputs
Test coverage target: 80% minimum
Framework: pytest

Dependency Management

# requirements.txt
ollama>=0.1.0
pyyaml>=6.0
requests>=2.31.0
pillow>=10.0.0
python-dateutil>=2.8.2
pytest>=7.4.0
black>=23.0.0
pylint>=3.0.0

Git Practices

Branch naming: feature/description, bugfix/description
Commit messages: Conventional Commits format
- feat: add support for folder processing
- fix: correct EXIF coordinate parsing
- docs: update configuration examples
Pull requests: Required for all changes
Version tags: Semantic versioning (v1.0.0, v1.1.0, etc.)

Logging Standards

All log messages must include:

Timestamp (ISO 8601 format)
Module name (automatically via __name__)
Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
Message with relevant context

Logging Example

import logging

logger = logging.getLogger(__name__)

# INFO: General progress
logger.info(f"Processing image: {image_filename}")

# DEBUG: Detailed information
logger.debug(f"Vision model response: {response[:100]}...")

# WARNING: Recoverable issues
logger.warning(f"Image {image_id} already has marker tag, skipping")

# ERROR: Failures requiring attention
logger.error(f"Failed to update image {image_id}: {error}", exc_info=True)

Log Output Format

2024-11-23 14:32:15,123 - smugvision.smugmug.client - INFO - Authenticating with SmugMug API
2024-11-23 14:32:16,456 - smugvision.processing.processor - INFO - Processing gallery abc123
2024-11-23 14:32:17,789 - smugvision.vision.llama - DEBUG - Sending prompt to Llama 3.2 Vision
2024-11-23 14:32:19,012 - smugvision.smugmug.client - ERROR - API request failed: 401 Unauthorized

Future Enhancements

Phase 2 (Near-term)

Face detection and recognition using provided reference faces
Folder-level processing (recursive gallery processing)
Resume capability for interrupted processing
Parallel processing for faster throughput
Progress bar with ETA

Phase 3 (Medium-term)

Web UI for easier configuration and monitoring
Support for additional vision models (GPT-4o, Gemini, etc.)
Batch prompt customization per gallery
Advanced filtering (by date range, existing tags, etc.)
Export metadata to local database for analytics

Phase 4 (Long-term)

SmugMug to SmugMug gallery migration with metadata
Integration with other photo services (Google Photos, iCloud)
Custom model fine-tuning for specific photo collections
Automated tagging based on learned patterns

Development Roadmap

Version 0.1.0 (MVP) - Mostly Complete

Version 0.2.0 - Complete ✓

Force reprocessing flag
Preserve existing metadata
EXIF data extraction and integration
EXIF orientation handling
HEIC/HEIF image format support
Reverse geocoding for location names
Improved error messages
Unit tests for core modules

Version 0.3.0 - Complete ✓

Face detection and recognition system
Reference faces management
Person name identification (with proper formatting)
Relationship context integration
Dry-run mode
Progress indicators and statistics
Folder processing support (planned for 1.0)
Cache cleanup functionality (planned for 1.0)
Integration tests

Version 1.0.0

Completed Features (Beyond Original Roadmap)

Installation & Setup

Prerequisites

Python 3.10+
Ollama installed on macOS
SmugMug API credentials (API key, secret, OAuth tokens)
Llama 3.2 Vision model downloaded via Ollama

Installation Steps

# 1. Install Ollama
brew install ollama

# 2. Start Ollama service
ollama serve

# 3. Download Llama 3.2 Vision model
ollama pull llama3.2-vision

# 4. Clone repository
git clone https://github.com/yourusername/smugvision.git
cd smugvision

# 5. Create virtual environment
python3 -m venv venv
source venv/bin/activate

# 6. Install dependencies
pip install -r requirements.txt

# 7. Copy and configure config file
cp config.yaml.example config.yaml
# Edit config.yaml with your SmugMug credentials

# 8. Run smugVision
python -m smugvision --gallery "abc123"

SmugMug API Integration

Authentication Flow

SmugMug uses OAuth 1.0a for authentication. The process:

Application credentials (API key + secret) - obtained from SmugMug
Request token - temporary token for authorization
User authorization - user approves access via web browser
Access token - permanent token for API requests

These tokens should be stored in config.yaml after initial setup.

Key API Endpoints

# Get gallery details
GET /api/v2/album/{gallery_id}

# Get gallery images
GET /api/v2/album/{gallery_id}!images

# Get image details
GET /api/v2/image/{image_key}

# Update image metadata
PATCH /api/v2/image/{image_key}
{
    "Caption": "New caption text",
    "Keywords": ["tag1", "tag2", "tag3"]
}

# Download image
GET {image_url}?size={size}  # size: Medium, Large, X2Large, etc.

Rate Limiting

SmugMug API rate limits should be respected
Implement exponential backoff for rate limit errors
Consider batch operations where possible
Log rate limit warnings

Data Models

Gallery Model

@dataclass
class Gallery:
    """Represents a SmugMug gallery."""
    gallery_id: str
    name: str
    url: str
    image_count: int
    uri: str

Image Model

@dataclass
class Image:
    """Represents a SmugMug image with metadata."""
    image_key: str
    filename: str
    uri: str
    caption: Optional[str]
    keywords: List[str]
    download_url: str
    date_uploaded: str
    exif: Optional[dict]
    has_marker: bool  # Whether smugvision tag exists

Metadata Result Model

@dataclass
class MetadataResult:
    """Generated metadata for an image."""
    caption: str
    tags: List[str]
    confidence: float  # 0.0 to 1.0
    model_used: str
    processing_time: float  # seconds

Current Status & Next Steps

What's Complete (Ready for Integration)

Infrastructure Layer:

✅ Configuration system with defaults, validation, and interactive setup
✅ SmugMug API client with OAuth 1.0a authentication
✅ Album/image retrieval with pagination and filtering
✅ Image and video download with multiple size options
✅ Local cache management with folder structure preservation
✅ URL/path-based album resolution and folder navigation

AI/ML Layer:

✅ Vision model abstraction (factory pattern with base class)
✅ Llama 3.2 Vision integration via Ollama
✅ Caption and tag generation
✅ EXIF data extraction with GPS and reverse geocoding
✅ Face detection and recognition with configurable confidence
✅ Person relationship management for context-aware captions

Data Layer:

✅ Album and AlbumImage data models
✅ Metadata update to SmugMug (PATCH endpoint)
✅ Marker tag system for tracking processed images
✅ Custom exceptions for error handling

✅ Processing Module - COMPLETE

All core processing components are now implemented and tested:

Completed Components:

✅ processing/processor.py - Main ImageProcessor orchestrator:
- Accepts album key/URL and processes all unprocessed images
- Downloads images to cache (using CacheManager)
- Extracts EXIF data and identifies faces
- Generates captions and tags (using VisionModel)
- Updates SmugMug with new metadata
- Adds marker tag to processed images
- Tracks progress and reports detailed statistics
✅ processing/metadata.py - MetadataFormatter utilities:
- Combines vision-generated captions with EXIF location data
- Merges person names from face recognition
- Handles metadata preservation (append vs replace)
- Formats tags and captions for SmugMug API
✅ __main__.py - Production CLI entry point:
- Parses command-line arguments
- Initializes configuration
- Creates processor instance
- Runs processing and displays rich formatted results
- Supports dry-run, force-reprocess, and video filtering

Recommended Next Steps

Phase 1: Testing & Documentation ✅ COMPLETE

✅ Test with real albums (validated)
✅ Document CLI usage in README.md
✅ Create comprehensive documentation
✅ Add usage examples

Phase 2: Future Enhancements (Version 1.0+)

Folder batch processing (process entire folder trees)
Cache cleanup utilities
Unit and integration tests
Performance optimizations (parallel downloads, batch API calls)

Phase 2: CLI Interface

Create __main__.py with argument parsing
Add commands: process, list, status
Support for --gallery, --url, --node, --force-reprocess
Add --dry-run mode for preview
Implement verbose logging flag

Phase 3: Testing & Refinement

Test with real SmugMug galleries
Handle edge cases (no faces, no EXIF, processing errors)
Optimize for performance (parallel downloads, batch updates)
Add progress bars and ETA
Write unit tests for processor

Phase 4: Documentation & Packaging

Complete README with usage examples
Add troubleshooting guide
Create setup.py for pip installation
Add example configurations

Design Considerations for Processor

Processing Flow:

For each image in album:
  1. Check if already processed (marker tag) → skip if yes
  2. Download to cache (skip if cached)
  3. Extract EXIF data (GPS, camera info, date)
  4. Detect and identify faces (if enabled)
  5. Generate caption with vision model
  6. Generate tags with vision model
  7. Format metadata (merge person names, location)
  8. Update SmugMug via PATCH API
  9. Add marker tag
  10. Log results and metrics

Error Handling Strategy:

Continue processing on single image failure
Log errors with full context
Collect statistics (success/skip/error counts)
Display summary at end
Option for --stop-on-error for strict mode

Performance Optimizations:

Cache downloaded images (already implemented)
Reuse face encodings across images
Batch SmugMug updates where possible
Show progress with ETA

Questions for Future Consideration

Metadata backup: Should we maintain local backup of original metadata before modification?
Prompt templates: Should we support per-gallery custom prompts?
Batch size: What's the optimal number of images to process before syncing to SmugMug?
Model switching: Should we support multiple models simultaneously for comparison?
Undo functionality: How to implement rollback of metadata changes?

Web UI Design (Phase 3 Feature)

Overview

A local web-based interface for smugVision that provides a visual preview of AI-generated metadata before committing changes to SmugMug. The UI defaults to dry-run mode, showing proposed changes alongside thumbnails, and requires explicit user confirmation to commit.

Goals

Provide visual feedback for processing decisions
Show side-by-side comparison of current vs. proposed metadata
Default to safe dry-run mode (no changes without explicit commit)
Display detected faces and location information
Surface reference face and relationship data for transparency

Technical Stack

Backend: Flask (Python) - Simple, integrates directly with existing smugVision modules
Frontend: Vanilla HTML/CSS/JavaScript with a simple CSS framework (e.g., Pico CSS or similar minimal framework)
Communication: REST API + Server-Sent Events (SSE) for progress updates
Deployment: Localhost only (e.g., http://localhost:5050)

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Web Browser                               │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                    smugVision Web UI                      │   │
│  │  ┌─────────────┐  ┌────────────────────────────────────┐ │   │
│  │  │ Album Input │  │         Preview Grid               │ │   │
│  │  │ (URL paste) │  │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐  │ │   │
│  │  └─────────────┘  │  │thumb│ │thumb│ │thumb│ │thumb│  │ │   │
│  │  ┌─────────────┐  │  │+diff│ │+diff│ │+diff│ │+diff│  │ │   │
│  │  │   Actions   │  │  └─────┘ └─────┘ └─────┘ └─────┘  │ │   │
│  │  │ [Preview]   │  │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐  │ │   │
│  │  │ [Commit]    │  │  │thumb│ │thumb│ │thumb│ │thumb│  │ │   │
│  │  └─────────────┘  │  │+diff│ │+diff│ │+diff│ │+diff│  │ │   │
│  │  ┌─────────────┐  │  └─────┘ └─────┘ └─────┘ └─────┘  │ │   │
│  │  │  Progress   │  │         (infinite scroll)          │ │   │
│  │  │  [███░░] 60%│  └────────────────────────────────────┘ │   │
│  │  └─────────────┘                                          │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ HTTP/SSE
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Flask Backend (localhost:5050)               │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                      REST API Routes                      │   │
│  │  POST /api/preview     - Start preview processing         │   │
│  │  GET  /api/preview/status - SSE stream for progress       │   │
│  │  GET  /api/preview/results - Get preview results          │   │
│  │  POST /api/commit      - Commit changes to SmugMug        │   │
│  │  GET  /api/faces       - List known reference faces       │   │
│  │  GET  /api/relationships - Get relationship graph data    │   │
│  │  GET  /api/thumbnail/<key> - Proxy thumbnail from SmugMug │   │
│  └──────────────────────────────────────────────────────────┘   │
│                              │                                   │
│                              ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │              Existing smugVision Modules                  │   │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────────────┐    │   │
│  │  │ SmugMug    │ │ Image      │ │ Face               │    │   │
│  │  │ Client     │ │ Processor  │ │ Recognizer         │    │   │
│  │  └────────────┘ └────────────┘ └────────────────────┘    │   │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────────────┐    │   │
│  │  │ Vision     │ │ Cache      │ │ Relationship       │    │   │
│  │  │ Model      │ │ Manager    │ │ Manager            │    │   │
│  │  └────────────┘ └────────────┘ └────────────────────┘    │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Module Structure

smugvision/
├── web/                        # NEW: Web UI module
│   ├── __init__.py
│   ├── app.py                  # Flask application factory
│   ├── routes/
│   │   ├── __init__.py
│   │   ├── api.py              # REST API endpoints
│   │   └── pages.py            # HTML page routes
│   ├── services/
│   │   ├── __init__.py
│   │   └── preview.py          # Preview processing service
│   ├── static/
│   │   ├── css/
│   │   │   └── style.css       # Custom styles
│   │   └── js/
│   │       └── app.js          # Frontend JavaScript
│   └── templates/
│       ├── base.html           # Base template
│       ├── index.html          # Main page
│       ├── preview.html        # Preview results
│       ├── faces.html          # Known faces display
│       └── relationships.html  # Relationship graph

API Endpoints

POST /api/preview

Start a preview (dry-run) processing job for an album.

Request:

{
  "url": "https://site.smugmug.com/.../n-XXXXX/album-name",
  "force_reprocess": false
}

Response:

{
  "job_id": "abc123",
  "album_key": "XXXXX",
  "album_name": "Album Name",
  "total_images": 42,
  "status": "processing"
}

GET /api/preview/status?job_id=abc123

Server-Sent Events stream for progress updates.

SSE Events:

event: progress
data: {"current": 5, "total": 42, "filename": "IMG_1234.jpg", "percent": 12}

event: image_complete
data: {"image_key": "xxx", "filename": "IMG_1234.jpg", "success": true}

event: complete
data: {"processed": 40, "skipped": 2, "errors": 0}

event: error
data: {"message": "Failed to process IMG_5678.jpg: timeout"}

GET /api/preview/results?job_id=abc123

Get the full preview results after processing completes.

Response:

{
  "job_id": "abc123",
  "album_key": "XXXXX",
  "album_name": "Album Name",
  "status": "complete",
  "stats": {
    "total": 42,
    "processed": 40,
    "skipped": 2,
    "errors": 0
  },
  "images": [
    {
      "image_key": "img123",
      "filename": "IMG_1234.jpg",
      "thumbnail_url": "/api/thumbnail/img123",
      "web_uri": "https://site.smugmug.com/...",
      "status": "processed",
      "current": {
        "caption": "Existing caption or null",
        "keywords": ["tag1", "tag2"]
      },
      "proposed": {
        "caption": "AI-generated caption with location and people",
        "keywords": ["tag1", "tag2", "newtag1", "newtag2", "smugvision"]
      },
      "details": {
        "faces_detected": ["John Doe", "Jane Smith"],
        "location": "Golden Gate Bridge, San Francisco, CA",
        "exif_date": "2024-06-15T14:30:00"
      }
    },
    {
      "image_key": "img456",
      "filename": "IMG_5678.jpg",
      "thumbnail_url": "/api/thumbnail/img456",
      "status": "skipped",
      "reason": "Already has smugvision marker tag"
    },
    {
      "image_key": "img789",
      "filename": "IMG_9012.jpg",
      "thumbnail_url": "/api/thumbnail/img789",
      "status": "error",
      "error": "Vision model timeout"
    }
  ]
}

POST /api/commit

Commit the previewed changes to SmugMug.

Request:

{
  "job_id": "abc123"
}

Response:

{
  "status": "success",
  "committed": 40,
  "errors": 0
}

GET /api/faces

Get list of known reference faces.

Response:

{
  "faces": [
    {
      "name": "John Doe",
      "display_name": "John Doe",
      "reference_count": 3,
      "sample_image": "/api/face-sample/John_Doe"
    },
    {
      "name": "Jane Smith",
      "display_name": "Jane Smith", 
      "reference_count": 2,
      "sample_image": "/api/face-sample/Jane_Smith"
    }
  ],
  "total": 2
}

GET /api/relationships

Get relationship graph data for visualization.

Response:

{
  "nodes": [
    {"id": "John_Doe", "label": "John Doe"},
    {"id": "Jane_Smith", "label": "Jane Smith"},
    {"id": "Junior_Doe", "label": "Junior Doe"}
  ],
  "edges": [
    {"from": "John_Doe", "to": "Jane_Smith", "label": "spouse"},
    {"from": "John_Doe", "to": "Junior_Doe", "label": "parent"},
    {"from": "Jane_Smith", "to": "Junior_Doe", "label": "parent"}
  ],
  "groups": [
    {
      "members": ["John_Doe", "Jane_Smith", "Junior_Doe"],
      "description": "The Doe Family"
    }
  ]
}

GET /api/thumbnail/<image_key>

Proxy thumbnail image from SmugMug (avoids CORS issues).

Response: Image binary (JPEG)

GET /api/face-sample/<person_name>

Get a sample reference face image for display.

Response: Image binary (JPEG)

UI Pages

Main Page (index.html)

URL input field for SmugMug album URL
"Preview" button to start dry-run processing
Navigation to Faces and Relationships pages
Status indicator for Ollama/vision model availability

Preview Results Page (preview.html)

Album info header (name, image count)
Progress bar (during processing)
Infinite-scroll grid of image cards:
- Thumbnail image
- Status indicator (processed/skipped/error)
- Current vs. Proposed metadata diff view
- Detected faces chips
- Location badge
"Commit All Changes" button (disabled during processing, enabled after)
Summary statistics

Image Card Component

┌─────────────────────────────────────────────────────────┐
│  ┌─────────────┐  IMG_1234.jpg                    [✓]  │
│  │             │  ─────────────────────────────────────│
│  │  thumbnail  │  Caption:                             │
│  │             │  - "Family at the beach"              │
│  │             │  + "John and Jane enjoying sunset at  │
│  └─────────────┘    Golden Gate Bridge, San Francisco" │
│                                                         │
│  Keywords:                                              │
│  [beach] [vacation] + [Golden Gate] + [sunset]         │
│  + [John Doe] + [Jane Smith] + [smugvision]            │
│                                                         │
│  ┌──────────────────────────────────────────────────┐  │
│  │ 👤 John Doe, Jane Smith  📍 San Francisco, CA   │  │
│  └──────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

Skipped Card (grayed out):
┌─────────────────────────────────────────────────────────┐
│  ┌─────────────┐  IMG_5678.jpg              [SKIPPED]  │
│  │             │  ─────────────────────────────────────│
│  │  thumbnail  │  Already processed (has smugvision    │
│  │  (grayed)   │  marker tag)                          │
│  │             │                                        │
│  └─────────────┘  Current: "Existing caption..."       │
└─────────────────────────────────────────────────────────┘

Known Faces Page (faces.html)

Grid of known people with sample face images
Count of reference images per person
Simple display (no add/remove functionality for now)

┌──────────────────────────────────────────────────────────────┐
│  Known Faces (5 people)                                      │
│  ────────────────────────────────────────────────────────────│
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   [face]    │  │   [face]    │  │   [face]    │          │
│  │  John Doe   │  │ Jane Smith  │  │ Junior Doe  │          │
│  │  3 refs     │  │  2 refs     │  │  1 ref      │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└──────────────────────────────────────────────────────────────┘

Relationships Page (relationships.html)

Visual graph of relationships (using a simple JS graph library like vis.js or cytoscape.js)
List view of defined groups
Shows relationship types (spouse, parent, sibling, etc.)

┌──────────────────────────────────────────────────────────────┐
│  Relationship Graph                                          │
│  ────────────────────────────────────────────────────────────│
│                                                              │
│            ┌──────────┐                                      │
│            │ John Doe │                                      │
│            └────┬─────┘                                      │
│         spouse  │                                            │
│            ┌────┴─────┐                                      │
│            │Jane Smith│                                      │
│            └────┬─────┘                                      │
│          parent │                                            │
│            ┌────┴─────┐                                      │
│            │Junior Doe│                                      │
│            └──────────┘                                      │
│                                                              │
│  Groups:                                                     │
│  • The Doe Family: John Doe, Jane Smith, Junior Doe          │
└──────────────────────────────────────────────────────────────┘

Data Flow

Preview Workflow

1. User enters SmugMug URL → clicks "Preview"
   │
2. POST /api/preview
   │
3. Backend:
   ├── Parse URL → extract album key
   ├── Fetch album info from SmugMug
   ├── Return job_id immediately
   │
4. Frontend connects to GET /api/preview/status?job_id=xxx (SSE)
   │
5. Backend processes each image (dry_run=True):
   │   For each image:
   │   ├── Download thumbnail/medium image
   │   ├── Extract EXIF location
   │   ├── Detect/identify faces
   │   ├── Generate caption & tags via vision model
   │   ├── Format proposed metadata
   │   ├── Store result in memory (job results dict)
   │   └── Send SSE progress event
   │
6. Frontend receives SSE events → updates progress bar
   │
7. Processing complete → SSE "complete" event
   │
8. Frontend calls GET /api/preview/results
   │
9. Frontend renders image grid with diff views

Commit Workflow

1. User reviews preview → clicks "Commit All Changes"
   │
2. POST /api/commit {job_id: "xxx"}
   │
3. Backend:
   │   For each processed image in job results:
   │   ├── Call SmugMug PATCH API with proposed metadata
   │   └── Track success/failure
   │
4. Return commit results
   │
5. Frontend shows success message with statistics

State Management

The backend maintains in-memory state for active preview jobs:

# In-memory job storage (simple dict for localhost use)
preview_jobs: Dict[str, PreviewJob] = {}

@dataclass
class PreviewJob:
    job_id: str
    album_key: str
    album_name: str
    status: str  # "processing", "complete", "error"
    total_images: int
    current_image: int
    results: List[ImagePreviewResult]
    created_at: datetime
    
@dataclass
class ImagePreviewResult:
    image_key: str
    filename: str
    thumbnail_url: str
    web_uri: str
    status: str  # "processed", "skipped", "error"
    current_caption: Optional[str]
    current_keywords: List[str]
    proposed_caption: Optional[str]
    proposed_keywords: List[str]
    faces_detected: List[str]
    location: Optional[str]
    error: Optional[str]

CLI Integration

Add a new command to start the web server:

# Start the web UI server
smugvision-web

# Or with options
smugvision-web --port 5050 --debug

This will be a new console script entry point in pyproject.toml.

Implementation Plan

Phase 1: Core Backend & Basic UI

Create Flask app structure with routes
Implement /api/preview endpoint (leverages existing ImageProcessor with dry_run=True)
Implement SSE progress streaming
Implement /api/preview/results endpoint
Create basic HTML templates with URL input and progress display
Implement thumbnail proxying

Phase 2: Preview Display

Build image card component with diff view
Implement infinite scroll for results
Style processed/skipped/error states
Add faces and location display to cards

Phase 3: Commit Flow

Implement /api/commit endpoint
Add commit button with confirmation
Show commit results/statistics

Phase 4: Faces & Relationships Pages

Implement /api/faces endpoint
Build faces gallery page
Implement /api/relationships endpoint
Build relationship graph visualization (using vis.js or similar)

Dependencies (New)

# Add to requirements.txt
flask>=3.0.0

No heavy frontend framework needed - vanilla JS with fetch API and EventSource for SSE.

Security Considerations

Localhost only: Server binds to 127.0.0.1, not 0.0.0.0
No authentication: Assumes trusted local environment
No persistent storage: Job data is in-memory only, cleared on restart
SmugMug credentials: Read from existing config.yaml, never exposed via API

Future Enhancements (Not in Initial Scope)

Multiple gallery processing queue
Gallery browser (tree view of SmugMug folders)
Selective commit (checkbox per image)
Edit proposed metadata before commit
Reference face management (add/remove)
Cache management UI
Processing history/logs view
Dark mode

Contributing

This is currently a personal project. Contribution guidelines will be added once the MVP is complete.

License

To be determined (likely MIT or Apache 2.0)

Contact & Support

Developer: Eric
Repository: https://github.com/yourusername/smugvision (update when created)
Issues: Use GitHub Issues for bug reports and feature requests

Document History:

v1.0 (2024-11-23): Initial design document created

FilesExpand file tree

DESIGN.md

Latest commit

History

DESIGN.md

File metadata and controls

smugVision Design Document

Project Overview

Core Goals

Technical Stack

Primary Components

Model Selection Rationale

Alternative Models (Future Consideration)

Architecture

High-Level Design

Module Structure

Configuration File Structure

Key Features & Implementation Details

1. Marker Tag System

2. EXIF-Aware Captions

2.1 Custom Location Reference (✅ IMPLEMENTED)

3. Metadata Preservation

4. Image Cache Management

5. Error Handling

6. Processing Scope

Command-Line Interface

Basic Usage

CLI Arguments

Coding Practices

Code Style

Example Code Style

Error Handling Pattern

Testing Strategy

Dependency Management

Git Practices

Logging Standards

Logging Example

Log Output Format

Future Enhancements

Phase 2 (Near-term)

Phase 3 (Medium-term)

Phase 4 (Long-term)

Development Roadmap

Version 0.1.0 (MVP) - Mostly Complete

Version 0.2.0 - Complete ✓

Version 0.3.0 - Complete ✓

Version 1.0.0

Completed Features (Beyond Original Roadmap)

Installation & Setup

Prerequisites

Installation Steps

SmugMug API Integration

Authentication Flow

Key API Endpoints

Rate Limiting

Data Models

Gallery Model

Image Model

Metadata Result Model

Current Status & Next Steps

What's Complete (Ready for Integration)

✅ Processing Module - COMPLETE

Recommended Next Steps

Design Considerations for Processor

Questions for Future Consideration

Web UI Design (Phase 3 Feature)

Overview

Goals

Technical Stack

Architecture

Module Structure

API Endpoints

POST /api/preview

GET /api/preview/status?job_id=abc123

GET /api/preview/results?job_id=abc123

POST /api/commit

GET /api/faces

GET /api/relationships

GET /api/thumbnail/<image_key>

GET /api/face-sample/<person_name>