A modern, open-source dataset management tool for AI/ML teams building vision models. Organize, label, and review image datasets with an intuitive Airtable-inspired interface.
Manage all your image datasets in one place with pagination and statistics.
View and manage all rows with images, custom fields, and status tracking.
Navigate through pending images with an intuitive review interface.
Create custom schemas with flexible field types (boolean, text, numeric, enum).
AITrace Datasets is a self-hosted platform designed for managing image datasets used in AI/ML projects. Think of it as Airtable meets Label Studio - you get the flexibility of custom schemas with the power of image annotation workflows.
- AI/ML Teams building computer vision models who need to organize training data
- Data Labeling Teams who want a self-hosted, customizable annotation tool
- Research Teams managing datasets across multiple projects
- Anyone tired of managing image datasets in spreadsheets or cloud tools they don't control
- ✅ Self-hosted - Your data stays on your infrastructure
- ✅ Flexible Schemas - Define custom fields for your specific use case
- ✅ Review Workflows - Built-in queue system for data validation
- ✅ Bulk Operations - CSV import/export for efficient data management
- ✅ Team Collaboration - Role-based access control (admin/user)
- ✅ 100% Open Source - MIT licensed, no vendor lock-in
Define your own data structure with multiple field types:
- Boolean: Yes/No flags (e.g., "contains person", "is valid")
- Text: Short or long text fields (e.g., descriptions, captions)
- Numeric: Integer or decimal numbers (e.g., object count, confidence score)
- Enum: Dropdown selections (e.g., category, quality rating)
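For example, a schema for a street-scene dataset could combine all four types. The sketch below is only an illustration; the exact structure the app and API expect is defined by the schema builder, not by this snippet.

```python
# Illustrative schema sketch (not the app's exact wire format): one field of
# each supported type for a hypothetical street-scene dataset.
street_scene_schema = {
    "name": "street-scenes",
    "fields": [
        {"name": "contains_person", "type": "boolean"},    # Yes/No flag
        {"name": "caption", "type": "text"},                # free-form description
        {"name": "vehicle_count", "type": "numeric"},       # integer count
        {"name": "weather", "type": "enum",
         "options": ["sunny", "rainy", "foggy", "night"]},  # dropdown selection
    ],
}
```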
- Add images via URL with one-click preview
- View images in full-screen modal
- Organize images with your custom metadata
- Track pending vs reviewed status for each row
Navigate through pending images with keyboard-friendly controls:
- Approve & Next - Mark as reviewed and move forward
- Skip - Keep as pending and continue
- Delete & Next - Remove bad data instantly
- Real-time progress tracking with visual progress bar
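The same review flow can be scripted against the API (the queue and row endpoints are listed in the API reference further down). A rough sketch; the response shape and the status field name/values are assumptions, so verify them against /api/docs:

```python
# Sketch: walk the review queue via the API and mark rows as reviewed.
# The dataset id, the response shape, and the "status" field/value are assumptions.
import requests

BASE = "http://localhost:8000/api/v1"
DATASET_ID = 1                                   # hypothetical dataset id
headers = {"Authorization": "Bearer <your-jwt>"}

queue = requests.get(f"{BASE}/datasets/{DATASET_ID}/rows/queue", headers=headers).json()
for row in queue:
    # ...inspect the row's image URL and custom fields here...
    requests.put(
        f"{BASE}/datasets/{DATASET_ID}/rows/{row['id']}",
        headers=headers,
        json={"status": "reviewed"},             # assumed payload: the "Approve" equivalent
    )
```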
- CSV Import: Upload hundreds of rows at once with flexible column mapping
- CSV Export: Download reviewed or all rows for model training
- Bulk Status Update: Change multiple rows from pending to reviewed
- Bulk Delete: Clean up invalid data efficiently
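Imports can also be scripted. The sketch below builds a small CSV in memory and posts it to the bulk-import endpoint from the API reference; the column names and the multipart field name are assumptions (columns get mapped during import anyway):

```python
# Sketch: build a CSV in memory and send it to the bulk-import endpoint.
# Column names are examples; the multipart field name "file" is an assumption.
import csv
import io
import requests

BASE = "http://localhost:8000/api/v1"
DATASET_ID = 1                                   # hypothetical dataset id
headers = {"Authorization": "Bearer <your-jwt>"}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["image_url", "contains_person", "caption"])
writer.writerow(["https://example.com/img001.jpg", "true", "person crossing the street"])
writer.writerow(["https://example.com/img002.jpg", "false", "empty intersection at night"])

resp = requests.post(
    f"{BASE}/datasets/{DATASET_ID}/rows/import",
    headers=headers,
    files={"file": ("rows.csv", buf.getvalue(), "text/csv")},
)
print(resp.status_code)
```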
- Admin Users: Full control over datasets, schemas, and team members
- Regular Users: Can view and edit assigned datasets
- Secure Authentication: JWT-based auth with password hashing
- Team Isolation: Each team's data is completely separated
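Under the hood this follows the usual FastAPI recipe of hashed passwords plus signed JWTs. A generic illustration of that pattern (not this project's actual auth module), using passlib and python-jose:

```python
# Generic illustration of password hashing + JWT issuing as commonly done in
# FastAPI apps. This is NOT the project's actual auth code.
from datetime import datetime, timedelta, timezone

from jose import jwt
from passlib.context import CryptContext

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
SECRET_KEY = "use-openssl-rand-hex-32-in-real-life"

def hash_password(plain: str) -> str:
    return pwd_context.hash(plain)

def verify_password(plain: str, hashed: str) -> bool:
    return pwd_context.verify(plain, hashed)

def create_access_token(user_id: str, expires_minutes: int = 60) -> str:
    claims = {
        "sub": user_id,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=expires_minutes),
    }
    return jwt.encode(claims, SECRET_KEY, algorithm="HS256")
```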
- Airtable-inspired table views with pagination
- Blue color scheme with intuitive navigation
- Responsive design that works on desktop and tablet
- Fast, reactive interface built with Vue 3
- Docker & Docker Compose (recommended) OR
- Python 3.11+ and PostgreSQL (manual setup)
- Node.js 20+ (only for local frontend development)
Just run the commands below and everything will be built and started:
# Clone the repository
git clone https://github.com/yourusername/aitrace-datasets.git
cd aitrace-datasets
# Start everything - builds app, starts database + application
docker-compose up
# Or run in background
docker-compose up -d
# Check logs
docker-compose logs -f app

That's it! Open http://localhost:8000 in your browser.
The first run will take a few minutes as it builds the Docker image (compiles frontend + backend). Subsequent starts are much faster.
⚠️ Note: This uses an insecure default SECRET_KEY suitable for local testing only. For production, see Option 2 below.
Use this for production deployments with a secure secret key:
# Generate a secure secret key
export SECRET_KEY=$(openssl rand -hex 32)
# Start with production configuration
docker-compose -f docker-compose.prod.yml up -d
# Check logs
docker-compose -f docker-compose.prod.yml logs -f app

Access at http://localhost:8000
Use this if you want to develop the application with instant hot reload for frontend changes:
# 1. Start only the database
docker-compose up -d postgres
# 2. Install backend dependencies (requires uv: pip install uv)
uv sync
# 3. Create environment file
cp .env.example .env
# Edit .env and make sure ENV=local (important!)
nano .env
# 4. Start backend (http://localhost:8000)
uv run python src/aitrace/run_local.py
# 5. In a new terminal, start frontend (http://localhost:3000)
cd frontend
npm install
npm run dev

Now open http://localhost:3000 in your browser. Changes to frontend code will hot-reload automatically.
AITrace Datasets runs in two different modes depending on the ENV variable:
How it works:
- Frontend is built into static files during the Docker build
- Static files are embedded in the backend at src/aitrace/static/
- FastAPI serves both the API (/api/*) and the frontend (all other routes); see the sketch after this subsection
- Single application running on one port
When to use:
- Production deployments
- Staging environments
- Single-container deployments
- When you want a simple, all-in-one package
Access:
- Application: http://localhost:8000
- API Docs: http://localhost:8000/api/docs
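A minimal sketch of the single-app pattern referenced above. This is illustrative only, not the project's actual main.py; the commented-out router name is hypothetical.

```python
# Illustrative sketch of the production (embedded-frontend) mode described above.
# Not the project's real main.py; paths and router names are assumptions.
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI(docs_url="/api/docs")

# API routes are registered first, under /api/* ...
# app.include_router(datasets_router, prefix="/api/v1")   # hypothetical router

# ...and the built frontend assets (copied into src/aitrace/static) are served
# for everything else; a real setup also needs a catch-all that returns
# index.html for client-side routes.
app.mount("/", StaticFiles(directory="src/aitrace/static", html=True), name="frontend")
```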
How it works:
- Backend runs standalone with CORS enabled
- Frontend runs on separate dev server with hot reload
- Frontend proxies API requests to backend
- Two processes running on different ports
When to use:
- Local development
- Frontend changes with instant hot reload
- Backend development without rebuilding frontend
- Testing and debugging
Access:
- Frontend: http://localhost:3000 (with hot reload)
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/api/docs
# .env file
# For local development (separate frontend/backend)
ENV=local
# For production (embedded frontend)
ENV=production
# or simply don't set ENV (defaults to production)

After starting the application for the first time:
1. Navigate to Setup Page
   - Go to http://localhost:8000/setup (or :3000 in dev mode)
   - You'll be automatically redirected if setup is needed
2. Create Admin Account
   - Enter your email (e.g., admin@yourcompany.com)
   - Create a strong password (min 8 characters)
   - A default team is created automatically
3. Login
   - Use your credentials to login
   - You'll be directed to the Datasets page
4. Create Your First Schema
   - Go to Schemas → New Schema
   - Define your fields (e.g., "contains_person: boolean", "description: text")
   - Save the schema
5. Create Your First Dataset
   - Go to Datasets → New Dataset
   - Select your schema
   - Add a name and description
6. Add Images
   - Open your dataset
   - Click "Add Row"
   - Paste image URL and fill in your custom fields
   - Or use "Bulk CSV Upload" for multiple images
7. Review Your Data
   - Switch to the "Queue" tab
   - Review pending images one by one
   - Approve, skip, or delete as needed
- FastAPI - High-performance async Python web framework
- PostgreSQL - Robust relational database
- SQLAlchemy 2.0 - Async ORM with type hints
- Pydantic v2 - Data validation and settings management
- JWT - Secure token-based authentication
- Asyncpg - Fast async PostgreSQL driver
- Vue 3 - Progressive JavaScript framework with Composition API
- Vite - Lightning-fast build tool
- Pinia - Intuitive state management
- Tailwind CSS - Utility-first CSS framework
- TypeScript - Type-safe JavaScript
- Vue Router - Client-side routing
- Docker - Containerized application
- Multi-stage build - Optimized production images
- Static file serving - Frontend embedded in backend
aitrace-datasets/
├── src/aitrace/              # Backend Python application
│   ├── common/               # Shared utilities (settings, database, exceptions)
│   ├── models/               # SQLAlchemy models & Pydantic schemas
│   ├── repositories/         # Data access layer (database queries)
│   ├── services/             # Business logic
│   ├── routes/               # API endpoints (FastAPI routers)
│   ├── main.py               # FastAPI app initialization
│   ├── run_local.py          # Local development runner
│   └── static/               # Frontend build output (production only)
│
├── frontend/                 # Vue 3 frontend application
│   ├── src/
│   │   ├── components/       # Reusable Vue components
│   │   ├── pages/            # Page components
│   │   ├── stores/           # Pinia state management
│   │   ├── services/         # API client services
│   │   ├── router/           # Vue Router configuration
│   │   └── types/            # TypeScript type definitions
│   ├── package.json
│   └── vite.config.ts
│
├── database/
│   └── schema.sql            # PostgreSQL database schema
│
├── deployment/               # Production deployment scripts
│   ├── scripts/
│   │   ├── deploy.sh         # Cloud Run deployment
│   │   └── setup_infra.sh    # Infrastructure setup
│   └── env-vars.yaml         # Environment configuration
│
├── tests/                    # Backend tests
├── Dockerfile                # Production Docker image
├── docker-compose.yml        # Local development
├── docker-compose.prod.yml   # Production deployment
├── pyproject.toml            # Python dependencies
└── .env.example              # Environment variables template
| Variable | Required | Default | Description |
|---|---|---|---|
| `ENV` | No | `production` | Set to `local` for development (enables CORS, disables static serving) |
| `POSTGRES_CONNECTION_MODE` | No | `direct` | Connection mode: `direct` or `cloud_sql` |
| `POSTGRES_HOST` | Yes | - | PostgreSQL host (e.g., `localhost`, `postgres`) |
| `POSTGRES_PORT` | No | `5432` | PostgreSQL port |
| `POSTGRES_USER` | Yes | - | PostgreSQL username |
| `POSTGRES_PASSWORD` | Yes | - | PostgreSQL password |
| `POSTGRES_DB` | Yes | - | PostgreSQL database name |
| `SECRET_KEY` | Yes | - | JWT signing key (min 32 characters, use `openssl rand -hex 32`) |
| `LOG_LEVEL` | No | `INFO` | Logging verbosity (DEBUG, INFO, WARNING, ERROR) |
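Since the stack uses Pydantic v2 for settings management, these variables are presumably loaded into a typed settings object. A rough illustration with pydantic-settings; the field names are inferred from the table above, not copied from the project's settings module:

```python
# Illustrative only: how the variables above might map to typed settings with
# pydantic-settings. Field names are inferred from the table, not from the code.
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    env: str = "production"                    # ENV: "local" enables CORS, disables static serving
    postgres_connection_mode: str = "direct"   # "direct" or "cloud_sql"
    postgres_host: str                         # required
    postgres_port: int = 5432
    postgres_user: str
    postgres_password: str
    postgres_db: str
    secret_key: str                            # JWT signing key, >= 32 chars
    log_level: str = "INFO"

settings = Settings()  # reads values from the environment; raises if required ones are missing
```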
# Local Development
ENV=local
# Database
POSTGRES_CONNECTION_MODE=direct
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=aitrace
# Security
SECRET_KEY=your-super-secret-key-at-least-32-characters-long-generated-with-openssl
# Logging
LOG_LEVEL=DEBUG
# Production (don't set ENV or set to 'production')
# ENV=production
# POSTGRES_HOST=your-db-host
# POSTGRES_USER=produser
# POSTGRES_PASSWORD=<secure-password>
# POSTGRES_DB=aitrace
# SECRET_KEY=<your-production-secret>
# LOG_LEVEL=INFO

# Production with embedded frontend
docker-compose -f docker-compose.prod.yml up -d
# Access at http://localhost:8000

# Build the image
docker build -t aitrace-datasets .
# Run with your database
docker run -d \
-p 8000:8000 \
-e POSTGRES_HOST=your-db-host \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=yourpassword \
-e POSTGRES_DB=aitrace \
-e SECRET_KEY=$(openssl rand -hex 32) \
aitrace-datasets

See deployment/README.md for the complete Cloud Run deployment guide.
# Quick deploy
cd deployment/scripts
./deploy.sh

# Install dependencies
uv sync
# Set environment variables
export ENV=production
export POSTGRES_HOST=your-db-host
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=yourpassword
export POSTGRES_DB=aitrace
export SECRET_KEY=$(openssl rand -hex 32)
# Build frontend
cd frontend
npm install
npm run build
cd ..
# Copy frontend build to backend static folder
mkdir -p src/aitrace/static
cp -r frontend/dist/* src/aitrace/static/
# Run with uvicorn
uvicorn aitrace.main:app --host 0.0.0.0 --port 8000

# Install dependencies
uv sync
# Start backend (with auto-reload)
uv run python src/aitrace/run_local.py
# Run tests
uv run pytest
# Format code
uv run black src/ tests/
uv run isort src/ tests/
# Type checking
uv run mypy src/

cd frontend
# Install dependencies
npm install
# Start dev server (hot reload)
npm run dev
# Build for production
npm run build
# Preview production build
npm run preview
# Lint and format
npm run lint

The project currently uses a raw SQL schema. For changes:
- Edit database/schema.sql
- Drop and recreate the database (development only!)
- Production: write manual migration SQL
Future: Will add Alembic for proper migrations.
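In the meantime, applying database/schema.sql to a fresh database can be scripted. A small sketch using asyncpg, with connection values taken from the example .env above; adjust them to your environment:

```python
# Sketch: apply database/schema.sql to a fresh PostgreSQL database with asyncpg.
# Connection details mirror the example .env in this README; change as needed.
import asyncio
import asyncpg

async def apply_schema() -> None:
    conn = await asyncpg.connect(
        host="localhost",
        port=5432,
        user="postgres",
        password="postgres",
        database="aitrace",
    )
    try:
        with open("database/schema.sql", "r", encoding="utf-8") as f:
            await conn.execute(f.read())  # run the whole schema file in one call
    finally:
        await conn.close()

if __name__ == "__main__":
    asyncio.run(apply_schema())
```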
When running, visit http://localhost:8000/api/docs for interactive API documentation (Swagger UI).
Authentication
- `POST /api/v1/auth/login` - Login with email/password
- `GET /api/v1/auth/me` - Get current user info
- `PUT /api/v1/auth/password` - Change password
Schemas
- `GET /api/v1/schemas` - List all schemas
- `POST /api/v1/schemas` - Create new schema
- `PUT /api/v1/schemas/{id}` - Update schema
- `DELETE /api/v1/schemas/{id}` - Delete schema
Datasets
- `GET /api/v1/datasets` - List datasets (paginated)
- `POST /api/v1/datasets` - Create dataset
- `GET /api/v1/datasets/{id}` - Get dataset details
- `PUT /api/v1/datasets/{id}` - Update dataset
- `DELETE /api/v1/datasets/{id}` - Delete dataset
Rows (Images + Data)
- `GET /api/v1/datasets/{id}/rows` - List rows (with filters)
- `GET /api/v1/datasets/{id}/rows/queue` - Get review queue
- `POST /api/v1/datasets/{id}/rows` - Add single row
- `PUT /api/v1/datasets/{id}/rows/{rowId}` - Update row
- `DELETE /api/v1/datasets/{id}/rows/{rowId}` - Delete row
- `POST /api/v1/datasets/{id}/rows/import` - CSV bulk import
- `GET /api/v1/datasets/{id}/rows/export` - CSV export
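For a quick taste of the API from Python, the sketch below logs in and lists datasets using the endpoints above. The login payload and the token field name are assumptions; the authoritative request/response schemas are in /api/docs.

```python
# Minimal API usage sketch based on the endpoint list above.
# The login payload and the "access_token" field name are assumptions; check
# http://localhost:8000/api/docs for the exact schemas.
import requests

BASE = "http://localhost:8000/api/v1"

resp = requests.post(
    f"{BASE}/auth/login",
    json={"email": "admin@yourcompany.com", "password": "your-password"},
)
resp.raise_for_status()
token = resp.json()["access_token"]  # assumed response field

headers = {"Authorization": f"Bearer {token}"}
datasets = requests.get(f"{BASE}/datasets", headers=headers)
datasets.raise_for_status()
print(datasets.json())
```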
We love contributions, whether that's bug reports, feature requests, or code.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run tests (`uv run pytest`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to your fork (`git push origin feature/amazing-feature`)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
- Follow existing code style (Black for Python, Prettier for TypeScript)
- Add tests for new features
- Update documentation
- Keep commits focused and atomic
- Write clear commit messages
- AI Dataset Creation - Describe your dataset needs in natural language and automatically fetch images from Google Images API
- Auto-labeling Agent - Train custom AI agents to automatically label images based on your schema and existing labeled data
- Smart Suggestions - AI-powered field suggestions while labeling
- Similarity Search - Find similar images in your dataset using embeddings
- Multi-team support - Multiple isolated teams in one instance
- User invitations - Email invites with team quotas
- Email verification - Verify user emails on signup
- 2FA authentication - TOTP-based two-factor auth
- Social login - Google, GitHub OAuth
- Image proxy/caching - Cache external images locally
- Advanced export formats - JSON, Parquet, COCO format
- Advanced search - Full-text search, filters, saved queries
- API keys - Programmatic access without passwords
- Webhooks - Real-time notifications for events
- Keyboard shortcuts - Navigate review queue with keyboard
- Mobile app - React Native mobile companion
- Custom schema builder
- Dataset management
- Review queue with keyboard navigation
- CSV import/export
- Bulk operations
- Team collaboration
- Role-based access control
- Docker deployment
- Full-screen image viewer
This project is fully open source and licensed under the MIT License.
See LICENSE for details.
What this means:
- ✅ Use commercially
- ✅ Modify the code
- ✅ Distribute
- ✅ Use privately
- ⚠️ No warranty provided
- Documentation: You're reading it! Check the other docs in the implementation/ folder
- GitHub Issues: Report bugs or request features
- GitHub Discussions: Ask questions and share ideas
Q: Can I use this for commercial projects? A: Yes! MIT license allows commercial use.
Q: Is my data secure? A: Yes, it's self-hosted on your infrastructure. Data never leaves your servers.
Q: Can I customize the UI? A: Absolutely! Fork the repo and modify as needed. It's open source.
Q: What image formats are supported? A: Any format your browser can display (JPEG, PNG, GIF, WebP). Images are loaded via URL.
Q: How do I backup my data?
A: Use pg_dump on your PostgreSQL database. Images are stored as URLs (not files).
# Backup database
docker exec -i $(docker-compose ps -q postgres) pg_dump -U postgres aitrace > backup.sql
# Restore database
docker exec -i $(docker-compose ps -q postgres) psql -U postgres -d aitrace < backup.sql

Q: Can I host this on Heroku/Railway/Render? A: Yes! Any platform that supports Docker will work.
Built with love by Bluggie SG PTE LTD
Special thanks to the open source community and all contributors!
- FastAPI - Modern Python web framework
- Vue.js - Progressive JavaScript framework
- PostgreSQL - Powerful open source database
- Tailwind CSS - Utility-first CSS framework
- Vite - Next generation frontend tooling
⭐ Star us on GitHub if you find this useful!



