- Note: This project will soon be deprecated and replaced by a much better architecture and codebase.
A modern, glassmorphic FastAPI application to automate the conversion of HuggingFace models to GGUF format using llama.cpp.
🚀 Don't want to self-host? Use our free hosted service at gguforge.com — no setup required! Just log in with HuggingFace and request your conversions.
- HuggingFace OAuth: Guest login with HuggingFace for requesting conversions
- Request System: Users can request model conversions, admins approve/decline with reasons
- Quant Selection: Users can request specific quantization types; admins can modify the selection
- Dashboard: Real-time status of model conversions with detailed logs
- Model Search: Search HuggingFace models with autocomplete
- Auto-Pipeline: Download → Convert (FP16) → Quantize (Selected levels) → Upload
- Configurable Storage: Set custom download path for models (local, external drive, or HF_HOME)
- Async Processing: Non-blocking quantization and concurrent uploads
- Time Tracking: Detailed timing stats for each conversion step
- Version Tracking: Auto-calculated version based on git commits
- Llama.cpp Management: Automatically clones and builds llama.cpp
- Validation: Checks disk space before processing
- Responsive UI: Glassmorphism design with real-time updates
- Job Termination: Admins can terminate any running conversion job mid-process
- Startup Recovery: Automatically detects and recovers stuck jobs after server crashes
- Ticket System: Conversational threads between users and admins for discussing requests
- Discussion Workflow: Instead of instant rejection, admins can start discussions with users
- SQLite: Default, zero-config local database
- MSSQL: Enterprise-grade Microsoft SQL Server support for production deployments
- Python 3.10+
- Git: Required for cloning llama.cpp and version tracking
- Build Tools:
  - Windows: Install CMake and ensure `cmake` is in your PATH. You might also need Visual Studio Build Tools.
  - Linux: `sudo apt install build-essential cmake`
1. Clone the repository:

   ```bash
   git clone https://github.com/Akicuo/automaticConversion.git
   cd automaticConversion
   ```

2. Create a virtual environment (recommended):

   ```bash
   python -m venv venv
   # Windows
   .\venv\Scripts\activate
   # Linux/Mac
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create a `.env` file with the required secrets (see Configuration below)
For containerized deployment, use the included Docker configuration.
```bash
# 1. Create .env file with your tokens
cat > .env << EOF
HF_TOKEN=hf_your_token_here
OAUTH_CLIENT_ID=your_client_id
OAUTH_CLIENT_SECRET=your_client_secret
OAUTH_REDIRECT_URI=http://localhost:8000/auth/callback
ADMIN_USERS=your_hf_username
EOF

# 2. Build and run
docker-compose up -d

# 3. View logs
docker-compose logs -f

# 4. Check admin credentials (first run)
cat data/creds.txt
```

The Dockerfile includes:
- Python 3.12 (slim-bookworm base)
- Git - for llama.cpp cloning and app updates
- Vim - for editing config files in container
- Build tools - cmake, g++, make (for compiling llama.cpp)
- Microsoft ODBC Driver 18 - for SQL Server support
| Mount | Purpose |
|---|---|
| `./data` | Database + credentials (persistent) |
| `./cache` | Downloaded models (can be very large!) |
| `./llama.cpp` | Compiled llama.cpp (avoids rebuilding) |
| Variable | Description |
|---|---|
| `HF_TOKEN` | HuggingFace token for uploads (required) |
| `OAUTH_CLIENT_ID` | HF OAuth app ID (optional) |
| `OAUTH_CLIENT_SECRET` | HF OAuth secret (optional) |
| `OAUTH_REDIRECT_URI` | OAuth callback URL |
| `ADMIN_USERS` | Comma-separated HF usernames for admin access |
| `PARALLEL_QUANT_JOBS` | Number of simultaneous quantization jobs (default: 2) |
| `HOST` | Server bind address (default: 0.0.0.0) |
| `PORT` | Server port (default: 8000) |
| `MSSQL_*` | SQL Server connection settings (optional) |
For GPU-accelerated quantization, uncomment the GPU profile in docker-compose.yml:
```bash
# Run with GPU support
docker-compose --profile gpu up -d
```

Requires the NVIDIA Container Toolkit installed on the host.
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 32 GB |
| Storage | 50 GB | 200+ GB |
| CPU Cores | 4 | 16+ |
Note: Storage requirements depend on model size. A 7B model needs ~30GB during conversion.
```bash
# Build the image
docker build -t gguf-forge .

# Run with environment file
docker run -d \
  --name gguf-forge \
  -p 8000:8000 \
  -v ./data:/data \
  -v ./cache:/app/.cache \
  -v ./llama.cpp:/app/llama.cpp \
  --env-file .env \
  gguf-forge
```

Create a `.env` file in the project root with the following variables:
```bash
# Parallel Quantization Jobs (default: 2)
# How many quantizations to run simultaneously. Each job gets total_cores / jobs threads.
PARALLEL_QUANT_JOBS=2
```

```bash
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

- Purpose: Upload converted models to HuggingFace
- How to get:
  1. Go to HuggingFace Settings → Access Tokens
  2. Create a new token with Write permissions
  3. Copy the token (starts with `hf_`)
- Note: Without this, models will be quantized but not uploaded
```bash
OAUTH_CLIENT_ID=your_oauth_client_id
OAUTH_CLIENT_SECRET=your_oauth_client_secret
OAUTH_REDIRECT_URI=http://localhost:8000/auth/callback
```

- Purpose: Enable HuggingFace OAuth login for guests to request conversions
- How to get:
  1. Go to HuggingFace Settings → Connected Applications
  2. Click "Create a new OAuth app"
  3. Fill in:
     - Application name: GGUF Forge (or your choice)
     - Homepage URL: `http://localhost:8000` (or your domain)
     - Redirect URI: `http://localhost:8000/auth/callback` (or your domain + `/auth/callback`)
     - Scopes: Select `openid`, `profile`, `email`
  4. Copy the Client ID and Client Secret
- Note: For production, update `OAUTH_REDIRECT_URI` to your actual domain
```bash
# HuggingFace Token (Required for uploads)
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# OAuth Configuration (Required for guest login)
OAUTH_CLIENT_ID=your_oauth_client_id_here
OAUTH_CLIENT_SECRET=your_oauth_client_secret_here
OAUTH_REDIRECT_URI=http://localhost:8000/auth/callback

# Admin Users (HuggingFace usernames, comma-separated)
ADMIN_USERS=Akicuo,ungsung
```

Instead of using a password-based admin account, admins are configured via the `ADMIN_USERS` environment variable:

```bash
# Comma-separated list of HuggingFace usernames
ADMIN_USERS=Akicuo,ungsung,anotheruser
```

When these users log in via HuggingFace OAuth, they automatically get admin privileges. This means:
- No need to remember separate admin credentials
- Easy to add/remove admins by updating `.env` and restarting
- Role is updated on each login (so changes take effect immediately)
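The role check described above boils down to parsing a comma-separated list from the environment at login time. A minimal sketch of that logic (the function name here is hypothetical, not the project's actual code):

```python
import os

def resolve_role(hf_username: str) -> str:
    """Return "admin" if the username appears in ADMIN_USERS, else "user".

    Illustrative helper: shows the comma-separated parsing and why role
    changes take effect on the next login (the list is re-read each time).
    """
    raw = os.environ.get("ADMIN_USERS", "")
    admins = {name.strip() for name in raw.split(",") if name.strip()}
    return "admin" if hf_username in admins else "user"

# Role is recomputed on every login, so editing .env and restarting is enough.
os.environ["ADMIN_USERS"] = "Akicuo,ungsung"
print(resolve_role("Akicuo"))   # → admin
print(resolve_role("someone"))  # → user
```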
You can also set these environment variables (they have defaults):
- `PORT`: Server port (default: 8000)
- `HOST`: Server host (default: 0.0.0.0)
- `LLAMA_CPP_DIR`: Path to llama.cpp directory (default: `./llama.cpp`)
  - Can be relative (e.g., `../llama.cpp`, `custom/llama.cpp`) or absolute (e.g., `/opt/llama.cpp`, `C:\tools\llama.cpp`)
  - Relative paths are resolved from the project root directory
  - Useful if you want to share a single llama.cpp installation across multiple projects
  - Example: `LLAMA_CPP_DIR=/opt/llama.cpp` or `LLAMA_CPP_DIR=../shared/llama.cpp`
- `PARALLEL_QUANT_JOBS`: Number of simultaneous quantization jobs (default: 2)
  - `1` = Sequential (slowest, safest for RAM)
  - `2` = Default (balanced)
  - `4` = Aggressive (requires more RAM, faster on high-core systems)
  - Note: CPU cores are divided between parallel jobs. A 16-core system with 2 parallel jobs will give 8 cores to each.
- `DB_POOL_MIN`: Minimum MSSQL connection pool size (default: 2)
- `DB_POOL_MAX`: Maximum MSSQL connection pool size (default: 10)
- `MSSQL_CONN_TIMEOUT`: MSSQL connection timeout in seconds (default: 60)
- `MSSQL_LOGIN_TIMEOUT`: MSSQL login timeout in seconds (default: 60)
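The "CPU cores divided between jobs" rule above can be sketched as a one-liner (an illustrative helper, not the project's actual implementation):

```python
import os

def threads_per_job(parallel_jobs: int) -> int:
    """Split available CPU cores evenly between parallel quantization jobs.

    Sketch of the documented behavior ("total_cores / jobs"); always leaves
    at least one thread per job.
    """
    total = os.cpu_count() or 1
    return max(1, total // max(1, parallel_jobs))

# On a 16-core machine, PARALLEL_QUANT_JOBS=2 gives each job 8 threads.
print(threads_per_job(2))
```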
Configure where models are downloaded and processed:
```bash
# Model Download Path (optional)
# Supports absolute paths (D:\Models, /mnt/external) or relative paths (./models)
MODEL_DOWNLOAD_PATH=/path/to/large/storage

# Alternatively, HuggingFace's HF_HOME is used as fallback
HF_HOME=/path/to/hf/cache
```

Priority Order: `MODEL_DOWNLOAD_PATH` > `HF_HOME` > default `.cache` folder
This is useful when:
- You have limited space on your system drive
- You want to store models on an external drive or network storage
- You want to share a cache with other HuggingFace applications
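The priority order above amounts to a simple first-match fallback. A sketch of that resolution (hypothetical helper name; the real code may differ):

```python
import os

def resolve_download_path(env: dict) -> str:
    """Pick the model download directory by the documented priority:
    MODEL_DOWNLOAD_PATH > HF_HOME > local .cache folder."""
    if env.get("MODEL_DOWNLOAD_PATH"):
        return env["MODEL_DOWNLOAD_PATH"]
    if env.get("HF_HOME"):
        return env["HF_HOME"]
    return os.path.join(".", ".cache")

print(resolve_download_path({"MODEL_DOWNLOAD_PATH": "/mnt/external"}))  # → /mnt/external
print(resolve_download_path({"HF_HOME": "/data/hf"}))                   # → /data/hf
```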
GGUF Forge supports two database backends: SQLite (default) and MSSQL.
No configuration needed. A local `gguf_app.db` file is created automatically.
To use MSSQL instead of SQLite, add these variables to your .env:
```bash
# Database type: "sqlite" (default) or "mssql"
DB_TYPE=mssql

# MSSQL Configuration
MSSQL_HOST=your-server.database.net
MSSQL_PORT=1433
MSSQL_DATABASE=gguforge
MSSQL_USER=your_username
MSSQL_PASSWORD=your_password
MSSQL_ENCRYPT=yes
MSSQL_TRUST_CERT=yes
```

Prerequisites for MSSQL:
1. Install the ODBC Driver for SQL Server:
   - Windows: Download from Microsoft
   - Linux (Ubuntu/Debian):

     ```bash
     curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
     curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
     sudo apt-get update
     sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17
     ```

   - WSL (Windows Subsystem for Linux):

     ```bash
     # First install unixODBC (required for pyodbc)
     sudo apt-get update
     sudo apt-get install -y unixodbc unixodbc-dev

     # Add Microsoft repository
     curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
     curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list

     # Install ODBC Driver 17
     sudo apt-get update
     sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17

     # Verify installation
     odbcinst -q -d -n "ODBC Driver 17 for SQL Server"

     # Reinstall pyodbc (may be needed after installing ODBC)
     pip install --force-reinstall pyodbc
     ```

   - macOS:

     ```bash
     # Install Homebrew if not installed
     /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

     # Install unixODBC and ODBC Driver
     brew install unixodbc
     brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
     brew update
     HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql17
     ```

2. Install pyodbc (already in requirements.txt):

   ```bash
   pip install pyodbc
   ```
💡 Tip: If you don't need MSSQL, just use SQLite! Set `DB_TYPE=sqlite` in your `.env` file (or remove the line entirely) - no additional dependencies required.
Testing Connection: After configuration, verify with:

```bash
curl http://localhost:8000/api/health
```

Response shows database status:
```json
{
  "status": "healthy",
  "database": {
    "type": "mssql",
    "connected": true,
    "message": "Successfully connected to mssql database"
  }
}
```

1. Start the app:

   ```bash
   python app_gguf.py
   ```

2. First Run: Check the console output! It will print your Admin Username and Password. These credentials are also saved to `creds.txt`.
- Can browse the dashboard and view active conversions
- Can request model conversions (requires HuggingFace OAuth login)
- Can view their request history with status and decline reasons
- All guest permissions
- Can submit conversion requests
- View request status (pending/approved/rejected)
- See decline reasons if requests are rejected
- All user permissions
- Can directly start model conversions
- View and manage all pending requests
- Approve or decline requests with optional reasons
- Access to full system controls
1. Login at `/login` with admin credentials
2. Search for a model (e.g., `TinyLlama/TinyLlama-1.1B-Chat-v1.0`)
3. Click Start to begin conversion immediately
- Click Login and choose "Continue with HuggingFace"
- Authorize the application
- Search for a model
- Click Request to submit a conversion request
- Optional: Select specific quantization types you want (or leave empty for all)
- Wait for admin approval
- Check "My Requests" section for status updates
Users can request specific quantization types when submitting a conversion request:
Available Quant Types:
| Type | Description |
|---|---|
| Q2_K | 2-bit (smallest, lowest quality) |
| Q3_K_S/M/L | 3-bit small/medium/large |
| Q4_0 | 4-bit legacy |
| Q4_K_S | 4-bit small (recommended for low memory) |
| Q4_K_M | 4-bit medium (good balance) |
| Q5_0 | 5-bit legacy |
| Q5_K_S/M | 5-bit small/medium (good quality) |
| Q6_K | 6-bit (high quality) |
| Q8_0 | 8-bit (highest quality, largest size) |
Admin Quant Modification: When approving a request, admins can:
- Use the user's requested quants as-is
- Add additional quant types
- Remove unnecessary quant types
- Override with a completely different selection
If no quants are specified (by user or admin), all available quants are generated.
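The selection rules above (admin override wins; an empty selection means "generate everything") can be sketched as a small resolver. This is illustrative only; the actual code may structure the decision differently:

```python
# All quant types listed in the table above.
DEFAULT_QUANTS = ["Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L", "Q4_0", "Q4_K_S",
                  "Q4_K_M", "Q5_0", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0"]

def final_quants(requested, admin_override=None):
    """Resolve which quant types to generate for an approved request."""
    if admin_override:          # admin edited or replaced the selection
        return admin_override
    if requested:               # otherwise honor the user's request as-is
        return requested
    return DEFAULT_QUANTS       # nothing specified: generate everything

print(final_quants(["Q4_K_M"]))                              # user's choice stands
print(final_quants(["Q4_K_M"], ["Q4_K_M", "Q6_K"]))          # admin added Q6_K
print(len(final_quants([])))                                 # → 12
```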
Each conversion goes through these steps:
- Setup (10%): Clone and build llama.cpp
- Download (30%): Download model from HuggingFace
- Convert (50%): Convert to FP16 GGUF format
- Quantize (80%): Create all quantization levels in parallel (Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0)
- Upload (90%): Concurrent upload of all quants to HuggingFace
- README (100%): Generate and upload README with stats
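The step-to-progress checkpoints above can be expressed as a small lookup. This is a sketch of the documented percentages, not code from the project:

```python
# Pipeline steps and the progress percentage reported when each completes.
PIPELINE = [
    ("setup",    10),  # clone + build llama.cpp
    ("download", 30),  # fetch model from HuggingFace
    ("convert",  50),  # FP16 GGUF conversion
    ("quantize", 80),  # parallel quantization
    ("upload",   90),  # concurrent upload of quants
    ("readme",  100),  # stats README generation
]

def progress_after(step: str) -> int:
    """Return the documented progress percentage once `step` completes."""
    for name, pct in PIPELINE:
        if name == step:
            return pct
    raise ValueError(f"unknown step: {step}")

print(progress_after("convert"))  # → 50
```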
Each conversion tracks:
- Job ID: Unique identifier for the conversion
- Version: GGUF Forge version (auto-calculated from git commits)
- Total Time: Complete pipeline duration
- Avg Time per Quant: Average quantization time
- Step Breakdown: Individual timing for each step
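The version is described as auto-calculated from git commits; one common scheme is to use the commit count as the patch number. A hedged sketch of that idea (the exact scheme and version prefix used by GGUF Forge are assumptions here):

```python
import subprocess

def app_version(repo_dir: str = ".") -> str:
    """Derive a version string like "1.0.<commit count>" from git history.

    Illustrative only: falls back to a fixed version when git or the
    repository is unavailable.
    """
    try:
        count = subprocess.check_output(
            ["git", "-C", repo_dir, "rev-list", "--count", "HEAD"],
            stderr=subprocess.DEVNULL, text=True,
        ).strip()
        return f"1.0.{count}"
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "1.0.0"  # fallback when git or the repo is unavailable

print(app_version("/nonexistent"))  # → 1.0.0 (fallback)
```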
- The first conversion will take longer as it clones and builds llama.cpp
- Ensure you have enough disk space. A 7B model requires ~15GB for download + ~15GB for FP16 + ~5-10GB per quant
- All conversions run asynchronously and won't block the server
- Quantizations run in parallel (configurable via `PARALLEL_QUANT_JOBS`) with CPU cores distributed between jobs
- Each job performs an immediate upload-and-delete after creation to minimize disk space usage
- Files are automatically cleaned up after successful upload
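The pre-flight disk-space validation mentioned earlier can be done with the standard library. A sketch similar in spirit to the app's check (names and thresholds here are illustrative):

```python
import shutil

def has_enough_disk(path: str, required_gb: float) -> bool:
    """Return True if `path`'s filesystem has at least `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / (1024 ** 3)
    return free_gb >= required_gb

# A 7B model needs roughly 15 (download) + 15 (FP16) + 5-10 GB per quant.
print(has_enough_disk(".", 50))
```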
- Verify `OAUTH_CLIENT_ID` and `OAUTH_CLIENT_SECRET` are correct
- Check that the redirect URI in HuggingFace matches `OAUTH_REDIRECT_URI` in `.env`
- Ensure the OAuth app has the correct scopes (`openid`, `profile`, `email`)
- Verify `HF_TOKEN` is set and has Write permissions
- Check the token hasn't expired
- Ensure you have permission to create repositories on HuggingFace
- Windows: Install Visual Studio Build Tools and CMake
- Linux: Install `build-essential` and `cmake`
- Check that `git` is installed and in PATH
- Ensure ODBC Driver 17 for SQL Server is installed
- Verify server allows remote connections
- Check firewall rules for the MSSQL port
- Test connection with: `curl http://localhost:8000/api/health`
High Parallel Quant Jobs
If you set PARALLEL_QUANT_JOBS too high, the application may crash or conversions may fail due to resource exhaustion.
Symptoms:
- Application shuts down unexpectedly during quantization
- Conversions fail with "out of memory" or "killed" errors
- System becomes unresponsive during conversion
Solution:
Set `PARALLEL_QUANT_JOBS=1` or `PARALLEL_QUANT_JOBS=2` in your `.env` file:

```bash
# Recommended: 1-2 parallel jobs for stable operation
PARALLEL_QUANT_JOBS=2
```

Guidelines:
- 1 job: Safest option, uses all CPU cores for a single quantization
- 2 jobs: Balanced option, splits CPU cores between 2 parallel quantizations
- 4+ jobs: Only use on high-memory systems (32GB+ RAM) with many CPU cores (16+)
Each parallel job consumes significant memory for the model and quantization process. Too many parallel jobs can exceed available RAM and cause the application to be terminated by the operating system.
Debugging Tips:
- Check browser console (F12) for JavaScript errors - logs now show detailed conversion progress
- Check server logs for Python errors during conversion
- Monitor system resources: RAM usage, CPU load, disk space during conversion
- Try with `PARALLEL_QUANT_JOBS=1` to isolate issues
Admins can update the application directly from the dashboard:
- An Update App button appears in the top-right corner
- The button automatically checks for available updates
- When an update is available, the button shows the commit count
- Click to pull the latest changes and restart the server
Manual Update Scripts:
```powershell
# Windows PowerShell
.\update.ps1
```

```bash
# Linux/macOS
chmod +x update.sh  # First time only
./update.sh
```

Note: Updates will interrupt any running conversion jobs.
Admins can terminate any running conversion job:
- Find the job in the "Conversion Queue" section
- Click the orange Terminate button
- The job will stop and be marked as "TERMINATED"
Jobs terminated mid-process can be restarted by re-approving the original request.
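Under the hood, terminating a job means stopping the conversion subprocess and recording the new status. A minimal illustration of that pattern (the real app's bookkeeping is more involved; names here are invented):

```python
import subprocess
import sys

# Stand-in for a long-running conversion subprocess held by the server.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

def terminate_job(p: subprocess.Popen) -> str:
    """Stop a running conversion process and report the resulting status."""
    p.terminate()       # ask the process to stop (SIGTERM on POSIX)
    p.wait(timeout=10)  # reap it so no zombie process is left behind
    return "TERMINATED"

print(terminate_job(proc))  # → TERMINATED
```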
Instead of instantly rejecting requests, admins can start discussions:
- Starting a Discussion: Click "Discuss" on a pending request
- Admin Dashboard: View all active discussions in the "Active Discussions" section
- Conversation: Both admin and user can exchange messages
- Resolution: Admin can either:
- Approve: Start the conversion
- Close: Reject the request
Users see their discussions in "My Requests" with a "View Thread" button.
If the server crashes during a conversion:
- On restart, stuck jobs are automatically marked as "INTERRUPTED"
- Associated requests are reset to "PENDING" for re-approval
- No manual intervention required
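The recovery pass above can be sketched as two UPDATE statements run at startup: any job still marked RUNNING must have been orphaned by a crash. Table and column names below are illustrative, not the app's actual schema:

```python
import sqlite3

# In-memory stand-in for the app database, seeded with one orphaned job.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, job_id INTEGER, status TEXT)")
db.execute("INSERT INTO jobs VALUES (1, 'RUNNING'), (2, 'COMPLETED')")
db.execute("INSERT INTO requests VALUES (1, 1, 'APPROVED')")

def recover_stuck_jobs(conn):
    """Flag orphaned RUNNING jobs and reset their requests for re-approval."""
    conn.execute("UPDATE requests SET status='PENDING' WHERE job_id IN "
                 "(SELECT id FROM jobs WHERE status='RUNNING')")
    conn.execute("UPDATE jobs SET status='INTERRUPTED' WHERE status='RUNNING'")

recover_stuck_jobs(db)
print(db.execute("SELECT status FROM jobs WHERE id=1").fetchone()[0])      # → INTERRUPTED
print(db.execute("SELECT status FROM requests WHERE id=1").fetchone()[0])  # → PENDING
```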
| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Health check with database status |
| `/api/hf/search?q=query` | GET | Search HuggingFace models |
| `/api/status/all` | GET | Get all conversion jobs |
| Endpoint | Method | Description |
|---|---|---|
| `/api/requests/submit` | POST | Submit a conversion request (with optional quants) |
| `/api/requests/my` | GET | Get user's requests |
| `/api/quants` | GET | Get available quantization types |
| `/api/tickets/my` | GET | Get user's tickets |
| `/api/tickets/{id}/messages` | GET | Get ticket messages |
| `/api/tickets/{id}/reply` | POST | Reply to a ticket |
| `/api/tickets/{id}/reopen` | POST | Reopen a closed ticket |
| Endpoint | Method | Description |
|---|---|---|
| `/api/models/process` | POST | Start a conversion directly (with optional quants) |
| `/api/models/{id}/terminate` | POST | Terminate a running job |
| `/api/models/{id}/continue` | POST | Resume an interrupted job |
| `/api/requests/all` | GET | View all requests |
| `/api/requests/{id}/approve` | POST | Approve a request (with optional quant override) |
| `/api/requests/{id}/reject` | POST | Reject a request |
| `/api/tickets/create` | POST | Create a discussion ticket |
| `/api/tickets/all` | GET | View all tickets |
| `/api/tickets/{id}/approve` | POST | Approve from ticket (with optional quants) |
| `/api/tickets/{id}/close` | POST | Close/reject a ticket |
| `/api/admin/db-info` | GET | Database information |
| `/api/admin/check-update` | GET | Check for available app updates |
| `/api/admin/update-app` | POST | Pull updates and restart server |
```
automaticConversion/
├── app_gguf.py           # Main FastAPI application
├── database.py           # Database abstraction (SQLite + MSSQL)
├── security.py           # Rate limiting, bot detection, spam protection
├── models.py             # Pydantic request/response models
├── managers.py           # LlamaCpp and HuggingFace managers
├── workflow.py           # Model conversion pipeline
├── websocket_manager.py  # Real-time WebSocket updates
├── routes/
│   ├── auth.py           # Authentication (admin login + OAuth)
│   ├── models.py         # Model processing endpoints
│   ├── requests.py       # Request system endpoints
│   └── tickets.py        # Ticket/discussion endpoints
├── static/
│   ├── js/
│   │   └── tickets.js    # Ticket modal JavaScript
│   └── favicon.png
├── templates/
│   ├── base.html         # Base template
│   ├── index.html        # Main dashboard
│   └── login.html        # Admin login page
├── Dockerfile            # Docker image definition
├── docker-compose.yml    # Docker Compose configuration
├── update.ps1            # Update script (Windows PowerShell)
├── update.sh             # Update script (Linux/macOS)
├── .env.example          # Environment template
└── requirements.txt      # Python dependencies
```
MIT
- llama.cpp - GGUF quantization
- HuggingFace - Model hosting and OAuth
- FastAPI - Web framework
- pyodbc - MSSQL connectivity
