This document provides guidelines for agents working on GGUF Forge, a FastAPI application that converts HuggingFace models to GGUF format.
## Running the Application

```bash
python app_gguf.py
```

### Docker (Recommended for Production)

```bash
docker-compose up -d
docker-compose logs -f
```

### Installation

```bash
pip install -r requirements.txt
```

### Database Health Check

```bash
curl http://localhost:8000/api/health
```

### Testing

No automated test suite exists. Manual testing via the web UI at http://localhost:8000 is recommended.
## Code Style

- Python 3.10+
- FastAPI for web framework
- Async/await patterns for all I/O operations
- Type hints from the `typing` module (`Optional`, `List`, `Dict`, `Any`)

Import order:

```python
# Standard library imports first
import os
import asyncio
from pathlib import Path

# Third-party imports
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Local imports
from database import get_db_connection
from models import ProcessRequest
```

Naming conventions:

- Variables/Functions: `snake_case` (e.g., `get_db_connection`, `model_id`)
- Classes: `PascalCase` (e.g., `ModelWorkflow`, `DatabaseRow`)
- Constants: `UPPER_CASE` (e.g., `QUANTS`, `PARALLEL_QUANT_JOBS`)
- Private methods: `_prefix` (e.g., `_adapt_params`, `_reconnect`)
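A toy sketch of the conventions above (the class body and return value here are illustrative only and do not reflect the project's real `ModelWorkflow` signature):

```python
PARALLEL_QUANT_JOBS = 2              # constant: UPPER_CASE

class ModelWorkflow:                 # class: PascalCase
    def run_pipeline(self):          # public method: snake_case
        return self._adapt_params()

    def _adapt_params(self):         # private helper: _prefix
        return {"jobs": PARALLEL_QUANT_JOBS}

workflow = ModelWorkflow()           # variable: snake_case
```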
Always use async database operations:

```python
conn = await get_db_connection()
await conn.execute("SELECT * FROM models WHERE id = ?", (model_id,))
model = await conn.fetchone()
await conn.commit()
await conn.close()
```

**Important**: Use `?` placeholders for all SQL queries to prevent injection. The database abstraction automatically handles parameter binding for both SQLite and MSSQL.
For error handling, wrap database access in try/except and always release the connection:

```python
conn = None
try:
    conn = await get_db_connection()
    await conn.execute("SELECT * FROM models WHERE id = ?", (model_id,))
    model = await conn.fetchone()
except Exception as e:
    logger.error(f"Failed to fetch model: {e}")
    raise HTTPException(status_code=404, detail="Model not found")
finally:
    if conn:  # guard: the connection may never have been opened
        await conn.close()
```

Use `logger` for logging: `logger.error()` for errors, `logger.warning()` for warnings, `logger.info()` for informational messages.
Define API request/response models in `models.py`:

```python
class ProcessRequest(BaseModel):
    model_id: str
    quants: Optional[List[str]] = None
```

Route organization:

- Organize routes in the `routes/` directory by functionality
- Use `APIRouter(prefix="/api")` for endpoints
- Configure dependencies via a module-level `configure()` function
- Example pattern from `routes/models.py`:

```python
router = APIRouter(prefix="/api")

def configure(admin_dependency):
    global _require_admin_func
    _require_admin_func = admin_dependency

@router.post("/models/process")
async def process_model(req: ProcessRequest, background_tasks: BackgroundTasks, user=Depends(get_admin)):
    ...  # implementation
```

Use FastAPI `BackgroundTasks` for async operations:
```python
@router.post("/models/process")
async def process_model(req: ProcessRequest, background_tasks: BackgroundTasks):
    workflow = ModelWorkflow(req.model_id, hf_repo_id)  # hf_repo_id resolved earlier in the handler
    background_tasks.add_task(workflow.run_pipeline)
    return {"status": "started", "id": req.model_id}
```

Always access environment variables via `os.getenv()` with defaults:
```python
HF_TOKEN = os.getenv("HF_TOKEN", "")
PARALLEL_QUANT_JOBS = int(os.getenv("PARALLEL_QUANT_JOBS", "2"))
```

Never commit `.env` files. Use `.env.example` as a template.
This app supports both SQLite and MSSQL. The `database.py` module abstracts the differences. Write queries using SQLite syntax; the adapter converts them to MSSQL automatically.
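The real conversion rules live in `database.py` and are not shown here; as a rough illustration of the kind of rewriting such an adapter does, here is a hedged sketch of one common SQLite-to-MSSQL translation (the function name and rule are hypothetical):

```python
import re

def adapt_sqlite_to_mssql(query: str) -> str:
    """Illustrative only: rewrite a trailing SQLite 'LIMIT n' as MSSQL 'TOP n'."""
    m = re.search(r"\bLIMIT\s+(\d+)\s*$", query, re.IGNORECASE)
    if m:
        n = m.group(1)
        query = query[:m.start()].rstrip()  # drop the LIMIT clause
        query = re.sub(r"^SELECT\b", f"SELECT TOP {n}", query, count=1, flags=re.IGNORECASE)
    return query

print(adapt_sqlite_to_mssql("SELECT * FROM models LIMIT 5"))
# SELECT TOP 5 * FROM models
```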
Use `websocket_manager.py` for real-time updates:

```python
from websocket_manager import manager as ws_manager

await ws_manager.broadcast_model_update(model_id, {"status": "quantizing"})
```

Use `pathlib.Path` for all file operations:
```python
from pathlib import Path

CACHE_DIR = BASE_DIR / ".cache"
model_dir = CACHE_DIR / "models"
model_dir.mkdir(parents=True, exist_ok=True)
```

For logging:

```python
import logging

logger = logging.getLogger("GGUF_Forge")
logger.info("Application starting")
logger.error("Failed to connect to database")
```

To add a new feature:

- Define Pydantic models in `models.py`
- Create a route file in `routes/` or add to an existing one
- Configure dependencies in `app_gguf.py`
- Update WebSocket channels if needed
- Add database migrations using ALTER TABLE with try/except
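A minimal sketch of the ALTER TABLE try/except migration pattern (the column name and `conn` interface here are illustrative, not the project's actual schema):

```python
async def migrate(conn):
    try:
        # Hypothetical column added in a later schema version.
        await conn.execute("ALTER TABLE models ADD COLUMN error_message TEXT")
        await conn.commit()
    except Exception:
        # Column already exists from a previous run; safe to ignore.
        pass
```

Because the failure mode (duplicate column) is harmless, re-running migrations at every startup is safe and idempotent.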
Follow the commit message format: `add: feature description`, `fix: bug description`, `refactor: description`.
Key files:

- **App Entry**: `app_gguf.py` - FastAPI app with lifespan management
- **Database**: `database.py` - abstract base class with SQLite/MSSQL implementations
- **Workflow**: `workflow.py` - `ModelWorkflow` class manages the conversion pipeline
- **Managers**: `managers.py` - `LlamaCppManager` and `HuggingFaceManager`
- **Security**: `security.py` - rate limiting, bot detection, spam protection
- **Routes**: `routes/auth.py`, `routes/models.py`, `routes/requests.py`, `routes/tickets.py`
Important patterns:

- **Async Context**: Use `async with` for database connections
- **Connection Pooling**: MSSQL uses an `aioodbc` pool; SQLite creates new connections
- **Migration Pattern**: Try ALTER TABLE, pass on error (column already exists)
- **Validation**: Use Pydantic for request validation; validate HF repos via `validate_hf_repo_sync()`
- **Parallel Processing**: Quantizations run in parallel (configurable via `PARALLEL_QUANT_JOBS`)
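The "async with" pattern from the list above can be sketched as follows; `FakeConnection` and `db_connection` are illustrative stand-ins, not the project's real API, with `FakeConnection` playing the role of the object returned by `get_db_connection()`:

```python
import asyncio
from contextlib import asynccontextmanager

class FakeConnection:
    """Stand-in for a real database connection (illustrative only)."""
    def __init__(self):
        self.closed = False
        self.queries = []

    async def execute(self, query):
        self.queries.append(query)

    async def close(self):
        self.closed = True

@asynccontextmanager
async def db_connection():
    conn = FakeConnection()  # real code: conn = await get_db_connection()
    try:
        yield conn
    finally:
        await conn.close()  # always closed, even if the body raises

async def main():
    async with db_connection() as conn:
        await conn.execute("SELECT 1")
        return conn

conn = asyncio.run(main())
print(conn.closed)  # True: the connection was closed on exit
```

This keeps connection cleanup in one place instead of repeating try/finally at every call site.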