QLoRA fine-tuning pipeline and lightweight API assistant for SQL/PySpark generation in internal audit scenarios.
- FastAPI endpoints: /health, /generate, /explain, /refactor
- Retrieval over internal catalog.json and examples.json
- Prompt builder for SQL/PySpark/Explain/Refactor
- QLoRA synthetic training pipeline: generate -> train -> eval -> merge
- Tracking integrations:
  - W&B (optional)
  - MLflow (optional)
UI/Client -> FastAPI -> Retrieval -> Prompt Builder -> LLM
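The Retrieval -> Prompt Builder stages of this flow could be sketched roughly as below. This is a minimal illustration, not the project's actual code: the record fields (`table`, `description`), the keyword-overlap scoring, and the function names `retrieve` and `build_prompt` are all assumptions, since the real schema of catalog.json and examples.json is not shown here.

```python
import re
from typing import Iterable

# Hypothetical catalog records; the real catalog.json schema may differ.
CATALOG = [
    {"table": "gl_entries",
     "description": "general ledger journal entries with amount and posting date"},
    {"table": "vendors",
     "description": "vendor master data with bank account details"},
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(question: str, records: Iterable[dict], top_k: int = 1) -> list[dict]:
    """Rank records by keyword overlap with the question (assumed strategy)."""
    q_tokens = tokenize(question)
    scored = []
    for record in records:
        record_text = " ".join(str(value) for value in record.values())
        scored.append((len(q_tokens & tokenize(record_text)), record))
    # sorted() is stable, so ties keep their original catalog order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [record for score, record in ranked[:top_k] if score > 0]


def build_prompt(task: str, question: str, context: list[dict]) -> str:
    """Assemble a plain-text prompt from the task, retrieved context, and question."""
    lines = [f"Task: {task}", "Relevant tables:"]
    for record in context:
        lines.append(f"- {record['table']}: {record['description']}")
    lines.append(f"Request: {question}")
    return "\n".join(lines)


question = "Find journal entries with large amount"
context = retrieve(question, CATALOG)
prompt = build_prompt("sql", question, context)
print(prompt)
```

The resulting prompt string is what would be handed to the LLM stage; a production retriever would more likely use embeddings or BM25 than raw token overlap.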
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
uvicorn app.main:app --host 0.0.0.0 --port 8000

Run UI in another terminal:
streamlit run ui/streamlit_app.py

cp .env.example .env
docker compose up --build

conda create -n audit-qlora python=3.11 -y
conda activate audit-qlora
pip install torch==2.4.1 transformers==4.45.2 peft==0.13.2 datasets==3.0.1 accelerate==1.0.1 sentencepiece==0.2.0 wandb==0.19.11 mlflow==2.16.2

python training/data_gen/generate_synthetic.py --out-dir training/datasets --size 320 --edge-ratio 0.30

python training/train_qlora.py \
--base-model deepseek-ai/deepseek-coder-1.3b-instruct \
--train-file training/datasets/train.jsonl \
--val-file training/datasets/val.jsonl \
--output-dir training/artifacts/adapter

python training/eval.py \
--model-path training/artifacts/adapter \
--base-model deepseek-ai/deepseek-coder-1.3b-instruct \
--test-file training/datasets/test.jsonl \
--out-dir training/artifacts/eval

python training/merge_lora.py \
--base-model deepseek-ai/deepseek-coder-1.3b-instruct \
--adapter-path training/artifacts/adapter \
--output-path training/artifacts/merged

mkdir -p .secrets
cat > .secrets/wandb.env << 'EOF'
WANDB_API_KEY=your_key_here
WANDB_PROJECT=audit-code-assistant
WANDB_ENTITY=
USE_WANDB=true
EOF
chmod 600 .secrets/wandb.env

Run local tracking server:
mlflow server --host 0.0.0.0 --port 5000

Or use local file backend:
export USE_MLFLOW=true
export MLFLOW_TRACKING_URI=file:./mlruns
export MLFLOW_EXPERIMENT_NAME=audit-code-assistant

bash scripts/run_overnight_deepseek.sh

pytest -q
python scripts/smoke_test.py
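
For reference, the optional tracking toggles set above (USE_WANDB, USE_MLFLOW, MLFLOW_TRACKING_URI, MLFLOW_EXPERIMENT_NAME) might be read on the Python side along these lines. This is only a sketch: the `TrackingConfig` class, the `load_tracking_config` helper, the accepted truthy strings, and the fallback defaults are assumptions, not the training scripts' actual behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TrackingConfig:
    """Hypothetical container for the tracking-related environment settings."""
    use_wandb: bool
    use_mlflow: bool
    mlflow_tracking_uri: str
    mlflow_experiment: str


def _flag(value: Optional[str]) -> bool:
    """Treat "1", "true", or "yes" (any case) as enabled; anything else as off."""
    return (value or "").strip().lower() in {"1", "true", "yes"}


def load_tracking_config(env: Optional[Mapping[str, str]] = None) -> TrackingConfig:
    """Build a TrackingConfig from the given mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    return TrackingConfig(
        use_wandb=_flag(env.get("USE_WANDB")),
        use_mlflow=_flag(env.get("USE_MLFLOW")),
        # Assumed defaults, mirroring the values exported in this README.
        mlflow_tracking_uri=env.get("MLFLOW_TRACKING_URI", "file:./mlruns"),
        mlflow_experiment=env.get("MLFLOW_EXPERIMENT_NAME", "audit-code-assistant"),
    )


config = load_tracking_config({"USE_MLFLOW": "true"})
print(config)
```

Keeping both integrations off unless explicitly enabled matches their "optional" status in the feature list: a run with neither variable set would simply skip tracking.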