GPU Memory Guard

A CLI utility that checks available GPU VRAM before you load AI models. Prevents OOM crashes that force a full system reboot.

Why?

If you run local inference on consumer GPUs, you know the pain:

Without gpu-memory-guard	With gpu-memory-guard
Load 70B model on 24GB card	Check VRAM before loading
System freezes, GPU hangs	Get a clear warning in terminal
Force reboot, lose unsaved work	Pick a smaller model or free memory
Repeat next week	Zero OOM crashes

One command saves you from constant reboots.

Quick Start

git clone https://github.com/CastelDazur/gpu-memory-guard.git
cd gpu-memory-guard
pip install -e .

# Check current GPU status
gpu-guard

# Check if an 18GB model fits with 2GB safety buffer
gpu-guard --model-size 18 --buffer 2

Example output:

GPU 0: NVIDIA GeForce RTX 5090
  Total:     32.00 GB
  Used:       4.12 GB
  Available: 27.88 GB

Model size: 18.00 GB (buffer: 2.00 GB)
Status: OK - model fits with 7.88 GB to spare

Documentation

MODEL_COMPATIBILITY.md - Sizing reference for GPUs, models, and quantizations (Q4_K_M, Q5_K_M, Q8_0, FP16) with KV cache tables and the mmproj trap for vision-language models.
TROUBLESHOOTING.md - Field guide to the five CUDA OOM errors you will actually see, with a diagnostic checklist and notes on vLLM, llama.cpp, and Ollama quirks.

Installation

From source (recommended)

git clone https://github.com/CastelDazur/gpu-memory-guard.git
cd gpu-memory-guard
pip install -e .

Requirements

Python 3.8+
NVIDIA GPU with nvidia-smi installed, OR
pynvml Python package (pip install pynvml)

Usage

CLI

# Basic VRAM check
gpu-guard

# Check if a model fits (size in GB)
gpu-guard --model-size 13

# Custom safety buffer (default: 1GB)
gpu-guard --model-size 18 --buffer 2

# JSON output for scripting
gpu-guard --model-size 13 --json

# Quiet mode: exit code only (0 = fits, 1 = doesn't)
gpu-guard --model-size 7 --quiet

As a Python library

from gpu_guard import check_vram, can_load_model, get_gpu_info

# Check current VRAM
gpu_info = get_gpu_info()
for gpu in gpu_info:
    print(f"GPU {gpu.device_id}: {gpu.available_memory_gb:.2f}GB available")

# Check if a model fits
result = can_load_model(model_size_gb=13.0, buffer_gb=2.0)
if result.fits:
    print("Safe to load")
else:
    print(f"Need {result.shortage_gb:.2f}GB more VRAM")

Scripting example

# Pre-check before launching inference
if gpu-guard --model-size 13 --quiet; then
    python run_inference.py --model llama-13b
else
    echo "Not enough VRAM, switching to 7B model"
    python run_inference.py --model llama-7b
fi

Common model sizes (approximate VRAM)

Model	FP16	Q4 (GGUF)
7B params	~14 GB	~4 GB
13B params	~26 GB	~7 GB
33B params	~66 GB	~18 GB
70B params	~140 GB	~35 GB

Roadmap

AMD ROCm support
Memory estimation by model architecture
Multi-GPU split recommendations
PyPI package (pip install gpu-memory-guard)
Integration with Ollama and vLLM

Contributing

PRs welcome. If you want to add AMD ROCm support or model-specific memory estimation, open an issue first so we can discuss the approach.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
tests		tests
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MODEL_COMPATIBILITY.md		MODEL_COMPATIBILITY.md
README.md		README.md
TROUBLESHOOTING.md		TROUBLESHOOTING.md
gpu_guard.py		gpu_guard.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPU Memory Guard

Why?

Quick Start

Documentation

Installation

From source (recommended)

Requirements

Usage

CLI

As a Python library

Scripting example

Common model sizes (approximate VRAM)

Roadmap

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GPU Memory Guard

Why?

Quick Start

Documentation

Installation

From source (recommended)

Requirements

Usage

CLI

As a Python library

Scripting example

Common model sizes (approximate VRAM)

Roadmap

Contributing

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages