EricRollei / Comfy_HunyuanImage3

Nodes to run Hunyuan Image 3 locally in ComfyUI, with BF16 and NF4-quantized options

Updated Apr 30, 2026
Python

megeezy / Chameleon

Stateless LLM runtime that dynamically routes, loads, executes, and unloads models per request with bounded VRAM caching and intelligent model selection.

systems-programming llm generative-ai ai-infrastructure latency-optimization model-routing vram-optimization model-scheduling

Updated Apr 12, 2026
Rust
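
The "bounded VRAM caching" idea in the description above can be illustrated with a least-recently-used cache that evicts models once a memory budget is exceeded. This is a minimal sketch in Python under stated assumptions, not Chameleon's actual implementation: the class `BoundedModelCache`, its `request` method, and the per-model sizes in MB are all hypothetical, and real loading and unloading are replaced by bookkeeping.

```python
from collections import OrderedDict


class BoundedModelCache:
    """Hypothetical LRU cache that keeps total model size under a VRAM budget."""

    def __init__(self, budget_mb):
        self.budget_mb = budget_mb
        self.used_mb = 0
        # Maps model name -> size in MB; insertion order tracks recency.
        self._models = OrderedDict()

    def request(self, name, size_mb):
        """Mark `name` as used, 'loading' it if absent; return evicted names."""
        if size_mb > self.budget_mb:
            raise ValueError(f"{name} does not fit in the VRAM budget")
        if name in self._models:
            # Cache hit: refresh recency, nothing to evict.
            self._models.move_to_end(name)
            return []
        evicted = []
        # Evict least-recently-used models until the new one fits.
        while self.used_mb + size_mb > self.budget_mb:
            old_name, old_size = self._models.popitem(last=False)
            self.used_mb -= old_size
            evicted.append(old_name)
        self._models[name] = size_mb
        self.used_mb += size_mb
        return evicted
```

With a 10 MB budget, requesting models "a" and "b" (4 MB each), touching "a" again, and then requesting a 4 MB model "c" evicts "b", because "a" was used more recently.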

mkim87404 / ComfyUI-ControlOrder-FreeMemory

ComfyUI custom node that controls the order of node execution with linear routing of any data type through infinite I/O slots + option to free VRAM & RAM at any point in a workflow with device-agnostic memory management utilities managed by ComfyUI that safely unload all models, while preserving all connected data & models through to the next node.

Updated Apr 15, 2026
Python

philtimmes / KeSSie

KeSSie HUGE Context Semantic recall for Large Language Models

Updated Feb 21, 2026
Python

Alperen012 / TurboQuant

Ultra-Low Bit KV-Cache Compression optimization layer built on top of llama.cpp for LLM inference. Reduces VRAM overhead by ~75-80% using custom CUDA kernels.

machine-learning cuda inference quantization kv-cache llm llama-cpp agent-memory vram-optimization

Updated Apr 12, 2026
C++

Pomilon / LEMA

LEMA (Layer-wise Efficient Memory Abstraction): A hardware-aware framework for fine-tuning LLMs in VRAM-constrained environments using asynchronous binary pre-fetching and triple-tier memory orchestration.

machine-learning cuda pytorch memory-management lora system-architecture fine-tuning llm safetensors vram-optimization low-resource-computing lema

Updated Mar 27, 2026
Python

damienos61 / SynapSwap

Predictive VRAM Virtualization Engine

c performance deep-learning cuda artificial-intelligence memory-management gpu-computing system-programming c-language inference-engine llm-inference llm-inference-poisoning vram-optimization pcie-transfer

Updated Feb 5, 2026
C

iknowkungfubar / IronSilo

WIP: Turn your PC into a private, autonomous AI lab without melting your GPU.

docker privacy aider local-ai agentic-workflow vram-optimization

Updated May 6, 2026
Python

AlfaPankaj / Neural_Memory_Operating_system

NMOS (Neural Memory OS) is a predictive partial execution engine enabling 70B-level reasoning on 4GB VRAM. It uses the “Zero-Lag” hypothesis, leveraging typing latency as a compute window to mask memory limits via async layer prefetching and speculative decoding.

python machine cuda pytorch memory-management hnsw edge-ai llm generative-ai local-llm llm-inference speculative-decoding smollm2 vram-optimization anticipatory-inference layer-offloading prefeching 70b-model

Updated Apr 28, 2026
Python

kaleic / PerkunasAITrainingPlatform

Perkunas AI Training Platform is a memory-aware model training and serving system for serious language model experimentation under tight hardware limits. It combines streaming training, rich telemetry, guarded recovery, checkpoint export, and OpenAI-compatible serving.

machine-learning ai deep-learning telemetry cuda inference pytorch language-models memory-efficient model-training training-platform huggingface checkpointing llm vllm low-vram vram-optimization

Updated May 17, 2026
Python

JuiceB0xC0de / deep-chaos-scheduler

Sticky-block topology lottery scheduler for transformer fine-tuning.

machine-learning amd transformers pytorch lora rocm fine-tuning weights-and-biases llm vram-optimization

Updated May 14, 2026
Python

WizardsForgeIo / sparsemma

INT8 Sparse Tensor Core GEMM for PyTorch — built for Windows

windows gpu cuda inference pytorch nvidia sparse quantization gemm int8 ptx structured-sparsity tensor-cores vram-optimization

Updated Feb 16, 2026
Cuda

anthony-maio / fitcheck

Know before you train — VRAM estimation for LLM fine-tuning.

training gpu fine-tuning-llm vram-optimization

Updated Feb 16, 2026
Python
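
As a rough illustration of what a VRAM estimate for fine-tuning involves, the sketch below applies a common rule of thumb for full fine-tuning with Adam in mixed precision: about 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for FP32 master weights and the two Adam moments). It is not fitcheck's method; the function name and defaults are assumptions, and activations, KV caches, and framework overhead are deliberately left out.

```python
def estimate_full_finetune_gib(n_params,
                               weight_bytes=2,
                               grad_bytes=2,
                               optimizer_bytes=12):
    """Rule-of-thumb memory for weights + gradients + optimizer state, in GiB.

    Hypothetical helper: ignores activations and framework overhead.
    """
    bytes_total = n_params * (weight_bytes + grad_bytes + optimizer_bytes)
    return bytes_total / 2**30


# A 7-billion-parameter model needs on the order of 100 GiB before activations.
print(round(estimate_full_finetune_gib(7e9), 1))
```

Parameter-efficient methods such as LoRA shrink the gradient and optimizer terms to the adapter's parameter count, which is why they fit on much smaller GPUs.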

bendangnuksung / dynabatch

PyTorch/Hugging Face batching utility that sorts variable-length text by difficulty, then dynamically increases batch size on easier samples using a pre-trained VRAM predictor to improve GPU utilization and throughput while reducing OOM risk with fallback handling.

machine-translation transformers pytorch dataloader sampler huggingface dynamic-batching vram-optimization huggingface-trainer

Updated Apr 28, 2026
Python
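
The batching idea described above (sort by difficulty, then grow the batch on easier samples) can be sketched with a much simpler stand-in. The example below is not dynabatch's API: it replaces the pre-trained VRAM predictor with a fixed padded-token budget, uses whitespace word count as a hypothetical difficulty proxy, and the function name `token_budget_batches` is an assumption.

```python
def token_budget_batches(texts, max_tokens_per_batch):
    """Group texts so that (longest length x batch size) stays within a budget.

    Hypothetical sketch: word count stands in for token count and difficulty.
    """
    # Hardest (longest) samples first, so batch size grows as samples get easier.
    ordered = sorted(texts, key=lambda t: len(t.split()), reverse=True)
    batches, current, longest = [], [], 0
    for text in ordered:
        n = len(text.split())
        new_longest = max(longest, n)
        # Padded cost of the batch if this text were added.
        if current and new_longest * (len(current) + 1) > max_tokens_per_batch:
            batches.append(current)
            current, new_longest = [], n
        current.append(text)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


batches = token_budget_batches(["a b c d", "a b c", "a", "b", "c c"], 4)
print([len(b) for b in batches])
```

A text longer than the whole budget still gets its own batch of one; a real implementation would need an out-of-memory fallback for that case, as the description mentions.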

angelnicolasc / Stratum

Adaptive dual-tier serving for Gemma 4 on consumer 16GB GPUs. Complexity + real-time VRAM routing between vLLM E4B and llama.cpp 27B. Production stack with OpenWebUI, monitoring, and more.

routing routing-engine vram vllm vram-optimization gemma4

Updated May 15, 2026
Python

mkim87404 / ComfyUI-Unload-Model

Secure & device-agnostic ComfyUI custom nodes for unloading a model or all models at any point in a workflow, followed by a robust set of VRAM & RAM clean up operations using ComfyUI's own memory management utilities. Optionally persists & routes any given data throughout the memory cleanup, letting the user control the execution order of nodes.

Updated Apr 20, 2026
Python

Pomilon / LEMA-llama

A Proof of Concept for the LEMA (Layer-wise Efficient Memory Abstraction) framework. Enables stable fine-tuning of Llama-2-7B on consumer-grade hardware (16GB VRAM) through layer-wise weight streaming and triple-buffer memory virtualization.

machine-learning deep-learning pytorch kaggle memory-efficiency fine-tuning llm llama2 ai-infrastructure low-resource-ai vram-optimization low-resource-computing lema lema-framework

Updated Feb 18, 2026
Jupyter Notebook

OrakulStudio / ai-toolkit-Ostris-bonememory

BoneMemory: Universal Async Core for AI-Toolkit. The hardware-agnostic VRAM manager for every model: Image, Video, and Audio. Whether you’re training a Rank 16 LoRA on an entry-level GPU or pushing Rank 1024 on an RTX 4090, BoneMemory eliminates memory bottlenecks. Architecture that makes any hardware punch above its weight. Zero OOM.

python flux cuda pytorch memory-management oom-fix video-generation ai-toolkit flux2 audio-generation hardware-agnostic sdxl lora-training vram-optimization ostris bonememory async-cuda sub-linear-scaling rank-1024

Updated May 11, 2026
Python

WarrodSequen / sparsemma

Accelerate INT8 sparse inference in PyTorch on Windows with minimal setup. Achieve high performance using Sparse Tensor Cores without Linux dependencies.

windows gpu cuda inference pytorch nvidia sparse quantization gemm int8 ptx structured-sparsity tensor-cores vram-optimization

Updated Feb 28, 2026

mkim87404 / ComfyUI-TransformerLLMTaskRunner

ComfyUI Custom Node for running Transformer LLMs with zero dependency conflicts. Provides device-agnostic VRAM/RAM cleanup options post-run & optional dynamic LLM prompt formatting with variable inputs of any type.

Updated Mar 31, 2026
Python

vram-optimization

Here are 24 public repositories matching this topic...
