@Doufless1
Low-VRAM Training Framework for GO-1

Description

This PR adds a memory-efficient training framework that enables GO-1 model fine-tuning on consumer GPUs with as little as 4GB VRAM (tested on NVIDIA GTX 970).

Problem

The current training setup requires ~70GB VRAM, making it inaccessible to most researchers and developers. This limits community contributions and experimentation.

Solution

A modular low-VRAM training framework built with Clean Architecture and SOLID principles:

Key Features

Feature                      Benefit
-------                      -------
Gradient Accumulation        Simulate larger batch sizes with micro-batches
Mixed Precision (FP16/BF16)  ~50% memory reduction for activations
CPU Offloading               Move optimizer states to CPU with integrity checks
Selective Freezing           Train only the Action Expert (755 MB vs 5 GB full model)
Feature Caching              Pre-compute vision features with SHA256 verification
Preset Configs               Ready-to-use settings for 4 GB and 8 GB GPUs
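The gradient-accumulation idea above can be sketched in a few lines. This is a toy, framework-free illustration of the technique only, not the PR's actual PyTorch implementation; all names in it are made up:

```python
# Toy sketch of gradient accumulation (pure Python, no framework).
# Micro-batch gradients are averaged and the optimizer steps once per
# `accum_steps` micro-batches, simulating a larger effective batch size.

def grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x

def train(data, w=0.0, lr=0.01, accum_steps=4):
    acc, steps = 0.0, 0
    for i, (x, y) in enumerate(data, start=1):
        acc += grad(w, x, y) / accum_steps  # average over micro-batches
        if i % accum_steps == 0:
            w -= lr * acc                   # one optimizer step per accumulated batch
            acc, steps = 0.0, steps + 1
    return w, steps

data = [(1.0, 2.0)] * 8      # 8 micro-batches, true optimum at w = 2
w, steps = train(data)
print(steps)                 # 2 optimizer steps for 8 micro-batches
```

In a real PyTorch loop the same structure applies: divide each micro-batch loss by `accum_steps`, call `backward()` every iteration, and call `optimizer.step()` plus `optimizer.zero_grad()` only every `accum_steps` iterations.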

Architecture

go1/tools/low_vram/
├── core/               # Domain layer - interfaces and data classes
├── adapters/           # Interface adapters - concrete implementations
└── infrastructure/     # Framework layer - trainer, factory, config
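As a rough sketch of how a disk-backed feature cache with SHA256 verification might look (a hypothetical stand-in, not the adapter shipped in this PR):

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

class DiskFeatureCache:
    """Hypothetical disk-backed feature cache with SHA256 integrity checks."""

    def __init__(self, cache_dir):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def put(self, key, features):
        # Store the pickled features alongside their SHA256 digest.
        blob = pickle.dumps(features)
        (self.dir / f"{key}.pkl").write_bytes(blob)
        (self.dir / f"{key}.sha256").write_text(hashlib.sha256(blob).hexdigest())

    def get(self, key):
        # Recompute the digest on read and refuse corrupted entries.
        blob = (self.dir / f"{key}.pkl").read_bytes()
        expected = (self.dir / f"{key}.sha256").read_text()
        if hashlib.sha256(blob).hexdigest() != expected:
            raise IOError(f"cache entry {key!r} failed integrity check")
        return pickle.loads(blob)

cache = DiskFeatureCache(tempfile.mkdtemp())
cache.put("frame_000", [0.1, 0.2, 0.3])
restored = cache.get("frame_000")
```

Pre-computing vision features once and reloading them from disk is what lets the vision tower stay off the GPU during fine-tuning; the digest check guards against silent corruption of cached entries.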

Files Changed

  • 14 new files in go1/tools/low_vram/
  • 18 unit tests in tests/low_vram/
  • 2,337 lines of documented, tested code

Testing

All unit tests pass:

python -m unittest tests.low_vram.test_components -v
# 18 tests passed

Tested on:

  • NVIDIA GTX 970 (4GB VRAM)
  • Windows 11 / Python 3.11.9 / PyTorch 2.5.1

Usage Example

from go1.tools.low_vram.infrastructure.factory import (
    DefaultTrainerFactory,
    create_4gb_config,
)
from go1.tools.low_vram.infrastructure.trainer import LowVRAMTrainer

# Create optimized configs for 4GB GPU
memory_config, training_config = create_4gb_config()

# Build trainer with dependency injection
factory = DefaultTrainerFactory()
trainer = LowVRAMTrainer(
    model=model,
    optimizer=optimizer,
    memory_manager=factory.create_memory_manager(memory_config),
    training_strategy=factory.create_training_strategy(training_config),
    feature_cache=factory.create_feature_cache("./cache"),
    model_freezer=factory.create_model_freezer(),
)

# Train!
trainer.train(train_dataloader, num_epochs=10)
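Selective freezing boils down to toggling gradient tracking by parameter name. A minimal stand-in sketch (the `Param` class and the `action_expert` prefix are assumptions for illustration; with PyTorch the same loop would flip `requires_grad` on `model.named_parameters()`):

```python
# Illustrative sketch of selective freezing: disable gradients for every
# parameter whose name falls outside the trainable prefix.

class Param:
    def __init__(self):
        self.requires_grad = True

def freeze_except(params, trainable_prefix):
    """Freeze all parameters except those under `trainable_prefix`;
    return the number of parameters left trainable."""
    for name, p in params.items():
        p.requires_grad = name.startswith(trainable_prefix)
    return sum(p.requires_grad for p in params.values())

model = {
    "vision_encoder.layer0.weight": Param(),
    "vision_encoder.layer1.weight": Param(),
    "action_expert.head.weight": Param(),
}
# "action_expert" is an assumed prefix; GO-1's real module names may differ.
n_trainable = freeze_except(model, "action_expert")
print(n_trainable)  # 1
```

Training only the Action Expert is what shrinks the trainable footprint from the full 5 GB model to roughly 755 MB.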

Breaking Changes

None: this is a new module that doesn't modify existing code.

Checklist

  • Code follows project style guidelines
  • Unit tests added and passing
  • Documentation included in code
  • No breaking changes to existing functionality

