@Doufless1
Low-VRAM Training Framework for GO-1

Description

This PR adds a memory-efficient training framework that enables GO-1 model fine-tuning on consumer GPUs with as little as 4GB VRAM (tested on NVIDIA GTX 970).

Problem

The current training setup requires ~70GB VRAM, making it inaccessible to most researchers and developers. This limits community contributions and experimentation.

Solution

A modular low-VRAM training framework built with Clean Architecture and SOLID principles:

Key Features

Feature                      Benefit
-------                      -------
Gradient Accumulation        Simulate larger batch sizes with micro-batches
Mixed Precision (FP16/BF16)  ~50% memory reduction for activations
CPU Offloading               Move optimizer states to CPU with integrity checks
Selective Freezing           Train only the Action Expert (755 MB vs 5 GB full model)
Feature Caching              Pre-compute vision features with SHA256 verification
Preset Configs               Ready-to-use settings for 4 GB and 8 GB GPUs
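The gradient-accumulation idea above can be sketched in a few lines. This is a toy, framework-free illustration of the technique only, not the PR's actual PyTorch implementation; all names in it are made up:

```python
# Toy sketch of gradient accumulation (pure Python, no framework).
# Micro-batch gradients are averaged and the optimizer steps once per
# `accum_steps` micro-batches, simulating a larger effective batch size.

def grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x

def train(data, w=0.0, lr=0.01, accum_steps=4):
    acc, steps = 0.0, 0
    for i, (x, y) in enumerate(data, start=1):
        acc += grad(w, x, y) / accum_steps  # average over micro-batches
        if i % accum_steps == 0:
            w -= lr * acc                   # one optimizer step per accumulated batch
            acc, steps = 0.0, steps + 1
    return w, steps

data = [(1.0, 2.0)] * 8      # 8 micro-batches, true optimum at w = 2
w, steps = train(data)
print(steps)                 # 2 optimizer steps for 8 micro-batches
```

In a real PyTorch loop the same structure applies: divide each micro-batch loss by `accum_steps`, call `backward()` every iteration, and call `optimizer.step()` plus `optimizer.zero_grad()` only every `accum_steps` iterations.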

Architecture

go1/tools/low_vram/
├── core/               # Domain layer - interfaces and data classes
├── adapters/           # Interface adapters - concrete implementations
└── infrastructure/     # Framework layer - trainer, factory, config
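As a rough sketch of how a disk-backed feature cache with SHA256 verification might look (a hypothetical stand-in, not the adapter shipped in this PR):

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

class DiskFeatureCache:
    """Hypothetical disk-backed feature cache with SHA256 integrity checks."""

    def __init__(self, cache_dir):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def put(self, key, features):
        # Store the pickled features alongside their SHA256 digest.
        blob = pickle.dumps(features)
        (self.dir / f"{key}.pkl").write_bytes(blob)
        (self.dir / f"{key}.sha256").write_text(hashlib.sha256(blob).hexdigest())

    def get(self, key):
        # Recompute the digest on read and refuse corrupted entries.
        blob = (self.dir / f"{key}.pkl").read_bytes()
        expected = (self.dir / f"{key}.sha256").read_text()
        if hashlib.sha256(blob).hexdigest() != expected:
            raise IOError(f"cache entry {key!r} failed integrity check")
        return pickle.loads(blob)

cache = DiskFeatureCache(tempfile.mkdtemp())
cache.put("frame_000", [0.1, 0.2, 0.3])
restored = cache.get("frame_000")
```

Pre-computing vision features once and reloading them from disk is what lets the vision tower stay off the GPU during fine-tuning; the digest check guards against silent corruption of cached entries.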

Files Changed

  • 14 new files in go1/tools/low_vram/
  • 18 unit tests in tests/low_vram/
  • 2,337 lines of documented, tested code

Testing

All unit tests pass:

python -m unittest tests.low_vram.test_components -v
# 18 tests passed

Tested on:

  • NVIDIA GTX 970 (4GB VRAM)
  • Windows 11 / Python 3.11.9 / PyTorch 2.5.1

Usage Example

from go1.tools.low_vram.infrastructure.factory import (
    DefaultTrainerFactory,
    create_4gb_config,
)
from go1.tools.low_vram.infrastructure.trainer import LowVRAMTrainer

# Create optimized configs for 4GB GPU
memory_config, training_config = create_4gb_config()

# Build trainer with dependency injection
factory = DefaultTrainerFactory()
trainer = LowVRAMTrainer(
    model=model,
    optimizer=optimizer,
    memory_manager=factory.create_memory_manager(memory_config),
    training_strategy=factory.create_training_strategy(training_config),
    feature_cache=factory.create_feature_cache("./cache"),
    model_freezer=factory.create_model_freezer(),
)

# Train!
trainer.train(train_dataloader, num_epochs=10)
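Selective freezing boils down to toggling gradient tracking by parameter name. A minimal stand-in sketch (the `Param` class and the `action_expert` prefix are assumptions for illustration; with PyTorch the same loop would flip `requires_grad` on `model.named_parameters()`):

```python
# Illustrative sketch of selective freezing: disable gradients for every
# parameter whose name falls outside the trainable prefix.

class Param:
    def __init__(self):
        self.requires_grad = True

def freeze_except(params, trainable_prefix):
    """Freeze all parameters except those under `trainable_prefix`;
    return the number of parameters left trainable."""
    for name, p in params.items():
        p.requires_grad = name.startswith(trainable_prefix)
    return sum(p.requires_grad for p in params.values())

model = {
    "vision_encoder.layer0.weight": Param(),
    "vision_encoder.layer1.weight": Param(),
    "action_expert.head.weight": Param(),
}
# "action_expert" is an assumed prefix; GO-1's real module names may differ.
n_trainable = freeze_except(model, "action_expert")
print(n_trainable)  # 1
```

Training only the Action Expert is what shrinks the trainable footprint from the full 5 GB model to roughly 755 MB.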

Breaking Changes

None: this is a new module that doesn't modify existing code.

Checklist

  • Code follows project style guidelines
  • Unit tests added and passing
  • Documentation included in code
  • No breaking changes to existing functionality

