
🚀 Getting Started with caramba

This guide walks you through installing caramba, running your first experiment, and understanding the core concepts that make the platform work.


📋 Table of Contents

  • Prerequisites
  • Installation
  • Your First Experiment
  • Understanding Manifests
  • Core Concepts
  • Next Steps


Prerequisites

Required

  • Python 3.10+ — caramba uses modern Python features
  • PyTorch 2.0+ — For model building and training
  • 8GB+ RAM — For loading models and datasets

Optional

  • HuggingFace account — For gated models like Llama (requires huggingface-cli login)
  • Xcode Command Line Tools — For Metal kernel compilation on macOS
  • CUDA + Triton — For GPU acceleration on NVIDIA hardware
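You can sanity-check these pieces before installing. The script below is a hypothetical standalone check, not part of the caramba repo:

```python
import sys

def check_python(min_version=(3, 10)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version

if __name__ == "__main__":
    print("Python OK" if check_python() else "Python too old (need 3.10+)")
    try:
        # Optional: verify PyTorch and report which accelerator is usable.
        import torch
        print("PyTorch", torch.__version__)
        if torch.cuda.is_available():
            print("CUDA available")
        elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
            print("MPS (Metal) available")
    except ImportError:
        print("PyTorch not installed")
```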

Installation

Basic Installation

# Clone the repository
git clone https://github.com/theapemachine/caramba.git
cd caramba

# Install dependencies
pip install -r requirements.txt

With Agent Workflows (Optional)

If you want AI-assisted paper drafting and review:

pip install -e ".[agents]"

# Or install individual components
pip install deeplake docling transformers  # Knowledge store
pip install crawl4ai                        # Web crawling

Verify Installation

# Should print the execution plan without running
python3 -m caramba config/presets/standard_transformer.yml --dry-run

Your First Experiment

Let's run a simple transformer training experiment to verify everything works.

Step 1: Prepare Data

caramba uses pre-tokenized .npy files for efficient data loading. For testing, you can create a small dummy dataset:

import numpy as np

# Create 1M random tokens (replace with real tokenized data for actual experiments)
tokens = np.random.randint(0, 50257, size=1_000_000, dtype=np.int32)
np.save("test_data.npy", tokens)
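You can verify the file round-trips before pointing a manifest at it; loading with mmap_mode="r" gives a lazy, read-only view rather than pulling the whole array into RAM. An illustrative check, not a caramba utility:

```python
import os
import tempfile

import numpy as np

# Write a small token file and read it back memory-mapped.
tokens = np.random.randint(0, 50257, size=10_000, dtype=np.int32)
path = os.path.join(tempfile.mkdtemp(), "test_data.npy")
np.save(path, tokens)

loaded = np.load(path, mmap_mode="r")  # lazy, read-only view on disk
assert loaded.dtype == np.int32
assert loaded.shape == (10_000,)
assert 0 <= int(loaded.min()) and int(loaded.max()) < 50257
```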

For real experiments, use the FineWeb preparation script:

python3 prepare_fineweb.py --tokens 100M --output fineweb_100m.npy

Step 2: Create a Manifest

Create my_experiment.yml:

version: 2
name: my_first_experiment
notes: Learning how caramba works

# Default settings applied to all targets
defaults:
  data:
    tokenizer: tiktoken
    val_frac: 0.1
  logging:
    instrument: rich
    wandb: false
  runtime:
    save_every: 100

# Variables for easy modification
vars:
  d_model: 256
  n_heads: 4
  n_layers: 4
  d_ff: 1024
  vocab_size: 50257
  block_size: 128

# Experiment targets
targets:
  - type: experiment
    name: train
    description: Train a small transformer from scratch
    backend: torch
    task: task.language_modeling

    # Data configuration
    data:
      ref: dataset.tokens
      config:
        path: test_data.npy
        block_size: ${block_size}

    # Model configuration
    system:
      ref: system.language_model
      config:
        model:
          type: TransformerModel
          embedder:
            type: token
            vocab_size: ${vocab_size}
            d_model: ${d_model}
          topology:
            type: StackedTopology
            layers:
              # Repeated transformer blocks
              - type: NestedTopology
                repeat: ${n_layers}
                layers:
                  # Attention with residual
                  - type: ResidualTopology
                    layers:
                      - type: RMSNormLayer
                        d_model: ${d_model}
                      - type: AttentionLayer
                        d_model: ${d_model}
                        n_heads: ${n_heads}
                        mode: standard
                  # FFN with residual
                  - type: ResidualTopology
                    layers:
                      - type: RMSNormLayer
                        d_model: ${d_model}
                      - type: SwiGLULayer
                        d_model: ${d_model}
                        d_ff: ${d_ff}
              # Final normalization
              - type: RMSNormLayer
                d_model: ${d_model}
              # Output projection
              - type: LinearLayer
                d_in: ${d_model}
                d_out: ${vocab_size}

    objective: objective.next_token_ce
    trainer: trainer.standard

    # Training runs
    runs:
      - id: train_small
        mode: train
        exp: my_first_run
        seed: 42
        steps: 500
        train:
          phase: standard
          batch_size: 8
          block_size: ${block_size}
          lr: 0.001
          device: mps  # or 'cuda' or 'cpu'
          dtype: float32

Step 3: Validate the Manifest

Before running, verify the manifest is valid:

python3 -m caramba my_experiment.yml --dry-run

This shows the execution plan without running anything:

┌─────────────────────────────────────────────────────────┐
│ Execution Plan                                          │
├─────────────────────────────────────────────────────────┤
│ Target: train                                           │
│ Runs:                                                   │
│   - train_small (500 steps, device=mps, dtype=float32)  │
│ Benchmarks: []                                          │
└─────────────────────────────────────────────────────────┘

Step 4: Run the Experiment

python3 -m caramba my_experiment.yml

You'll see:

╭─ Training Phase: standard ──╮
│ Step    100/500  loss=5.234 │
│ Step    200/500  loss=4.102 │
│ Step    300/500  loss=3.567 │
│ Step    400/500  loss=3.221 │
│ Step    500/500  loss=2.987 │
╰─────────────────────────────╯
✓ Training complete

Understanding Manifests

A manifest is a YAML file that declaratively defines your experiment. Here's the structure:

Top-Level Sections

version: 2              # Manifest schema version (always 2)
name: experiment_name   # Used for artifact directories
notes: "Description"    # Human-readable notes

vars:                   # Reusable variables
  d_model: 512

defaults:               # Settings applied to all targets
  data: { ... }
  logging: { ... }
  runtime: { ... }

targets:                # Runnable units (experiments or processes)
  - type: experiment
    name: train
    ...

entrypoints:            # Optional named entry points
  default: "train"

Variable Substitution

Use ${variable} to reference values from the vars section:

vars:
  d_model: 512
  n_heads: 8

targets:
  - type: experiment
    system:
      config:
        model:
          topology:
            layers:
              - type: AttentionLayer
                d_model: ${d_model}  # Becomes 512
                n_heads: ${n_heads}  # Becomes 8
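A resolver for this can be sketched in a few lines. This is an illustration of the semantics, not caramba's actual implementation; note that a placeholder occupying the whole string keeps the variable's type (so d_model stays an int):

```python
import re

def substitute(value, variables):
    """Recursively replace ${name} placeholders in a manifest-like structure."""
    if isinstance(value, dict):
        return {k: substitute(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, variables) for v in value]
    if isinstance(value, str):
        whole = re.fullmatch(r"\$\{(\w+)\}", value)
        if whole:
            # Whole-string placeholder: return the raw value, preserving its type.
            return variables[whole.group(1)]
        # Embedded placeholder: interpolate as a string.
        return re.sub(r"\$\{(\w+)\}", lambda m: str(variables[m.group(1)]), value)
    return value

layer = {"type": "AttentionLayer", "d_model": "${d_model}", "n_heads": "${n_heads}"}
resolved = substitute(layer, {"d_model": 512, "n_heads": 8})
```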

The Topology Tree

Models are defined as trees of topologies containing layers:

topology:
  type: StackedTopology           # Root: sequential execution
  layers:
    - type: NestedTopology        # Repeat this block N times
      repeat: 6
      layers:
        - type: ResidualTopology  # x + f(x)
          layers:
            - type: RMSNormLayer
            - type: AttentionLayer

→ Full Manifest Reference


Core Concepts

🎯 Targets

A target is a runnable unit. There are two types:

Type        Purpose
experiment  ML training/evaluation with runs and benchmarks
process     Agent workflow (paper writing, review, etc.)

🔄 Runs

Each experiment target contains one or more runs:

runs:
  - id: blockwise
    mode: train
    steps: 500
    train:
      phase: blockwise
      lr: 0.0001

  - id: finetune
    mode: train
    steps: 2000
    train:
      phase: global
      lr: 0.00005

Runs execute sequentially within a target.

๐Ÿ“ Topologies vs Layers

Topologies define structure (how things connect):

  • StackedTopology — A then B then C
  • ResidualTopology — x + f(x)
  • ParallelTopology — [A(x), B(x)] stacked

Layers define computation (what happens):

  • AttentionLayer — Multi-head attention
  • SwiGLULayer — Feed-forward network
  • RMSNormLayer — Normalization

✅ Verification

Attach verification to runs to check model behavior:

runs:
  - id: train
    verify:
      type: compare
      batches: 5
      attention:
        max_mean_l1: 0.05

Verification types:

  • compare — Check L1 distance between teacher/student
  • fidelity — Check NLL/perplexity ratios
  • eval — Run behavioral test cases
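As a mental model, the compare check averages a per-batch mean L1 distance between teacher and student activations and tests it against the threshold. The sketch below is illustrative; caramba's actual comparison may differ in detail:

```python
def mean_l1(a, b):
    """Mean absolute difference between two equal-length activation vectors."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def passes_compare(teacher_acts, student_acts, max_mean_l1=0.05):
    """Average the per-batch mean L1 and check it against the threshold."""
    scores = [mean_l1(t, s) for t, s in zip(teacher_acts, student_acts)]
    return sum(scores) / len(scores) <= max_mean_l1

# Two toy batches of activations (made-up numbers).
teacher = [[0.10, 0.20], [0.30, 0.40]]
student = [[0.11, 0.19], [0.33, 0.38]]
ok = passes_compare(teacher, student)  # mean L1 is well under 0.05
```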

📊 Benchmarks

Measure and compare models after training:

benchmarks:
  - id: perplexity
    config:
      type: perplexity
      num_batches: 100
    models: [teacher, student]

Generates CSV, PNG, and LaTeX artifacts.
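For reference, the perplexity metric is the exponential of the mean per-token negative log-likelihood. A minimal sketch, with made-up NLL values:

```python
import math

def perplexity(nll_per_token):
    """exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Hypothetical per-token NLL values collected over a few eval batches.
nlls = [3.0, 3.2, 2.8, 3.0]
ppl = perplexity(nlls)  # mean NLL is 3.0, so ppl is roughly e^3
```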


Next Steps

Now that you understand the basics:

  1. Manifest Reference — Complete YAML schema and options
  2. Layer Reference — All layer types with configurations
  3. Topology Guide — Building complex architectures
  4. Training Guide — Standard, upcycle, and orchestrated modes

Example Experiments to Try

# Train a Mixture of Experts model
python3 -m caramba config/presets/moe_transformer.yml --dry-run

# Upcycle Llama to DBA (requires HF login)
huggingface-cli login
python3 -m caramba config/presets/llama32_1b_dba.yml --target quick

# Run with full benchmarks
python3 -m caramba config/presets/llama32_1b_dba.yml --target paper