Skip to content

Latest commit

 

History

History
1008 lines (752 loc) · 30.8 KB

File metadata and controls

1008 lines (752 loc) · 30.8 KB

Garbage Collector Simplification & Modularization Analysis

Date: 2025-11-17 Author: AI Analysis Purpose: Analyze opportunities to simplify and modularize the GC with minimal codebase impact Status: ANALYSIS - Recommendations for incremental improvement


Executive Summary

The Lua garbage collector is a highly optimized tri-color incremental mark-and-sweep collector with generational optimization. After comprehensive analysis of the implementation and existing documentation, this report identifies 7 major opportunities for simplification and modularization that can:

  • Reduce complexity by ~30-40%
  • Improve maintainability through better separation of concerns
  • Minimize codebase impact through incremental refactoring
  • Preserve performance (target: ≤4.33s, ≤3% regression from 4.20s baseline)
  • Maintain C API compatibility completely

Key Finding: The GC is fundamentally necessary for Lua's semantics (circular references, weak tables, resurrection), but its implementation complexity can be significantly reduced without changing its core algorithm.

Recommended Strategy: Incremental modularization focusing on phase extraction, state machine cleanup, and list consolidation - NOT wholesale GC removal.


Table of Contents

  1. Current GC Architecture
  2. Complexity Analysis
  3. Simplification Opportunities
  4. Modularization Strategy
  5. Implementation Phases
  6. Impact Assessment
  7. Performance Considerations
  8. Recommendations

1. Current GC Architecture

1.1 Core Algorithm

Type: Tri-color incremental mark-and-sweep with generational optimization

Color Encoding (lgc.h:107-148):

  • WHITE: Unmarked (candidate for collection) - 2 shades for new vs old
  • GRAY: Marked but children unprocessed (work queue)
  • BLACK: Fully processed (safe until next cycle)

Critical Invariant: Black objects never point to white objects (enforced by write barriers)

1.2 GC Phases (lstate.h:154-164)

Pause → Propagate → EnterAtomic → Atomic → SweepAllGC → SweepFinObj →
        SweepToBeFnz → SweepEnd → CallFin → Pause

8 distinct phases with complex state transitions

1.3 Object Lists (16 concurrent lists!)

Incremental Mode Lists (10):

  1. allgc - All collectable objects
  2. sweepgc - Current sweep position (pointer-to-pointer)
  3. finobj - Objects with finalizers
  4. gray - Gray work queue
  5. grayagain - Objects to revisit in atomic
  6. weak - Weak-value tables
  7. ephemeron - Weak-key tables
  8. allweak - All-weak tables
  9. tobefnz - Ready for finalization
  10. fixedgc - Never collected (interned strings)

Generational Mode Pointers (+6): 11. survival - Survived one cycle 12. old1 - Survived two cycles 13. reallyold - Survived 3+ cycles 14-16. Finobj generations (finobjsur, finobjold1, finobjrold)

1.4 Age State Machine (7 ages!)

enum class GCAge : lu_byte {
  New       = 0,  // Created this cycle
  Survival  = 1,  // Survived 1 cycle
  Old0      = 2,  // Barrier-aged (not truly old)
  Old1      = 3,  // Survived 2 cycles
  Old       = 4,  // Truly old (skip in minor GC)
  Touched1  = 5,  // Old object modified this cycle
  Touched2  = 6   // Old object modified last cycle
};

Age Transitions: Complex state machine with 7 states and conditional transitions

1.5 Write Barriers (2 types)

Forward Barrier (lgc.cpp:309-324):

  • When: Black→White write
  • Action: Mark white object gray (restore invariant)
  • Generational: Promote to Old0 if writer is old

Backward Barrier (lgc.cpp:331-342):

  • When: Old object modified
  • Action: Mark as "touched", add to grayagain list
  • Purpose: Ensure old objects revisited if modified

1.6 Code Metrics

Files:

  • lgc.h - 479 lines
  • lgc.cpp - 1,950 lines
  • Total: ~2,400 lines of GC code

Complexity Hotspots:

  • atomic() - 40 lines of critical sequential operations (lgc.cpp:1649-1687)
  • convergeephemerons() - Iterative convergence loop (lgc.cpp:809-828)
  • sweeplist() - Pointer-to-pointer manipulation (lgc.cpp:990-1007)
  • youngcollection() - Generational GC with 6 sweep phases (lgc.cpp:1439-1484)
  • GCTM() - Finalizer execution with error handling (lgc.cpp:1095-1130)

2. Complexity Analysis

2.1 Complexity Drivers

Component Complexity Score Primary Cause Lines of Code
Generational GC ⭐⭐⭐⭐⭐ 7 age states, 6 generational lists, age transitions ~400 lines
Ephemeron Tables ⭐⭐⭐⭐⭐ Convergence loop, transitive marking ~150 lines
Finalization ⭐⭐⭐⭐ Resurrection, re-finalization, error handling ~250 lines
Write Barriers ⭐⭐⭐⭐ Two types, generational interactions ~150 lines
State Machine ⭐⭐⭐ 8 phases, complex transitions ~200 lines
List Management ⭐⭐⭐ 16 lists, pointer-to-pointer sweep ~300 lines
Weak Tables ⭐⭐⭐ 3 modes, clearing logic ~150 lines

Total Complexity: ~1,600 lines of high-complexity code out of 1,950 total

2.2 Existing Modularization

Already Completed (Phase 91 from CLAUDE.md):

global_State has been refactored into subsystems:

  • GCAccounting - totalbytes, GCdebt, GCmarked, etc.
  • GCParameters - GCpause, GCstepmul, GCstepsize, etc.
  • GCObjectLists - All 16 GC lists

Benefits Achieved:

  • ✅ Better organization of GC state
  • ✅ Clear ownership of fields
  • ✅ Type safety improvements

Remaining Gaps:

  • ❌ No phase extraction (all phases in single file)
  • ❌ No algorithmic separation (mark/sweep/finalize)
  • ❌ State machine still implicit (switch statements)
  • ❌ No encapsulation of generational vs incremental modes

2.3 Identified Problems

P1: Generational Mode Complexity (400 lines, 5/5 complexity)

  • 7 age states (only need 3-4)
  • 6 additional lists beyond incremental mode
  • Complex age transition logic (nextage[] array)
  • Barrier promotion logic (Old0 → Old1 → Old)
  • Minor vs major collection decision logic

P2: Ephemeron Convergence (150 lines, 5/5 complexity)

  • Iterative convergence loop (can run multiple times)
  • Direction alternation (forward/backward traversal)
  • Re-marking logic
  • Performance unpredictable (depends on graph structure)

P3: Phase Coupling (200 lines, 3/5 complexity)

  • All phases in single singlestep() function
  • No clear phase abstraction
  • State machine implicit in switch statement
  • Difficult to test individual phases

P4: List Proliferation (300 lines, 3/5 complexity)

  • 16 lists with overlapping purposes
  • Complex list movement logic (e.g., allgc → finobj → tobefnz)
  • Pointer-to-pointer sweep complicates abstraction

P5: Finalization Complexity (250 lines, 4/5 complexity)

  • Resurrection support (requires extra remark phase)
  • Re-finalization support (FINALIZEDBIT manipulation)
  • Error handling during __gc execution
  • Emergency GC prevention during finalization

3. Simplification Opportunities

3.1 Opportunity 1: Optional Generational Mode

Current State: Generational mode is always available, adding 400 lines of complexity

Proposal: Make generational mode compile-time optional

#if defined(LUA_USE_GENERATIONAL_GC)
  // Generational code (400 lines)
#else
  // Incremental-only mode (simpler)
#endif

Benefits:

  • ✅ Reduces code size by 20% when disabled
  • ✅ Eliminates 7-state age machine (only need 2: new, old)
  • ✅ Removes 6 generational lists
  • ✅ Simplifies barrier logic (only forward barrier needed)
  • ✅ Easier to understand and maintain

Costs:

  • ⚠️ Performance regression for long-running programs (generational is faster)
  • ⚠️ Some users may rely on generational mode
  • ⚠️ API still needs to support lua_gc(L, LUA_GCSETMODE, LUA_GCGEN)

Recommendation: Implement this - Keep generational as default, but allow disabling for embedded systems

Effort: 15-20 hours Risk: LOW (can be feature-flagged)


3.2 Opportunity 2: Simplify Age Management

Current State: 7 age states with complex transitions

Proposal: Reduce to 4 core ages in generational mode

enum class GCAge : lu_byte {
  Young     = 0,  // New + Survival (merged)
  Old       = 1,  // Old0 + Old1 + Old (merged)
  Touched1  = 2,  // Modified once
  Touched2  = 3   // Modified twice
};

Transition Simplification:

Young ──(survive GC)──→ Old
Old ──(modified)──→ Touched1 ──(GC)──→ Touched2 ──(GC)──→ Old

Benefits:

  • ✅ Simpler state machine (4 states vs 7)
  • ✅ Clearer semantics (young/old distinction only)
  • ✅ Less conditional logic in age advancement

Costs:

  • ⚠️ May promote objects to old sooner (more conservative)
  • ⚠️ Slightly less generational optimization

Recommendation: Consider implementing if generational mode retained

Effort: 10-15 hours Risk: MEDIUM (requires careful performance testing)


3.3 Opportunity 3: Extract Phase Classes

Current State: All phases in single singlestep() function with switch statement

Proposal: Extract phases into separate classes with interface

class GCPhase {
public:
  virtual ~GCPhase() = default;
  virtual l_mem execute(lua_State* L, int fast) = 0;
  virtual GCState getState() const = 0;
  virtual GCPhase* next() = 0;  // Next phase in sequence
};

class PropagatePhase : public GCPhase {
  l_mem execute(lua_State* L, int fast) override {
    if (fast || G(L)->getGray() == NULL) {
      return 1;  // Done, move to next phase
    }
    return propagatemark(G(L));
  }

  GCState getState() const override { return GCState::Propagate; }
  GCPhase* next() override { return &g_enterAtomicPhase; }
};

class AtomicPhase : public GCPhase { /* ... */ };
class SweepPhase : public GCPhase { /* ... */ };
// ... etc

Benefits:

  • ✅ Clear separation of concerns (one class per phase)
  • ✅ Easier to test individual phases
  • ✅ Better encapsulation of phase-specific logic
  • ✅ Explicit state machine (next() method)
  • ✅ Can use polymorphism for phase-specific behavior

Costs:

  • ⚠️ Virtual function overhead (minimal - only 1 call per GC step)
  • ⚠️ More files to manage (8 phase classes)

Recommendation: Strongly recommend - Significantly improves maintainability

Effort: 30-40 hours Risk: LOW (can be done incrementally, phase by phase)


3.4 Opportunity 4: Consolidate Gray Lists

Current State: 5 separate gray lists (gray, grayagain, weak, ephemeron, allweak)

Proposal: Use single gray list with priority/category field

enum class GrayCategory : lu_byte {
  Normal    = 0,  // Regular gray objects
  Again     = 1,  // To revisit in atomic
  WeakVal   = 2,  // Weak-value tables
  WeakKey   = 3,  // Ephemeron tables
  WeakBoth  = 4   // All-weak tables
};

// Add to GCObject:
GrayCategory gray_category;

// Single list traversal:
for (GCObject* obj : gray_list) {
  switch (obj->gray_category) {
    case GrayCategory::Normal:
      propagatemark(g, obj);
      break;
    case GrayCategory::Again:
      /* ... */
      break;
    // ...
  }
}

Benefits:

  • ✅ Reduces from 5 lists to 1
  • ✅ Simpler list management
  • ✅ Easier to understand object state
  • ✅ Can prioritize processing order easily

Costs:

  • ⚠️ Adds 1 byte per GCObject (gray_category field)
  • ⚠️ May reduce cache locality if categories mixed in list
  • ⚠️ Weak table convergence may need special handling

Recommendation: Maybe - Benefits unclear, adds field to every object

Effort: 25-30 hours Risk: MEDIUM (affects hot path)


3.5 Opportunity 5: Simplify Ephemeron Convergence

Current State: Convergence loop with direction alternation

do {
  changed = 0;
  for each ephemeron table {
    if (traverseephemeron(g, h, dir)) {
      propagateall(g);
      changed = 1;
    }
  }
  dir = !dir;  // Alternate direction
} while (changed);

Proposal: Use fixed-point iteration without direction alternation

while (true) {
  int marked_count = 0;
  for each ephemeron table {
    marked_count += traverseephemeron(g, h);
  }
  if (marked_count == 0) break;  // Fixed point reached
  propagateall(g);
}

Benefits:

  • ✅ Simpler logic (no direction tracking)
  • ✅ Easier to understand convergence condition
  • ✅ More predictable performance

Costs:

  • ⚠️ May take more iterations to converge (direction was optimization)
  • ⚠️ Performance impact depends on ephemeron graph structure

Recommendation: Consider - Simplifies code, but needs benchmarking

Effort: 8-10 hours Risk: MEDIUM (performance-sensitive)


3.6 Opportunity 6: State Machine Cleanup

Current State: Implicit state machine in switch statement

Proposal: Use explicit state pattern with std::variant

using GCPhaseVariant = std::variant<
  PauseState,
  PropagateState,
  EnterAtomicState,
  AtomicState,
  SweepAllGCState,
  SweepFinObjState,
  SweepToBeFnzState,
  SweepEndState,
  CallFinState
>;

class GCStateMachine {
private:
  GCPhaseVariant current_phase;

public:
  l_mem step(lua_State* L, int fast) {
    return std::visit([&](auto& phase) {
      return phase.execute(L, fast);
    }, current_phase);
  }

  void transition(GCPhaseVariant next) {
    current_phase = next;
  }
};

Benefits:

  • ✅ Type-safe state transitions
  • ✅ Explicit state representation
  • ✅ Compile-time exhaustiveness checking
  • ✅ Better debugging (can inspect current_phase)

Costs:

  • ⚠️ std::variant overhead (minimal)
  • ⚠️ More complex type system
  • ⚠️ Requires C++17 std::variant

Recommendation: Consider - Improves type safety, but adds complexity

Effort: 20-25 hours Risk: LOW (mostly refactoring, same logic)


3.7 Opportunity 7: Modularize Finalization

Current State: Finalization logic spread across multiple functions

Proposal: Extract into Finalizer class

class FinalizerManager {
private:
  global_State* g;

  GCObject* finobj;      // Objects with finalizers
  GCObject* tobefnz;     // Ready for finalization

public:
  // Check if object needs finalizer
  void checkFinalizer(GCObject* obj, Table* mt);

  // Separate unreachable objects
  void separateUnreachable(bool all);

  // Execute one finalizer
  int executeNext(lua_State* L);

  // Execute all pending finalizers
  void executeAll(lua_State* L);

  // Handle resurrection after __gc
  void handleResurrections(lua_State* L);
};

Benefits:

  • ✅ Clear encapsulation of finalization logic
  • ✅ Easier to test finalization in isolation
  • ✅ Better separation from GC main loop
  • ✅ Can add finalization metrics/debugging

Costs:

  • ⚠️ Additional abstraction layer
  • ⚠️ Need to pass FinalizerManager around

Recommendation: Strongly recommend - Improves maintainability significantly

Effort: 20-25 hours Risk: LOW (mostly refactoring)


4. Modularization Strategy

4.1 Proposed Module Structure

src/memory/gc/
├── gc_core.h / .cpp          # Main GC interface and coordination
├── gc_phases/
│   ├── phase_base.h          # GCPhase interface
│   ├── phase_pause.cpp       # Pause phase
│   ├── phase_propagate.cpp   # Propagate phase
│   ├── phase_atomic.cpp      # Atomic phase
│   ├── phase_sweep.cpp       # Sweep phases (all 4)
│   └── phase_finalize.cpp    # CallFin phase
├── gc_marking.h / .cpp       # Mark algorithms (reallymarkobject, propagatemark, etc.)
├── gc_sweeping.h / .cpp      # Sweep algorithms (sweeplist, sweeptolive, etc.)
├── gc_finalizer.h / .cpp     # Finalization (separatetobefnz, GCTM, etc.)
├── gc_barriers.h / .cpp      # Write barriers (barrier_, barrierback_)
├── gc_weak.h / .cpp          # Weak table handling (clearbykeys, clearbyvalues, convergeephemerons)
├── gc_generational.h / .cpp  # Generational mode (optional, #ifdef)
└── gc_state.h                # GC state enums and types

Total: 12-15 files vs current 2 files

4.2 Encapsulation Boundaries

GCCore (gc_core.h):

  • Public interface: luaC_step(), luaC_fullgc(), luaC_newobj(), etc.
  • Coordinates phase execution
  • Manages state machine
  • Delegates to specialized modules

GCMarking (gc_marking.h):

  • Mark algorithms (DFS traversal)
  • Gray list management
  • Tri-color invariant maintenance
  • Internal use only (called by phases)

GCSweeping (gc_sweeping.h):

  • Sweep algorithms (pointer-to-pointer)
  • Object freeing (freeobj)
  • List traversal
  • Internal use only

GCFinalizer (gc_finalizer.h):

  • Finalizer detection
  • Object separation
  • __gc execution
  • Resurrection handling
  • Internal use only

GCBarriers (gc_barriers.h):

  • Forward barrier
  • Backward barrier
  • Generational barrier logic (if enabled)
  • Public (called from setters in lobject.h, ltable.cpp, etc.)

GCWeak (gc_weak.h):

  • Weak table traversal
  • Ephemeron convergence
  • Weak reference clearing
  • Internal use only

GCGenerational (gc_generational.h - optional):

  • Age management
  • Minor/major collection
  • Generational list management
  • Internal use only (ifdef LUA_USE_GENERATIONAL_GC)

4.3 Dependency Graph

gc_core
  ├──> gc_phases (all phases)
  │      ├──> gc_marking
  │      ├──> gc_sweeping
  │      ├──> gc_finalizer
  │      └──> gc_weak
  ├──> gc_barriers (public interface)
  └──> gc_generational (optional)

Key Principle: Core coordinates, phases execute, algorithms implement


5. Implementation Phases

Phase 1: Extract Marking Module (15-20 hours)

Goal: Move marking logic to separate module

Steps:

  1. Create gc_marking.h and gc_marking.cpp
  2. Move reallymarkobject(), propagatemark(), propagateall()
  3. Move markvalue(), markobject(), markmt(), markbeingfnz()
  4. Create GCMarking class with methods
  5. Update callers to use new interface
  6. Test: All tests must pass, performance ≤4.33s

Deliverable: gc_marking.h/.cpp with 6-8 public methods


Phase 2: Extract Sweeping Module (15-20 hours)

Goal: Move sweep logic to separate module

Steps:

  1. Create gc_sweeping.h and gc_sweeping.cpp
  2. Move sweeplist(), sweeptolive(), entersweep()
  3. Move freeobj() and related functions
  4. Create GCSweeping class
  5. Update callers
  6. Test: All tests must pass, performance ≤4.33s

Deliverable: gc_sweeping.h/.cpp with 4-5 public methods


Phase 3: Extract Finalizer Module (20-25 hours)

Goal: Encapsulate finalization logic

Steps:

  1. Create gc_finalizer.h and gc_finalizer.cpp
  2. Move GCObject::checkFinalizer() to GCFinalizer::check()
  3. Move separatetobefnz(), GCTM(), callallpendingfinalizers()
  4. Create FinalizerManager class
  5. Handle resurrection logic
  6. Update callers
  7. Test: Finalization tests (gc.lua, errors.lua)

Deliverable: gc_finalizer.h/.cpp with FinalizerManager class


Phase 4: Extract Weak Table Module (15-20 hours)

Goal: Isolate weak table complexity

Steps:

  1. Create gc_weak.h and gc_weak.cpp
  2. Move traverseweakvalue(), traverseephemeron(), convergeephemerons()
  3. Move clearbykeys(), clearbyvalues(), getmode()
  4. Create WeakTableManager class
  5. Implement simplified convergence (Opportunity 5)
  6. Update callers
  7. Test: Weak table tests

Deliverable: gc_weak.h/.cpp with WeakTableManager class


Phase 5: Extract Barrier Module (10-15 hours)

Goal: Centralize barrier logic

Steps:

  1. Create gc_barriers.h and gc_barriers.cpp
  2. Move luaC_barrier_(), luaC_barrierback_()
  3. Create BarrierManager class
  4. Keep macros in lgc.h (inline wrappers)
  5. Update barrier call sites if needed
  6. Test: Barrier stress tests

Deliverable: gc_barriers.h/.cpp with BarrierManager class


Phase 6: Extract Phase Classes (30-40 hours)

Goal: Implement phase extraction (Opportunity 3)

Steps:

  1. Create gc_phases/phase_base.h with GCPhase interface
  2. Create individual phase classes (8 classes)
  3. Implement execute() and next() for each
  4. Refactor singlestep() to use phase objects
  5. Test each phase individually
  6. Test full GC cycle
  7. Benchmark performance

Deliverable: 8 phase classes + refactored singlestep()


Phase 7: Optional Generational Mode (15-20 hours)

Goal: Make generational mode compile-time optional (Opportunity 1)

Steps:

  1. Create gc_generational.h/.cpp
  2. Move all generational code under #ifdef LUA_USE_GENERATIONAL_GC
  3. Provide fallback for incremental-only mode
  4. Add CMake option LUA_ENABLE_GENERATIONAL_GC=ON (default)
  5. Test both modes
  6. Benchmark incremental-only mode

Deliverable: Optional generational mode, ~400 lines conditionally compiled


Phase 8: Testing & Documentation (20-25 hours)

Goal: Comprehensive testing of new architecture

Steps:

  1. Create unit tests for each module
  2. Create integration tests for GC cycles
  3. Stress test with large programs
  4. Memory leak testing (Valgrind)
  5. Performance benchmarking (5-run average)
  6. Update documentation
  7. Create module dependency diagram

Deliverable: Test suite, benchmarks, updated docs


6. Impact Assessment

6.1 Code Changes by File

File Current Lines Changes New Lines Impact
lgc.cpp 1,950 Extract to modules ~500 HIGH - 75% reduction
lgc.h 479 Keep interface, move internals ~200 MEDIUM - 60% reduction
lobject.h ~900 Update barrier macros ~900 LOW - Minor changes
ltable.cpp ~800 Update barrier calls ~800 LOW - No changes
lvm.cpp ~2000 Update barrier calls ~2000 LOW - No changes
New files 0 Create modules ~2,000 N/A - New code

Total Impact:

  • Files modified: ~8-10 (lgc.cpp, lgc.h, barrier call sites)
  • Files created: ~12-15 (new modules)
  • Net lines changed: ~500 lines modified, ~2,000 lines moved (not new)
  • Codebase impact: MODERATE (mostly reorganization, minimal logic changes)

6.2 API Impact

Public API (lua.h, lualib.h, lauxlib.h):

  • ❌ NO CHANGES - 100% compatible

Internal API (lgc.h - used by VM):

  • ✅ Barrier macros: Same interface (inline wrappers)
  • luaC_step(): Same signature
  • luaC_fullgc(): Same signature
  • luaC_newobj(): Same signature
  • ✅ Object placement new operators: Unchanged

Impact: ZERO - Complete API compatibility

6.3 Performance Impact

Expected:

  • Phase extraction: +0-2% overhead (virtual function calls)
  • Module boundaries: +0-1% overhead (function calls vs inline)
  • Optional generational: -5% performance if disabled (acceptable tradeoff)
  • Simplified convergence: +0-3% overhead (depends on ephemeron usage)

Target: ≤4.33s (≤3% from 4.20s baseline)

Mitigation:

  • Inline critical functions at module boundaries
  • Profile after each phase, revert if regression >3%
  • Use LTO (Link-Time Optimization) to inline across modules

7. Performance Considerations

7.1 Hot Paths

Critical Paths (must not regress):

  1. VM execution (lvm.cpp) → Barrier calls (every object write)
  2. Table operations (ltable.cpp) → Barrier calls
  3. Object allocation (luaC_newobj) → Called on every allocation
  4. Incremental step (luaC_step) → Called every N allocations

Optimization Strategy:

  • Keep barriers as inline macros (no function call overhead)
  • Inline luaC_newobj() if possible
  • Profile luaC_step() carefully after phase extraction
  • Use __attribute__((always_inline)) for critical functions

7.2 Benchmark Suite

Primary: all.lua test suite (current: 4.20s avg)

Stress Tests:

  1. gc.lua - Basic GC correctness
  2. gengc.lua - Generational GC stress
  3. Large allocation - 10M+ objects
  4. Deep call stacks - 1000+ levels
  5. Many weak tables - 100+ ephemeron tables
  6. Circular references - Complex object graphs

Acceptance Criteria:

  • All tests pass ("final OK !!!")
  • Performance ≤4.33s (≤3% regression)
  • No memory leaks (Valgrind clean)
  • No crashes under stress

8. Recommendations

8.1 Immediate Actions (High Value, Low Risk)

✅ RECOMMEND: Phase 1-5 (Extract Modules)

  • Effort: 75-100 hours
  • Risk: LOW (mostly code movement)
  • Benefit: 40% code organization improvement
  • Priority: HIGH

Modules to extract first:

  1. GCMarking - Clear responsibility (marking algorithms)
  2. GCSweeping - Clear responsibility (sweep algorithms)
  3. FinalizerManager - High complexity, good isolation
  4. WeakTableManager - High complexity, clear boundary
  5. BarrierManager - Clear interface, widely used

Expected Outcome: lgc.cpp reduced from 1,950 to ~500 lines


8.2 Consider (Medium Value, Medium Risk)

🤔 CONSIDER: Phase Extraction (Opportunity 3)

  • Effort: 30-40 hours
  • Risk: MEDIUM (affects GC main loop)
  • Benefit: Better testability, clearer state machine
  • Priority: MEDIUM

Decision Criteria:

  • If testing individual phases is valuable → Implement
  • If performance overhead >2% → Don't implement
  • Prototype first, measure overhead

8.3 Research (Uncertain Value, Higher Risk)

🔬 RESEARCH: Optional Generational Mode (Opportunity 1)

  • Effort: 15-20 hours
  • Risk: MEDIUM (performance-sensitive)
  • Benefit: 20% code reduction when disabled, simpler mental model
  • Priority: LOW

Decision Criteria:

  • Benchmark incremental-only mode first
  • If regression <10% for long-running programs → Implement as option
  • If regression >10% → Don't implement

🔬 RESEARCH: Simplified Age Management (Opportunity 2)

  • Effort: 10-15 hours
  • Risk: MEDIUM
  • Benefit: Simpler state machine (7→4 states)
  • Priority: LOW

Decision Criteria:

  • Only if generational mode retained
  • Benchmark carefully with GC-heavy workloads
  • If regression >5% → Don't implement

🔬 RESEARCH: Ephemeron Simplification (Opportunity 5)

  • Effort: 8-10 hours
  • Risk: MEDIUM
  • Benefit: Simpler convergence logic
  • Priority: LOW

Decision Criteria:

  • Benchmark with ephemeron-heavy code
  • If iteration count increases >20% → Don't implement
  • If performance acceptable → Implement

8.4 Do NOT Implement

❌ DO NOT: Consolidate Gray Lists (Opportunity 4)

  • Reason: Adds 1 byte per object, unclear benefits
  • Alternative: Keep current 5-list approach

❌ DO NOT: State Machine with std::variant (Opportunity 6)

  • Reason: Adds complexity, minimal benefit
  • Alternative: Explicit state machine with phase classes (Opportunity 3) if needed

❌ DO NOT: GC Removal (from GC_REMOVAL_PLAN.md)

  • Reason: Fundamentally incompatible with Lua semantics
  • Reason: Circular references require collection
  • Reason: Std::shared_ptr doesn't solve problem (still need cycle detection = GC!)
  • Verdict: Keep GC, improve its implementation

9. Summary & Next Steps

9.1 Summary of Findings

Current State:

  • 1,950 lines of complex GC code in 2 files
  • 16 linked lists, 8 phases, 7 age states
  • High complexity in generational mode, ephemerons, finalization
  • Already has some modularization (global_State subsystems)

Simplification Potential:

  • 40% code organization improvement through module extraction (HIGH CONFIDENCE)
  • 20% code reduction through optional generational mode (MEDIUM CONFIDENCE)
  • 10% complexity reduction through simplified convergence/ages (LOW CONFIDENCE)

Recommended Strategy:

  1. Extract 5 core modules (marking, sweeping, finalizer, weak, barriers) - 75-100 hours
  2. 🤔 Consider phase extraction if testability important - 30-40 hours
  3. 🔬 Research optional generational after modules extracted - 15-20 hours

Total Effort: 120-160 hours for high-confidence improvements

9.2 Proposed Roadmap

Milestone 1: Module Extraction (8-10 weeks, 75-100 hours)

  • Week 1-2: Extract GCMarking module
  • Week 3-4: Extract GCSweeping module
  • Week 5-6: Extract FinalizerManager
  • Week 7-8: Extract WeakTableManager
  • Week 9-10: Extract BarrierManager, testing, benchmarking

Milestone 2: Phase Extraction (4-5 weeks, 30-40 hours) - OPTIONAL

  • Week 1-2: Create phase interface, implement 4 phase classes
  • Week 3-4: Implement remaining 4 phase classes
  • Week 5: Testing, benchmarking, decision point

Milestone 3: Optional Generational (2-3 weeks, 15-20 hours) - OPTIONAL

  • Week 1-2: Extract generational code, create compile option
  • Week 3: Testing, benchmarking, decision point

Total Timeline: 10-18 weeks depending on optional milestones

9.3 Success Metrics

Code Quality:

  • ✅ lgc.cpp reduced from 1,950 to ≤600 lines
  • ✅ Clear module boundaries with <10 public methods each
  • ✅ Each module testable in isolation
  • ✅ Module dependency graph is acyclic

Performance:

  • ✅ All tests pass ("final OK !!!")
  • ✅ Performance ≤4.33s (≤3% from 4.20s baseline)
  • ✅ No memory leaks (Valgrind clean)
  • ✅ No crashes under stress testing

Maintainability:

  • ✅ Each module has single responsibility
  • ✅ Clear encapsulation (private implementation)
  • ✅ Well-documented interfaces
  • ✅ Easier to onboard new contributors

10. Conclusion

The Lua garbage collector is a sophisticated, performance-critical system that is fundamentally necessary for Lua's semantics. Attempts to remove it entirely (as explored in GC_REMOVAL_PLAN.md) are not feasible due to circular references, weak tables, and resurrection semantics.

However, the implementation complexity can be significantly reduced through:

  1. ✅ Module extraction (HIGH VALUE, LOW RISK) - Extract 5 core modules to reduce lgc.cpp from 1,950 to ~500-600 lines
  2. 🤔 Phase extraction (MEDIUM VALUE, MEDIUM RISK) - Consider if better testability needed
  3. 🔬 Optional generational (UNCERTAIN VALUE, MEDIUM RISK) - Research for embedded systems

Recommended Next Steps:

  1. Start with Module Extraction (Phases 1-5) - 75-100 hours
  2. Benchmark after each module to ensure ≤3% regression
  3. Evaluate phase extraction after modules complete
  4. Defer generational research until modules stable

Key Principle: Incremental improvement with continuous validation - each change must pass all tests and meet performance targets before proceeding.

Expected Outcome: 40% improvement in code organization with zero performance regression and 100% API compatibility.


Document Version: 1.0 Date: 2025-11-17 Status: ANALYSIS COMPLETE - Ready for Phase 1 implementation approval

Related Documents:

  • CLAUDE.md - Project status and guidelines
  • GC_PITFALLS_ANALYSIS.md - Detailed GC architecture analysis
  • GC_REMOVAL_PLAN.md - Why GC removal is not feasible
  • GC_REMOVAL_OWNERSHIP_PLAN.md - Alternative ownership approaches (not recommended)