Garbage Collector Simplification & Modularization Analysis

Date: 2025-11-17 Author: AI Analysis Purpose: Analyze opportunities to simplify and modularize the GC with minimal codebase impact Status: ANALYSIS - Recommendations for incremental improvement

Executive Summary

The Lua garbage collector is a highly optimized tri-color incremental mark-and-sweep collector with generational optimization. After comprehensive analysis of the implementation and existing documentation, this report identifies 7 major opportunities for simplification and modularization that can:

✅ Reduce complexity by ~30-40%
✅ Improve maintainability through better separation of concerns
✅ Minimize codebase impact through incremental refactoring
✅ Preserve performance (target: ≤4.33s, ≤3% regression from 4.20s baseline)
✅ Maintain C API compatibility completely

Key Finding: The GC is fundamentally necessary for Lua's semantics (circular references, weak tables, resurrection), but its implementation complexity can be significantly reduced without changing its core algorithm.

Recommended Strategy: Incremental modularization focusing on phase extraction, state machine cleanup, and list consolidation - NOT wholesale GC removal.

Current GC Architecture
Complexity Analysis
Simplification Opportunities
Modularization Strategy
Implementation Phases
Impact Assessment
Performance Considerations
Recommendations

1. Current GC Architecture

1.1 Core Algorithm

Type: Tri-color incremental mark-and-sweep with generational optimization

Color Encoding (lgc.h:107-148):

WHITE: Unmarked (candidate for collection) - 2 shades for new vs old
GRAY: Marked but children unprocessed (work queue)
BLACK: Fully processed (safe until next cycle)

Critical Invariant: Black objects never point to white objects (enforced by write barriers)

1.2 GC Phases (lstate.h:154-164)

Pause → Propagate → EnterAtomic → Atomic → SweepAllGC → SweepFinObj →
        SweepToBeFnz → SweepEnd → CallFin → Pause

8 distinct phases with complex state transitions

1.3 Object Lists (16 concurrent lists!)

Incremental Mode Lists (10):

allgc - All collectable objects
sweepgc - Current sweep position (pointer-to-pointer)
finobj - Objects with finalizers
gray - Gray work queue
grayagain - Objects to revisit in atomic
weak - Weak-value tables
ephemeron - Weak-key tables
allweak - All-weak tables
tobefnz - Ready for finalization
fixedgc - Never collected (interned strings)

Generational Mode Pointers (+6): 11. survival - Survived one cycle 12. old1 - Survived two cycles 13. reallyold - Survived 3+ cycles 14-16. Finobj generations (finobjsur, finobjold1, finobjrold)

1.4 Age State Machine (7 ages!)

enum class GCAge : lu_byte {
  New       = 0,  // Created this cycle
  Survival  = 1,  // Survived 1 cycle
  Old0      = 2,  // Barrier-aged (not truly old)
  Old1      = 3,  // Survived 2 cycles
  Old       = 4,  // Truly old (skip in minor GC)
  Touched1  = 5,  // Old object modified this cycle
  Touched2  = 6   // Old object modified last cycle
};

Age Transitions: Complex state machine with 7 states and conditional transitions

1.5 Write Barriers (2 types)

Forward Barrier (lgc.cpp:309-324):

When: Black→White write
Action: Mark white object gray (restore invariant)
Generational: Promote to Old0 if writer is old

Backward Barrier (lgc.cpp:331-342):

When: Old object modified
Action: Mark as "touched", add to grayagain list
Purpose: Ensure old objects revisited if modified

1.6 Code Metrics

Files:

lgc.h - 479 lines
lgc.cpp - 1,950 lines
Total: ~2,400 lines of GC code

Complexity Hotspots:

atomic() - 40 lines of critical sequential operations (lgc.cpp:1649-1687)
convergeephemerons() - Iterative convergence loop (lgc.cpp:809-828)
sweeplist() - Pointer-to-pointer manipulation (lgc.cpp:990-1007)
youngcollection() - Generational GC with 6 sweep phases (lgc.cpp:1439-1484)
GCTM() - Finalizer execution with error handling (lgc.cpp:1095-1130)

2. Complexity Analysis

2.1 Complexity Drivers

Component	Complexity Score	Primary Cause	Lines of Code
Generational GC	⭐⭐⭐⭐⭐	7 age states, 6 generational lists, age transitions	~400 lines
Ephemeron Tables	⭐⭐⭐⭐⭐	Convergence loop, transitive marking	~150 lines
Finalization	⭐⭐⭐⭐	Resurrection, re-finalization, error handling	~250 lines
Write Barriers	⭐⭐⭐⭐	Two types, generational interactions	~150 lines
State Machine	⭐⭐⭐	8 phases, complex transitions	~200 lines
List Management	⭐⭐⭐	16 lists, pointer-to-pointer sweep	~300 lines
Weak Tables	⭐⭐⭐	3 modes, clearing logic	~150 lines

Total Complexity: ~1,600 lines of high-complexity code out of 1,950 total

2.2 Existing Modularization

Already Completed (Phase 91 from CLAUDE.md):

global_State has been refactored into subsystems:

GCAccounting - totalbytes, GCdebt, GCmarked, etc.
GCParameters - GCpause, GCstepmul, GCstepsize, etc.
GCObjectLists - All 16 GC lists

Benefits Achieved:

✅ Better organization of GC state
✅ Clear ownership of fields
✅ Type safety improvements

Remaining Gaps:

❌ No phase extraction (all phases in single file)
❌ No algorithmic separation (mark/sweep/finalize)
❌ State machine still implicit (switch statements)
❌ No encapsulation of generational vs incremental modes

2.3 Identified Problems

P1: Generational Mode Complexity (400 lines, 5/5 complexity)

7 age states (only need 3-4)
6 additional lists beyond incremental mode
Complex age transition logic (nextage[] array)
Barrier promotion logic (Old0 → Old1 → Old)
Minor vs major collection decision logic

P2: Ephemeron Convergence (150 lines, 5/5 complexity)

Iterative convergence loop (can run multiple times)
Direction alternation (forward/backward traversal)
Re-marking logic
Performance unpredictable (depends on graph structure)

P3: Phase Coupling (200 lines, 3/5 complexity)

All phases in single singlestep() function
No clear phase abstraction
State machine implicit in switch statement
Difficult to test individual phases

P4: List Proliferation (300 lines, 3/5 complexity)

16 lists with overlapping purposes
Complex list movement logic (e.g., allgc → finobj → tobefnz)
Pointer-to-pointer sweep complicates abstraction

P5: Finalization Complexity (250 lines, 4/5 complexity)

Resurrection support (requires extra remark phase)
Re-finalization support (FINALIZEDBIT manipulation)
Error handling during __gc execution
Emergency GC prevention during finalization

3. Simplification Opportunities

3.1 Opportunity 1: Optional Generational Mode

Current State: Generational mode is always available, adding 400 lines of complexity

Proposal: Make generational mode compile-time optional

#if defined(LUA_USE_GENERATIONAL_GC)
  // Generational code (400 lines)
#else
  // Incremental-only mode (simpler)
#endif

Benefits:

✅ Reduces code size by 20% when disabled
✅ Eliminates 7-state age machine (only need 2: new, old)
✅ Removes 6 generational lists
✅ Simplifies barrier logic (only forward barrier needed)
✅ Easier to understand and maintain

Costs:

⚠️ Performance regression for long-running programs (generational is faster)
⚠️ Some users may rely on generational mode
⚠️ API still needs to support lua_gc(L, LUA_GCSETMODE, LUA_GCGEN)

Recommendation: Implement this - Keep generational as default, but allow disabling for embedded systems

Effort: 15-20 hours Risk: LOW (can be feature-flagged)

3.2 Opportunity 2: Simplify Age Management

Current State: 7 age states with complex transitions

Proposal: Reduce to 4 core ages in generational mode

enum class GCAge : lu_byte {
  Young     = 0,  // New + Survival (merged)
  Old       = 1,  // Old0 + Old1 + Old (merged)
  Touched1  = 2,  // Modified once
  Touched2  = 3   // Modified twice
};

Transition Simplification:

Young ──(survive GC)──→ Old
Old ──(modified)──→ Touched1 ──(GC)──→ Touched2 ──(GC)──→ Old

Benefits:

✅ Simpler state machine (4 states vs 7)
✅ Clearer semantics (young/old distinction only)
✅ Less conditional logic in age advancement

Costs:

⚠️ May promote objects to old sooner (more conservative)
⚠️ Slightly less generational optimization

Recommendation: Consider implementing if generational mode retained

Effort: 10-15 hours Risk: MEDIUM (requires careful performance testing)

3.3 Opportunity 3: Extract Phase Classes

Current State: All phases in single singlestep() function with switch statement

Proposal: Extract phases into separate classes with interface

class GCPhase {
public:
  virtual ~GCPhase() = default;
  virtual l_mem execute(lua_State* L, int fast) = 0;
  virtual GCState getState() const = 0;
  virtual GCPhase* next() = 0;  // Next phase in sequence
};

class PropagatePhase : public GCPhase {
  l_mem execute(lua_State* L, int fast) override {
    if (fast || G(L)->getGray() == NULL) {
      return 1;  // Done, move to next phase
    }
    return propagatemark(G(L));
  }

  GCState getState() const override { return GCState::Propagate; }
  GCPhase* next() override { return &g_enterAtomicPhase; }
};

class AtomicPhase : public GCPhase { /* ... */ };
class SweepPhase : public GCPhase { /* ... */ };
// ... etc

Benefits:

✅ Clear separation of concerns (one class per phase)
✅ Easier to test individual phases
✅ Better encapsulation of phase-specific logic
✅ Explicit state machine (next() method)
✅ Can use polymorphism for phase-specific behavior

Costs:

⚠️ Virtual function overhead (minimal - only 1 call per GC step)
⚠️ More files to manage (8 phase classes)

Recommendation: Strongly recommend - Significantly improves maintainability

Effort: 30-40 hours Risk: LOW (can be done incrementally, phase by phase)

3.4 Opportunity 4: Consolidate Gray Lists

Current State: 5 separate gray lists (gray, grayagain, weak, ephemeron, allweak)

Proposal: Use single gray list with priority/category field

enum class GrayCategory : lu_byte {
  Normal    = 0,  // Regular gray objects
  Again     = 1,  // To revisit in atomic
  WeakVal   = 2,  // Weak-value tables
  WeakKey   = 3,  // Ephemeron tables
  WeakBoth  = 4   // All-weak tables
};

// Add to GCObject:
GrayCategory gray_category;

// Single list traversal:
for (GCObject* obj : gray_list) {
  switch (obj->gray_category) {
    case GrayCategory::Normal:
      propagatemark(g, obj);
      break;
    case GrayCategory::Again:
      /* ... */
      break;
    // ...
  }
}

Benefits:

✅ Reduces from 5 lists to 1
✅ Simpler list management
✅ Easier to understand object state
✅ Can prioritize processing order easily

Costs:

⚠️ Adds 1 byte per GCObject (gray_category field)
⚠️ May reduce cache locality if categories mixed in list
⚠️ Weak table convergence may need special handling

Recommendation: Maybe - Benefits unclear, adds field to every object

Effort: 25-30 hours Risk: MEDIUM (affects hot path)

3.5 Opportunity 5: Simplify Ephemeron Convergence

Current State: Convergence loop with direction alternation

do {
  changed = 0;
  for each ephemeron table {
    if (traverseephemeron(g, h, dir)) {
      propagateall(g);
      changed = 1;
    }
  }
  dir = !dir;  // Alternate direction
} while (changed);

Proposal: Use fixed-point iteration without direction alternation

while (true) {
  int marked_count = 0;
  for each ephemeron table {
    marked_count += traverseephemeron(g, h);
  }
  if (marked_count == 0) break;  // Fixed point reached
  propagateall(g);
}

Benefits:

✅ Simpler logic (no direction tracking)
✅ Easier to understand convergence condition
✅ More predictable performance

Costs:

⚠️ May take more iterations to converge (direction was optimization)
⚠️ Performance impact depends on ephemeron graph structure

Recommendation: Consider - Simplifies code, but needs benchmarking

Effort: 8-10 hours Risk: MEDIUM (performance-sensitive)

3.6 Opportunity 6: State Machine Cleanup

Current State: Implicit state machine in switch statement

Proposal: Use explicit state pattern with std::variant

using GCPhaseVariant = std::variant<
  PauseState,
  PropagateState,
  EnterAtomicState,
  AtomicState,
  SweepAllGCState,
  SweepFinObjState,
  SweepToBeFnzState,
  SweepEndState,
  CallFinState
>;

class GCStateMachine {
private:
  GCPhaseVariant current_phase;

public:
  l_mem step(lua_State* L, int fast) {
    return std::visit([&](auto& phase) {
      return phase.execute(L, fast);
    }, current_phase);
  }

  void transition(GCPhaseVariant next) {
    current_phase = next;
  }
};

Benefits:

✅ Type-safe state transitions
✅ Explicit state representation
✅ Compile-time exhaustiveness checking
✅ Better debugging (can inspect current_phase)

Costs:

⚠️ std::variant overhead (minimal)
⚠️ More complex type system
⚠️ Requires C++17 std::variant

Recommendation: Consider - Improves type safety, but adds complexity

Effort: 20-25 hours Risk: LOW (mostly refactoring, same logic)

3.7 Opportunity 7: Modularize Finalization

Current State: Finalization logic spread across multiple functions

Proposal: Extract into Finalizer class

class FinalizerManager {
private:
  global_State* g;

  GCObject* finobj;      // Objects with finalizers
  GCObject* tobefnz;     // Ready for finalization

public:
  // Check if object needs finalizer
  void checkFinalizer(GCObject* obj, Table* mt);

  // Separate unreachable objects
  void separateUnreachable(bool all);

  // Execute one finalizer
  int executeNext(lua_State* L);

  // Execute all pending finalizers
  void executeAll(lua_State* L);

  // Handle resurrection after __gc
  void handleResurrections(lua_State* L);
};

Benefits:

✅ Clear encapsulation of finalization logic
✅ Easier to test finalization in isolation
✅ Better separation from GC main loop
✅ Can add finalization metrics/debugging

Costs:

⚠️ Additional abstraction layer
⚠️ Need to pass FinalizerManager around

Recommendation: Strongly recommend - Improves maintainability significantly

Effort: 20-25 hours Risk: LOW (mostly refactoring)

4. Modularization Strategy

4.1 Proposed Module Structure

src/memory/gc/
├── gc_core.h / .cpp          # Main GC interface and coordination
├── gc_phases/
│   ├── phase_base.h          # GCPhase interface
│   ├── phase_pause.cpp       # Pause phase
│   ├── phase_propagate.cpp   # Propagate phase
│   ├── phase_atomic.cpp      # Atomic phase
│   ├── phase_sweep.cpp       # Sweep phases (all 4)
│   └── phase_finalize.cpp    # CallFin phase
├── gc_marking.h / .cpp       # Mark algorithms (reallymarkobject, propagatemark, etc.)
├── gc_sweeping.h / .cpp      # Sweep algorithms (sweeplist, sweeptolive, etc.)
├── gc_finalizer.h / .cpp     # Finalization (separatetobefnz, GCTM, etc.)
├── gc_barriers.h / .cpp      # Write barriers (barrier_, barrierback_)
├── gc_weak.h / .cpp          # Weak table handling (clearbykeys, clearbyvalues, convergeephemerons)
├── gc_generational.h / .cpp  # Generational mode (optional, #ifdef)
└── gc_state.h                # GC state enums and types

Total: 12-15 files vs current 2 files

4.2 Encapsulation Boundaries

GCCore (gc_core.h):

Public interface: luaC_step(), luaC_fullgc(), luaC_newobj(), etc.
Coordinates phase execution
Manages state machine
Delegates to specialized modules

GCMarking (gc_marking.h):

Mark algorithms (DFS traversal)
Gray list management
Tri-color invariant maintenance
Internal use only (called by phases)

GCSweeping (gc_sweeping.h):

Sweep algorithms (pointer-to-pointer)
Object freeing (freeobj)
List traversal
Internal use only

GCFinalizer (gc_finalizer.h):

Finalizer detection
Object separation
__gc execution
Resurrection handling
Internal use only

GCBarriers (gc_barriers.h):

Forward barrier
Backward barrier
Generational barrier logic (if enabled)
Public (called from setters in lobject.h, ltable.cpp, etc.)

GCWeak (gc_weak.h):

Weak table traversal
Ephemeron convergence
Weak reference clearing
Internal use only

GCGenerational (gc_generational.h - optional):

Age management
Minor/major collection
Generational list management
Internal use only (ifdef LUA_USE_GENERATIONAL_GC)

4.3 Dependency Graph

gc_core
  ├──> gc_phases (all phases)
  │      ├──> gc_marking
  │      ├──> gc_sweeping
  │      ├──> gc_finalizer
  │      └──> gc_weak
  ├──> gc_barriers (public interface)
  └──> gc_generational (optional)

Key Principle: Core coordinates, phases execute, algorithms implement

5. Implementation Phases

Phase 1: Extract Marking Module (15-20 hours)

Goal: Move marking logic to separate module

Steps:

Create gc_marking.h and gc_marking.cpp
Move reallymarkobject(), propagatemark(), propagateall()
Move markvalue(), markobject(), markmt(), markbeingfnz()
Create GCMarking class with methods
Update callers to use new interface
Test: All tests must pass, performance ≤4.33s

Deliverable: gc_marking.h/.cpp with 6-8 public methods

Phase 2: Extract Sweeping Module (15-20 hours)

Goal: Move sweep logic to separate module

Steps:

Create gc_sweeping.h and gc_sweeping.cpp
Move sweeplist(), sweeptolive(), entersweep()
Move freeobj() and related functions
Create GCSweeping class
Update callers
Test: All tests must pass, performance ≤4.33s

Deliverable: gc_sweeping.h/.cpp with 4-5 public methods

Phase 3: Extract Finalizer Module (20-25 hours)

Goal: Encapsulate finalization logic

Steps:

Create gc_finalizer.h and gc_finalizer.cpp
Move GCObject::checkFinalizer() to GCFinalizer::check()
Move separatetobefnz(), GCTM(), callallpendingfinalizers()
Create FinalizerManager class
Handle resurrection logic
Update callers
Test: Finalization tests (gc.lua, errors.lua)

Deliverable: gc_finalizer.h/.cpp with FinalizerManager class

Phase 4: Extract Weak Table Module (15-20 hours)

Goal: Isolate weak table complexity

Steps:

Create gc_weak.h and gc_weak.cpp
Move traverseweakvalue(), traverseephemeron(), convergeephemerons()
Move clearbykeys(), clearbyvalues(), getmode()
Create WeakTableManager class
Implement simplified convergence (Opportunity 5)
Update callers
Test: Weak table tests

Deliverable: gc_weak.h/.cpp with WeakTableManager class

Phase 5: Extract Barrier Module (10-15 hours)

Goal: Centralize barrier logic

Steps:

Create gc_barriers.h and gc_barriers.cpp
Move luaC_barrier_(), luaC_barrierback_()
Create BarrierManager class
Keep macros in lgc.h (inline wrappers)
Update barrier call sites if needed
Test: Barrier stress tests

Deliverable: gc_barriers.h/.cpp with BarrierManager class

Phase 6: Extract Phase Classes (30-40 hours)

Goal: Implement phase extraction (Opportunity 3)

Steps:

Create gc_phases/phase_base.h with GCPhase interface
Create individual phase classes (8 classes)
Implement execute() and next() for each
Refactor singlestep() to use phase objects
Test each phase individually
Test full GC cycle
Benchmark performance

Deliverable: 8 phase classes + refactored singlestep()

Phase 7: Optional Generational Mode (15-20 hours)

Goal: Make generational mode compile-time optional (Opportunity 1)

Steps:

Create gc_generational.h/.cpp
Move all generational code under #ifdef LUA_USE_GENERATIONAL_GC
Provide fallback for incremental-only mode
Add CMake option LUA_ENABLE_GENERATIONAL_GC=ON (default)
Test both modes
Benchmark incremental-only mode

Deliverable: Optional generational mode, ~400 lines conditionally compiled

Phase 8: Testing & Documentation (20-25 hours)

Goal: Comprehensive testing of new architecture

Steps:

Create unit tests for each module
Create integration tests for GC cycles
Stress test with large programs
Memory leak testing (Valgrind)
Performance benchmarking (5-run average)
Update documentation
Create module dependency diagram

Deliverable: Test suite, benchmarks, updated docs

6. Impact Assessment

6.1 Code Changes by File

File	Current Lines	Changes	New Lines	Impact
`lgc.cpp`	1,950	Extract to modules	~500	HIGH - 75% reduction
`lgc.h`	479	Keep interface, move internals	~200	MEDIUM - 60% reduction
`lobject.h`	~900	Update barrier macros	~900	LOW - Minor changes
`ltable.cpp`	~800	Update barrier calls	~800	LOW - No changes
`lvm.cpp`	~2000	Update barrier calls	~2000	LOW - No changes
New files	0	Create modules	~2,000	N/A - New code

Total Impact:

Files modified: ~8-10 (lgc.cpp, lgc.h, barrier call sites)
Files created: ~12-15 (new modules)
Net lines changed: ~500 lines modified, ~2,000 lines moved (not new)
Codebase impact: MODERATE (mostly reorganization, minimal logic changes)

6.2 API Impact

Public API (lua.h, lualib.h, lauxlib.h):

❌ NO CHANGES - 100% compatible

Internal API (lgc.h - used by VM):

✅ Barrier macros: Same interface (inline wrappers)
✅ luaC_step(): Same signature
✅ luaC_fullgc(): Same signature
✅ luaC_newobj(): Same signature
✅ Object placement new operators: Unchanged

Impact: ZERO - Complete API compatibility

6.3 Performance Impact

Expected:

Phase extraction: +0-2% overhead (virtual function calls)
Module boundaries: +0-1% overhead (function calls vs inline)
Optional generational: -5% performance if disabled (acceptable tradeoff)
Simplified convergence: +0-3% overhead (depends on ephemeron usage)

Target: ≤4.33s (≤3% from 4.20s baseline)

Mitigation:

Inline critical functions at module boundaries
Profile after each phase, revert if regression >3%
Use LTO (Link-Time Optimization) to inline across modules

7. Performance Considerations

7.1 Hot Paths

Critical Paths (must not regress):

VM execution (lvm.cpp) → Barrier calls (every object write)
Table operations (ltable.cpp) → Barrier calls
Object allocation (luaC_newobj) → Called on every allocation
Incremental step (luaC_step) → Called every N allocations

Optimization Strategy:

Keep barriers as inline macros (no function call overhead)
Inline luaC_newobj() if possible
Profile luaC_step() carefully after phase extraction
Use __attribute__((always_inline)) for critical functions

7.2 Benchmark Suite

Primary: all.lua test suite (current: 4.20s avg)

Stress Tests:

gc.lua - Basic GC correctness
gengc.lua - Generational GC stress
Large allocation - 10M+ objects
Deep call stacks - 1000+ levels
Many weak tables - 100+ ephemeron tables
Circular references - Complex object graphs

Acceptance Criteria:

All tests pass ("final OK !!!")
Performance ≤4.33s (≤3% regression)
No memory leaks (Valgrind clean)
No crashes under stress

8. Recommendations

8.1 Immediate Actions (High Value, Low Risk)

✅ RECOMMEND: Phase 1-5 (Extract Modules)

Effort: 75-100 hours
Risk: LOW (mostly code movement)
Benefit: 40% code organization improvement
Priority: HIGH

Modules to extract first:

✅ GCMarking - Clear responsibility (marking algorithms)
✅ GCSweeping - Clear responsibility (sweep algorithms)
✅ FinalizerManager - High complexity, good isolation
✅ WeakTableManager - High complexity, clear boundary
✅ BarrierManager - Clear interface, widely used

Expected Outcome: lgc.cpp reduced from 1,950 to ~500 lines

8.2 Consider (Medium Value, Medium Risk)

🤔 CONSIDER: Phase Extraction (Opportunity 3)

Effort: 30-40 hours
Risk: MEDIUM (affects GC main loop)
Benefit: Better testability, clearer state machine
Priority: MEDIUM

Decision Criteria:

If testing individual phases is valuable → Implement
If performance overhead >2% → Don't implement
Prototype first, measure overhead

8.3 Research (Uncertain Value, Higher Risk)

🔬 RESEARCH: Optional Generational Mode (Opportunity 1)

Effort: 15-20 hours
Risk: MEDIUM (performance-sensitive)
Benefit: 20% code reduction when disabled, simpler mental model
Priority: LOW

Decision Criteria:

Benchmark incremental-only mode first
If regression <10% for long-running programs → Implement as option
If regression >10% → Don't implement

🔬 RESEARCH: Simplified Age Management (Opportunity 2)

Effort: 10-15 hours
Risk: MEDIUM
Benefit: Simpler state machine (7→4 states)
Priority: LOW

Decision Criteria:

Only if generational mode retained
Benchmark carefully with GC-heavy workloads
If regression >5% → Don't implement

🔬 RESEARCH: Ephemeron Simplification (Opportunity 5)

Effort: 8-10 hours
Risk: MEDIUM
Benefit: Simpler convergence logic
Priority: LOW

Decision Criteria:

Benchmark with ephemeron-heavy code
If iteration count increases >20% → Don't implement
If performance acceptable → Implement

8.4 Do NOT Implement

❌ DO NOT: Consolidate Gray Lists (Opportunity 4)

Reason: Adds 1 byte per object, unclear benefits
Alternative: Keep current 5-list approach

❌ DO NOT: State Machine with std::variant (Opportunity 6)

Reason: Adds complexity, minimal benefit
Alternative: Explicit state machine with phase classes (Opportunity 3) if needed

❌ DO NOT: GC Removal (from GC_REMOVAL_PLAN.md)

Reason: Fundamentally incompatible with Lua semantics
Reason: Circular references require collection
Reason: Std::shared_ptr doesn't solve problem (still need cycle detection = GC!)
Verdict: Keep GC, improve its implementation

9. Summary & Next Steps

9.1 Summary of Findings

Current State:

1,950 lines of complex GC code in 2 files
16 linked lists, 8 phases, 7 age states
High complexity in generational mode, ephemerons, finalization
Already has some modularization (global_State subsystems)

Simplification Potential:

40% code organization improvement through module extraction (HIGH CONFIDENCE)
20% code reduction through optional generational mode (MEDIUM CONFIDENCE)
10% complexity reduction through simplified convergence/ages (LOW CONFIDENCE)

Recommended Strategy:

✅ Extract 5 core modules (marking, sweeping, finalizer, weak, barriers) - 75-100 hours
🤔 Consider phase extraction if testability important - 30-40 hours
🔬 Research optional generational after modules extracted - 15-20 hours

Total Effort: 120-160 hours for high-confidence improvements

9.2 Proposed Roadmap

Milestone 1: Module Extraction (8-10 weeks, 75-100 hours)

Week 1-2: Extract GCMarking module
Week 3-4: Extract GCSweeping module
Week 5-6: Extract FinalizerManager
Week 7-8: Extract WeakTableManager
Week 9-10: Extract BarrierManager, testing, benchmarking

Milestone 2: Phase Extraction (4-5 weeks, 30-40 hours) - OPTIONAL

Week 1-2: Create phase interface, implement 4 phase classes
Week 3-4: Implement remaining 4 phase classes
Week 5: Testing, benchmarking, decision point

Milestone 3: Optional Generational (2-3 weeks, 15-20 hours) - OPTIONAL

Week 1-2: Extract generational code, create compile option
Week 3: Testing, benchmarking, decision point

Total Timeline: 10-18 weeks depending on optional milestones

9.3 Success Metrics

Code Quality:

✅ lgc.cpp reduced from 1,950 to ≤600 lines
✅ Clear module boundaries with <10 public methods each
✅ Each module testable in isolation
✅ Module dependency graph is acyclic

Performance:

✅ All tests pass ("final OK !!!")
✅ Performance ≤4.33s (≤3% from 4.20s baseline)
✅ No memory leaks (Valgrind clean)
✅ No crashes under stress testing

Maintainability:

✅ Each module has single responsibility
✅ Clear encapsulation (private implementation)
✅ Well-documented interfaces
✅ Easier to onboard new contributors

10. Conclusion

The Lua garbage collector is a sophisticated, performance-critical system that is fundamentally necessary for Lua's semantics. Attempts to remove it entirely (as explored in GC_REMOVAL_PLAN.md) are not feasible due to circular references, weak tables, and resurrection semantics.

However, the implementation complexity can be significantly reduced through:

✅ Module extraction (HIGH VALUE, LOW RISK) - Extract 5 core modules to reduce lgc.cpp from 1,950 to ~500-600 lines
🤔 Phase extraction (MEDIUM VALUE, MEDIUM RISK) - Consider if better testability needed
🔬 Optional generational (UNCERTAIN VALUE, MEDIUM RISK) - Research for embedded systems

Recommended Next Steps:

Start with Module Extraction (Phases 1-5) - 75-100 hours
Benchmark after each module to ensure ≤3% regression
Evaluate phase extraction after modules complete
Defer generational research until modules stable

Key Principle: Incremental improvement with continuous validation - each change must pass all tests and meet performance targets before proceeding.

Expected Outcome: 40% improvement in code organization with zero performance regression and 100% API compatibility.

Document Version: 1.0 Date: 2025-11-17 Status: ANALYSIS COMPLETE - Ready for Phase 1 implementation approval

Related Documents:

CLAUDE.md - Project status and guidelines
GC_PITFALLS_ANALYSIS.md - Detailed GC architecture analysis
GC_REMOVAL_PLAN.md - Why GC removal is not feasible
GC_REMOVAL_OWNERSHIP_PLAN.md - Alternative ownership approaches (not recommended)

FilesExpand file tree

GC_SIMPLIFICATION_ANALYSIS.md

Latest commit

History

GC_SIMPLIFICATION_ANALYSIS.md

File metadata and controls

Garbage Collector Simplification & Modularization Analysis

Executive Summary

Table of Contents

1. Current GC Architecture

1.1 Core Algorithm

1.2 GC Phases (lstate.h:154-164)

1.3 Object Lists (16 concurrent lists!)

1.4 Age State Machine (7 ages!)

1.5 Write Barriers (2 types)

1.6 Code Metrics

2. Complexity Analysis

2.1 Complexity Drivers

2.2 Existing Modularization

2.3 Identified Problems

3. Simplification Opportunities

3.1 Opportunity 1: Optional Generational Mode

3.2 Opportunity 2: Simplify Age Management

3.3 Opportunity 3: Extract Phase Classes

3.4 Opportunity 4: Consolidate Gray Lists

3.5 Opportunity 5: Simplify Ephemeron Convergence

3.6 Opportunity 6: State Machine Cleanup

3.7 Opportunity 7: Modularize Finalization

4. Modularization Strategy

4.1 Proposed Module Structure

4.2 Encapsulation Boundaries

4.3 Dependency Graph

5. Implementation Phases

Phase 1: Extract Marking Module (15-20 hours)

Phase 2: Extract Sweeping Module (15-20 hours)

Phase 3: Extract Finalizer Module (20-25 hours)

Phase 4: Extract Weak Table Module (15-20 hours)

Phase 5: Extract Barrier Module (10-15 hours)

Phase 6: Extract Phase Classes (30-40 hours)

Phase 7: Optional Generational Mode (15-20 hours)

Phase 8: Testing & Documentation (20-25 hours)

6. Impact Assessment

6.1 Code Changes by File

6.2 API Impact

6.3 Performance Impact

7. Performance Considerations

7.1 Hot Paths

7.2 Benchmark Suite

8. Recommendations

8.1 Immediate Actions (High Value, Low Risk)

8.2 Consider (Medium Value, Medium Risk)

8.3 Research (Uncertain Value, Higher Risk)

8.4 Do NOT Implement

9. Summary & Next Steps

9.1 Summary of Findings

9.2 Proposed Roadmap

9.3 Success Metrics

10. Conclusion