Skip to content

Latest commit

 

History

History
1008 lines (796 loc) · 30.6 KB

File metadata and controls

1008 lines (796 loc) · 30.6 KB

Single Responsibility Principle Analysis - Lua C++ Classes

Date: 2025-11-15 Status: Analysis Complete - Implementation Planning Phase Impact: Architectural improvement opportunities identified


Executive Summary

This document analyzes all 19 classes in the Lua C++ codebase for Single Responsibility Principle (SRP) violations. The analysis identifies 3 major violations (global_State, lua_State, FuncState) where decomposition could improve maintainability, and several classes that are well-designed.

Key Finding: While some classes violate SRP from a pure OOP perspective, Lua's design philosophy prioritizes performance and simplicity over strict architectural purity. Any refactoring must maintain zero-cost abstraction and C API compatibility.


🔴 MAJOR SRP VIOLATIONS

1. global_State (46+ fields) - HIGHEST PRIORITY

Location: lstate.h:644-872 Complexity: God Object with 8 distinct responsibilities Refactoring Impact: High value, medium risk

Current Responsibilities

Responsibility Fields Description
Memory Allocation 2 frealloc, ud
GC Accounting 4 GCtotalbytes, GCdebt, GCmarked, GCmajorminor
GC Parameters 7 gcparams[LUA_GCPN], currentwhite, gcstate, gckind, gcstopem, gcstp, gcemergency
GC Object Lists (Incremental) 10 allgc, sweepgc, finobj, gray, grayagain, weak, ephemeron, allweak, tobefnz, fixedgc
GC Object Lists (Generational) 7 survival, old1, reallyold, firstold1, finobjsur, finobjold1, finobjrold
String Management 2+ strt, strcache[STRCACHE_N][STRCACHE_M]
Type System 6 l_registry, nilvalue, seed, mt[LUA_NUMTYPES], tmname[TM_N]
Runtime Services 6 twups, panic, memerrmsg, warnf, ud_warn, mainth

Proposed Decomposition

// 1. Memory Management
class MemoryAllocator {
private:
    lua_Alloc frealloc;
    void *ud;

public:
    void* allocate(size_t size);
    void* reallocate(void* ptr, size_t oldsize, size_t newsize);
    void deallocate(void* ptr, size_t size);
};

// 2. GC Accounting
class GCAccounting {
private:
    l_mem totalbytes;  // Total allocated + debt
    l_mem debt;        // Debt to trigger GC
    l_mem marked;      // Objects marked in current cycle
    l_mem majorminor;  // Major/minor shift counter

public:
    void addBytes(l_mem bytes);
    void removeBytes(l_mem bytes);
    bool shouldCollect() const;
    l_mem getTotalBytes() const { return totalbytes - debt; }
};

// 3. GC Configuration
class GCParameters {
private:
    lu_byte params[LUA_GCPN];  // GC tuning parameters
    lu_byte currentwhite;      // Current white color
    GCState state;             // Current GC phase
    GCKind kind;               // Incremental/Generational
    lu_byte stopem;            // Stop emergency collections
    lu_byte stp;               // Stop GC flag
    lu_byte emergency;         // Emergency collection flag

public:
    lu_byte getParam(int idx) const;
    void setParam(int idx, lu_byte value);
    bool isRunning() const { return stp == 0; }
    GCState getState() const { return state; }
    void setState(GCState s) { state = s; }
};

// 4. GC Object Lists Manager
class GCObjectLists {
private:
    // Incremental collector lists
    GCObject *allgc;      // All collectable objects
    GCObject **sweepgc;   // Current sweep position
    GCObject *finobj;     // Objects with finalizers
    GCObject *gray;       // Gray objects (mark phase)
    GCObject *grayagain;  // Objects to revisit
    GCObject *weak;       // Weak-value tables
    GCObject *ephemeron;  // Ephemeron tables
    GCObject *allweak;    // All-weak tables
    GCObject *tobefnz;    // To be finalized
    GCObject *fixedgc;    // Never collected (reserved words)

    // Generational collector lists
    GCObject *survival;   // Survived one cycle
    GCObject *old1;       // Old generation 1
    GCObject *reallyold;  // Old generation 2+
    GCObject *firstold1;  // First OLD1 object (optimization)
    GCObject *finobjsur;  // Survival objects with finalizers
    GCObject *finobjold1; // Old1 objects with finalizers
    GCObject *finobjrold; // Really old objects with finalizers

public:
    GCObject* getAllGC() const { return allgc; }
    GCObject** getAllGCPtr() { return &allgc; }
    void setAllGC(GCObject* gc) { allgc = gc; }

    // ... accessors for all lists

    void linkObject(GCObject* obj);
    void unlinkObject(GCObject* obj);
};

// 5. String Cache & Interning
class StringCache {
private:
    stringtable strt;                            // String interning table
    TString *cache[STRCACHE_N][STRCACHE_M];      // API string cache

public:
    TString* intern(const char* str, size_t len);
    TString* getCached(unsigned int hash) const;
    void setCached(unsigned int hash, TString* str);

    stringtable* getTable() { return &strt; }
};

// 6. Type System
class TypeSystem {
private:
    TValue registry;              // Lua registry
    TValue nilvalue;              // Canonical nil value
    unsigned int seed;            // Hash seed
    Table *metatables[LUA_NUMTYPES];  // Type metatables
    TString *tmnames[TM_N];       // Tag method names

public:
    TValue* getRegistry() { return &registry; }
    Table* getMetatable(int type) const { return metatables[type]; }
    void setMetatable(int type, Table* mt) { metatables[type] = mt; }
    TString* getTMName(TMS tm) const { return tmnames[tm]; }
    unsigned int getSeed() const { return seed; }
};

// 7. Runtime Services
class RuntimeServices {
private:
    lua_State *twups;         // Threads with open upvalues
    lua_CFunction panic;      // Panic handler
    TString *memerrmsg;       // Memory error message
    lua_WarnFunction warnf;   // Warning function
    void *ud_warn;            // Warning user data
    LX mainth;                // Main thread

public:
    lua_State* getMainThread() { return &mainth.l; }
    lua_CFunction getPanic() const { return panic; }
    void setPanic(lua_CFunction p) { panic = p; }
    TString* getMemErrMsg() const { return memerrmsg; }

    void warning(lua_State* L, const char* msg, int tocont);
};

// 8. New global_State - Composition over Inheritance
class global_State {
private:
    MemoryAllocator memory;
    GCAccounting gcAccounting;
    GCParameters gcParams;
    GCObjectLists gcLists;
    StringCache strings;
    TypeSystem types;
    RuntimeServices runtime;

public:
    // Delegation methods (inline for zero cost)
    MemoryAllocator& getMemory() { return memory; }
    GCAccounting& getGCAccounting() { return gcAccounting; }
    GCParameters& getGCParams() { return gcParams; }
    GCObjectLists& getGCLists() { return gcLists; }
    StringCache& getStrings() { return strings; }
    TypeSystem& getTypes() { return types; }
    RuntimeServices& getRuntime() { return runtime; }

    // Backward compatibility wrappers
    lua_Alloc getFrealloc() const { return memory.getFrealloc(); }
    l_mem getGCTotalBytes() const { return gcAccounting.getTotalBytes(); }
    GCState getGCState() const { return gcParams.getState(); }
    // ... etc
};

Benefits of Decomposition

  1. Clarity: Each component has a single, clear purpose
  2. Testing: Components can be tested independently
  3. Maintenance: Changes to GC don't affect string caching, etc.
  4. Documentation: Smaller classes are easier to document
  5. Reusability: Components could be reused in other contexts

Risks & Mitigation

Risk Mitigation
Performance regression All accessors must be inline - compiler should optimize away
Increased memory usage Use composition, not pointers - same memory layout
API breakage Provide backward-compatible wrapper methods
Cache misses Keep frequently-accessed fields together in memory

Implementation Strategy

  1. Phase 1: Create new component classes alongside existing fields
  2. Phase 2: Add delegation methods to global_State
  3. Phase 3: Migrate code to use new interfaces (one module at a time)
  4. Phase 4: Benchmark after each module (must stay ≤2.21s)
  5. Phase 5: Remove old direct field access
  6. Phase 6: Make fields private, complete encapsulation

Estimated Effort: 40-60 hours Risk Level: Medium (requires careful performance validation)


2. lua_State (27 fields)

Location: lstate.h:374-604 Complexity: Multiple responsibilities mixed together Refactoring Impact: Medium-high value, high risk (VM hot path)

Current Responsibilities

Responsibility Fields Description
Stack Management 4 top, stack_last, stack, tbclist
Call Chain 3 ci, base_ci, nci
GC Tracking 4 l_G, openupval, gclist, twups
Error Handling 3 status, errorJmp, errfunc
Debug Hooks 7 hook, hookmask, allowhook, oldpc, basehookcount, hookcount, transferinfo
Call Counting 1 nCcalls
(Total: 22 core fields)

Proposed Decomposition

// 1. Stack Manager
class StackManager {
private:
    StkIdRel top;          // First free slot
    StkIdRel stack_last;   // End of stack
    StkIdRel stack;        // Stack base
    StkIdRel tbclist;      // To-be-closed variables

public:
    // Inline accessors for hot path
    inline StkIdRel& getTop() noexcept { return top; }
    inline StkIdRel& getStackLast() noexcept { return stack_last; }
    inline StkIdRel& getStack() noexcept { return stack; }
    inline int getSize() const noexcept {
        return cast_int(stack_last.p - stack.p);
    }

    // Methods (implemented in ldo.cpp)
    void grow(int n);
    void shrink();
    int sizeInUse();
    void reallocate(int newsize);
};

// 2. Call Info Manager
class CallInfoManager {
private:
    CallInfo *current;     // Current call frame
    CallInfo baseCI;       // Base-level call info
    int numCallInfos;      // Number of CallInfo nodes

public:
    inline CallInfo* getCurrent() noexcept { return current; }
    inline CallInfo** getCurrentPtr() noexcept { return &current; }
    inline CallInfo* getBase() noexcept { return &baseCI; }

    CallInfo* push();
    void pop();
    void extend();
    void shrink();
};

// 3. Debug Hook Manager
class DebugHookManager {
private:
    volatile lua_Hook hook;
    volatile l_signalT hookmask;
    lu_byte allowhook;
    int oldpc;                  // Last traced PC
    int basehookcount;
    int hookcount;
    struct {
        int ftransfer;          // Transfer offset
        int ntransfer;          // Transfer count
    } transferinfo;

public:
    inline lua_Hook getHook() const noexcept { return hook; }
    inline l_signalT getMask() const noexcept { return hookmask; }
    inline bool isEnabled() const noexcept { return allowhook != 0; }

    void setHook(lua_Hook h, l_signalT mask);
    void call(lua_State* L, int event, int line);
    void callOnReturn(CallInfo* ci, int nres);
    void resetCount();
};

// 4. Error Context
class ErrorContext {
private:
    TStatus status;
    lua_longjmp *errorJmp;
    ptrdiff_t errfunc;

public:
    inline TStatus getStatus() const noexcept { return status; }
    inline ptrdiff_t getErrFunc() const noexcept { return errfunc; }

    [[noreturn]] void doThrow(TStatus code);
    [[noreturn]] void throwError();
    void setErrorObj(lua_State* L, TStatus code, StkId oldtop);
};

// 5. Simplified lua_State
class lua_State : public GCBase<lua_State> {
private:
    // Core subsystems
    StackManager stack;
    CallInfoManager calls;
    DebugHookManager hooks;
    ErrorContext errors;

    // Simple fields (GC tracking)
    global_State *globalState;
    UpVal *openupval;
    GCObject *gclist;
    lua_State *twups;

    // Call depth tracking
    l_uint32 nCcalls;

public:
    // Inline delegation (zero cost)
    inline StkIdRel& getTop() noexcept { return stack.getTop(); }
    inline CallInfo* getCI() noexcept { return calls.getCurrent(); }
    inline TStatus getStatus() const noexcept { return errors.getStatus(); }

    // Direct subsystem access for complex operations
    StackManager& getStack() { return stack; }
    CallInfoManager& getCalls() { return calls; }
    DebugHookManager& getHooks() { return hooks; }
    ErrorContext& getErrors() { return errors; }

    // High-level operations
    void call(StkId func, int nResults);
    void callNoYield(StkId func, int nResults);
    [[noreturn]] void runError(const char* fmt, ...);
};

Benefits

  1. Testability: Stack/call/hook managers can be unit tested
  2. Clarity: Each manager has focused responsibility
  3. Maintenance: Hook changes don't affect stack, etc.

Risks

⚠️ CRITICAL PERFORMANCE CONCERN: lua_State is accessed in the VM hot path. Every accessor call must be inlined or performance will degrade.

Mitigation:

  • All accessors must be inline
  • Verify assembly output matches current code
  • Benchmark VM intensive operations (loops, function calls)
  • Consider keeping direct field access in lvm.cpp if needed

Implementation Strategy

Status: ⚠️ DEFER until global_State refactoring complete Reason: Too risky for VM performance

If Attempted:

  1. Start with DebugHookManager (least performance-critical)
  2. Benchmark after each subsystem extraction
  3. Abort if any regression > 0.5%

3. FuncState (16 fields)

Location: lparser.h:237-468 Complexity: Compiler state with mixed responsibilities Refactoring Impact: High value, low risk (compile-time only)

Current Responsibilities

Responsibility Fields Description
Proto Management 2 f, np
Context Linking 3 prev, ls, bl
Constant Pool 2 kcache, nk
Code Buffer 5 pc, lasttarget, previousline, nabslineinfo, iwthabs
Variable Scope 4 firstlocal, firstlabel, ndebugvars, nactvar
Register Allocation 1 freereg
Upvalue Management 2 nups, needclose

Proposed Decomposition

// 1. Code Buffer - Bytecode generation
class CodeBuffer {
private:
    int pc;               // Program counter (next instruction)
    int lasttarget;       // Last jump label
    int previousline;     // Previous line (for line info)
    int nabslineinfo;     // Absolute line info count
    lu_byte iwthabs;      // Instructions since abs line info

public:
    inline int getPC() const noexcept { return pc; }
    inline int nextPC() noexcept { return pc++; }

    int emit(Instruction i);
    void patchJump(int position, int destination);
    void saveLineInfo(int line);
    void removeLastInstruction();
};

// 2. Constant Pool - Constant deduplication
class ConstantPool {
private:
    Table *cache;         // Constant cache (for deduplication)
    int count;            // Number of constants in proto

public:
    inline int getCount() const noexcept { return count; }

    int addConstant(TValue *value);
    int stringK(TString *s);
    int numberK(lua_Number n);
    int intK(lua_Integer i);
    int boolK(bool b);
    int nilK();
};

// 3. Variable Scope Tracker
class VariableScope {
private:
    int firstlocal;       // First local in this function
    int firstlabel;       // First label in this function
    short ndebugvars;     // Number of debug variables
    short nactvar;        // Number of active variables

public:
    inline int getFirstLocal() const noexcept { return firstlocal; }
    inline short getNumActiveVars() const noexcept { return nactvar; }

    void addLocalVar(TString *name);
    void removeVars(int tolevel);
    Vardesc* getLocalVar(int idx);
};

// 4. Register Allocator
class RegisterAllocator {
private:
    lu_byte freereg;      // First free register

public:
    inline lu_byte getFreeReg() const noexcept { return freereg; }
    inline void setFreeReg(lu_byte reg) noexcept { freereg = reg; }

    int allocReg();
    void freeReg(int reg);
    void freeRegs(int r1, int r2);
    void reserveRegs(int n);
    void checkStack(int n);
};

// 5. Upvalue Tracker
class UpvalueTracker {
private:
    lu_byte nups;         // Number of upvalues
    lu_byte needclose;    // Needs close on return

public:
    inline lu_byte getNumUpvalues() const noexcept { return nups; }
    inline bool needsClose() const noexcept { return needclose != 0; }

    int searchUpvalue(TString *name);
    int newUpvalue(TString *name, expdesc *v);
    void markToBeClosed();
};

// 6. Simplified FuncState
class FuncState {
private:
    // Context (kept as-is)
    Proto *proto;
    FuncState *parent;
    LexState *lexer;
    BlockCnt *block;

    // Subsystems
    CodeBuffer code;
    ConstantPool constants;
    VariableScope scope;
    RegisterAllocator registers;
    UpvalueTracker upvalues;

    // Nested function count
    int numProtos;

public:
    // High-level code generation
    void codeABC(OpCode o, int a, int b, int c);
    void codeABx(OpCode o, int a, int bx);

    // Expression compilation
    void exp2nextreg(expdesc *e);
    void discharge2reg(expdesc *e, int reg);

    // Access subsystems
    CodeBuffer& getCode() { return code; }
    ConstantPool& getConstants() { return constants; }
    VariableScope& getScope() { return scope; }
    RegisterAllocator& getRegisters() { return registers; }
    UpvalueTracker& getUpvalues() { return upvalues; }
};

Benefits

  1. Clarity: Each subsystem has clear purpose
  2. Testing: Can test constant pool, register allocator independently
  3. Maintenance: Changes to one subsystem don't affect others
  4. Safety: No performance risk (compile-time only)

Implementation Strategy

Status: ✅ RECOMMENDED - Low risk, high value Priority: Start here before attempting global_State or lua_State

  1. Phase 1: Create subsystem classes (1-2 hours per subsystem)
  2. Phase 2: Add delegation methods to FuncState
  3. Phase 3: Migrate lcode.cpp to use new interfaces (10 hours)
  4. Phase 4: Migrate lparser.cpp to use new interfaces (10 hours)
  5. Phase 5: Remove old direct field access
  6. Phase 6: Make subsystems private

Estimated Effort: 30-40 hours Risk Level: Low (compile-time only, no runtime impact) Expected Benefit: Significantly improved compiler code clarity


🟡 MODERATE SRP VIOLATIONS

4. Proto (19 fields)

Location: lobject.h:883-1023 Complexity: Function prototype with debug info mixed in Refactoring Impact: Medium value, low risk

Current Responsibilities

Responsibility Fields Description
Function Signature 3 numparams, maxstacksize, flag
Bytecode 2 code, sizecode
Constants 2 k, sizek
Nested Functions 2 p, sizep
Upvalues 2 upvalues, sizeupvalues
Line Info 4 lineinfo, sizelineinfo, abslineinfo, sizeabslineinfo
Local Variables 2 locvars, sizelocvars
Source Info 3 linedefined, lastlinedefined, source
GC 1 gclist

Analysis

Observation: Debug information (line info, local variables, source) is only needed for debugging. Runtime execution only needs bytecode, constants, upvalues.

Proposed Decomposition

// 1. Debug Information (optional, can be stripped)
class ProtoDebugInfo {
private:
    // Line information
    ls_byte *lineinfo;
    int lineinfo_size;
    AbsLineInfo *abslineinfo;
    int abslineinfo_size;

    // Local variables
    LocVar *locvars;
    int locvars_size;

    // Source information
    int linedefined;
    int lastlinedefined;
    TString *source;

public:
    int getLine(int pc) const;
    const char* getLocalName(int local, int pc) const;
    void free(lua_State* L);
};

// 2. Simplified Proto
class Proto : public GCBase<Proto> {
private:
    // Runtime data (always present)
    lu_byte numparams;
    lu_byte flag;
    lu_byte maxstacksize;

    Instruction *code;
    int sizecode;

    TValue *k;
    int sizek;

    Proto **p;
    int sizep;

    Upvaldesc *upvalues;
    int sizeupvalues;

    // Debug data (can be null if stripped)
    ProtoDebugInfo *debug;

    GCObject *gclist;

public:
    // Runtime accessors (inline)
    inline Instruction* getCode() const noexcept { return code; }
    inline TValue* getConstants() const noexcept { return k; }
    inline lu_byte getNumParams() const noexcept { return numparams; }

    // Debug accessors (check for null)
    inline bool hasDebugInfo() const noexcept { return debug != nullptr; }
    inline int getLine(int pc) const {
        return debug ? debug->getLine(pc) : linedefined;
    }

    void stripDebugInfo(lua_State* L);
};

Benefits

  1. Memory savings: Can strip debug info in production
  2. Clarity: Separates runtime vs debug concerns
  3. Loading: Can load debug info on-demand

Risks

  • Memory overhead if debug info always allocated separately
  • Complexity in serialization (dump/undump)

Recommendation

Status: ⏸️ OPTIONAL - Consider for future optimization Priority: Low - Current design works well

Alternative: Keep current design but add comments explaining separation


5. LexState (11 fields)

Location: llex.h:93-246 Complexity: Lexer state with multiple concerns Refactoring Impact: Low-medium value, low risk

Current Responsibilities

Responsibility Fields Description
Input Stream 2 current, z
Token Buffer 2 t, lookahead
Line Tracking 2 linenumber, lastline
Parser Context 3 fs, L, dyd
String Interning 2 buff, h
Predefined Names 4 source, envn, brkn, glbn

Proposed Decomposition

// 1. Input Scanner
class InputScanner {
private:
    int current;          // Current character
    ZIO *stream;          // Input stream
    int linenumber;       // Current line
    int lastline;         // Last line consumed

public:
    inline int getCurrent() const noexcept { return current; }
    inline int getLine() const noexcept { return linenumber; }

    int next();
    void skip();
    bool isNewline() const;
    void skipLine();
};

// 2. Token Buffer
class TokenBuffer {
private:
    Token current;
    Token lookahead;

public:
    inline const Token& getCurrent() const noexcept { return current; }
    inline Token& getCurrentRef() noexcept { return current; }

    void next();
    int peek();
    void consume(int expected);
};

// 3. String Interner
class StringInterner {
private:
    Mbuffer *buffer;      // String buffer
    Table *table;         // String cache

public:
    TString* intern(const char *str, size_t len);
    void save(char c);
    void reset();
};

// 4. Simplified LexState
class LexState {
private:
    InputScanner scanner;
    TokenBuffer tokens;
    StringInterner strings;

    // Predefined names (could be extracted to PredefinedNames class)
    TString *source, *envn, *brkn, *glbn;

    // Context (kept as-is)
    FuncState *funcState;
    lua_State *luaState;
    Dyndata *dyndata;

public:
    void nextToken();
    int lookaheadToken();
    [[noreturn]] void syntaxError(const char *msg);
};

Benefits

  1. Clarity: Scanner/tokenizer/interner separated
  2. Testing: Can test scanner independently
  3. Reusability: Scanner could be reused for other tools

Recommendation

Status: ⏸️ OPTIONAL - Minor improvement Priority: Low - Current design is acceptable


🟢 ACCEPTABLE (No Significant SRP Violations)

6. Table (7 fields)

Status: ✅ GOOD DESIGN Reason: Dual representation (array + hash) is fundamental to Lua's table design. Splitting would harm the tight coupling needed for efficient resize operations.

7. CallInfo (9 fields in unions)

Status: ✅ GOOD DESIGN Reason: Discriminated union for C vs Lua calls is correct. Splitting would require vtables (violates zero-cost principle).

8. TString (9 fields)

Status: ✅ GOOD DESIGN Reason: Variable-length string with short/long optimization. Union is intentional.

9. UpVal (2 unions)

Status: ✅ GOOD DESIGN Reason: Open/closed state with different data. Union is correct.

10. CClosure / LClosure (4-5 fields each)

Status: ✅ GOOD DESIGN Reason: Function closures with variable-size arrays. Clean design.

11. Udata (5 fields)

Status: ✅ GOOD DESIGN Reason: Userdata with metadata. Appropriate for use case.

12. Node (union)

Status: ✅ GOOD DESIGN Reason: Compact hash node representation. Intentional optimization.

13. expdesc (8 fields in union)

Status: ✅ GOOD DESIGN Reason: Expression descriptor with multiple kinds. Union is correct for compile-time data.

14. Vardesc (6 fields in union)

Status: ✅ GOOD DESIGN Reason: Variable descriptor with compile-time constant value. Union is appropriate.


📊 SUMMARY & RECOMMENDATIONS

Refactoring Priority Matrix

Class SRP Violation Value Risk Priority Effort
FuncState High High Low ⭐⭐⭐ START HERE 30-40h
global_State Very High Very High Medium ⭐⭐ Consider 40-60h
Proto Medium Medium Low ⭐ Optional 20-30h
lua_State High High Very High ⚠️ Defer 60-80h
LexState Low Low Low ⏸️ Skip 15-20h

Implementation Roadmap

Phase 1: Low-Risk Wins (FuncState)

Goal: Prove decomposition works without performance impact Duration: 4-6 weeks Tasks:

  1. Extract CodeBuffer class
  2. Extract ConstantPool class
  3. Extract VariableScope class
  4. Extract RegisterAllocator class
  5. Extract UpvalueTracker class
  6. Verify compiler tests pass
  7. Benchmark compilation time (should be same or better)

Success Criteria:

  • All tests pass
  • Compilation time ≤ baseline
  • Code is clearer and more testable

Phase 2: High-Value Refactoring (global_State)

Goal: Decompose god object into focused components Duration: 6-8 weeks Tasks:

  1. Extract MemoryAllocator class
  2. Extract GCAccounting class
  3. Extract GCParameters class
  4. Extract GCObjectLists class
  5. Extract StringCache class
  6. Extract TypeSystem class
  7. Extract RuntimeServices class
  8. Migrate modules one at a time
  9. Benchmark after each module (must stay ≤2.21s)

Success Criteria:

  • All tests pass
  • Performance ≤2.21s (≤1% regression)
  • Code is significantly clearer
  • Components can be tested independently

Phase 3: Optional Refinements

Goal: Further improvements based on lessons learned Duration: 2-4 weeks Tasks:

  • Consider Proto debug info separation
  • Consider LexState refinements
  • Document architectural decisions

Phase 4: High-Risk (lua_State) - IF NEEDED

Goal: Decompose VM state (only if proven beneficial) Duration: 8-10 weeks Tasks:

  • Prototype in branch
  • Extensive benchmarking
  • Assembly verification
  • Abort if any regression

⚠️ WARNING: Only attempt if Phases 1-2 show clear benefits


⚠️ CRITICAL CONSTRAINTS

1. Performance Requirements

  • Zero-cost abstraction: All refactoring must compile to identical or better code
  • Benchmark threshold: ≤2.21s (≤1% regression from 2.17s baseline)
  • Hot path protection: VM interpreter (lvm.cpp) must not be slowed
  • Inline everything: Accessors must be inline or constexpr

2. C API Compatibility

  • Public API in lua.h, lauxlib.h, lualib.h must remain unchanged
  • Memory layout can change for internal types
  • Opaque pointers (lua_State*, Table*, etc.) can be refactored internally

3. Lua Design Philosophy

  • Simplicity over purity: Don't add complexity for theoretical benefits
  • Performance first: Lua is an embedded language - speed matters
  • Proven design: Current architecture has decades of production use
  • Global state is OK: Lua is designed with global GC, string interning, etc.

4. Testing Requirements

  • All 30+ test files must pass
  • Benchmark after every significant change
  • No silent behavior changes
  • Consider fuzzing for new interfaces

🎯 PRACTICAL GUIDELINES

When to Refactor

DO refactor if:

  • Responsibility separation is clear and natural
  • No performance impact expected (compile-time code)
  • Testing becomes significantly easier
  • Code becomes significantly clearer

DON'T refactor if:

  • Performance risk is high (VM hot path)
  • Separation feels forced or artificial
  • Current code is already clear
  • Benefits are purely theoretical

How to Measure Success

  1. Clarity: Is the code easier to understand?
  2. Testing: Can we test components in isolation?
  3. Performance: Benchmark shows ≤1% regression
  4. Maintenance: Are future changes easier?

Design Principles

  1. Composition over inheritance: Use aggregation, not base classes
  2. Inline everything: Zero-cost abstraction via inline/constexpr
  3. Cohesion: Keep related data together (cache locality)
  4. Encapsulation: Private fields with controlled access

📚 REFERENCES

Related Documents

  • CLAUDE.md - Main project documentation
  • ENCAPSULATION_PLAN.md - Field encapsulation strategy (completed)
  • CMAKE_BUILD.md - Build system documentation

Performance Benchmarks

  • Baseline: 2.17s (testes/all.lua)
  • Target: ≤2.21s (≤1% regression)
  • Command: for i in 1 2 3 4 5; do ../build/lua all.lua 2>&1 | grep "total time:"; done

Code Locations

  • VM hot path: src/vm/lvm.cpp
  • Compiler: src/compiler/lparser.cpp, src/compiler/lcode.cpp
  • GC: src/memory/lgc.cpp
  • State: src/core/lstate.cpp, src/core/ldo.cpp

📝 NOTES & OBSERVATIONS

Why Some "Violations" Are Actually Good Design

  1. global_State as God Object

    • Lua has inherently global state (GC, string pool)
    • Centralizing in one object can improve cache locality
    • Splitting might hurt performance if fields accessed together
  2. lua_State Mixed Responsibilities

    • VM needs everything in one place for hot path access
    • Stack/calls/hooks are often accessed together
    • Splitting could introduce pointer chasing
  3. Table Dual Representation

    • Array and hash parts must coordinate for resize
    • Splitting would harm the tight coupling needed
    • Current design is optimal for Lua's use cases

Lessons from Original Lua C Code

Lua's original C code used structs with direct field access. The C++ conversion has already added significant abstraction:

  • Private fields + accessors (19/19 classes)
  • Type safety (enum classes, inline functions)
  • CRTP for GC objects
  • Exception handling

Further abstraction must prove its value - we're already more abstract than original Lua!


🔄 REVISION HISTORY

  • 2025-11-15: Initial SRP analysis (based on Phase 80-87 codebase)
  • Status: Analysis complete, awaiting implementation decision

END OF DOCUMENT