Project: Lua 5.5 C++ Conversion Codebase Size: ~37,646 lines across 72 files Current Progress: ~63% modernized (19/19 classes encapsulated, ~500 macros converted) Analysis Date: 2025-11-17
This analysis identifies all remaining C idioms that can be aggressively converted to modern, idiomatic C++23 while maintaining:
- ✅ Zero performance regression (target ≤4.33s)
- ✅ C API compatibility (public interface only)
- ✅ All tests passing
| Category | Found | Status | Priority | Est. Hours |
|---|---|---|---|---|
| Macros | ~478 definitions | 37% converted | HIGH | 40-60h |
| Old-Style Enums | 8 typedef enum | Unconverted | HIGH | 8-12h |
| Plain Enums | 2 enum (OpMode, RESERVED) | Unconverted | MEDIUM | 2-3h |
| Typedefs | 90 declarations | 0% converted | MEDIUM | 15-20h |
| NULL Literals | 591 occurrences | Unconverted | MEDIUM | 10-15h |
| goto Statements | 133 uses (21 files) | Valid uses | LOW | Case-by-case |
| C-style Casts | ~60 remaining | Partially converted | MEDIUM | 8-10h |
| Unions | 5 unions (StackValue, Value, etc.) | Keep (necessary) | N/A | N/A |
| Function Pointers | 21 occurrences | Keep (C API) | N/A | N/A |
| Manual Memory Mgmt | 330 malloc/free calls | Keep (custom allocator) | N/A | N/A |
- Total macros: ~478 across 31 header files
- Converted: ~500 macros (~37%)
- Remaining: ~75 convertible macros
Location: include/lua.h:440-447
// CONVERTIBLE to inline functions (but keep macros for C API compatibility)
#define lua_isfunction(L,n) (lua_type(L, (n)) == LUA_TFUNCTION)
#define lua_istable(L,n) (lua_type(L, (n)) == LUA_TTABLE)
#define lua_islightuserdata(L,n) (lua_type(L, (n)) == LUA_TLIGHTUSERDATA)
#define lua_isnil(L,n) (lua_type(L, (n)) == LUA_TNIL)
#define lua_isboolean(L,n) (lua_type(L, (n)) == LUA_TBOOLEAN)
#define lua_isthread(L,n) (lua_type(L, (n)) == LUA_TTHREAD)
#define lua_isnone(L,n) (lua_type(L, (n)) == LUA_TNONE)
#define lua_isnoneornil(L, n) (lua_type(L, (n)) <= 0)Recommendation: Keep as macros for C API compatibility, but add C++ overloads:
// C API macros (keep)
#define lua_isfunction(L,n) (lua_type(L, (n)) == LUA_TFUNCTION)
// C++ inline overloads for internal use
inline bool lua_isfunction(lua_State* L, int n) noexcept {
return lua_type(L, n) == LUA_TFUNCTION;
}Location: src/memory/llimits.h, include/luaconf.h
// KEEP - Platform detection
#define L_P2I uintptr_t
#define l_noret void __attribute__((noreturn))
// KEEP - Compile-time configuration
#define MAX_LMEM (cast(l_mem, (cast(lu_mem, 1) << (l_numbits(l_mem) - 1)) - 1))
#define MAX_SIZE (sizeof(size_t) < sizeof(lua_Integer) ? MAX_SIZET : cast_sizet(LUA_MAXINTEGER))Location: src/memory/llimits.h:153-206
✅ Already converted (good work!):
constexpr inline lua_Number cast_num(auto i) noexcept;
constexpr inline int cast_int(auto i) noexcept;
constexpr inline lu_byte cast_byte(auto i) noexcept;
// ... etc#define cast(t, exp) ((t)(exp)) // Generic cast - needed
#define cast_void(i) static_cast<void>(i) // Macro for comma expressions
#define cast_sizet(i) cast(size_t, (i)) // Avoid conversion warnings
#define cast_voidp(i) cast(void*, (i)) // Pointer casts
#define cast_charp(i) cast(char*, (i)) // String castsRecommendation: Keep as-is (necessary for flexibility).
Location: include/luaconf.h:401-612
// CONVERTIBLE to constexpr inline functions
#define l_floor(x) (l_mathop(floor)(x))
#define l_floatatt(n) (FLT_##n) // Token pasting - KEEP
#define lua_str2number(s,p) ((lua_Number)strtod((s), (p)))Conversion:
inline lua_Number l_floor(lua_Number x) noexcept {
return std::floor(x);
}
inline lua_Number lua_str2number(const char* s, char** p) noexcept {
return static_cast<lua_Number>(strtod(s, p));
}Phase-by-Phase Approach:
-
Phase 100: Public API wrapper macros (8h)
- Add C++ inline overloads for lua.h macros
- Keep existing macros for C compatibility
- Target: 15-20 functions
-
Phase 101: Math/utility macros (4h)
- Convert luaconf.h math operations
- Target: 10-15 functions
- Benchmark after completion
-
Phase 102: Character type macros (Already done! ✅)
- lctype.h already uses inline functions
- No work needed
Total Estimated Time: 12-15 hours
A. Binary/Unary Operators (src/compiler/llex.h:26-43)
// OLD STYLE - C89
typedef enum BinOpr {
OPR_ADD, OPR_SUB, OPR_MUL, OPR_MOD, OPR_POW,
OPR_DIV, OPR_IDIV,
OPR_BAND, OPR_BOR, OPR_BXOR,
OPR_SHL, OPR_SHR,
OPR_CONCAT,
OPR_EQ, OPR_LT, OPR_LE,
OPR_NE, OPR_GT, OPR_GE,
OPR_AND, OPR_OR,
OPR_NOBINOPR
} BinOpr;
typedef enum UnOpr {
OPR_MINUS, OPR_BNOT, OPR_NOT, OPR_LEN, OPR_NOUNOPR
} UnOpr;MODERN C++23:
enum class BinOpr : lu_byte {
Add, Sub, Mul, Mod, Pow,
Div, IDiv,
BAnd, BOr, BXor,
Shl, Shr,
Concat,
Eq, Lt, Le,
Ne, Gt, Ge,
And, Or,
NoBinOpr
};
enum class UnOpr : lu_byte {
Minus, BNot, Not, Len, NoUnOpr
};Impact: ~30 files use BinOpr/UnOpr - need scoped access changes.
B. Tag Methods (src/core/ltm.h:18-45)
// OLD STYLE
typedef enum {
TM_INDEX,
TM_NEWINDEX,
TM_GC,
TM_MODE,
TM_LEN,
TM_EQ, /* last tag method with fast access */
TM_ADD,
// ... 20+ more
TM_N /* number of elements in the enum */
} TMS;MODERN C++23:
enum class TagMethod : lu_byte {
Index,
NewIndex,
GC,
Mode,
Len,
Eq, // last tag method with fast access
Add,
// ... converted names
Count // number of elements
};Impact: Critical - used throughout VM for metamethod dispatch. Risk: Medium (need careful conversion of array indexing).
C. Expression Kinds (src/compiler/lparser.h:28-60)
typedef enum {
VVOID,
VNIL,
VTRUE,
VFALSE,
VK,
VKFLT,
VKINT,
VKSTR,
// ... 15+ more
} expkind;MODERN C++23:
enum class ExpKind : lu_byte {
Void,
Nil,
True,
False,
K,
KFlt,
KInt,
KStr,
// ... etc
};D. F2Imod (Float-to-Integer Mode) (src/vm/lvm.h:55-59, duplicated in 3 files!)
// PROBLEM: Defined 3 times with header guards
#ifndef F2Imod_defined
#define F2Imod_defined
typedef enum {
F2Ieq, /* no rounding */
F2Ifloor, /* floor */
F2Iceil /* ceiling */
} F2Imod;
#endifMODERN C++23 (single definition):
// In src/vm/lvm.h (single source of truth)
enum class F2Imod : lu_byte {
Eq, // no rounding
Floor, // floor
Ceil // ceiling
};Impact: Remove duplicate definitions, consolidate to single header.
E. OpCode (src/compiler/lopcodes.h:358-450)
typedef enum {
OP_MOVE,
OP_LOADI,
OP_LOADF,
// ... 80+ opcodes
OP_EXTRAARG
} OpCode;Analysis:
F. Comparison Modes (src/objects/lobject.h:1759-1763)
typedef enum {
CMP_EQ, CMP_LT, CMP_LE
} CmpOp;MODERN C++23:
enum class CmpOp : lu_byte {
Eq, Lt, Le
};Impact: LOW - simple conversion.
A. OpMode (src/compiler/lopcodes.h:36)
enum OpMode {iABC, ivABC, iABx, iAsBx, iAx, isJ};MODERN C++23:
enum class OpMode : lu_byte {
ABC, vABC, ABx, AsBx, Ax, sJ
};B. RESERVED (src/compiler/llex.h:61)
enum RESERVED {
/* terminal symbols denoted by reserved words */
TK_AND = FIRST_RESERVED, TK_BREAK,
// ... 20+ keywords
};MODERN C++23:
enum class Reserved : int {
And = FIRST_RESERVED,
Break,
// ... etc
};Phased Approach:
-
Phase 103: Low-impact enums (4h)
- Convert: UnOpr, CmpOp, OpMode, Reserved
- Low file impact, simple conversions
- Benchmark after completion
-
Phase 104: Medium-impact enums (6h)
- Convert: BinOpr, expkind, F2Imod
- Update ~30 call sites
- Consolidate F2Imod duplicates
- Benchmark after completion
-
Phase 105: High-impact enums (8h)
- Convert: TMS (TagMethod)
- Critical VM component
- Careful array indexing conversion
- Extensive testing required
-
Phase 106: Hot-path enums (DEFER)
- OpCode - only after proving zero overhead
- VM instruction decode path
- Requires performance validation
Total Estimated Time: 18-20 hours (excluding OpCode)
- Total typedefs: 90 declarations across 30 files
- Using declarations: 0 (none converted yet)
// lua.h - Public C API (KEEP AS-IS)
typedef int (*lua_CFunction) (lua_State *L);
typedef int (*lua_KFunction) (lua_State *L, int status, lua_KContext ctx);
typedef const char * (*lua_Reader) (lua_State *L, void *ud, size_t *sz);
typedef int (*lua_Writer) (lua_State *L, const void *p, size_t sz, void *ud);
typedef void * (*lua_Alloc) (void *ud, void *ptr, size_t osize, size_t nsize);
typedef void (*lua_WarnFunction) (void *ud, const char *msg, int tocont);
typedef void (*lua_Hook) (lua_State *L, lua_Debug *ar);
// Internal (CONVERTIBLE)
typedef void (*Pfunc) (lua_State *L, void *ud); // lstate.hRecommendation:
// Public API - keep typedef for C compatibility
typedef int (*lua_CFunction) (lua_State *L);
// Internal - convert to using
using Pfunc = void (*)(lua_State* L, void* ud);typedef struct lua_State lua_State; // lua.h
typedef struct lua_Debug lua_Debug; // lua.h
typedef struct CallInfo CallInfo; // lstate.h
typedef struct luaL_Buffer luaL_Buffer; // lauxlib.h
typedef struct luaL_Reg luaL_Reg; // lauxlib.h
typedef struct Zio ZIO; // lzio.hMODERN C++23:
// Already classes - remove typedef entirely!
class lua_State;
class lua_Debug;
class CallInfo;
// Or for structs still needed:
struct luaL_Reg; // Just forward declare, no typedef// llimits.h - Integer types
typedef LUAI_MEM l_mem;
typedef LUAI_UMEM lu_mem;
typedef unsigned char lu_byte;
typedef signed char ls_byte;
typedef lu_byte TStatus;
// llimits.h - Float types
typedef LUAI_UACNUMBER l_uacNumber;
typedef LUAI_UACINT l_uacInt;
// lobject.h - Instruction/stack types
typedef l_uint32 Instruction;
typedef StackValue *StkId;MODERN C++23:
// Type aliases with using
using l_mem = LUAI_MEM;
using lu_mem = LUAI_UMEM;
using lu_byte = unsigned char;
using ls_byte = signed char;
using TStatus = lu_byte;
using l_uacNumber = LUAI_UACNUMBER;
using l_uacInt = LUAI_UACINT;
using Instruction = l_uint32;
using StkId = StackValue*;// lobject.h
typedef union StackValue { ... } StackValue;
typedef union Value { ... } Value;
typedef union UValue { ... } UValue;
typedef union StkIdRel { ... } StkIdRel;
typedef union Closure { ... } Closure;
// llex.h
typedef union SemInfo { ... } SemInfo;
typedef struct Token { ... } Token;MODERN C++23:
// Remove redundant typedef - just name the union/struct
union StackValue { ... };
union Value { ... };
union UValue { ... };
struct Token { ... };Phase 107: typedef → using migration (15-20h)
- Step 1: Public API typedefs (keep as-is for C compatibility)
- Step 2: Internal type aliases (l_mem, lu_byte, etc.) - convert to
using - Step 3: Forward declarations - remove typedef, use direct class/struct declaration
- Step 4: Union/struct typedefs - remove redundant typedef
Total Estimated Time: 15-20 hours
- NULL occurrences: 591 across 44 files
- nullptr usage: Already used in modern code
High-impact files:
src/memory/lgc.cpp: 54 NULL usessrc/libraries/loadlib.cpp: 42 NULLsrc/core/ldebug.cpp: 39 NULLsrc/testing/ltests.cpp: 44 NULLsrc/libraries/lstrlib.cpp: 36 NULLsrc/core/lstate.cpp: 30 NULLsrc/core/lapi.cpp: 29 NULLsrc/libraries/liolib.cpp: 28 NULL
Strategy:
# Automated conversion (safe with modern compilers)
# Can be done file-by-file with verification
sed -i 's/\bNULL\b/nullptr/g' src/memory/lgc.cpp
# Build, test, benchmark after each filePhase 108: NULL → nullptr (10-15h)
- Systematic file-by-file conversion
- Build and test after each file
- Benchmark after major subsystems
Total Estimated Time: 10-15 hours
- Total goto statements: 133 across 21 files
- Test files: 49 goto (goto.lua) - valid test cases
- Source files: 84 goto uses
// VALID USE - jump table pattern
case 'a': c = '\a'; goto read_save;
case 'b': c = '\b'; goto read_save;
case 'f': c = '\f'; goto read_save;
case 'n': c = '\n'; goto read_save;
case 'r': c = '\r'; goto read_save;
case 't': c = '\t'; goto read_save;
case 'v': c = '\v'; goto read_save;
case 'x': c = readHexaEsc(); goto read_save;Analysis: Classic jump table pattern - KEEP. Alternative: Could use function pointer table, but goto is clearer.
retry:
// ... attempt operation ...
if (condition) {
// ... adjust state ...
goto retry; /* try again */
}Analysis: Retry pattern - VALID, but could be refactored to while loop.
Alternative:
bool retry = true;
while (retry) {
retry = false;
// ... attempt operation ...
if (condition) {
// ... adjust state ...
retry = true; /* try again */
}
}// Complex GC sweep logic with multiple exit points
if (condition1)
goto remove;
if (condition2)
goto remain;
// ...
remove:
// cleanup code
remain:
// continue iterationAnalysis: State machine pattern in GC - ACCEPTABLE. Alternative: Could use labeled continue/break with structured loops.
// Break statement handling
if (found_break_target) {
goto ok;
}
// error handling
ok:
// continue parsingAnalysis: Error handling shortcut - ACCEPTABLE.
Recommendation: DEFER - Most gotos are legitimate.
Optional Phase 109: goto refactoring (Case-by-case, 20-30h)
- Only convert where it improves readability
- Focus on retry loops (easy conversion to while)
- Keep lexer jump tables (clearer as goto)
- Keep GC state machine (complex to refactor)
Priority: LOW - Not idiomatic concern, performance-neutral.
Remaining occurrences: ~60 C-style casts in codebase
Examples:
// Libraries still use C-style casts
(int)lua_tointeger(L, 1) // ldblib.cpp
(unsigned int)lua_tounsigned(L, 1) // ltablib.cpp
(char*)malloc(size) // VariousConversion:
static_cast<int>(lua_tointeger(L, 1))
static_cast<unsigned int>(lua_tounsigned(L, 1))
static_cast<char*>(malloc(size))Phase 110: C-style cast elimination (8-10h)
- Convert to static_cast/reinterpret_cast
- Particularly in library files (ldblib, lstrlib, ltablib, liolib)
Occurrences: 24 across 12 files
Current usage (appropriate for low-level VM):
memcpy(dest, src, size); // Byte copying
memset(ptr, 0, size); // Zero initializationModern alternatives:
std::copy_n(src, size, dest); // For typed arrays
std::fill_n(ptr, size, value); // For initialization
std::memcpy(dest, src, size); // For byte blobs (acceptable in C++)Recommendation: KEEP - memcpy/memset are appropriate for low-level VM operations. Only convert where type safety adds value.
Status: 330 malloc/realloc/free calls across 36 files
Analysis: MUST KEEP - Lua uses custom allocator (lua_Alloc) for:
- Memory tracking
- Sandboxing
- Embedding scenarios
- GC integration
Not convertible to:
- ❌ std::unique_ptr (custom allocator integration complex)
- ❌ std::vector (need precise memory control)
- ❌ RAII wrappers (Lua's error handling via exceptions handles cleanup)
Exception: LuaVector wrapper already used successfully! ✅
| Phase | Category | Target | Est. Hours | Risk | Priority |
|---|---|---|---|---|---|
| 100 | Public API wrapper functions | lua.h macros → inline overloads | 8h | LOW | HIGH |
| 101 | Math/utility macros | luaconf.h → constexpr | 4h | LOW | HIGH |
| 103 | Low-impact enums | UnOpr, CmpOp, OpMode, Reserved | 4h | LOW | HIGH |
| 104 | Medium-impact enums | BinOpr, expkind, F2Imod | 6h | MED | HIGH |
| 105 | High-impact enums | TMS (TagMethod) | 8h | MED | HIGH |
| 107 | typedef → using | All internal typedefs | 15-20h | LOW | MEDIUM |
| 108 | NULL → nullptr | All 591 occurrences | 10-15h | LOW | MEDIUM |
| 110 | C-style casts | Libraries & internal code | 8-10h | LOW | MEDIUM |
| 106 | Hot-path enums | OpCode (DEFER) | TBD | HIGH | LOW |
| 109 | goto refactoring | Optional cleanup | 20-30h | MED | LOW |
High Priority: 30-35 hours Medium Priority: 33-45 hours Low Priority (optional): 20-30 hours
Grand Total: 63-80 hours for full aggressive modernization
Benchmark after every phase:
cd /home/user/lua_cpp/testes
for i in 1 2 3 4 5; do \
../build/lua all.lua 2>&1 | grep "total time:"; \
donePerformance Targets:
- ✅ Target: ≤4.33s (3% tolerance from 4.20s baseline)
⚠️ Warning: 4.33-4.50s (investigate)- ❌ Revert: >4.50s (exceeds tolerance)
Hot Path Risk Analysis:
- LOW RISK: typedef→using, NULL→nullptr, cast conversions (no codegen impact)
- MEDIUM RISK: Non-hot-path enums (BinOpr, expkind)
- HIGH RISK: VM enums (OpCode, TagMethod - extensive usage)
# 1. Create branch
git checkout -b claude/phase-N-description
# 2. Make changes (file-by-file with Edit tool)
# 3. Build
cmake --build build
# 4. Test
cd testes && ../build/lua all.lua
# 5. Benchmark (if significant change)
for i in 1 2 3 4 5; do ../build/lua all.lua 2>&1 | grep "total time:"; done
# 6. Commit
git add . && git commit -m "Phase N: Description"
# 7. Push
git push -u origin claude/phase-N-description✅ KEEP - C API Compatibility:
- Public API macros in
lua.h,lauxlib.h,lualib.h - Function pointer typedefs for C callbacks
- C-compatible struct layouts
✅ KEEP - Performance Critical:
- OpCode enum (defer until proven safe)
- Instruction decode macros
- VM hot-path gotos
✅ KEEP - Platform Compatibility:
- Configuration macros (
L_P2I,l_noret, etc.) - Preprocessor conditionals for platform detection
- Token-pasting macros
✅ KEEP - Necessary Patterns:
- Unions (Value, StackValue, Closure) - type punning required
- Manual memory management - custom allocator integration
- Low-level memcpy/memset - appropriate for bytecode manipulation
Excellent work already done:
- ✅ 19/19 classes fully encapsulated
- ✅ CRTP inheritance for GC objects
- ✅ ~500 macros converted to inline functions
- ✅ Character type functions (lctype.h)
- ✅ Cast functions (cast_int, cast_byte, etc.)
- ✅ Modern CMake build system
- ✅ C++ exceptions replacing setjmp/longjmp
- ✅ LuaStack centralization (Phase 94)
- ✅ SRP refactoring (FuncState, global_State, Proto)
Week 1-2: Enum Modernization (18-20h)
- Phase 103: Low-impact enums (UnOpr, CmpOp, OpMode, Reserved)
- Phase 104: Medium-impact enums (BinOpr, expkind, F2Imod)
- Phase 105: High-impact enums (TMS/TagMethod)
- Benchmark after each phase
- Commit after each successful conversion
Week 3-4: Type System Cleanup (23-35h) 4. Phase 107: typedef → using declarations 5. Phase 108: NULL → nullptr conversion 6. Phase 110: C-style cast elimination
- Lower risk, systematic conversions
- Can batch commits by subsystem
Week 5: Public API Enhancement (Optional, 12h) 7. Phase 100: Add C++ inline overloads for lua.h macros 8. Phase 101: Math macro conversion
- Enhances C++ user experience
- Maintains C compatibility
Technical Goals:
- ✅ All old-style enums → enum class
- ✅ All typedef → using (internal code)
- ✅ Zero NULL literals (all nullptr)
- ✅ Zero C-style casts (use static_cast/reinterpret_cast)
- ✅ Performance: ≤4.33s (maintain baseline)
- ✅ All tests passing
Modernization Progress:
- Current: ~63% modern C++
- Target: ~85-90% modern C++ (realistic with C API boundary)
- Remaining 10-15%: Necessary C idioms for compatibility
- typedef → using declarations ✅
- NULL → nullptr ✅
- C-style casts → static_cast ✅
- Non-hot-path enums ✅
- TagMethod enum (extensive usage)
⚠️ - BinOpr/expkind enums (compiler-only)
⚠️ - Public API wrapper functions
⚠️
- OpCode enum (VM instruction decode)
⚠️ ⚠️ - Lexer/parser gotos (complex control flow)
⚠️ ⚠️ - Instruction manipulation macros
⚠️ ⚠️
- C API compatibility layer ❌
- Platform detection macros ❌
- Custom allocator system ❌
- Necessary unions ❌
The Lua C++ conversion project is 63% complete with excellent foundational work:
- All classes fully encapsulated ✅
- CRTP inheritance active ✅
- ~500 macros converted ✅
- Zero performance regression ✅
Remaining modernization potential: ~37% of codebase
Aggressive conversion targets:
- ⭐ 8 old-style enums → enum class (HIGH IMPACT)
- ⭐ 90 typedefs → using declarations (CLARITY)
- ⭐ 591 NULL → nullptr (SAFETY)
- ⚡ ~75 macros → constexpr inline (EXPRESSIVENESS)
- ⚡ ~60 C-casts → static_cast (TYPE SAFETY)
Estimated total effort: 63-80 hours for full modernization
Recommendation: Proceed with phased approach, prioritizing high-impact, low-risk conversions (enum class modernization) first, then systematic cleanup (typedef/NULL/casts).
This will achieve ~85-90% modern C++23 idioms while maintaining C API compatibility and zero performance regression! 🚀
- Phase 100: Public API inline overloads (8h)
- Phase 101: Math macro conversion (4h)
- Phase 103: Low-impact enum class (4h)
- Phase 104: Medium-impact enum class (6h)
- Phase 105: TagMethod enum class (8h)
- Phase 107: typedef → using (15-20h)
- Phase 108: NULL → nullptr (10-15h)
- Phase 110: C-style cast elimination (8-10h)
After each phase, run:
cmake --build build && cd testes
for i in 1 2 3 4 5; do ../build/lua all.lua 2>&1 | grep "total time:"; doneTarget: Average ≤4.33s Baseline: 4.20s (Nov 2025)
Enum conversions:
src/compiler/llex.h- BinOpr, UnOprsrc/core/ltm.h- TagMethodsrc/compiler/lparser.h- ExpKindsrc/compiler/lopcodes.h- OpMode, OpCode
NULL→nullptr conversions:
src/memory/lgc.cpp(54)src/libraries/loadlib.cpp(42)src/core/ldebug.cpp(39)src/testing/ltests.cpp(44)
typedef→using conversions:
src/memory/llimits.h- Type aliasessrc/objects/lobject.h- Core typesinclude/lua.h- Public API (keep typedef)
Document Version: 1.0 Last Updated: 2025-11-17 Author: Claude (Anthropic)