Quantum Language is a modern, cybersecurity-focused scripting language implemented as a tree-walk interpreter in C++17. The language processes .sa (Quantum Script) files through a classic compilation pipeline: Lexical Analysis → Parsing → Abstract Syntax Tree Generation → Interpretation.
This document provides a comprehensive overview of how .sa files are compiled and executed, detailing the lexer, parser, and interpreter components.
- File Extension: `.sa` (Quantum Script)
- Entry Point: `src/main.cpp`
- Language Version: 1.0.0
- Runtime: Tree-walk interpreter
The compilation process follows these distinct phases:
.sa Source File → Lexer → Token Stream → Parser → AST → Interpreter → Execution
The main.cpp file serves as the orchestrator for the entire compilation process:
```cpp
// Key compilation flow in runFile():
Lexer lexer(source);               // Phase 1: Lexical analysis
auto tokens = lexer.tokenize();    // Generate token stream
Parser parser(std::move(tokens));  // Phase 2: Parsing
auto ast = parser.parse();         // Generate AST
Interpreter interp;                // Phase 3: Interpretation
interp.execute(*ast);              // Execute AST
```

The main function supports three modes:
- REPL Mode: Interactive line-by-line parsing and execution
- File Mode: Complete `.sa` file compilation and execution
- Check Mode: Syntax validation only (parses without execution)
The Lexer transforms raw source code into a stream of tokens. It operates character by character, recognizing patterns and categorizing them into lexical units.
State Management:

```cpp
std::string src;  // Source code
size_t pos;       // Current position
int line, col;    // Line and column tracking
```

Token Recognition Methods:
- `readNumber()`: Handles decimal and hexadecimal numbers
- `readString()`: Processes string literals with escape sequences
- `readIdentifierOrKeyword()`: Distinguishes identifiers from keywords
The lexer recognizes 50+ token types organized into categories:
Literals: `NUMBER`, `STRING`, `BOOL_TRUE`, `BOOL_FALSE`, `NIL`
Keywords:
- Control Flow: `let`, `const`, `fn`, `return`, `if`, `else`, `elif`, `while`, `for`, `in`, `break`, `continue`
- I/O: `print`, `input`
- Logic: `and`, `or`, `not`
- Modules: `import`
- Cybersecurity (reserved): `scan`, `payload`, `encrypt`, `decrypt`, `hash`

Operators:
- Arithmetic: `+`, `-`, `*`, `/`, `%`, `**` (power)
- Comparison: `==`, `!=`, `<`, `>`, `<=`, `>=`
- Assignment: `=`, `+=`, `-=`, `*=`, `/=`
- Bitwise: `&`, `|`, `^`, `~`, `<<`, `>>`

Delimiters: `(`, `)`, `{`, `}`, `[`, `]`, `,`, `;`, `:`, `.`, `->`
- Whitespace Handling: Skips spaces and tabs, tracks newlines
- Comment Processing: Skips `#` single-line comments
- Number Recognition:
  - Decimal numbers: `123`, `45.67`
  - Hexadecimal numbers: `0xFF`, `0x1A2B`
- String Processing: Supports both single `'` and double `"` quotes with escape sequences
- Keyword/Identifier Distinction: Uses hash map for O(1) keyword lookup
- Operator Parsing: Handles multi-character operators (`==`, `!=`, `**`, `+=`, etc.)
For the input: let x = 42 + "hello"
Generated tokens:
LET(let) IDENTIFIER(x) ASSIGN(=) NUMBER(42) PLUS(+) STRING(hello)
The Parser transforms the token stream into a structured Abstract Syntax Tree (AST) using a combination of Recursive Descent for statements and Pratt Parsing for expressions.
The AST uses a variant-based design with these node types:
Expression Nodes:
- `NumberLiteral`, `StringLiteral`, `BoolLiteral`, `NilLiteral` - Literal values
- `Identifier` - Variable references
- `BinaryExpr` - Binary operations (`+`, `-`, `*`, `/`, etc.)
- `UnaryExpr` - Unary operations (`-`, `not`, `~`)
- `AssignExpr` - Assignment operations (`=`, `+=`, etc.)
- `CallExpr` - Function calls
- `IndexExpr` - Array/dict indexing `arr[i]`
- `MemberExpr` - Member access `obj.prop`
- `ArrayLiteral`, `DictLiteral` - Data structure literals
- `LambdaExpr` - Anonymous functions
Statement Nodes:
- `VarDecl` - Variable declarations (`let`, `const`)
- `FunctionDecl` - Function definitions
- `ReturnStmt` - Return statements
- `IfStmt` - Conditional statements with elif chains
- `WhileStmt` - While loops
- `ForStmt` - For-in loops
- `BlockStmt` - Code blocks
- `ExprStmt` - Expression statements
- `PrintStmt` - Print statements
- `ImportStmt` - Module imports
- `BreakStmt`, `ContinueStmt` - Loop control
Statement Parsing (Recursive Descent):
```cpp
ASTNodePtr parseStatement() {
    switch (current().type) {
        case TokenType::LET: return parseVarDecl(false);
        case TokenType::FN:  return parseFunctionDecl();
        case TokenType::IF:  return parseIfStmt();
        // ... other statement types
        default: return parseExprStmt();
    }
}
```

Expression Parsing (Pratt Precedence): The parser uses operator precedence climbing with these levels (highest to lowest):
- Primary: Literals, identifiers, parentheses
- Postfix: Function calls, indexing, member access
- Unary: `-`, `not`, `~`
- Power: `**` (right-associative)
- Multiplication: `*`, `/`, `%`
- Addition: `+`, `-`
- Shift: `<<`, `>>`
- Comparison: `<`, `>`, `<=`, `>=`
- Equality: `==`, `!=`
- Bitwise AND: `&`
- Bitwise XOR: `^`
- Bitwise OR: `|`
- Logical AND: `and`
- Logical OR: `or`
- Assignment: `=`, `+=`, `-=`, `*=`, `/=`
For the code: let result = x * (y + 2)
Generated AST structure:
BlockStmt
└── VarDecl (name: "result", isConst: false)
└── AssignExpr (op: "=")
├── Identifier ("result")
└── BinaryExpr (op: "*")
├── Identifier ("x")
└── BinaryExpr (op: "+")
├── Identifier ("y")
└── NumberLiteral (2.0)
The Interpreter executes the AST using a tree-walk approach with lexical scoping and dynamic typing.
The language uses a dynamic type system with these value types:
- Number: Double-precision floating point
- String: Text data
- Boolean: `true`/`false`
- Nil: Null value
- Function: User-defined and native functions
- Array: Ordered collections
- Dictionary: Key-value mappings
```cpp
class Environment {
    std::shared_ptr<Environment> enclosing; // Parent scope
    std::unordered_map<std::string, QuantumValue> values;
};
```

- Lexical Scoping: Variables resolved through parent environments
- Function Closures: Functions capture their defining environment
- Global Scope: Built-in functions and global variables
Statement Execution:
- Variable Declaration: Creates bindings in current environment
- Function Declaration: Creates function objects with closure
- Control Flow: Implements if/elif/else, while, for loops
- Return: Propagates return values up call stack
Expression Evaluation:
- Literals: Direct value creation
- Variables: Environment lookup
- Binary Operations: Type-aware arithmetic, comparison, logical operations
- Function Calls: Argument evaluation, environment creation, execution
- Member Access: Object property resolution
The interpreter registers native functions at startup:
Type Conversion:
- `num(value)`: Convert to number
- `str(value)`: Convert to string
- `bool(value)`: Convert to boolean
Mathematical:
- `abs()`, `sqrt()`, `floor()`, `ceil()`, `round()`
- `pow()`, `log()`, `log2()`, `sin()`, `cos()`, `tan()`
- `min()`, `max()`
I/O:
- `__input__()`: Read user input (internal)
- `print()`: Output values
Let's trace the compilation of a simple .sa file:
Source Code (example.sa):
```
# Calculate factorial
fn factorial(n) {
    if n <= 1 {
        return 1
    }
    return n * factorial(n - 1)
}

let result = factorial(5)
print("Factorial of 5 is:", result)
```
Generated token stream (simplified):
FN(fn) IDENTIFIER(factorial) LPAREN(()) IDENTIFIER(n) RPAREN()) LBRACE({)
IF(if) IDENTIFIER(n) LTE(<=) NUMBER(1) LBRACE({) RETURN(return) NUMBER(1)
RBRACE(}) RETURN(return) IDENTIFIER(n) STAR(*) IDENTIFIER(factorial)
LPAREN(()) IDENTIFIER(n) MINUS(-) NUMBER(1) RPAREN()) RBRACE(})
LET(let) IDENTIFIER(result) ASSIGN(=) IDENTIFIER(factorial) LPAREN(())
NUMBER(5) RPAREN()) PRINT(print) LPAREN(()) STRING(Factorial of 5 is:)
COMMA(,) IDENTIFIER(result) RPAREN())
Generated AST structure:
BlockStmt
├── FunctionDecl (name: "factorial", params: ["n"])
│ └── BlockStmt
│ ├── IfStmt
│ │ ├── BinaryExpr (op: "<=")
│ │ │ ├── Identifier ("n")
│ │ │ └── NumberLiteral (1.0)
│ │ └── BlockStmt
│ │ └── ReturnStmt
│ │ └── NumberLiteral (1.0)
│ └── ReturnStmt
│ └── BinaryExpr (op: "*")
│ ├── Identifier ("n")
│ └── CallExpr
│ ├── Identifier ("factorial")
│ └── BinaryExpr (op: "-")
│ ├── Identifier ("n")
│ └── NumberLiteral (1.0)
├── VarDecl (name: "result", isConst: false)
│ └── AssignExpr (op: "=")
│ ├── Identifier ("result")
│ └── CallExpr
│ ├── Identifier ("factorial")
│ └── NumberLiteral (5.0)
└── PrintStmt
├── StringLiteral ("Factorial of 5 is:")
└── Identifier ("result")
- Function Declaration: Creates the `factorial` function with its closure
- Variable Declaration: Evaluates the `factorial(5)` call
  - Calls factorial with n=5
  - Recursive calls: factorial(4), factorial(3), factorial(2), factorial(1)
  - Returns 120
- Assignment: Stores 120 in the `result` variable
- Print: Outputs "Factorial of 5 is: 120"
The language implements comprehensive error handling at each phase:
- Unexpected characters: Invalid symbols
- Unterminated strings: Missing closing quotes
- Invalid numbers: Malformed numeric literals
- Syntax errors: Unexpected tokens
- Missing delimiters: Unclosed brackets, parentheses
- Invalid constructs: Malformed statements
- Type errors: Invalid operations on types
- Name errors: Undefined variables/functions
- Runtime exceptions: Division by zero, index out of bounds
- Lexing: O(n) - Linear scan of source
- Parsing: O(n) - Single pass with limited lookahead, no backtracking
- Interpretation: O(n × m) where n is AST size, m is operations per node
- Tokens: O(n) - Proportional to source size
- AST: O(n) - Tree structure proportional to tokens
- Environments: O(s × d) where s is scope size, d is call depth
The language includes reserved keywords for future cybersecurity functionality:
- `scan`: Network/port scanning capabilities
- `payload`: Exploit payload creation
- `encrypt`/`decrypt`: Cryptographic operations
- `hash`: Hashing functions
These keywords are lexically recognized but not yet implemented, providing a foundation for security-focused language extensions.
The project uses CMake for cross-platform compilation:
```
# Key components
src/Lexer.cpp        # Lexical analysis
src/Parser.cpp       # Syntax analysis
src/Interpreter.cpp  # Execution engine
src/main.cpp         # Entry point
include/             # Header files
examples/            # Sample .sa files
```

Quantum Language implements a clean, modular architecture following established compiler design principles. The separation of concerns between lexical analysis, parsing, and interpretation provides a solid foundation for language evolution and the addition of cybersecurity features. The use of modern C++ features ensures efficient memory management and type safety throughout the compilation pipeline.