Commit 0c1ad5e

Add FLT (binary floating‑point) type and arithmetic; add stats library
Introduce FLT runtime type and binary fixed‑point float literals (lexer + parser). Add TYPE_FLT to the interpreter and seal it in the extension/type registry. Extend numeric builtins to support INT/FLT (no implicit mixing): ADD, SUB, MUL, DIV, MOD, POW, ROOT, NEG, ABS, GCD, LCM, GT/LT/GTE/LTE, LOG, SUM, PROD, MAX, MIN, etc. Add explicit conversions and predicates: INT(…), FLT(…), ISFLT, and ROUND (multiple rounding modes and ndigits). Add tensor numeric support for FLT (elementwise ops, tensor‑scalar ops) and common numeric helpers; enforce uniform element types and reject INT/FLT mixing. Implement string/float parsing and to_str printing for FLT. Add lib\stats.asmln with statistical utilities and update test.asmln to exercise new stats APIs.
1 parent: fc64f90; commit: 0c1ad5e

8 files changed: +979 -69 lines changed

SPECIFICATION.html

Lines changed: 47 additions & 21 deletions
@@ -45,7 +45,7 @@

 ## 1. Overview

-The language is a familiar statement-based, imperative language. Programs consist of variable declarations via assignment, expressions, and control-flow constructs such as `IF`, `ELSIF`, `ELSE`, `WHILE`, and `FOR`. ASM-Lang now has three runtime data types: binary integers (`INT`), strings (`STR`), and non-scalar tensors (`TNS`). Identifiers, function parameters, and return values are statically typed; the type of every symbol must be declared when it is first introduced. Computation proceeds by evaluating expressions and executing statements in sequence, with explicit constructs for branching and looping. Input and output are modeled through built-in operators, in particular `INPUT` and `PRINT`.
+The language is a familiar statement-based, imperative language. Programs consist of variable declarations via assignment, expressions, and control-flow constructs such as `IF`, `ELSIF`, `ELSE`, `WHILE`, and `FOR`. ASM-Lang has four runtime data types: binary integers (`INT`), binary floating-point numbers (`FLT`, IEEE754), strings (`STR`), and non-scalar tensors (`TNS`). Identifiers, function parameters, and return values are statically typed; the type of every symbol must be declared when it is first introduced. Computation proceeds by evaluating expressions and executing statements in sequence, with explicit constructs for branching and looping. Input and output are modeled through built-in operators, in particular `INPUT` and `PRINT`.

 The interpreter compiles source code into a single initial configuration (the seed state), which includes the program code, an empty variable environment, and an initial I/O history. It then advances execution by repeatedly applying a single, fixed small-step transition function that is independent of the particular program. A disassembler and log view expose all intermediate states so that every step and every control-flow decision is inspectable and replay-able.

@@ -93,10 +93,18 @@

 ## 3. Data Model

-ASM-Lang supports three literal data types: binary integers, strings, and non-scalar tensors.
+ASM-Lang supports four runtime data types: binary integers, binary floating-point numbers, strings, and non-scalar tensors.

 Binary integer literal: an unsigned non-empty sequence of `{0,1}` (for example, `0`, `1`, `1011`), or a signed literal formed by a leading `-` (the dash is part of the literal, not an operator) followed by optional spaces, tabs, or carriage returns and then a non-empty sequence of `{0,1}`. A `-` that does not immediately introduce a literal is a syntax error.

+Binary floating-point literal: an IEEE754 floating-point value written in binary fixed-point notation `n.n`, where both sides of the radix point are non-empty sequences of `{0,1}`. Examples:
+
+- `0.1` denotes one-half.
+- `0.01` denotes one-quarter.
+- `0.11` denotes three-quarters.
+
+FLT literals MUST NOT begin with the radix point (so `.1` is invalid). A leading `-` may prefix a FLT literal using the same rules as for integers (the dash is part of the literal and is not an operator).
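Review note on the literal rule above: the mapping from `n.n` spellings to values can be checked with a short Python sketch. This is not the interpreter's lexer. `parse_flt_literal` and its regular expression are hypothetical names, and `FLT` is modeled as a Python `float` (an IEEE 754 double), a width the spec text does not confirm.

```python
import re

# Hypothetical pattern, not the interpreter's lexer: an optional '-' followed by
# optional spaces, tabs, or carriage returns, then binary digits on both sides
# of the radix point.
_FLT_LITERAL = re.compile(r"(-[ \t\r]*)?([01]+)\.([01]+)")


def parse_flt_literal(text: str) -> float:
    """Convert a binary fixed-point literal such as '0.11' to a float."""
    match = _FLT_LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid FLT literal: {text!r}")
    sign, whole, frac = match.groups()
    # Read all digits as one integer, then shift the radix point back.
    magnitude = int(whole + frac, 2) / (1 << len(frac))
    return -magnitude if sign else magnitude
```

With this sketch, `0.1`, `0.01`, and `0.11` evaluate to 0.5, 0.25, and 0.75, matching the examples in the hunk, while `.1` is rejected.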
+
 String literal: a sequence of ASCII characters enclosed in either double quotation marks (`"`) or single quotation marks (`'`) with no escape processing. A string opened with one delimiter must be closed with the same delimiter; other quotation characters appearing inside the string are treated as ordinary characters. Newlines are not permitted inside string literals. The literal's value is the contained character sequence.

 Tensor literal: a non-empty bracketed collection of expressions. Each pair of matching brackets introduces a dimension; nested brackets must form a rectangular shape (all sublists at a given depth have the same length) or a syntax error is raised. The outermost bracket corresponds to dimension 1. Examples (lengths shown in binary):
@@ -111,7 +119,11 @@

 All other tensor operations are non-mutating: tensor literals and tensor-valued built-ins produce new tensor values rather than mutating existing ones. Because indexed assignment mutates a tensor object, if the same tensor value is aliased (bound to multiple identifiers, passed as an argument, or stored inside another tensor), all aliases observe the mutation.

-Every runtime value has a static type: `INT`, `STR`, or `TNS`. Integers are conceptually unbounded mathematical integers. Strings are byte strings of ASCII characters. Tensors are non-scalar aggregates whose elements may be `INT`, `STR`, or `TNS`. When a Boolean interpretation is required, `INT` treats 0 as false and non-zero as true; `STR` treats the empty string as false and any non-empty string as true; `TNS` is true if any contained element is true by these rules, otherwise false. Control-flow conditions (`IF`, `ELSIF`, `WHILE`) and `ASSERT` convert strings to integers using the same rules as the `INT` built-in; tensors are first reduced to their Boolean truth value (1 or 0).
+Every runtime value has a static type: `INT`, `FLT`, `STR`, or `TNS`. Integers are conceptually unbounded mathematical integers. Floats are IEEE754 binary floating-point numbers. Strings are byte strings of ASCII characters. Tensors are non-scalar aggregates whose elements may be `INT`, `FLT`, `STR`, or `TNS`.
+
+When a Boolean interpretation is required, `INT` treats 0 as false and non-zero as true; `FLT` treats 0.0 as false and any non-zero value as true; `STR` treats the empty string as false and any non-empty string as true; `TNS` is true if any contained element is true by these rules, otherwise false. Control-flow conditions (`IF`, `ELSIF`, `WHILE`) and `ASSERT` convert strings to integers using the same rules as the `INT` built-in; tensors are first reduced to their Boolean truth value (1 or 0).
+
+`INT` and `FLT` are not interoperable: no implicit conversion occurs. Operators that accept both types require that all numeric arguments have the same numeric type.
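Review note on the no-mixing rule above: a minimal Python sketch of how a numeric built-in might enforce it. The helper names are hypothetical, and `INT`/`FLT` are modeled as Python `int`/`float` purely for illustration; the interpreter's own checks are not shown in this commit view.

```python
def require_same_numeric_type(*args):
    """Raise TypeError unless every argument is INT (int) or every one is FLT (float)."""
    kinds = {type(a) for a in args}
    if not kinds <= {int, float}:
        raise TypeError("numeric arguments must be INT or FLT")
    if len(kinds) > 1:
        raise TypeError("INT and FLT cannot be mixed; convert explicitly first")


def add(a, b):
    # Stand-in for a numeric built-in such as ADD: check first, then compute.
    require_same_numeric_type(a, b)
    return a + b
```

Under these assumptions `add(1, 2)` and `add(0.5, 0.25)` succeed, while `add(1, 0.5)` raises until one operand is converted explicitly, which is the role the spec gives to `INT(...)` and `FLT(...)`.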

 ## 4. Statements and Control Flow

@@ -343,7 +355,8 @@
 - `MUL(INT:a, INT:b):INT` ; a * b
 - `DIV(INT: a, INT: b):INT` ; floor(a / b)
 - `CDIV(INT: a, INT: b):INT` ; ceil(a / b)
-- `POW(INT: a, INT: b):INT` ; a ^ b (b >= 0)
+- `POW(INT: a, INT: b):INT` ; a ^ b
+- `ROOT(INT|FLT: x, INT|FLT: n):INT|FLT` ; nth root of `x`. No mixing of `INT` and `FLT` is allowed. For `INT` arguments `n` must be non-zero; positive `n` returns the integer nth root (largest integer r with r^n <= x for x >= 0); negative `n` yields an integer result only for `x` equal to `1` or `-1` (reciprocal is integer), and `x < 0` requires odd `n`. For `FLT` arguments the result is `x^(1/n)` (negative `n` allowed); negative `x` is allowed only when `n` is an odd integer. Division by zero is an error.
 - `MOD(INT: a, INT: b):INT` ; remainder of a / b
 - `NEG(INT: a):INT` ; -a (additive inverse)
 - `ABS(INT: a):INT` ; absolute value of a
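Review note on `ROOT` in the hunk above: for `INT` arguments with `x >= 0` and positive `n`, the spec defines the result as the largest integer `r` with `r^n <= x`. The Python sketch below computes that value by binary search. `int_nth_root` is a hypothetical name, and the sketch deliberately leaves out negative `x` and negative `n`, since the hunk states constraints for those cases but this note does not try to reproduce them.

```python
def int_nth_root(x: int, n: int) -> int:
    """Largest integer r with r**n <= x, for x >= 0 and n > 0."""
    if x < 0 or n <= 0:
        raise ValueError("this sketch covers only x >= 0 and n > 0")
    lo, hi = 0, 1
    # Grow hi until hi**n exceeds x, so that lo**n <= x < hi**n holds.
    while hi ** n <= x:
        hi *= 2
    # Narrow the interval while preserving the same invariant.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** n <= x:
            lo = mid
        else:
            hi = mid
    return lo
```

Using exact integer arithmetic avoids the rounding errors that a floating-point `x ** (1 / n)` would introduce for large operands.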
@@ -366,39 +379,52 @@

 ### Comparisons
 - `EQ(ANY: a, ANY: b):INT` ; 1 if a == b else 0
-- `GT(INT: a, INT: b):INT` ; 1 if a > b else 0
-- `LT(INT: a, INT: b):INT` ; 1 if a < b else 0
-- `GTE(INT: a, INT: b):INT` ; 1 if a >= b else 0
-- `LTE(INT: a, INT: b):INT` ; 1 if a <= b else 0
+- `GT(INT|FLT: a, INT|FLT: b):INT` ; 1 if a > b else 0 (no mixing INT/FLT)
+- `LT(INT|FLT: a, INT|FLT: b):INT` ; 1 if a < b else 0 (no mixing INT/FLT)
+- `GTE(INT|FLT: a, INT|FLT: b):INT` ; 1 if a >= b else 0 (no mixing INT/FLT)
+- `LTE(INT|FLT: a, INT|FLT: b):INT` ; 1 if a <= b else 0 (no mixing INT/FLT)

 ### Aggregates / Utilities
-- `MAX(INT|STR: a1, ..., INT|STR: aN):INT|STR` ; `INT` -> numeric max; `STR` -> longest string; mixing `INT` and `STR` or supplying tensors is an error
-- `MIN(INT|STR: a1, ..., INT|STR: aN):INT|STR` ; `INT` -> numeric min; `STR` -> shortest string; mixing `INT` and `STR` or supplying tensors is an error
-- `SUM(INT: a1, ..., INT: aN):INT` ; sum of the arguments
+- `MAX(INT|FLT|STR: a1, ..., INT|FLT|STR: aN):INT|FLT|STR` ; numeric max for `INT`/`FLT`, longest for `STR`; supplying tensors or mixing types is an error
+- `MIN(INT|FLT|STR: a1, ..., INT|FLT|STR: aN):INT|FLT|STR` ; numeric min for `INT`/`FLT`, shortest for `STR`; supplying tensors or mixing types is an error
+- `SUM(INT|FLT: a1, ..., INT|FLT: aN):INT|FLT` ; sum of the arguments (no mixing INT/FLT)
 - `LEN(INT|STR: a1, ..., INT|STR: aN):INT` ; number of arguments (N), rejects tensors
 - `ALL(ANY: a1, ..., ANY: aN):INT` ; Boolean AND (empty string -> false, non-empty -> true)
 - `ANY(ANY: a1, ..., ANY: aN):INT` ; Boolean OR (empty string -> false, non-empty -> true)
 - `JOIN(INT|STR: a1, INT|STR: a2, ..., INT|STR: aN):INT|STR` ; `INT` -> concatenate binary spellings with consistent sign; `STR` -> concatenate strings; mixing `INT` and `STR` or supplying tensors raises an error
 - `PROD(INT: a1, ..., INT: aN):INT` ; product of the arguments
+- `PROD(INT|FLT: a1, ..., INT|FLT: aN):INT|FLT` ; product of the arguments (no mixing INT/FLT)
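Review note on `MAX`/`MIN` in the list above: the type dispatch (numeric comparison for `INT`/`FLT`, length comparison for `STR`, error on mixing) can be sketched in Python as follows. `asm_max` and `asm_min` are hypothetical names, values are modeled as Python `int`, `float`, and `str`, and returning the first candidate on a tie is an assumption, since the spec text does not state a tie-breaking rule.

```python
def _require_uniform(args):
    """Raise TypeError unless all arguments share one of the allowed types."""
    kinds = {type(a) for a in args}
    if len(kinds) != 1 or not kinds <= {int, float, str}:
        raise TypeError("arguments must be all INT, all FLT, or all STR")


def asm_max(*args):
    _require_uniform(args)
    if isinstance(args[0], str):
        return max(args, key=len)  # longest string; first one wins a tie
    return max(args)               # numeric maximum


def asm_min(*args):
    _require_uniform(args)
    if isinstance(args[0], str):
        return min(args, key=len)  # shortest string; first one wins a tie
    return min(args)               # numeric minimum
```

Because tensors are modeled here as neither `int`, `float`, nor `str`, passing a list is rejected by the same check that rejects mixed types.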

 ### Tensor operations
 - `SHAPE(TNS: tensor):TNS` — Returns the tensor's shape as a 1D `TNS` (vector) of `INT` lengths (one entry per dimension).
 - `TLEN(TNS: tensor, INT: dim):INT` — Returns the length of the specified 1-based dimension. Errors if `dim` is out of range.
 - `FILL(TNS: tensor, ANY: value):TNS` — Returns a new tensor with the same shape as `tensor`, filled with `value`. The supplied value's type must match the existing element type at every position.
 - `TNS(TNS: shape, ANY: value):TNS` Creates a new `TNS` with the shape described by a 1D `TNS` of positive `INT` lengths, filled with `value`.
-- `MADD/MSUB/MMUL/MDIV(TNS: x, TNS: y):TNS` Elementwise addition, subtraction, multiplication, and integer division. Shapes must match; all elements must be `INT`. `MDIV` raises on division by zero.
-- `MSUM(TNS: t1, ..., TNS: tN):TNS` — Elementwise sum across tensors. Shapes must match; elements must be `INT`.
-- `MPROD(TNS: t1, ..., TNS: tN):TNS` — Elementwise product across tensors. Shapes must match; elements must be `INT`.
-- `TADD(TNS: x, INT: y):TNS` - Elementwise `INT`-scalar addition
-- `TSUB(TNS: x, INT: y):TNS` - Elementwise `INT`-scalar subtraction
-- `TMUL(TNS: x, INT: y):TNS` - Elementwise `INT`-scalar multiplication
-- `TDIV(TNS: x, INT: y):TNS` - Elementwise `INT`-scalar integer division. Division by zero is an error.
-- `TPOW(TNS: x, INT: y):TNS` - Elementwise `INT`-scalar exponentiation. Negative exponents are an error.
+- `MADD/MSUB/MMUL/MDIV(TNS: x, TNS: y):TNS` Elementwise addition, subtraction, multiplication, and division. Shapes must match; all elements must be `INT` or all `FLT` (no mixing). Division by zero is an error.
+- `MSUM(TNS: t1, ..., TNS: tN):TNS` Elementwise sum across tensors. Shapes must match; elements must be all `INT` or all `FLT` (no mixing).
+- `MPROD(TNS: t1, ..., TNS: tN):TNS` Elementwise product across tensors. Shapes must match; elements must be all `INT` or all `FLT` (no mixing).
+- `TADD/TSUB/TMUL/TDIV/TPOW(TNS: x, INT|FLT: y):TNS` Tensor-scalar arithmetic. Tensor elements and scalar must both be `INT` or both be `FLT` (no mixing). Division by zero is an error.
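Review note on the elementwise tensor operations above: the two preconditions (matching shapes, and a single numeric element type across both operands) can be sketched in Python. Tensors are modeled as nested, rectangular, non-empty lists, and `leaves`, `shape`, and `madd` are hypothetical helpers rather than the interpreter's implementation.

```python
def leaves(t):
    """Yield every scalar element of a nested-list tensor."""
    if isinstance(t, list):
        for item in t:
            yield from leaves(item)
    else:
        yield t


def shape(t):
    """Return the dimension lengths, assuming a rectangular, non-empty tensor."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims


def madd(x, y):
    """Elementwise addition in the spirit of MADD."""
    if shape(x) != shape(y):
        raise ValueError("shapes must match")
    kinds = {type(v) for t in (x, y) for v in leaves(t)}
    if kinds not in ({int}, {float}):
        raise TypeError("elements must be all INT or all FLT")

    def go(a, b):
        if isinstance(a, list):
            return [go(p, q) for p, q in zip(a, b)]
        return a + b

    return go(x, y)
```

The element-type check runs over both operands together, so a tensor that is internally mixed is rejected even when paired with an identical tensor.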

 ### Logarithms
 - `LOG(INT: a):INT` ; floor(log2(a)) for a > 0
+- `LOG(FLT: a):FLT` ; floor(log2(a)) for a > 0
 - `CLOG(INT: a):INT` ; ceil(log2(a)) for a > 0

+### Numeric conversions / predicates
+- `INT(ANY: a):INT` Explicit conversion to integer. If `a` is `FLT`, conversion truncates toward zero.
+- `FLT(ANY: a):FLT` Explicit conversion to float.
+- `ISFLT(SYMBOL: name):INT` 1 if `name` is bound and has type `FLT`, otherwise 0.
+
+### Rounding
+- `ROUND(FLT: float, STR: mode="floor", INT: ndigits=0):FLT` Round `float` to `ndigits` places right of the radix point (binary places; `ndigits` may be negative). Modes are:
+
+- `ROUND(FLT: float, STR: mode="floor", INT: ndigits=0):FLT` Round `float` to `ndigits` places right of the radix point (binary places; `ndigits` may be negative). When exactly two arguments are supplied and the second is an `INT`, it is treated as `ndigits` with the mode defaulting to `"floor"`. Modes are:
+
+- `"floor"` round toward $-\infty$
+- `"ceiling"` or `"ceil"` round toward $+\infty$
+- `"zero"` round toward zero
+- `"logical"` or `"half-up"` round half away from zero
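Review note on `ROUND` above: because `ndigits` counts binary places, rounding reduces to scaling by a power of two, rounding to an integer, and scaling back. The Python sketch below follows that reading. `round_flt` is a hypothetical name, `FLT` is modeled as a Python `float`, and the two-argument shorthand (an `INT` second argument treated as `ndigits`) is left out.

```python
import math


def round_flt(x: float, mode: str = "floor", ndigits: int = 0) -> float:
    """Round x to ndigits binary places using one of the listed modes."""
    # x * 2**ndigits; scaling by a power of two is exact unless it
    # overflows or underflows.
    scaled = math.ldexp(x, ndigits)
    if mode == "floor":
        q = math.floor(scaled)
    elif mode in ("ceiling", "ceil"):
        q = math.ceil(scaled)
    elif mode == "zero":
        q = math.trunc(scaled)
    elif mode in ("logical", "half-up"):
        whole = math.floor(abs(scaled))
        # The fractional part of a float is exactly representable, so this
        # comparison does not suffer from rounding error.
        if abs(scaled) - whole >= 0.5:
            whole += 1
        q = -whole if scaled < 0 else whole
    else:
        raise ValueError(f"unknown rounding mode: {mode!r}")
    return math.ldexp(float(q), -ndigits)
```

For example, rounding three-quarters (`0.11`) to one binary place in `"floor"` mode gives one-half (`0.1`), and a negative `ndigits` rounds to multiples of 2, 4, and so on.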
+
 ### Module operations:
 - `IMPORT(MODULE: name)` or `IMPORT(MODULE: name, SYMBOL: alias)` Loads another source file and exposes it as a distinct module namespace. When an optional alias identifier is supplied, the imported module's bindings are exposed under the `alias` prefix rather than the module's own name (for example, `IMPORT(mod, ali)` makes `ali.F()` valid while `mod.F()` is not).

@@ -433,7 +459,7 @@
 - `ARGV():TNS` — Returns the interpreter's argument vector as a one-dimensional `TNS` of `STR`. The tensor's elements are the command-line argument strings supplied to the process, in the same order as the process `argv`, with index 1 holding the interpreter's invocation entry (TNS indices are 1-based).

 ### Control / Function / Statement Signatures (statement position)
-- Assignment: `TYPE : identifier = expression` on first use; subsequent assignments omit TYPE but must match the original type. TYPE is `INT`, `STR`, or `TNS`. Tensor elements may be reassigned with `identifier[i1,...,iN] = expression` (indices are 1-based with negative-index support, and the stored type at that element must not change).
+- Assignment: `TYPE : identifier = expression` on first use; subsequent assignments omit TYPE but must match the original type. TYPE is `INT`, `FLT`, `STR`, or `TNS`. Tensor elements may be reassigned with `identifier[i1,...,iN] = expression` (indices are 1-based with negative-index support, and the stored type at that element must not change).
 - Block: `{ statement1 ... statementN }`
 - `IF(condition){ block }` (optional `ELSIF(condition){ block }` ... `ELSE{ block }`)
 - `WHILE(condition){ block }`
@@ -447,7 +473,7 @@
 - `GOTO(n)` ; jump to a previously-registered gotopoint with identifier `n` (`INT` or `STR`) within the same function or top-level scope; runtime error if not registered in that scope

 ### Notes
-- Built-ins are statically typed. Boolean contexts treat `INT` 0 as false and non-zero as true; `STR` is false when empty and true when non-empty unless a rule explicitly converts via `INT`. A `TNS` is true if any element is true by those `INT`/`STR` rules.
+- Built-ins are statically typed. Boolean contexts treat `INT` 0 as false and non-zero as true; `FLT` is false when 0.0 and true otherwise; `STR` is false when empty and true when non-empty unless a rule explicitly converts via `INT`. A `TNS` is true if any element is true by those rules.
 - Argument evaluation order: left-to-right.
 - User-defined functions use the same call syntax as built-ins; keyword arguments are permitted only after positional arguments and only for parameters that declare defaults. Built-ins reject keyword arguments except that `READFILE` and `WRITEFILE` accept an optional `coding=` keyword. When a keyword parameter is omitted, its default expression is evaluated at call time in the function's defining environment.

asm-lang.exe

13.4 KB
Binary file not shown.

extensions.py

Lines changed: 1 addition & 0 deletions
@@ -394,6 +394,7 @@ def build_default_services() -> RuntimeServices:
     # Reserve built-in type names so extensions cannot redefine them.
     # The interpreter will register their concrete semantics at runtime.
     services.type_registry.seal("INT")
+    services.type_registry.seal("FLT")
     services.type_registry.seal("STR")
     services.type_registry.seal("TNS")
     return services
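Review note on the `seal` calls above: the diff shows only that `type_registry.seal(name)` exists and that it reserves built-in names against redefinition by extensions. The Python sketch below illustrates one way such a registry could behave. The `TypeRegistry` class and its `register` method are hypothetical stand-ins, not the project's actual API.

```python
class TypeRegistry:
    """Hypothetical stand-in; only `seal` is visible in the diff."""

    def __init__(self):
        self._sealed = set()
        self._types = {}

    def seal(self, name: str) -> None:
        # Reserve the name so later extension registrations are rejected.
        self._sealed.add(name)

    def register(self, name: str, spec: object) -> None:
        # Assumed extension entry point; refuses reserved names.
        if name in self._sealed:
            raise ValueError(f"type name {name!r} is reserved")
        self._types[name] = spec


registry = TypeRegistry()
for builtin in ("INT", "FLT", "STR", "TNS"):
    registry.seal(builtin)
```

In this model, an extension that tries to register `FLT` fails once the name is sealed, which is the effect the added line is meant to have for the new type.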
