Core Expression metadata#1066
Conversation
…1065) Change Core.ExpressionMetadata and CoreExprMetadata from Unit to Strata.SourceRange so that source locations can be preserved through the B3→Core→CoreSMT pipeline. Key changes: - Add defaultMetadata field to SyntaxMono.MkLExprParams (defaults to Unit.unit for generic params, overridden to SourceRange.none for Core) - Mark SourceRange.none as @[expose] for pattern matching - Update all Core expression construction sites to use SourceRange.none - Update Inhabited, DecidableEq, Repr, ToFormat, Traceable instances - Update factory axioms to use eb[] macro instead of esM[] - Update test expected outputs
This comment has been minimized.
This comment has been minimized.
…re-expression-metadata-from-pr # Conflicts: # Strata/Languages/Boole/Verify.lean # Strata/Languages/Core/DDMTransform/Translate.lean # Strata/Languages/Core/ProcedureEval.lean # Strata/Languages/Core/SMTEncoder.lean # Strata/Languages/Core/StatementEval.lean # Strata/Languages/Laurel/LaurelToCoreTranslator.lean # Strata/Languages/Python/PythonToCore.lean # Strata/Transform/ProcBodyVerify.lean # Strata/Transform/ProcBodyVerifyCorrect.lean
In Boole/Verify.lean and Core/DDMTransform/Translate.lean, replace Strata.SourceRange.none with the actual source range from the AST pattern matches. This preserves source location information through the translation pipeline for better error reporting and diagnostics.
…helper, improve docstring, use Core.true
…comparisons Expression equality was comparing SourceRange metadata, causing semantically identical expressions with different source positions to be considered unequal. This broke ITE/loop condition evaluation (where evaluated conditions were compared against HasBool.tt which uses SourceRange.none), procedure inlining alpha-equivalence checks, and Boole map extensionality tests. Replace the derived structural DecidableEq on SourceRange with a trivial equality axiom so that all SourceRange values are considered equal. This matches the intended semantics: source ranges are metadata that should not affect expression identity.
Merge origin/main into branch. The conflict in LaurelToCoreTranslator.lean arose because main removed the metadata field from Laurel AstNode (2-element tuples) while our branch changed Core expression metadata from Unit to SourceRange. Resolution keeps SourceRange.none for Core expressions and uses the simplified 2-element AstNode destructuring from main.
…rences - Translate.lean: Remove dead SourceRange.none defaults (callers always pass op.ann) - LaurelToCoreTranslator: Add sourceRangeOf helper, propagate source ranges from AST - All other files: Add comments explaining why SourceRange.none is appropriate (synthesized expressions, SMT terms, semantic definitions, transforms)
…data The ExpressionMetadata change from Unit to SourceRange shifts assert label byte offsets, so use a flexible regex for the line number in test_param_reassign_cross_module. Remove test_unsupported_config.expected because the richer metadata now lets the evaluator resolve the assert condition to a bool.
Add checkNoSourceRangeNone.sh script (modeled after checkNoPanic.sh) that flags net-new SourceRange.none occurrences without suppression markers. Suppression: per-line (-- sourcerange:ok) or per-file (-- sourcerange:file-ok). All 33 files with SourceRange.none now have file-ok markers with explanations.
…re-expression-metadata-from-pr
…lanations Rename per-line marker from "sourcerange:ok" to "nosourcerange: <explanation>" and per-file marker from "sourcerange:file-ok" to "nosourcerange-file: <explanation>". The CI script now validates that explanations are non-empty and not just "ok".
…seArrayTheory option. (#1145) *Description of changes:* **Bug:** When `useArrayTheory` is enabled, the SMT encoder correctly converts `Map` types to `Array` in variable declarations and function signatures, but it did *not* perform this conversion for fields inside ADT (algebraic data type) constructor declarations. This caused a type mismatch: a datatype field would be declared with type `Map` while the rest of the encoding used `Array`. **Fix:** Thread the `useArrayTheory` flag through the datatype emission pipeline: - `lMonoTyToTermType` now accepts `useArrayTheory` and converts `Map` to `Array` when enabled. - `datatypeConstructorsToSMT` passes the flag to `lMonoTyToTermType`. - `SMT.Context.emitDatatypes` accepts and forwards the flag. - `encodeCore` in `Verifier.lean` passes `options.useArrayTheory` to `emitDatatypes`. **Test:** Added a unit test in `SMTEncoderDatatypeTest.lean` that verifies a datatype with a `Map`-typed field emits `(Array Int Int)` when `useArrayTheory=true` and `(Map Int Int)` when `useArrayTheory=false`. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. --------- Co-authored-by: keyboardDrummer-bot <keyboardDrummer-bot@users.noreply.github.com>
…structs (#1165) Fixes #1158 Supersedes #1036 ## Summary Fixes several issues with `Core.formatProgram` that prevented round-trip parsing: 1. **`inline function` formatted without space** — The `inline` grammar op now includes a trailing space, preventing `inlinefunction` concatenation. The formatter also now emits the `inline` attribute when present on functions. 2. **Quantifier variable names** — The formatter now uses the `prettyName` field from `LExpr.quant` instead of generating `__qN` names, preserving the original variable names through formatting. 3. **Overflow predicates** — Added grammar entries, formatter handling, parser entries, and Factory operations for all bitvector overflow predicates (`SNegOverflow`, `UNegOverflow`, `SAddOverflow`, `SSubOverflow`, `SMulOverflow`, `SDivOverflow`, `UAddOverflow`, `USubOverflow`, `UMulOverflow`). `SNegOverflow` and `UNegOverflow` use distinct grammar entries (`Bv.SNegOverflow` / `Bv.UNegOverflow`) to preserve their different semantics through roundtrip. The translate direction correctly dispatches overflow ops by type parameter (bv1/bv8/bv16/bv32/bv64), logging an error for unsupported widths. 4. **Array assignment (`lhsArray`)** — Implemented `m[k] := v` translation in both directions: the parser decomposes nested LHS into base identifier + indices and builds `map_update` expressions; the formatter detects the `map_update(var, idx, val)` pattern and produces `lhsArray` syntax. 5. **`Sequence.empty`** — Added grammar entry with explicit type annotation syntax (`Sequence.empty<int>()`), plus translate and format handling, resolving the 0-ary polymorphic function limitation for this operation. Includes solver verification tests (cherry-picked from #1036) covering basic usage and various element types. 6. **Datatype roundtrip** — Verified that datatype formatting roundtrips correctly (the extra-parentheses issue noted earlier is resolved). 7. **Roundtrip test infrastructure** — Added `RoundtripTest.lean` that verifies parse → format → re-parse → re-format produces stable output for types, functions, procedures, inline functions, parameterized type arguments, datatypes, array assignments, and `Sequence.empty`. 8. **`getLFuncCall` reuse** — Refactored `lappToExpr` and `decomposeMapUpdate` to use `getLFuncCall` for decomposing nested applications, reducing code duplication with the existing utility. ## Testing - All existing tests pass (with expected output updates for the formatting improvements) - New roundtrip tests verify types, functions, procedures, inline functions, parameterized type arguments, datatypes, lhsArray, and Sequence.empty - Solver verification tests for Sequence.empty (from #1036) pass ## Remaining limitations - Bitvector operations with widths beyond 1/8/16/32/64 log an error and fall back to bv64 (requires grammar-level type entries per width) --------- Co-authored-by: Aaron Tomb <aarotomb@amazon.com>
…re-expression-metadata-from-pr Resolve merge conflicts: - Translate.lean: adapt new bv_neg_overflow/bv_uneg_overflow cases and nestMapUpdate helper to use ExpressionMetadata instead of Unit - SMTEncoderDatatypeTest.lean: use ExprSourceLoc.synthesized in new tests - DDMAxiomsExtraction.lean: update expected output for preserved variable names
…coding (#935) Fixes #917 ## Summary Introduces an abstract solver interface (`AbstractSolver τ σ m`) that decouples term construction and session operations from the SMT-LIB string encoding pipeline. The interface is parameterized by a term type `τ`, sort type `σ`, and monad `m` (with `[Monad m] [MonadExceptOf IO.Error m]` constraints), so a non-SMT-LIB backend (e.g., cvc5 FFI) can be plugged in by implementing `AbstractSolver` without touching the encoding logic. All methods throw `IO.Error` directly (no `Except String` layer). A new incremental backend communicates with a live solver process via stdin/stdout. It is opt-in via `--incremental` (batch remains the default). ## What the interface provides All operations are monadic (`m`), throwing on error: - **Configuration**: `setLogic`, `setOption`, `comment` - **Sort construction**: `boolSort`, `intSort`, `realSort`, `stringSort`, `regexSort`, `bitvecSort`, `arraySort`, `constrSort`, `termTypeToSort` - **Term construction**: `mkPrim`, `mkBool`, `mkAnd`, `mkOr`, `mkNot`, `mkImplies`, `mkEq`, `mkIte`, `mkAdd`, `mkSub`, `mkMul`, `mkDiv`, `mkMod`, `mkNeg`, `mkAbs`, `mkLt`, `mkLe`, `mkGt`, `mkGe`, `mkSelect`, `mkStore`, `mkApp`, `mkAppOp`, `mkForall`, `mkExists` - **Declarations**: `declareNew`, `declareFun`, `defineFun`, `declareSort`, `declareDatatype`, `declareDatatypes` - **Session**: `assert`, `checkSat`, `checkSatAssuming`, `getUnsatAssumptions`, `getModel`, `getValue`, `reset`, `close` - **Model inspection**: `termToSMTLibString` Terms (`τ`) are opaque handles whose meaning is backend-specific. ## Other changes - **Generic encoder** — `AbstractEncoder.encodeTerm` parameterized over `τ`/`σ`, caches handles in `AbstractEncoderState τ`. - **Generic incremental discharge** — `Imperative.SMT.dischargeObligationIncremental` in `SMTUtils.lean` is parameterized over `PureExpr`, so any Imperative-level language can use the incremental solver. - **Pluggable solver backends** — `CoreSMTSolver` and `MkDischargeFn` threaded through `Core.verifyProgram`, `Core.verify`, and `Strata.verify`. - **Strict solver compatibility** — `encodeDeclarationsAbstract` skips `declareSort` for datatype names and validates datatype fields. - **`--incremental`** flag to opt in to the incremental backend. ## Testing All tests pass with both batch (default) and incremental modes. ## Follow-ups - [ ] Implement with Lean FFI for cvc5
The Core grammar declared command_datatypes as
op command_datatypes (datatypes : NewlineSepBy DatatypeDecl) : Command
=>
datatypes ";\\n";
Without @[nonempty] on the field, NewlineSepBy compiles to a
zero-or-more parser, so the trailing ";\\n" production alone matched any
stray ";" in the input. This is particularly easy to trip accidentally:
command_fndef's surface syntax has no trailing semicolon, so a user
writing "};" at the end of a function body (by analogy with procedures,
whose grammar does end in ";") silently produced a phantom
command_datatypes op with zero datatypes.
The phantom later tripped an explicit assertion in translateDatatypes:
"Datatype block must contain at least one datatype"
which calls panic!, producing a large backtrace at gen_smt_vcs time with
no useful source location.
Mark the datatypes field @[nonempty]. The stray ";" now surfaces as a
parse error at the point it actually appears, and the original repro
elaborates cleanly.
Add a regression test
(StrataTest/Languages/Core/Tests/Issue1146Test.lean) that pins the
canonical form: a program mixing a datatype and a function can run
through gen_smt_vcs without panicking.
Fixes: #1146
By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
---------
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Co-authored-by: Juneyoung Lee <136006969+aqjune-aws@users.noreply.github.com>
…re-expression-metadata-from-pr
…re-expression-metadata-from-pr
…ore-expression-metadata-from-pr # Conflicts: # Strata/Languages/Core/SMTEncoder.lean # Strata/Languages/Core/StatementEval.lean # Strata/Languages/Laurel/LaurelToCoreTranslator.lean # Strata/Languages/Python/PythonToCore.lean # Strata/Transform/TerminationCheck.lean
…ore-expression-metadata-from-pr
…ore-expression-metadata-from-pr # Conflicts: # Strata/Languages/Boole/Verify.lean
…ore-expression-metadata-from-pr
MikaelMayer
left a comment
There was a problem hiding this comment.
🤖🔍 PR Review: Core Expression Metadata
This is a large PR that changes Core.ExpressionMetadata from Unit to ExprSourceLoc, adds an incremental solver backend (AbstractSolver + IncrementalSolver), refactors StrataMain.lean into a library, and adds nested map-update support in the DDM translator. The metadata propagation work is solid overall — translators correctly thread source locations through expression construction.
Given the extensive existing review threads (30+), I'm focusing on issues not yet raised.
Positive observations:
- The
AbstractSolverinterface is well-designed with a clean callback pattern for quantifiers and datatypes. - The
DischargeFn/CoreSMTSolver/MkDischargeFnabstraction cleanly separates the solver backend from the pipeline. beqIgnoringMetadataandeqModuloMetaare used correctly in places where semantic equality (ignoring provenance) is needed (ANF encoder, scope merging).
| /-- Expressions that can't be reduced when evaluating. | ||
| These are canonical forms used in semantic definitions; they carry synthesized provenance | ||
| because they represent abstract values, not parsed source terms. -/ | ||
| inductive Value : Core.Expression.Expr → Prop where |
There was a problem hiding this comment.
🤖🔍 The Value inductive now requires semLoc as the metadata for all constructors. This means Value can only hold for expressions constructed with exactly this metadata — expressions from parsed source (with real source locations) can never be Values.
This is fine if the semantic layer only reasons about canonical forms produced by HasFvar.mkFvar, HasBool.tt, etc. (which all use semLoc). But it's worth confirming: is there any proof path where an expression with non-semLoc metadata needs to satisfy Value? If so, this would silently break those proofs (they'd become vacuously true).
Consider whether Value should ignore metadata (match on _ instead of semLoc) to be robust against metadata changes.
There was a problem hiding this comment.
🤖 Correct — this is intentional. The semantic layer only reasons about canonical forms produced by HasFvar.mkFvar, HasBool.tt, etc., which all use semLoc. Expressions from parsed source (with real source locations) flow through the verifier/encoder pipeline but never need to satisfy Value. The Value inductive is only used in the operational semantics proofs (StatementSemantics.lean) where all expressions are constructed with semLoc. Making Value ignore metadata would weaken the type discipline without benefit since no proof path mixes parsed-source metadata with semantic canonical forms.
| | .select, [a, i] => liftM (solver.mkSelect a i) | ||
| | .store, [a,i,v] => liftM (solver.mkStore a i v) | ||
| -- Uninterpreted functions: declare and apply | ||
| | .uf f, _ => |
There was a problem hiding this comment.
🤖🔍 defineApp has a catch-all | _, _ => liftM (solver.mkAppOp op tEncs retSort) that silently handles arity mismatches (e.g. .not with 0 or 2+ args, .div with 1 arg). This masks bugs — a wrong number of arguments to a known operation should be an error, not silently routed to the generic fallback.
Consider adding an explicit error case for known operations with wrong arity, or at least a debug assertion.
There was a problem hiding this comment.
🤖 Valid observation. The catch-all is intentional for this PR: it handles bitvector ops, string ops, regex ops, and other operations that have variable arity or whose arity is not statically known at this abstraction level (the solver backend handles validation). Adding arity checks for known ops like .not/.div would be a good hardening improvement but is orthogonal to the metadata changes in this PR — filed as a follow-up concern.
| let (bodyEnc, st') ← encodeBody.run stWithVars | ||
| let (trEncs, _) ← encodeTriggers.run st' | ||
| return (bodyEnc, trEncs))) | ||
|
|
There was a problem hiding this comment.
🤖🔍 In defineQuantifierHelper, the encoder state after encoding the body and triggers is intentionally discarded (the comment explains why). However, if encodeTerm inside the body encounters a free variable that hasn't been declared yet and calls declareNew, that declaration is lost. This means the same free variable could be declared multiple times if it appears in multiple quantifier bodies.
Is this handled by the outer encodeTerm path (which checks varHandles before calling declareNew)? If the free variable only appears inside quantifier bodies, it would be re-declared each time.
There was a problem hiding this comment.
🤖 Good question. This is handled correctly: encodeTerm in the outer scope declares all free variables before encoding quantifier bodies (the verification pipeline pre-declares all variables in the VC). Inside the quantifier callback, encodeTerm checks varHandles first — if a free variable was already declared in the outer state (which is captured via let st ← get), its handle is already in stWithVars.varHandles. The only scenario where a free variable would be "new" inside a quantifier body is if it was never referenced outside any quantifier — but the VC generation ensures all free variables appear at the top level in path conditions before any quantifier.
| Scope.merge cond rest (m2.erase k) | ||
| where mkIte (cond tru fals : LExpr T.mono) : LExpr T.mono := | ||
| if tru == fals then tru | ||
| if tru.beqIgnoringMetadata fals then tru |
There was a problem hiding this comment.
🤖🔍 This uses beqIgnoringMetadata while the ANF encoder uses eqModuloMeta (which calls eraseMetadata then ==). Having two different metadata-ignoring comparison functions with slightly different semantics is a maintenance risk. beqIgnoringMetadata is O(n) structural recursion; eqModuloMeta allocates a new tree then compares. They should produce the same result, but having both invites divergence.
Consider consolidating to one approach (preferably beqIgnoringMetadata since it avoids allocation).
There was a problem hiding this comment.
🤖 Agreed that consolidating would be ideal. eqModuloMeta exists because it has formal proofs about it (eqModuloMeta_true_implies_denote_eq in LExprDenoteEq.lean). beqIgnoringMetadata was introduced in this PR for the Scopes.lean use case where allocation-free comparison matters (it is called in a hot loop during scope merging) and no proof obligations exist. Unifying them (e.g., proving beqIgnoringMetadata equivalent to eqModuloMeta and switching the ANF encoder) is a valid follow-up but would touch the proof infrastructure and is out of scope here.
| for lit in literals.reverse do | ||
| match assumptionMap.get? lit with | ||
| | some t => result := t :: result | ||
| | none => throw (IO.userError s!"getUnsatAssumptions: unknown literal '{lit}'") |
There was a problem hiding this comment.
🤖🔍 getModel throws "not yet implemented". If the incremental backend is used with satisfiabilityCheck = true and the solver returns sat, the pipeline will attempt to extract a counterexample model. This will crash at runtime.
Either implement getModel or document clearly that the incremental backend does not support counterexample extraction (and guard against it in dischargeObligationIncremental).
There was a problem hiding this comment.
🤖 The incremental backend does not use getModel — dischargeObligationIncremental uses solver.getValue to extract variable assignments (which is implemented). The getModel method on AbstractSolver exists for completeness of the interface but the incremental pipeline never calls it. The error message makes this explicit if someone tries to use it in a context that is not yet supported.
| - For validity: (assert (not Q)) (check-sat) tests P ∧ ¬Q | ||
| /-- Encoder state for the abstract solver backend. Extends `EncoderState` with | ||
| a cache of `τ` handles for declared variables, so that `encodeTerm` can | ||
| look up handles by name instead of requiring a `mkVar` method on the solver. -/ |
There was a problem hiding this comment.
🤖🔍 AbstractEncoderState and AbstractEncoderM are defined inside Verifier.lean (a 1600+ line file). Given that the abstract encoder is a self-contained module with its own namespace (AbstractEncoder), it would be cleaner in its own file (e.g. Strata/Languages/Core/AbstractEncoder.lean). This would also make it independently testable.
There was a problem hiding this comment.
🤖 Agreed this would be a good structural improvement. Keeping it in Verifier.lean for now avoids import cycle issues (the encoder needs types from the verifier module and vice versa) and keeps the diff focused on the metadata changes. Extracting to a separate file is a clean follow-up refactor once the module boundaries stabilize.
| import Strata.DL.Util.BitVec | ||
| --------------------------------------------------------------------- | ||
|
|
||
| -- Operator constructors and factory helpers use ExprSourceLoc.synthesized "factory" because |
There was a problem hiding this comment.
🤖🔍 The @[expose, match_pattern] attribute was removed from mapTy. Was this intentional? If mapTy was previously used as a match pattern elsewhere, removing the attribute would break those patterns silently (they'd become unreachable).
There was a problem hiding this comment.
🤖 Yes, intentional. mapTy is never used as a match pattern anywhere in the codebase (verified via search). It is only used on the RHS of match arms to construct types. The public section already provides cross-module visibility, so @[expose] is also redundant here. seqTy still has the attribute for consistency with its historical usage but could similarly be cleaned up in a follow-up.
…ore-expression-metadata-from-pr # Conflicts: # .github/workflows/ci.yml # Strata/Languages/Boole/Verify.lean # Strata/Languages/Core/DDMTransform/Translate.lean # Strata/Languages/Core/Options.lean # Strata/Languages/Core/StatementEval.lean # Strata/Languages/Core/Verifier.lean # Strata/SimpleAPI.lean # Strata/Transform/ANFEncoder.lean # StrataMainLib.lean # StrataTest/Languages/Core/Examples/Seq.lean # StrataTest/Languages/Core/Tests/SMTEncoderDatatypeTest.lean
Summary
This PR changes
Core.ExpressionMetadata(previouslyUnit) toExprSourceLoc, a lightweight struct carrying an optional file URI, a byte-offsetSourceRange, and a list of related source locations. This enables Core expressions to track their source location and originating file through the verification pipeline, which is necessary for correct source attribution after cross-file inlining.Key design decisions
ExprSourceLochas fieldsuri : Option Uri,range : SourceRange, andrelatedLocs : List (Option Uri × SourceRange)ExprSourceLoc.synthesized "<origin>"(e.g."semantics","factory","test", etc.) — there is noExprSourceLoc.noneCoe SourceRange ExprSourceLoccoercion provides backward compatibility for translators that don't yet carry URI informationrelatedLocsaccumulates additional source locations during expression substitution, preserving both the original expression location and the substituted argument locationeqModuloMeta/eraseMetadatainstead of==Translators updated
InputContext.fileNameAstNode.sourcepyLocExprSourceLoc.synthesizedwith appropriate origin tagsTesting
All existing tests pass. Test fixtures updated to use
ExprSourceLoc.synthesized "test".