Add Concrete Syntax for Unstructured Programs in Strata Core#1196
Open
PROgram52bc wants to merge 67 commits into
Open
Add Concrete Syntax for Unstructured Programs in Strata Core#1196PROgram52bc wants to merge 67 commits into
PROgram52bc wants to merge 67 commits into
Conversation
added 30 commits
May 5, 2026 11:24
Address PR review comments: note that the GOTO backend could be refactored into a two-stage pipeline (structured→cfg, then cfg→GOTO) to eliminate the pattern matching on Procedure.Body.
- Add Body.getCfg accessor (mirrors getStructured) - CoreToCBMC: throw error on CFG body instead of returning [] - StatementEval: interpret CFG bodies by linearizing blocks - Grammar: add comment explaining why 'branch' is used instead of 'if' (DDM registers tokens globally, causing conflict with if-statement)
… with CFG as body
- WF.lean: replace .stmts with case-split premises (bodyIsStructured) - Procedure.lean: replace isEmpty with isAbstract/isStructured/isCfg, remove Body.stmts definition - ProcedureInlining test: remove unused getStmts helper - Boole test: use explicit pattern matching with comment - Update all .body.stmts usages to explicit match expressions - Fix ProcBodyVerifyCorrect proof to use new WF structure
Replace 5 instances of the "match body with structured/cfg-error" pattern with the existing getStructured helper, which was defined but unused. Uses .mapError where the caller's error type differs from Except String. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `procedureToGotoCtxViaCFG`, a parallel pipeline that translates Core
procedures to CProver GOTO via the CFG representation:
Structured body → stmtsToCFG → coreCFGToGotoTransform → GOTO
CFG body → coreCFGToGotoTransform → GOTO
This coexists with the existing direct path in `procedureToGotoCtx`, which
remains unchanged.
Changes:
1. CFGToCProverGOTO.lean — close two gaps in `detCFGToGotoTransform`:
- Source locations: transfer command metadata is now used to derive
source locations via `metadataToSourceLoc` (previously ignored as
`_md`).
- Loop contracts: backward-edge GOTOs targeting loop entry blocks are
annotated with `#spec_loop_invariant` and `#spec_decreases` named
fields on the guard, matching CBMC's DFCC expectations. A post-
processing pass detects loop entries (by presence of contract
metadata on their condGoto transfer) and annotates GOTOs whose
target location ≤ source location (i.e., backward edges).
2. CoreToGOTOPipeline.lean — make `renameIdent`, `renameExpr`,
`renameCmd`, and `collectFuncDecls` non-private so the new pipeline
can reuse them.
3. CoreCFGToGOTOPipeline.lean (new) — contains:
- Rename helpers for Core commands (`CmdExt`): `renameCoreCommand`,
`renameCoreStmt`, `renameCoreDetCFG`.
- `coreCFGToGotoTransform`: Core-specific CFG-to-GOTO translation
that handles `CmdExt.call` (emits FUNCTION_CALL instructions) and
delegates `CmdExt.cmd` to `Cmd.toGotoInstructions`. Also handles
source locations and loop contract annotation.
- `procedureToGotoCtxViaCFG`: full pipeline wrapper mirroring
`procedureToGotoCtx` — renaming, type environment, axioms,
distinct declarations, contracts, lifted functions — but routing
through `stmtsToCFG` + `coreCFGToGotoTransform`.
4. E2E_CFGPipeline.lean (new) — 11 equivalence tests that run both
pipelines on the same Core programs and compare:
- Semantic instruction types (DECL, ASSIGN, ASSERT, ASSUME, etc.)
match between direct and CFG paths.
- Contract annotations (#spec_requires, #spec_ensures) match.
- Both paths produce valid, non-null JSON output.
Test programs cover: simple assert, var decl/assign, if-then-else,
contracts, axioms/distinct, free specs, cover, bitvector ops,
assume, multiple commands, and CFG-only output validation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a pull request description template that warns about the upcoming repository split. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Localized merge. --------- Co-authored-by: David Deng <htd@amazon.com>
The `StrataTestMain` test driver spawns `lean` on files under `StrataTestExtra/`, which is not declared as a `lean_lib`. Some of those files (e.g. `DDM/Integration/Java/TestGen.lean`) import modules from the `Strata` library that are not in `StrataTest`'s transitive closure, most notably `Strata.DDM.Integration.Java`. Before this change, `lake test` on a clean `.lake` would only build the closure reachable from `StrataTest` and then fail at subprocess time with: error: object file '.../Strata/DDM/Integration/Java.olean' of module Strata.DDM.Integration.Java does not exist Running `lake build` first masked the bug because the `Strata` library is in `defaultTargets`. Add `Strata` to the `needs` list of `StrataTestMain` so Lake builds the full `Strata` library before the test driver runs, regardless of whether `lake build` was run beforehand. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
…ove unlabeled exit from Core (#1141) *Issue #, if available:* Closes #372 *Description of changes:* Formalizing scoped variable initialization: - Adds scoped environment semantics to blocks: step_block_done now projects the store through the parent store via projectStore, discarding block-local variables on exit. This applies to both StepStmt (deterministic) and StepKleene (nondeterministic). - Adds PostWF helper definition in Specification.lean for postcondition stability under projectStore. - Adds .block constructor to KleeneStmt and corresponding step_block/step_block_body/step_block_done to StepKleene, mirroring the deterministic block semantics. - Changes StmtToKleeneStmt (.block _ bss _) to produce .block b (wrapping in a Kleene block), so the Kleene translation preserves scoping. Removing unlabeled `exit` command (`exit;`, not `exit lbl;`): - The unlabeled `exit` command doesn't have clear meaning when it is inside `while`. In `while cond { ... exit; ... }`, is `exit` equivalent to `continue` or `break` in C/Python? No translation to Core was introducing the unlabeled `exit`, and the small-step semantics wasn't clear about the meaning of `exit` inside a loop. - The type checker of Core (Strata/Languages/Core/StatementType.lean) fails when an unlabeled `exit` appears inside the body of a while loop / if statement (unless it is wrapped by another nested block), but the DDM syntax of core allows it, so from user's perspective this gap is kind of surprising. - This patch removes the unlabeled `exit` case because it can always be simulated by a labeled block + labeled exit. On top of this, this also renames `touchedVars` to `modifiedOrDefinedVars` for clarity, and instead makes `touchedVars` all vars that are read + modified + defined. ### How to review? - Strata/DL/Imperative/StmtSemantics.lean has the most important update: the `.block` constructor now has the input store which will be used to 'project' the output store to the variables that have been defined before. Also, any block-exiting small steps like `step_block_done` will do the projection, to define variables which were defined inside the inner scope. - Correspondingly, the Kleene language was updated to add a notion of block and scoping to its syntax and semantics, otherwise DetToKleene doesn't prove. (Strata/DL/Imperative/KleeneStmt.lean, Strata/DL/Imperative/KleeneStmtSemantics.lean) - StrataTest/DL/Imperative/StepStmtTest.lean has a few new tests that show scoping works well. - DetToKleeneCorrect.lean and ProcBodyVerifyCorrect.lean shows that the top-level statements are not touched after this update. - Strata/DL/Imperative/HasVars.lean has the `touchedVars` update and Strata/Languages/Core/StatementSemantics.lean has some additional well-formedness about evaluator extension that are useful. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
…1109) translateSort already handles .prim .string, but translateTerm had no arm for .prim (.string _): any SMT string literal produced by the encoder would fall through to the catch-all and raise 'unsupported term'. Add the missing arm plus the two string operations already supported in Denote.lean so that the two translators agree on the end-to-end-supported subset. Specifically, add arms for: - .prim (.string s) -> (mkString, toExpr s) - .app .str_length [s] _ -> (mkInt, Int.ofNat s.length) - .app .str_concat as _ -> (mkString, leftAssocOp mkStringAppend as) mkStringAppend uses instHAppendOfAppend + instAppendString, which is the same instance chain Lean elaborates inferInstance to for HAppend String String String. Regression tests in StrataTest/DL/SMT/TranslateTests.lean cover all three arms; each fails on origin/main (catch-all throw observable via #guard_msgs mismatch) and passes with this change. I additionally verified under a Meta.check-enabled harness that the produced Expr type-checks in the kernel, so the (fst = mkString/mkInt, snd = ...) pairs are internally consistent. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. --------- Co-authored-by: Kiro <kiro-agent@users.noreply.github.com> Co-authored-by: keyboardDrummer-bot <keyboardDrummer-bot@users.noreply.github.com>
…1140) Fixes #1139 Introduces a `Provenance` type that fully replaces `FileRange` and `SourceRange` as the canonical way to store source locations in metadata. This structurally eliminates the "SourceRange without file name" problem. The `Provenance` type has two constructors: - `Provenance.loc uri range` — a real source location (always requires both URI and range) - `Provenance.synthesized origin` — a node created programmatically, with a `SynthesizedOrigin` inductive type A `SynthesizedOrigin` inductive enforces that only canonical origins are used (`smtEncode`, `nondetIte`, `laurelParse`, `laurel`, `laurelToCore`, `structuredToUnstructured`), preventing typos and duplicates at the type level. Key changes: - The `.fileRange` variant is removed from `MetaDataElem.Value` — all metadata values now use `.provenance` exclusively - `MetaData.ofSourceRange` emits only a provenance element - `getProvenance` is the single source of truth for reading source locations from metadata - `setCallSiteFileRange` works directly with `Provenance` instead of roundtripping through `FileRange` - `getFileRange` delegates to `getProvenance` for extraction - `FileRange.unknown` and `SourceRange.none` eliminated from metadata construction - SMT DDM translator uses `smtAnn` combinator to reduce annotation boilerplate - SARIF output uses `getFileRange` (which reads provenance) `FileRange` remains as a utility struct for extraction and formatting (e.g., in `DiagnosticModel`), but is no longer a metadata value type. Existing tests pass unchanged. ## Follow-up - Migrate the B3, Boole, and Python grammar ASTs from using `SourceRange` as their annotation type parameter to `Provenance`. This would allow combining multiple files at the AST level and enable proper provenance tracking through translation passes.
Conflicts: - Strata/Transform/StructuredToUnstructured.lean: keep branch's loop-contract metadata (specLoopInvariant/specDecreases pushed to the transfer) and exit metadata propagation, while adopting main's String-only exit label and synthesizedMd-tagged synthesized commands. - Strata/Transform/ProcBodyVerifyCorrect.lean: adopt main's new Config.block (Option String × σ_parent × inner) shape, but keep the branch's destructured `ss : List Statement` (from procToVerifyStmt_is_structured) instead of `proc.body : Procedure.Body`. Follow-on fixes: - ProcBodyVerifyCorrect.lean lines 856/859/862: replace `proc.body` with `ss` so the new wf-preservation lemmas (core_wfVar_preserved, core_wfCong_preserved, core_wfExprCongr_preserved) typecheck. - Loops.lean / Exit.lean snapshots: replace `[fileRange]` tag with `[provenance]` and update shifted byte offsets to match the new Provenance metadata format introduced by #1140.
…seArrayTheory option. (#1145) *Description of changes:* **Bug:** When `useArrayTheory` is enabled, the SMT encoder correctly converts `Map` types to `Array` in variable declarations and function signatures, but it did *not* perform this conversion for fields inside ADT (algebraic data type) constructor declarations. This caused a type mismatch: a datatype field would be declared with type `Map` while the rest of the encoding used `Array`. **Fix:** Thread the `useArrayTheory` flag through the datatype emission pipeline: - `lMonoTyToTermType` now accepts `useArrayTheory` and converts `Map` to `Array` when enabled. - `datatypeConstructorsToSMT` passes the flag to `lMonoTyToTermType`. - `SMT.Context.emitDatatypes` accepts and forwards the flag. - `encodeCore` in `Verifier.lean` passes `options.useArrayTheory` to `emitDatatypes`. **Test:** Added a unit test in `SMTEncoderDatatypeTest.lean` that verifies a datatype with a `Map`-typed field emits `(Array Int Int)` when `useArrayTheory=true` and `(Map Int Int)` when `useArrayTheory=false`. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. --------- Co-authored-by: keyboardDrummer-bot <keyboardDrummer-bot@users.noreply.github.com>
…structs (#1165) Fixes #1158 Supersedes #1036 ## Summary Fixes several issues with `Core.formatProgram` that prevented round-trip parsing: 1. **`inline function` formatted without space** — The `inline` grammar op now includes a trailing space, preventing `inlinefunction` concatenation. The formatter also now emits the `inline` attribute when present on functions. 2. **Quantifier variable names** — The formatter now uses the `prettyName` field from `LExpr.quant` instead of generating `__qN` names, preserving the original variable names through formatting. 3. **Overflow predicates** — Added grammar entries, formatter handling, parser entries, and Factory operations for all bitvector overflow predicates (`SNegOverflow`, `UNegOverflow`, `SAddOverflow`, `SSubOverflow`, `SMulOverflow`, `SDivOverflow`, `UAddOverflow`, `USubOverflow`, `UMulOverflow`). `SNegOverflow` and `UNegOverflow` use distinct grammar entries (`Bv.SNegOverflow` / `Bv.UNegOverflow`) to preserve their different semantics through roundtrip. The translate direction correctly dispatches overflow ops by type parameter (bv1/bv8/bv16/bv32/bv64), logging an error for unsupported widths. 4. **Array assignment (`lhsArray`)** — Implemented `m[k] := v` translation in both directions: the parser decomposes nested LHS into base identifier + indices and builds `map_update` expressions; the formatter detects the `map_update(var, idx, val)` pattern and produces `lhsArray` syntax. 5. **`Sequence.empty`** — Added grammar entry with explicit type annotation syntax (`Sequence.empty<int>()`), plus translate and format handling, resolving the 0-ary polymorphic function limitation for this operation. Includes solver verification tests (cherry-picked from #1036) covering basic usage and various element types. 6. **Datatype roundtrip** — Verified that datatype formatting roundtrips correctly (the extra-parentheses issue noted earlier is resolved). 7. **Roundtrip test infrastructure** — Added `RoundtripTest.lean` that verifies parse → format → re-parse → re-format produces stable output for types, functions, procedures, inline functions, parameterized type arguments, datatypes, array assignments, and `Sequence.empty`. 8. **`getLFuncCall` reuse** — Refactored `lappToExpr` and `decomposeMapUpdate` to use `getLFuncCall` for decomposing nested applications, reducing code duplication with the existing utility. ## Testing - All existing tests pass (with expected output updates for the formatting improvements) - New roundtrip tests verify types, functions, procedures, inline functions, parameterized type arguments, datatypes, lhsArray, and Sequence.empty - Solver verification tests for Sequence.empty (from #1036) pass ## Remaining limitations - Bitvector operations with widths beyond 1/8/16/32/64 log an error and fall back to bv64 (requires grammar-level type entries per width) --------- Co-authored-by: Aaron Tomb <aarotomb@amazon.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add Support for Unstructured Programs in Strata Core
Description of changes
Prior to this PR, the only way to obtain unstructured programs was to apply
StructuredToUnstructuredtransformation..core.stfile examples illustrating the syntax of unstructured programs.Procedure.bodyto handle thecfgcase based on the contextAdaptation Methods for
Procedure.bodyUsespipelines, graph-level inlining).
Procedure.Bodysum type.(Bolded items are updates since the initial PR draft)
postconditionsValidadapted withCoreBodyExecTransform/CoreSpecification.leanProcedureCorrect.postconditionsValidTransform/ProcBodyVerifyCorrect.leanprocToVerifyStmt_is_structured(new helper theorem)sswitnesses viasubstTransform/ProcBodyVerifyCorrect.leanprocBodyVerify_procedureCorrectLanguages/Core/ObligationExtraction.leanextractGo_okproofLanguages/Core/ProcedureEval.leanevalLanguages/Core/ProcedureType.leantypeCheck,checkModificationRightsprocedureToGotoCtxViaCFGBackends/CBMC/GOTO/CoreToCProverGOTO.leantransformToGotoprocedureToGotoCtxViaCFGBackends/CBMC/GOTO/CoreToGOTOPipeline.leanprocedureToGotoCtxprocedureToGotoCtxViaCFGStrataTest/.../E2E_CoreToGOTO.leancoreToGotoJsonWithSummaryprocedureToGotoCtxViaCFGBackends/CBMC/CoreToCBMC.leancreateImplementationSymbolFromASTextractCallsFromStatementsvsextractCallsFromDetCFGLanguages/Core/CallGraph.leanextractCallsFromProcedureblockToCSTvsdetCFGToCSTLanguages/Core/DDMTransform/FormatCore.leanprocToCSTrunStmtvsrunCFGLanguages/Core/StatementEval.leanCommand.runCallCoreBodyExecwith two constructors, andCoreStepStarvsCoreCFGStepStarLanguages/Core/StatementSemantics.leanEvalCommand.call_semLang.corevsLang.coreCFGTransform/CoreSpecification.leanAssertValidInProceduretransformStmtsvstransformDetCFGTransform/PrecondElim.leanprecondElimeraseTypes,stripMetaData,getVarsimplemented and used for CFGLanguages/Core/Procedure.leaneraseTypes,stripMetaData,getVarsextractFromStatementsvsextractFromDetCFGLanguages/Core/ObligationExtraction.leanextractObligationsreturn .noneon CFG, skip inlining, because the inlined procedure could be structured or unstructured. Semantics is largely undefined.Transform/ProcedureInlining.leaninlineCallCmdstmtsToCFGvs identity, latter is a no-op.StrataTest/.../Loops.leansingleCFGTransform/LoopElim.leanloopElim[]for CFG (no local funcDecls in CFG bodies)Languages/Core/Core.leanbuildEnvLanguages/Core/ObligationExtraction.leanextractFromDetCFGthrowon CFG, due to incompatible return type ofList Statementinstead ofList CommandTransform/CoreTransform.leanrunProgramthrowon CFG, VC for CFG bodies to be implemented, proof to be adapted as well.Transform/ProcBodyVerify.leanprocToVerifyStmtTransform/ANFEncoder.leananfEncodeProgramTransform/ProcedureInlining.leanrenameAllLocalNamesStrataTest/Boole/global_readonly_call.leancallHelperwfstmts,wfloclnd,bodyExitsCoveredconditioned on structured procedures, props for CFG to be implementedLanguages/Core/WF.leanWFProcedurePropBy submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.