Skip to content

Add Concrete Syntax for Unstructured Programs in Strata Core#1196

Open
PROgram52bc wants to merge 67 commits into
main2from
htd/unstructured-procedure
Open

Add Concrete Syntax for Unstructured Programs in Strata Core#1196
PROgram52bc wants to merge 67 commits into
main2from
htd/unstructured-procedure

Conversation

@PROgram52bc
Copy link
Copy Markdown
Contributor

Add Support for Unstructured Programs in Strata Core

Description of changes

  • Adds syntax and parser support for unstructured (CFG-based) procedure bodies in Strata Core. For example,
cfg entry { 
    // Deterministic block (default): 
    entry: {  
      x := 0; 
      if (x < n) goto entry else done;       
    } 
    done: {
      return; 
    }
  }
  
cfg entry {
    // Nondeterministic block:   
    entry: {   
      y := y + 1;     
      goto entry, done;   // nondeterministic choice 
    }
    done: {
      return; 
    }
  }

Prior to this PR, the only way to obtain unstructured programs was to apply StructuredToUnstructured transformation.

  • Adds .core.st file examples illustrating the syntax of unstructured programs.
  • Introduces Procedure.Body as a sum type:
  inductive Procedure.Body where
    | structured : List Statement → Procedure.Body
    | cfg : DetCFG → Procedure.Body
  • Adapts uses of Procedure.body to handle the cfg case based on the context
  • Adds metadata support to unstructured programs (new since initial PR draft)
  • Propagates loop invariants/decreases measure to unstructured CFG during transformation (new since initial PR draft)
  • Adds CFG-based Core-to-GOTO pipeline alongside direct path (new since initial PR draft)

Adaptation Methods for Procedure.body Uses

  1. CFG (Implemented) — Functions that handle both structured and CFG bodies with meaningful separate logic for each case.
  2. CFG (N/A) — Operations where CFGs legitimately contribute nothing (e.g., no local funcDecls, no structured loops), so returning [] or skipping is semantically correct.
  3. CFG (deferred) — Operations that currently error on CFGs or produce placeholder results, where proper CFG support is desirable but requires non-trivial additional work (e.g., dominator analysis, two-stage
    pipelines, graph-level inlining).
  4. Proof (Implemented) — Formal correctness proofs adapted to the Procedure.Body sum type.
  5. Proof (deferred) — Proof obligations where CFG correctness is not yet captured; the current formulation is trivially true for CFGs and is left for future PRs.

(Bolded items are updates since the initial PR draft)

Category Specific Strategy File Function/Location
Proof (Implemented) postconditionsValid adapted with CoreBodyExec Transform/CoreSpecification.lean ProcedureCorrect.postconditionsValid
Proves structured if transform succeeds Transform/ProcBodyVerifyCorrect.lean procToVerifyStmt_is_structured (new helper theorem)
Unifies two ss witnesses via subst Transform/ProcBodyVerifyCorrect.lean procBodyVerify_procedureCorrect
Adjusted case splits Languages/Core/ObligationExtraction.lean extractGo_ok proof
CFG (Implemented) Symbolic evaluation of CFG bodies using fuel measure, without path merging. Languages/Core/ProcedureEval.lean eval
Type-check CFG bodies (labels, vars, targets) implemented. Languages/Core/ProcedureType.lean typeCheck, checkModificationRights
Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG Backends/CBMC/GOTO/CoreToCProverGOTO.lean transformToGoto
Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG Backends/CBMC/GOTO/CoreToGOTOPipeline.lean procedureToGotoCtx
Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG StrataTest/.../E2E_CoreToGOTO.lean coreToGotoJsonWithSummary
Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG Backends/CBMC/CoreToCBMC.lean createImplementationSymbolFromAST
extractCallsFromStatements vs extractCallsFromDetCFG Languages/Core/CallGraph.lean extractCallsFromProcedure
blockToCST vs detCFGToCST Languages/Core/DDMTransform/FormatCore.lean procToCST
runStmt vs runCFG Languages/Core/StatementEval.lean Command.runCall
Use CoreBodyExec with two constructors, and CoreStepStar vs CoreCFGStepStar Languages/Core/StatementSemantics.lean EvalCommand.call_sem
Lang.core vs Lang.coreCFG Transform/CoreSpecification.lean AssertValidInProcedure
transformStmts vs transformDetCFG Transform/PrecondElim.lean precondElim
eraseTypes, stripMetaData, getVars implemented and used for CFG Languages/Core/Procedure.lean eraseTypes, stripMetaData, getVars
extractFromStatements vs extractFromDetCFG Languages/Core/ObligationExtraction.lean extractObligations
CFG (N/A) return .none on CFG, skip inlining, because the inlined procedure could be structured or unstructured. Semantics is largely undefined. Transform/ProcedureInlining.lean inlineCallCmd
stmtsToCFG vs identity, latter is a no-op. StrataTest/.../Loops.lean singleCFG
Remove loops in structured, pass CFG through because it has no loops Transform/LoopElim.lean loopElim
[] for CFG (no local funcDecls in CFG bodies) Languages/Core/Core.lean buildEnv
CFG (deferred) Dominator-based path-condition propagation to be implemented Languages/Core/ObligationExtraction.lean extractFromDetCFG
throw on CFG, due to incompatible return type of List Statement instead of List Command Transform/CoreTransform.lean runProgram
throw on CFG, VC for CFG bodies to be implemented, proof to be adapted as well. Transform/ProcBodyVerify.lean procToVerifyStmt
Encode structured procedures, handling for CFG procedures to be implemented. Transform/ANFEncoder.lean anfEncodeProgram
CFG body renaming for inlining to be implemented Transform/ProcedureInlining.lean renameAllLocalNames
Call extraction for CFG bodies to be implemented StrataTest/Boole/global_readonly_call.lean callHelper
Proof (deferred) WF Properties wfstmts, wfloclnd, bodyExitsCovered conditioned on structured procedures, props for CFG to be implemented Languages/Core/WF.lean WFProcedureProp

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

David Deng added 30 commits May 5, 2026 11:24
Address PR review comments: note that the GOTO backend could be
refactored into a two-stage pipeline (structured→cfg, then cfg→GOTO)
to eliminate the pattern matching on Procedure.Body.
- Add Body.getCfg accessor (mirrors getStructured)
- CoreToCBMC: throw error on CFG body instead of returning []
- StatementEval: interpret CFG bodies by linearizing blocks
- Grammar: add comment explaining why 'branch' is used instead of 'if'
  (DDM registers tokens globally, causing conflict with if-statement)
- WF.lean: replace .stmts with case-split premises (bodyIsStructured)
- Procedure.lean: replace isEmpty with isAbstract/isStructured/isCfg,
  remove Body.stmts definition
- ProcedureInlining test: remove unused getStmts helper
- Boole test: use explicit pattern matching with comment
- Update all .body.stmts usages to explicit match expressions
- Fix ProcBodyVerifyCorrect proof to use new WF structure
David Deng and others added 16 commits May 12, 2026 16:41
Replace 5 instances of the "match body with structured/cfg-error"
pattern with the existing getStructured helper, which was defined
but unused. Uses .mapError where the caller's error type differs
from Except String.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `procedureToGotoCtxViaCFG`, a parallel pipeline that translates Core
procedures to CProver GOTO via the CFG representation:

  Structured body → stmtsToCFG → coreCFGToGotoTransform → GOTO
  CFG body → coreCFGToGotoTransform → GOTO

This coexists with the existing direct path in `procedureToGotoCtx`, which
remains unchanged.

Changes:

1. CFGToCProverGOTO.lean — close two gaps in `detCFGToGotoTransform`:
   - Source locations: transfer command metadata is now used to derive
     source locations via `metadataToSourceLoc` (previously ignored as
     `_md`).
   - Loop contracts: backward-edge GOTOs targeting loop entry blocks are
     annotated with `#spec_loop_invariant` and `#spec_decreases` named
     fields on the guard, matching CBMC's DFCC expectations. A post-
     processing pass detects loop entries (by presence of contract
     metadata on their condGoto transfer) and annotates GOTOs whose
     target location ≤ source location (i.e., backward edges).

2. CoreToGOTOPipeline.lean — make `renameIdent`, `renameExpr`,
   `renameCmd`, and `collectFuncDecls` non-private so the new pipeline
   can reuse them.

3. CoreCFGToGOTOPipeline.lean (new) — contains:
   - Rename helpers for Core commands (`CmdExt`): `renameCoreCommand`,
     `renameCoreStmt`, `renameCoreDetCFG`.
   - `coreCFGToGotoTransform`: Core-specific CFG-to-GOTO translation
     that handles `CmdExt.call` (emits FUNCTION_CALL instructions) and
     delegates `CmdExt.cmd` to `Cmd.toGotoInstructions`. Also handles
     source locations and loop contract annotation.
   - `procedureToGotoCtxViaCFG`: full pipeline wrapper mirroring
     `procedureToGotoCtx` — renaming, type environment, axioms,
     distinct declarations, contracts, lifted functions — but routing
     through `stmtsToCFG` + `coreCFGToGotoTransform`.

4. E2E_CFGPipeline.lean (new) — 11 equivalence tests that run both
   pipelines on the same Core programs and compare:
   - Semantic instruction types (DECL, ASSIGN, ASSERT, ASSUME, etc.)
     match between direct and CFG paths.
   - Contract annotations (#spec_requires, #spec_ensures) match.
   - Both paths produce valid, non-null JSON output.
   Test programs cover: simple assert, var decl/assign, if-then-else,
   contracts, axioms/distinct, free specs, cover, bitvector ops,
   assume, multiple commands, and CFG-only output validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a pull request description template that warns about the upcoming
repository split.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
Localized merge.

---------

Co-authored-by: David Deng <htd@amazon.com>
The `StrataTestMain` test driver spawns `lean` on files under
`StrataTestExtra/`, which is not declared as a `lean_lib`. Some of those
files (e.g. `DDM/Integration/Java/TestGen.lean`) import modules from the
`Strata` library that are not in `StrataTest`'s transitive closure, most
notably `Strata.DDM.Integration.Java`.

Before this change, `lake test` on a clean `.lake` would only build the
closure reachable from `StrataTest` and then fail at subprocess time
with:

  error: object file '.../Strata/DDM/Integration/Java.olean' of module
  Strata.DDM.Integration.Java does not exist

Running `lake build` first masked the bug because the `Strata` library
is in `defaultTargets`.

Add `Strata` to the `needs` list of `StrataTestMain` so Lake builds the
full `Strata` library before the test driver runs, regardless of whether
`lake build` was run beforehand.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
…ove unlabeled exit from Core (#1141)

*Issue #, if available:*

Closes #372

*Description of changes:*

Formalizing scoped variable initialization:

- Adds scoped environment semantics to blocks: step_block_done now
projects the store through the parent store via projectStore, discarding
block-local variables on exit. This applies to both StepStmt
(deterministic) and StepKleene
  (nondeterministic).
- Adds PostWF helper definition in Specification.lean for postcondition
stability under projectStore.
- Adds .block constructor to KleeneStmt and corresponding
step_block/step_block_body/step_block_done to StepKleene, mirroring the
deterministic block semantics.
- Changes StmtToKleeneStmt (.block _ bss _) to produce .block b
(wrapping in a Kleene block), so the Kleene translation preserves
scoping.

Removing unlabeled `exit` command (`exit;`, not `exit lbl;`):

- The unlabeled `exit` command doesn't have clear meaning when it is
inside `while`. In `while cond { ... exit; ... }`, is `exit` equivalent
to `continue` or `break` in C/Python? No translation to Core was
introducing the unlabeled `exit`, and the small-step semantics wasn't
clear about the meaning of `exit` inside a loop.
- The type checker of Core (Strata/Languages/Core/StatementType.lean)
fails when an unlabeled `exit` appears inside the body of a while loop /
if statement (unless it is wrapped by another nested block), but the DDM
syntax of core allows it, so from user's perspective this gap is kind of
surprising.
- This patch removes the unlabeled `exit` case because it can always be
simulated by a labeled block + labeled exit.

On top of this, this also renames `touchedVars` to
`modifiedOrDefinedVars` for clarity, and instead makes `touchedVars` all
vars that are read + modified + defined.

### How to review?

- Strata/DL/Imperative/StmtSemantics.lean has the most important update:
the `.block` constructor now has the input store which will be used to
'project' the output store to the variables that have been defined
before. Also, any block-exiting small steps like `step_block_done` will
do the projection, to define variables which were defined inside the
inner scope.
- Correspondingly, the Kleene language was updated to add a notion of
block and scoping to its syntax and semantics, otherwise DetToKleene
doesn't prove. (Strata/DL/Imperative/KleeneStmt.lean,
Strata/DL/Imperative/KleeneStmtSemantics.lean)
- StrataTest/DL/Imperative/StepStmtTest.lean has a few new tests that
show scoping works well.
- DetToKleeneCorrect.lean and ProcBodyVerifyCorrect.lean shows that the
top-level statements are not touched after this update.
- Strata/DL/Imperative/HasVars.lean has the `touchedVars` update and
Strata/Languages/Core/StatementSemantics.lean has some additional
well-formedness about evaluator extension that are useful.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
…1109)

translateSort already handles .prim .string, but translateTerm had no
arm for .prim (.string _): any SMT string literal produced by the
encoder would fall through to the catch-all and raise 'unsupported
term'. Add the missing arm plus the two string operations already
supported in Denote.lean so that the two translators agree on the
end-to-end-supported subset.

Specifically, add arms for:
- .prim (.string s) -> (mkString, toExpr s)
- .app .str_length [s] _ -> (mkInt, Int.ofNat s.length)
- .app .str_concat as _ -> (mkString, leftAssocOp mkStringAppend as)

mkStringAppend uses instHAppendOfAppend + instAppendString, which is the
same instance chain Lean elaborates inferInstance to for HAppend String
String String.

Regression tests in StrataTest/DL/SMT/TranslateTests.lean cover all
three arms; each fails on origin/main (catch-all throw observable via
#guard_msgs mismatch) and passes with this change. I additionally
verified under a Meta.check-enabled harness that the produced Expr
type-checks in the kernel, so the (fst = mkString/mkInt, snd = ...)
pairs are internally consistent.


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Co-authored-by: keyboardDrummer-bot <keyboardDrummer-bot@users.noreply.github.com>
…1140)

Fixes #1139

Introduces a `Provenance` type that fully replaces `FileRange` and
`SourceRange` as the canonical way to store source locations in
metadata. This structurally eliminates the "SourceRange without file
name" problem.

The `Provenance` type has two constructors:
- `Provenance.loc uri range` — a real source location (always requires
both URI and range)
- `Provenance.synthesized origin` — a node created programmatically,
with a `SynthesizedOrigin` inductive type

A `SynthesizedOrigin` inductive enforces that only canonical origins are
used (`smtEncode`, `nondetIte`, `laurelParse`, `laurel`, `laurelToCore`,
`structuredToUnstructured`), preventing typos and duplicates at the type
level.

Key changes:
- The `.fileRange` variant is removed from `MetaDataElem.Value` — all
metadata values now use `.provenance` exclusively
- `MetaData.ofSourceRange` emits only a provenance element
- `getProvenance` is the single source of truth for reading source
locations from metadata
- `setCallSiteFileRange` works directly with `Provenance` instead of
roundtripping through `FileRange`
- `getFileRange` delegates to `getProvenance` for extraction
- `FileRange.unknown` and `SourceRange.none` eliminated from metadata
construction
- SMT DDM translator uses `smtAnn` combinator to reduce annotation
boilerplate
- SARIF output uses `getFileRange` (which reads provenance)

`FileRange` remains as a utility struct for extraction and formatting
(e.g., in `DiagnosticModel`), but is no longer a metadata value type.

Existing tests pass unchanged.

## Follow-up

- Migrate the B3, Boole, and Python grammar ASTs from using
`SourceRange` as their annotation type parameter to `Provenance`. This
would allow combining multiple files at the AST level and enable proper
provenance tracking through translation passes.
Conflicts:
- Strata/Transform/StructuredToUnstructured.lean: keep branch's
  loop-contract metadata (specLoopInvariant/specDecreases pushed to the
  transfer) and exit metadata propagation, while adopting main's
  String-only exit label and synthesizedMd-tagged synthesized commands.
- Strata/Transform/ProcBodyVerifyCorrect.lean: adopt main's new
  Config.block (Option String × σ_parent × inner) shape, but keep the
  branch's destructured `ss : List Statement` (from procToVerifyStmt_is_structured)
  instead of `proc.body : Procedure.Body`.

Follow-on fixes:
- ProcBodyVerifyCorrect.lean lines 856/859/862: replace `proc.body` with
  `ss` so the new wf-preservation lemmas (core_wfVar_preserved,
  core_wfCong_preserved, core_wfExprCongr_preserved) typecheck.
- Loops.lean / Exit.lean snapshots: replace `[fileRange]` tag with
  `[provenance]` and update shifted byte offsets to match the new
  Provenance metadata format introduced by #1140.
…seArrayTheory option. (#1145)

*Description of changes:*

**Bug:** When `useArrayTheory` is enabled, the SMT encoder correctly
converts `Map` types to `Array` in variable declarations and function
signatures, but it did *not* perform this conversion for fields inside
ADT (algebraic data type) constructor declarations. This caused a type
mismatch: a datatype field would be declared with type `Map` while the
rest of the encoding used `Array`.

**Fix:** Thread the `useArrayTheory` flag through the datatype emission
pipeline:
- `lMonoTyToTermType` now accepts `useArrayTheory` and converts `Map` to
`Array` when enabled.
- `datatypeConstructorsToSMT` passes the flag to `lMonoTyToTermType`.
- `SMT.Context.emitDatatypes` accepts and forwards the flag.
- `encodeCore` in `Verifier.lean` passes `options.useArrayTheory` to
`emitDatatypes`.

**Test:** Added a unit test in `SMTEncoderDatatypeTest.lean` that
verifies a datatype with a `Map`-typed field emits `(Array Int Int)`
when `useArrayTheory=true` and `(Map Int Int)` when
`useArrayTheory=false`.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: keyboardDrummer-bot <keyboardDrummer-bot@users.noreply.github.com>
…structs (#1165)

Fixes #1158
Supersedes #1036

## Summary

Fixes several issues with `Core.formatProgram` that prevented round-trip
parsing:

1. **`inline function` formatted without space** — The `inline` grammar
op now includes a trailing space, preventing `inlinefunction`
concatenation. The formatter also now emits the `inline` attribute when
present on functions.

2. **Quantifier variable names** — The formatter now uses the
`prettyName` field from `LExpr.quant` instead of generating `__qN`
names, preserving the original variable names through formatting.

3. **Overflow predicates** — Added grammar entries, formatter handling,
parser entries, and Factory operations for all bitvector overflow
predicates (`SNegOverflow`, `UNegOverflow`, `SAddOverflow`,
`SSubOverflow`, `SMulOverflow`, `SDivOverflow`, `UAddOverflow`,
`USubOverflow`, `UMulOverflow`). `SNegOverflow` and `UNegOverflow` use
distinct grammar entries (`Bv.SNegOverflow` / `Bv.UNegOverflow`) to
preserve their different semantics through roundtrip. The translate
direction correctly dispatches overflow ops by type parameter
(bv1/bv8/bv16/bv32/bv64), logging an error for unsupported widths.

4. **Array assignment (`lhsArray`)** — Implemented `m[k] := v`
translation in both directions: the parser decomposes nested LHS into
base identifier + indices and builds `map_update` expressions; the
formatter detects the `map_update(var, idx, val)` pattern and produces
`lhsArray` syntax.

5. **`Sequence.empty`** — Added grammar entry with explicit type
annotation syntax (`Sequence.empty<int>()`), plus translate and format
handling, resolving the 0-ary polymorphic function limitation for this
operation. Includes solver verification tests (cherry-picked from #1036)
covering basic usage and various element types.

6. **Datatype roundtrip** — Verified that datatype formatting roundtrips
correctly (the extra-parentheses issue noted earlier is resolved).

7. **Roundtrip test infrastructure** — Added `RoundtripTest.lean` that
verifies parse → format → re-parse → re-format produces stable output
for types, functions, procedures, inline functions, parameterized type
arguments, datatypes, array assignments, and `Sequence.empty`.

8. **`getLFuncCall` reuse** — Refactored `lappToExpr` and
`decomposeMapUpdate` to use `getLFuncCall` for decomposing nested
applications, reducing code duplication with the existing utility.

## Testing

- All existing tests pass (with expected output updates for the
formatting improvements)
- New roundtrip tests verify types, functions, procedures, inline
functions, parameterized type arguments, datatypes, lhsArray, and
Sequence.empty
- Solver verification tests for Sequence.empty (from #1036) pass

## Remaining limitations

- Bitvector operations with widths beyond 1/8/16/32/64 log an error and
fall back to bv64 (requires grammar-level type entries per width)

---------

Co-authored-by: Aaron Tomb <aarotomb@amazon.com>
@PROgram52bc PROgram52bc changed the title Htd/unstructured procedure Add Support for Unstructured Programs in Strata Core May 20, 2026
@github-actions github-actions Bot added Laurel Python Core GOTO dependencies Pull requests that update a dependency file github_actions Pull requests that update GitHub Actions code SMT Git conflicts labels May 20, 2026
@PROgram52bc PROgram52bc changed the title Add Support for Unstructured Programs in Strata Core Add Concrete Syntax for Unstructured Programs in Strata Core May 20, 2026
@PROgram52bc PROgram52bc marked this pull request as ready for review May 20, 2026 21:13
@PROgram52bc PROgram52bc requested a review from a team as a code owner May 20, 2026 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Core dependencies Pull requests that update a dependency file Git conflicts github_actions Pull requests that update GitHub Actions code GOTO Laurel Python SMT Waiting-For-Review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants