baml-experimental

A fork of BAML — an open-source typed language for building reliable LLM workflows and agents. This repository tracks the upstream project and layers on four experimental capabilities.

Read about upstream BAML first: docs.boundaryml.com and the main repo at github.com/BoundaryML/baml.

Everything upstream still works exactly as documented — this fork is strictly additive. The experimental features live alongside the standard BAML workspace.

What this fork adds

1. Tower of BAML — nondeterministic verifiable transpilation

./tower_of_baml/ — a CLI that transpiles a Python function into BAML and then uses Z3 to prove the two are equivalent for every possible input. It generates candidate BAML with claude -p and rejects any candidate that isn't formally verified by an SMT solver.

Four verification modes:

arithmetic — integer functions, Z3 integer theory
loop — while loops with LLM-generated loop invariants and Hoare-triple discharge
string — string-concatenation functions, Z3 string theory
http — OpenAI-style chat-completion calls via http.client, verified by rendering both sides with a sentinel and proving byte-identity of the outgoing HTTP request via Z3 string theory

Based on the LLMLift methodology (Bhatia et al., NeurIPS 2024, arXiv:2406.03003). One verified candidate usually lands on the first LLM attempt; the Z3 unsat result is a proof of equivalence over all inputs, not a batch of passing tests.

cd tower_of_baml && cargo build --release
./target/release/tower examples/add.py --mode arithmetic
./target/release/tower examples/sum_to.py --mode loop \
    --pre "n >= 0" --spec "n*(n+1)/2"
./target/release/tower examples/build_url.py --mode string
./target/release/tower examples/classify.py --mode http

See tower_of_baml/README.md and tower_of_baml/docs/ARCHITECTURE.md for the full design.

Install pre-built binaries (macOS arm64/x86_64, Linux x86_64):

curl -sSf https://raw.githubusercontent.com/gambitinc/baml-experimental/canary/install.sh | bash

The installer detects your platform, downloads the matching release tarball, drops tower and city into ~/.local/bin/, and checks for the three runtime deps (z3, baml-cli, claude CLI). Maintainers cutting a release: see docs/RELEASE.md.

./city/ is the codebase-scale companion: walks an entire Python tree, auto-classifies each file into a tower mode (or skips it), runs tower on every candidate, and mirrors the result into a sibling BAML tree — the original source is untouched. Loop-mode files declare their precondition and spec via @tower-pre: / @tower-spec: docstring annotations; everything else is fully automatic.

cd city && cargo build --release
./target/release/city \
    ../tower_of_baml/tests/sample_codebase \
    /tmp/sample_baml -v

A small fixture codebase lives at tower_of_baml/tests/sample_codebase/ covering arithmetic, string, loop, and various non-transpilable patterns — useful for testing both city and tower together.

city in action

A realistic demo lives at examples/content_pipeline/ — a twelve-file research-assistant workflow (ingest/, agents/, utils/, nontranspilable/, config/). It covers every tower mode and every skip reason. The following is a literal play-by-play of running

./city/target/release/city \
    examples/content_pipeline \
    /tmp/pipeline_baml -v

Step 1 — Walk the source tree. city's WalkDir enumerates every file under content_pipeline/ depth-first. Directories are mirrored into the destination as empty stubs; the README.md is copied verbatim (not a .py file).

Step 2 — config/settings.py (no function). city's classifier parses the source, finds no top-level def, emits Mode2::Skip { reason: "no_top_level_function" }. Copied through to /tmp/pipeline_baml/config/settings.py unchanged.

[skip] config/settings.py: no_top_level_function

Step 3 — ingest/fetch_headline.py (http mode). The classifier walks the function body, sees http.client.HTTPSConnection(...) plus .request("POST", ...), and classifies as Mode2::Http. city shells out to tower --mode http, which spawns a Python subprocess with a monkey-patched http.client to capture the outgoing request, calls $build_request on the LLM-generated BAML candidate, normalizes both, partitions at a sentinel, and asks Z3 for a counterexample. Z3 returns unsat; the BAML is written to /tmp/pipeline_baml/ingest/fetch_headline.baml.

[run] ingest/fetch_headline.py → http (fetch_headline)
  ✓ ingest/fetch_headline.py

Step 4 — agents/classify_topic.py (http-static mode, openai SDK). The classifier walks the body, sees <client>.chat.completions.create(...), emits Mode2::HttpStatic { sdk: "openai" }. city dispatches to tower --mode http-static. Tower's openai contract extracts the call's kwargs as a symbolic JSON tree, substitutes a sentinel for the text parameter, canonicalizes to a matching byte pattern, invokes BAML's $build_request with the same sentinel, and Z3 proves byte-identity without ever executing the Python module.

    (sdk detected: openai)
[run] agents/classify_topic.py → http-static (classify_topic)
  ✓ agents/classify_topic.py

Step 5 — agents/compose_brief.py (http-static, openai SDK, multi-param via f-string). The function takes two string parameters topic and text, interleaved inside a single user message via f"Topic: {topic}\nArticle text:\n{text}". The static verifier assigns one sentinel to each parameter (xSNTNL000PLACEHLDRx, xSNTNL001PLACEHLDRx), produces a BodyNode::StringTemplate for the f-string, canonicalizes to "Topic: xSNTNL000PLACEHLDRx\nArticle text:\nxSNTNL001PLACEHLDRx", and partitions the body into a slot sequence: [Fixed, Param("topic"), Fixed, Param("text"), Fixed]. BAML emits the same slot shape via Topic: {{ topic }}\nArticle text:\n{{ text }}. Z3 builds Concat(p0, topic_var, p1, text_var, p2) on each side, asserts inequality, returns unsat. Proven over String × String.

    (sdk detected: openai)
[run] agents/compose_brief.py → http-static (compose_brief)
  ✓ agents/compose_brief.py

Step 6 — agents/summarize.py (http-static, LiteLLM). Classifier sees litellm.completion(...), emits Mode2::HttpStatic { sdk: "litellm" }. tower's litellm contract reads the model="openai/gpt-4o-mini" kwarg, splits on /, and dispatches to the openai endpoint contract with the prefix stripped from the model name. The resulting request routes to api.openai.com/v1/chat/completions — matching what BAML emits for a provider openai client. Proven.

    (sdk detected: litellm)
[run] agents/summarize.py → http-static (summarize)
  ✓ agents/summarize.py

Step 7 — agents/extract_entities.py (http-static, anthropic). Classifier sees <client>.messages.create(...) and emits Mode2::HttpStatic { sdk: "anthropic" }. tower's anthropic contract knows the endpoint is api.anthropic.com/v1/messages, auth is via x-api-key + anthropic-version, and the system kwarg is top-level (not part of messages). BAML emits system as a structured-content array ([{"type":"text","text":"..."}]); normalization unwraps that shape to plain string before comparison. Proven.

    (sdk detected: anthropic)
[run] agents/extract_entities.py → http-static (extract_entities)
  ✓ agents/extract_entities.py

Step 8 — utils/tally_matches.py (loop mode). Classifier strips the leading docstring, sees a while loop followed by a return, and reads @tower-pre: n >= 0 and @tower-spec: n*(n+1)/2 from the docstring. Emits Mode2::Loop. tower's loop verifier discharges three Hoare triples: initialization (invariant holds after let s = 0; let i = 1;), preservation (one body iteration preserves the invariant), and post-condition (invariant + loop-exit implies s == n*(n+1)/2). All three return unsat on their negations. The BAML includes the // @invariant: comment the LLM produced.

[run] utils/tally_matches.py → loop (tally_matches)
  ✓ utils/tally_matches.py

Step 9 — utils/estimate_reading_time.py (arithmetic). Classifier sees a single return with only int literals, int parameters, and +/-/*. Emits Mode2::Arithmetic. tower parses both sides into Z3 integer expressions, asserts py_expr != baml_expr, returns unsat. Commutativity, associativity, and distributivity are all handled by Z3 automatically — tower doesn't care whether the BAML wrote paragraphs * words_per_paragraph or words_per_paragraph * paragraphs.

[run] utils/estimate_reading_time.py → arithmetic (estimate_reading_time)
  ✓ utils/estimate_reading_time.py

Step 10 — utils/build_share_url.py (string). Return expression is a chain of str parameters and string literals joined by +. Classifier emits Mode2::String. tower translates both sides into Z3 Concat trees. Z3's string theory treats Concat as associative, so Python's left-associated Concat(Concat(Concat("https://", host), "/article/"), slug)... is provably equal to BAML's flat Concat("https://", host, "/article/", slug, ...) without any canonicalization. unsat. Proven.

[run] utils/build_share_url.py → string (build_share_url)
  ✓ utils/build_share_url.py

Step 11 — main.py (skip). First top-level function is run_cli(text_path, slug, token), whose body contains file I/O, multiple statement types, and calls into the orchestrator. Doesn't match any mode's expected shape. Skipped with reason unrecognized_function_shape. The file is copied verbatim to /tmp/pipeline_baml/main.py.

[skip] main.py: unrecognized_function_shape

Step 12 — nontranspilable/pipeline_orchestrator.py (skip). Module defines @dataclass and class PipelineOrchestrator, but no top-level def. Classifier fails the "first top-level function" lookup. Verdict: no_top_level_function. Copied through.

[skip] nontranspilable/pipeline_orchestrator.py: no_top_level_function

Step 13 — nontranspilable/http_probe.py (skip). The function DOES use http.client.HTTPSConnection, but its .request(...) call's first argument is the literal "HEAD", not "POST". tower's http mode only handles POST-shaped LLM calls. Classifier's has_http_shape rejects and falls through to unrecognized_function_shape. Copied through.

[skip] nontranspilable/http_probe.py: unrecognized_function_shape

Step 14 — write the report. After the walk, city serializes its RunReport (summary counters + per-file FileReport entries) to /tmp/pipeline_baml/.city_report.json and prints the summary table. The whole run took about 40 seconds (eight LLM calls at ~4-5 seconds each dominate the wall time; Z3 is single-digit milliseconds per proof).

=== city summary ===
python files:   12
  arithmetic:   1 ok / 0 fail
  string:       1 ok / 0 fail
  loop:         1 ok / 0 fail
  http:         1 ok / 0 fail
  http-static:  4 ok / 0 fail
  skipped:      4
non-py copied:  1

Final destination tree:

/tmp/pipeline_baml/
├── .city_report.json
├── README.md                                     (copied)
├── config/
│   └── settings.py                               (copied, not a candidate)
├── ingest/
│   └── fetch_headline.baml                       ✓ verified (http)
├── agents/
│   ├── classify_topic.baml                       ✓ verified (http-static openai)
│   ├── compose_brief.baml                        ✓ verified (http-static openai, 2 params)
│   ├── extract_entities.baml                     ✓ verified (http-static anthropic)
│   └── summarize.baml                            ✓ verified (http-static litellm)
├── utils/
│   ├── build_share_url.baml                      ✓ verified (string)
│   ├── estimate_reading_time.baml                ✓ verified (arithmetic)
│   └── tally_matches.baml                        ✓ verified (loop)
├── nontranspilable/
│   ├── http_probe.py                             (copied, HEAD not POST)
│   └── pipeline_orchestrator.py                  (copied, class-only)
└── main.py                                       (copied, multi-statement CLI)

Everything under /tmp/pipeline_baml/*.baml is formally verified to be equivalent to the corresponding Python for every possible input (integer inputs for arithmetic/loop; string inputs for string; string inputs for http and http-static; n ≥ 0 pre-condition for the loop). The original Python tree at examples/content_pipeline/ was never modified — city is read-only over the source.

See tower_of_baml/docs/verification.md for how each proof actually works.

2. BAML Hyperframes — a fast hypermedia framework

Status: in progress. A port of Hygen Hyperframes re-tuned for throughput under BAML-powered backends. Expected to live at ./baml_hyperframes/ once the first milestone lands.

3. New BAML SDKs — Swift, first

Status: Swift SDK in progress. Upstream BAML ships Python, TypeScript, Ruby, and Go clients via baml-cli generate. This fork adds Swift as a first-class generator target, to be followed by other languages driven by community demand. The codegen crate will live alongside baml_codegen_python/ and friends inside the BAML workspace.

4. Live BEP & changelog viewer

Status: design phase. Upstream BAML uses beps/ for RFC-style BAML Enhancement Proposals and a root CHANGELOG.md for release notes. This fork ships a lightweight viewer that surfaces both streams in the developer's editor / browser with diffing, status filters, and cross- linking. Target: keep the authors and reviewers of language changes in closer contact with end users during proposal review.

Layout

baml-experimental/
├── baml/                     # upstream BAML language + compiler workspace (unchanged)
├── baml_language/            # upstream BAML language (unchanged)
├── baml-cli/                 # upstream CLI (unchanged)
├── beps/                     # upstream BAML Enhancement Proposals
├── CHANGELOG.md              # upstream changelog
├── ...                       # everything else from upstream
│
├── tower_of_baml/            # [NEW] Python→BAML transpiler + Z3 verifier
│   ├── README.md
│   ├── Cargo.toml
│   ├── baml_reference.md     # language reference embedded in LLM prompts
│   ├── docs/ARCHITECTURE.md  # pipeline, SMT methodology, extension points
│   ├── src/                  # Rust source
│   │   ├── main.rs
│   │   ├── prompts.rs
│   │   ├── baml_parse.rs
│   │   ├── python_parse.rs
│   │   └── verify/           # three SMT verifiers
│   ├── examples/             # Python sources + verified BAML outputs
│   └── tests/sample_codebase/ # fixture corpus exercised by city
│
├── city/                     # [NEW] codebase-scale runner over tower
│   ├── README.md
│   ├── Cargo.toml
│   └── src/                  # main.rs (CLI), classify.rs, runner.rs
│
├── baml_hyperframes/         # [NEW, in progress] hypermedia framework
├── codegen_swift/            # [NEW, in progress] Swift SDK generator
└── bep_viewer/               # [NEW, in progress] live BEP + changelog UI

Everything outside the [NEW] directories is upstream BAML; it should track github.com/BoundaryML/baml's canary branch and carry no local modifications. If you want to build or use core BAML, use the unchanged upstream directories exactly as the upstream README describes.

Relationship to upstream

The four features above are maintained here until one of them is ready to propose upstream as a BEP. This fork is not intended to diverge from BAML permanently — each experiment should graduate to the main project or be retired based on usage and feedback.

Contributions welcome; please open an issue describing which of the four tracks you want to work on.

License

Inherits the upstream BAML license. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 2,985 Commits
.cursor/rules		.cursor/rules
.devcontainer		.devcontainer
.github		.github
.vscode		.vscode
baml-cli		baml-cli
baml		baml
baml_language		baml_language
beps		beps
ci		ci
city		city
docs		docs
engine		engine
examples/content_pipeline		examples/content_pipeline
fern		fern
integ-tests		integ-tests
jetbrains		jetbrains
jobs		jobs
languages		languages
patches		patches
release		release
scripts		scripts
tools		tools
tower_of_baml		tower_of_baml
typescript		typescript
typescript2		typescript2
.actrc		.actrc
.dockerignore		.dockerignore
.envrc		.envrc
.gitattributes		.gitattributes
.gitconfig		.gitconfig
.gitignore		.gitignore
.infisical.json		.infisical.json
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README-DEV.md		README-DEV.md
README.md		README.md
baml_language.code-workspace		baml_language.code-workspace
biome.json		biome.json
cliff.toml		cliff.toml
codecov.yml		codecov.yml
flake.lock		flake.lock
flake.nix		flake.nix
go.mod		go.mod
go.sum		go.sum
install.sh		install.sh
justfile		justfile
mise.toml		mise.toml
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
root-wasm32.code-workspace		root-wasm32.code-workspace
root.code-workspace		root.code-workspace
rust-toolchain.toml		rust-toolchain.toml
test_git_date.txt		test_git_date.txt
thoughts		thoughts
turbo.json		turbo.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

baml-experimental

What this fork adds

1. Tower of BAML — nondeterministic verifiable transpilation

city in action

2. BAML Hyperframes — a fast hypermedia framework

3. New BAML SDKs — Swift, first

4. Live BEP & changelog viewer

Layout

Relationship to upstream

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

baml-experimental

What this fork adds

1. Tower of BAML — nondeterministic verifiable transpilation

city in action

2. BAML Hyperframes — a fast hypermedia framework

3. New BAML SDKs — Swift, first

4. Live BEP & changelog viewer

Layout

Relationship to upstream

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages