A fork of BAML — an open-source typed language for building reliable LLM workflows and agents. This repository tracks the upstream project and layers on four experimental capabilities.
Read about upstream BAML first: docs.boundaryml.com and the main repo at github.com/BoundaryML/baml.
Everything upstream still works exactly as documented — this fork is strictly additive. The experimental features live alongside the standard BAML workspace.
./tower_of_baml/ — a CLI that transpiles a Python function into BAML
and then uses Z3 to prove the two are equivalent for every possible
input. It generates candidate BAML with claude -p and rejects any
candidate that isn't formally verified by an SMT solver.
Four verification modes:
- arithmetic — integer functions, Z3 integer theory
- loop —
whileloops with LLM-generated loop invariants and Hoare-triple discharge - string — string-concatenation functions, Z3 string theory
- http — OpenAI-style chat-completion calls via
http.client, verified by rendering both sides with a sentinel and proving byte-identity of the outgoing HTTP request via Z3 string theory
Based on the LLMLift methodology (Bhatia et al., NeurIPS 2024,
arXiv:2406.03003). One verified
candidate usually lands on the first LLM attempt; the Z3 unsat result
is a proof of equivalence over all inputs, not a batch of passing
tests.
cd tower_of_baml && cargo build --release
./target/release/tower examples/add.py --mode arithmetic
./target/release/tower examples/sum_to.py --mode loop \
--pre "n >= 0" --spec "n*(n+1)/2"
./target/release/tower examples/build_url.py --mode string
./target/release/tower examples/classify.py --mode httpSee tower_of_baml/README.md and
tower_of_baml/docs/ARCHITECTURE.md for the full design.
Install pre-built binaries (macOS arm64/x86_64, Linux x86_64):
curl -sSf https://raw.githubusercontent.com/gambitinc/baml-experimental/canary/install.sh | bashThe installer detects your platform, downloads the matching release
tarball, drops tower and city into ~/.local/bin/, and checks for
the three runtime deps (z3, baml-cli, claude CLI). Maintainers
cutting a release: see docs/RELEASE.md.
./city/ is the codebase-scale companion: walks an entire Python tree,
auto-classifies each file into a tower mode (or skips it), runs tower
on every candidate, and mirrors the result into a sibling BAML tree —
the original source is untouched. Loop-mode files declare their
precondition and spec via @tower-pre: / @tower-spec: docstring
annotations; everything else is fully automatic.
cd city && cargo build --release
./target/release/city \
../tower_of_baml/tests/sample_codebase \
/tmp/sample_baml -vA small fixture codebase lives at
tower_of_baml/tests/sample_codebase/ covering arithmetic, string,
loop, and various non-transpilable patterns — useful for testing both
city and tower together.
A realistic demo lives at examples/content_pipeline/ — a twelve-file
research-assistant workflow (ingest/, agents/, utils/,
nontranspilable/, config/). It covers every tower mode and every
skip reason. The following is a literal play-by-play of running
./city/target/release/city \
examples/content_pipeline \
/tmp/pipeline_baml -vStep 1 — Walk the source tree. city's WalkDir enumerates every
file under content_pipeline/ depth-first. Directories are mirrored
into the destination as empty stubs; the README.md is copied
verbatim (not a .py file).
Step 2 — config/settings.py (no function). city's classifier
parses the source, finds no top-level def, emits
Mode2::Skip { reason: "no_top_level_function" }. Copied through to
/tmp/pipeline_baml/config/settings.py unchanged.
[skip] config/settings.py: no_top_level_function
Step 3 — ingest/fetch_headline.py (http mode). The classifier
walks the function body, sees http.client.HTTPSConnection(...) plus
.request("POST", ...), and classifies as Mode2::Http. city shells
out to tower --mode http, which spawns a Python subprocess with a
monkey-patched http.client to capture the outgoing request, calls
$build_request on the LLM-generated BAML candidate, normalizes both,
partitions at a sentinel, and asks Z3 for a counterexample. Z3 returns
unsat; the BAML is written to /tmp/pipeline_baml/ingest/fetch_headline.baml.
[run] ingest/fetch_headline.py → http (fetch_headline)
✓ ingest/fetch_headline.py
Step 4 — agents/classify_topic.py (http-static mode, openai
SDK). The classifier walks the body, sees <client>.chat.completions.create(...),
emits Mode2::HttpStatic { sdk: "openai" }. city dispatches to
tower --mode http-static. Tower's openai contract extracts the
call's kwargs as a symbolic JSON tree, substitutes a sentinel for the
text parameter, canonicalizes to a matching byte pattern, invokes
BAML's $build_request with the same sentinel, and Z3 proves
byte-identity without ever executing the Python module.
(sdk detected: openai)
[run] agents/classify_topic.py → http-static (classify_topic)
✓ agents/classify_topic.py
Step 5 — agents/compose_brief.py (http-static, openai SDK,
multi-param via f-string). The function takes two string
parameters topic and text, interleaved inside a single user
message via f"Topic: {topic}\nArticle text:\n{text}". The static
verifier assigns one sentinel to each parameter
(xSNTNL000PLACEHLDRx, xSNTNL001PLACEHLDRx), produces a
BodyNode::StringTemplate for the f-string, canonicalizes to
"Topic: xSNTNL000PLACEHLDRx\nArticle text:\nxSNTNL001PLACEHLDRx",
and partitions the body into a slot sequence:
[Fixed, Param("topic"), Fixed, Param("text"), Fixed]. BAML emits
the same slot shape via Topic: {{ topic }}\nArticle text:\n{{ text }}.
Z3 builds Concat(p0, topic_var, p1, text_var, p2) on each side,
asserts inequality, returns unsat. Proven over String × String.
(sdk detected: openai)
[run] agents/compose_brief.py → http-static (compose_brief)
✓ agents/compose_brief.py
Step 6 — agents/summarize.py (http-static, LiteLLM).
Classifier sees litellm.completion(...), emits
Mode2::HttpStatic { sdk: "litellm" }. tower's litellm contract
reads the model="openai/gpt-4o-mini" kwarg, splits on /, and
dispatches to the openai endpoint contract with the prefix stripped
from the model name. The resulting request routes to
api.openai.com/v1/chat/completions — matching what BAML emits for
a provider openai client. Proven.
(sdk detected: litellm)
[run] agents/summarize.py → http-static (summarize)
✓ agents/summarize.py
Step 7 — agents/extract_entities.py (http-static,
anthropic). Classifier sees <client>.messages.create(...) and
emits Mode2::HttpStatic { sdk: "anthropic" }. tower's anthropic
contract knows the endpoint is api.anthropic.com/v1/messages, auth
is via x-api-key + anthropic-version, and the system kwarg is
top-level (not part of messages). BAML emits system as a
structured-content array ([{"type":"text","text":"..."}]);
normalization unwraps that shape to plain string before comparison.
Proven.
(sdk detected: anthropic)
[run] agents/extract_entities.py → http-static (extract_entities)
✓ agents/extract_entities.py
Step 8 — utils/tally_matches.py (loop mode). Classifier
strips the leading docstring, sees a while loop followed by a
return, and reads @tower-pre: n >= 0 and @tower-spec: n*(n+1)/2
from the docstring. Emits Mode2::Loop. tower's loop verifier
discharges three Hoare triples: initialization (invariant holds after
let s = 0; let i = 1;), preservation (one body iteration preserves
the invariant), and post-condition (invariant + loop-exit implies
s == n*(n+1)/2). All three return unsat on their negations. The
BAML includes the // @invariant: comment the LLM produced.
[run] utils/tally_matches.py → loop (tally_matches)
✓ utils/tally_matches.py
Step 9 — utils/estimate_reading_time.py (arithmetic).
Classifier sees a single return with only int literals, int
parameters, and +/-/*. Emits Mode2::Arithmetic. tower parses
both sides into Z3 integer expressions, asserts
py_expr != baml_expr, returns unsat. Commutativity, associativity,
and distributivity are all handled by Z3 automatically — tower doesn't
care whether the BAML wrote paragraphs * words_per_paragraph or
words_per_paragraph * paragraphs.
[run] utils/estimate_reading_time.py → arithmetic (estimate_reading_time)
✓ utils/estimate_reading_time.py
Step 10 — utils/build_share_url.py (string). Return
expression is a chain of str parameters and string literals joined
by +. Classifier emits Mode2::String. tower translates both sides
into Z3 Concat trees. Z3's string theory treats Concat as
associative, so Python's left-associated
Concat(Concat(Concat("https://", host), "/article/"), slug)... is
provably equal to BAML's flat Concat("https://", host, "/article/", slug, ...) without any canonicalization. unsat. Proven.
[run] utils/build_share_url.py → string (build_share_url)
✓ utils/build_share_url.py
Step 11 — main.py (skip). First top-level function is
run_cli(text_path, slug, token), whose body contains file I/O,
multiple statement types, and calls into the orchestrator. Doesn't
match any mode's expected shape. Skipped with reason
unrecognized_function_shape. The file is copied verbatim to
/tmp/pipeline_baml/main.py.
[skip] main.py: unrecognized_function_shape
Step 12 — nontranspilable/pipeline_orchestrator.py (skip).
Module defines @dataclass and class PipelineOrchestrator, but no
top-level def. Classifier fails the "first top-level function"
lookup. Verdict: no_top_level_function. Copied through.
[skip] nontranspilable/pipeline_orchestrator.py: no_top_level_function
Step 13 — nontranspilable/http_probe.py (skip). The function
DOES use http.client.HTTPSConnection, but its .request(...) call's
first argument is the literal "HEAD", not "POST". tower's http
mode only handles POST-shaped LLM calls. Classifier's
has_http_shape rejects and falls through to
unrecognized_function_shape. Copied through.
[skip] nontranspilable/http_probe.py: unrecognized_function_shape
Step 14 — write the report. After the walk, city serializes its
RunReport (summary counters + per-file FileReport entries) to
/tmp/pipeline_baml/.city_report.json and prints the summary table.
The whole run took about 40 seconds (eight LLM calls at ~4-5 seconds
each dominate the wall time; Z3 is single-digit milliseconds per
proof).
=== city summary ===
python files: 12
arithmetic: 1 ok / 0 fail
string: 1 ok / 0 fail
loop: 1 ok / 0 fail
http: 1 ok / 0 fail
http-static: 4 ok / 0 fail
skipped: 4
non-py copied: 1
Final destination tree:
/tmp/pipeline_baml/
├── .city_report.json
├── README.md (copied)
├── config/
│ └── settings.py (copied, not a candidate)
├── ingest/
│ └── fetch_headline.baml ✓ verified (http)
├── agents/
│ ├── classify_topic.baml ✓ verified (http-static openai)
│ ├── compose_brief.baml ✓ verified (http-static openai, 2 params)
│ ├── extract_entities.baml ✓ verified (http-static anthropic)
│ └── summarize.baml ✓ verified (http-static litellm)
├── utils/
│ ├── build_share_url.baml ✓ verified (string)
│ ├── estimate_reading_time.baml ✓ verified (arithmetic)
│ └── tally_matches.baml ✓ verified (loop)
├── nontranspilable/
│ ├── http_probe.py (copied, HEAD not POST)
│ └── pipeline_orchestrator.py (copied, class-only)
└── main.py (copied, multi-statement CLI)
Everything under /tmp/pipeline_baml/*.baml is formally verified to
be equivalent to the corresponding Python for every possible input
(integer inputs for arithmetic/loop; string inputs for string; string
inputs for http and http-static; n ≥ 0 pre-condition for the loop).
The original Python tree at examples/content_pipeline/ was never
modified — city is read-only over the source.
See tower_of_baml/docs/verification.md for how each proof actually
works.
Status: in progress. A port of
Hygen Hyperframes re-tuned for
throughput under BAML-powered backends. Expected to live at
./baml_hyperframes/ once the first milestone lands.
Status: Swift SDK in progress. Upstream BAML ships Python,
TypeScript, Ruby, and Go clients via baml-cli generate. This fork
adds Swift as a first-class generator target, to be followed by other
languages driven by community demand. The codegen crate will live
alongside baml_codegen_python/ and friends inside the BAML workspace.
Status: design phase. Upstream BAML uses beps/ for RFC-style BAML
Enhancement Proposals and a root CHANGELOG.md for release notes. This
fork ships a lightweight viewer that surfaces both streams in the
developer's editor / browser with diffing, status filters, and cross-
linking. Target: keep the authors and reviewers of language changes in
closer contact with end users during proposal review.
baml-experimental/
├── baml/ # upstream BAML language + compiler workspace (unchanged)
├── baml_language/ # upstream BAML language (unchanged)
├── baml-cli/ # upstream CLI (unchanged)
├── beps/ # upstream BAML Enhancement Proposals
├── CHANGELOG.md # upstream changelog
├── ... # everything else from upstream
│
├── tower_of_baml/ # [NEW] Python→BAML transpiler + Z3 verifier
│ ├── README.md
│ ├── Cargo.toml
│ ├── baml_reference.md # language reference embedded in LLM prompts
│ ├── docs/ARCHITECTURE.md # pipeline, SMT methodology, extension points
│ ├── src/ # Rust source
│ │ ├── main.rs
│ │ ├── prompts.rs
│ │ ├── baml_parse.rs
│ │ ├── python_parse.rs
│ │ └── verify/ # three SMT verifiers
│ ├── examples/ # Python sources + verified BAML outputs
│ └── tests/sample_codebase/ # fixture corpus exercised by city
│
├── city/ # [NEW] codebase-scale runner over tower
│ ├── README.md
│ ├── Cargo.toml
│ └── src/ # main.rs (CLI), classify.rs, runner.rs
│
├── baml_hyperframes/ # [NEW, in progress] hypermedia framework
├── codegen_swift/ # [NEW, in progress] Swift SDK generator
└── bep_viewer/ # [NEW, in progress] live BEP + changelog UI
Everything outside the [NEW] directories is upstream BAML; it should
track github.com/BoundaryML/baml's canary branch and carry no local
modifications. If you want to build or use core BAML, use the unchanged
upstream directories exactly as the upstream README
describes.
The four features above are maintained here until one of them is ready to propose upstream as a BEP. This fork is not intended to diverge from BAML permanently — each experiment should graduate to the main project or be retired based on usage and feedback.
Contributions welcome; please open an issue describing which of the four tracks you want to work on.
Inherits the upstream BAML license. See LICENSE.