Proof encoding in egglog itself #770

oflatt · 2025-12-28T00:22:01Z

This PR

Changes the TermDag API to use TermIds, simplifying lots of code.
Adds a new proof mode that can be enabled with the --proofs flag.
Instruments egglog's term encoding with proofs that can be obtained with the prove command.
Adds a proof format to the public API and a checker for the format.
Adds primitive validators and a new macro for easily making checkable primitives.
Adds detection for egglog programs proofs currently do not support (containers, ect)
Adds a notion of "proof normal form" with slight restrictions on the egglog query format.

In the future we'll want

A way to bind an extraction to a variable so we can use then prove to prove equivalence.
An optimization that makes it possible to run math_microbenchmark
Expanding the scope of benchmarks we support

codspeed-hq · 2025-12-28T00:30:01Z

CodSpeed Performance Report

Merging this PR will degrade performance by 9.91%

_{Comparing oflatt-proofs-encoding-2 (80031f9) with main (f028092)¹}

Summary

❌ 5 regressed benchmarks
✅ 17 untouched benchmarks
⏩ 190 skipped benchmarks²

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	WallTime	`tests[stresstest_large_expr]`	1.7 s	1.9 s	-7.73%
❌	WallTime	`tests[herbie-tutorial]`	10.7 ms	11.3 ms	-5.38%
❌	WallTime	`tests[extract-vec-bench]`	79.1 ms	85.6 ms	-7.6%
❌	Simulation	`tests[stresstest_large_expr]`	1.6 s	1.7 s	-9.91%
❌	Simulation	`tests[extract-vec-bench]`	76 ms	84.2 ms	-9.77%

No successful run was found on main (57fdb7d) during the generation of this report, so f028092 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩
190 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

Copilot

Pull request overview

This PR introduces proof generation capabilities to egglog by instrumenting the term encoding with proof tracking. The changes enable proofs to be generated and verified for equality reasoning in egglog programs.

Changes:

Refactors TermDag API to use TermIds instead of Terms for cleaner ownership semantics
Adds comprehensive proof infrastructure including encoding, format, checker, and extraction
Implements primitive validators enabling automatic proof generation for built-in operations

Reviewed changes

Copilot reviewed 64 out of 64 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/lib.rs	Adds proof mode support, ProveExists command, and integrates proof pipeline
src/typechecking.rs	Adds PrimitiveValidator type and support for primitives with validators
src/termdag.rs	Refactors API from Term to TermId with new pretty-printing support
src/proofs/*	New proof infrastructure including format, checker, encoding, extraction
src/sort/*.rs	Updates all sort implementations to use TermId API
src/prelude.rs	Adds LiteralConvertible trait for validator generation
tests/*	Adds validator tests and proof test cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/proofs/proof_checker.rs

Copilot · 2026-01-14T17:47:59Z

src/termdag.rs

+const MAX_PRETTY_LINE_WIDTH: usize = 80;
+const PRETTY_INDENT_STEP: usize = 2;
+const MIN_SHARED_TERM_SIZE: usize = 4;


These constants control pretty-printing behavior but lack documentation explaining their purpose and rationale. Consider adding comments explaining why these specific values were chosen.

src/proofs/proof_checker.rs

oflatt · 2026-01-20T22:44:23Z

src/sort/add_primitive/src/lib.rs

+/// add_literal_prim!(eg, "<" = |a: i64, b: i64| -> bool { a < b });
+/// ```
+#[proc_macro]
+pub fn add_literal_prim(input: TokenStream) -> TokenStream {


I'm bad a writing macros, this one is AI generated so take care.
The rest of the code in this PR is mostly hand written

I guess I am curious what a validator really looks like... kinda hard to review without that.

See validator_test for an example (:

src/proofs/proof_encoding.md

src/proofs/proof_encoding_helpers.rs

src/proofs/proof_normal_form.rs

tests/subsume-relation.egg

src/proofs/proof_checker.rs

ezrosent

First round of comments.

egglog-bridge/src/tests.rs

src/ast/desugar.rs

src/ast/mod.rs

src/proofs/proof_checker.rs

src/proofs/proof_normal_form.rs

src/proofs/proof_simplification.rs

src/sort/add_primitive/src/lib.rs

ezrosent · 2026-01-28T06:52:13Z

src/sort/add_primitive/src/lib.rs

+/// add_literal_prim!(eg, "<" = |a: i64, b: i64| -> bool { a < b });
+/// ```
+#[proc_macro]
+pub fn add_literal_prim(input: TokenStream) -> TokenStream {


I guess I am curious what a validator really looks like... kinda hard to review without that.

src/typechecking.rs

ezrosent · 2026-01-28T07:00:14Z

tests/merge-saturates.egg

-
-(run 100)
-(check (baz 50))
+;; TODO now that relations are desugared, this trick also doesn't work:


What's the issue? (Could be a good topic for a regular meeting if this hasn't already come up?)

Yeah, good topic for regular meeting. It doesn't work because the relation has a fresh sort, so you can't put it in the merge function position.

oflatt · 2026-01-28T17:57:43Z

Looks like the performance regressions on stresstest_large_expr are from my changes to SymbolGen. I propose we just allow the performance regression because the generated names are nicer.

fix up symbol gen better names fixing things fix header small fix working on justification nits doing a lot of refactoring small tweaks snapshots getting rid of justification, just proof working on encoding small example running again renames term tests passing again working on relation small fixes small docs proof normal form untested rule proof instrumentation proofs failing eqsat basic making progress proofs passing some tests small bug fix fix bug with to ast fix another bug fix term encoding again tests pass fmt working on prove query working prove command refactor to proofs folders working on format working on proof snapshots first proof snapshot working on bug with proof extraction found bug in proof productioN fixed a bug fix up detection for supporting proofs fix supported proofs subsume fix working on better printing working on printing again, spotted substitution bug wrote unification code, getting error remove debugging unify seems to be working more proof simplification proof simplifier commit push symmetry down working on != proofs supported list validator test working on validators validators better validators use string new tests working on making checker more accurate fix checking for functions refactor validators run validators better proof checker workingon checker simplification breaking something refactor working on checker refactor use prim validators small fix add todo working on bugs most proofs being checked big termid refactor error in add prim more refactor merge fn proof fix check merge proofs working on fixing bugs all tests passing fix simplification refactor simplifications name hints nits more docs docs improvement snap working on encoding working on docs better docs more docs more cleanup fix up Update src/proofs/proof_checker.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Update src/proofs/proof_checker.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> small nits

oflatt · 2026-01-28T19:27:23Z

tests/snapshots/files__towers_of_hanoi_term_encoding.snap

+expression: snapshot_content
+---
+(
+   (Config (Cons 1 (Cons 2 (Cons 3 (e)))) (e) (e) 0) -> (Config (Cons 1 (Cons 2 (Cons 3 (e)))) (e) (e) 0)


See #797

Now that relations are constructors, we get this

ezrosent

I'd also see if you can track down the codspeed slowdowns... but they're small enough that they could be within the noise threshold.

ezrosent · 2026-02-01T19:15:02Z

src/proofs/proof_encoding.md

+  a term table that stores the actual terms, and a view table storing representative terms along with their e-class (stored as the leader term).
+The term encoding enables proof tracking, done at the
+  same time in this file.
+The encoding keeps the operational semantics equivalent to the standard encoding (for the


I think OS + some observational equivalence relation would be fine here too... but then I think we should say something informal here that conveys that given that we don't have an "official" operational semantics for egglog. How about:

The encoding does not change any user-observable behavior of an egglog program, except for commands that let you 'pop the hood' like print-function.

src/proofs/proof_extraction.rs

src/proofs/proof_format.rs

src/sort/add_primitive/src/lib.rs

ezrosent · 2026-02-01T19:25:05Z

tests/files.rs

-            "python_array_optimize",
-            "stresstest_large_expr",
-        ];
+        // in parallel mode always skip


Instead, I'd recommend exposing a function from core-relations that checked rayon::current_num_threads and such. We can still run things in parallel when it's a release build. And if we ever changed the debug_assertions bit we won't have to hunt down every file that asked about it (most of the heuristics for parallelism are in one file I think).

oflatt changed the title ~~Proofs implemented in egglog itself~~ Proof encoding in egglog itself Dec 28, 2025

oflatt force-pushed the oflatt-proofs-encoding-2 branch from fa46459 to e484b95 Compare January 14, 2026 17:39

oflatt marked this pull request as ready for review January 14, 2026 17:46

oflatt requested a review from a team as a code owner January 14, 2026 17:46

oflatt requested review from FTRobbin and Copilot and removed request for a team January 14, 2026 17:46

Copilot started reviewing on behalf of oflatt January 14, 2026 17:46 View session

Copilot AI reviewed Jan 14, 2026

View reviewed changes

oflatt requested review from ezrosent and removed request for FTRobbin January 14, 2026 19:53

oflatt force-pushed the oflatt-proofs-encoding-2 branch 2 times, most recently from 13549f2 to fe0fc44 Compare January 18, 2026 16:08