Skip to content

Conversation

@oflatt
Copy link
Member

@oflatt oflatt commented Dec 28, 2025

This PR

  • Changes the TermDag API to use TermIds, simplifying lots of code.
  • Adds a new proof mode that can be enabled with the --proofs flag.
  • Instruments egglog's term encoding with proofs that can be obtained with the prove command.
  • Adds a proof format to the public API and a checker for the format.
  • Adds primitive validators and a new macro for easily making checkable primitives.
  • Adds detection for egglog programs proofs currently do not support (containers, ect)
  • Adds a notion of "proof normal form" with slight restrictions on the egglog query format.

In the future we'll want

  • A way to bind an extraction to a variable so we can use then prove to prove equivalence.
  • An optimization that makes it possible to run math_microbenchmark
  • Expanding the scope of benchmarks we support

@oflatt oflatt changed the title Proofs implemented in egglog itself Proof encoding in egglog itself Dec 28, 2025
@codspeed-hq
Copy link

codspeed-hq bot commented Dec 28, 2025

CodSpeed Performance Report

Merging this PR will degrade performance by 9.91%

Comparing oflatt-proofs-encoding-2 (80031f9) with main (f028092)1

Summary

❌ 5 regressed benchmarks
✅ 17 untouched benchmarks
⏩ 190 skipped benchmarks2

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
WallTime tests[stresstest_large_expr] 1.7 s 1.9 s -7.73%
WallTime tests[herbie-tutorial] 10.7 ms 11.3 ms -5.38%
WallTime tests[extract-vec-bench] 79.1 ms 85.6 ms -7.6%
Simulation tests[stresstest_large_expr] 1.6 s 1.7 s -9.91%
Simulation tests[extract-vec-bench] 76 ms 84.2 ms -9.77%

Footnotes

  1. No successful run was found on main (57fdb7d) during the generation of this report, so f028092 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 190 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@oflatt oflatt force-pushed the oflatt-proofs-encoding-2 branch from fa46459 to e484b95 Compare January 14, 2026 17:39
@oflatt oflatt marked this pull request as ready for review January 14, 2026 17:46
@oflatt oflatt requested a review from a team as a code owner January 14, 2026 17:46
@oflatt oflatt requested review from FTRobbin and Copilot and removed request for a team January 14, 2026 17:46
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces proof generation capabilities to egglog by instrumenting the term encoding with proof tracking. The changes enable proofs to be generated and verified for equality reasoning in egglog programs.

Changes:

  • Refactors TermDag API to use TermIds instead of Terms for cleaner ownership semantics
  • Adds comprehensive proof infrastructure including encoding, format, checker, and extraction
  • Implements primitive validators enabling automatic proof generation for built-in operations

Reviewed changes

Copilot reviewed 64 out of 64 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/lib.rs Adds proof mode support, ProveExists command, and integrates proof pipeline
src/typechecking.rs Adds PrimitiveValidator type and support for primitives with validators
src/termdag.rs Refactors API from Term to TermId with new pretty-printing support
src/proofs/* New proof infrastructure including format, checker, encoding, extraction
src/sort/*.rs Updates all sort implementations to use TermId API
src/prelude.rs Adds LiteralConvertible trait for validator generation
tests/* Adds validator tests and proof test cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 26 to 28
const MAX_PRETTY_LINE_WIDTH: usize = 80;
const PRETTY_INDENT_STEP: usize = 2;
const MIN_SHARED_TERM_SIZE: usize = 4;
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constants control pretty-printing behavior but lack documentation explaining their purpose and rationale. Consider adding comments explaining why these specific values were chosen.

Copilot uses AI. Check for mistakes.
@oflatt oflatt requested review from ezrosent and removed request for FTRobbin January 14, 2026 19:53
@oflatt oflatt force-pushed the oflatt-proofs-encoding-2 branch 2 times, most recently from 13549f2 to fe0fc44 Compare January 18, 2026 16:08
/// add_literal_prim!(eg, "<" = |a: i64, b: i64| -> bool { a < b });
/// ```
#[proc_macro]
pub fn add_literal_prim(input: TokenStream) -> TokenStream {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm bad a writing macros, this one is AI generated so take care.
The rest of the code in this PR is mostly hand written

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I am curious what a validator really looks like... kinda hard to review without that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See validator_test for an example (:

Copy link
Contributor

@ezrosent ezrosent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of comments.

/// add_literal_prim!(eg, "<" = |a: i64, b: i64| -> bool { a < b });
/// ```
#[proc_macro]
pub fn add_literal_prim(input: TokenStream) -> TokenStream {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I am curious what a validator really looks like... kinda hard to review without that.


(run 100)
(check (baz 50)) No newline at end of file
;; TODO now that relations are desugared, this trick also doesn't work:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue? (Could be a good topic for a regular meeting if this hasn't already come up?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good topic for regular meeting. It doesn't work because the relation has a fresh sort, so you can't put it in the merge function position.

@oflatt
Copy link
Member Author

oflatt commented Jan 28, 2026

Looks like the performance regressions on stresstest_large_expr are from my changes to SymbolGen. I propose we just allow the performance regression because the generated names are nicer.

fix up symbol gen

better names

fixing things

fix header

small fix

working on justification

nits

doing a lot of refactoring

small tweaks

snapshots

getting rid of justification, just proof

working on encoding

small example running again

renames

term tests passing again

working on relation

small fixes

small docs

proof normal form

untested rule proof instrumentation

proofs failing eqsat basic

making progress

proofs passing some tests

small bug fix

fix bug with to ast

fix another bug

fix term encoding again

tests pass

fmt

working on prove query

working prove command

refactor to proofs folders

working on format

working on proof snapshots

first proof snapshot

working on bug with proof extraction

found bug in proof productioN

fixed a bug

fix up detection for supporting proofs

fix supported proofs

subsume fix

working on better printing

working on printing again, spotted substitution bug

wrote unification code, getting error

remove debugging

unify seems to be working

more proof simplification

proof simplifier commit

push symmetry down

working on !=

proofs supported list

validator test

working on validators

validators

better validators

use string

new tests

working on making checker more accurate

fix checking for functions

refactor validators

run validators

better proof checker

workingon checker

simplification breaking something

refactor

working on checker

refactor

use prim validators

small fix

add todo

working on bugs

most proofs being checked

big termid refactor

error in add prim

more refactor

merge fn proof fix

check merge proofs

working on fixing bugs

all tests passing

fix simplification

refactor simplifications

name hints

nits

more docs

docs improvement

snap

working on encoding

working on docs

better docs

more docs

more cleanup

fix up

Update src/proofs/proof_checker.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/proofs/proof_checker.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

small nits
@oflatt oflatt force-pushed the oflatt-proofs-encoding-2 branch from 633d98d to 19873c4 Compare January 28, 2026 18:03
expression: snapshot_content
---
(
(Config (Cons 1 (Cons 2 (Cons 3 (e)))) (e) (e) 0) -> (Config (Cons 1 (Cons 2 (Cons 3 (e)))) (e) (e) 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #797

Now that relations are constructors, we get this

@oflatt oflatt force-pushed the oflatt-proofs-encoding-2 branch 4 times, most recently from 70db85d to 85acdc3 Compare January 28, 2026 20:27
@oflatt oflatt force-pushed the oflatt-proofs-encoding-2 branch from 85acdc3 to 80031f9 Compare January 28, 2026 20:56
Copy link
Contributor

@ezrosent ezrosent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also see if you can track down the codspeed slowdowns... but they're small enough that they could be within the noise threshold.

a term table that stores the actual terms, and a view table storing representative terms along with their e-class (stored as the leader term).
The term encoding enables proof tracking, done at the
same time in this file.
The encoding keeps the operational semantics equivalent to the standard encoding (for the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think OS + some observational equivalence relation would be fine here too... but then I think we should say something informal here that conveys that given that we don't have an "official" operational semantics for egglog. How about:

The encoding does not change any user-observable behavior of an egglog program, except for commands that let you 'pop the hood' like print-function.

"python_array_optimize",
"stresstest_large_expr",
];
// in parallel mode always skip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, I'd recommend exposing a function from core-relations that checked rayon::current_num_threads and such. We can still run things in parallel when it's a release build. And if we ever changed the debug_assertions bit we won't have to hunt down every file that asked about it (most of the heuristics for parallelism are in one file I think).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants