Skip to content

Conversation

@saulshanabrook
Copy link
Member

@saulshanabrook saulshanabrook commented Nov 13, 2025

WIP PR to close #728. This adds JSON schema support for commands. It also adds support for serializing to/from JSON for them at runtime.

This would support the Python bindings automatically generating type safe interfaces that stay in sync with the Rust codebase. It could also support generating interfaces in other languages more easily.

For example, with this code the following Python can be generated automatically:

Generated Python Model

# generated by datamodel-codegen:
#   filename:  schema.json
#   timestamp: 2025-11-13T00:24:08+00:00

from __future__ import annotations

from enum import Enum
from typing import Annotated, List, Optional, Tuple, Union

from msgspec import Meta, Struct, field


class Panic(Struct, tag=True):
    pass


class SrcFile(Struct):
    contents: str
    name: Optional[str] = None


class RustSpan(Struct):
    file: str
    line: Annotated[int, Meta(ge=0)]
    column: Annotated[int, Meta(ge=0)]


class Schema(Struct):
    input: List[str]
    output: str


class PrintFunctionMode(Enum):
    """
    The mode of printing a function. The default mode prints the function in a user-friendly way and
    has an unreliable interface.
    The CSV mode prints the function in the CSV format.
    """

    Default = 'Default'
    CSV = 'CSV'


class Push(Struct, tag=True):
    """
    `push` the current egraph `n` times so that it is saved.
    Later, the current database and rules can be restored using `pop`.
    """

    n: Annotated[int, Meta(ge=0)]


class EgglogSpan(Struct):
    file: SrcFile
    i: Annotated[int, Meta(ge=0)]
    j: Annotated[int, Meta(ge=0)]


class Rust(Struct, tag=True):
    field1: RustSpan


class Egglog(Struct, tag=True):
    field1: EgglogSpan


Literal = Optional[Union[int, float, str, bool]]


Change2 = Annotated[None, Meta(description='Change a function entry.')]


Span = Union[Panic, Egglog, Rust]


class Var(Struct, tag=True):
    span: Span
    name: str


class Lit(Struct, tag=True):
    field1: Span
    field2: Literal


class Variant(Struct):
    span: Span
    name: str
    types: List[str]
    unextractable: bool
    cost: Optional[Annotated[int, Meta(ge=0)]] = None


class Variants(Struct, tag=True):
    field1: List[Variant]


class Constructor(Struct, tag=True):
    """
    The `constructor` command defines a new constructor for a user-defined datatype
    Example:
    ```text
    (constructor Add (i64 i64) Math)
    ```
    """

    span: Span
    name: str
    schema_: Schema = field(name='schema')
    unextractable: bool
    cost: Optional[Annotated[int, Meta(ge=0)]] = None


class Relation(Struct, tag=True):
    """
    The `relation` command declares a named relation
    Example:
    ```text
    (relation path (i64 i64))
    (relation edge (i64 i64))
    ```
    """

    span: Span
    name: str
    inputs: List[str]


class AddRuleset(Struct, tag=True):
    """
    Using the `ruleset` command, defines a new
    ruleset that can be added to in [`Command::Rule`]s.
    Rulesets are used to group rules together
    so that they can be run together in a [`Schedule`].

    Example:
    Ruleset allows users to define a ruleset- a set of rules

    ```text
    (ruleset myrules)
    (rule ((edge x y))
          ((path x y))
          :ruleset myrules)
    (run myrules 2)
    ```
    """

    field1: Span
    field2: str


class UnstableCombinedRuleset(Struct, tag=True):
    """
    Using the `combined-ruleset` command, construct another ruleset
    which runs all the rules in the given rulesets.
    This is useful for running multiple rulesets together.
    The combined ruleset also inherits any rules added to the individual rulesets
    after the combined ruleset is declared.

    Example:
    ```text
    (ruleset myrules1)
    (rule ((edge x y))
          ((path x y))
         :ruleset myrules1)
    (ruleset myrules2)
    (rule ((path x y) (edge y z))
          ((path x z))
          :ruleset myrules2)
    (combined-ruleset myrules-combined myrules1 myrules2)
    ```
    """

    field1: Span
    field2: str
    field3: List[str]


class Panic2(Struct, tag='Panic'):
    field1: Span
    field2: str


class PrintOverallStatistics(Struct, tag=True):
    """
    Print runtime statistics about rules
    and rulesets so far.
    """

    field1: Span
    field2: Optional[str] = None


class PrintFunction(Struct, tag=True):
    """
    Print out rows of a given function, extracting each of the elements of the function.
    Example:

    ```text
    (print-function Add 20)
    ```
    prints the first 20 rows of the `Add` function.

    ```text
    (print-function Add)
    ```
    prints all rows of the `Add` function.

    ```text
    (print-function Add :file "add.csv")
    ```
    prints all rows of the `Add` function to a CSV file.
    """

    field1: Span
    field2: str
    field5: PrintFunctionMode
    field3: Optional[Annotated[int, Meta(ge=0)]] = None
    field4: Optional[str] = None


class PrintSize(Struct, tag=True):
    """
    Print out the number of rows in a function or all functions.
    """

    field1: Span
    field2: Optional[str] = None


class Input(Struct, tag=True):
    """
    Input a CSV file directly into a function.
    """

    span: Span
    name: str
    file: str


class Pop(Struct, tag=True):
    """
    `pop` the current egraph, restoring the previous one.
    The argument specifies how many egraphs to pop.
    """

    field1: Span
    field2: Annotated[int, Meta(ge=0)]


class Include(Struct, tag=True):
    """
    Include another egglog file directly as text and run it.
    """

    field1: Span
    field2: str


class Datatype(Struct, tag=True):
    """
    Egglog supports three types of functions

    A constructor models an egg-style user-defined datatype
    It can only be defined through the `datatype`/`datatype*` command
    or the `constructor` command

    A relation models a datalog-style mathematical relation
    It can only be defined through the `relation` command

    A custom function is a dictionary
    It can only be defined through the `function` command

    The `datatype` command declares a user-defined datatype.
    Datatypes can be unioned with [`Action::Union`] either
    at the top level or in the actions of a rule.
    This makes them equal in the implicit, global equality relation.

    Example:
    ```text
    (datatype Math
      (Num i64)
      (Var String)
      (Add Math Math)
      (Mul Math Math))
    ```

    defines a simple `Math` datatype with variants for numbers, named variables, addition and multiplication.

    Datatypes desugar directly to a [`Command::Sort`] and a [`Command::Constructor`] for each constructor.
    The code above becomes:
    ```text
    (sort Math)
    (constructor Num (i64) Math)
    (constructor Var (String) Math)
    (constructor Add (Math Math) Math)
    (constructor Mul (Math Math) Math)

    Datatypes are also known as algebraic data types, tagged unions and sum types.
    """

    span: Span
    name: str
    variants: List[Variant]


class Sort(Struct, tag=True):
    """
    Create a new user-defined sort, which can then
    be used in new [`Command::Function`] declarations.
    The [`Command::Datatype`] command desugars directly to this command, with one [`Command::Function`]
    per constructor.
    The main use of this command (as opposed to using [`Command::Datatype`]) is for forward-declaring a sort for mutually-recursive datatypes.

    It can also be used to create
    a container sort.
    For example, here's how to make a sort for vectors
    of some user-defined sort `Math`:
    ```text
    (sort MathVec (Vec Math))
    ```

    Now `MathVec` can be used as an input or output sort.
    """

    field1: Span
    field2: str
    field3: Optional[Tuple[str, List[GenericExpr]]] = None


class Call(Struct, tag=True):
    field1: Span
    field2: str
    field3: List[GenericExpr]


class Datatypes(Struct, tag=True):
    span: Span
    datatypes: List[Datatype1]


class NewSort(Struct, tag=True):
    field1: str
    field2: List[GenericExpr]


class Function(Struct, tag=True):
    """
    The `function` command declare an egglog custom function, which is a database table with a
    a functional dependency (also called a primary key) on its inputs to one output.

    ```text
    (function <name:Ident> <schema:Schema> <cost:Cost>
           (:on_merge <List<Action>>)?
           (:merge <Expr>)?)
    ```
    A function can have a `cost` for extraction.

    Finally, it can have a `merge` and `on_merge`, which are triggered when
    the function dependency is violated.
    In this case, the merge expression determines which of the two outputs
    for the same input is used.
    The `on_merge` actions are run after the merge expression is evaluated.

    Note that the `:merge` expression must be monotonic
    for the behavior of the egglog program to be consistent and defined.
    In other words, the merge function must define a lattice on the output of the function.
    If values are merged in different orders, they should still result in the same output.
    If the merge expression is not monotonic, the behavior can vary as
    actions may be applied more than once with different results.

    ```text
    (function LowerBound (Math) i64 :merge (max old new))
    ```

    Specifically, a custom function can also have an EqSort output type:

    ```text
    (function Add (i64 i64) Math)
    ```

    All functions can be `set`
    with [`Action::Set`].

    Output of a function, if being the EqSort type, can be unioned with [`Action::Union`]
    with another datatype of the same `sort`.
    """

    span: Span
    name: str
    schema_: Schema = field(name='schema')
    merge: Optional[GenericExpr] = None


class Rule(Struct, tag=True):
    """
    ```text
    (rule <body:List<Fact>> <head:List<Action>>)
    ```

    defines an egglog rule.
    The rule matches a list of facts with respect to
    the global database, and runs the list of actions
    for each match.
    The matches are done *modulo equality*, meaning
    equal datatypes in the database are considered
    equal.

    Example:
    ```text
    (rule ((edge x y))
          ((path x y)))

    (rule ((path x y) (edge y z))
          ((path x z)))
    ```
    """

    field1: GenericRule


class GenericRule(Struct):
    span: Span
    head: GenericActions
    body: List[GenericFact]
    name: Annotated[
        str, Meta(description='A globally unique name for this rule in the EGraph.')
    ]
    """
    A globally unique name for this rule in the EGraph.
    """
    ruleset: Annotated[
        str, Meta(description='The ruleset this rule belongs to. Defaults to `""`.')
    ]
    """
    The ruleset this rule belongs to. Defaults to `""`.
    """


class Let(Struct, tag=True):
    """
    Bind a variable to a particular datatype or primitive.
    At the top level (in a [`Command::Action`]), this defines a global variable.
    In a [`Command::Rule`], this defines a local variable in the actions.
    """

    field1: Span
    field2: str
    field3: GenericExpr


class Set(Struct, tag=True):
    """
    `set` a function to a particular result.
    `set` should not be used on datatypes-
    instead, use `union`.
    """

    field1: Span
    field2: str
    field3: List[GenericExpr]
    field4: GenericExpr


class Change(Struct, tag=True):
    """
    Delete or subsume (mark as hidden from future rewrites and unextractable) an entry from a function.
    """

    field1: Span
    field2: Change2
    field3: str
    field4: List[GenericExpr]


class UnionModel(Struct, tag=True):
    """
    `union` two datatypes, making them equal
    in the implicit, global equality relation
    of egglog.
    All rules match modulo this equality relation.

    Example:
    ```text
    (datatype Math (Num i64))
    (union (Num 1) (Num 2)); Define that Num 1 and Num 2 are equivalent
    (extract (Num 1)); Extracts Num 1
    (extract (Num 2)); Extracts Num 1
    ```
    """

    field1: Span
    field2: GenericExpr
    field3: GenericExpr


class Expr(Struct, tag=True):
    field1: Span
    field2: GenericExpr


class Eq(Struct, tag=True):
    field1: Span
    field2: GenericExpr
    field3: GenericExpr


class Fact(Struct, tag=True):
    field1: GenericExpr


class Rewrite(Struct, tag=True):
    """
    `rewrite` is syntactic sugar for a specific form of `rule`
    which simply unions the left and right hand sides.

    Example:
    ```text
    (rewrite (Add a b)
             (Add b a))
    ```

    Desugars to:
    ```text
    (rule ((= lhs (Add a b)))
          ((union lhs (Add b a))))
    ```

    Additionally, additional facts can be specified
    using a `:when` clause.
    For example, the same rule can be run only
    when `a` is zero:

    ```text
    (rewrite (Add a b)
             (Add b a)
             :when ((= a (Num 0)))
    ```

    Add the `:subsume` flag to cause the left hand side to be subsumed after matching, which means it can
    no longer be matched in a rule, but can still be checked against (See [`Change`] for more details.)

    ```text
    (rewrite (Mul a 2) (bitshift-left a 1) :subsume)
    ```

    Desugars to:
    ```text
    (rule ((= lhs (Mul a 2)))
          ((union lhs (bitshift-left a 1))
           (subsume (Mul a 2))))
    ```
    """

    field1: str
    field2: GenericRewrite
    field3: bool


class GenericRewrite(Struct):
    span: Span
    lhs: GenericExpr
    rhs: GenericExpr
    conditions: List[GenericFact]


class BiRewrite(Struct, tag=True):
    """
    Similar to [`Command::Rewrite`], but
    generates two rules, one for each direction.

    Example:
    ```text
    (bi-rewrite (Mul (Var x) (Num 0))
                (Var x))
    ```

    Becomes:
    ```text
    (rule ((= lhs (Mul (Var x) (Num 0))))
          ((union lhs (Var x))))
    (rule ((= lhs (Var x)))
          ((union lhs (Mul (Var x) (Num 0)))))
    ```
    """

    field1: str
    field2: GenericRewrite


class Action(Struct, tag=True):
    """
    Perform an [`Action`] on the global database
    (see documentation for [`Action`] for more details).
    Example:
    ```text
    (let xplusone (Add (Var "x") (Num 1)))
    ```
    """

    field1: GenericAction


class Extract(Struct, tag=True):
    """
    `extract` a datatype from the egraph, choosing
    the smallest representative.
    By default, each constructor costs 1 to extract
    (common subexpressions are not shared in the cost
    model).
    """

    field1: Span
    field2: GenericExpr
    field3: GenericExpr


class RunSchedule(Struct, tag=True):
    """
    Runs a [`Schedule`], which specifies
    rulesets and the number of times to run them.

    Example:
    ```text
    (run-schedule
        (saturate my-ruleset-1)
        (run my-ruleset-2 4))
    ```

    Runs `my-ruleset-1` until saturation,
    then runs `my-ruleset-2` four times.

    See [`Schedule`] for more details.
    """

    field1: GenericSchedule


class Saturate(Struct, tag=True):
    field1: Span
    field2: GenericSchedule


class Repeat(Struct, tag=True):
    field1: Span
    field2: Annotated[int, Meta(ge=0)]
    field3: GenericSchedule


class Run(Struct, tag=True):
    field1: Span
    field2: GenericRunConfig


class GenericRunConfig(Struct):
    ruleset: str
    until: Optional[List[GenericFact]] = None


class Sequence(Struct, tag=True):
    field1: Span
    field2: List[GenericSchedule]


class Check(Struct, tag=True):
    """
    The `check` command checks that the given facts
    match at least once in the current database.
    The list of facts is matched in the same way a [`Command::Rule`] is matched.

    Example:

    ```text
    (check (= (+ 1 2) 3))
    (check (<= 0 3) (>= 3 0))
    (fail (check (= 1 2)))
    ```

    prints

    ```text
    [INFO ] Checked.
    [INFO ] Checked.
    [ERROR] Check failed
    [INFO ] Command failed as expected.
    ```
    """

    span: Span
    facts: List[GenericFact]


class Output(Struct, tag=True):
    """
    Extract and output a set of expressions to a file.
    """

    span: Span
    file: str
    exprs: List[GenericExpr]


class Fail(Struct, tag=True):
    """
    Assert that a command fails with an error.
    """

    field1: Span
    field2: GenericCommand


class UserDefined(Struct, tag=True):
    """
    User-defined command.
    """

    field1: Span
    field2: str
    field3: List[GenericExpr]


GenericCommand = Annotated[
    Union[
        Sort,
        Datatype,
        Datatypes,
        Constructor,
        Relation,
        Function,
        AddRuleset,
        UnstableCombinedRuleset,
        Rule,
        Rewrite,
        BiRewrite,
        Action,
        Extract,
        RunSchedule,
        PrintOverallStatistics,
        Check,
        PrintFunction,
        PrintSize,
        Input,
        Output,
        Push,
        Pop,
        Fail,
        Include,
        UserDefined,
    ],
    Meta(
        description='A [`Command`] is the top-level construct in egglog.\nIt includes defining rules, declaring functions,\nadding to tables, and running rules (via a [`Schedule`]).'
    ),
]


GenericExpr = Union[Var, Call, Lit]


Subdatatypes = Union[Variants, NewSort]


GenericAction = Union[Let, Set, Change, UnionModel, Panic2, Expr]


GenericFact = Annotated[
    Union[Eq, Fact],
    Meta(
        description='Facts are the left-hand side of a [`Command::Rule`].\nThey represent a part of a database query.\nFacts can be expressions or equality constraints between expressions.\n\nNote that primitives such as  `!=` are partial.\nWhen two things are equal, it returns nothing and the query does not match.\nFor example, the following egglog code runs:\n```text\n(fail (check (!= 1 1)))\n```'
    ),
]


GenericSchedule = Union[Saturate, Repeat, Run, Sequence]


ArrayOfGenericCommand = Annotated[
    List[GenericCommand], Meta(title='Array_of_GenericCommand')
]


Datatype1 = Tuple[Span, str, Subdatatypes]


GenericActions = List[GenericAction]

TODO in egglog:

TODO on Python side:

  • Push PR for adding tags to datamodel library
  • Push PR for fixing ordering to datamodel library
  • Test with new bindings

@saulshanabrook saulshanabrook added status:needs decision Needs feedback from a project meeting to know whether to move forward status:needs discussion Need more design/conversation on the issue inside or outside a meeting to get to a concrete proposal status:ready for work and removed status:needs decision Needs feedback from a project meeting to know whether to move forward status:needs discussion Need more design/conversation on the issue inside or outside a meeting to get to a concrete proposal labels Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider Exporting Machine Readable Command and Output Types

1 participant