Skip to content

feat: add Malli function schema bridge for verified functions#2

Open
xlfe wants to merge 1 commit into
replikativ:mainfrom
psithur:feat/malli-integration
Open

feat: add Malli function schema bridge for verified functions#2
xlfe wants to merge 1 commit into
replikativ:mainfrom
psithur:feat/malli-integration

Conversation

@xlfe
Copy link
Copy Markdown

@xlfe xlfe commented Apr 13, 2026

Introduce ansatz.malli namespace that automatically converts CIC type signatures from a/defn into Malli function schemas (e.g. Nat becomes [:and :int [:>= 0]], (List Bool) becomes [:sequential :boolean]).

When malli? is bound to true, a/defn registers the generated schema with Malli's registry so that mi/instrument! can add runtime input/output validation — giving both compile-time proof and runtime contract checking on the same function. Malli is an optional dependency loaded lazily via requiring-resolve; the defn macro degrades gracefully if it is absent.

Introduce ansatz.malli namespace that automatically converts CIC type
signatures from a/defn into Malli function schemas (e.g. Nat becomes
[:and :int [:>= 0]], (List Bool) becomes [:sequential :boolean]).

When *malli?* is bound to true, a/defn registers the generated schema
with Malli's registry so that mi/instrument! can add runtime input/output
validation — giving both compile-time proof and runtime contract checking
on the same function. Malli is an optional dependency loaded lazily via
requiring-resolve; the defn macro degrades gracefully if it is absent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@whilo whilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this — the direction is right and the effort is clear. Before this can merge I want to lay out the design precisely, because I think the integration can be significantly cleaner and more valuable. I'm leaving this open for you to iterate on.


Core constraint: zero changes to core.clj

The defn macro is the heart of the system. Remove *malli?*, remove the requiring-resolve call inside the macro, and remove the docstring amendment. Everything Malli-related must live entirely within ansatz.malli.

Registration should be provided by a wrapper macro in ansatz.malli itself:

;; Users opt in explicitly — no core coupling
(defmacro defn [fn-name params ret-type & body]
  `(do
     (a/defn ~fn-name ~params ~ret-type ~@body)
     (register-schema! '~(ns-name *ns*) '~fn-name)))

Or bulk instrumentation after the fact:

(am/instrument-all! @a/ansatz-env)

Either way, the user makes an explicit choice to bring Malli in. The core stays clean.


Dependency

{:local/root "../malli"} cannot be merged — it breaks anyone who doesn't have Malli checked out next to Ansatz. Use a proper coordinate under a dedicated :malli alias (not :test), so it's clearly optional:

:malli {:extra-paths ["src"]
        :extra-deps {metosin/malli {:mvn/version "0.16.4"}}}

The type mapping — this is the hard part

This is where the current PR needs the most work, and it needs to be thought through carefully because the mapping is not straightforward. Here is a complete design spec.

Ground truth: what actually reaches the Clojure runtime

The key insight is that the schema must reflect what compute-arity says about the function, not the raw CIC type. The existing compute-arity function (core.clj:114) is your oracle:

(loop [t fn-type explicit 0 erased 0 in-prefix true]
  (if (e/forall? t)
    (let [bi (e/forall-info t)]
      (if (= :default bi)
        (recur (e/forall-body t) (inc explicit) erased false)
        (recur (e/forall-body t) explicit ...)))
    {:arity explicit :erased erased}))

Only :default binder parameters survive to the Clojure runtime. Everything else — :implicit type parameters, :inst-implicit type class instances, universe parameters — is either synthesized at elaboration time or erased during compilation. Your fn-schema walker must use exactly this same binder-info logic, not just walk all foralls.

The current PR's fn-schema already does this correctly — but it's important to understand why, so the rest of the mapping is done consistently.

Category 1 — Clean scalar mappings

These are unambiguous. The only note is that the information loss on Nat should be documented explicitly:

CIC type Malli schema Note
Nat [:and :int [:>= 0]] Lean Nat is unbounded; Clojure int is bounded to Long/MAX_VALUE. Intentionally lossy — document this.
Int :int
Bool :boolean
String :string
Float / Real :double
Unit :nil Compiles to nil at runtime

Category 2 — Structural types (recursive mapping, information loss expected)

These are expressible but the mapping loses any dependent constraints. This is fine and expected — document it.

CIC type Malli schema
List α [:sequential (type->malli α)]
Array α [:vector (type->malli α)]
Option α [:maybe (type->malli α)]
Prod α β [:tuple (type->malli α) (type->malli β)]
Sum α β [:or (type->malli α) (type->malli β)]

User-defined inductives and structures are the hard case. At runtime:

  • Structures (a/structure) compile to Clojure defrecord instances. The schema should be [:map [:field1 schema1] [:field2 schema2] ...]. To generate this, look up the structure in structure-registry and read the field types from the constructor's CIC type in the env. The field types are themselves CIC types that need recursive mapping.

  • Enumerations (inductives with all nullary constructors, e.g. (a/inductive Color [] (red) (green) (blue))) compile to integer indices (0, 1, 2, ...) at runtime. The schema should be [:enum 0 1 2] — though keyword variants might be a richer alternative worth discussing.

  • General inductives (mixed nullary + n-ary constructors) compile to [constructor-index & fields] tagged vectors. There is no clean Malli schema for these; :any with a warning is the right default until a richer [:multi ...] representation is worked out.

The implication: type->malli needs access to the Ansatz environment and structure-registry, not just the type expression. Thread these through the opts map you already have.

Category 3 — Erased at runtime (never include in schema)

These never appear in the compiled Clojure function. If encountered while walking the forall telescope, they must be skipped silently (exactly as compute-arity does):

  • Proof parameters: any :default parameter whose type is Prop-sorted. The current PR misses this case. A function like (a/defn safe-div [n :- Nat, m :- Nat, h :- (< 0 m)] Nat ...) has h with :default binder-info, but its type is a proposition. At runtime h receives an opaque proof term — it has no meaningful Malli schema. Detect via e/sort? on the parameter type, and check the sort level is Prop (level zero). Skip these parameters.
  • Type class instances: :inst-implicit — already handled correctly.
  • Universe parameters: :implicit where the type is Sort _ — already handled.

Category 4 — Dependent / indexed types (the hard problem — be honest about limitations)

Malli has no concept of dependent types. Do not try to solve this now. The correct approach for each case:

  • Vector α n (fixed-length): The index n is type-level — it is not passed to the compiled function at runtime. The runtime value is just a sequential collection. Schema: [:sequential (type->malli α)]. The length constraint is guaranteed by the kernel; it does not need runtime re-checking.

  • Fin n (bounded natural): The bound n lives in the type, not in the runtime value. Schema: [:and :int [:>= 0]].

  • Subtype P / {x : α // P x}: At runtime, only the witness x : α is present — the proof of P x is erased. Schema: (type->malli α).

  • Matrix m n α, ZMod n, etc.: :any with a logged warning.

The general rule: for dependent types, produce the schema for the underlying carrier type and document that the dependent constraint is enforced by the kernel, not at runtime. This is correct semantics, not a limitation — the proof system already guarantees the constraint.

Category 5 — Mathlib abstract algebra (:any for now)

GroupHom, RingHom, LinearMap, Polynomial, ENNReal, etc. are opaque mathematical structures with no meaningful Malli schema. Return :any with a warning. The right extensibility mechanism for the future is a user-registered dispatch table:

(am/register-type-mapping! "Finset"
  (fn [args opts] [:set (type->malli (first args) opts)]))

Design this hook now even if the built-in table is small.

What to do at unknown boundaries

Never silently return :any. Return :any and emit a warning at registration time so the user knows their schema is approximate:

(println "WARN ansatz.malli: no schema for" type-name "— using :any")

Edit: This should probably have a dynamic var to turn warnings on or off.


Priority: what to build first

1. Generative testing — this is the unique value. No verified-function system has this. malli.generator can generate random inputs from CIC types and property-test the compiled function against them. This tests the ansatz->clj compilation step, which is the least formally verified part of the pipeline:

(am/check-verified 'double {:iterations 1000})
;; => nil if passes, {:schema ... :input [...] :output ...} on failure

Implement using malli.generator/function-checker. This should be the centrepiece.

2. Boundary instrumentation. am/instrument! / am/unstrument! for guarding verified functions against unverified callers. Off by default.

3. Schema inspection. am/fn-schema-for as a REPL convenience. Already mostly there.


Test fixes

  • test-malli-flag-auto-register calls define-verified (the function) then manually calls register-from-defn! — it never exercises the macro auto-registration path it claims to test. Rewrite against the wrapper macro once *malli?* is removed from core.

  • The test fixture that resets ansatz-env should use env/fork + try/finally to avoid contaminating other tests. See test-isort-preservation in core_test.clj as the model.

  • Tests for the type mapping should cover all five categories above, including explicit tests for unknown-boundary behaviour (:any + warning logged).


Happy to answer questions on any of this. The idea is solid — I just want the implementation to be as rigorous as the rest of the system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants