diff --git a/lib/elixir/block_b_notes.md b/lib/elixir/block_b_notes.md new file mode 100644 index 00000000000..6808b7232a7 --- /dev/null +++ b/lib/elixir/block_b_notes.md @@ -0,0 +1,14 @@ +## Notes for Block B - Graham 26/02/2026 + +- B.1 - Update internal representations to hold nodes rather than descrs: I'm actually not sure if anything has to be done here. In an object-oriented language, we would be changing field types, however here where we simply process structures as given, it should only matter whether the functions doing such processing can handle such field shapes. This describes exactly the next task. +- B.2 - Update type constructors to accept nodes as arguments: This was already done as part of Block A. For example, in tuple_descr/3, the first line is `value_descr = to_descr(value)`, to handle the case where `value` is a node, rather than a descr as it previously would have been. +- B.3 - Update kind-level set operations to handle nodes in sub-type positions: + * **Question:** Looking at the implemention of `union` for example, we see `union/3` which calls code such as `tuple_union/2`. I don't think I understand how, for example, a union between a tuple and an atom is created. + * Maybe answer: the key isn't necessarily saying "this is a union between two maps" for example; it is rather saying "the first things is a map, give me the union between that and whatever the second value is on the BDD level" + * Hint for myself to properly interpret code: a descr is a collection of disjoint BDDs. So each value of a key in a descr is a BDD in itself. + +- Overall, it appears what needs to be ensured **is that BDDs can contain nodes, and not just descrs**. + * Doing a quick skim of the set operations, it doesn't seem like any of them explicitly assume sub-components of a BDD to be a descr rather than a node. They end up returning sub-components directly, or re-passing sub-components to other BDD operations with the same characteristics. I believe that overall this is a good sign. + * I think the easiest way to see what actually has to be changed is to create some test cases. Or at least, create some literal creation/literal operation cases that can be run in the REPL to demonstrate whatever the behavior at the time (during incremental code changes) is. + * With all that being said, it feels like right now, normal operations *should* more or less work with stateless (non-recursive) nodes. And if they don't, it's jsut a matter of tracking down trivial signature changes. I'm really not sure Block B has that much work as long as functions are dynamic. It seems to me like the next real work is Block C where nodes will have to be unrolled one level before having the Block B set operations applied to them. + * I might hypothesize that the LLM is imagining the evaluation of set operations in a recursive expression manner because that's what it looks like for a programming language implementation for example, but for better or worse, within this set-theoretic types framework we somehow end up doing fewer "unwrap"-recursive calls than expected. That, or I'm just totally missing them. diff --git a/lib/elixir/lib/module/types/descr.ex b/lib/elixir/lib/module/types/descr.ex index 17d17cbbbed..8bd08de2511 100644 --- a/lib/elixir/lib/module/types/descr.ex +++ b/lib/elixir/lib/module/types/descr.ex @@ -83,6 +83,7 @@ defmodule Module.Types.Descr do @compile {:inline, unfold: 1} defp unfold(:term), do: unfolded_term() + defp unfold({_id, _state, _generator} = node), do: to_descr(node) |> unfold() defp unfold(other), do: other defp unfolded_term, do: @term @@ -110,6 +111,192 @@ defmodule Module.Types.Descr do @boolset :sets.from_list([true, false], version: 2) def boolean(), do: %{atom: {:union, @boolset}} + ## Nodes + + # @spec descr() :: map() | :term + # @spec state() :: map() + # @spec node() :: {id :: reference(), state :: map(), generator :: ((atom() -> node()) -> node())} + def make_node(state, generator), do: {make_ref(), state, generator} + + # @spec to_node(descr :: descr() | node()) :: node() + def to_node(:term), do: make_node(:term, fn _state -> :term end) + def to_node(descr = %{}), do: make_node(%{}, fn _state -> descr end) + def to_node(node) when is_tuple(node), do: node + + def to_descr(:term), do: :term + def to_descr(descr = %{}), do: descr + + def to_descr({_id, state, generator}) do + recur = fn name -> make_node(state, state[name]) end + generator.(recur) + end + + # Collect generator functions embedded in a type (node, descr, or :term). + # Walks BDD leaves without unfolding any nodes. Returns a list of generators. + defp collect_generators({ref, _state, gen}) when is_reference(ref), do: [gen] + defp collect_generators(:term), do: [] + + defp collect_generators(%{} = descr) do + # Use direct key lookups instead of Enum.flat_map to avoid dispatching + # the Enumerable protocol on maps with __struct__ (e.g. struct types). + tuple_gens = + case descr do + %{tuple: bdd} -> collect_generators_from_bdd(bdd, &collect_generators_tuple_leaf/1) + _ -> [] + end + + list_gens = + case descr do + %{list: bdd} -> collect_generators_from_bdd(bdd, &collect_generators_list_leaf/1) + _ -> [] + end + + map_gens = + case descr do + %{map: bdd} -> collect_generators_from_bdd(bdd, &collect_generators_map_leaf/1) + _ -> [] + end + + tuple_gens ++ list_gens ++ map_gens + end + + defp collect_generators(_), do: [] + + defp collect_generators_from_bdd(:bdd_top, _leaf_fn), do: [] + defp collect_generators_from_bdd(:bdd_bot, _leaf_fn), do: [] + defp collect_generators_from_bdd({_, _} = leaf, leaf_fn), do: leaf_fn.(leaf) + + defp collect_generators_from_bdd({lit, c, u, d}, leaf_fn) do + leaf_fn.(lit) ++ + collect_generators_from_bdd(c, leaf_fn) ++ + collect_generators_from_bdd(u, leaf_fn) ++ + collect_generators_from_bdd(d, leaf_fn) + end + + # Tuple leaf: {tag, [element, ...]} + defp collect_generators_tuple_leaf({_tag, elements}) do + Enum.flat_map(elements, &collect_generators/1) + end + + # List leaf: {list_type, last_type} + defp collect_generators_list_leaf({list_type, last_type}) do + collect_generators(list_type) ++ collect_generators(last_type) + end + + # Map leaf: {tag, %{key => value, ...}} or {%{domains}, %{fields}} + defp collect_generators_map_leaf({tag, fields}) when is_map(fields) do + field_gens = :maps.fold(fn _k, v, acc -> collect_generators(v) ++ acc end, [], fields) + + case tag do + %{} -> :maps.fold(fn _k, v, acc -> collect_generators(v) ++ acc end, field_gens, tag) + _ -> field_gens + end + end + + defp collect_generators_map_leaf(_), do: [] + + defp empty_memo_key(descr), do: {:empty_seen, empty_memo_normalize(descr)} + + # Normalize recursive obligations to stable unary memo keys by erasing + # fresh node references and retaining generator identity only. + defp empty_memo_normalize({ref, _state, generator}) when is_reference(ref) do + {:node, generator} + end + + # Compiler-internal struct-like maps such as %{__struct__: :term} should not + # be traversed as regular maps for this prototype key. + defp empty_memo_normalize(%{__struct__: struct}) when is_atom(struct) do + {:struct, struct} + end + + defp empty_memo_normalize(%{} = map) do + map + |> :maps.to_list() + |> then(fn entries -> + :lists.map( + fn {key, value} -> {empty_memo_normalize(key), empty_memo_normalize(value)} end, + entries + ) + end) + |> :lists.sort() + end + + defp empty_memo_normalize(list) when is_list(list) do + :lists.map(&empty_memo_normalize/1, list) + end + + defp empty_memo_normalize(tuple) when is_tuple(tuple) do + tuple + |> Tuple.to_list() + |> then(fn elements -> :lists.map(&empty_memo_normalize/1, elements) end) + |> List.to_tuple() + end + + defp empty_memo_normalize(other), do: other + + @doc """ + Builds recursive type nodes from a set of mutually recursive equations. + + Takes a map `%{name => generator}` where each generator is a function + `fn recur -> descr` and `recur.(name)` builds the recursive node for `name`. + + Returns a map `%{name => node}` with one self-contained node per variable. + + ## Important constraint + + Inside a generator, recursion variables (nodes built from the state) may + only appear as arguments to **type constructors** (`tuple/1`, `list/1`, + `non_empty_list/2`, `closed_map/1`, `open_map/1`, etc.). + + They must **never** be passed to **set-theoretic operations** (`union/2`, + `intersection/2`, `difference/2`, `negation/1`) inside the generator. + These operations eagerly unfold their arguments via `to_descr/1`, which + triggers the generator again, causing infinite recursion. + + Correct: node inside a constructor, set operations on the result: + + tuple([integer(), recur.(:X)]) + |> union(atom([:nil])) + + Wrong: node passed directly to a set operation: + + union(recur.(:X), atom([:nil])) + + ## Example + + The recursive type `X = {integer(), X} | nil` is defined as: + + %{X: node_x} = recursive(%{ + X: fn recur -> + tuple([integer(), recur.(:X)]) + |> union(atom([:nil])) + end + }) + + For mutually recursive types `X = {int, Y} | nil` and `Y = {bool, X} | nil`: + + %{X: node_x, Y: node_y} = recursive(%{ + X: fn recur -> + tuple([integer(), recur.(:Y)]) + |> union(atom([:nil])) + end, + Y: fn recur -> + tuple([boolean(), recur.(:X)]) + |> union(atom([:nil])) + end + }) + """ + def recursive(equations) when is_map(equations) do + # The generators are the recursive environment: each variable maps to its + # generator function and is resolved lazily via recur.(name). + state = equations + + # Step 4: build one node per variable with the shared state. + Map.new(equations, fn {name, generator} -> + {name, make_node(state, generator)} + end) + end + @doc """ Gets the upper bound of a gradual type. @@ -360,6 +547,7 @@ defmodule Module.Types.Descr do Returns true if the type has a gradual part. """ def gradual?(:term), do: false + def gradual?({_id, _state, _generator} = node), do: gradual?(to_descr(node)) def gradual?(descr), do: is_map_key(descr, :dynamic) @doc """ @@ -397,7 +585,7 @@ defmodule Module.Types.Descr do def union(none, other) when none == @none, do: other def union(other, none) when none == @none, do: other - def union(left, right) do + def union(%{} = left, %{} = right) do is_gradual_left = gradual?(left) is_gradual_right = gradual?(right) @@ -415,6 +603,8 @@ defmodule Module.Types.Descr do end end + def union(left, right), do: union(to_descr(left), to_descr(right)) + @compile {:inline, union_static: 2} defp union_static(left, right) do symmetrical_merge(left, right, &union/3) @@ -437,7 +627,7 @@ defmodule Module.Types.Descr do def intersection(%{dynamic: :term}, other), do: dynamic(remove_optional(other)) def intersection(other, %{dynamic: :term}), do: dynamic(remove_optional(other)) - def intersection(left, right) do + def intersection(%{} = left, %{} = right) do is_gradual_left = gradual?(left) is_gradual_right = gradual?(right) @@ -455,6 +645,8 @@ defmodule Module.Types.Descr do end end + def intersection(left, right), do: intersection(to_descr(left), to_descr(right)) + @compile {:inline, intersection_static: 2} defp intersection_static(left, right) do symmetrical_intersection(left, right, &intersection/3) @@ -480,7 +672,9 @@ defmodule Module.Types.Descr do def difference(left, :term), do: keep_optional(left) def difference(left, none) when none == @none, do: left - def difference(left, right) do + def difference(left, %{} = right) do + left = to_descr(left) + if gradual?(left) or gradual?(right) do {left_dynamic, left_static} = pop_dynamic(left) {right_dynamic, right_static} = pop_dynamic(right) @@ -492,6 +686,8 @@ defmodule Module.Types.Descr do end end + def difference(left, right), do: difference(to_descr(left), to_descr(right)) + # For static types, the difference is component-wise defp difference_static(left, descr) when descr == %{}, do: left defp difference_static(left, :term), do: keep_optional(left) @@ -521,10 +717,9 @@ defmodule Module.Types.Descr do defp iterator_difference_static(:none, map), do: map - # This function is designed to compute the difference during subtyping efficiently. - # Do not use it for anything else. + # Fused difference + emptiness check that short-circuits per-key. + # This is Algorithm 1 from Laurent & Nguyen, fused with subtyping. defp empty_difference_subtype?(%{dynamic: dyn_left} = left, %{dynamic: dyn_right} = right) do - # Dynamic will either exist on both sides or on none empty_difference_subtype?(dyn_left, dyn_right) and empty_difference_subtype?(Map.delete(left, :dynamic), Map.delete(right, :dynamic)) end @@ -532,22 +727,26 @@ defmodule Module.Types.Descr do defp empty_difference_subtype?(left, :term), do: keep_optional(left) == @none defp empty_difference_subtype?(left, right) do - iterator_empty_difference_subtype?(:maps.next(:maps.iterator(unfold(left))), unfold(right)) + iterator_empty_difference_subtype?( + :maps.next(:maps.iterator(unfold(left))), + unfold(right), + %{} + ) end - defp iterator_empty_difference_subtype?({key, v1, iterator}, map) do + defp iterator_empty_difference_subtype?({key, v1, iterator}, map, seen) do case map do %{^key => v2} -> value = difference(key, v1, v2) - value in @empty_difference or empty_key?(key, value) + value in @empty_difference or empty_key?(key, value, seen) %{} -> - empty_key?(key, v1) + empty_key?(key, v1, seen) end and - iterator_empty_difference_subtype?(:maps.next(iterator), map) + iterator_empty_difference_subtype?(:maps.next(iterator), map, seen) end - defp iterator_empty_difference_subtype?(:none, _map), do: true + defp iterator_empty_difference_subtype?(:none, _map, _seen), do: true # Returning 0 from the callback is taken as none() for that subtype. defp difference(:atom, v1, v2), do: atom_difference(v1, v2) @@ -563,6 +762,7 @@ defmodule Module.Types.Descr do """ def negation(:term), do: none() def negation(%{} = descr), do: difference(term(), descr) + def negation(node), do: negation(to_descr(node)) @doc """ Check if a type is empty. @@ -573,35 +773,62 @@ defmodule Module.Types.Descr do the type is non-empty as we normalize then during construction. """ def empty?(:term), do: false + def empty?(%{} = descr), do: empty_seen?(descr, %{}) + def empty?({_id, _state, _gen} = node), do: empty_seen?(node, %{}) - def empty?(%{} = descr) do - case :maps.get(:dynamic, descr, descr) do - :term -> - false + # This is Algorithm 1 (non-optimized version) from + # "Implement Set-Theoretic Types" (https://hal.science/hal-05369012v1/document). + # TODO: implement Algorithm 2 (optimized). + # + # Internal emptiness check with coinductive memoization. + # The `seen` map stores both raw node generators and normalized unary + # emptiness obligations so recursive calls can terminate structurally. + defp empty_seen?(:term, _seen), do: false - value when value == @none -> - true + defp empty_seen?(%{} = descr, seen) do + key = empty_memo_key(descr) + + if :erlang.is_map_key(key, seen) do + true + else + seen = Map.put(seen, key, true) + + case :maps.get(:dynamic, descr, descr) do + :term -> + false + + value when value == @none -> + true - descr -> - not Map.has_key?(descr, :atom) and - not Map.has_key?(descr, :bitmap) and - not Map.has_key?(descr, :optional) and - (not Map.has_key?(descr, :tuple) or tuple_empty?(descr.tuple)) and - (not Map.has_key?(descr, :map) or map_empty?(descr.map)) and - (not Map.has_key?(descr, :list) or list_empty?(descr.list)) and - (not Map.has_key?(descr, :fun) or fun_empty?(descr.fun)) + descr -> + not Map.has_key?(descr, :atom) and + not Map.has_key?(descr, :bitmap) and + not Map.has_key?(descr, :optional) and + (not Map.has_key?(descr, :tuple) or tuple_empty?(descr.tuple, seen)) and + (not Map.has_key?(descr, :map) or map_empty?(descr.map, seen)) and + (not Map.has_key?(descr, :list) or list_empty?(descr.list, seen)) and + (not Map.has_key?(descr, :fun) or fun_empty?(descr.fun, seen)) + end end end - defp empty_or_optional?(type), do: empty?(remove_optional(type)) + # Node: check for cycle via generator identity, then unfold + defp empty_seen?({_id, _state, generator} = node, seen) do + if :erlang.is_map_key(generator, seen) do + true + else + seen = Map.put(seen, generator, true) + empty_seen?(to_descr(node), seen) + end + end # For atom, bitmap, tuple, and optional, if the key is present, # then they are not empty, - defp empty_key?(:fun, value), do: fun_empty?(value) - defp empty_key?(:map, value), do: map_empty?(value) - defp empty_key?(:list, value), do: list_empty?(value) - defp empty_key?(:tuple, value), do: tuple_empty?(value) - defp empty_key?(_, _value), do: false + defp empty_key?(:fun, value, seen), do: fun_empty?(value, seen) + defp empty_key?(:map, value, seen), do: map_empty?(value, seen) + defp empty_key?(:list, value, seen), do: list_empty?(value, seen) + defp empty_key?(:tuple, value, seen), do: tuple_empty?(value, seen) + defp empty_key?(_, _value, _seen), do: false @doc """ Converts all floats or integers into numbers. @@ -817,6 +1044,11 @@ defmodule Module.Types.Descr do right_static = Map.delete(right, :dynamic) subtype_static?(left, right_static) + is_grad_left -> + # Both gradual: check dynamic and static parts separately + subtype_static?(Map.get(left, :dynamic), Map.get(right, :dynamic)) and + subtype_static?(Map.delete(left, :dynamic), Map.delete(right, :dynamic)) + true -> subtype_static?(left, right) end @@ -825,6 +1057,10 @@ defmodule Module.Types.Descr do defp subtype_static?(same, same), do: true defp subtype_static?(left, right), do: empty_difference_subtype?(left, right) + # Recursive subtype check that reuses the unary emptiness memo by reducing + # subtyping to emptiness of the set difference under the current `seen` map. + defp subtype_seen?(left, right, seen), do: empty_seen?(difference(left, right), seen) + @doc """ Check if a type is equal to another. @@ -846,6 +1082,8 @@ defmodule Module.Types.Descr do # Two gradual types are disjoint if their upper bounds are disjoint. def disjoint?(left, right) do + left = to_descr(left) + right = to_descr(right) left_upper = unfold(left) |> Map.get(:dynamic, left) right_upper = unfold(right) |> Map.get(:dynamic, right) @@ -1526,14 +1764,14 @@ defmodule Module.Types.Descr do # and compile into tuples of positive and negative nodes. Positive nodes are # those followed by a left path, negative nodes are those followed by a right path. defp fun_bdd_to_dnf(bdd), - do: bdd_to_dnf(bdd) |> Enum.filter(fn {pos, neg} -> not fun_line_empty?(pos, neg) end) + do: bdd_to_dnf(bdd) |> Enum.filter(fn {pos, neg} -> not fun_line_empty?(pos, neg, %{}) end) # Check if all functions types for all arities are empty. - defp fun_empty?({:negation, _}), do: false + defp fun_empty?({:negation, _}, _seen), do: false - defp fun_empty?({:union, repr}) do + defp fun_empty?({:union, repr}, seen) do Enum.all?(repr, fn {_ar, bdd} -> - Enum.all?(bdd_to_dnf(bdd), fn {pos, neg} -> fun_line_empty?(pos, neg) end) + Enum.all?(bdd_to_dnf(bdd), fn {pos, neg} -> fun_line_empty?(pos, neg, seen) end) end) end @@ -1544,9 +1782,9 @@ defmodule Module.Types.Descr do # # - `{[fun(1)], []}` is not empty # - `{[fun(integer() -> atom())], [fun(none() -> term())]}` is empty - defp fun_line_empty?([], _), do: false + defp fun_line_empty?([], _, _seen), do: false - defp fun_line_empty?(positives, negatives) do + defp fun_line_empty?(positives, negatives, seen) do # Check if any negative function negates the whole positive intersection # e.g. (integer() -> atom()) is negated by: # @@ -1557,8 +1795,8 @@ defmodule Module.Types.Descr do Enum.any?(negatives, fn {neg_arguments, neg_return} -> # Check if the negative function's domain is a supertype of the positive # domain and if the phi function determines emptiness. - subtype?(args_to_domain(neg_arguments), fetch_domain(positives)) and - phi_starter(neg_arguments, neg_return, positives) + subtype_seen?(args_to_domain(neg_arguments), fetch_domain(positives), seen) and + phi_starter(neg_arguments, neg_return, positives, seen) end) end @@ -1582,12 +1820,12 @@ defmodule Module.Types.Descr do # Returns true if the intersection of the positives is a subtype of (t1,...,tn)->(not t). # # See [Castagna and Lanvin (2024)](https://arxiv.org/abs/2408.14345), Theorem 4.2. - defp phi_starter(arguments, return, positives) do + defp phi_starter(arguments, return, positives, seen) do # Optimization: When all positive functions have non-empty domains, # we can simplify the phi function check to a direct subtyping test. # This avoids the expensive recursive phi computation by checking only that applying the # input to the positive intersection yields a subtype of the return - case disjoint_non_empty_domains?({arguments, return}, positives) do + case disjoint_non_empty_domains?({arguments, return}, positives, seen) do :disjoint_non_empty -> apply_disjoint(arguments, positives) |> subtype?(return) @@ -1597,17 +1835,20 @@ defmodule Module.Types.Descr do _ -> # Initialize memoization cache for the recursive phi computation arguments = Enum.map(arguments, &{false, &1}) - {result, _cache} = phi(arguments, {false, negation(return)}, positives, %{}) + {result, _cache} = phi(arguments, {false, negation(return)}, positives, %{}, seen) result end end - defp phi(args, {b, t}, [], cache) do - result = Enum.any?(args, fn {bool, typ} -> bool and empty?(typ) end) or (b and empty?(t)) + defp phi(args, {b, t}, [], cache, seen) do + result = + Enum.any?(args, fn {bool, typ} -> bool and empty_seen?(typ, seen) end) or + (b and empty_seen?(t, seen)) + {result, Map.put(cache, {args, {b, t}, []}, result)} end - defp phi(args, {b, ret}, [{arguments, return} | rest_positive], cache) do + defp phi(args, {b, ret}, [{arguments, return} | rest_positive], cache, seen) do # Create cache key from function arguments cache_key = {args, {b, ret}, [{arguments, return} | rest_positive]} @@ -1617,7 +1858,8 @@ defmodule Module.Types.Descr do %{} -> # Compute result and cache it - {result1, cache} = phi(args, {true, intersection(ret, return)}, rest_positive, cache) + {result1, cache} = + phi(args, {true, intersection(ret, return)}, rest_positive, cache, seen) if not result1 do cache = Map.put(cache, cache_key, false) @@ -1629,7 +1871,7 @@ defmodule Module.Types.Descr do {new_result, new_cache} = args |> List.update_at(index, fn {_, arg} -> {true, difference(arg, type)} end) - |> phi({b, ret}, rest_positive, acc_cache) + |> phi({b, ret}, rest_positive, acc_cache, seen) if new_result do {:cont, {index + 1, acc_result and new_result, new_cache}} @@ -1645,9 +1887,9 @@ defmodule Module.Types.Descr do end end - defp disjoint_non_empty_domains?({arguments, return}, positives) do + defp disjoint_non_empty_domains?({arguments, return}, positives, seen) do b1 = all_disjoint_arguments?(positives) - b2 = all_non_empty_arguments?([{arguments, return} | positives]) + b2 = all_non_empty_arguments?([{arguments, return} | positives], seen) cond do b1 and b2 -> :disjoint_non_empty @@ -1656,9 +1898,9 @@ defmodule Module.Types.Descr do end end - defp all_non_empty_arguments?(positives) do + defp all_non_empty_arguments?(positives, seen) do Enum.all?(positives, fn {args, _ret} -> - Enum.all?(args, fn arg -> not empty?(arg) end) + Enum.all?(args, fn arg -> not empty_seen?(arg, seen) end) end) end @@ -1942,54 +2184,83 @@ defmodule Module.Types.Descr do # # none() types can be given and, while stored, it means the list type is empty. defp list_descr(list_type, last_type, empty?) do - {list_dynamic?, list_type} = list_pop_dynamic(list_type) - {last_dynamic?, last_type} = list_pop_dynamic(last_type) + # Detect recursive nodes (from recursive/1, with non-empty state) in + # either position. These must be stored as-is without unfolding to + # prevent infinite recursion during construction. + list_is_rec_node = match?({id, state, _} when is_reference(id) and state != %{}, list_type) + last_is_rec_node = match?({id, state, _} when is_reference(id) and state != %{}, last_type) - list_part = - if last_type == :term do - list_new(:term, :term) - else - case :maps.take(:list, last_type) do - :error -> - list_new(list_type, last_type) + if list_is_rec_node or last_is_rec_node do + # For non-node positions, resolve normally; for node positions, store as-is + {list_dynamic?, resolved_list} = + if list_is_rec_node, do: {false, list_type}, else: list_pop_dynamic(to_descr(list_type)) - {bdd, last_type_no_list} -> - # `last_type` may itself represent one or more list types. - # Our goal is to fold those list types into `list_type` while retaining the - # possible type of the final element (which can be `[]` or any non-list value). - # - # The list types inside `last_type` are stored in a BDD that includes possible - # negations, so we must evaluate each node with its sign taken into account. - # - # A negation only matters when the negated list type is a supertype of the - # corresponding positive list type; in that case we subtract the negated - # variant from the positive one. This is done in list_bdd_to_pos_dnf/1. - {list_type, last_type} = - list_bdd_to_pos_dnf(bdd) - |> Enum.reduce({list_type, last_type_no_list}, fn - {list, last, _negs}, {acc_list, acc_last} -> - {union(list, acc_list), union(last, acc_last)} - end) + {last_dynamic?, resolved_last} = + if last_is_rec_node, do: {false, last_type}, else: list_pop_dynamic(to_descr(last_type)) - list_new(list_type, last_type) - end + list_part = list_new(resolved_list, resolved_last) + + list_descr = + if empty?, do: %{list: list_part, bitmap: @bit_empty_list}, else: %{list: list_part} + + case list_dynamic? or last_dynamic? do + true -> %{dynamic: list_descr} + false -> list_descr end + else + list_type_descr = to_descr(list_type) + last_type_descr = to_descr(last_type) - list_descr = - if empty?, do: %{list: list_part, bitmap: @bit_empty_list}, else: %{list: list_part} + {list_dynamic?, list_type} = list_pop_dynamic(list_type_descr) + {last_dynamic?, last_type} = list_pop_dynamic(last_type_descr) + + list_part = + if last_type == :term do + list_new(:term, :term) + else + case :maps.take(:list, last_type) do + :error -> + list_new(list_type, last_type) + + {bdd, last_type_no_list} -> + # `last_type` may itself represent one or more list types. + # Our goal is to fold those list types into `list_type` while retaining the + # possible type of the final element (which can be `[]` or any non-list value). + # + # The list types inside `last_type` are stored in a BDD that includes possible + # negations, so we must evaluate each node with its sign taken into account. + # + # A negation only matters when the negated list type is a supertype of the + # corresponding positive list type; in that case we subtract the negated + # variant from the positive one. This is done in list_bdd_to_pos_dnf/1. + {list_type, last_type} = + list_bdd_to_pos_dnf(bdd) + |> Enum.reduce({list_type, last_type_no_list}, fn + {list, last, _negs}, {acc_list, acc_last} -> + {union(list, acc_list), union(last, acc_last)} + end) + + list_new(list_type, last_type) + end + end - case list_dynamic? or last_dynamic? do - true -> %{dynamic: list_descr} - false -> list_descr + list_descr = + if empty?, do: %{list: list_part, bitmap: @bit_empty_list}, else: %{list: list_part} + + case list_dynamic? or last_dynamic? do + true -> %{dynamic: list_descr} + false -> list_descr + end end end defp list_new(list_type, last_type), do: bdd_leaf(list_type, last_type) - defp non_empty_list_literals_intersection(list_literals) do + defp non_empty_list_literals_intersection(list_literals, seen) do try do Enum.reduce(list_literals, {:term, :term}, fn {next_list, next_last}, {list, last} -> - {non_empty_intersection!(list, next_list), non_empty_intersection!(last, next_last)} + {non_empty_intersection!(list, next_list, seen), + non_empty_intersection!(last, next_last, seen)} end) catch :empty -> :empty @@ -2007,7 +2278,7 @@ defmodule Module.Types.Descr do defp list_bdd_to_pos_dnf(bdd) do bdd_to_dnf(bdd) |> Enum.reduce([], fn {pos_list, negs}, acc -> - case non_empty_list_literals_intersection(pos_list) do + case non_empty_list_literals_intersection(pos_list, %{}) do :empty -> acc @@ -2038,6 +2309,10 @@ defmodule Module.Types.Descr do end defp list_tail_unfold(:term), do: @not_non_empty_list + + defp list_tail_unfold({id, _state, _gen} = node) when is_reference(id), + do: Map.delete(to_descr(node), :list) + defp list_tail_unfold(other), do: Map.delete(other, :list) @compile {:inline, list_union: 2} @@ -2162,13 +2437,35 @@ defmodule Module.Types.Descr do end end + # X = {int, X} | nil + # %{X: fn recur -> tuple([int(), recur.(X)]) |> union(atom([nil])) + # %{X: fn recur -> tuple([int(), recur.(X)])|> intersection(tuple([int(), float()]) + + # {int, X} /\ {int, float} + + # X = list(list(X, [])) and list(list(Y, [])) + # X & Y + # X & Y + # X \ Y + # X | Y + defp list_leaf_intersection(bdd_leaf(list1, last1), bdd_leaf(list2, last2)) do - try do - list = non_empty_intersection!(list1, list2) - last = non_empty_intersection!(last1, last2) - bdd_leaf(list, last) - catch - :empty -> :bdd_bot + has_node = + collect_generators(list1) != [] or + collect_generators(list2) != [] or + collect_generators(last1) != [] or + collect_generators(last2) != [] + + if has_node do + bdd_intersection(bdd_leaf(list1, last1), bdd_leaf(list2, last2)) + else + try do + list = non_empty_intersection!(list1, list2) + last = non_empty_intersection!(last1, last2) + bdd_leaf(list, last) + catch + :empty -> :bdd_bot + end end end @@ -2182,7 +2479,19 @@ defmodule Module.Types.Descr do # 3. Base case: adds bdd2 type to negations of bdd1 type # The result may be larger than the initial bdd1, which is maintained in the accumulator. defp list_difference(bdd_leaf(list1, last1) = bdd1, bdd_leaf(list2, last2) = bdd2) do + # When element or tail types contain recursive nodes, skip subtype/disjoint + # optimizations to avoid infinite unfolding. Fall through to bdd_difference + # which preserves nodes as-is for later cycle-aware emptiness checking. + has_node = + collect_generators(list1) != [] or + collect_generators(list2) != [] or + collect_generators(last1) != [] or + collect_generators(last2) != [] + cond do + has_node -> + bdd_difference(bdd1, bdd2) + disjoint?(list1, list2) or disjoint?(last1, last2) -> bdd_leaf(list1, last1) @@ -2199,30 +2508,31 @@ defmodule Module.Types.Descr do defp list_difference(bdd1, bdd2), do: bdd_difference(bdd1, bdd2) - defp list_empty?(@non_empty_list_top), do: false + defp list_empty?(@non_empty_list_top, _seen), do: false - defp list_empty?(bdd) do + defp list_empty?(bdd, seen) do bdd_to_dnf(bdd) |> Enum.all?(fn {pos, negs} -> - case non_empty_list_literals_intersection(pos) do + case non_empty_list_literals_intersection(pos, seen) do :empty -> true - {list, last} -> list_line_empty?(list, last, negs) + {list, last} -> list_line_empty?(list, last, negs, seen) end end) end - defp list_line_empty?(list_type, last_type, negs) do + defp list_line_empty?(list_type, last_type, negs, seen) do last_type = list_tail_unfold(last_type) # To make a list {list, last} empty with some negative lists: # 1. Ignore negative lists which do not have a list type that is a supertype of the positive one. # 2. Each of the list supertypes: # a. either completely covers the type, if its last type is a supertype of the positive one, # b. or it removes part of the last type. - empty?(list_type) or empty?(last_type) or + empty_seen?(list_type, seen) or empty_seen?(last_type, seen) or Enum.reduce_while(negs, last_type, fn {neg_type, neg_last}, acc_last_type -> - if subtype?(list_type, neg_type) do + if subtype_seen?(list_type, neg_type, seen) do + neg_last = list_tail_unfold(neg_last) d = difference(acc_last_type, neg_last) - if empty?(d), do: {:halt, nil}, else: {:cont, d} + if empty_seen?(d, seen), do: {:halt, nil}, else: {:cont, d} else {:cont, acc_last_type} end @@ -2611,9 +2921,27 @@ defmodule Module.Types.Descr do end end + # Recursive type nodes are stored as-is without stepping (same as tuple_descr). + # Stepping would call the generator, which would call closed_map/open_map again, + # producing fresh nodes ad infinitum. + defp map_descr_pairs([{key, {id, _state, _gen} = node} | rest], fields, domain, dynamic?) + when is_reference(id) and is_atom(key) do + map_descr_pairs(rest, [{key, node} | fields], domain, dynamic?) + end + + defp map_descr_pairs([{key, {id, _state, _gen} = node} | rest], fields, domain, dynamic?) + when is_reference(id) do + # For domain keys, store node directly. We cannot call if_set/union on nodes + # as those would unfold the generator and loop during construction. + domain = Enum.reduce(key, domain, fn k, acc when is_atom(k) -> Map.put(acc, k, node) end) + map_descr_pairs(rest, fields, domain, dynamic?) + end + defp map_descr_pairs([{key, value} | rest], fields, domain, dynamic?) do + value_descr = to_descr(value) + {value, dynamic?} = - case :maps.take(:dynamic, value) do + case :maps.take(:dynamic, value_descr) do :error -> {value, dynamic?} {dynamic, _static} -> {dynamic, true} end @@ -2660,7 +2988,7 @@ defmodule Module.Types.Descr do defp non_empty_map_only?(descr) do case :maps.take(:map, descr) do :error -> false - {map_bdd, rest} -> empty?(rest) and not map_empty?(map_bdd) + {map_bdd, rest} -> empty?(rest) and not map_empty?(map_bdd, %{}) end end @@ -2675,25 +3003,37 @@ defmodule Module.Types.Descr do defp map_union(bdd1, bdd2), do: bdd_union(bdd1, bdd2) defp maybe_optimize_map_union({tag1, pos1, []} = map1, {tag2, pos2, []} = map2) do - case map_union_optimization_strategy(tag1, pos1, tag2, pos2) do - :all_equal -> - map1 + has_node = + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, pos1) or + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, pos2) or + (is_map(tag1) and + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, tag1)) or + (is_map(tag2) and + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, tag2)) + + if has_node do + nil + else + case map_union_optimization_strategy(tag1, pos1, tag2, pos2) do + :all_equal -> + map1 - :any_map -> - {:open, %{}, []} + :any_map -> + {:open, %{}, []} - {:one_key_difference, key, v1, v2} -> - new_pos = Map.put(pos1, key, union(v1, v2)) - {tag1, new_pos, []} + {:one_key_difference, key, v1, v2} -> + new_pos = Map.put(pos1, key, union(v1, v2)) + {tag1, new_pos, []} - :left_subtype_of_right -> - map2 + :left_subtype_of_right -> + map2 - :right_subtype_of_left -> - map1 + :right_subtype_of_left -> + map1 - nil -> - nil + nil -> + nil + end end end @@ -2731,12 +3071,25 @@ defmodule Module.Types.Descr do defp do_map_union_optimization_strategy(:none, _, status), do: status + # Bail out when either value is a recursive node — calling union/subtype? + # on nodes during BDD construction would unfold the generator and loop. + defp do_map_union_optimization_strategy({_key, {id, _, _}, _iterator}, _pos2, _status) + when is_reference(id), + do: nil + defp do_map_union_optimization_strategy({key, v1, iterator}, pos2, status) do - with %{^key => v2} <- pos2, - next_status when next_status != nil <- map_union_next_strategy(key, v1, v2, status) do - do_map_union_optimization_strategy(:maps.next(iterator), pos2, next_status) - else - _ -> nil + case pos2 do + %{^key => {id, _, _}} when is_reference(id) -> + # Abort the optimization when a recursive node is encountered + nil + + _ -> + with %{^key => v2} <- pos2, + next_status when next_status != nil <- map_union_next_strategy(key, v1, v2, status) do + do_map_union_optimization_strategy(:maps.next(iterator), pos2, next_status) + else + _ -> nil + end end end @@ -2820,35 +3173,62 @@ defmodule Module.Types.Descr do bdd_intersection(bdd1, bdd2) end - defp map_leaf_intersection(bdd_leaf(tag1, fields1), bdd_leaf(tag2, fields2)) do - try do - {tag, fields} = map_literal_intersection(tag1, fields1, tag2, fields2) - bdd_leaf(tag, fields) - catch - :empty -> :bdd_bot + defp map_leaf_intersection(bdd_leaf(tag1, fields1) = l1, bdd_leaf(tag2, fields2) = l2) do + has_node = + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, fields1) or + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, fields2) or + (is_map(tag1) and + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, tag1)) or + (is_map(tag2) and + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, tag2)) + + if has_node do + bdd_intersection(l1, l2) + else + try do + {tag, fields} = map_literal_intersection(tag1, fields1, tag2, fields2) + bdd_leaf(tag, fields) + catch + :empty -> :bdd_bot + end end end + # When bdd_leaf_intersection recurses with a BDD tree (from previous has_node fallback) + defp map_leaf_intersection(bdd1, bdd2), do: bdd_intersection(bdd1, bdd2) + # Optimizations on single maps. defp map_difference(bdd_leaf(tag, fields) = map1, bdd_leaf(neg_tag, neg_fields) = map2) do - # Case 1: we are removing an open map with one field. Just do the difference of that field. - if neg_tag == :open and map_size(neg_fields) == 1 do - [{key, value}] = Map.to_list(neg_fields) - t_diff = difference(Map.get(fields, key, map_key_tag_to_type(tag)), value) + has_node = + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, fields) or + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, neg_fields) or + (is_map(tag) and + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, tag)) or + (is_map(neg_tag) and + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, neg_tag)) + + if has_node do + bdd_difference(map1, map2) + else + # Case 1: we are removing an open map with one field. Just do the difference of that field. + if neg_tag == :open and map_size(neg_fields) == 1 do + [{key, value}] = Map.to_list(neg_fields) + t_diff = difference(Map.get(fields, key, map_key_tag_to_type(tag)), value) - if empty?(t_diff) do - :bdd_bot + if empty?(t_diff) do + :bdd_bot + else + bdd_leaf(tag, Map.put(fields, key, t_diff)) + end else - bdd_leaf(tag, Map.put(fields, key, t_diff)) - end - else - # Case 2: the maps have all but one key in common. Do the difference of that key. - case map_all_but_one(tag, fields, neg_tag, neg_fields) do - {diff_key, type1, type2} -> - bdd_leaf(tag, Map.replace!(fields, diff_key, difference(type1, type2))) + # Case 2: the maps have all but one key in common. Do the difference of that key. + case map_all_but_one(tag, fields, neg_tag, neg_fields) do + {diff_key, type1, type2} -> + bdd_leaf(tag, Map.replace!(fields, diff_key, difference(type1, type2))) - _ -> - bdd_difference(map1, map2, &map_leaf_disjoint?/2) + _ -> + bdd_difference(map1, map2, &map_leaf_disjoint?/2) + end end end end @@ -2857,7 +3237,15 @@ defmodule Module.Types.Descr do do: bdd_difference(bdd1, bdd2, &map_leaf_disjoint?/2) defp map_leaf_disjoint?(bdd_leaf(_tag1, fields1), bdd_leaf(_tag2, fields2)) do - disjoint_structs?(fields1, fields2) + has_node = + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, fields1) or + :maps.fold(fn _k, v, acc -> acc or collect_generators(v) != [] end, false, fields2) + + if has_node do + false + else + disjoint_structs?(fields1, fields2) + end end defp disjoint_structs?(%{__struct__: %{atom: atom} = d1}, %{__struct__: d2}) @@ -2873,21 +3261,25 @@ defmodule Module.Types.Descr do defp disjoint_structs?(_, _), do: false # Intersects two map literals; throws if their intersection is empty. + + defp map_literal_intersection(tag1, fields1, tag2, fields2), + do: map_literal_intersection(tag1, fields1, tag2, fields2, %{}) + # Both open: the result is open. - defp map_literal_intersection(:open, map1, :open, map2) do + defp map_literal_intersection(:open, map1, :open, map2, seen) do new_fields = symmetrical_merge(map1, map2, fn _, type1, type2 -> - non_empty_intersection!(type1, type2) + non_empty_intersection!(type1, type2, seen) end) {:open, new_fields} end # Both closed: the result is closed. - defp map_literal_intersection(:closed, map1, :closed, map2) do + defp map_literal_intersection(:closed, map1, :closed, map2, seen) do new_fields = symmetrical_intersection(map1, map2, fn _, type1, type2 -> - non_empty_intersection!(type1, type2) + non_empty_intersection!(type1, type2, seen) end) if map_size(new_fields) < map_size(map1) or map_size(new_fields) < map_size(map2) do @@ -2898,16 +3290,16 @@ defmodule Module.Types.Descr do end # Open and closed: result is closed, all fields from open should be in closed, except not_set ones. - defp map_literal_intersection(:open, open, :closed, closed) do - :maps.iterator(open) |> :maps.next() |> map_literal_intersection_loop(closed) + defp map_literal_intersection(:open, open, :closed, closed, seen) do + :maps.iterator(open) |> :maps.next() |> map_literal_intersection_loop(closed, seen) end - defp map_literal_intersection(:closed, closed, :open, open) do - :maps.iterator(open) |> :maps.next() |> map_literal_intersection_loop(closed) + defp map_literal_intersection(:closed, closed, :open, open, seen) do + :maps.iterator(open) |> :maps.next() |> map_literal_intersection_loop(closed, seen) end # At least one tag is a tag-domain pair. - defp map_literal_intersection(tag_or_domains1, map1, tag_or_domains2, map2) do + defp map_literal_intersection(tag_or_domains1, map1, tag_or_domains2, map2, seen) do # For a closed map with domains intersected with an open map with domains: # 1. The result is closed (more restrictive) # 2. We need to check each domain in the open map against the closed map @@ -2915,7 +3307,7 @@ defmodule Module.Types.Descr do default2 = map_key_tag_to_type(tag_or_domains2) # Compute the new domain - tag_or_domains = map_domain_intersection(tag_or_domains1, tag_or_domains2) + tag_or_domains = map_domain_intersection(tag_or_domains1, tag_or_domains2, seen) # Go over all fields in map1 and map2 with default atom types atom1 and atom2 # 1. If key is in both maps, compute non empty intersection (:error if it is none) @@ -2925,17 +3317,17 @@ defmodule Module.Types.Descr do # using default values when a key is not present. {tag_or_domains, symmetrical_merge(map1, default1, map2, default2, fn _key, v1, v2 -> - non_empty_intersection!(v1, v2) + non_empty_intersection!(v1, v2, seen) end)} end # Compute the intersection of two tags or tag-domain pairs. - defp map_domain_intersection(:closed, _), do: :closed - defp map_domain_intersection(_, :closed), do: :closed - defp map_domain_intersection(:open, tag_or_domains), do: tag_or_domains - defp map_domain_intersection(tag_or_domains, :open), do: tag_or_domains + defp map_domain_intersection(:closed, _, _seen), do: :closed + defp map_domain_intersection(_, :closed, _seen), do: :closed + defp map_domain_intersection(:open, tag_or_domains, _seen), do: tag_or_domains + defp map_domain_intersection(tag_or_domains, :open, _seen), do: tag_or_domains - defp map_domain_intersection(domains1 = %{}, domains2 = %{}) do + defp map_domain_intersection(domains1 = %{}, domains2 = %{}, seen) do new_domains = for {domain_key, type1} <- domains1, reduce: %{} do acc_domains -> @@ -2943,7 +3335,7 @@ defmodule Module.Types.Descr do %{^domain_key => type2} -> inter = intersection(type1, type2) - if empty_or_optional?(inter) do + if empty_seen?(remove_optional(inter), seen) do acc_domains else Map.put(acc_domains, domain_key, inter) @@ -2958,18 +3350,18 @@ defmodule Module.Types.Descr do if map_size(new_domains) == 0, do: :closed, else: new_domains end - defp map_literal_intersection_loop(:none, acc), do: {:closed, acc} + defp map_literal_intersection_loop(:none, acc, _seen), do: {:closed, acc} - defp map_literal_intersection_loop({key, type1, iterator}, acc) do + defp map_literal_intersection_loop({key, type1, iterator}, acc, seen) do case acc do %{^key => type2} -> - acc = %{acc | key => non_empty_intersection!(type1, type2)} - :maps.next(iterator) |> map_literal_intersection_loop(acc) + acc = %{acc | key => non_empty_intersection!(type1, type2, seen)} + :maps.next(iterator) |> map_literal_intersection_loop(acc, seen) _ -> # If the key is optional in the open map, we can ignore it case type1 do - %{optional: 1} -> :maps.next(iterator) |> map_literal_intersection_loop(acc) + %{optional: 1} -> :maps.next(iterator) |> map_literal_intersection_loop(acc, seen) _ -> throw(:empty) end end @@ -2980,6 +3372,11 @@ defmodule Module.Types.Descr do if empty?(type), do: throw(:empty), else: type end + defp non_empty_intersection!(type1, type2, seen) do + type = intersection(type1, type2) + if empty_seen?(type, seen), do: throw(:empty), else: type + end + # Takes all the lines from the root to the leaves finishing with a 1, # and compile into tuples of positive and negative nodes. Positive nodes are # those followed by a left path, negative nodes are those followed by a right path. @@ -2991,7 +3388,7 @@ defmodule Module.Types.Descr do acc {tag, fields} -> - if init_map_line_empty?(tag, fields, negs) do + if init_map_line_empty?(tag, fields, negs, %{}) do acc else [{tag, fields, negs} | acc] @@ -3894,10 +4291,10 @@ defmodule Module.Types.Descr do {[], [], nil, to_domain_keys(key_descr), []} end - defp non_empty_map_literals_intersection(maps) do + defp non_empty_map_literals_intersection(maps, seen \\ %{}) do try do Enum.reduce(maps, {:open, %{}}, fn {next_tag, next_fields}, {tag, fields} -> - map_literal_intersection(tag, fields, next_tag, next_fields) + map_literal_intersection(tag, fields, next_tag, next_fields, seen) end) catch :empty -> :empty @@ -3907,10 +4304,11 @@ defmodule Module.Types.Descr do # Short-circuits if it finds a non-empty map literal in the union. # Since the algorithm is recursive, we implement the short-circuiting # as throw/catch. - defp map_empty?(bdd) do + defp map_empty?(bdd, seen) do bdd_to_dnf(bdd) |> Enum.all?(fn {pos, negs} -> - case non_empty_map_literals_intersection(pos) do + # We pass seen down to the literal intersection because it may need to check emptiness of elements, which may contain recursive nodes. + case non_empty_map_literals_intersection(pos, seen) do :empty -> true @@ -3918,25 +4316,29 @@ defmodule Module.Types.Descr do # We check the emptiness of the fields because non_empty_map_literal_intersection # will not return :empty on fields that are set to none() and that exist # just in one map, but not the other. - init_map_line_empty?(tag, fields, negs) + init_map_line_empty?(tag, fields, negs, seen) end end) end - defp init_map_line_empty?(tag, fields, negs) do - Enum.any?(Map.to_list(fields), fn {_key, type} -> empty?(type) end) or - map_line_empty?(tag, fields, negs) + defp init_map_line_empty?(tag, fields, negs, seen) do + Enum.any?(Map.to_list(fields), fn {_key, type} -> empty_seen?(type, seen) end) or + map_line_empty?(tag, fields, negs, seen) end # These positives get checked once when calling init_map_line_empty?, and then every time # an intersection or difference is computed, its emptiness is checked again. # So they are all necessarily non-empty. - defp map_line_empty?(_, _pos, []), do: false - defp map_line_empty?(_, _, [{:open, neg_fields} | _]) when neg_fields == %{}, do: true - defp map_line_empty?(:open, fs, [{:closed, _} | negs]), do: map_line_empty?(:open, fs, negs) + defp map_line_empty?(_, _pos, [], _seen), do: false + defp map_line_empty?(_, _, [{:open, neg_fields} | _], _seen) when neg_fields == %{}, do: true + + defp map_line_empty?(:open, fs, [{:closed, _} | negs], seen), + do: map_line_empty?(:open, fs, negs, seen) + + defp map_line_empty?(tag, fields, all_negs, seen) when all_negs != [] do + [{neg_tag, neg_fields} | negs] = all_negs - defp map_line_empty?(tag, fields, [{neg_tag, neg_fields} | negs]) do - if map_check_domain_keys(tag, neg_tag) do + if map_check_domain_keys(tag, neg_tag, seen) do atom_default = map_key_tag_to_type(tag) neg_atom_default = map_key_tag_to_type(neg_tag) @@ -3955,18 +4357,24 @@ defmodule Module.Types.Descr do # There may be value in common tag == :open -> diff = difference(term_or_optional(), neg_type) - empty?(diff) or map_line_empty?(tag, Map.put(fields, neg_key, diff), negs) + + empty_seen?(diff, seen) or + map_line_empty?(tag, Map.put(fields, neg_key, diff), negs, seen) true -> diff = difference(atom_default, neg_type) - empty?(diff) or map_line_empty?(tag, Map.put(fields, neg_key, diff), negs) + + empty_seen?(diff, seen) or + map_line_empty?(tag, Map.put(fields, neg_key, diff), negs, seen) end end) and Enum.all?(Map.to_list(fields), fn {key, type} -> case neg_fields do %{^key => neg_type} -> diff = difference(type, neg_type) - empty?(diff) or map_line_empty?(tag, Map.put(fields, key, diff), negs) + + empty_seen?(diff, seen) or + map_line_empty?(tag, Map.put(fields, key, diff), negs, seen) %{} -> cond do @@ -3981,38 +4389,44 @@ defmodule Module.Types.Descr do true -> # an absent key in a open negative map can be ignored diff = difference(type, neg_atom_default) - empty?(diff) or map_line_empty?(tag, Map.put(fields, key, diff), negs) + + empty_seen?(diff, seen) or + map_line_empty?(tag, Map.put(fields, key, diff), negs, seen) end end end) else - map_line_empty?(tag, fields, negs) + map_line_empty?(tag, fields, tl(all_negs), seen) end catch - :closed -> map_line_empty?(tag, fields, negs) + :closed -> map_line_empty?(tag, fields, tl(all_negs), seen) end # Verify the domain condition from equation (22) in paper ICFP'23 https://www.irif.fr/~gc/papers/icfp23.pdf # which is that every domain key type in the positive map is a subtype # of the corresponding domain key type in the negative map. - defp map_check_domain_keys(:closed, _), do: true - defp map_check_domain_keys(_, :open), do: true + defp map_check_domain_keys(:closed, _, _seen), do: true + defp map_check_domain_keys(_, :open, _seen), do: true # An open map is a subtype iff the negative domains are all present as term_or_optional() - defp map_check_domain_keys(:open, neg_domains) do + defp map_check_domain_keys(:open, neg_domains, seen) do map_size(neg_domains) == length(@domain_key_types) and - Enum.all?(neg_domains, fn {_domain_key, type} -> subtype?(term_or_optional(), type) end) + Enum.all?(neg_domains, fn {_domain_key, type} -> + subtype_seen?(term_or_optional(), type, seen) + end) end # A positive domains is smaller than a closed map iff all its keys are empty or optional - defp map_check_domain_keys(pos_domains, :closed) do - Enum.all?(pos_domains, fn {_domain_key, type} -> empty_or_optional?(type) end) + defp map_check_domain_keys(pos_domains, :closed, seen) do + Enum.all?(pos_domains, fn {_domain_key, type} -> + empty_seen?(remove_optional(type), seen) + end) end # Component-wise comparison of domains - defp map_check_domain_keys(pos_domains, neg_domains) do + defp map_check_domain_keys(pos_domains, neg_domains, seen) do Enum.all?(pos_domains, fn {domain_key, type} -> - subtype?(type, Map.get(neg_domains, domain_key, not_set())) + subtype_seen?(type, Map.get(neg_domains, domain_key, not_set()), seen) end) end @@ -4310,29 +4724,50 @@ defmodule Module.Types.Descr do tuple_descr(rest, [:term | acc], dynamic?) end - defp tuple_descr([value | rest], acc, dynamic?) do - # Check if the static part is empty - static_empty? = - case value do - # Has dynamic component, check static separately - %{dynamic: _} -> false - _ -> empty?(value) - end + # Recursive type nodes (3-tuples with a reference id) are stored as-is + # without stepping. Stepping would call the generator, which calls tuple() + # again, producing fresh nodes ad infinitum. This is safe because dynamic() + # never appears inside recursive type definitions, so skipping dynamic + # detection here loses nothing. Emptiness of nodes is handled lazily by + # the memoized empty?/2 algorithm when the node is eventually unfolded. + defp tuple_descr([{id, _state, _gen} = node | rest], acc, dynamic?) when is_reference(id) do + tuple_descr(rest, [node | acc], dynamic?) + end - if static_empty? do - :empty + defp tuple_descr([value | rest], acc, dynamic?) do + value_descr = to_descr(value) + + # Skip emptiness check for values containing recursive nodes to avoid + # infinite unfolding during construction. Like the bare node clause above, + # dynamic() never appears inside recursive type definitions, so skipping + # dynamic detection loses nothing. Node emptiness is handled lazily by + # the memoized empty_seen? algorithm. + if collect_generators(value_descr) != [] do + tuple_descr(rest, [value | acc], dynamic?) else - case :maps.take(:dynamic, value) do - :error -> - tuple_descr(rest, [value | acc], dynamic?) + # Check if the static part is empty + static_empty? = + case value_descr do + # Has dynamic component, check static separately + %{dynamic: _} -> false + _ -> empty?(value_descr) + end - {dynamic, _static} -> - # Check if dynamic component is empty - if empty?(dynamic) do - :empty - else - tuple_descr(rest, [dynamic | acc], true) - end + if static_empty? do + :empty + else + case :maps.take(:dynamic, value_descr) do + :error -> + tuple_descr(rest, [value | acc], dynamic?) + + {dynamic, _static} -> + # Check if dynamic component is empty + if empty?(dynamic) do + :empty + else + tuple_descr(rest, [dynamic | acc], true) + end + end end end end @@ -4343,25 +4778,42 @@ defmodule Module.Types.Descr do defp tuple_new(tag, elements), do: bdd_leaf(tag, elements) + # defp tuple_intersection(bdd1, bdd2), do: bdd_intersection(bdd1, bdd2) + + # X = {X, int} | nil + # fn recur -> {:tuple, [recur.(X), int()]} + # {:tuple, [recur.(X), int()]} + defp tuple_intersection(bdd1, bdd2) do bdd_intersection(bdd1, bdd2, &tuple_leaf_intersection/2) end - defp tuple_leaf_intersection(bdd_leaf(tag1, elements1), bdd_leaf(tag2, elements2)) do - case tuple_literal_intersection(tag1, elements1, tag2, elements2) do - {tag, elements} -> bdd_leaf(tag, elements) - :empty -> :bdd_bot + defp tuple_leaf_intersection(bdd_leaf(tag1, elems1) = l1, bdd_leaf(tag2, elems2) = l2) do + has_node = + Enum.any?(elems1, &(collect_generators(&1) != [])) or + Enum.any?(elems2, &(collect_generators(&1) != [])) + + if has_node do + bdd_intersection(l1, l2) + else + case tuple_literal_intersection(tag1, elems1, tag2, elems2) do + {tag, elements} -> bdd_leaf(tag, elements) + :empty -> :bdd_bot + end end end - defp tuple_literal_intersection(:open, [], tag, elements), do: {tag, elements} + defp tuple_literal_intersection(tag1, elems1, tag2, elems2), + do: tuple_literal_intersection(tag1, elems1, tag2, elems2, %{}) + + defp tuple_literal_intersection(:open, [], tag, elements, _seen), do: {tag, elements} - defp tuple_literal_intersection(tag1, elements1, tag2, elements2) do + defp tuple_literal_intersection(tag1, elements1, tag2, elements2, seen) do if mismatched_tuple_sizes?(tag1, elements1, tag2, elements2) do :empty else try do - zip_non_empty_intersection!(elements1, elements2, []) + zip_non_empty_intersection!(elements1, elements2, [], seen) catch :empty -> :empty else @@ -4384,11 +4836,16 @@ defmodule Module.Types.Descr do do: false # Intersects two lists of types, and _appends_ the extra elements to the result. - defp zip_non_empty_intersection!([], types2, acc), do: Enum.reverse(acc, types2) - defp zip_non_empty_intersection!(types1, [], acc), do: Enum.reverse(acc, types1) - - defp zip_non_empty_intersection!([type1 | rest1], [type2 | rest2], acc) do - zip_non_empty_intersection!(rest1, rest2, [non_empty_intersection!(type1, type2) | acc]) + defp zip_non_empty_intersection!([], types2, acc, _seen), do: Enum.reverse(acc, types2) + defp zip_non_empty_intersection!(types1, [], acc, _seen), do: Enum.reverse(acc, types1) + + defp zip_non_empty_intersection!([type1 | rest1], [type2 | rest2], acc, seen) do + zip_non_empty_intersection!( + rest1, + rest2, + [non_empty_intersection!(type1, type2, seen) | acc], + seen + ) end defp zip_empty_intersection?([], _types2), do: false @@ -4405,8 +4862,16 @@ defmodule Module.Types.Descr do do: bdd_difference(bdd1, bdd2, &tuple_leaf_disjoint?/2) defp tuple_leaf_disjoint?(bdd_leaf(tag1, elements1), bdd_leaf(tag2, elements2)) do - mismatched_tuple_sizes?(tag1, elements1, tag2, elements2) or - disjoint_tagged_tuples?(elements1, elements2) + has_node = + Enum.any?(elements1, &(collect_generators(&1) != [])) or + Enum.any?(elements2, &(collect_generators(&1) != [])) + + if has_node do + false + else + mismatched_tuple_sizes?(tag1, elements1, tag2, elements2) or + disjoint_tagged_tuples?(elements1, elements2) + end end # A very cheap check for tagged tuples @@ -4422,10 +4887,10 @@ defmodule Module.Types.Descr do defp disjoint_atom_descr?(atom1, %{atom: atom2}), do: atom_intersection(atom1, atom2) === 0 defp disjoint_atom_descr?(_atom, %{}), do: true - defp non_empty_tuple_literals_intersection(tuples) do + defp non_empty_tuple_literals_intersection(tuples, seen \\ %{}) do try do Enum.reduce(tuples, {:open, []}, fn {next_tag, next_elements}, {tag, elements} -> - case tuple_literal_intersection(tag, elements, next_tag, next_elements) do + case tuple_literal_intersection(tag, elements, next_tag, next_elements, seen) do :empty -> throw(:empty) next -> next end @@ -4435,27 +4900,30 @@ defmodule Module.Types.Descr do end end - defp tuple_empty?(bdd) do + defp tuple_empty?(bdd, seen) do bdd_to_dnf(bdd) |> Enum.all?(fn {pos, negs} -> - case non_empty_tuple_literals_intersection(pos) do + # We pass seen down to the literal intersection because it may need to check emptiness of elements, which may contain recursive nodes. + case non_empty_tuple_literals_intersection(pos, seen) do :empty -> true - {tag, fields} -> tuple_line_empty?(tag, fields, negs) + {tag, fields} -> tuple_line_empty?(tag, fields, negs, seen) end end) end - # No negations, so not empty unless there's an empty type - # Note: since the extraction from the BDD is done in a way that guarantees that - # the elements are non-empty, we can avoid checking for empty types there. - # Otherwise, tuple_empty?(_, elements, []) would be Enum.any?(elements, &empty?/1) - defp tuple_line_empty?(_, _, []), do: false + # No negations: elements may contain nodes (from single positive leaf), + # so we must check if any element is empty using cycle-aware emptiness. + # This handles recursive types like X = {X} correctly. + defp tuple_line_empty?(_, elements, [], seen), + do: Enum.any?(elements, &empty_seen?(&1, seen)) + # Open empty negation makes it empty - defp tuple_line_empty?(_, _, [{:open, []} | _]), do: true + defp tuple_line_empty?(_, _, [{:open, []} | _], _seen), do: true # Open positive can't be emptied by a single closed negative - defp tuple_line_empty?(:open, _pos, [{:closed, _}]), do: false + defp tuple_line_empty?(:open, _pos, [{:closed, _}], _seen), do: false - defp tuple_line_empty?(tag, elements, [{neg_tag, neg_elements} | negs]) do + defp tuple_line_empty?(tag, elements, negs, seen) do + [{neg_tag, neg_elements} | negs] = negs n = length(elements) m = length(neg_elements) @@ -4463,37 +4931,40 @@ defmodule Module.Types.Descr do # 1. When removing larger tuples from a fixed-size positive tuple # 2. When removing smaller tuples from larger tuples if (tag == :closed and n < m) or (neg_tag == :closed and n > m) do - tuple_line_empty?(tag, elements, negs) + tuple_line_empty?(tag, elements, negs, seen) else - tuple_elements_empty?([], tag, elements, neg_elements, negs) and - tuple_empty_arity?(n, m, tag, elements, neg_tag, negs) + tuple_elements_empty?([], tag, elements, neg_elements, negs, seen) and + tuple_empty_arity?(n, m, tag, elements, neg_tag, negs, seen) end end # Recursively check elements for emptiness - defp tuple_elements_empty?(_, _, _, [], _), do: true + defp tuple_elements_empty?(_, _, _, [], _, _seen), do: true - defp tuple_elements_empty?(acc_meet, tag, elements, [neg_type | neg_elements], negs) do + defp tuple_elements_empty?(acc_meet, tag, elements, [neg_type | neg_elements], negs, seen) do # Handles the case where {tag, elements} is an open tuple, like {:open, []} {ty, elements} = List.pop_at(elements, 0, term()) + diff = difference(ty, neg_type) meet = intersection(ty, neg_type) # In this case, there is no intersection between the positive and this negative. # So we should just "go next" - (empty?(diff) or tuple_line_empty?(tag, Enum.reverse(acc_meet, [diff | elements]), negs)) and - (empty?(meet) or tuple_elements_empty?([meet | acc_meet], tag, elements, neg_elements, negs)) + (empty_seen?(diff, seen) or + tuple_line_empty?(tag, Enum.reverse(acc_meet, [diff | elements]), negs, seen)) and + (empty_seen?(meet, seen) or + tuple_elements_empty?([meet | acc_meet], tag, elements, neg_elements, negs, seen)) end # Determines if the set difference is empty when: # - Positive tuple: {tag, elements} of size n # - Negative tuple: open or closed tuples of size m - defp tuple_empty_arity?(n, m, tag, elements, neg_tag, negs) do + defp tuple_empty_arity?(n, m, tag, elements, neg_tag, negs, seen) do # The tuples to consider are all those of size n to m - 1, and if the negative tuple is # closed, we also need to consider tuples of size greater than m + 1. tag == :closed or - (Enum.all?(n..(m - 1)//1, &tuple_line_empty?(:closed, tuple_fill(elements, &1), negs)) and - (neg_tag == :open or tuple_line_empty?(:open, tuple_fill(elements, m + 1), negs))) + (Enum.all?(n..(m - 1)//1, &tuple_line_empty?(:closed, tuple_fill(elements, &1), negs, seen)) and + (neg_tag == :open or tuple_line_empty?(:open, tuple_fill(elements, m + 1), negs, seen))) end defp tuple_eliminate_negations(tag, elements, negs) do @@ -4618,9 +5089,17 @@ defmodule Module.Types.Descr do bdd_leaf(tag1, elements1) = tuple1, bdd_leaf(tag2, elements2) = tuple2 ) do - case maybe_optimize_tuple_union({tag1, elements1}, {tag2, elements2}) do - {tag, elements} -> bdd_leaf(tag, elements) - nil -> bdd_union(tuple1, tuple2) + has_node = + Enum.any?(elements1, &(collect_generators(&1) != [])) or + Enum.any?(elements2, &(collect_generators(&1) != [])) + + if has_node do + bdd_union(tuple1, tuple2) + else + case maybe_optimize_tuple_union({tag1, elements1}, {tag2, elements2}) do + {tag, elements} -> bdd_leaf(tag, elements) + nil -> bdd_union(tuple1, tuple2) + end end end @@ -4678,6 +5157,16 @@ defmodule Module.Types.Descr do when status in [:all_equal, :left_subtype_of_right], do: :left_subtype_of_right + # Bail out when either element is a recursive node — calling union/subtype? + # on nodes during BDD construction would unfold the generator and loop. + defp do_tuple_union_optimization_strategy(_tag1, [{id, _, _} | _], _tag2, _pos2, _i, _status) + when is_reference(id), + do: nil + + defp do_tuple_union_optimization_strategy(_tag1, _pos1, _tag2, [{id, _, _} | _], _i, _status) + when is_reference(id), + do: nil + defp do_tuple_union_optimization_strategy(tag1, [v1 | pos1], tag2, [v2 | pos2], i, status) do if next_status = tuple_union_next_strategy(i, v1, v2, status) do do_tuple_union_optimization_strategy(tag1, pos1, tag2, pos2, i + 1, next_status) @@ -4721,18 +5210,19 @@ defmodule Module.Types.Descr do # Transforms a bdd into a union of tuples with no negations. # Note: it is important to compose the results with tuple_dnf_union/2 to avoid duplicates - defp tuple_bdd_to_dnf(bdd) do + defp tuple_bdd_to_dnf(bdd, seen \\ %{}) do bdd_to_dnf(bdd) |> Enum.reduce([], fn {positive_tuples, negative_tuples}, acc -> - case non_empty_tuple_literals_intersection(positive_tuples) do + case non_empty_tuple_literals_intersection(positive_tuples, seen) do :empty -> acc {tag, elements} -> - if tuple_line_empty?(tag, elements, negative_tuples) do + if tuple_line_empty?(tag, elements, negative_tuples, seen) do acc else - tuple_eliminate_negations(tag, elements, negative_tuples) |> tuple_dnf_union(acc) + tuple_eliminate_negations(tag, elements, negative_tuples) + |> tuple_dnf_union(acc) end end end) @@ -4868,7 +5358,7 @@ defmodule Module.Types.Descr do defp non_empty_tuple_only?(descr) do case :maps.take(:tuple, descr) do :error -> false - {tuple_bdd, rest} -> empty?(rest) and not tuple_empty?(tuple_bdd) + {tuple_bdd, rest} -> empty?(rest) and not tuple_empty?(tuple_bdd, %{}) end end @@ -4885,6 +5375,8 @@ defmodule Module.Types.Descr do |> Enum.reduce(none(), fn {tag, elements}, acc -> Enum.at(elements, index, tuple_tag_to_type(tag)) |> union(acc) end) + # As we work on nodes (the elements of a tuple), we ensure the result is a descr. + |> to_descr() end @doc """ @@ -5702,7 +6194,7 @@ defmodule Module.Types.Descr do defp iterator_non_disjoint_intersection?({key, v1, iterator}, map) do with %{^key => v2} <- map, value when value not in @empty_intersection <- intersection(key, v1, v2), - false <- empty_key?(key, value) do + false <- empty_key?(key, value, %{}) do true else _ -> iterator_non_disjoint_intersection?(:maps.next(iterator), map) diff --git a/lib/elixir/test/elixir/module/types/descr_test.exs b/lib/elixir/test/elixir/module/types/descr_test.exs index 6e3099f94db..7797e69e697 100644 --- a/lib/elixir/test/elixir/module/types/descr_test.exs +++ b/lib/elixir/test/elixir/module/types/descr_test.exs @@ -3265,4 +3265,841 @@ defmodule Module.Types.DescrTest do assert subtype?(range, Enum.reduce(entries, &union/2)) end end + + describe "recursive types" do + # Helper: recursive linked list type. Returns a descr. + # linked_list(T) = {T, linked_list(T)} | nil + defp linked_list(descr) do + recursive(%{ + X: fn recur -> tuple([descr, recur.(:X)]) |> union(atom([nil])) end + }) + |> Map.fetch!(:X) + |> to_descr() + end + + defp int_linked_list(), do: linked_list(integer()) + defp number_linked_list(), do: linked_list(union(integer(), float())) + + # Helper: recursive binary tree type. Returns a descr. + # binary_tree(T) = {T, binary_tree(T), binary_tree(T)} | nil + defp binary_tree(descr) do + recursive(%{ + T: fn recur -> + tuple([descr, recur.(:T), recur.(:T)]) + |> union(atom([nil])) + end + }) + |> Map.fetch!(:T) + |> to_descr() + end + + # Helper: recursive Macro.t() approximation. Returns a descr. + # ast = {ast | atom, keyword, atom | [ast]} | {ast, ast} | [ast] | atom | number | binary + defp macro_ast() do + recursive(%{ + Ast: fn recur -> + node = recur.(:Ast) + + call = + tuple([union(node, atom()), list(tuple([atom(), term()])), union(atom(), list(node))]) + + pair = tuple([node, node]) + + list(node) + |> union(call) + |> union(pair) + |> union(atom()) + |> union(number()) + |> union(binary()) + end + }) + |> Map.fetch!(:Ast) + |> to_descr() + end + + # Helper: recursive Inspect.Algebra.t(). Returns a descr. + defp inspect_doc() do + recursive(%{ + Doc: fn recur -> + child = recur.(:Doc) + + binary() + |> union(empty_list()) + |> union(atom([:doc_line])) + |> union(non_empty_list(child, child)) + |> union(tuple([atom([:doc_string]), binary(), integer()])) + |> union(tuple([atom([:doc_limit]), child, integer() |> union(atom([:infinity]))])) + |> union( + tuple([ + atom([:doc_nest]), + child, + integer() |> union(atom([:cursor, :reset])), + atom([:always, :break]) + ]) + ) + |> union(tuple([atom([:doc_break]), binary(), atom([:flex, :strict])])) + |> union( + tuple([ + atom([:doc_group]), + child, + atom([:normal, :optimistic, :pessimistic, :inherit]) + ]) + ) + |> union(tuple([atom([:doc_fits]), child, atom([:enabled, :disabled])])) + |> union(tuple([atom([:doc_force]), child])) + |> union(tuple([atom([:doc_collapse]), integer()])) + |> union(tuple([atom([:doc_color]), child, term()])) + end + }) + |> Map.fetch!(:Doc) + |> to_descr() + end + + # Helper: recursive iodata() approximation. Returns a descr. + # iodata = binary | [] | [byte | binary | iodata | ...] + defp iodata_descr() do + recursive(%{ + Iodata: fn recur -> + node = recur.(:Iodata) + head = integer() |> union(binary()) |> union(node) + + binary() + |> union(empty_list()) + |> union(non_empty_list(head, node)) + end + }) + |> Map.fetch!(:Iodata) + |> to_descr() + end + + test "node infrastructure" do + # stepping a node applies the generator to the state + descr = integer() + node = make_node(%{}, fn _recur -> descr end) + assert to_descr(node) == descr + + # to_descr is identity on :term + assert to_descr(:term) == :term + + # to_descr is identity on a descr map + d = union(integer(), atom()) + assert to_descr(d) == d + + # wrapping a descr map produces a node whose step returns the original + d = integer() + node = to_node(d) + assert to_descr(node) == d + + # wrapping :term produces a node whose step returns :term + node = to_node(:term) + assert to_descr(node) == :term + + # to_node on an existing node returns it unchanged + node = make_node(%{}, fn _ -> integer() end) + assert to_node(node) === node + + # wrapping and stepping round-trips for complex descrs + d = union(tuple([integer(), atom()]), list(float())) + assert to_descr(to_node(d)) == d + end + + test "constructors accept nodes and descrs" do + # tuple/1 accepts nodes and produces a non-empty tuple descr + n1 = to_node(integer()) + n2 = to_node(atom()) + + result = tuple([n1, n2]) + refute empty?(result) + + # tuple/1 accepts a mix of nodes and descrs + n = to_node(integer()) + result = tuple([n, float()]) + assert equal?(result, tuple([integer(), float()])) + + # list/1 accepts a node element type + n = to_node(integer()) + from_node = list(n) + from_descr = list(integer()) + assert equal?(from_node, from_descr) + + # non_empty_list/2 accepts nodes for both element and tail + elem_node = to_node(integer()) + tail_node = to_node(empty_list()) + + from_nodes = non_empty_list(elem_node, tail_node) + from_descrs = non_empty_list(integer(), empty_list()) + assert equal?(from_nodes, from_descrs) + + # closed_map accepts node values + n = to_node(integer()) + result = closed_map(a: n) + refute empty?(result) + + # open_map accepts node values + n = to_node(atom()) + result = open_map(b: n) + refute empty?(result) + end + + test "real-world types" do + ## integer linked list: X = {integer(), X} | nil + int_list = int_linked_list() + assert is_map_key(int_list, :tuple) and is_map_key(int_list, :atom) + + # {integer(), nil} <: X + assert subtype?(tuple([integer(), atom([nil])]), int_list) + # nil <: X + assert subtype?(atom([nil]), int_list) + # {integer(), {integer(), nil}} <: X + assert subtype?(tuple([integer(), tuple([integer(), atom([nil])])]), int_list) + # {integer(), integer()} json_value} + %{Json: json_value} = + recursive(%{ + Json: fn recur -> + scalar = + atom([nil]) + |> union(boolean()) + |> union(integer()) + |> union(float()) + |> union(binary()) + + array = list(recur.(:Json)) + object = open_map([{to_domain_keys(binary()), recur.(:Json)}]) + + scalar + |> union(array) + |> union(object) + end + }) + + assert subtype?(atom([nil]), json_value) + assert subtype?(list(binary()), json_value) + assert subtype?(empty_map(), json_value) + refute subtype?(pid(), json_value) + + ## IO.chardata recursive type + %{Char: chardata} = + recursive(%{ + Char: fn recur -> + head = integer() |> union(binary()) + tail = recur.(:Char) + + binary() + |> union(empty_list()) + |> union(non_empty_list(head, tail)) + end + }) + + assert subtype?(binary(), chardata) + assert subtype?(empty_list(), chardata) + assert subtype?(non_empty_list(integer(), empty_list()), chardata) + assert subtype?(non_empty_list(binary(), binary()), chardata) + refute subtype?(pid(), chardata) + + ## Inspect.Algebra.t recursive type + %{Doc: doc} = + recursive(%{ + Doc: fn recur -> + child = recur.(:Doc) + + binary() + |> union(empty_list()) + |> union(atom([:doc_line])) + |> union(non_empty_list(child, child)) + |> union(tuple([atom([:doc_string]), binary(), integer()])) + |> union(tuple([atom([:doc_limit]), child, integer() |> union(atom([:infinity]))])) + |> union( + tuple([ + atom([:doc_nest]), + child, + integer() |> union(atom([:cursor, :reset])), + atom([:always, :break]) + ]) + ) + |> union(tuple([atom([:doc_break]), binary(), atom([:flex, :strict])])) + |> union( + tuple([ + atom([:doc_group]), + child, + atom([:normal, :optimistic, :pessimistic, :inherit]) + ]) + ) + |> union(tuple([atom([:doc_fits]), child, atom([:enabled, :disabled])])) + |> union(tuple([atom([:doc_force]), child])) + |> union(tuple([atom([:doc_collapse]), integer()])) + |> union(tuple([atom([:doc_color]), child, term()])) + end + }) + + assert subtype?(binary(), doc) + assert subtype?(atom([:doc_line]), doc) + assert subtype?(tuple([atom([:doc_force]), binary()]), doc) + assert subtype?(non_empty_list(binary(), binary()), doc) + refute subtype?(pid(), doc) + + ## expression trees + # Expr = integer() | {atom, Expr, Expr}, Binop = {atom, Expr, Expr} + %{Expr: nexpr, Binop: nbinop} = + recursive(%{ + Expr: fn recur -> + union( + integer(), + tuple([ + atom(), + recur.(:Expr), + recur.(:Expr) + ]) + ) + end, + Binop: fn recur -> + tuple([atom(), recur.(:Expr), recur.(:Expr)]) + end + }) + + texpr = to_descr(nexpr) + tbinop = to_descr(nbinop) + + refute empty?(texpr) + refute empty?(tbinop) + + # 42 is an Expr + assert subtype?(integer(), texpr) + + # {:+, 1, 2} is an Expr and a Binop + assert subtype?(tuple([atom(), integer(), integer()]), texpr) + assert subtype?(tuple([atom(), integer(), integer()]), tbinop) + + # {:*, {:+, 1, 2}, 3} is an Expr + inner = tuple([atom(), integer(), integer()]) + assert subtype?(tuple([atom(), inner, integer()]), texpr) + end + + # test "recursive function types" do + # ast = macro_ast() + # doc = inspect_doc() + # iodata = iodata_descr() + # width = integer() |> union(atom([:infinity])) + # ast_transform = fun([ast], ast) + # ast_transform_acc = fun([ast, term()], tuple([ast, term()])) + + # # Macro.prewalk/2 and Macro.postwalk/2: + # # (Macro.t(), (Macro.t() -> Macro.t()) -> Macro.t()) + # prewalk_type = fun([ast, ast_transform], ast) + # assert subtype?(fun([integer(), fun([integer()], integer())], integer()), prewalk_type) + # refute subtype?(fun([integer(), fun([integer()], atom())], integer()), prewalk_type) + # assert fun_apply(prewalk_type, [tuple([integer(), integer()]), ast_transform]) == {:ok, ast} + + # # Macro.traverse/4: + # # (Macro.t(), acc, (Macro.t(), acc -> {Macro.t(), acc}), (Macro.t(), acc -> {Macro.t(), acc}) + # # -> {Macro.t(), acc}) + # traverse_type = fun([ast, term(), ast_transform_acc, ast_transform_acc], tuple([ast, term()])) + + # assert subtype?( + # fun( + # [integer(), atom(), fun([integer(), atom()], tuple([integer(), atom()])), + # fun([integer(), atom()], tuple([integer(), atom()]))], + # tuple([integer(), atom()]) + # ), + # traverse_type + # ) + + # refute subtype?( + # fun( + # [integer(), atom(), fun([integer(), atom()], tuple([atom(), atom()])), + # fun([integer(), atom()], tuple([integer(), atom()]))], + # tuple([integer(), atom()]) + # ), + # traverse_type + # ) + + # assert fun_apply(traverse_type, [integer(), atom(), ast_transform_acc, ast_transform_acc]) == + # {:ok, tuple([ast, term()])} + + # # Inspect.Algebra.format/2: (Inspect.Algebra.t(), non_neg_integer() | :infinity -> iodata()) + # format_type = fun([doc, width], iodata) + # assert subtype?(fun([binary(), integer()], binary()), format_type) + # assert subtype?(fun([tuple([atom([:doc_force]), binary()]), atom([:infinity])], iodata), format_type) + # refute subtype?(fun([integer(), integer()], binary()), format_type) + + # # Inspect.Opts.default_inspect_fun/0: + # # (() -> (term(), Inspect.Opts.t() -> Inspect.Algebra.t())) + # default_inspect_fun_type = fun([], fun([term(), term()], doc)) + # assert subtype?(fun([], fun([term(), term()], binary())), default_inspect_fun_type) + # refute subtype?(fun([], fun([term(), term()], integer())), default_inspect_fun_type) + # assert fun_apply(default_inspect_fun_type, []) == {:ok, fun([term(), term()], doc)} + # end + + test "emptiness" do + # X = {X, X} is empty (no base case) + %{X: nx} = recursive(%{X: fn recur -> tuple([recur.(:X), recur.(:X)]) end}) + + assert empty?(nx) + + # X = {integer()} | {X, X} has base case, not empty + %{X: nx} = + recursive(%{ + X: fn recur -> union(tuple([integer()]), tuple([recur.(:X), recur.(:X)])) end + }) + + refute empty?(nx) + + ## mutual recursion + # X = {int,Y} | nil, Y = {bool,X} | nil: both not empty + %{X: node_x, Y: node_y} = + recursive(%{ + X: fn recur -> tuple([integer(), recur.(:Y)]) |> union(atom([nil])) end, + Y: fn recur -> tuple([boolean(), recur.(:X)]) |> union(atom([nil])) end + }) + + tx = to_descr(node_x) + ty = to_descr(node_y) + + refute empty?(tx) + refute empty?(ty) + + # {int, {bool, nil}} <: X + inner_y = tuple([boolean(), atom([nil])]) + assert subtype?(tuple([integer(), inner_y]), node_x) + + # {bool, {int, nil}} <: Y + inner_x = tuple([integer(), atom([nil])]) + assert subtype?(tuple([boolean(), inner_x]), node_y) + + # {int, {bool, {int, nil}}} <: X + level3 = tuple([integer(), atom([nil])]) + level2 = tuple([boolean(), level3]) + assert subtype?(tuple([integer(), level2]), node_x) + + # X = {Y}|nil, Y = {X}: X not empty, Y not empty (Y can hold {nil}) + %{X: nx, Y: ny} = + recursive(%{ + X: fn recur -> tuple([recur.(:Y)]) |> union(atom([nil])) end, + Y: fn recur -> tuple([recur.(:X)]) end + }) + + refute empty?(nx) + refute empty?(ny) + + ## cycle detection + cases = [ + # X -> {Y}, Y -> {Z}, Z -> {X} + {"3-cycle no base", true, + %{ + X: fn recur -> tuple([recur.(:Y)]) end, + Y: fn recur -> tuple([recur.(:Z)]) end, + Z: fn recur -> tuple([recur.(:X)]) end + }}, + # X -> {Y}, Y -> {Z}, Z -> {X} or nil + {"3-cycle with base", false, + %{ + X: fn recur -> tuple([recur.(:Y)]) end, + Y: fn recur -> tuple([recur.(:Z)]) end, + Z: fn recur -> tuple([recur.(:X)]) |> union(atom([nil])) end + }} + ] + + for {desc, expected_empty, gen} <- cases do + nodes = recursive(gen) + + for {_key, node} <- nodes do + if expected_empty do + assert empty?(node), "#{desc}: expected empty but wasn't" + else + refute empty?(node), "#{desc}: expected not empty but was" + end + end + end + + ## diamond patterns + # X->{Y,Z}, Y->{W}, Z->{W}, W->{X} + # X + # / \ + # Y Z + # \ / + # W -> X (cycle) + # no base case: all empty + %{X: nx, Y: ny, Z: nz, W: nw} = + recursive(%{ + X: fn recur -> tuple([recur.(:Y), recur.(:Z)]) end, + Y: fn recur -> tuple([recur.(:W)]) end, + Z: fn recur -> tuple([recur.(:W)]) end, + W: fn recur -> tuple([recur.(:X)]) end + }) + + assert empty?(nx) + assert empty?(ny) + assert empty?(nz) + assert empty?(nw) + + # same with base case on W: none empty + %{X: nx, Y: ny, Z: nz, W: nw} = + recursive(%{ + X: fn recur -> tuple([recur.(:Y), recur.(:Z)]) end, + Y: fn recur -> tuple([recur.(:W)]) end, + Z: fn recur -> tuple([recur.(:W)]) end, + W: fn recur -> tuple([recur.(:X)]) |> union(atom([nil])) end + }) + + refute empty?(nx) + refute empty?(ny) + refute empty?(nz) + refute empty?(nw) + + ## binary trees + # Tree = {atom, Tree, Tree} | nil: not empty + refute empty?(binary_tree(atom())) + + ## list-head recursion with base case is not empty + # X = non_empty_list(X, []) | non_empty_list(integer(), []) + %{X: nx} = + recursive(%{ + X: fn recur -> + non_empty_list(recur.(:X), empty_list()) + |> union(non_empty_list(integer(), empty_list())) + end + }) + + refute empty?(nx) + end + + test "regular expressions" do + # bool_list = {boolean(), bool_list} | nil + # int_or_bool_list = {integer() | boolean(), int_or_bool_list} | nil + # int_then_bool_list = {integer(), int_then_bool_list} | bool_list + bool_list = linked_list(boolean()) + int_or_bool_list = linked_list(union(integer(), boolean())) + + int_then_bool_list = + recursive(%{ + IntThenBoolList: fn recur -> + tuple([integer(), recur.(:IntThenBoolList)]) |> union(bool_list) + end + }) + |> Map.fetch!(:IntThenBoolList) + |> to_descr() + + refute empty?(int_then_bool_list) + refute empty?(bool_list) + refute empty?(int_or_bool_list) + assert subtype?(bool_list, int_or_bool_list) + assert subtype?(int_then_bool_list, int_or_bool_list) + end + + test "subtyping" do + ## recursion on list tail + # X = non_empty_list(integer(), X) | [] + %{X: nx} = + recursive(%{ + X: fn recur -> non_empty_list(integer(), recur.(:X)) |> union(empty_list()) end + }) + + # [] <: X + assert subtype?(empty_list(), nx) + + # non_empty_list(integer()) <: X (terminates with []) + assert subtype?(non_empty_list(integer()), nx) + + ## binary trees + int_bin_tree = binary_tree(integer()) + + # nil is a valid tree + assert subtype?(atom([nil]), int_bin_tree) + + # {42, nil, nil} is a valid tree + assert subtype?(tuple([integer(), atom([nil]), atom([nil])]), int_bin_tree) + + # {1, {2, nil, nil}, {3, nil, nil}} is a valid tree + leaf = tuple([integer(), atom([nil]), atom([nil])]) + assert subtype?(tuple([integer(), leaf, leaf]), int_bin_tree) + + # wrong arity: {1, nil} not a subtype + refute subtype?(tuple([integer(), atom([nil])]), int_bin_tree) + + # wrong tag type: {float, nil, nil} not a subtype + refute subtype?(tuple([float(), atom([nil]), atom([nil])]), int_bin_tree) + + # X = non_empty_list(integer(), X) | non_empty_list(integer()) | [] + # Y = list(integer()) + gen = %{ + X: fn recur -> + non_empty_list(integer(), recur.(:X)) + |> union(non_empty_list(integer(), empty_list())) + |> union(empty_list()) + end + } + + tx = recursive(gen)[:X] |> to_descr() + assert equal?(tx, list(integer())) + + # X = %{outer: %{inner: X}} | nil + # Y = %{outer: %{inner: Y}} | %{outer: %{inner: integer}} | nil + %{X: nx, Y: ny} = + recursive(%{ + X: fn recur -> + closed_map(outer: closed_map(inner: recur.(:X))) |> union(atom([nil])) + end, + Y: fn recur -> + closed_map(outer: closed_map(inner: recur.(:Y))) + |> union(closed_map(outer: closed_map(inner: integer()))) + |> union(atom([nil])) + end + }) + + assert subtype?(nx, ny), "nested map wrapping map with recursive field" + + # X = non_empty_list(non_empty_list(X, empty_list()), empty_list()) | nil + # Y = non_empty_list(non_empty_list(Y, empty_list()), empty_list()) + # | non_empty_list(non_empty_list(integer(), empty_list()), empty_list()) | nil + %{X: nx, Y: ny} = + recursive(%{ + X: fn recur -> + non_empty_list( + non_empty_list(recur.(:X), empty_list()), + empty_list() + ) + |> union(atom([nil])) + end, + Y: fn recur -> + non_empty_list( + non_empty_list(recur.(:Y), empty_list()), + empty_list() + ) + |> union(non_empty_list(non_empty_list(integer(), empty_list()), empty_list())) + |> union(atom([nil])) + end + }) + + assert subtype?(nx, ny), "nested list wrapping list with recursive element" + + ## nested tuple wrapping tuple with recursive element + # X = {{X}} | nil, Y = {{Y}} | {{integer()}} | nil + %{X: nx, Y: ny} = + recursive(%{ + X: fn recur -> tuple([tuple([recur.(:X)])]) |> union(atom([nil])) end, + Y: fn recur -> + tuple([tuple([recur.(:Y)])]) + |> union(tuple([tuple([integer()])])) + |> union(atom([nil])) + end + }) + + assert subtype?(nx, ny) + + ## map linked list: BoolChain <: IntOrBoolChain + # BoolChain = %{val: boolean(), next: BoolChain} | %{} + # IntOrBoolChain = %{val: integer() | boolean(), next: IntOrBoolChain} | %{} + %{BoolChain: bc, IntOrBoolChain: iobc} = + recursive(%{ + BoolChain: fn recur -> + closed_map(val: boolean(), next: recur.(:BoolChain)) |> union(empty_map()) + end, + IntOrBoolChain: fn recur -> + closed_map( + val: union(integer(), boolean()), + next: recur.(:IntOrBoolChain) + ) + |> union(empty_map()) + end + }) + + assert subtype?(bc, iobc) + end + + test "set operations accept nodes" do + # union works on nodes and descrs + n = to_node(integer()) + assert equal?(union(n, float()), union(integer(), float())) + assert equal?(union(atom(), n), union(atom(), integer())) + + # intersection of a node and a descr + n = to_node(union(integer(), atom())) + assert equal?(intersection(n, integer()), integer()) + assert equal?(intersection(atom(), n), atom()) + + # difference of a node and a descr + n = to_node(union(integer(), float())) + assert equal?(difference(n, float()), integer()) + assert equal?(difference(union(integer(), float()), to_node(float())), integer()) + end + + test "set operations on nodes built with constructors" do + ## map intersection inside nodes + # X = %{a: %{a: X}} | %{a: %{a: atom()}} + # Y = %{a: %{a: X}} | %{a: %{a: atom()}} + %{X: nx, Y: ny} = + recursive(%{ + X: fn recur -> + closed_map(a: closed_map(a: recur.(:X))) + |> union(closed_map(a: closed_map(a: atom()))) + end, + Y: fn recur -> + closed_map(a: closed_map(a: recur.(:Y))) + |> union(closed_map(a: closed_map(a: atom()))) + end + }) + + refute empty?(intersection(nx, ny)) + assert equal?(union(nx, ny), nx) + + # This example loops if we do not make a deep search for recursive nodes when disabling + # map union optimizations. + # X = %{a: open_map(), b: open_map()} | nil + # Y = %{a: %{inner: Y}, b: %{inner: Y}} + # | %{a: %{inner: atom()}, b: %{inner: atom()}} + %{X: nx, Y: ny} = + recursive(%{ + X: fn _recur -> closed_map(a: open_map(), b: open_map()) |> union(atom([nil])) end, + Y: fn recur -> + closed_map( + a: closed_map(inner: recur.(:Y)), + b: closed_map(inner: recur.(:Y)) + ) + |> union(closed_map(a: closed_map(inner: atom()), b: closed_map(inner: atom()))) + end + }) + + assert equal?(union(nx, ny), nx) + + # X = {{X}} | {{atom()}} + # Y = {{Y}} | {{atom()}} + %{X: nx, Y: ny} = + recursive(%{ + X: fn recur -> tuple([tuple([recur.(:X)])]) |> union(tuple([tuple([atom()])])) end, + Y: fn recur -> tuple([tuple([recur.(:Y)])]) |> union(tuple([tuple([atom()])])) end + }) + + refute empty?(intersection(nx, ny)) + assert equal?(union(nx, ny), nx) + + # X = %{integer() => X} | %{integer() => atom()} + # Y = %{integer() => Y} | %{integer() => atom()} + %{X: nx, Y: ny} = + recursive(%{ + X: fn recur -> + closed_map([{[:integer], recur.(:X)}]) |> union(closed_map([{[:integer], atom()}])) + end, + Y: fn recur -> + closed_map([{[:integer], recur.(:Y)}]) |> union(closed_map([{[:integer], atom()}])) + end + }) + + refute empty?(intersection(nx, ny)) + assert equal?(union(nx, ny), nx) + + ## three recursive maps + # A = %{x: A} | %{x: atom()}, B = %{x: B} | %{x: integer()}, C = %{x: C} | %{x: term()} + # intersection(A, intersection(B, C)) should be empty (atom ∩ integer = ∅) + %{A: na, B: nb, C: nc} = + recursive(%{ + A: fn recur -> closed_map(x: recur.(:A)) |> union(closed_map(x: atom())) end, + B: fn recur -> closed_map(x: recur.(:B)) |> union(closed_map(x: integer())) end, + C: fn recur -> closed_map(x: recur.(:C)) |> union(closed_map(x: term())) end + }) + + assert empty?(intersection(na, intersection(nb, nc))) + assert subtype?(na, nc) + refute subtype?(nc, na) + + ## union then intersection of recursive maps + # X = %{a: X} | %{a: atom()}, Y = %{a: Y} | %{a: atom()} + # intersection(union(X, Y), X) should be non-empty + gen = %{ + X: fn recur -> closed_map(a: recur.(:X)) |> union(closed_map(a: atom())) end, + Y: fn recur -> closed_map(a: recur.(:Y)) |> union(closed_map(a: atom())) end + } + + %{X: nx, Y: ny} = recursive(gen) + + refute empty?(intersection(union(nx, ny), nx)) + assert equal?(intersection(union(nx, ny), nx), nx) + end + + test "type operators" do + ## tuple operators on descr with recursive node element + # X = {integer(), X} | {integer(), atom()} + %{X: nx} = + recursive(%{ + X: fn recur -> tuple([integer(), recur.(:X)]) |> union(tuple([integer(), atom()])) end + }) + + t = to_descr(nx) + + # tuple_fetch + assert {false, type} = tuple_fetch(t, 0) + assert equal?(type, integer()) + assert {false, _type} = tuple_fetch(t, 1) + + # tuple_values + result = tuple_values(t) + assert subtype?(integer(), result) + + # tuple_delete_at + result = tuple_delete_at(t, 0) + assert {false, _type} = tuple_fetch(result, 0) + + # tuple_insert_at + result = tuple_insert_at(t, 0, boolean()) + assert {false, type} = tuple_fetch(result, 0) + assert equal?(type, boolean()) + + # tuple fetch + # X = {X} | {atom()} + %{X: nx} = + recursive(%{X: fn recur -> tuple([recur.(:X)]) |> union(tuple([atom()])) end}) + + tx = to_descr(nx) + t = difference(tx, tuple([atom()])) + assert {false, type} = tuple_fetch(t, 0) + assert equal?(type, tx) + + ## map_fetch_key on descr with recursive node value + # X = %{a: integer(), b: X} | %{a: integer(), b: atom()} + %{X: nx} = + recursive(%{ + X: fn recur -> + closed_map(a: integer(), b: recur.(:X)) |> union(closed_map(a: integer(), b: atom())) + end + }) + + assert {false, type} = map_fetch_key(to_descr(nx), :a) + assert equal?(type, integer()) + assert {false, _type} = map_fetch_key(to_descr(nx), :b) + + ## list_hd and list_tl on descr with recursive node tail + # X = non_empty_list(integer(), X) | non_empty_list(integer(), []) + %{X: nx} = + recursive(%{ + X: fn recur -> + non_empty_list(integer(), recur.(:X)) + |> union(non_empty_list(integer(), empty_list())) + end + }) + + assert {:ok, type} = list_hd(to_descr(nx)) + assert equal?(type, integer()) + assert {:ok, _type} = list_tl(to_descr(nx)) + + ## list_hd on descr with recursive node as head element + # X = non_empty_list(X, []) | non_empty_list(atom(), []) + %{X: nx} = + recursive(%{ + X: fn recur -> + non_empty_list(recur.(:X), empty_list()) + |> union(non_empty_list(atom(), empty_list())) + end + }) + + assert {:ok, _type} = list_hd(to_descr(nx)) + end + end end diff --git a/recursive_types.md b/recursive_types.md new file mode 100644 index 00000000000..1555f0c3d5e --- /dev/null +++ b/recursive_types.md @@ -0,0 +1,441 @@ +# Recursive Types via Type Nodes in Elixir + +### A coinductive representation using closures and maps + +--- + +**Abstract.** +We describe a technique for representing recursive types in Elixir by introducing a *type node* structure. A type node pairs a *state*—a map recording one stepping step for every recursion variable—with a *generator function* that, given a state, produces a type descriptor (*descr*). This coinductive encoding naturally extends to mutually recursive definitions, keeps each node self-contained, and integrates smoothly with set-theoretic operations such as union. + +--- + +## Table of Contents + +1. [Background: Type Descriptors](#1-background-type-descriptors) +2. [Type Nodes](#2-type-nodes) +3. [The Role of Nodes and Descriptors](#3-the-role-of-nodes-and-descriptors) +4. [Constructing Recursive Types](#4-constructing-recursive-types) +5. [Example: Simple Recursive List](#5-example-simple-recursive-list) +6. [Example: Mutually Recursive Types](#6-example-mutually-recursive-types) +7. [Using Nodes in Set Operations](#7-using-nodes-in-set-operations) +8. [Node Identity and Memoization](#8-node-identity-and-memoization) +9. [Summary](#9-summary) +10. [Implementation Plan](#10-implementation-plan) + +--- + +## 1. Background: Type Descriptors + +A *type descriptor* (or simply *descr*) is an Elixir map whose keys are type-kind tags and whose values describe the inhabitants of that kind: + +``` +descr = %{atom => atom_kind, tuple => tuple_kind, ...} +``` + +For example, the type "integer or `nil`" is represented as: + +``` +%{integer => true, atom => [:nil]} +``` + +Here `integer => true` means "integers are present" (as a whole) and `atom => [:nil]` means "the atoms present are the set {`nil`}". + +Descriptors support set-theoretic operations (union, intersection, negation) defined pointwise on each kind. We assume these operations are already implemented and focus on the extension to *recursive* types. + +## 2. Type Nodes + +**Definition 2.1 (Type node).** A *type node* is a triple + +``` +node = (id, state, f : state → descr) +``` + +where: + +- **id** is a unique identity (an Elixir `reference()`, see [Section 8](#8-node-identity-and-memoization)); +- **state** is a map `%{X => f_X, Y => f_Y, ...}` that records, for every recursion variable, its *generator function*; and +- **f** is one such generator function, producing a `descr` when applied to a state. + +In Elixir syntax a node is simply a three-element tuple: + +```elixir +{id, state, f} +``` + +**Definition 2.2 (stepping).** To *step* a node n_X = (id_X, state, f_X), we apply the generator to the state: + +``` +step(n_X) = f_X(state) +``` + +This yields a descriptor in which every occurrence of a recursion variable Y has been replaced by a fresh, self-contained node (id_Y, state, f_Y). stepping can therefore be iterated as many times as needed. + +## 3. The Role of Nodes and Descriptors + +It is important to distinguish two layers and the operations that belong to each. + +**Descriptors are types.** A `descr` is the canonical representation of a type. All set-theoretic operations—union, intersection, negation, difference—are defined on descriptors and return descriptors: + +``` +union : descr × descr → descr +inter : descr × descr → descr +etc. +``` + +**Nodes are used by type constructors.** Type constructors—`tuple`, `map`, `list`, etc.—build composite types and take *nodes* as their arguments. For instance: + +- `tuple([n_1, ..., n_k])` builds the tuple type {n_1, ..., n_k}; +- `list(n)` builds the list type whose elements have type n; +- `map([{k_1, n_1}, ...])` builds a map type with the given key–value associations. + +By accepting nodes rather than descriptors, constructors can refer to types that have not yet been fully expanded—which is exactly what is needed for recursive definitions. + +**Descriptors pass as nodes.** Since a descriptor d can always be lifted to a node via `fresh_node(d) = (make_ref(), %{}, fn _ -> d end)`, descriptors can be used wherever a node is expected. In practice the implementation can accept both representations transparently. + +**Constructors ≠ set operations.** Constructors and set operations live at different levels: + +| | **Inputs** | **Output** | +|----------------------------------------------|-------------------|------------| +| Type constructors (`tuple`, `list`, ...) | node, ..., node | descr | +| Set operations (`union`, `inter`, ...) | descr, descr | descr | + +Constructors consume nodes because they need to embed possibly-recursive sub-types into a descriptor. Set operations, on the other hand, combine already-formed descriptors pointwise by kind. + +## 4. Constructing Recursive Types + +The construction proceeds in four steps. + +### Step 1 — Read the equations and build generator functions + +For each recursive equation X = τ_X, we build a function f_X = `fn state -> d_X` where d_X is the body of the equation, translated into a descriptor, with every reference to a recursion variable Y replaced by the node expression `(make_ref(), state, state.Y)`. + +### Step 2 — Collect all generators + +This yields a family of generator functions f_X, f_Y, ..., one per equation. + +### Step 3 — Build the state + +Assemble the state map: + +``` +state = %{X => f_X, Y => f_Y, ...} +``` + +The state captures *all the ways to perform one stepping step* for each variable. + +### Step 4 — Build the type nodes + +Each recursive type is now represented as a node (with a fresh id): + +``` +n_X = (id_X, state, f_X) +n_Y = (id_Y, state, f_Y) +... +``` + +## 5. Example: Simple Recursive List + +Consider the recursive type "list of integers": + +``` +X = {int, X} | nil +``` + +**Generator.** + +``` +f_X = fn state -> + %{tuple => [:int, (make_ref(), state, state.X)], + atom => [:nil]} +end +``` + +**State.** + +``` +state = %{X => f_X} +``` + +**Node.** + +``` +n_X = (id_X, state, f_X) +``` + +In Elixir: + +```elixir +fX = fn state -> + %{tuple: [:int, {make_ref(), state, state[:X]}], + atom: [:nil]} +end + +state = %{X: fX} +nX = {make_ref(), state, fX} +``` + +stepping once: + +```elixir +step(nX) +``` + +returns the descriptor: + +``` +%{tuple => [:int, (id', state, f_X)], + atom => [:nil]} +``` + +The inner `(id', state, f_X)` is itself a self-contained node that can be steped again at will. + +## 6. Example: Mutually Recursive Types + +``` +Y = {bool, X} | nil +X = {int, Y} | nil +``` + +**Step 1 — Generators.** + +``` +f_Y = fn state -> + %{tuple => [:bool, (make_ref(), state, state.X)], + atom => [:nil]} +end + +f_X = fn state -> + %{tuple => [:int, (make_ref(), state, state.Y)], + atom => [:nil]} +end +``` + +**Step 3 — State.** + +``` +state = %{Y => f_Y, X => f_X} +``` + +**Step 4 — Nodes.** + +``` +n_Y = (id_Y, state, f_Y) +n_X = (id_X, state, f_X) +``` + +In Elixir: + +```elixir +fY = fn st -> + %{tuple: [:bool, {make_ref(), st, st[:X]}], + atom: [:nil]} +end + +fX = fn st -> + %{tuple: [:int, {make_ref(), st, st[:Y]}], + atom: [:nil]} +end + +state = %{Y: fY, X: fX} + +nY = {make_ref(), state, fY} +nX = {make_ref(), state, fX} +``` + +## 7. Using Nodes in Set Operations + +Set-theoretic operations on descriptors (union, intersection, ...) are extended to nodes by first stepping each operand to obtain a descriptor and then computing the operation on descriptors. + +### 7.1 Union of Two Nodes + +Given two nodes n_1 = (id_1, state_1, f_1) and n_2 = (id_2, state_2, f_2), the union is computed as follows: + +1. **step** both nodes: d_1 = f_1(state_1), d_2 = f_2(state_2). +2. **Compute** the pointwise union of the two descriptors: d = union_descr(d_1, d_2). +3. **Wrap** the result into a self-contained node using `fresh_node`: union_node(n_1, n_2) = fresh_node(union_descr(d_1, d_2)). + +> **Remark (Self-containedness).** Each node produced by stepping is *self-contained*: the triple (id, state, f_Y) already carries enough information to be steped further without any external context. An alternative design would thread the state through every set operation, but self-containedness simplifies the interface: a node is a black box that can be steped on demand. + +### 7.2 Conversion Functions + +Two functions convert between nodes and descriptors: + +```elixir +# node → descr +def step({_id, state, f}), do: f.(state) + +# descr → node +def fresh_node(d) do + {make_ref(), %{}, fn _state -> d end} +end +``` + +`step` takes a node and produces a descriptor by applying the generator to its state. `fresh_node` takes a descriptor and wraps it into a trivial node with a fresh id, an empty state, and a constant generator. + +### 7.3 Elixir Sketch + +```elixir +def union_node(n1, n2) do + d1 = step(n1) + d2 = step(n2) + fresh_node(union_descr(d1, d2)) +end +``` + +## 8. Node Identity and Memoization + +So far a node is a pair (state, f). In practice we augment it with a unique *identity* tag, making a node a triple: + +``` +node = (id, state, f : state → descr) +``` + +In Elixir the identity is a fresh `reference()` obtained by calling `make_ref()`: + +```elixir +def make_node(state, f) do + {make_ref(), state, f} +end +``` + +References are unique values that support constant-time equality comparison. Two nodes created at different call sites will always have distinct identities, even if they happen to carry the same state and generator. + +**Why an identity?** The emptiness decision algorithm for set-theoretic types [1] works by exploring nodes and memoizing which ones have already been visited. Without an identity, the algorithm would need structural comparison of closures—which is impossible in general. With a reference-based identity, the algorithm can record visited nodes in a `MapSet` keyed by their id, guaranteeing termination of the recursive exploration in the presence of cyclic types. + +```elixir +def empty?(node, seen \\ MapSet.new()) +def empty?({id, _state, _f} = node, seen) do + if MapSet.member?(seen, id) do + true + else + seen = MapSet.put(seen, id) + descr = step(node) + empty_descr?(descr, seen) + end +end +``` + +The `seen` set acts as the memoization table: when a node is encountered a second time (detected via its id), the algorithm can conclude without re-exploring it—following the coinductive interpretation that a type variable encountered again is not empty (it contributes no new inhabitants but does not block either). + +**Updated conversion functions.** With the triple representation, the conversion functions become: + +```elixir +# node → descr +def step({_id, state, f}), do: f.(state) + +# descr → node +def fresh_node(d) do + {make_ref(), %{}, fn _state -> d end} +end +``` + +## 9. Summary + +| **Concept** | **Elixir representation** | +|--------------------------|-----------------------------------| +| Type descriptor (descr) | `%{atom: ..., tuple: ..., ...}` | +| Generator function (f_X) | `fn state -> descr end` | +| State (state) | `%{X: fX, Y: fY, ...}` | +| Type node (node) | `{id, state, fX}` | +| Node identity | `make_ref()` | +| stepping | `fX.(state)` | + +The key insight is that every recursive type is encoded as a pair of (i) a shared state recording one stepping step per variable and (ii) the particular generator for the variable of interest. This representation is *coinductive*: we never build an infinite object; instead, we lazily produce one layer of the type on demand by calling `f(state)`. + +--- + +## 10. Implementation Plan + +The following is an ordered list of implementation blocks. Within each block, all items must be implemented together (they are mutually dependent). Blocks themselves are listed in dependency order: a block can only be started once all blocks it depends on are complete. + +### Block A — Core node module *(no dependencies)* + +This block introduces the node data structure and the basic conversion functions. It touches no existing code. + +- **A.1** Define the node triple `{id, state, f}`. +- **A.2** Implement `make_node(state, f)` which calls `make_ref()` to produce a fresh id. +- **A.3** Implement step : node → descr, i.e. `step({_id, state, f}) = f.(state)`. +- **A.4** Implement fresh_node : descr → node, i.e. `fresh_node(d) = {make_ref(), %{}, fn _ -> d end}`. +- **A.5** Implement a coercion helper `to_node(x)` that returns `x` unchanged if it is already a node, or calls `fresh_node(x)` if it is a descr. This allows callers to pass either representation transparently. + +### Block B — Kind internals: descr → node *(depends on A)* + +This is the largest and most invasive block. Every type kind (`tuple_kind`, `list_kind`, `map_kind`, ...) currently stores nested *descrs* for its sub-types. These must all change to store *nodes* instead. + +- **B.1** Change the internal representation of each kind so that sub-type positions hold nodes rather than descrs. For example, `tuple_kind` for arity k stores k nodes instead of k descrs. +- **B.2** Update every type constructor (`tuple/1`, `list/1`, `map/1`, ...) to accept nodes as arguments (using `to_node` for backward compatibility). +- **B.3** Update all kind-level set operations (union, intersection, negation *per kind*) to handle nodes in sub-type positions. Where these operations need to combine sub-types, they must use the node-level operations from Block C (or operate structurally on nodes without stepping, depending on the algorithm). +- **B.4** Update every pattern match and traversal of kind internals across the codebase to expect nodes where descrs were before. + +> **Remark.** Items B.1–B.4 are tightly coupled: changing the internal representation (B.1) immediately breaks all constructors (B.2), kind-level operations (B.3), and traversals (B.4). They must all be updated in a single atomic change. + +### Block C — Set operations on nodes *(depends on A)* + +These are thin wrappers: step both operands, apply the descr-level operation, wrap the result. + +- **C.1** Implement `union_node(n1, n2)`. +- **C.2** Implement `inter_node(n1, n2)`. +- **C.3** Implement `negate_node(n)`. +- **C.4** Implement `diff_node(n1, n2)`. + +Block C can be implemented in parallel with Block B (both depend only on A), but Block B.3 may call into Block C functions. + +### Block D — Recursive type builder *(depends on A + B)* + +The four-step construction algorithm from [Section 4](#4-constructing-recursive-types). + +- **D.1** Read recursive type equations and produce generator functions f_X = `fn state -> d_X`. +- **D.2** Collect all generators into the state map state = `%{X => f_X, ...}`. + + %{X => f_X} + f_X = fn state -> tuple([integer(), {state, state.X}]) |> union(atom(:nil)) end + +- **D.3** Build each type node via `make_node(state, fX)`. +- **D.4** Integrate with the module/function that translates user-facing type syntax into the internal representation. + +### Block E — Emptiness with memoization *(depends on A + B)* + +- **E.1** Update `empty?/1` to accept nodes (not just descrs). +- **E.2** Add a `seen` parameter (`MapSet` of node ids) threaded through the recursion. +- **E.3** When a node id is already in `seen`, return `true` (coinductive base case). +- **E.4** Update `empty_descr?` so that when it recurses into kind sub-types (which are now nodes), it calls the node-level `empty?` with the `seen` set. + +### Block F — Remaining recursive algorithms *(depends on A + B + E)* + +Every algorithm that recurses into the sub-types of a descriptor must be updated in the same spirit as emptiness. + +- **F.1** Subtyping: thread a memoization set; handle nodes in sub-type positions. +- **F.2** Pretty-printing / `to_string`: step nodes on demand and track visited ids to avoid infinite output. +- **F.3** Any other traversal (e.g. type normalization, materialization) that inspects kind internals. + +### Dependency Graph + +``` + ┌─────────────┐ + │ A: Core nodes│ + └──────┬──────┘ + ┌────┴────┐ + ▼ ▼ + ┌──────────────┐ ┌─────────────────┐ + │B: Kind │ │C: Set ops on │ + │ internals │◄┤ nodes │ + └──┬───┬───┬──┘ │ (B.3 may call C)│ + │ │ │ └─────────────────┘ + ▼ │ ▼ +┌────────┐│┌──────────┐ +│D: Rec. │││E: │ +│ builder│││ Emptiness│ +└────────┘│└────┬─────┘ + │ │ + ▼ ▼ + ┌────────────┐ + │F: Other │ + │ algorithms│ + └────────────┘ +``` + +--- + +### References + +[1] Mickaël Laurent and Kim Nguyen. *Implementing Set-Theoretic Types.* Proceedings of the ACM on Programming Languages, 8(POPL), 2024. \ No newline at end of file