Spec: Precise Schema Reconciler Shape Invalidation
Problem
When SchemaReconciler detects schema divergence between the Inspector cache and the live database, it currently invalidates all shapes on the affected table. This is overly conservative - many schema changes don't affect shapes that don't select the changed columns.
The replication stream already does column-level filtering via Shape.is_affected_by_relation_change?/2, but it only sees name/type changes (what Postgres sends in Relation messages). SchemaReconciler has access to richer column metadata but doesn't use it for precise invalidation.
Goal
Make SchemaReconciler invalidate only shapes that are actually affected by schema changes, using the same column-level filtering logic as the replication stream where possible, plus additional attribute-aware logic.
Current Flow
list_relations_with_stale_cache()
→ returns [{oid, relation}] for tables with ANY column divergence
→ remove_shapes_for_relations(stack_id, relations)
→ ShapeStatus.list_shape_handles_for_relations() -- ALL shapes for table
→ remove_shapes() -- invalidates all of them
Proposed Flow
list_relations_with_column_changes()
→ returns [{oid, relation, changed_columns}]
→ for each {oid, relation, changed_columns}:
→ build synthetic Relation with affected_columns
→ Filter.affected_shapes(filter, synthetic_relation)
→ remove only affected shapes
Column Attribute Change Analysis
For each column attribute, determine whether a change requires shape invalidation:
does_change_require_invalidation?(attribute, old_value, new_value, shape)
# Column name changed - shape selecting old name must be invalidated
def does_change_require_invalidation?(:name, old_name, new_name, shape) do
old_name in shape.selected_columns
end
# Type changed - shape selecting column must be invalidated
def does_change_require_invalidation?(:type_id, old_type, new_type, shape) do
# Type change affects data encoding/interpretation
column_selected?(shape, column_name)
end
# Nullability changed
def does_change_require_invalidation?(:not_null, old_val, new_val, shape) do
# NULL -> NOT NULL: PG requires data migration first, shape log has updates
# NOT NULL -> NULL: No data inconsistency
# Conservative: invalidate if shape selects column
# Optimistic: don't invalidate (data is consistent via replication)
# Decision: Don't invalidate - the replication stream already delivered
# any data changes required for the constraint to be added
false
end
# Generated column status changed
def does_change_require_invalidation?(:is_generated, false, true, shape) do
# Regular -> Generated: values now computed, may differ from cached
column_selected?(shape, column_name)
end
def does_change_require_invalidation?(:is_generated, true, false, shape) do
# Generated -> Regular: similar concern
column_selected?(shape, column_name)
end
# Array dimensions changed
def does_change_require_invalidation?(:array_dimensions, old_dims, new_dims, shape) do
# Structural type change
column_selected?(shape, column_name)
end
# PK position changed - affects record identity
def does_change_require_invalidation?(:pk_position, old_pos, new_pos, _shape) do
# PK changes affect how records are keyed, regardless of selected columns
# Must invalidate all shapes on the table
true
end
# Type kind changed (e.g., enum)
def does_change_require_invalidation?(:type_kind, old_kind, new_kind, shape) do
column_selected?(shape, column_name)
end
Attribute Classification
| Attribute |
Change Requires Invalidation? |
Scope |
name |
Yes |
Shapes selecting old name |
type_id |
Yes |
Shapes selecting column |
type_mod |
Yes |
Shapes selecting column |
array_dimensions |
Yes |
Shapes selecting column |
type_kind |
Yes |
Shapes selecting column |
is_generated |
Yes |
Shapes selecting column |
not_null |
No* |
N/A |
pk_position |
Yes |
ALL shapes on table |
formatted_type |
Derived from above |
- |
*Nullability: The data migration required by Postgres before adding NOT NULL comes through the replication stream. Cached shape data is updated via those change events. The constraint itself doesn't affect how Electric serves data.
Implementation Steps
1. Extend list_relations_with_stale_cache to return column diff
@spec list_relations_with_stale_cache(opts :: term()) ::
{:ok, [%{oid: oid(), relation: relation(), column_changes: [column_change()]}]}
@type column_change :: %{
column_name: String.t(),
change_type: :added | :removed | :modified,
changed_attributes: [attribute_change()]
}
@type attribute_change :: %{
attribute: atom(),
old_value: term(),
new_value: term()
}
2. Add compute_column_diff/2 function
@spec compute_column_diff([column_info()], [column_info()]) :: [column_change()]
def compute_column_diff(cached_columns, live_columns) do
cached_by_name = Map.new(cached_columns, &{&1.name, &1})
live_by_name = Map.new(live_columns, &{&1.name, &1})
cached_names = MapSet.new(Map.keys(cached_by_name))
live_names = MapSet.new(Map.keys(live_by_name))
added = MapSet.difference(live_names, cached_names)
removed = MapSet.difference(cached_names, live_names)
common = MapSet.intersection(cached_names, live_names)
added_changes = Enum.map(added, &%{column_name: &1, change_type: :added, ...})
removed_changes = Enum.map(removed, &%{column_name: &1, change_type: :removed, ...})
modified_changes =
common
|> Enum.filter(fn name -> cached_by_name[name] != live_by_name[name] end)
|> Enum.map(fn name ->
%{
column_name: name,
change_type: :modified,
changed_attributes: diff_attributes(cached_by_name[name], live_by_name[name])
}
end)
added_changes ++ removed_changes ++ modified_changes
end
3. Add shapes_affected_by_column_changes/2
@spec shapes_affected_by_column_changes(Filter.t(), [column_change()]) :: MapSet.t(shape_id())
def shapes_affected_by_column_changes(filter, column_changes) do
# Check for PK changes first - affects all shapes
if any_pk_change?(column_changes) do
Filter.all_shape_ids(filter)
else
affected_columns =
column_changes
|> Enum.filter(&requires_invalidation?/1)
|> Enum.map(& &1.column_name)
# Build synthetic Relation and use existing filtering logic
synthetic_relation = %Relation{
affected_columns: affected_columns,
columns: live_columns # for column count check
}
Filter.affected_shapes(filter, synthetic_relation)
end
end
4. Update SchemaReconciler to use precise invalidation
defp handle_diverged_relations({:ok, diverged}, state) do
for %{oid: oid, relation: rel, column_changes: changes} <- diverged do
affected_shapes = shapes_affected_by_column_changes(state.filter, changes)
if MapSet.size(affected_shapes) > 0 do
Logger.notice("Schema change in #{inspect(rel)}: invalidating #{MapSet.size(affected_shapes)} shapes")
ShapeCleaner.remove_shapes(state.stack_id, MapSet.to_list(affected_shapes))
end
Inspector.clean(oid, state.inspector)
end
end
Edge Cases
- Column renamed to existing name: Detected as remove + add, both shapes affected
- Type change within same type family: Still invalidate (e.g., varchar(50) -> varchar(100))
- Multiple attributes changed: Any invalidating change triggers invalidation
- Shape selects all columns: Affected by any column add/remove
Testing
- Add column not selected by any shape → no invalidation
- Add column, shape selects all → invalidation
- Change nullability → no invalidation
- Change type → invalidation of shapes selecting column
- Change PK → invalidation of all shapes
- Rename column → invalidation of shapes selecting old name
Metrics
Add telemetry for:
schema_reconciler.diverged_tables.count
schema_reconciler.shapes_invalidated.count
schema_reconciler.shapes_spared.count (new)
Open Questions
-
Nullability: Should we be conservative and invalidate anyway? The argument for not invalidating is that data consistency is maintained via replication. The argument for invalidating is defense-in-depth.
-
Default value changes: Not in current column_info. Should we track them?
-
Constraint changes (CHECK, UNIQUE): Not tracked. Do they matter for read-only replication?
-
Filter access: SchemaReconciler doesn't currently have access to the Filter. Need to pass it in or query ShapeStatus.
Spec: Precise Schema Reconciler Shape Invalidation
Problem
When
SchemaReconcilerdetects schema divergence between the Inspector cache and the live database, it currently invalidates all shapes on the affected table. This is overly conservative - many schema changes don't affect shapes that don't select the changed columns.The replication stream already does column-level filtering via
Shape.is_affected_by_relation_change?/2, but it only sees name/type changes (what Postgres sends inRelationmessages). SchemaReconciler has access to richer column metadata but doesn't use it for precise invalidation.Goal
Make SchemaReconciler invalidate only shapes that are actually affected by schema changes, using the same column-level filtering logic as the replication stream where possible, plus additional attribute-aware logic.
Current Flow
Proposed Flow
Column Attribute Change Analysis
For each column attribute, determine whether a change requires shape invalidation:
does_change_require_invalidation?(attribute, old_value, new_value, shape)Attribute Classification
nametype_idtype_modarray_dimensionstype_kindis_generatednot_nullpk_positionformatted_type*Nullability: The data migration required by Postgres before adding NOT NULL comes through the replication stream. Cached shape data is updated via those change events. The constraint itself doesn't affect how Electric serves data.
Implementation Steps
1. Extend
list_relations_with_stale_cacheto return column diff2. Add
compute_column_diff/2function3. Add
shapes_affected_by_column_changes/24. Update SchemaReconciler to use precise invalidation
Edge Cases
Testing
Metrics
Add telemetry for:
schema_reconciler.diverged_tables.countschema_reconciler.shapes_invalidated.countschema_reconciler.shapes_spared.count(new)Open Questions
Nullability: Should we be conservative and invalidate anyway? The argument for not invalidating is that data consistency is maintained via replication. The argument for invalidating is defense-in-depth.
Default value changes: Not in current column_info. Should we track them?
Constraint changes (CHECK, UNIQUE): Not tracked. Do they matter for read-only replication?
Filter access: SchemaReconciler doesn't currently have access to the Filter. Need to pass it in or query ShapeStatus.