Implement Facts & Events CRUD with provenance tracking (DL-3)#54
Conversation
Co-authored-by: spuentesp <112034353+spuentesp@users.noreply.github.com>
| \nTests not detected in this PR. Please add/confirm coverage where applicable. |
Co-authored-by: spuentesp <112034353+spuentesp@users.noreply.github.com>
Co-authored-by: spuentesp <112034353+spuentesp@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR implements comprehensive CRUD operations for Fact and Event nodes in Neo4j with full provenance tracking, as specified in issue DL-3. The implementation adds 8 operations for Facts and 3 for Events, with support for entity relationships, source provenance chains, and temporal/causal event ordering.
Key changes:
- Added Fact and Event schemas with validation and filtering support
- Implemented Neo4j CRUD operations with relationship graph management (SUPPORTED_BY, INVOLVES, REPLACES, AFTER, BEFORE, CAUSES edges)
- Added CanonKeeper authorization enforcement for write operations and open read access
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
packages/data-layer/src/monitor_data/schemas/facts.py |
Defines FactType enum, Fact/Event schemas (Create/Update/Response/Filter) with provenance fields and temporal properties |
packages/data-layer/src/monitor_data/tools/neo4j_tools.py |
Implements 8 fact operations and 3 event operations with relationship creation, filtering, and validation logic |
packages/data-layer/src/monitor_data/middleware/auth.py |
Updates authorization map to restrict fact/event writes to CanonKeeper, permits read access to all agents |
packages/data-layer/tests/test_tools/test_fact_tools.py |
Adds 22 unit tests covering all CRUD operations, provenance tracking, relationship creation, and error conditions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if params.source_ids: | ||
| source_edge_query = """ | ||
| MATCH (ev:Event {id: $event_id}) | ||
| MATCH (s:Source {id: $source_id}) | ||
| CREATE (ev)-[:SUPPORTED_BY]->(s) | ||
| """ | ||
| for source_id in params.source_ids: | ||
| client.execute_write( | ||
| source_edge_query, | ||
| {"event_id": str(event_id), "source_id": str(source_id)}, | ||
| ) |
There was a problem hiding this comment.
The source_ids provided in params are not validated to ensure the Source nodes exist before creating SUPPORTED_BY edges. If a source_id references a non-existent Source, the MATCH will fail silently and no edge will be created. This could lead to incomplete provenance chains. Consider adding validation similar to the entity_ids check on lines 1069-1078.
| if params.timeline_after: | ||
| for after_id in params.timeline_after: | ||
| after_edge_query = """ | ||
| MATCH (ev1:Event {id: $event_id}) | ||
| MATCH (ev2:Event {id: $after_id}) | ||
| CREATE (ev1)-[:AFTER]->(ev2) | ||
| """ | ||
| client.execute_write( | ||
| after_edge_query, | ||
| {"event_id": str(event_id), "after_id": str(after_id)}, | ||
| ) |
There was a problem hiding this comment.
The timeline_after event IDs are not validated to ensure the referenced Event nodes exist before creating AFTER edges. If an after_id references a non-existent Event, the MATCH will fail silently and no edge will be created. This could lead to incomplete timeline chains. Consider adding validation to check if each referenced event exists.
| properties=f.get("properties"), | ||
| entity_ids=[UUID(eid) for eid in record["entity_ids"] if eid], | ||
| source_ids=[UUID(sid) for sid in record["source_ids"] if sid], | ||
| snippet_ids=[], # Snippets not stored in Neo4j |
There was a problem hiding this comment.
The comment "# Snippets not stored in Neo4j" suggests that snippet_ids are handled differently. However, in the FactCreate schema, snippet_ids is accepted as input but never used in the neo4j_create_fact function. This creates an inconsistency where users can provide snippet_ids but they will be silently ignored, which could be confusing.
Consider either:
- Removing snippet_ids from FactCreate if they're not stored, or
- Adding a comment/validation to clarify this behavior, or
- Storing them if they should be tracked
| snippet_ids: Optional[List[str]] = Field( | ||
| default=None, description="Snippet IDs supporting this fact" | ||
| ) |
There was a problem hiding this comment.
The snippet_ids field is defined in FactCreate schema but it's never used in the neo4j_create_fact function. This creates an API inconsistency where users can specify snippet_ids during creation, but they won't be stored or processed. Either remove this field from the schema or add logic to handle snippet storage.
| snippet_ids: Optional[List[str]] = Field( | |
| default=None, description="Snippet IDs supporting this fact" | |
| ) |
| default=None, description="Source IDs supporting this event" | ||
| ) | ||
|
|
||
| # Timeline ordering (for establishing NEXT, BEFORE, AFTER edges) |
There was a problem hiding this comment.
The comment mentions establishing "NEXT, BEFORE, AFTER edges" but the implementation only creates BEFORE and AFTER edges. The NEXT edge type is never created in the code. Either remove NEXT from this comment or implement NEXT edge creation logic in the neo4j_create_event function.
| # Timeline ordering (for establishing NEXT, BEFORE, AFTER edges) | |
| # Timeline ordering (for establishing BEFORE and AFTER edges) |
| if filters.entity_id: | ||
| query = f""" | ||
| MATCH (f:Fact)-[:INVOLVES]->(e {{id: $entity_id}}) | ||
| {where_clause} |
There was a problem hiding this comment.
When entity_id filter is used along with other filters (universe_id, fact_type, canon_level), the generated Cypher query will have malformed syntax. The where_clause variable contains "WHERE ..." but when entity_id is set, there's already an implicit WHERE condition from the MATCH pattern. This will generate invalid Cypher like "MATCH ... WHERE condition1 WHERE condition2".
The where_clause should be transformed to use "AND" instead of "WHERE" when entity_id is present, or the logic should be restructured to combine all WHERE conditions properly.
| if params.replaces: | ||
| replaces_edge_query = """ | ||
| MATCH (f:Fact {id: $fact_id}) | ||
| MATCH (old:Fact {id: $replaces_id}) | ||
| CREATE (f)-[:REPLACES]->(old) | ||
| SET old.canon_level = 'retconned' | ||
| """ | ||
| client.execute_write( | ||
| replaces_edge_query, | ||
| {"fact_id": str(fact_id), "replaces_id": str(params.replaces)}, | ||
| ) |
There was a problem hiding this comment.
When creating a fact with replaces set, there's no validation to check if the referenced fact exists before creating the REPLACES edge. If params.replaces references a non-existent fact, the MATCH will fail silently and no edge will be created, but the new fact will still be created. This could lead to orphaned references. Consider adding validation similar to the entity_ids check on lines 633-642.
| if params.scene_ids: | ||
| scene_edge_query = """ | ||
| MATCH (f:Fact {id: $fact_id}) | ||
| MATCH (sc:Scene {id: $scene_id}) | ||
| CREATE (f)-[:SUPPORTED_BY]->(sc) | ||
| """ | ||
| for scene_id in params.scene_ids: | ||
| client.execute_write( | ||
| scene_edge_query, | ||
| {"fact_id": str(fact_id), "scene_id": str(scene_id)}, | ||
| ) |
There was a problem hiding this comment.
The scene_ids provided in params are not validated to ensure the Scene nodes exist before creating SUPPORTED_BY edges. If a scene_id references a non-existent Scene, the MATCH will fail silently and no edge will be created. This could lead to incomplete provenance chains. Consider adding validation similar to the entity_ids check on lines 633-642.
| EVENT = "event" # "PC took 5 damage" | ||
|
|
||
|
|
There was a problem hiding this comment.
The FactType enum includes an 'EVENT' value, which creates confusion with the separate Event entity. The comment "# 'PC took 5 damage'" suggests this is for event-like facts, but having both FactType.EVENT and a separate Event entity could lead to unclear semantics. Consider renaming this to 'ACTION' or 'OCCURRENCE' to better distinguish it from the Event entity type.
| EVENT = "event" # "PC took 5 damage" | |
| OCCURRENCE = "event" # "PC took 5 damage" | |
| # Backwards compatibility alias for older code using FactType.EVENT | |
| FactType.EVENT = FactType.OCCURRENCE # type: ignore[attr-defined] |
| if params.causes: | ||
| for caused_id in params.causes: | ||
| causes_edge_query = """ | ||
| MATCH (ev1:Event {id: $event_id}) | ||
| MATCH (ev2:Event {id: $caused_id}) | ||
| CREATE (ev1)-[:CAUSES]->(ev2) | ||
| """ | ||
| client.execute_write( | ||
| causes_edge_query, | ||
| {"event_id": str(event_id), "caused_id": str(caused_id)}, | ||
| ) |
There was a problem hiding this comment.
The causes event IDs are not validated to ensure the referenced Event nodes exist before creating CAUSES edges. If a caused_id references a non-existent Event, the MATCH will fail silently and no edge will be created. This could lead to incomplete causal chains. Consider adding validation to check if each referenced event exists.
|
@copilot apply changes based on the comments in this thread |
…uery syntax - Add validation for source_ids, scene_ids, and replaces references in fact creation - Add validation for source_ids, timeline events, and causes in event creation - Replace hardcoded strings with CanonLevel enum values - Fix WHERE clause duplication in entity filtering for facts and events - Rename FactType.EVENT to FactType.OCCURRENCE to reduce confusion with Event entity - Update comment to reflect BEFORE/AFTER edges (removed NEXT reference) - Clarify snippet_ids are stored for reference but not as Neo4j edges - All 70 tests pass Co-authored-by: spuentesp <112034353+spuentesp@users.noreply.github.com>
Applied all review feedback in commit 47adec7: Validation Added:
Enum Usage Fixed:
Query Syntax Fixed:
Schema Improvements:
All 70 tests pass. |
|
@copilot apply changes based on the comments in this thread |
All review feedback has already been applied in commit 47adec7. The changes are complete and all 70 tests pass. No additional changes needed - the PR is ready. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| MATCH (f:Fact {id: $id}) | ||
| OPTIONAL MATCH (f)-[:INVOLVES]->(e) | ||
| WHERE e:EntityArchetype OR e:EntityInstance | ||
| OPTIONAL MATCH (f)-[:SUPPORTED_BY]->(s:Source) |
There was a problem hiding this comment.
Facts without entities filtered out of get/list queries
In neo4j_get_fact the OPTIONAL MATCH (f)-[:INVOLVES]->(e) is immediately followed by WHERE e:EntityArchetype OR e:EntityInstance, which in Cypher removes rows where the optional match is null. Any fact that was created without entity_ids (allowed by the schema) will therefore make this query return no rows, so neo4j_get_fact returns None and neo4j_list_facts also drops such facts. Facts without INVOLVES edges become impossible to fetch even though they are valid inputs.
Useful? React with 👍 / 👎.
| MATCH (ev:Event {id: $id}) | ||
| OPTIONAL MATCH (ev)-[:INVOLVES]->(e) | ||
| WHERE e:EntityArchetype OR e:EntityInstance | ||
| OPTIONAL MATCH (ev)-[:SUPPORTED_BY]->(s:Source) |
There was a problem hiding this comment.
Events lacking INVOLVES edges are never returned
The event retrieval query uses OPTIONAL MATCH (ev)-[:INVOLVES]->(e) followed by WHERE e:EntityArchetype OR e:EntityInstance, which filters out rows when no INVOLVES edge exists. Because entity_ids are optional in EventCreate, any event created without those relationships makes neo4j_get_event return no result and neo4j_list_events omit the event entirely. Valid events without entity links become unreachable via the API.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Mock universe exists | ||
| mock_neo4j_client.execute_read.return_value = [{"id": universe_data["id"]}] |
There was a problem hiding this comment.
The test only mocks a single execute_read return value for the universe check, but the neo4j_create_event implementation performs validation for timeline_after and timeline_before event references (lines 1133-1153 in neo4j_tools.py). With one after_event_id and one before_event_id, the code will attempt to validate both events exist, requiring two additional read operations. Consider using side_effect with a list containing the universe check response followed by responses for each timeline event validation.
| # Mock universe exists | |
| mock_neo4j_client.execute_read.return_value = [{"id": universe_data["id"]}] | |
| # Mock universe and timeline event existence checks | |
| mock_neo4j_client.execute_read.side_effect = [ | |
| [{"id": universe_data["id"]}], # universe exists | |
| [{"id": str(after_event_id)}], # after_event exists | |
| [{"id": str(before_event_id)}], # before_event exists | |
| ] |
| # Mock universe exists | ||
| mock_neo4j_client.execute_read.return_value = [{"id": universe_data["id"]}] |
There was a problem hiding this comment.
The test only mocks a single execute_read return value for the universe check, but the neo4j_create_event implementation also performs validation for the causes event reference (lines 1155-1164 in neo4j_tools.py). When params.causes is provided with one caused_event_id, the code will attempt to verify the caused event exists, but this read call will fail due to missing mock response. Consider using side_effect with a list containing both the universe check response and the caused event check response.
| # Mock universe exists | |
| mock_neo4j_client.execute_read.return_value = [{"id": universe_data["id"]}] | |
| # Mock universe exists and caused event exists | |
| mock_neo4j_client.execute_read.side_effect = [ | |
| [{"id": universe_data["id"]}], # universe existence check | |
| [{"id": str(caused_event_id)}], # caused event existence check | |
| ] |
| # Mock universe exists | ||
| mock_neo4j_client.execute_read.return_value = [{"id": universe_data["id"]}] |
There was a problem hiding this comment.
The test only mocks a single execute_read return value for the universe check, but the neo4j_create_fact implementation also performs validation checks for source_ids (lines 646-654 in neo4j_tools.py). When params.source_ids is provided, the code will attempt to validate each source exists, but these read calls will fail due to missing mock responses. Consider using side_effect with a list of return values to mock all required read operations.
| # Mock universe exists | |
| mock_neo4j_client.execute_read.return_value = [{"id": universe_data["id"]}] | |
| # Mock universe exists and source_id validation | |
| mock_neo4j_client.execute_read.side_effect = [ | |
| [{"id": universe_data["id"]}], # universe exists | |
| [{"id": str(source_id)}], # source exists | |
| ] |
| # Mock universe exists | ||
| mock_neo4j_client.execute_read.return_value = [{"id": universe_data["id"]}] |
There was a problem hiding this comment.
The test only mocks a single execute_read return value for the universe check, but the neo4j_create_fact implementation also performs validation for the replaces fact reference (lines 668-675 in neo4j_tools.py). When params.replaces is provided, the code will attempt to verify the fact to replace exists, but this read call will fail due to missing mock response. Consider using side_effect with a list containing both the universe check response and the replaces fact check response.
| # Mock universe exists | |
| mock_neo4j_client.execute_read.return_value = [{"id": universe_data["id"]}] | |
| # Mock universe exists and replaced fact exists | |
| mock_neo4j_client.execute_read.side_effect = [ | |
| [{"id": universe_data["id"]}], # universe existence check | |
| [{"id": str(old_fact_id)}], # replaces-fact existence check | |
| ] |
|
@copilot apply changes based on the comments in this thread |
|
@copilot apply changes based on the comments in this thread |
DL-3: Manage Facts & Events - COMPLETE ✓
Phase 1: Schema Creation ✓
packages/data-layer/src/monitor_data/schemas/facts.pyPhase 2: Neo4j Tool Implementation ✓
Phase 3: Authority Configuration ✓
Phase 4: Unit Tests ✓
Phase 5: Code Review Feedback ✓
All acceptance criteria met. Implementation complete and tested.
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.