Harden developer memories list against malformed records #7784
tianmind-studio wants to merge 2 commits into
Greptile Summary
This PR hardens the
Confidence Score: 4/5
Safe to merge; the changes are additive hardening on a read path, and the new validators are well-tested. The coercion logic is thorough and the tests confirm the key failure scenarios. Two minor quality gaps exist: the backend/routers/developer.py, specifically the
Important Files Changed
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[GET /v1/dev/user/memories] --> B[memories_db.get_memories]
B --> C{For each record}
C --> D{is dict AND has id?}
D -- No --> E[Skip with warning]
D -- Yes --> F{is_locked?}
F -- Yes --> G[Truncate content to 70 chars]
G --> H[CleanerMemory.model_validate]
F -- No --> H
H --> I{ValidationError?}
I -- No --> J[Append to valid_memories]
I -- Yes --> K[Skip with warning]
J --> L[Return valid_memories]
```
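The per-record flow in the flowchart can be sketched in plain Python. This is a minimal sketch, not the code in the PR. Memory and InvalidMemory are hypothetical stand-ins for CleanerMemory and Pydantic's ValidationError, which lets the example run with only the standard library, and filter_memories is an illustrative name. The 70-character limit comes from the flowchart. Whether the real endpoint appends an ellipsis is not stated, so the sketch simply slices.

```python
import logging
from dataclasses import dataclass
from typing import Any

logger = logging.getLogger(__name__)

# Limit taken from the flowchart ("Truncate content to 70 chars").
LOCKED_PREVIEW_CHARS = 70


class InvalidMemory(ValueError):
    """Hypothetical stand-in for pydantic.ValidationError."""


@dataclass
class Memory:
    """Hypothetical stand-in for CleanerMemory with two fields."""

    id: str
    content: str

    @classmethod
    def model_validate(cls, data: dict) -> "Memory":
        content = data.get('content')
        if not isinstance(content, str):
            raise InvalidMemory(f"content must be a string, got {type(content).__name__}")
        return cls(id=str(data['id']), content=content)


def filter_memories(records: list[Any]) -> list[Memory]:
    """Validate each record, skipping (and logging) malformed ones."""
    valid_memories = []
    for record in records:
        # Pre-filter: anything that is not a dict with a truthy id is skipped.
        if not isinstance(record, dict) or not record.get('id'):
            logger.warning("Skipping malformed memory record: %r", record)
            continue
        if record.get('is_locked'):
            # Copy before truncating so the caller's data is left untouched.
            record = dict(record)
            content = record.get('content')
            if isinstance(content, str):
                record['content'] = content[:LOCKED_PREVIEW_CHARS]
        try:
            valid_memories.append(Memory.model_validate(record))
        except InvalidMemory as exc:
            # One bad record is dropped instead of failing the whole page.
            logger.warning("Skipping memory %s: %s", record.get('id'), exc)
    return valid_memories
```

Catching only the validation error type means that unexpected exceptions still surface as real errors instead of being logged and skipped.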
Reviews (1): Last reviewed commit: "Make dev memories list tolerate malforme..."
```python
@field_validator('created_at', 'updated_at', mode='before')
def coerce_datetime(cls, value):
    if value in [None, '']:
        return None
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value.replace('Z', '+00:00'))
        except ValueError:
            return None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    return None
```
coerce_datetime returns raw numeric values without conversion
When value is an int or float, the validator returns it as-is and relies on Pydantic V2's implicit lax-mode coercion, which treats the number as a Unix timestamp (seconds, or milliseconds for very large magnitudes). This works today, but the silent dependency on Pydantic's coercion rules makes the intent opaque and ties correctness to library behavior that the code never states. An explicit datetime.fromtimestamp(value, tz=timezone.utc) inside the validator would both document intent and fail loudly if the stored value is out of range, rather than surfacing as a confusing None in the response after the except ValidationError swallows it.
Addressed in 746e0c00e: numeric created_at/updated_at values are now explicitly converted with datetime.fromtimestamp(..., tz=timezone.utc) and invalid numeric ranges fall back to None. Added coverage for the Unix timestamp path in test_cleaner_memory_coerces_edge_values; local validation: 19 passed.
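Below is a sketch of what the explicit numeric branch described above might look like. It is written as a plain function so it runs without Pydantic. In the model, it would sit under @field_validator('created_at', 'updated_at', mode='before'). This is an assumption about the shape of the change, not the contents of 746e0c00e.

Note that datetime.fromtimestamp always reads the number as seconds, whereas Pydantic's lax mode treats very large magnitudes as milliseconds. As a result, stored millisecond values would be read differently and will likely fall outside datetime's range. The except clause catches OverflowError and OSError as well as ValueError, because fromtimestamp can raise any of them for out-of-range input. Pydantic only turns a ValueError or AssertionError raised inside a validator into a ValidationError.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def coerce_datetime(value: Any) -> Optional[datetime]:
    """Best-effort conversion of a stored timestamp; None when unusable."""
    if value is None or value == '':
        return None
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        try:
            # fromisoformat() before Python 3.11 rejects a trailing 'Z'.
            return datetime.fromisoformat(value.replace('Z', '+00:00'))
        except ValueError:
            return None
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        try:
            # Read as Unix seconds in UTC (no milliseconds heuristic).
            return datetime.fromtimestamp(value, tz=timezone.utc)
        except (OverflowError, OSError, ValueError):
            # Out-of-range or NaN timestamps fall back to None.
            return None
    return None
```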
```python
@field_validator('id', mode='before')
def coerce_id(cls, value):
    if value is None:
        return ''
    return str(value)
```
coerce_id allows an empty-string id through CleanerMemory
The coerce_id validator converts None → '', and there is no further check to reject an empty string. The endpoint's pre-filter (not memory.get('id')) blocks empty-ID dicts before model_validate, but CleanerMemory is also used as the response_model for PATCH /v1/dev/user/memories/{memory_id}, which returns memories_db.get_memory(uid, memory_id) raw — with no equivalent pre-filter. A Firestore doc that somehow lacks an id key would produce a serialized response containing "id": "".
Suggested change
```diff
 @field_validator('id', mode='before')
 def coerce_id(cls, value):
-    if value is None:
+    if not value and value != 0:
         return ''
     return str(value)
```
Addressed in 746e0c00e: CleanerMemory now rejects missing/empty IDs at model validation time, while the list endpoint still pre-filters malformed records before serialization. Added a regression test for the model-level ValidationError; local validation: 19 passed.
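Below is one way to make CleanerMemory reject missing or empty ids. It is sketched as a plain function, under the assumption that it backs the 'id' field validator with mode='before'. A ValueError raised inside a Pydantic field validator surfaces as a ValidationError, which the list endpoint already catches. The exact checks in 746e0c00e are not shown here, so treating whitespace-only ids as empty is an assumption.

```python
from typing import Any


def coerce_id(value: Any) -> str:
    """Return the id as a string, rejecting missing or blank values."""
    # Missing ids are rejected instead of being coerced to ''.
    if value is None:
        raise ValueError('memory id is required')
    text = str(value)
    # Assumption: whitespace-only ids count as empty.
    if not text.strip():
        raise ValueError('memory id must not be empty')
    return text
```

Numeric ids such as 0 still pass and are stringified. This matches the original validator's str(value) behavior for non-None input.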
Summary
- CleanerMemory: tolerate legacy Developer API memory records with invalid optional fields by coercing them to safe defaults.
- Convert numeric created_at/updated_at values into UTC datetimes instead of returning raw numbers.
- /v1/dev/user/memories: skip malformed legacy records so that a single bad record does not turn the whole page into a 500.

Regression check
With the new tests in place but backend/routers/developer.py temporarily restored to origin/main:
- python -m pytest tests\unit\test_dev_api_folder_filters.py -q -> 2 failed, 16 passed, 2 warnings
- failing cases: /v1/dev/user/memories?limit=3&offset=7 and direct CleanerMemory(...) validation errors for legacy values

Testing
- python -m pytest tests\unit\test_dev_api_folder_filters.py -q -> 19 passed, 2 warnings
- python -m black --line-length 120 --skip-string-normalization routers\developer.py tests\unit\test_dev_api_folder_filters.py --check
- python -m py_compile routers\developer.py tests\unit\test_dev_api_folder_filters.py
- git diff --check