Skip to content

Harden developer memories list against malformed records#7784

Open
tianmind-studio wants to merge 2 commits into
BasedHardware:mainfrom
tianmind-studio:codex/dev-memories-pagination-resilience
Open

Harden developer memories list against malformed records#7784
tianmind-studio wants to merge 2 commits into
BasedHardware:mainfrom
tianmind-studio:codex/dev-memories-pagination-resilience

Conversation

@tianmind-studio

@tianmind-studio tianmind-studio commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Make CleanerMemory tolerate legacy Developer API memory records with invalid optional fields by coercing safe defaults.
  • Reject missing/empty memory IDs at model validation time, while keeping the list endpoint's per-record guard that skips id-less records before response serialization.
  • Convert numeric created_at / updated_at values into UTC datetimes instead of returning raw numbers.
  • Add HTTP coverage for paginated /v1/dev/user/memories responses containing malformed legacy records so a single bad record does not turn the page into a 500.

Regression check

With the new tests but backend/routers/developer.py temporarily restored to origin/main:

  • python -m pytest tests\unit\test_dev_api_folder_filters.py -q -> 2 failed, 16 passed, 2 warnings
  • failures covered a 500 response from /v1/dev/user/memories?limit=3&offset=7 and direct CleanerMemory(...) validation errors for legacy values

Testing

  • python -m pytest tests\unit\test_dev_api_folder_filters.py -q -> 19 passed, 2 warnings
  • python -m black --line-length 120 --skip-string-normalization routers\developer.py tests\unit\test_dev_api_folder_filters.py --check
  • python -m py_compile routers\developer.py tests\unit\test_dev_api_folder_filters.py
  • git diff --check

@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR hardens the GET /v1/dev/user/memories Developer API endpoint so that a single malformed or legacy Firestore record no longer turns an entire paginated response into a 500. It adds per-field field_validator coercions to CleanerMemory (covering None ids, invalid categories, bad datetimes, non-list tags, stringified booleans, etc.) and a pre-filter that skips non-dict or id-less records before model_validate is called.

  • CleanerMemory now tolerates every previously-required field being absent or of the wrong type by coercing safe defaults, mirroring the hardening already present on the /v3/memories path.
  • A pre-filter guard (isinstance(memory, dict) and memory.get('id')) plus a try/except ValidationError loop replace the previous direct append, preventing any single bad record from bubbling up as an unhandled exception.
  • New HTTP-layer and unit tests in TestDevApiMemoriesHttpLayer cover a mixed-quality page (good record, legacy malformed record, locked record with null content, id-less record) and directly exercise each validator's edge values.

Confidence Score: 4/5

Safe to merge; the changes are additive hardening on a read path and the new validators are well-tested.

The coercion logic is thorough and the tests confirm the key failure scenarios. Two minor quality gaps exist: the coerce_datetime numeric branch silently delegates to Pydantic's internal timestamp coercion rather than explicitly calling datetime.fromtimestamp, and coerce_id allows an empty-string id to propagate through CleanerMemory when the model is used directly (e.g., as the PATCH response model) without the endpoint's pre-filter guarding it.

backend/routers/developer.py — specifically the coerce_id and coerce_datetime validators.

Important Files Changed

Filename Overview
backend/routers/developer.py Adds field_validator coercions to CleanerMemory so legacy/malformed Firestore records survive serialization; adds a pre-filter that skips non-dict or id-less records before model_validate; fixes a potential None[:70] crash in the locked-memory content truncation path.
backend/tests/unit/test_dev_api_folder_filters.py Adds TestDevApiMemoriesHttpLayer with an HTTP-layer test that exercises a mixed-quality page (good record, legacy malformed record, locked record with null content, id-less record) and a direct unit test for CleanerMemory validator edge values; both tests are clean and well-structured.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[GET /v1/dev/user/memories] --> B[memories_db.get_memories]
    B --> C{For each record}
    C --> D{is dict AND has id?}
    D -- No --> E[Skip with warning]
    D -- Yes --> F{is_locked?}
    F -- Yes --> G[Truncate content to 70 chars]
    G --> H[CleanerMemory.model_validate]
    F -- No --> H
    H --> I{ValidationError?}
    I -- No --> J[Append to valid_memories]
    I -- Yes --> K[Skip with warning]
    J --> L[Return valid_memories]
Loading

Reviews (1): Last reviewed commit: "Make dev memories list tolerate malforme..." | Re-trigger Greptile

Comment on lines +159 to +172
@field_validator('created_at', 'updated_at', mode='before')
def coerce_datetime(cls, value):
if value in [None, '']:
return None
if isinstance(value, datetime):
return value
if isinstance(value, str):
try:
return datetime.fromisoformat(value.replace('Z', '+00:00'))
except ValueError:
return None
if isinstance(value, (int, float)) and not isinstance(value, bool):
return value
return None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 coerce_datetime returns raw numeric without conversion

When value is an int or float, the validator returns it as-is and relies on Pydantic V2's implicit lax-mode coercion (treating the number as a Unix-seconds timestamp). This works today, but the silent dependency on Pydantic's internal coercion semantics makes the intent opaque and couples correctness to Pydantic's undocumented numeric-datetime interpretation. An explicit datetime.fromtimestamp(value, tz=timezone.utc) inside the validator would both document intent and fail loudly if the stored value is out of range, rather than surfacing as a confusing None in the response after the except ValidationError swallows it.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 746e0c00e: numeric created_at/updated_at values are now explicitly converted with datetime.fromtimestamp(..., tz=timezone.utc) and invalid numeric ranges fall back to None. Added coverage for the Unix timestamp path in test_cleaner_memory_coerces_edge_values; local validation: 19 passed.

Comment on lines +122 to +126
@field_validator('id', mode='before')
def coerce_id(cls, value):
if value is None:
return ''
return str(value)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 coerce_id allows an empty-string id through CleanerMemory

The coerce_id validator converts None'', and there is no further check to reject an empty string. The endpoint's pre-filter (not memory.get('id')) blocks empty-ID dicts before model_validate, but CleanerMemory is also used as the response_model for PATCH /v1/dev/user/memories/{memory_id}, which returns memories_db.get_memory(uid, memory_id) raw — with no equivalent pre-filter. A Firestore doc that somehow lacks an id key would produce a serialized response containing "id": "".

Suggested change
@field_validator('id', mode='before')
def coerce_id(cls, value):
if value is None:
return ''
return str(value)
@field_validator('id', mode='before')
def coerce_id(cls, value):
if not value and value != 0:
return ''
return str(value)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 746e0c00e: CleanerMemory now rejects missing/empty IDs at model validation time, while the list endpoint still pre-filters malformed records before serialization. Added a regression test for the model-level ValidationError; local validation: 19 passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant