Skip to content

perf: use hash() as sort key in stable repr and canonicalize JSON keys

3d64f14
Select commit
Loading
Failed to load commit list.
Closed

perf: use hash() as sort key in stable repr and canonicalize JSON keys #12

perf: use hash() as sort key in stable repr and canonicalize JSON keys
3d64f14
Select commit
Loading
Failed to load commit list.
DS-Review / DS-Review failed May 6, 2026 in 0s

DS-Review completed

PR Review

PR: This PR replaces repr()-based sorting with hash()-based sorting in stable repr routine and canonicalizes JSON output in the SQLite persistence layer.

Severity Issue
P0 Sorting by hash() breaks cross-process determinism — different processes may produce different orderings for identical collections due to per-process hash seed (PYTHONHASHSEED). This undermines content-based fingerprint guarantees. — src/cashet/hashing.py:347
P1 When keys hash to the same value, Python's stable sort preserves insertion order among them, making canonical representation insertion-order-dependent even within a single process. Need a fallback tie-breaking comparator (e.g., repr()) for total order. — src/cashet/hashing.py:347
P2 The change from repr() to hash() alters the contract: canonical representation is now only deterministic per process. This must be documented in the docstring and README to avoid surprising users with multi-process setups (e.g., Redis, multi-worker). — src/cashet/hashing.py:347
P3 sort_keys=True in json.dumps() on a list is a no-op; it only affects dictionary keys. Remove it to avoid confusion. — src/cashet/store.py:315

Notes

  • The switch to hash() breaks cross-process cache sharing and content-based fingerprint guarantees. Use a deterministic total order (e.g., repr() or a combination with fallback) to maintain stability across interpreter invocations.
  • The sort_keys=True on list serialization is harmless but misleading; consider removing for clarity.

Verdict

Request changes — The P0 cross-process determinism issue must be resolved before merging; the PR's performance gains do not outweigh the loss of content-addressability across processes.