Extend the existing in-memory caching prototype with a durable backend so that expensive CallableModel results survive process restarts. Long-running ETL, training, and reporting workflows can then resume without redoing completed work, and repeated runs across sessions reuse prior outputs.
Semantically this is the same as the in-memory cache: a write-through, invalidatable optimization. A miss simply triggers recomputation. It composes naturally with the in-memory cache as an L1/L2 layer.
Behavior
- Pluggable storage backend (diskcache is a reasonable default; object storage is a useful follow-on).
- Same identity / cache-key semantics as the in-memory evaluator.
- Standard cache controls: TTL, size limits, manual invalidation.
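One way the behavior above could fit together is sketched below. This is a minimal illustration, not the prototype's actual design: `CacheBackend`, `FileBackend`, and `LayeredCache` are hypothetical names, and a plain directory of pickle files stands in for diskcache or object storage so that the example runs on the standard library alone. Size limits are omitted for brevity.

```python
import hashlib
import pickle
import time
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Protocol, Tuple


class CacheBackend(Protocol):
    """Hypothetical interface that any durable backend would implement."""

    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...
    def delete(self, key: str) -> None: ...


class FileBackend:
    """Stand-in durable backend: one file per key under a directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so arbitrary strings map to safe file names.
        return self.root / hashlib.sha256(key.encode()).hexdigest()

    def get(self, key: str) -> Optional[bytes]:
        path = self._path(key)
        return path.read_bytes() if path.exists() else None

    def set(self, key: str, value: bytes) -> None:
        # Not atomic; a real backend would write to a temp file and rename.
        self._path(key).write_bytes(value)

    def delete(self, key: str) -> None:
        self._path(key).unlink(missing_ok=True)


class LayeredCache:
    """In-memory dict (L1) in front of a durable backend (L2), write-through."""

    def __init__(self, backend: CacheBackend, ttl_seconds: Optional[float] = None) -> None:
        self._l1: Dict[str, Tuple[float, Any]] = {}
        self._l2 = backend
        self._ttl = ttl_seconds

    def _fresh(self, stored_at: float) -> bool:
        return self._ttl is None or (time.time() - stored_at) < self._ttl

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        entry = self._l1.get(key)
        if entry is None:
            raw = self._l2.get(key)
            if raw is not None:
                # pickle is only safe when the cache directory is trusted.
                entry = pickle.loads(raw)
        if entry is not None and self._fresh(entry[0]):
            self._l1[key] = entry  # promote an L2 hit into L1
            return entry[1]
        # Miss or expired entry: recompute, then write through to both layers.
        value = compute()
        entry = (time.time(), value)
        self._l1[key] = entry
        self._l2.set(key, pickle.dumps(entry))
        return value

    def invalidate(self, key: str) -> None:
        self._l1.pop(key, None)
        self._l2.delete(key)
```

A second `LayeredCache` pointed at the same directory behaves like a restarted process: its L1 is empty, so the first lookup falls through to L2 and returns the stored result without calling `compute`.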
Open questions
- Identity and staleness. What goes into the cache key beyond model identity and context? Source hashing, explicit user-bumped versions, or a combination?
- Large results. Inline storage of multi-GB results is impractical. Should result types be able to spill to external storage and store only a pointer in the cache record, and how does that interact with the arrow/narwhals/pandas result types?
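To make the two questions above concrete, here is one possible shape for each, offered as a sketch rather than a proposal. `make_cache_key` takes the "combination" option: it hashes the model's source text and also accepts a user-bumped version. `store_result` and `load_result` show pointer-style spilling for payloads above a threshold. All names, parameters, and the record layout are assumptions made for illustration; the source text is passed in as a string (in practice it might come from `inspect.getsource`), and how typed arrow/narwhals/pandas results would serialize to bytes is left open.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict, Mapping


def make_cache_key(
    model_id: str,
    context: Mapping[str, Any],
    source_text: str,
    version: int = 0,
) -> str:
    """Combine model identity, context, a source hash, and an explicit version."""
    payload = {
        "model": model_id,
        "context": dict(context),
        "source": hashlib.sha256(source_text.encode()).hexdigest(),
        "version": version,
    }
    # sort_keys gives a canonical form, so dict ordering does not change the key.
    # default=str is a crude fallback for values that are not JSON-native.
    canonical = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()


def store_result(payload: bytes, spill_dir: Path, threshold: int) -> Dict[str, Any]:
    """Return a cache record: inline bytes, or a pointer to a spilled file."""
    if len(payload) <= threshold:
        return {"kind": "inline", "data": payload}
    digest = hashlib.sha256(payload).hexdigest()
    path = Path(spill_dir) / digest
    path.write_bytes(payload)
    return {"kind": "pointer", "path": str(path), "sha256": digest}


def load_result(record: Dict[str, Any]) -> bytes:
    """Resolve a cache record back to bytes, verifying spilled payloads."""
    if record["kind"] == "inline":
        return record["data"]
    data = Path(record["path"]).read_bytes()
    if hashlib.sha256(data).hexdigest() != record["sha256"]:
        raise ValueError("spilled payload does not match its recorded hash")
    return data
```

With this layout, editing the model's source or bumping `version` each produce a new key, so stale entries are never hit and can be reclaimed by TTL or size limits. A missing or corrupted spill file surfaces as an error at load time, which a caller could treat as an ordinary miss.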