diff --git a/.claude/commands/add-codec.md b/.claude/commands/add-codec.md new file mode 100644 index 00000000..9d8b8d83 --- /dev/null +++ b/.claude/commands/add-codec.md @@ -0,0 +1,51 @@ +# Command: add-codec + +Add a codec (encoder/decoder pair) to an existing store or create a new codec for `ValueCodecs`/`KeyCodecs`. + +## Usage + +``` +/add-codec +``` + +## Examples + +``` +/add-codec add msgpack serialization to my store +/add-codec add base64 encoding to keys +/add-codec create a Zod-validated value codec for UserSchema +``` + +## What this command does + +1. Reads `dol/kv_codecs.py` to understand existing codecs and patterns +2. Reads `dol/trans.py` for `Codec`, `ValueCodec`, `KeyCodec` definitions +3. Implements the codec as: + - A `ValueCodec(encoder=..., decoder=...)` or `KeyCodec(encoder=..., decoder=...)` if it's a reusable pair + - A `wrap_kvs(store, obj_of_data=..., data_of_obj=...)` if it's one-off +4. Shows how to apply it to a store and how to compose it with existing codecs + +## Output pattern + +```python +from dol.trans import ValueCodec +import msgpack + +msgpack_codec = ValueCodec( + encoder=msgpack.dumps, + decoder=msgpack.loads, +) + +# Apply to any store +MsgpackStore = msgpack_codec(dict) + +# Compose: msgpack + gzip +from dol import ValueCodecs +compressed_msgpack = msgpack_codec + ValueCodecs.gzip() +``` + +## Notes + +- Use `Codec.compose_with` / `+` operator to chain codecs +- See `dol/kv_codecs.py` for real examples of `ValueCodecs.*` factories +- See `misc/docs/python_design.md` section "Codec Abstraction" for details diff --git a/.claude/commands/explain-store.md b/.claude/commands/explain-store.md new file mode 100644 index 00000000..ed219379 --- /dev/null +++ b/.claude/commands/explain-store.md @@ -0,0 +1,46 @@ +# Command: explain-store + +Explain how a given dol store works, tracing the key/value transform pipeline. + +## Usage + +``` +/explain-store +``` + +## What this command does + +1. 
Finds the store definition in the codebase +2. Traces the MRO (method resolution order) to identify all transform layers +3. For each `wrap_kvs` layer, shows what transforms are applied +4. Draws the data flow diagram for read and write operations +5. Shows a concrete example: what happens when you do `store['some_key']` and `store['some_key'] = value` + +## Example output format + +``` +Store: JsonFileStore (wrap_kvs applied to Files) + +Read pipeline: + key='report' + → id_of_key: 'report' + '.json' = 'report.json' + → Files.__getitem__('report.json') + → raw_data: b'{"x": 1}' + → obj_of_data: json.loads(raw_data) + → returns: {'x': 1} + +Write pipeline: + key='report', obj={'x': 1} + → id_of_key: 'report.json' + → data_of_obj: json.dumps({'x': 1}) = '{"x": 1}' + → Files.__setitem__('report.json', '{"x": 1}') + +Iteration: + Files.__iter__() → ['report.json', 'data.json', ...] + → key_of_id: strip '.json' → ['report', 'data', ...] +``` + +## Notes + +- See `dol/base.py:Store.__getitem__` for the actual implementation +- See `misc/docs/python_design.md` for the full class hierarchy diff --git a/.claude/commands/new-store.md b/.claude/commands/new-store.md new file mode 100644 index 00000000..edac3638 --- /dev/null +++ b/.claude/commands/new-store.md @@ -0,0 +1,52 @@ +# Command: new-store + +Create a new dol store wrapping a given backend, with key/value transforms. + +## Usage + +``` +/new-store +``` + +## What this command does + +1. Reads `dol/base.py` to understand `KvReader`/`KvPersister`/`Store` +2. Reads `dol/trans.py` to understand `wrap_kvs` and `store_decorator` +3. Asks clarifying questions if needed: + - What is the backend? (files, S3, DB, dict, custom) + - What are the keys? (strings, paths, tuples?) + - What are the values? (bytes, JSON, Python objects?) + - Read-only or read-write? + - Any key transformation needed? (prefix, suffix, format conversion) + - Any value serialization needed? (JSON, pickle, gzip, custom) +4. 
Generates the store class/factory using `wrap_kvs` (preferred) or subclassing +5. Adds a docstring and a doctest example + +## Output pattern + +```python +from dol import wrap_kvs, KvReader # or KvPersister +# + relevant codec imports + +def make__store(root_path, ...): + """ + + >>> s = make__store(...) + >>> s['key'] = value + >>> s['key'] + value + """ + return wrap_kvs( + , + id_of_key=..., + key_of_id=..., + obj_of_data=..., + data_of_obj=..., + ) +``` + +## Notes + +- Always include a working doctest using `dict` as the backend +- Follow the `X_of_Y` naming convention for transforms +- See `CLAUDE.md` and `misc/docs/python_design.md` for full context diff --git a/.claude/rules/dol-conventions.md b/.claude/rules/dol-conventions.md new file mode 100644 index 00000000..777a2801 --- /dev/null +++ b/.claude/rules/dol-conventions.md @@ -0,0 +1,54 @@ +# dol Coding Conventions + +## Interface Choices + +- **New read-only stores**: subclass `KvReader` (from `dol.base`), not `Mapping` directly. +- **New read-write stores**: subclass `KvPersister` (from `dol.base`). +- **Avoid subclassing `Store` directly** unless you need the `_id_of_key` / `_key_of_id` / `_data_of_obj` / `_obj_of_data` hook protocol. Prefer `wrap_kvs` instead — it's more composable and doesn't create a new class in the inheritance chain. + +## Naming Conventions + +- Transform functions use `X_of_Y` naming: `key_of_id` (given `_id`, return `key`), `id_of_key` (given `key`, return `_id`). Follow this in new code. +- Use `_id` for inner/backend keys, `k` or `key` for outer/interface keys. +- Use `data` for serialized/raw backend values, `obj` or `v` or `value` for outer Python objects. + +## `wrap_kvs` Usage + +- Prefer `wrap_kvs` over subclassing when adding key/value transforms. +- Use `obj_of_data`/`data_of_obj` (no key context) when the transform is the same for all values. +- Use `postget`/`preset` (with key context) only when the transform depends on the key (e.g., file extension). 
+- **Do not use `bytes.decode` directly** as `obj_of_data` — use `lambda b: b.decode()` instead (signature-inference bug, Issue #9). + +## Store Construction + +- Always test new stores with `dict()` as backend before using real storage. +- Make stores parametrizable: accept `store=None` (or `store_factory=dict`) as a parameter. +- Return the store from a factory function rather than hardcoding the backend. + +## Purity and Dependencies + +- Core `dol` has no external dependencies. Keep new core code dependency-free. +- If a new feature needs an external dependency, put it in a separate module with a clear optional import and a helpful error message if the dependency is missing. + +## Codecs + +- Use `ValueCodecs` and `KeyCodecs` from `dol.kv_codecs` for common encoders/decoders. +- For custom codecs, use `dol.trans.ValueCodec` / `KeyCodec` dataclasses (not ad-hoc lambdas) when the codec needs to be reused or composed. +- Compose codecs with `+` operator: `ValueCodecs.pickle() + ValueCodecs.gzip()`. + +## Caching + +- Use `@cache_this` for expensive properties/methods, specifying an explicit `cache=` store if persistence across sessions is needed. +- Use `cache_vals(store)` to add an in-memory read cache to a slow store. +- Use `store_cached(cache_store)(func)` to persist function results. + +## Testing + +- Tests live in `dol/tests/`. +- Doctests in module docstrings are the primary documentation for functions — keep them runnable. +- Use `utils_for_tests.py` for shared test fixtures. + +## No Silent Failures + +- If a transform function is passed that will fail silently, raise an informative error early (e.g., in `__init__` or at decoration time). +- Prefer explicit errors over silent incorrect behavior. 
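The store-construction and no-silent-failure rules above can be illustrated without importing dol. In this minimal sketch, `_Wrapped` is a hypothetical stand-in for what a `wrap_kvs` call returns (real code should call `wrap_kvs`, as the rules recommend), and `make_json_store` is a hypothetical factory name. The backend is a parameter that defaults to a fresh `dict`, transforms follow the `X_of_Y` naming, and bad configuration is rejected at construction time:

```python
import json
from collections.abc import MutableMapping


class _Wrapped(MutableMapping):
    """Hypothetical, dependency-free stand-in for a wrap_kvs-wrapped store."""

    def __init__(self, store, id_of_key, key_of_id, obj_of_data, data_of_obj):
        self.store = store
        self.id_of_key, self.key_of_id = id_of_key, key_of_id
        self.obj_of_data, self.data_of_obj = obj_of_data, data_of_obj

    def __getitem__(self, k):
        return self.obj_of_data(self.store[self.id_of_key(k)])

    def __setitem__(self, k, obj):
        self.store[self.id_of_key(k)] = self.data_of_obj(obj)

    def __delitem__(self, k):
        del self.store[self.id_of_key(k)]

    def __iter__(self):
        return (self.key_of_id(_id) for _id in self.store)

    def __len__(self):
        return len(self.store)


def make_json_store(store=None, *, suffix='.json'):
    """Factory: the backend is a parameter; a fresh dict is used when omitted."""
    if not isinstance(suffix, str) or not suffix:
        # No silent failures: reject bad config now, not on the first read
        raise ValueError(f'suffix must be a non-empty str, got {suffix!r}')
    if store is None:
        store = {}
    return _Wrapped(
        store,
        id_of_key=lambda k: k + suffix,                      # key -> _id
        key_of_id=lambda _id: _id[: -len(suffix)],           # _id -> key
        obj_of_data=lambda data: json.loads(data.decode()),  # bytes -> obj
        data_of_obj=lambda obj: json.dumps(obj).encode(),    # obj -> bytes
    )
```

Passing an explicit `dict` makes the stored bytes easy to inspect in a test; production code would pass a real backend to the same factory.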
diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..ca5a590d --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,199 @@ +# dol — AI Agent Guide + +`dol` is a pure-Python (no dependencies) toolkit for wrapping any storage backend (files, S3, databases, dicts) behind a uniform dict-like interface. Version 0.3.38. Python ≥ 3.10. + +For a comprehensive agent-readable API reference, see [llms-full.txt](llms-full.txt). +For a quick orientation, see [llms.txt](llms.txt). + +--- + +## Key Files + +| File | What's in it | +|------|-------------| +| `dol/base.py` | `Collection`, `KvReader`, `KvPersister`, `Store` — the class hierarchy | +| `dol/trans.py` | `wrap_kvs` (core), `store_decorator`, `filt_iter`, `cached_keys`, `Codec`, `kv_wrap` | +| `dol/kv_codecs.py` | `ValueCodecs`, `KeyCodecs` — ready-made codec namespaces | +| `dol/caching.py` | `cache_this`, `cache_vals`, `store_cached`, `WriteBackChainMap` | +| `dol/paths.py` | `KeyTemplate`, `mk_relative_path_store`, `KeyPath`, `path_get/set/filter` | +| `dol/filesys.py` | `Files`, `TextFiles`, `JsonFiles`, `PickleFiles` — filesystem stores | +| `dol/sources.py` | `FlatReader`, `FanoutReader/Persister`, `CascadedStores` | +| `dol/signatures.py` | `Sig` — signature arithmetic | +| `dol/util.py` | `Pipe`, `lazyprop`, `partialclass`, `groupby` | +| `dol/__init__.py` | Public API — all exports live here | + +--- + +## Core Pattern: Building Stores + +The fundamental operation is **wrapping a backend with transforms**: + +```python +from dol import wrap_kvs, Files +import json + +# Add JSON serialization to a file store (`Files` holds bytes, so encode the JSON string) +JsonFileStore = wrap_kvs(Files, obj_of_data=json.loads, data_of_obj=lambda obj: json.dumps(obj).encode()) + +# Or wrap an instance +s = wrap_kvs(dict(), id_of_key=lambda k: k.upper(), key_of_id=str.lower) +``` + +`wrap_kvs` parameters: +- `key_of_id` / `id_of_key` — outgoing/incoming key transforms +- `obj_of_data` / `data_of_obj` — outgoing/incoming value transforms +- `postget(key, data) → obj` — value transform that knows
the key (for reads) +- `preset(key, obj) → data` — value transform that knows the key (for writes) +- `key_codec` / `value_codec` — `Codec` objects (encoder+decoder pair) + +--- + +## Core Conventions + +- **`X_of_Y` naming**: read it as "X computed from Y". `key_of_id` takes a backend `_id` and returns an interface key (outgoing); `id_of_key` takes an interface key and returns a backend `_id` (incoming). They always come in pairs. +- **KvReader for read-only**: subclass `KvReader` (not `KvPersister`) when writes aren't needed. +- **KvPersister for read-write**: `clear()` is disabled — override only if you're sure. +- **Test with `dict`, deploy with real backend**: `wrap_kvs(dict, ...)` first, then swap `dict` for `Files`, a DB store, etc. +- **Transforms are pure functions**: they should be stateless and not have side effects. + +--- + +## How to Create a New Store + +### Option 1: `wrap_kvs` (preferred for most cases) + +```python +from dol import wrap_kvs +import json + +MyStore = wrap_kvs(dict, + id_of_key=lambda k: k + '.json', + key_of_id=lambda _id: _id[:-5], # strip the 5-char '.json' suffix + obj_of_data=json.loads, + data_of_obj=json.dumps, +) +``` + +### Option 2: Subclass `KvReader`/`KvPersister` + +```python +from dol.base import KvReader + +class MyReader(KvReader): + def __getitem__(self, k): ... + def __iter__(self): ... + def __len__(self): ...
# optional, falls back to iteration count +``` + +### Option 3: Subclass `Store` (when you need transform hooks) + +```python +from dol.base import Store +import json + +class MyStore(Store): + def _id_of_key(self, k): return k.upper() + def _key_of_id(self, _id): return _id.lower() + def _data_of_obj(self, obj): return json.dumps(obj) + def _obj_of_data(self, data): return json.loads(data) +``` + +--- + +## Ready-Made Codecs + +```python +from dol import ValueCodecs, KeyCodecs, Pipe + +# Common value codecs +ValueCodecs.pickle() # pickle.dumps / pickle.loads +ValueCodecs.json() # json.dumps / json.loads +ValueCodecs.gzip() # compress/decompress +ValueCodecs.str_to_bytes() # encode/decode + +# Key codecs +KeyCodecs.suffixed('.pkl') # add/strip suffix +KeyCodecs.prefixed('ns:') # add/strip prefix + +# Chain with Pipe +MyStore = Pipe(KeyCodecs.suffixed('.pkl'), ValueCodecs.pickle())(dict) +``` + +--- + +## Store Decorators + +Most tools in `trans.py` use `@store_decorator`, so each one can decorate a class or wrap an instance, either directly or through a pre-configured factory (4 modes in all). Three common forms: + +```python +from dol import filt_iter, cached_keys + +# As class decorator +@filt_iter(filt=lambda k: k.endswith('.json')) +class MyStore(dict): ... + +# As instance wrapper +s = filt_iter(my_store, filt=lambda k: k.endswith('.json')) + +# As factory +json_only = filt_iter(filt=lambda k: k.endswith('.json')) +s = json_only(my_store) +``` + +--- + +## Caching + +```python +from dol import cache_this, cache_vals, store_cached, JsonFiles + +# Cache a property or method +class MyClass: + @cache_this + def expensive(self): return sum(range(1_000_000)) + +# Cache fetched values from a slow store +fast = cache_vals(slow_store) + +# Persist function results across sessions +@store_cached(JsonFiles('/cache')) +def compute(x, y): return slow_computation(x, y) +``` + +--- + +## Testing Approach + +Always prototype with `dict` as the backend: + +```python +from dol import wrap_kvs +import json + +# 1.
Test logic with dict +s = wrap_kvs(dict(), obj_of_data=json.loads, data_of_obj=lambda obj: json.dumps(obj).encode()) +s['key'] = {'a': 1} +assert s['key'] == {'a': 1} + +# 2. Swap to real backend (same transforms; `Files` stores bytes, hence the .encode()) +from dol import Files +s = wrap_kvs(Files('/data'), obj_of_data=json.loads, data_of_obj=lambda obj: json.dumps(obj).encode()) +``` + +Run tests: `pytest dol/tests/` + +--- + +## Documentation Index (`misc/docs/`) + +| Document | Contents | +|----------|----------| +| [general_design.md](misc/docs/general_design.md) | Language-agnostic design: what dol is, the KV pipeline, layered composition, patterns | +| [python_design.md](misc/docs/python_design.md) | Python architecture: class hierarchy, `wrap_kvs` deep dive, `Codec`/`Sig`/`Pipe`, critique | +| [issues_and_discussions.md](misc/docs/issues_and_discussions.md) | GitHub issues/discussions themes, known limitations, open design questions | +| [frontend_dol_ideas.md](misc/docs/frontend_dol_ideas.md) | `zoddal` design: TypeScript KV interface, adapters, Zod bridge, zod-collection-ui integration | + +--- + +## Known Limitations / Gotchas + +- **`wrap_kvs` + `self` inside methods**: When a `wrap_kvs`-decorated class uses `self[k]` in its own methods, `self` is the unwrapped instance. Re-apply the wrapper to `self` if transforms are needed (Issue #18). +- **`clear()` is disabled** on `KvPersister`. Call `ensure_clear_to_kv_store(store)` to re-enable. +- **No async support** in core. Use synchronous wrappers for async backends (thread pool, etc.). +- **`bytes.decode` as `obj_of_data`** causes issues — use `lambda b: b.decode()` instead (Issue #9). +- **Windows paths**: Some path-related code has Unix assumptions. Issues #52, #58 track this. diff --git a/llms-full.txt b/llms-full.txt new file mode 100644 index 00000000..48d247dd --- /dev/null +++ b/llms-full.txt @@ -0,0 +1,743 @@ +# dol + +> `dol` (Data Object Layer) is a pure-Python toolkit for wrapping any storage backend — files, S3, databases, dicts — behind a uniform dict-like (`Mapping`/`MutableMapping`) interface.
Use it to separate domain logic from storage implementation, add key/value transform layers, and build composable data pipelines with no dependencies. + +## Key concepts + +- **All stores are `Mapping` or `MutableMapping`**. You interact with any backend the same way you use a Python dict: `store[k]`, `store[k] = v`, `del store[k]`, `for k in store`. +- **`wrap_kvs` is the core function**. It wraps a store class or instance with key/value transforms. Stack multiple `wrap_kvs` calls to build transform pipelines ("Russian dolls"). +- **Transforms come in pairs**: `key_of_id`/`id_of_key` for keys; `obj_of_data`/`data_of_obj` for values. Use `postget`/`preset` when the transform depends on the key. +- **Test with `dict`, deploy with real storage**. All dol stores accept a `dict` as the backend; swap it for `Files`, `ZipFiles`, a DB store, etc. when ready. +- **Pure Python, zero dependencies**. The core package (`dol`) has no external requirements. + +## What dol is NOT + +- Not a query engine — no filter-by-field, join, or aggregation. Use the backend's query API directly. +- Not an ORM — no schema definition, migration, or relationship management. +- Not domain-driven — stores are key-value only; domain meaning lives in the code that uses them. 
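The key concepts above (dict-like access, paired transforms, stacked layers) fit in a few lines of plain Python. The `Layer` class below is a hypothetical stand-in for a single `wrap_kvs` call, not dol's implementation; stacking two of them mimics the "Russian doll" pipeline on top of a plain `dict` backend:

```python
import pickle
from collections.abc import MutableMapping


def _identity(x):
    return x


class Layer(MutableMapping):
    """Hypothetical single wrapping layer (stand-in for one wrap_kvs call).

    Every transform defaults to identity, so a layer can change only keys,
    only values, or both.
    """

    def __init__(self, store, *, id_of_key=_identity, key_of_id=_identity,
                 obj_of_data=_identity, data_of_obj=_identity):
        self.store = store
        self.id_of_key, self.key_of_id = id_of_key, key_of_id
        self.obj_of_data, self.data_of_obj = obj_of_data, data_of_obj

    def __getitem__(self, k):
        return self.obj_of_data(self.store[self.id_of_key(k)])

    def __setitem__(self, k, v):
        self.store[self.id_of_key(k)] = self.data_of_obj(v)

    def __delitem__(self, k):
        del self.store[self.id_of_key(k)]

    def __iter__(self):
        return map(self.key_of_id, self.store)

    def __len__(self):
        return len(self.store)


backend = {}  # any MutableMapping backend could go here
# Inner doll: keys only (add '.pkl' on the way in, strip it on the way out)
keyed = Layer(backend, id_of_key=lambda k: k + '.pkl',
              key_of_id=lambda _id: _id[:-len('.pkl')])
# Outer doll: values only (pickle on write, unpickle on read)
s = Layer(keyed, data_of_obj=pickle.dumps, obj_of_data=pickle.loads)

s['mykey'] = {'data': 42}
```

A write passes through the outer layer first (pickling) and then the inner one (adding the suffix); a read runs the same steps in reverse.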
+ +## Core API + +- [dol/trans.py](dol/trans.py) — `wrap_kvs` (the most important function), `store_decorator`, `filt_iter`, `cached_keys`, `flatten`, `Codec`, `ValueCodec`, `KeyCodec` +- [dol/base.py](dol/base.py) — `KvReader`, `KvPersister`, `Store`, `Collection`, `MappingViewMixin` +- [dol/kv_codecs.py](dol/kv_codecs.py) — `ValueCodecs`, `KeyCodecs` (ready-made codec namespaces) +- [dol/caching.py](dol/caching.py) — `cache_this`, `cache_vals`, `store_cached`, `WriteBackChainMap` +- [dol/paths.py](dol/paths.py) — `KeyTemplate`, `mk_relative_path_store`, `KeyPath`, `path_get`, `path_set` +- [dol/filesys.py](dol/filesys.py) — `Files`, `TextFiles`, `JsonFiles`, `PickleFiles` +- [dol/sources.py](dol/sources.py) — `FlatReader`, `FanoutReader`, `FanoutPersister`, `CascadedStores` +- [dol/signatures.py](dol/signatures.py) — `Sig` (signature arithmetic) + +## Examples + +- [README.md](README.md) — copy data between backends, add serialization layers +- [dol/tests/test_trans.py](dol/tests/test_trans.py) — wrap_kvs tests +- [dol/tests/test_caching.py](dol/tests/test_caching.py) — caching patterns +- [dol/tests/test_paths.py](dol/tests/test_paths.py) — path key patterns +- [dol/tests/test_filesys.py](dol/tests/test_filesys.py) — file store usage + +## Optional + +- [misc/docs/general_design.md](misc/docs/general_design.md) — language-agnostic design concepts (middleware orientation, KV transform pipeline, layered composition) +- [misc/docs/python_design.md](misc/docs/python_design.md) — Python-specific architecture, class hierarchy, all `wrap_kvs` params, design critique +- [misc/docs/issues_and_discussions.md](misc/docs/issues_and_discussions.md) — open design questions and known limitations + +--- + +## module: dol.base + +Base classes for the store hierarchy. + +### Collection + +```python +class Collection(collections.abc.Collection): +``` + +Extends `collections.abc.Collection` with a `head()` method. 
Default `__len__` and `__contains__` work by iteration (override for efficiency). + +```python +def head(self): + """Get first element (or (k,v) if has .items()).""" +``` + +### KvReader + +```python +class KvReader(MappingViewMixin, Collection, Mapping): +``` + +Read-only key-value store. Extends `Mapping` with `head()`. `__reversed__` raises `NotImplementedError` by design. + +```python +# Usage: any class implementing __getitem__ and __iter__ can subclass KvReader +class MyReader(KvReader): + def __getitem__(self, k): ... + def __iter__(self): ... + def __len__(self): ... +``` + +### KvPersister + +```python +class KvPersister(KvReader, MutableMapping): +``` + +Read-write store. Adds `__setitem__` and `__delitem__`. **`clear()` is disabled** (raises if called — too destructive for persistent backends). + +### Store + +```python +class Store(KvPersister): + def __init__(self, store=dict): ... +``` + +The central class. Wraps an inner `store` with 4 transform hooks (all default to identity): + +```python +_id_of_key(self, k) # outer key → inner key (called on reads, writes, deletes) +_key_of_id(self, _id) # inner key → outer key (called on iteration) +_data_of_obj(self, obj) # outer value → stored data (called on writes) +_obj_of_data(self, data)# stored data → outer value (called on reads) +``` + +Data flow: +``` +read: k → _id_of_key → store[_id] → _obj_of_data → return obj +write: k → _id_of_key, obj → _data_of_obj → store[_id] = data +iter: for _id in store → _key_of_id → yield k +``` + +```python +# Example: Store with key/value transforms +class MyStore(Store): + def _id_of_key(self, k): return k.upper() + def _key_of_id(self, _id): return _id.lower() + def _data_of_obj(self, obj): return chr(obj) + def _obj_of_data(self, data): return ord(data) + +s = MyStore() +s['foo'] = 65 # stores 'A' under 'FOO' +s['foo'] # returns 65 +list(s) # ['foo'] +``` + +--- + +## module: dol.trans + +Transformation and wrapping tools. The most important module. 
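A central idea in this module is that one function can wrap either a store class (returning a new class) or a store instance (returning a wrapped instance). The following is a hypothetical, stripped-down illustration of that duality; `upper_keys` and `_UpperKeys` are made-up names, and this is not dol's `store_decorator` machinery:

```python
from collections.abc import MutableMapping


class _UpperKeys(MutableMapping):
    """Wrap a mapping instance so keys are upper-cased before reaching it."""

    def __init__(self, store):
        self.store = store

    def __getitem__(self, k):
        return self.store[k.upper()]

    def __setitem__(self, k, v):
        self.store[k.upper()] = v

    def __delitem__(self, k):
        del self.store[k.upper()]

    def __iter__(self):
        return (_id.lower() for _id in self.store)

    def __len__(self):
        return len(self.store)


def upper_keys(store):
    """Hypothetical wrapper taking a mapping class OR a mapping instance."""
    if isinstance(store, type):
        store_cls = store

        class UpperKeysStore(_UpperKeys):
            # Class in, class out: each instance builds its own inner store
            def __init__(self, *args, **kwargs):
                super().__init__(store_cls(*args, **kwargs))

        return UpperKeysStore
    # Instance in, wrapped instance out
    return _UpperKeys(store)
```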
+ +### wrap_kvs + +```python +@store_decorator +def wrap_kvs( + store=None, + *, + key_of_id=None, # outgoing key transform: inner_id → outer_key + id_of_key=None, # incoming key transform: outer_key → inner_id + obj_of_data=None, # outgoing value transform: stored_data → python_obj + data_of_obj=None, # incoming value transform: python_obj → stored_data + preset=None, # (key, obj) → data [write, key-aware] + postget=None, # (key, data) → obj [read, key-aware] + key_codec=None, # Codec(encoder=id_of_key, decoder=key_of_id) + value_codec=None, # Codec(encoder=data_of_obj, decoder=obj_of_data) + key_encoder=None, # alias for id_of_key + key_decoder=None, # alias for key_of_id + value_encoder=None, # alias for data_of_obj + value_decoder=None, # alias for obj_of_data + name=None, + wrapper=None, # wrapper class, defaults to Store + outcoming_key_methods=(), + outcoming_value_methods=(), + ingoing_key_methods=(), + ingoing_value_methods=(), +) -> type | object: +``` + +Make a Store with the given key/value transforms applied. Can wrap a class (returns new class) or an instance (returns wrapped instance). + +**`@store_decorator` makes it work in 4 modes:** +```python +import json + +# 1. Class decorator (with arguments) +@wrap_kvs(obj_of_data=json.loads, data_of_obj=json.dumps) +class MyStore(dict): ... + +# 2. Type wrapping +JsonDict = wrap_kvs(dict, obj_of_data=json.loads, data_of_obj=json.dumps) + +# 3. Instance wrapping +d = {} +d = wrap_kvs(d, obj_of_data=json.loads, data_of_obj=json.dumps) + +# 4.
Partial (factory) +json_wrap = wrap_kvs(obj_of_data=json.loads, data_of_obj=json.dumps) +MyStore = json_wrap(dict) +``` + +**`obj_of_data` vs `postget`:** +- `obj_of_data(data) → obj` — value transform, no key context +- `postget(key, data) → obj` — value transform with key context (e.g., choose deserializer by file extension) + +```python +# Key transform: strip prefix +s = wrap_kvs(dict, + id_of_key=lambda k: f"user:{k}", + key_of_id=lambda _id: _id[len("user:"):], +) + +# Value transform: JSON serialization +s = wrap_kvs(dict, obj_of_data=json.loads, data_of_obj=json.dumps) + +# Key-conditioned value transform +s = wrap_kvs(dict, + postget=lambda k, v: json.loads(v) if k.endswith('.json') else pickle.loads(v), + preset=lambda k, v: json.dumps(v) if k.endswith('.json') else pickle.dumps(v), +) + +# Stacking layers +s = dict() +s = wrap_kvs(s, id_of_key=lambda k: k + '.pkl', key_of_id=lambda _id: _id[:-4]) +s = wrap_kvs(s, obj_of_data=pickle.loads, data_of_obj=pickle.dumps) +``` + +### filt_iter + +```python +@store_decorator +def filt_iter(store=None, *, filt: Callable | Iterable = take_everything) -> type | object: +``` + +Filter the keys visible in a store. `filt` can be a boolean function or an explicit collection of keys to include. + +```python +# Keep only keys ending in '.json' +s = filt_iter(my_store, filt=lambda k: k.endswith('.json')) + +# Keep only specific keys +s = filt_iter(my_store, filt=['key1', 'key2']) + +# As class decorator +@filt_iter(filt=lambda k: not k.startswith('_')) +class PublicStore(dict): ... +``` + +### cached_keys + +```python +@store_decorator +def cached_keys(store=None, *, keys_cache: Callable | Collection = list) -> type | object: +``` + +Cache the result of `__iter__`. Use when iterating is expensive (remote API, large filesystem). 
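A minimal sketch of what caching `__iter__` means, assuming a plain `Mapping` wrapper. `CachedKeys`, its `refresh` method, and the `SlowListing` test backend are hypothetical names, and dol's `cached_keys` is implemented differently. The key listing is computed once with `keys_cache` (for example `list`, `sorted`, or `set`) and reused for iteration, `len`, and membership tests:

```python
from collections.abc import Mapping


class CachedKeys(Mapping):
    """Hypothetical read-only wrapper that lists the inner keys only once."""

    def __init__(self, store, keys_cache=list):
        self.store = store
        self.keys_cache = keys_cache  # e.g. list, sorted, set
        self._keys = None

    def _cached(self):
        if self._keys is None:  # first use: pay for the listing once
            self._keys = self.keys_cache(iter(self.store))
        return self._keys

    def __iter__(self):
        return iter(self._cached())

    def __len__(self):
        return len(self._cached())

    def __contains__(self, k):
        return k in self._cached()

    def __getitem__(self, k):
        return self.store[k]  # values are still fetched from the store

    def refresh(self):
        self._keys = None  # the next access re-lists the store


class SlowListing(dict):
    """Test backend whose listing is 'expensive'; counts listings."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.listings = 0

    def __iter__(self):
        self.listings += 1
        return super().__iter__()
```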
+ +```python +# Cache keys as a list (preserves order) +s = cached_keys(remote_store) + +# Cache as sorted list +s = cached_keys(remote_store, keys_cache=sorted) + +# Cache as set (faster __contains__) +s = cached_keys(remote_store, keys_cache=set) + +# Refresh cache +del s._keys_cache +``` + +### flatten + +```python +@store_decorator +def flatten(store=None, *, levels=None, cache_keys=False) -> type | object: +``` + +Flatten a nested store (store of stores) into a single-level store. + +### store_decorator + +```python +def store_decorator(func) -> Callable: +``` + +Meta-decorator that makes a class-transforming function work in 4 modes: class decorator, class decorator factory, instance decorator, instance decorator factory. + +```python +@store_decorator +def my_deco(store=None, *, param='default'): + # always receives a class; transforms it + store.some_method = lambda self: param + return store + +# 4 equivalent ways to use: +@my_deco # class decorator, defaults +@my_deco(param='x') # class decorator factory +s = my_deco(instance) # instance decorator, defaults +s = my_deco(param='x')(inst) # instance decorator factory +``` + +### Codec / ValueCodec / KeyCodec / KeyValueCodec + +```python +@dataclass +class Codec(Generic[DecodedType, EncodedType]): + encoder: Callable + decoder: Callable + + def compose_with(self, other) -> Codec: ... # chain codecs + def invert(self) -> Codec: ... 
# swap encoder/decoder + __add__ = compose_with + __invert__ = invert + +class ValueCodec(Codec): + def __call__(self, store): # wraps store with value codec + return wrap_kvs(store, data_of_obj=self.encoder, obj_of_data=self.decoder) + +class KeyCodec(Codec): + def __call__(self, store): # wraps store with key codec + return wrap_kvs(store, id_of_key=self.encoder, key_of_id=self.decoder) + +class KeyValueCodec(Codec): + def __call__(self, store): # wraps store with key-conditioned codec + return wrap_kvs(store, preset=self.encoder, postget=self.decoder) +``` + +```python +# Codec composition +from dol.trans import ValueCodec +import json, gzip + +# gzip works on bytes, so the JSON encoder must return bytes (json.loads accepts bytes) +json_codec = ValueCodec(encoder=lambda obj: json.dumps(obj).encode(), decoder=json.loads) +gzip_codec = ValueCodec(encoder=gzip.compress, decoder=gzip.decompress) + +json_gzip_codec = json_codec + gzip_codec # json → gzip on write; gunzip → json on read +MyStore = json_gzip_codec(dict) +``` + +--- + +## module: dol.kv_codecs + +Ready-made codec namespaces. + +### ValueCodecs + +Namespace class with factory methods returning `ValueCodec` instances: + +```python +from dol import ValueCodecs + +ValueCodecs.pickle() # pickle.dumps / pickle.loads +ValueCodecs.json() # json.dumps / json.loads +ValueCodecs.gzip() # gzip.compress / gzip.decompress +ValueCodecs.csv() # csv encode/decode (list of lists ↔ csv string) +ValueCodecs.str_to_bytes() # str.encode / bytes.decode + +# Compose with + +ValueCodecs.pickle() + ValueCodecs.gzip() # pickle then gzip +``` + +### KeyCodecs + +```python +from dol import KeyCodecs + +KeyCodecs.suffixed('.json') # add/strip '.json' suffix +KeyCodecs.prefixed('user:') # add/strip 'user:' prefix +``` + +### Using with Pipe + +```python +from dol import ValueCodecs, KeyCodecs, Pipe + +# Chain key and value wrappers into a single store factory +MyStore = Pipe( + KeyCodecs.suffixed('.pkl'), + ValueCodecs.pickle(), +)(dict) + +s = MyStore() +s['mykey'] = {'data': 42} # stored as 'mykey.pkl' with pickle bytes +s['mykey'] # returns
{'data': 42} +``` + +--- + +## module: dol.filesys + +File system stores. All use relative paths as keys and bytes as values (unless otherwise noted). + +```python +from dol import Files, TextFiles, JsonFiles, PickleFiles + +# Files: bytes values +s = Files('/path/to/folder') +s['data.bin'] = b'raw bytes' +data = s['data.bin'] # bytes + +# TextFiles: string values, UTF-8 +t = TextFiles('/path/to/folder') +t['notes.txt'] = 'some text' + +# JsonFiles: JSON-serialized values +j = JsonFiles('/path/to/folder') +j['config.json'] = {'key': 'value'} # auto-serializes to JSON on write + +# PickleFiles: pickle-serialized values +p = PickleFiles('/path/to/folder') +p['model.pkl'] = my_sklearn_model + +# DirReader: recursively lists subdirectories +from dol import DirReader +d = DirReader('/path/to/root') +list(d) # ['subdir1', 'subdir2', ...] +``` + +Key helpers: +```python +from dol import ensure_dir, mk_dirs_if_missing, resolve_path, temp_dir + +path = resolve_path('~/data') # expands ~ +with temp_dir() as td: # temporary directory context manager + s = Files(td) + s['test.bin'] = b'data' +``` + +--- + +## module: dol.caching + +### cache_this + +```python +def cache_this( + method=None, + *, + cache=None, # where to store: dict, 'attr_name', or a Mapping + key=None, # key function or explicit key + ignore=frozenset(), # parameter names to ignore in cache key +) -> property | descriptor: +``` + +Cache property or method results. Auto-detects property vs method based on signature. 
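The auto-detection can be pictured with the standard library. The hypothetical `cache_this_sketch` below inspects the signature: it uses `functools.cached_property` when `self` is the only parameter and otherwise memoizes per instance, keyed by the positional arguments. This illustrates the idea only; it is not dol's code, and it ignores keyword arguments and the `cache=`, `key=`, and `ignore=` options:

```python
import inspect
from functools import cached_property, wraps


def cache_this_sketch(method):
    """Hypothetical: choose property-style or method-style caching by signature."""
    params = list(inspect.signature(method).parameters)
    if len(params) == 1:  # only `self` -> compute once, then read as attribute
        return cached_property(method)

    cache_attr = f'_{method.__name__}_cache'

    @wraps(method)
    def wrapper(self, *args):  # positional arguments only in this sketch
        cache = self.__dict__.setdefault(cache_attr, {})  # per-instance dict
        if args not in cache:
            cache[args] = method(self, *args)
        return cache[args]

    return wrapper


class Demo:
    def __init__(self):
        self.calls = 0  # counts real computations

    @cache_this_sketch
    def total(self):  # only self -> cached property
        self.calls += 1
        return sum(range(10))

    @cache_this_sketch
    def power(self, x, y):  # has arguments -> memoized method
        self.calls += 1
        return x ** y
```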
+ +```python +class MyClass: + @cache_this + def expensive_property(self): # zero non-self args → cached_property + return sum(range(1_000_000)) + + @cache_this(cache={}) # shared dict cache across all instances + def expensive_method(self, x, y): + return compute(x, y) + + def __init__(self): + self._cache = {} + + @cache_this(cache='_cache') # use instance attribute as cache + def instance_cached(self, data): + return process(data) +``` + +### cache_vals + +```python +def cache_vals(store, *, cache=dict) -> object: +``` + +Add an in-memory cache layer in front of a store. Reads are cached after first fetch. + +```python +from dol import cache_vals + +fast_store = cache_vals(slow_remote_store) +fast_store['key'] # fetches from remote and caches +fast_store['key'] # returns from cache +``` + +### store_cached + +```python +def store_cached(store, key_func=None) -> Callable: +``` + +Decorator to memoize a function using a dol store as memory. + +```python +from dol import store_cached, PickleFiles + +@store_cached(PickleFiles('/path/to/cache')) +def expensive_computation(x, y): + return very_slow_compute(x, y) + +# Result is persisted to disk across process restarts +result = expensive_computation(1, 2) +``` + +--- + +## module: dol.paths + +### path_get / path_set / path_filter + +```python +def path_get(d: Mapping, path: tuple) -> Any: ... +def path_set(d: Mapping, path: tuple, value: Any) -> None: ... +def path_filter(condition: Callable, d: Mapping) -> Iterator[tuple]: ... +``` + +Navigate nested mappings via tuple paths. + +```python +from dol.paths import path_get, path_set, path_filter + +d = {'a': {'b': {'c': 42}}} +path_get(d, ('a', 'b', 'c')) # 42 +path_set(d, ('a', 'b', 'd'), 99) +list(path_filter(lambda p, k, v: v == 42, d)) # [('a', 'b', 'c')] +``` + +### KeyTemplate + +```python +class KeyTemplate: + def __init__(self, template: str): ... + def key_to_dict(self, key: str) -> dict: ... + def dict_to_key(self, d: dict) -> str: ...
+``` + +Parse and format structured string keys. + +```python +from dol.paths import KeyTemplate + +kt = KeyTemplate('{user}/{year}/{month}.json') +kt.key_to_dict('alice/2024/01.json') +# {'user': 'alice', 'year': '2024', 'month': '01'} +kt.dict_to_key({'user': 'alice', 'year': '2024', 'month': '01'}) +# 'alice/2024/01.json' +``` + +### mk_relative_path_store + +```python +def mk_relative_path_store(store_cls, *, prefix='', sep='/') -> type: +``` + +Turn a store that uses absolute paths into one that uses paths relative to a root. + +```python +from dol.paths import mk_relative_path_store +from dol import Files + +RelFiles = mk_relative_path_store(Files) +s = RelFiles('/data/users') +s['alice/profile.json'] # reads /data/users/alice/profile.json +``` + +--- + +## module: dol.sources + +Multi-store composition. + +### FlatReader + +```python +class FlatReader(KvReader): +``` + +Flatten a store-of-stores into a single-level store. Keys are generated by combining outer and inner keys. + +```python +from dol.sources import FlatReader + +outer = {'A': {'x': 1, 'y': 2}, 'B': {'z': 3}} +flat = FlatReader(outer) +list(flat) # [('A', 'x'), ('A', 'y'), ('B', 'z')] +``` + +### FanoutReader / FanoutPersister + +Broadcast reads to all stores, aggregate results; broadcast writes to all stores. + +```python +from dol.sources import FanoutPersister + +s = FanoutPersister(store1, store2) +s['key'] = value # writes to both store1 and store2 +``` + +### CascadedStores + +Writes to all stores; reads from first store that has the key. + +```python +from dol.sources import CascadedStores + +s = CascadedStores(fast_cache, slow_backend) +s['key'] # reads from fast_cache first, falls through to slow_backend +s['key'] = value # writes to both +``` + +### FuncReader + +A read-only store where keys are names of callables and values are their results. 
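The semantics described above (function names as keys, call results as values) can be sketched as a read-only `Mapping`. `FuncReaderSketch` is a hypothetical name; this sketch assumes zero-argument functions and calls the function on every read, without caching:

```python
from collections.abc import Mapping


class FuncReaderSketch(Mapping):
    """Hypothetical read-only store: keys are function names, values are results."""

    def __init__(self, funcs):
        self._funcs = {f.__name__: f for f in funcs}  # name -> function

    def __getitem__(self, k):
        return self._funcs[k]()  # call the zero-arg function on each read

    def __iter__(self):
        return iter(self._funcs)

    def __len__(self):
        return len(self._funcs)


def foo():
    return 'bar'


def pi():
    return 3.14159


s = FuncReaderSketch([foo, pi])
```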
+ +```python +from dol.sources import FuncReader + +def foo(): return 'bar' +def pi(): return 3.14159 + +s = FuncReader([foo, pi]) # keys come from the function names +list(s) # ['foo', 'pi'] +s['foo'] # 'bar' (foo() is called on read) +``` + +--- + +## module: dol.signatures + +### Sig + +Rich signature manipulation for composing function interfaces. + +```python +from dol.signatures import Sig + +def func(a: int, b=2, c=3): ... + +sig = Sig(func) +sig.names # ['a', 'b', 'c'] +sig.defaults # {'b': 2, 'c': 3} +sig.annotations # {'a': int} + +# Arithmetic +new_sig = Sig(f) + Sig(g) # merge signatures +new_sig = Sig(f) + ['extra'] # add parameter +new_sig = Sig(f) - ['verbose'] # remove parameter + +# Apply to function +@Sig(['x', 'y']) +def my_func(*args, **kwargs): ... # now has signature (x, y) +``` + +--- + +## module: dol.util + +### Pipe + +```python +class Pipe: + def __init__(self, *funcs): ... + def __call__(self, x): ... # apply funcs left to right +``` + +Left-to-right function composition. + +```python +from dol import Pipe, KeyCodecs, ValueCodecs +import gzip + +f = Pipe(str.encode, gzip.compress) +# f(s) == gzip.compress(s.encode()) + +# Use as store factory chain +MyStore = Pipe(KeyCodecs.suffixed('.pkl'), ValueCodecs.pickle())(dict) +``` + +### lazyprop + +```python +def lazyprop(func) -> property: +``` + +Lazy-evaluated property: computed once on first access, cached on the instance.
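One common way to get this behavior is a non-data descriptor that writes the computed value into the instance `__dict__`; because the descriptor defines no `__set__`, that instance attribute then shadows it on later lookups. `lazyprop_sketch` and `Indexed` below are hypothetical names illustrating the technique, and dol's `lazyprop` may differ in details:

```python
class lazyprop_sketch:
    """Hypothetical non-data descriptor: compute on first access, then cache."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        value = self.func(instance)
        # Stored under the same name: later lookups find it in the instance
        # __dict__ and never reach this descriptor again.
        instance.__dict__[self.name] = value
        return value


class Indexed:
    def __init__(self, keys):
        self.keys = list(keys)
        self.builds = 0  # counts how often the index is computed

    @lazyprop_sketch
    def index(self):
        self.builds += 1
        return {k: i for i, k in enumerate(self.keys)}
```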
+
+```python
+from dol.util import lazyprop
+
+class MyStore(dict):
+    @lazyprop
+    def index(self):
+        return {k: i for i, k in enumerate(self)}
+```
+
+---
+
+## Common Patterns
+
+### Pattern 1: Add serialization to any store
+
+```python
+from dol import wrap_kvs
+import json
+
+JsonStore = wrap_kvs(dict, obj_of_data=json.loads, data_of_obj=json.dumps)
+s = JsonStore()
+s['config'] = {'debug': True}  # stored as JSON string
+s['config']  # returns {'debug': True}
+```
+
+### Pattern 2: Build a namespaced file store
+
+```python
+from dol import Files, wrap_kvs
+
+def make_user_store(username):
+    return wrap_kvs(
+        Files('/data'),
+        id_of_key=lambda k: f"{username}/{k}",
+        key_of_id=lambda _id: _id[len(username)+1:],
+        obj_of_data=lambda b: b.decode(),
+        data_of_obj=lambda s: s.encode(),
+    )
+
+store = make_user_store('alice')
+store['notes.txt'] = 'Hello'  # writes to /data/alice/notes.txt
+```
+
+### Pattern 3: Persist a function's results
+
+```python
+import json
+from dol import store_cached, JsonFiles
+
+@store_cached(JsonFiles('/path/to/cache'))
+def fetch_data(url):
+    import urllib.request
+    return json.loads(urllib.request.urlopen(url).read())
+```
+
+### Pattern 4: Filter a store to a subset of keys
+
+```python
+from dol import filt_iter, Files
+
+# Only show .json files
+json_store = filt_iter(Files('/data'), filt=lambda k: k.endswith('.json'))
+```
+
+### Pattern 5: Test with dict, deploy with files
+
+```python
+import json
+from dol import wrap_kvs
+
+def make_store(backend=None):
+    if backend is None:
+        backend = {}  # use dict for testing
+    return wrap_kvs(
+        backend,
+        obj_of_data=json.loads,
+        data_of_obj=json.dumps,
+    )
+
+# In tests:
+s = make_store()
+
+# In production:
+from dol import Files
+s = make_store(Files('/data'))
+```
+
+### Pattern 6: Copy data between backends
+
+```python
+from dol import ValueCodecs, KeyCodecs, Pipe
+
+src = Pipe(KeyCodecs.suffixed('.pkl'), ValueCodecs.pickle())(src_backend)
+tgt = Pipe(KeyCodecs.suffixed('.json'), ValueCodecs.json())(tgt_backend)
+
+tgt.update(src)  # copy all items, re-encoding key format and serialization
+```
diff --git a/llms.txt b/llms.txt
new file mode 100644
index 00000000..19d9ccb0
--- /dev/null
+++ b/llms.txt
@@ -0,0 +1,47 @@
+# dol
+
+> `dol` (Data Object Layer) is a pure-Python toolkit for wrapping any storage backend — files, S3, databases, dicts — behind a uniform dict-like (`Mapping`/`MutableMapping`) interface. Use it to separate domain logic from storage implementation, add key/value transform layers, and build composable data pipelines with no dependencies.
+
+## Key concepts
+
+- **All stores are `Mapping` or `MutableMapping`**. You interact with any backend the same way you use a Python dict: `store[k]`, `store[k] = v`, `del store[k]`, `for k in store`.
+- **`wrap_kvs` is the core function**. It wraps a store class or instance with key/value transforms. Stack multiple `wrap_kvs` calls to build transform pipelines ("Russian dolls").
+- **Transforms come in pairs**: `key_of_id`/`id_of_key` for keys; `obj_of_data`/`data_of_obj` for values. Use `postget`/`preset` when the transform depends on the key.
+- **Test with `dict`, deploy with real storage**. All dol stores accept a `dict` as the backend; swap it for `Files`, `ZipFiles`, a DB store, etc. when ready.
+- **Pure Python, zero dependencies**. The core package (`dol`) has no external requirements.
+
+## What dol is NOT
+
+- Not a query engine — no filter-by-field, join, or aggregation. Use the backend's query API directly.
+- Not an ORM — no schema definition, migration, or relationship management.
+- Not domain-driven — stores are key-value only; domain meaning lives in the code that uses them.
+
+## Core API
+
+- [dol/trans.py](dol/trans.py) — `wrap_kvs` (the most important function), `store_decorator`, `filt_iter`, `cached_keys`, `flatten`, `Codec`, `ValueCodec`, `KeyCodec`
+- [dol/base.py](dol/base.py) — `KvReader`, `KvPersister`, `Store`, `Collection`, `MappingViewMixin`
+- [dol/kv_codecs.py](dol/kv_codecs.py) — `ValueCodecs`, `KeyCodecs` (ready-made codec namespaces)
+- [dol/caching.py](dol/caching.py) — `cache_this`, `cache_vals`, `store_cached`, `WriteBackChainMap`
+- [dol/paths.py](dol/paths.py) — `KeyTemplate`, `mk_relative_path_store`, `KeyPath`, `path_get`, `path_set`
+- [dol/filesys.py](dol/filesys.py) — `Files`, `TextFiles`, `JsonFiles`, `PickleFiles`
+- [dol/sources.py](dol/sources.py) — `FlatReader`, `FanoutReader`, `FanoutPersister`, `CascadedStores`
+- [dol/signatures.py](dol/signatures.py) — `Sig` (signature arithmetic)
+
+## Examples
+
+- [README.md](README.md) — copy data between backends, add serialization layers
+- [dol/tests/test_trans.py](dol/tests/test_trans.py) — wrap_kvs tests
+- [dol/tests/test_caching.py](dol/tests/test_caching.py) — caching patterns
+- [dol/tests/test_paths.py](dol/tests/test_paths.py) — path key patterns
+- [dol/tests/test_filesys.py](dol/tests/test_filesys.py) — file store usage
+
+## Optional
+
+- [misc/docs/general_design.md](misc/docs/general_design.md) — language-agnostic design concepts (middleware orientation, KV transform pipeline, layered composition)
+- [misc/docs/python_design.md](misc/docs/python_design.md) — Python-specific architecture, class hierarchy, all `wrap_kvs` params, design critique
+- [misc/docs/issues_and_discussions.md](misc/docs/issues_and_discussions.md) — open design questions and known limitations
+- [dol/trans.py](dol/trans.py) — `store_decorator`, `double_up_as_factory`, `kv_wrap` (alternative interface to `wrap_kvs`)
+- [dol/appendable.py](dol/appendable.py) — append semantics on stores
+- [dol/naming.py](dol/naming.py) — `StrTupleDict` for tuple↔string key conversion
+- [dol/zipfiledol.py](dol/zipfiledol.py) — `ZipFiles`, `ZipReader` (zip archive stores)
+- [dol/explicit.py](dol/explicit.py) — `ExplicitKeyMap`, `KeysReader`
diff --git a/misc/docs/frontend_dol_ideas.md b/misc/docs/frontend_dol_ideas.md
new file mode 100644
index 00000000..faf28a14
--- /dev/null
+++ b/misc/docs/frontend_dol_ideas.md
@@ -0,0 +1,491 @@
+# Frontend dol Ideas: Toward `zoddal`
+
+This document explores how the design principles of `dol` translate to the frontend (TypeScript/React) ecosystem, and proposes the architecture for `zoddal` — a **ZOD-Data-Access-Layer** for frontend applications.
+
+For background on dol's general design, see [general_design.md](general_design.md).
+
+---
+
+## The Core Analogy
+
+Python's `dol` is built around two key ideas:
+1. **A minimal, language-native KV interface** (`Mapping`/`MutableMapping` ≈ Python's `dict`)
+2. **Composable transform layers** that adapt any backend to that interface
+
+The frontend has an analogous "language-native" KV concept: JavaScript's `Map` and the object-access pattern `record[key]`. But unlike Python, the frontend also has a second critical concern: **asynchrony** — all real storage operations are `async`.
+
+The zoddal design builds on this: a **type-safe, async KV interface for the frontend**, with the same composable transform philosophy as dol.
+
+---
+
+## Two Layers of Frontend Affordances
+
+Frontend applications have two distinct "collection affordance" layers that need to be bridged:
+
+### Layer 1: Storage Layer (Backend-Facing)
+
+Talking to REST APIs, IndexedDB, localStorage, cloud storage:
+- `GET /users/42` → fetch a user by ID
+- `POST /users` → create a new user
+- `PUT /users/42` → update
+- `DELETE /users/42` → delete
+- `GET /users?filter=...` → list with query
+
+This maps naturally to a KV interface:
+```typescript
+interface KvStore<K, V> {
+  get(key: K): Promise<V | undefined>;
+  set(key: K, value: V): Promise<void>;
+  delete(key: K): Promise<void>;
+  keys(): AsyncIterable<K>;  // or Promise<K[]>
+  has(key: K): Promise<boolean>;
+}
+```
+
+### Layer 2: UI Layer (Component-Facing)
+
+Presenting collections in the UI: tables, grids, forms, CRUD dialogs. This is what `zod-collection-ui` addresses with its `defineCollection` / `DataProvider` interface.
+
+The `DataProvider` interface (from zod-collection-ui):
+```typescript
+interface DataProvider<T> {
+  getList(params: { sort?, filter?, search?, pagination? }): Promise<{ data: T[]; total: number }>;
+  getOne(id: string): Promise<T>;
+  create(data: Partial<T>): Promise<T>;
+  update(id: string, data: Partial<T>): Promise<T>;
+  delete(id: string): Promise<void>;
+}
+```
+
+### The Gap: Bridging Storage and UI
+
+Currently these two layers are often implemented independently, creating duplication. `zoddal` aims to be **DRY across both**: a `KvStore` at the storage layer can be adapted to a `DataProvider` at the UI layer through a standard bridge.
+
+---
+
+## The zoddal Architecture
+
+### Core Interfaces
+
+```typescript
+// Read-only key-value store (analog to dol's KvReader)
+interface KvReader<K, V> {
+  get(key: K): Promise<V | undefined>;
+  has(key: K): Promise<boolean>;
+  keys(): Promise<K[]>;  // or AsyncIterable<K>
+  values(): Promise<V[]>;
+  entries(): Promise<[K, V][]>;
+  head(): Promise<[K, V] | undefined>;
+}
+
+// Read-write store (analog to dol's KvPersister)
+interface KvStore<K, V> extends KvReader<K, V> {
+  set(key: K, value: V): Promise<void>;
+  delete(key: K): Promise<void>;
+}
+
+// Mutable store with update semantics (for REST PATCH-style updates)
+interface MutableKvStore<K, V> extends KvStore<K, V> {
+  update(key: K, patch: Partial<V>): Promise<V>;
+}
+```
+
+### The Transform Pipeline
+
+Mirroring dol's `wrap_kvs`, zoddal provides `wrapKvs`:
+
+```typescript
+function wrapKvs<K1, V1, K2, V2>(
+  store: KvStore<K1, V1>,
+  transforms: {
+    // Key transforms
+    keyOfId?: (id: K1) => K2;      // outgoing: storage key → interface key
+    idOfKey?: (key: K2) => K1;     // incoming: interface key → storage key
+    // Value transforms
+    objOfData?: (data: V1) => V2;  // outgoing: raw data → domain object
+    dataOfObj?: (obj: V2) => V1;   // incoming: domain object → raw data
+    // Key-conditioned transforms (analog to preset/postget)
+    postget?: (key: K2, data: V1) => V2;
+    preset?: (key: K2, obj: V2) => V1;
+  }
+): KvStore<K2, V2>
+```
+
+**Example**: Adapting a REST API to a typed KV store:
+
+```typescript
+const rawApiStore: KvStore<string, unknown> = restAdapter('/api/users');
+
+const userStore = wrapKvs(rawApiStore, {
+  // Keys: external IDs stay as strings
+  // Values: validate with Zod schema
+  objOfData: (raw) => UserSchema.parse(raw),
+  dataOfObj: (user) => UserSchema.partial().parse(user),
+});
+```
+
+---
+
+## Codec Pattern in TypeScript
+
+The `Codec` pattern from dol maps cleanly:
+
+```typescript
+interface Codec<Decoded, Encoded> {
+  encode: (decoded: Decoded) => Encoded;
+  decode: (encoded: Encoded) => Decoded;
+}
+
+// Compose codecs (like dol's Codec + operator)
+function composeCodecs<A, B, C>(
+  first: Codec<A, B>,
+  second: Codec<B, C>
+): Codec<A, C> {
+  return {
+    encode: (a) => second.encode(first.encode(a)),
+    decode: (c) => first.decode(second.decode(c)),
+  };
+}
+
+// ValueCodec: wraps a store with a value codec
+class ValueCodec<Decoded, Encoded> implements Codec<Decoded, Encoded> {
+  constructor(public encode: (d: Decoded) => Encoded,
+              public decode: (e: Encoded) => Decoded) {}
+
+  wrap<K>(store: KvStore<K, Encoded>): KvStore<K, Decoded> {
+    return wrapKvs(store, { objOfData: this.decode, dataOfObj: this.encode });
+  }
+}
+
+// KeyCodec: wraps a store with a key codec
+class KeyCodec<OuterKey, InnerKey> implements Codec<OuterKey, InnerKey> {
+  constructor(public encode: (k: OuterKey) => InnerKey,
+              public decode: (id: InnerKey) => OuterKey) {}
+
+  wrap<V>(store: KvStore<InnerKey, V>): KvStore<OuterKey, V> {
+    return wrapKvs(store, { idOfKey: this.encode, keyOfId: this.decode });
+  }
+}
+```
+
+**Ready-made codecs:**
+```typescript
+const Codecs = {
+  json: new ValueCodec(JSON.stringify, JSON.parse),
+  zodValidated: <T>(schema: z.ZodType<T>) =>
+    new ValueCodec<T, unknown>(
+      (v) => v,                   // no encoding on write (validated upstream)
+      (raw) => schema.parse(raw)  // decode = validate + parse
+    ),
+  urlEncoded: new KeyCodec(encodeURIComponent, decodeURIComponent),
+  pathPrefixed: (prefix: string) =>
+    new KeyCodec<string, string>(
+      (k) => `${prefix}/${k}`,
+      (id) => id.slice(prefix.length + 1),
+    ),
+};
+```
+
+---
+
+## Built-In Adapters
+
+### In-Memory Adapter (for testing)
+
+```typescript
+function memStore<K, V>(initial?: Map<K, V>): KvStore<K, V> {
+  const map = initial ?? new Map<K, V>();
+  return {
+    get: async (k) => map.get(k),
+    set: async (k, v) => { map.set(k, v); },
+    delete: async (k) => { map.delete(k); },
+    has: async (k) => map.has(k),
+    keys: async () => [...map.keys()],
+    values: async () => [...map.values()],
+    entries: async () => [...map.entries()],
+    head: async () => map.size > 0 ? [map.keys().next().value, map.values().next().value] : undefined,
+  };
+}
+```
+
+### REST Adapter
+
+```typescript
+function restAdapter<T>(baseUrl: string): MutableKvStore<string, T> {
+  return {
+    get: async (id) => {
+      const res = await fetch(`${baseUrl}/${id}`);
+      if (res.status === 404) return undefined;
+      return res.json();
+    },
+    set: async (id, value) => {
+      await fetch(`${baseUrl}/${id}`, {
+        method: 'PUT',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify(value),
+      });
+    },
+    delete: async (id) => {
+      await fetch(`${baseUrl}/${id}`, { method: 'DELETE' });
+    },
+    has: async (id) => {
+      const res = await fetch(`${baseUrl}/${id}`, { method: 'HEAD' });
+      return res.ok;
+    },
+    keys: async () => {
+      const res = await fetch(baseUrl);
+      const items: T[] = await res.json();
+      return items.map((item: any) => item.id);
+    },
+    values: async () => {
+      const res = await fetch(baseUrl);
+      return res.json();
+    },
+    entries: async () => {
+      const res = await fetch(baseUrl);
+      const items: T[] = await res.json();
+      return items.map((item: any) => [item.id, item] as [string, T]);
+    },
+    head: async () => {
+      const res = await fetch(`${baseUrl}?limit=1`);
+      const items: T[] = await res.json();
+      if (!items.length) return undefined;
+      const item = items[0] as any;
+      return [item.id, item] as [string, T];
+    },
+    update: async (id, patch) => {
+      const res = await fetch(`${baseUrl}/${id}`, {
+        method: 'PATCH',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify(patch),
+      });
+      return res.json();
+    },
+  };
+}
+```
+
+### IndexedDB Adapter
+
+```typescript
+function indexedDbStore<V>(dbName: string, storeName: string): KvStore<string, V> {
+  // ... opens IDB connection, implements get/set/delete/keys via IDB transactions
+}
+```
+
+---
+
+## Bridge to `zod-collection-ui`'s `DataProvider`
+
+The `DataProvider` interface (used by `zod-collection-ui`) can be derived from a `KvStore` + a Zod schema:
+
+```typescript
+function kvStoreToDataProvider<T extends { id: string }>(
+  store: MutableKvStore<string, T>,
+  schema: z.ZodType<T>,
+): DataProvider<T> {
+  return {
+    getList: async ({ sort, filter, search, pagination }) => {
+      const entries = await store.entries();
+      let items = entries.map(([, v]) => v);
+
+      // Apply filtering, sorting, search in-memory
+      // (or delegate to store if it supports server-side queries)
+      if (filter) items = applyFilter(items, filter);
+      if (search) items = applySearch(items, search);
+      if (sort) items = applySort(items, sort);
+
+      const total = items.length;
+      if (pagination) {
+        const { page, pageSize } = pagination;
+        items = items.slice((page - 1) * pageSize, page * pageSize);
+      }
+      return { data: items, total };
+    },
+    getOne: async (id) => {
+      const item = await store.get(id);
+      if (!item) throw new Error(`Not found: ${id}`);
+      return item;
+    },
+    create: async (data) => {
+      const item = schema.parse({ ...data, id: crypto.randomUUID() });
+      await store.set(item.id, item);
+      return item;
+    },
+    update: async (id, data) => {
+      const existing = await store.get(id);
+      if (!existing) throw new Error(`Not found: ${id}`);
+      const updated = schema.parse({ ...existing, ...data });
+      await store.set(id, updated);
+      return updated;
+    },
+    delete: async (id) => {
+      await store.delete(id);
+    },
+  };
+}
+```
+
+This bridge means: **define your storage once as a `KvStore`, get a `DataProvider` for free**.
+
+---
+
+## Zod as the Interface Definition Language
+
+In Python, `dol` relies on type hints and `collections.abc` protocols to define interfaces. In the frontend, **Zod schemas** play this role:
+
+- Zod schemas describe the *shape* of domain objects (what Python type hints do)
+- Zod `.meta()` annotations describe *affordances* (what Python docstrings/comments do)
+- `defineCollection(schema)` derives a full UI + storage configuration (analogous to `wrap_kvs` deriving a full store from transforms)
+
+```typescript
+// Python dol analogy:
+//   schema   ≈ Python class with type hints
+//   .meta()  ≈ docstring with behavioral hints
+//   wrapKvs  ≈ wrap_kvs
+
+const UserSchema = z.object({
+  id: z.string().uuid().meta({ editable: false }),
+  name: z.string().meta({ sortable: true, searchable: true }),
+  email: z.string().email().meta({ filterable: 'exact' }),
+  role: z.enum(['admin', 'user']).meta({ filterable: 'select', groupable: true }),
+});
+
+// Storage layer: derive a KV store
+const userStore = wrapKvs(
+  restAdapter('/api/users'),
+  { objOfData: (raw) => UserSchema.parse(raw) }
+);
+
+// UI layer: derive a collection definition
+const userCollection = defineCollection({
+  schema: UserSchema,
+  store: userStore,  // zoddal bridge: store IS the data provider
+});
+```
+
+---
+
+## The DRY Principle Across Layers
+
+The key insight is that both the storage layer and the UI layer need the same fundamental operations — list, get, set, delete — just with different ergonomics:
+
+| Operation | Storage KV | UI DataProvider |
+|-----------|-----------|----------------|
+| List all | `store.keys()` | `getList({ pagination })` |
+| Get one | `store.get(id)` | `getOne(id)` |
+| Create | `store.set(id, value)` | `create(data)` |
+| Update | `store.set(id, {...existing, ...patch})` | `update(id, patch)` |
+| Delete | `store.delete(id)` | `delete(id)` |
+
+The differences (pagination, sorting, filtering at the UI level; raw bytes/URLs at the storage level) are handled by the transform layers and the bridge function — **not by duplicating the core CRUD logic**.
+
+---
+
+## Capability Discovery
+
+Mirroring the `CollectionCapabilities` concept from the Cosmograph resource design:
+
+```typescript
+interface StoreCapabilities {
+  canCreate: boolean;
+  canUpdate: boolean;
+  canDelete: boolean;
+  canList: boolean;
+  supportsServerSideFilter: boolean;
+  supportsServerSideSort: boolean;
+  supportsServerSidePagination: boolean;
+  maxPageSize?: number;
+}
+
+// Store can declare its capabilities
+interface CapableKvStore<K, V> extends KvStore<K, V> {
+  capabilities(): Promise<StoreCapabilities>;
+}
+```
+
+The `kvStoreToDataProvider` bridge shown above always filters and sorts client-side; a capability-aware version would read these capabilities to decide whether to delegate filtering/sorting to the server.
+
+---
+
+## Proposed `zoddal` Package Architecture
+
+```
+zoddal/
+├── core/
+│   ├── types.ts          KvReader, KvStore, MutableKvStore, Codec, StoreCapabilities
+│   ├── wrap.ts           wrapKvs(), ValueCodec, KeyCodec, composeCodecs()
+│   └── codecs.ts         Codecs.json, Codecs.zodValidated, Codecs.urlEncoded, etc.
+│
+├── adapters/
+│   ├── memory.ts         memStore() — in-memory, for testing
+│   ├── rest.ts           restAdapter() — REST/HTTP with fetch
+│   ├── indexeddb.ts      indexedDbStore() — browser IndexedDB
+│   └── localStorage.ts   localStorageStore() — simple key-value
+│
+├── bridge/
+│   └── dataProvider.ts   kvStoreToDataProvider() — bridge to zod-collection-ui
+│
+└── index.ts              public exports
+```
+
+**Usage at a glance:**
+
+```typescript
+import { z } from 'zod';
+import { restAdapter, wrapKvs, Codecs, kvStoreToDataProvider } from 'zoddal';
+import { defineCollection } from 'zod-collection-ui';
+
+// 1. Define schema
+const PostSchema = z.object({
+  id: z.string(),
+  title: z.string().meta({ sortable: true, searchable: true }),
+  content: z.string().meta({ editable: true }),
+  status: z.enum(['draft', 'published']).meta({ filterable: 'select' }),
+});
+
+// 2. Build the storage-layer store
+const postStore = wrapKvs(restAdapter('/api/posts'), {
+  objOfData: (raw) => PostSchema.parse(raw),
+});
+
+// 3. Bridge to UI (one line)
+const postDataProvider = kvStoreToDataProvider(postStore, PostSchema);
+
+// 4. Define collection (plugs into any zod-collection-ui renderer)
+const postCollection = defineCollection({
+  schema: PostSchema,
+  dataProvider: postDataProvider,
+  affordances: { create: true, delete: true, search: true },
+});
+```
+
+---
+
+## Differences from Python dol
+
+| Aspect | Python dol | zoddal |
+|--------|-----------|--------|
+| All operations | Synchronous | `async`/`Promise`-based |
+| Key type | Any hashable | Typically `string` (URL path segments) |
+| Interface definition | `collections.abc` ABCs | TypeScript interfaces |
+| Schema language | Type hints + ABCs | Zod schemas |
+| Codec composition | `+` operator on `Codec` | `composeCodecs()` function |
+| Test backend | `dict` | `memStore()` |
+| "Russian dolls" | `wrap_kvs()` stacking | `wrapKvs()` stacking |
+| UI layer bridge | N/A (separate concern) | `kvStoreToDataProvider()` |
+
+---
+
+## Open Questions for zoddal Design
+
+1. **Async iteration**: Should `keys()` return `Promise<K[]>` (all at once) or `AsyncIterable<K>` (streaming)? The latter is more general but more complex to use in React.
+
+2. **Optimistic updates**: The KV model is pull-based (fetch → display). For optimistic UI updates, a separate in-memory overlay store (like dol's `WriteBackChainMap`) could be used before the async write completes.
+
+3. **Reactivity**: A `KvStore` is not reactive by itself. A thin reactive wrapper (using `zustand` or signals) around the store would allow React components to subscribe to changes.
+
+4. **Error handling**: Should `get(key)` throw on missing (like Python's `KeyError`) or return `undefined`? The `undefined` model is more idiomatic in TypeScript. The `zoddal` proposal uses `undefined` for missing keys.
+
+5. **Bulk operations**: `updateMany`, `deleteMany` are common in UI contexts but not in the minimal KV interface. Should these be optional methods on `MutableKvStore`?
+
+6. **Server-side queries**: The `DataProvider.getList` interface supports server-side filtering, but the base `KvStore.keys()` doesn't. The bridge could use capability discovery to decide. Alternatively, add an optional `query(params)` method to stores that support it.
diff --git a/misc/docs/general_design.md b/misc/docs/general_design.md
new file mode 100644
index 00000000..5d566661
--- /dev/null
+++ b/misc/docs/general_design.md
@@ -0,0 +1,222 @@
+# dol: General Design and Architecture
+
+## What dol Is
+
+`dol` is a toolkit for building **Data Object Layers** — uniform, dict-like interfaces to any storage backend. The core idea: separate *what data operations your domain needs* from *how those operations are implemented in a specific backend*.
+
+This places dol in the family of:
+- **Data Access Object (DAO)** — objects that abstract storage operations
+- **Repository Pattern** — domain-facing interface to a collection of entities
+- **Hexagonal Architecture (Ports & Adapters)** — the "port" is the KV interface; adapters are backend implementations
+
+But dol has a distinctive orientation: it is **middleware**, not domain logic and not backend infrastructure. It provides a common language — the key-value (KV) interface — that both domain code and backend adapters can speak.
+
+---
+
+## The Key Insight: Language-Native Interfaces
+
+Most languages have a first-class "mapping" concept (Python's `dict`, JavaScript's `Map`, Java's `Map`). Rather than inventing a new CRUD API, dol maps storage operations onto this native interface:
+
+| Storage operation | KV interface |
+|---|---|
+| Read item by key | `store[key]` (`__getitem__`) |
+| Write item at key | `store[key] = value` (`__setitem__`) |
+| Delete item | `del store[key]` (`__delitem__`) |
+| List all keys | `for k in store` (`__iter__`) |
+| Count items | `len(store)` (`__len__`) |
+| Check existence | `key in store` (`__contains__`) |
+
+Code using a `dol` store looks exactly like code using a dict. This means:
+- No new API to learn — use the language's built-in mapping idioms
+- Tests can use `dict` as a drop-in backend
+- Tools that work on dicts (comprehensions, `update`, `copy`, etc.) work on stores
+
+---
+
+## Interface Hierarchy: From Full to Minimal
+
+A full `MutableMapping` (Python's dict-like ABC) includes 14+ methods. dol defines a reduced, pragmatic hierarchy:
+
+```
+Collection    ← __iter__, __contains__, __len__, head()
+    │
+KvReader      ← + __getitem__, keys(), values(), items()    [read-only]
+    │
+KvPersister   ← + __setitem__, __delitem__                  [read-write]
+    │
+Store         ← + key/value transform hooks                 [configurable]
+```
+
+Key reductions from `MutableMapping`:
+- **No `.clear()`** — too destructive for persistent storage; disabled by default
+- **No guaranteed order** — backends vary; `__reversed__` raises `NotImplementedError`
+- **`__len__` and `__contains__` via iteration** — correct by default; override for efficiency
+
+This hierarchy reflects what storage backends actually provide: list, get, set, delete — not necessarily atomic batch-clear or ordered traversal.
+
+---
+
+## The Middleware Principle
+
+dol occupies the space *between* domain logic and storage backends:
+
+```
+Domain Code              dol Layer                  Storage Backend
+(business logic)         (KV interface)             (files, DB, S3...)
+                  ┌──────────────────────┐
+store['user/42']  │    key transform     │   /data/users/00042.json
+store[key]        │──────────────────────│   db.query("SELECT...")
+value = json obj  │   value transform    │   bytes on disk
+                  └──────────────────────┘
+```
+
+The transforms are the core contribution: they let you define the interface your domain wants (clean keys, rich objects) while the backend stores what it needs to store (raw paths, serialized bytes).
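The reduced hierarchy described above can be approximated with nothing but `collections.abc`. The sketch below is a hypothetical, simplified illustration: the names `MiniReader` and `MiniPersister` are invented for this example and are not dol's actual `KvReader`/`KvPersister`. It derives `__len__` and `__contains__` from iteration, adds a `head()` helper, and disables `clear()` on the read-write variant.

```python
from collections.abc import Mapping, MutableMapping


class MiniReader(Mapping):
    """Read-only KV view over a backend mapping (hypothetical sketch, not dol's KvReader)."""

    def __init__(self, backend=None):
        self._backend = {} if backend is None else backend

    def __getitem__(self, k):
        return self._backend[k]

    def __iter__(self):
        return iter(self._backend)

    # Derived from iteration: correct for any iterable backend, but O(n).
    # A concrete backend would override these with a cheaper native call.
    def __len__(self):
        return sum(1 for _ in self)

    def __contains__(self, k):
        return any(k == key for key in self)

    def head(self):
        """Return the first (key, value) pair, or None if the store is empty."""
        for k in self:
            return k, self[k]
        return None


class MiniPersister(MiniReader, MutableMapping):
    """Read-write variant: adds __setitem__/__delitem__ and disables clear()."""

    def __setitem__(self, k, v):
        self._backend[k] = v

    def __delitem__(self, k):
        del self._backend[k]

    def clear(self):
        raise NotImplementedError("clear() is disabled for persistent stores")


s = MiniPersister()
s['a'] = 1
s['b'] = 2
len(s), 'a' in s, s.head()  # (2, True, ('a', 1))
```

Because `keys()`, `values()`, `items()` and `update()` come from the ABC mixins, only the handful of methods a backend really provides have to be written by hand.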
+
+---
+
+## The KV Transform Pipeline
+
+Every read and write passes through a pair of transformations:
+
+```
+READ:
+  key → [id_of_key] → internal_id → backend[id] → raw_data → [obj_of_data] → value
+
+WRITE:
+  key → [id_of_key] → internal_id
+  value → [data_of_obj] → raw_data → backend[id] = raw_data
+
+ITERATE:
+  backend.__iter__() → internal_id → [key_of_id] → key
+```
+
+The four transform functions (`id_of_key`, `key_of_id`, `obj_of_data`, `data_of_obj`) default to identity. You only implement what you need.
+
+**Example**: A store of JSON files where keys are relative paths without `.json` extension:
+
+```
+key:  "user/42"
+   ↓ id_of_key: k → k + ".json"
+id:   "user/42.json"
+   ↓ backend read
+data: b'{"name": "Alice", "age": 30}'
+   ↓ obj_of_data: json.loads
+obj:  {"name": "Alice", "age": 30}
+```
+
+### The `preset`/`postget` Extension
+
+Sometimes the value transform needs to know the key — for example, to choose the right serializer based on file extension. Two additional transforms handle this:
+
+- `preset(key, value) → raw_data` — applied on write, key-aware
+- `postget(key, raw_data) → value` — applied on read, key-aware
+
+```python
+import json
+import pickle
+
+def postget(k, v):
+    if k.endswith('.json'): return json.loads(v)
+    if k.endswith('.pkl'): return pickle.loads(v)
+    return v
+```
+
+---
+
+## Layered Composition (Russian Dolls)
+
+The name "dol" evokes Russian dolls: layers of wrappers, each adding a transformation. A store is built by stacking layers:
+
+```
+   raw_backend       (dict, files, S3, DB...)
+        │
+   key_transform     (strip prefix, add extension)
+        │
+   value_transform   (serialize/deserialize)
+        │
+   filter_layer      (hide internal keys)
+        │
+   cache_layer       (in-memory cache)
+        │
+   domain_store      (what domain code sees)
+```
+
+Each layer is independent and composable. You can add, remove, or swap layers without touching domain code or the backend.
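To make the READ/WRITE/ITERATE pipeline and the stacking of dolls concrete, here is a minimal, dependency-free sketch. `KvWrapper` is a hypothetical stand-in written for this illustration, not dol's `wrap_kvs` implementation, and the suffix-stripping `key_of_id` assumes every backend id ends in `.json`.

```python
import json
from collections.abc import MutableMapping


def _identity(x):
    return x


class KvWrapper(MutableMapping):
    """One 'doll': applies key/value transforms around an inner store.

    Hypothetical, simplified stand-in for dol's wrap_kvs (not its real code).
    """

    def __init__(self, store, *, id_of_key=_identity, key_of_id=_identity,
                 obj_of_data=_identity, data_of_obj=_identity):
        self.store = store
        self.id_of_key, self.key_of_id = id_of_key, key_of_id
        self.obj_of_data, self.data_of_obj = obj_of_data, data_of_obj

    # READ: key -> id_of_key -> store[id] -> obj_of_data -> value
    def __getitem__(self, k):
        return self.obj_of_data(self.store[self.id_of_key(k)])

    # WRITE: key -> id_of_key; value -> data_of_obj; store[id] = data
    def __setitem__(self, k, v):
        self.store[self.id_of_key(k)] = self.data_of_obj(v)

    def __delitem__(self, k):
        del self.store[self.id_of_key(k)]

    # ITERATE: inner ids -> key_of_id -> outer keys
    def __iter__(self):
        return (self.key_of_id(_id) for _id in self.store)

    def __len__(self):
        return len(self.store)


backend = {}
# Inner doll: keys gain a ".json" suffix in the backend (assumes all ids end in .json).
suffixed = KvWrapper(backend, id_of_key=lambda k: k + '.json',
                     key_of_id=lambda _id: _id[:-len('.json')])
# Outer doll: values are serialized to JSON strings on the way down.
store = KvWrapper(suffixed, obj_of_data=json.loads, data_of_obj=json.dumps)

store['user/42'] = {'name': 'Alice', 'age': 30}
backend          # {'user/42.json': '{"name": "Alice", "age": 30}'}
store['user/42'] # {'name': 'Alice', 'age': 30}
list(store)      # ['user/42']
```

Each `KvWrapper` only knows about the store directly inside it, which is what lets a layer be added, removed, or swapped without touching the others.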
+
+The primary tool for adding layers is `wrap_kvs` (see [python_design.md](python_design.md) for details):
+
+```python
+import json
+from dol import wrap_kvs
+
+# Add json serialization to any store
+JsonStore = wrap_kvs(dict,
+    obj_of_data=json.loads,
+    data_of_obj=json.dumps,
+)
+
+# Add prefix to all keys
+PrefixedStore = wrap_kvs(dict,
+    id_of_key=lambda k: f"prefix/{k}",
+    key_of_id=lambda _id: _id[len("prefix/"):],
+)
+
+# Stack both layers
+store = PrefixedStore()
+store = wrap_kvs(store, obj_of_data=json.loads, data_of_obj=json.dumps)
+```
+
+---
+
+## Caching as a First-Class Concern
+
+dol treats caching not as an optimization afterthought but as a composable layer:
+
+- **Key caching** — cache iteration results (`cached_keys`)
+- **Value caching** — cache fetched values (`cache_vals`, `WriteBackChainMap`)
+- **Method caching** — cache expensive property/method results (`cache_this`, `store_cached`)
+- **Write-back caching** — reads from fast cache, writes through to slow backend
+
+The cache backend is itself a store — enabling persistent caches, distributed caches, or custom eviction strategies.
+
+---
+
+## Store Composition Patterns
+
+Beyond single-store wrapping, dol supports multi-store composition:
+
+| Pattern | Class | Behavior |
+|---------|-------|----------|
+| Union view | `FlatReader` | Merge multiple stores into one flat view |
+| Fan-out reads | `FanoutReader` | Reads return dict of results from all stores |
+| Fan-out writes | `FanoutPersister` | Writes go to all stores simultaneously |
+| Cascaded | `CascadedStores` | Writes to all, reads from first available |
+
+---
+
+## Design Benefits
+
+1. **Testability**: Develop with `dict` as backend; swap to real storage when ready
+2. **Portability**: Same domain code works with S3, files, MongoDB, SQLite — just change the bottom layer
+3. **Incrementalism**: Start with raw backend access; add transform layers progressively
+4. **Separation of concerns**: Key format, serialization, filtering, caching are separate layers
+5. **Composability**: Layers can be mixed and matched freely
+6. **Discoverability**: One interface (`Mapping`) — IDEs and agents know what to expect
+
+---
+
+## What dol Is Not
+
+- **Not an ORM**: dol doesn't map objects to relational tables. It maps Python values to opaque storage cells.
+- **Not a query engine**: dol doesn't support complex queries (filter by field, join, aggregate). That's the backend's job. dol provides list+get+set+delete only.
+- **Not domain-driven**: dol stores are intentionally domain-agnostic. The domain meaning lives in the code that uses the store, not in the store itself.
+- **Not a schema validator**: dol doesn't enforce data schemas. Validation is a layer you add (e.g., via `data_of_obj`).
+
+---
+
+## Relation to Existing Patterns
+
+| Pattern | Similarity | Difference |
+|---------|-----------|------------|
+| Repository Pattern | Abstracts storage; testable | dol is not domain-driven; uses KV not domain methods |
+| DAO | Wraps storage operations | dol focuses on KV interface specifically |
+| Active Record | Object knows how to store itself | dol is separate from domain objects |
+| Hexagonal | Ports & adapters | dol is specifically the "storage port" |
+| Decorator Pattern | Wraps objects adding behavior | dol uses this structurally for transform layers |
+| Adapter Pattern | Converts one interface to another | Each dol wrapper is an adapter |
diff --git a/misc/docs/generate_llms_txt_instruction.md b/misc/docs/generate_llms_txt_instruction.md
new file mode 100644
index 00000000..906f6c21
--- /dev/null
+++ b/misc/docs/generate_llms_txt_instruction.md
@@ -0,0 +1,47 @@
+# Instruction: Generate `llms.txt` and `llms-full.txt`
+
+Read this codebase and produce two files that help AI agents **use** this package as a tool/dependency (not contribute to it).
+
+## Step 1: Understand the package
+
+- Read `pyproject.toml` (or `setup.cfg`) for: package name, summary, dependencies, entry points.
+- Read `__init__.py` to identify the **public API** (i.e. what's in `__all__`, or all non-underscore top-level names).
+- Read docstrings and type hints of every public function/class.
+- Read doctests and `tests/` for **usage examples**.
+- Read `README.md` if present.
+
+## Step 2: Produce `llms.txt`
+
+Follow the spec at https://llmstxt.org. The file must contain, in order:
+
+1. **H1**: Package name.
+2. **Blockquote**: One-paragraph summary — what the package does, when to use it, key concepts.
+3. **Body notes** (no headings): Important caveats, gotchas, design philosophy — things an agent must know before using the package. Include:
+   - Core abstractions (e.g. "all stores are `MutableMapping`").
+   - Common patterns and idioms.
+   - What this package is **not** (prevent misuse).
+4. **`## Core API`**: Links to markdown docs for the most-used functions/classes. If no hosted docs exist, use relative paths like `src/pkg/module.py` or GitHub raw URLs.
+5. **`## Examples`**: Links to example scripts, notebooks, or test files that demonstrate typical usage.
+6. **`## Optional`**: Links to advanced/niche docs an agent can skip for basic usage.
+
+Keep it **under 4K tokens**. Prefer terse, expert-level language — the reader is an LLM, not a beginner.
+
+## Step 3: Produce `llms-full.txt`
+
+A single markdown file containing **everything an agent needs to use the package**, concatenated in this order:
+
+1. The content of `llms.txt` (as the header/overview).
+2. For **each public module**, a section (`## module_name`) containing:
+   - Module docstring.
+   - For each public function/class: signature, docstring, and **one usage example** (prefer doctests; fall back to test cases).
+3. Any additional notes from README that weren't already covered.
+
+Format function signatures as fenced code blocks. Keep docstrings verbatim — don't paraphrase. Strip internal helpers (single-underscore functions) unless they appear in public docstrings or examples.
+
+## Quality checklist
+
+- [ ] An agent reading only `llms.txt` can answer: "What does this package do? Should I use it for X?"
+- [ ] An agent reading `llms-full.txt` can write correct code using this package **without** accessing the source.
+- [ ] No broken links. No placeholder text.
+- [ ] `llms.txt` fits comfortably in a small context window (~4K tokens).
+- [ ] `llms-full.txt` includes real, runnable examples for every major public function.
diff --git a/misc/docs/issues_and_discussions.md b/misc/docs/issues_and_discussions.md
new file mode 100644
index 00000000..d72375fe
--- /dev/null
+++ b/misc/docs/issues_and_discussions.md
@@ -0,0 +1,263 @@
+# dol: Issues and Discussions — Themes and Insights
+
+This document summarizes the major themes from GitHub issues and discussions in the [i2mint/dol](https://github.com/i2mint/dol) repository. The emphasis is on **design and architecture** themes, since many issues are dev/design discussions rather than bug reports.
+
+Sources: GitHub issues (as of early 2026) and GitHub discussions.
+
+---
+
+## Theme 1: `wrap_kvs` Design Tensions
+
+This is the single most recurring topic. `wrap_kvs` is central to dol, but its current design has acknowledged problems.
+
+### 1a. Signature-based conditioning (Issue #9, Discussion #34)
+
+**The problem**: `wrap_kvs` uses the *signature* of the transform function to decide how to apply it — specifically, whether the function receives `(data)` or `(self, data)`. This causes divergent behavior for functionally equivalent inputs:
+
+```python
+# These should behave identically, but don't:
+wrap_kvs(store, obj_of_data=lambda x: bytes.decode(x))  # works
+wrap_kvs(store, obj_of_data=bytes.decode)                # fails!
+``` + +**Root cause**: The code checks whether `obj_of_data` has 1 or 2+ required args, and applies it as `obj_of_data(data)` or `obj_of_data(self, data)` accordingly. This "Postelization" (being liberal in what you accept) leads to bugs. + +**Discussion #34** ("Clean way of Postelizing callbacks") proposes a more principled solution: use an explicit marker (e.g., a `Literal` type or wrapper class) to signal "this function takes `self`", instead of inferring it from the signature. + +**Proposed fix (Issue #12)**: A `FirstArgIsMapping` literal to mark functions that need the store instance as their first argument, removing the need for signature inspection. + +**Status**: Open. The design for fixing this cleanly without breaking changes is actively discussed. + +### 1b. `self` not being the wrapped instance (Issue #18) + +When a method inside a `wrap_kvs`-decorated class calls `self[key]`, `self` is the unwrapped instance (the inner class), not the outer wrapped class. This means the transform pipeline is bypassed for in-class `self[k]` calls. + +**Workaround** shown in issue: re-apply `wrap_kvs` to `self` inside the method, or pass the wrapped instance explicitly. + +**Impact**: Affects any class that uses `wrap_kvs` as a class decorator and then uses `self[k]` internally. + +### 1c. Recursively applying wrappers (Issue #10) + +`wrap_kvs` and all wrappers only apply to the "top level" of a store. If the store contains nested stores (a store of stores), the wrap doesn't propagate to values: + +```python +s = add_path_access({'a': {'b': {'c': 42}}}) +s['a', 'b', 'c'] # works (top-level wrap applied) +s['a']['b', 'c'] # fails (returned value is plain dict, not wrapped) +``` + +The issue proposes a `conditional_data_trans` pattern to recursively apply wrapping to values that match a condition (e.g., "if the value is a Mapping, wrap it too"). 
A prototype is shown: + +```python +add_path_access_if_mapping = conditional_data_trans( + condition=instance_checker(Mapping), + data_trans=add_path_access, +) +``` + +**Status**: Partially implemented, but the general mechanism for recursively applying wrappers across levels is still a design open question. + +--- + +## Theme 2: The Builtin Codec Ecosystem + +### Discussion #42 / Issue #42: Quick access to builtin codecs + +A repeated need: users want ready-to-use codecs for common Python stdlib operations (json, pickle, csv, gzip, base64, etc.) without having to manually construct `wrap_kvs(store, obj_of_data=json.loads, data_of_obj=json.dumps)` every time. + +**Resolution**: `dol.kv_codecs.ValueCodecs` and `KeyCodecs` namespaces were added. This became one of the most used parts of the library: + +```python +from dol import ValueCodecs, KeyCodecs +store = Pipe(KeyCodecs.suffixed('.pkl'), ValueCodecs.pickle())(dict) +``` + +### Issue #47: Simpler "affix" key codecs + +`KeyCodecs.suffixed()` uses `KeyTemplate` internally, which is overkill for simple prefix/suffix operations. Proposal: use simpler string methods (slice, `startswith`/`endswith`) for these common cases. + +**Status**: Open. A minor efficiency/simplicity improvement. + +--- + +## Theme 3: Key Transformation Framework + +### Discussion #27: The need for a key transformation framework + +The KV abstraction works well for flat key spaces, but real backends often have structured keys (paths, composite keys, namespace prefixes). 
Multiple discussions converge on the need for a proper key transformation framework: + +- **KeyPath**: tuples/strings to represent hierarchical paths +- **KeyTemplate**: parse/format structured key strings (`'{user}/{date}.json'`) +- **Prefix filtering**: show only keys starting with a prefix (subpath filtering) +- **Issue #43**: Request for `KeyTemplate` as the "swiss army knife" of key wrappers + +**Discussion #32** (Subpath filtering in path-keyed stores): A common use case is "give me a sub-store for keys starting with X". Related to filesystem navigation patterns. + +**Discussion #21**: Cleanup and centralization of path access functionality — multiple overlapping implementations exist (`path_get`, `_path_get`, `KeyPath`, `KeyTemplate`). + +**Status**: `KeyTemplate` exists and works. `KeyPath` exists. Subpath filtering exists via `filt_iter`. But no unified "path store API" has been formalized. + +--- + +## Theme 4: Caching and Performance + +### Issue #50: Stacking `cache_this` decorators + +`cache_this` is powerful, but stacking multiple `@cache_this` decorators on the same method causes problems (cache invalidation, key conflicts). Discussion of how to compose caching correctly. + +### Issue #56: Fast `update` and synching + +`store.update(other_store)` uses the generic Python MutableMapping implementation: iterate over `other_store`, write each item. For stores with millions of items or remote backends, this is very slow. + +The proposal: enable backend-specific fast sync mechanisms. For example, `sshdol` has `ssh_files.sync_to(local_files)` using `rsync`. The challenge: how to expose this via the standard `update` interface when the two sides may know nothing about each other. + +**Design tension**: maintaining the clean Mapping interface vs. allowing optimized protocol negotiation between stores. A possible approach: detect if both stores have a shared "fast sync" protocol (duck typing or registration). + +**Status**: Open. No general solution. 
Backend-specific workarounds exist. + +--- + +## Theme 5: Context Managers and Transaction-Like Semantics + +### Discussion #49: Context managers in dol + +Two recurring patterns where context managers are needed: + +1. **Connection lifecycle**: stores that need `connect()`/`disconnect()` (databases, remote APIs). The KV interface hides the connection, but someone has to manage it. + +2. **Batching/transactions**: write operations accumulate in a buffer and are sent as a batch when the context exits. Useful for performance and atomicity. + +The discussion notes the tension: exposing context managers breaks the "just use it like a dict" simplicity. Solutions proposed: +- `flush_on_exit` decorator (already in `caching.py`) +- Store-level `__enter__`/`__exit__` that batch writes +- Explicit "session" objects that wrap stores + +**Status**: `flush_on_exit` exists. No general transaction/batching pattern is standardized. + +--- + +## Theme 6: Composite and Hierarchical Stores + +### Discussion #25: Composite Stores + +"A store made of other stores" — a recurring architectural need. Variations: +- **Fan-out store**: writes go to all sub-stores, reads come from the first that has the key +- **Layered store**: like ChainMap, reads fall through to next store on miss +- **Segmented store**: different keys go to different backends +- **Nested store**: values are themselves stores + +**Status**: `FanoutReader`, `FanoutPersister`, `CascadedStores`, `FlatReader` exist. No general "store mesh" framework. + +### Discussion #19: Permute levels of nested mappings + +Need to flip/reorder the levels of nested dicts — analogous to a `groupby` for nested structures. Example: a `dict[user][date]` that needs to be accessed as `dict[date][user]`. + +### Discussion #20: Generalize `FlatReader` to multiple levels + +`FlatReader` currently flattens only two levels. Requested: arbitrary-depth flattening. 
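Discussions #19 and #20 ask for two operations on nested mappings: flattening to any depth, and permuting levels. Below is a minimal, dependency-free sketch of both. It assumes plain nested dicts. The helper names `flatten_nested` and `swap_levels` are hypothetical and are not part of dol.

```python
from collections.abc import Mapping


def flatten_nested(m, prefix=()):
    """Yield (key_path_tuple, leaf_value) pairs for a nested Mapping of any depth.

    Hypothetical helper: it recurses into any value that is itself a Mapping,
    which generalizes the two-level flattening that FlatReader performs.
    Note that an empty nested mapping yields no pairs at all.
    """
    for k, v in m.items():
        path = prefix + (k,)
        if isinstance(v, Mapping):
            yield from flatten_nested(v, path)
        else:
            yield path, v


def swap_levels(m):
    """Turn d[a][b] into d[b][a], i.e. the two-level case of Discussion #19 (hypothetical)."""
    out = {}
    for outer_k, inner in m.items():
        for inner_k, v in inner.items():
            out.setdefault(inner_k, {})[outer_k] = v
    return out


by_user = {'ann': {'2024-01': 3, '2024-02': 5}, 'bob': {'2024-01': 7}}
flat = dict(flatten_nested(by_user))
# {('ann', '2024-01'): 3, ('ann', '2024-02'): 5, ('bob', '2024-01'): 7}
by_date = swap_levels(by_user)
# {'2024-01': {'ann': 3, 'bob': 7}, '2024-02': {'ann': 5}}
```

Using tuple paths as keys keeps the flattening reversible and avoids picking a separator character. Separators are exactly where the Windows path issues in Theme 10 tend to arise.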
+ +--- + +## Theme 7: Batch/Paging/Chunking Operations + +### Discussion #29: Paging tools + +Chunked iteration (reading 1000 items at a time), batch writes, and streaming reads come up constantly with large data stores and remote backends. The Mapping interface has no natural "paging" concept (`__iter__` always yields all keys). + +Proposals: +- `chunked_items(store, chk_size)` utility +- Stores that implement a `_page_` method that `filt_iter` can delegate to +- Discussion of how to push filtering/pagination down to the backend + +**Status**: Utility functions exist in the ecosystem, but no standard in `dol` core. + +--- + +## Theme 8: Interface Extension and Customization + +### Discussion #34 / Issue #12: When transform functions need access to `self` + +The standard `obj_of_data(data)` signature is sufficient for most transforms. But sometimes the transform needs context from the store (e.g., its configuration, its root path). The current approach of inferring this from the signature is problematic. + +**Proposed design** (still an open proposal): A `FirstArgIsMapping` marker, so users explicitly opt in to the `(self, data)` calling convention: + +```python +# Proposed API (Issue #12, open): this import may not exist in released dol +from dol.trans import FirstArgIsMapping + +wrap_kvs(store, obj_of_data=FirstArgIsMapping(lambda self, data: self.root / data)) +``` + +### Discussion #24: Include hooks for optimized operations + +The proposal is to add "hooks": special methods on the store that dol's tooling checks before falling back to the generic implementation. For example, if a store has a `_filter_` method, `filt_iter` should use it instead of Python-level iteration. This would enable pushing operations like filtering and sorting down to the backend. + +This is analogous to how `__len__` makes `len()` efficient even when `__iter__` would work.
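The hook idea can be sketched as a dispatcher that first looks for an optional backend method and otherwise falls back to filtering in Python. The `_filter_` name comes from the discussion. `filter_keys` and `IndexedStore` are hypothetical names, and this is not how `filt_iter` currently behaves.

```python
def filter_keys(store, predicate):
    """Return the keys of `store` that satisfy `predicate`, preferring a backend hook.

    If the store defines a `_filter_` method (the hypothetical hook from
    Discussion #24), delegate to it. Otherwise, iterate in Python. This mirrors
    how `len()` relies on `__len__` instead of counting items.
    """
    hook = getattr(store, '_filter_', None)
    if callable(hook):
        return list(hook(predicate))
    return [k for k in store if predicate(k)]


class IndexedStore(dict):
    """Toy store that 'pushes down' filtering and records how often it did so."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.hook_calls = 0

    def _filter_(self, predicate):
        self.hook_calls += 1
        # A real backend would translate `predicate` into a query; here we just iterate.
        return (k for k in self if predicate(k))


plain = {'a1': 1, 'b1': 2, 'a2': 3}
fallback_result = filter_keys(plain, lambda k: k.startswith('a'))  # via Python iteration
fast = IndexedStore(plain)
hook_result = filter_keys(fast, lambda k: k.startswith('a'))  # via the _filter_ hook
```

The hard design question is the one the discussion raises: an arbitrary Python predicate usually cannot be translated into a backend query. A real hook protocol would therefore probably need a restricted, declarative filter description.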
+ +--- + +## Theme 9: Documentation and Discoverability + +### Issue #1: Documentation ideas (open, long-running) + +A running wishlist for documentation improvements: +- More examples of `wrap_kvs` combinations +- Step-by-step tutorials for common patterns (migrate data between backends, add serialization layer) +- Better documentation of the `kv_walk` utility + +### Issue #22: `kv_walk` docs and recipes + +`kv_walk` — recursive iteration over nested stores — is powerful but underdocumented. Multiple use cases need recipes. + +### Discussion #51: dol examples and applications in the wild + +A collection thread for real-world usages and code snippets. + +### Discussion #48: Conversations with AI about dol + +A thread recording Q&A with AI chatbots about dol. Interesting as a signal of where the AI knowledge gaps are (and where documentation needs improvement). + +### Discussion #55: AI-enhanced assistance + +Proposals for improving AI agent assistance with dol — relevant to the `CLAUDE.md` and llms.txt effort. + +--- + +## Theme 10: Cross-Platform and Compatibility + +### Issue #58, #52: Windows compatibility + +Several tests fail on Windows due to: +- Path separator differences (`/` vs `\`) +- Temp file handling +- Regex patterns with backslashes (Issue #40: `re.error: incomplete escape \U at position 2`) + +### Issue #59 (CLOSED): Python 3.12 compatibility + +Fixed. dol now works with Python 3.12. + +--- + +## Summary: Open Design Questions + +| Question | Location | Status | +|---------|----------|--------| +| How should transform functions signal they need `self`? | Issue #12, Discussion #34 | Open | +| How to support fast `update`/sync between heterogeneous stores? | Issue #56 | Open | +| How to handle context managers / transactions generically? | Discussion #49 | Partial | +| Should wrappers propagate recursively to nested values? | Issue #10 | Partial | +| How to push filtering/pagination to the backend? 
| Discussion #24, #29 | Open | +| How to unify the path access/key transformation utilities? | Discussion #21, #27 | Partial | + +--- + +## Notable Closed Issues (Design Completions) + +| Issue | What Was Done | +|-------|--------------| +| #42 | `ValueCodecs` / `KeyCodecs` namespaces created | +| #43 | `KeyTemplate` implemented in `paths.py` | +| #36 | `appendable.py` module with `Extendible` pattern | +| #7 | `__hash__` added to `Store` | +| #8 | `FlatReader` refactored and stabilized | +| #47 | Simpler affix codecs (partially addressed) | +| #59 | Python 3.12 compatibility fixed | diff --git a/misc/docs/python_design.md b/misc/docs/python_design.md new file mode 100644 index 00000000..b227340d --- /dev/null +++ b/misc/docs/python_design.md @@ -0,0 +1,539 @@ +# dol: Python Design and Architecture + +This document describes the Python-specific implementation of dol's design. For the language-agnostic concepts, see [general_design.md](general_design.md). + +--- + +## Class Hierarchy + +``` +collections.abc.Collection (ABC: __iter__, __contains__, __len__) + │ +dol.base.Collection (adds head(), default __len__/__contains__ via iteration) + │ +dol.base.KvReader (aka Reader) (adds __getitem__, keys(), values(), items(); removes __reversed__) + │ +dol.base.KvPersister (aka Persister) (adds __setitem__, __delitem__; disables clear()) + │ +dol.base.Store (adds 4 transform hooks; wraps an inner store) +``` + +All classes inherit from `collections.abc` ABCs, so they satisfy `isinstance` checks and abc registration. + +### Key design notes on the hierarchy + +- `Collection.head()` — returns `next(iter(self.items()))` or `next(iter(self))`. Useful for quick inspection without knowing any key. +- `KvReader.__reversed__` — explicitly raises `NotImplementedError`. Rationale: not all backends have a natural order; forcing the interface to pretend otherwise would be misleading. 
+- `KvPersister.clear = _disabled_clear_method` — clear is disabled by default, because wiping a persistent store accidentally is catastrophic. Subclasses can re-enable it explicitly. +- `MappingViewMixin` — provides pluggable `KeysView`, `ValuesView`, `ItemsView` classes. Override the *class attribute* (e.g., `MyStore.KeysView = MyKeysView`) to customize view behavior without overriding `.keys()`. + +--- + +## The Store Class: Transform Hooks + +`Store` is the central class. It wraps an inner store object (`self.store`) and intercepts reads/writes through 4 hook methods: + +```python +class Store(KvPersister): + _id_of_key = static_identity_method # outer key → inner key + _key_of_id = static_identity_method # inner key → outer key + _data_of_obj = static_identity_method # outer value → stored data + _obj_of_data = static_identity_method # stored data → outer value + + def __getitem__(self, k): + _id = self._id_of_key(k) + data = self.store[_id] + return self._obj_of_data(data) + + def __setitem__(self, k, obj): + _id = self._id_of_key(k) + data = self._data_of_obj(obj) + self.store[_id] = data + + def __iter__(self): + yield from (self._key_of_id(_id) for _id in self.store) +``` + +The hooks default to identity (no-op), so `Store(dict())` behaves exactly like a dict. You inject transforms by: +1. Subclassing and overriding hook methods +2. Assigning callables directly to hook names on the instance or class +3. Using `wrap_kvs` (the recommended approach for most cases) + +**The naming convention** `X_of_Y` means "get X given Y" — identical to mathematical function notation. This is explicit about directionality: `id_of_key` converts a key to an id; `key_of_id` converts an id to a key. + +--- + +## `wrap_kvs`: The Core Transformation Function + +Located in `dol/trans.py:1801`. The most important function in the library. 
+ +```python +@store_decorator +def wrap_kvs( + store=None, + *, + # Key transforms + key_of_id=None, # outgoing: inner_id → outer_key (for __iter__) + id_of_key=None, # incoming: outer_key → inner_id (for __getitem__, __setitem__, __delitem__) + # Value transforms + obj_of_data=None, # outgoing: stored_data → python_obj (for __getitem__) + data_of_obj=None, # incoming: python_obj → stored_data (for __setitem__) + # Key-conditioned value transforms + preset=None, # (key, obj) → data [on write, when value transform depends on key] + postget=None, # (key, data) → obj [on read, when value transform depends on key] + # Codec shortcuts + key_codec=None, # Codec with .encoder (id_of_key) and .decoder (key_of_id) + value_codec=None, # Codec with .encoder (data_of_obj) and .decoder (obj_of_data) + key_encoder=None, # alias for id_of_key + key_decoder=None, # alias for key_of_id + value_encoder=None, # alias for data_of_obj + value_decoder=None, # alias for obj_of_data + # Method transforms (advanced) + outcoming_key_methods=(), + outcoming_value_methods=(), + ingoing_key_methods=(), + ingoing_value_methods=(), + # Naming + name=None, + wrapper=None, # defaults to Store +): +``` + +### How `wrap_kvs` works + +It creates a new class (or wraps an instance) by applying the given transforms to the appropriate dunder methods. The `@store_decorator` decorator makes it work in 4 modes (see below). + +### `obj_of_data` vs `postget` + +| Feature | `obj_of_data` | `postget` | +|---------|--------------|-----------| +| Signature | `(data) → obj` | `(key, data) → obj` | +| Knows the key? | No | Yes | +| Use when | Same transform for all values | Transform depends on key (e.g., file extension) | + +Same distinction applies to `data_of_obj` vs `preset`. + +### Examples + +```python +from dol import wrap_kvs +import json, pickle + +# 1. Add JSON serialization +JsonStore = wrap_kvs(dict, obj_of_data=json.loads, data_of_obj=json.dumps) + +# 2. 
Add key prefix +PrefixedStore = wrap_kvs(dict, + id_of_key=lambda k: f"user:{k}", + key_of_id=lambda _id: _id[len("user:"):], +) + +# 3. Extension-based deserialization (key-conditioned) +MultiFormatStore = wrap_kvs(dict, + postget=lambda k, v: json.loads(v) if k.endswith('.json') else pickle.loads(v), + preset=lambda k, v: json.dumps(v) if k.endswith('.json') else pickle.dumps(v), +) + +# 4. Using codec shortcuts +from dol.trans import ValueCodec +pickle_codec = ValueCodec(encoder=pickle.dumps, decoder=pickle.loads) +PickleStore = wrap_kvs(dict, value_codec=pickle_codec) + +# 5. Stacking layers (the "Russian dolls" pattern) +store = dict() +store = wrap_kvs(store, id_of_key=lambda k: k + '.json', key_of_id=lambda _id: _id[:-5]) +store = wrap_kvs(store, obj_of_data=json.loads, data_of_obj=json.dumps) +``` + +--- + +## `store_decorator`: The Meta-Decorator + +Located in `dol/trans.py:130`. Enables writing a class-transforming function once and using it in 4 ways: + +```python +@store_decorator +def my_deco(store=None, *, some_param='default'): + # Transform the store class or instance + ... + return transformed_store +``` + +The 4 usage modes: + +```python +# 1. Class decorator (no parens, uses defaults) +@my_deco +class MyStore(dict): ... + +# 2. Class decorator factory (with params) +@my_deco(some_param='custom') +class MyStore(dict): ... + +# 3. Instance decorator (wraps existing instance in a Store) +s = dict() +s_wrapped = my_deco(s) + +# 4. Instance decorator factory +wrap_with_custom = my_deco(some_param='custom') +s_wrapped = wrap_with_custom(s) +``` + +When decorating an **instance** (modes 3 and 4), `store_decorator` automatically wraps it in `Store` first, so the decorator always receives a class. 
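The following dependency-free sketch shows the 4-way dispatch concretely. It is not dol's implementation. `four_way`, `add_prefix`, and `DelegatingStore` (a stand-in for `dol.Store`) are hypothetical names. The dispatch rule is an assumption modeled on the description above: `None` means factory use, a class means class decoration, and anything else is wrapped as an instance first.

```python
import functools
from collections.abc import MutableMapping


class DelegatingStore(MutableMapping):
    """Stand-in for dol.Store: forwards every operation to `self.store`."""

    def __init__(self, store):
        self.store = store

    def __getitem__(self, k):
        return self.store[k]

    def __setitem__(self, k, v):
        self.store[k] = v

    def __delitem__(self, k):
        del self.store[k]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


def four_way(class_transform):
    """Turn `class_transform(cls, **params) -> cls` into a 4-way decorator."""

    @functools.wraps(class_transform)
    def decorator(store=None, **params):
        if store is None:  # modes 2 and 4: only params given, so return a decorator
            return functools.partial(decorator, **params)
        if isinstance(store, type):  # modes 1 and 2: transform a class
            return class_transform(store, **params)
        # modes 3 and 4: wrap the instance in a delegating class, then transform that class
        return class_transform(DelegatingStore, **params)(store)

    return decorator


@four_way
def add_prefix(cls, *, prefix='user:'):
    """Class transform: store every key under `prefix + key`."""

    class Prefixed(cls):
        def __getitem__(self, k):
            return super().__getitem__(prefix + k)

        def __setitem__(self, k, v):
            super().__setitem__(prefix + k, v)

        def __delitem__(self, k):
            super().__delitem__(prefix + k)

        def __iter__(self):
            ids = super().__iter__()
            return (_id[len(prefix):] for _id in ids)

    return Prefixed


@add_prefix  # mode 1: class decorator, default params
class UserStore(dict):
    pass


@add_prefix(prefix='admin:')  # mode 2: class decorator factory
class AdminStore(dict):
    pass


backend, other = {}, {}
s = add_prefix(backend)  # mode 3: instance decorator
s['ann'] = 1  # backend == {'user:ann': 1}
t = add_prefix(prefix='tmp/')(other)  # mode 4: instance decorator factory
t['x'] = 2  # other == {'tmp/x': 2}
```

For `dict` subclasses, inherited C-level methods such as `dict.get` bypass the overridden `__getitem__`. This is one reason dol routes instances through a delegating `Store` rather than subclassing the backend type directly.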
+ +### `double_up_as_factory` + +A related utility that upgrades a plain decorator to also work as a factory: + +```python +@double_up_as_factory +def my_deco(func=None, *, multiplier=2): + def wrapper(x): return func(x) * multiplier + return wrapper + +# Direct use: my_deco(f) +# Factory use: my_deco(multiplier=3)(f) +# As class deco: @my_deco(multiplier=3) +``` + +Constraint: first arg must default to `None`; all other args must be keyword-only. This is enforced at decoration time. + +--- + +## Codec Abstraction + +Located in `dol/trans.py:3362`. A `Codec` is a dataclass pairing an encoder and decoder: + +```python +@dataclass +class Codec(Generic[DecodedType, EncodedType]): + encoder: Callable[[DecodedType], EncodedType] + decoder: Callable[[EncodedType], DecodedType] + + def compose_with(self, other): ... # chain two codecs + def invert(self): ... # swap encoder/decoder + __add__ = compose_with + __invert__ = invert +``` + +**Subclasses** are callable and apply the codec to a store: + +```python +class ValueCodec(Codec): + def __call__(self, obj): + return wrap_kvs(obj, data_of_obj=self.encoder, obj_of_data=self.decoder) + +class KeyCodec(Codec): + def __call__(self, obj): + return wrap_kvs(obj, id_of_key=self.encoder, key_of_id=self.decoder) + +class KeyValueCodec(Codec): + def __call__(self, obj): + return wrap_kvs(obj, preset=self.encoder, postget=self.decoder) +``` + +Usage: + +```python +from dol.trans import ValueCodec +import json + +json_codec = ValueCodec(encoder=json.dumps, decoder=json.loads) +MyStore = json_codec(dict) # wrap dict with json serialization +``` + +The `kv_codecs.py` module provides ready-made codec factories in two namespaces: + +```python +from dol import ValueCodecs, KeyCodecs + +# Codec factories +pickle_codec = ValueCodecs.pickle() # ValueCodec(encoder=pickle.dumps, decoder=pickle.loads) +json_codec = ValueCodecs.json() +gzip_codec = ValueCodecs.gzip() +csv_codec = ValueCodecs.csv() + +suffix_codec = KeyCodecs.suffixed('.pkl') # 
adds/strips .pkl from keys + +# Compose with + +full_codec = ValueCodecs.pickle() + ValueCodecs.gzip() # pickle then gzip + +# Apply to store +from dol import Pipe +MyStore = Pipe(KeyCodecs.suffixed('.pkl'), ValueCodecs.pickle())(dict) +``` + +--- + +## `Pipe`: Function Composition + +Located in `dol/util.py`. Chains functions left-to-right: + +```python +import gzip +import json + +from dol import Pipe + +f = Pipe(json.dumps, str.encode, gzip.compress) +# f(obj) == gzip.compress(str.encode(json.dumps(obj))) +``` + +Codecs compose with `+`: encoders are chained left-to-right, like `Pipe`, and decoders in the reverse order: + +```python +ValueCodecs.str_to_bytes() + ValueCodecs.gzip() +# = ValueCodec where encoder = gzip(str_to_bytes(x)) and decoder = str_from_bytes(gunzip(x)) +``` + +--- + +## `Sig`: Signature Calculus + +Located in `dol/signatures.py`. Rich signature manipulation: + +```python +from dol.signatures import Sig + +sig = Sig(my_func) +sig.names # list of parameter names +sig.defaults # dict of {name: default} +sig.annotations # dict of {name: type} + +# Arithmetic on signatures +new_sig = Sig(f) + ['extra_param'] + Sig(g) # merge signatures +new_sig = Sig(f) - ['verbose'] # remove parameter + +# Apply a signature to a function +@Sig(some_other_func) +def my_func(*args, **kwargs): ... +# my_func now has the signature of some_other_func +``` + +`Sig` is used internally throughout dol to: +- Compose signatures of transform functions for `wrap_kvs` +- Build the 4-way decorator signature in `store_decorator` +- Generate codec signatures in `kv_codecs.py` + +--- + +## Delegation Pattern + +`Store` uses the delegation pattern: it holds a reference to an inner store (`self.store`) and delegates all storage operations to it.
Attribute access falls through via `__getattr__`: + +```python +def __getattr__(self, attr): + return getattr(object.__getattribute__(self, "store"), attr) +``` + +The `DelegatedAttribute` descriptor makes delegation explicit and works with pickling: + +```python +class DelegatedAttribute: + def __get__(self, instance, owner): + return getattr(getattr(instance, self.delegate_name), self.attr_name) + def __set__(self, instance, value): + setattr(getattr(instance, self.delegate_name), self.attr_name, value) +``` + +`delegator_wrap(delegator, obj)` creates a class/instance that delegates to `obj` via `delegator`. Used by `Store.wrap = classmethod(partial(delegator_wrap, delegation_attr='store'))`. + +--- + +## Caching Patterns (`caching.py`) + +### `cache_this` — property/method caching + +```python +from dol import cache_this + +class MyClass: + @cache_this + def expensive_property(self): # no args → cached_property behavior + return compute_expensive() + + @cache_this(cache={}) # explicit cache dict + def expensive_method(self, x, y): + return compute(x, y) + + @cache_this(cache='my_cache', ignore={'verbose'}) + def parameterized(self, data, mode='fast', verbose=False): + ... +``` + +The cache can be any Mapping — including a dol store, enabling persistent or distributed caches. + +### `store_cached` — function memoization + +```python +from dol import store_cached +import shelve + +@store_cached(shelve.open('my_cache')) # persisted cache +def slow_computation(x, y): + return ... +``` + +### `cache_vals` — store-level caching + +```python +from dol import cache_vals + +# Add an in-memory cache layer in front of a slow store +FastStore = cache_vals(SlowStore, cache=dict) +``` + +### `WriteBackChainMap` + +A ChainMap where writes go to the first (fast) store and reads fall through in order. Useful for layered cache hierarchies. 
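The idea behind `cache_vals` is an in-memory layer in front of a slow store. It can be sketched without dol. This is not dol's implementation. `ReadThroughCache` and `SlowStore` are hypothetical names, and the sketch assumes a simple policy: reads consult the cache first and populate it on a miss, while writes and deletes go to both layers.

```python
from collections.abc import MutableMapping


class ReadThroughCache(MutableMapping):
    """Hypothetical cache layer in front of a slow backing store."""

    def __init__(self, store, cache=None):
        self.store = store
        self.cache = {} if cache is None else cache  # any MutableMapping works

    def __getitem__(self, k):
        try:
            return self.cache[k]
        except KeyError:
            v = self.store[k]  # slow path; raises KeyError if the key is truly missing
            self.cache[k] = v
            return v

    def __setitem__(self, k, v):
        self.store[k] = v  # write through to the backing store...
        self.cache[k] = v  # ...and keep the cache consistent

    def __delitem__(self, k):
        del self.store[k]
        self.cache.pop(k, None)

    def __iter__(self):
        return iter(self.store)  # the backing store is the source of truth for keys

    def __len__(self):
        return len(self.store)


class SlowStore(dict):
    """Toy backend that counts reads, so cache hits are observable."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, k):
        self.reads += 1
        return super().__getitem__(k)


backend = SlowStore(a=1)
s = ReadThroughCache(backend)
first, second = s['a'], s['a']  # the second read is served from the cache
s['b'] = 2  # written to both the backend and the cache
```

Because `cache` can be any `MutableMapping`, the same layer could sit on a persistent store. This matches the point that `cache_this` accepts any Mapping as its cache.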
+ +--- + +## Composition Stores (`sources.py`) + +### `FlatReader`: flatten a store of stores + +```python +from dol.sources import FlatReader + +outer = {'A': {'x': 1, 'y': 2}, 'B': {'z': 3}} +flat = FlatReader(outer) +list(flat) # [('A', 'x'), ('A', 'y'), ('B', 'z')] +``` + +### `FanoutReader` / `FanoutPersister` + +A key is fanned out to several named stores. Reading returns a dict with one entry per store, keyed by store name. `FanoutPersister` also accepts such a dict on write and dispatches each entry to its store. + +```python +from dol.sources import FanoutPersister + +s = FanoutPersister(stores=dict(local=local_store, remote=remote_store)) +s['key'] = {'local': v1, 'remote': v2} # each store receives its own value +s['key'] # {'local': v1, 'remote': v2} +``` + +### `CascadedStores` + +Writes go to all stores; reads come from the first store that has the key. + +--- + +## Path Navigation (`paths.py`) + +For hierarchical/nested stores: + +```python +from dol import path_get, path_set, KeyPath, mk_relative_path_store + +d = {'a': {'b': {'c': 42}}} +path_get(d, ('a', 'b', 'c')) # 42 +path_set(d, ('a', 'b', 'd'), 99) + +# Convert a path store (full paths as keys) to a relative path store +RelativeStore = mk_relative_path_store(root='/data/users') +s = RelativeStore() +s['john/profile.json'] # reads /data/users/john/profile.json + +# KeyTemplate for structured key parsing +from dol.paths import KeyTemplate +kt = KeyTemplate('{user}/{year}/{month}.json') +kt.key_to_dict('john/2024/01.json') # {'user': 'john', 'year': '2024', 'month': '01'} +kt.dict_to_key({'user': 'john', 'year': '2024', 'month': '01'}) # 'john/2024/01.json' +``` + +--- + +## Design Critique and Alternatives + +### 1. ABC Inheritance vs. Protocols + +**Current approach**: Classes inherit from `collections.abc.Mapping`, `MutableMapping`, etc. + +**Pros**: +- `isinstance()` checks work +- Free implementations of derived methods (`get`, `update`, `__eq__`, etc.)
+- ABCs document the contract clearly + +**Cons**: +- Structural subtyping is not supported: a class must inherit from the ABC (or be explicitly `register()`ed with it), and merely implementing the methods is not enough +- Python 3.8+ `typing.Protocol` (structural typing) would allow any class with the right methods to be used without inheritance +- Multiple inheritance from several ABCs creates MRO complexity + +**Alternative**: Use `Protocol` for type hints while keeping the ABC base classes for runtime behavior. This is additive (not breaking) and would improve the type-checker experience. + +### 2. Disabling `clear()` via Assignment + +**Current approach**: `KvPersister.clear = _disabled_clear_method` + +**Pros**: Clear signal that "this is dangerous"; forces explicit re-enabling + +**Cons**: +- Surprising to generic code that treats the store as a `MutableMapping` and calls `.clear()` (for example, to empty a target before a bulk `.update()`), because the call raises instead +- Violates the Liskov Substitution Principle (LSP): `KvPersister` claims to be a `MutableMapping` but breaks one of its methods +- `isinstance(store, MutableMapping)` is True but `.clear()` raises + +**Alternative**: Return without error but log a warning, or state the rationale more prominently in the class docstring (it is documented there, but easy to miss). + +### 3. The `_id_of_key` Naming Convention + +**Current approach**: `_id_of_key`, `_key_of_id`, `_data_of_obj`, `_obj_of_data` (from math: `Y_of_X` means `f: X → Y`) + +**Pros**: Explicit directionality; clear what goes in and comes out + +**Cons**: +- Unfamiliar to most Python developers (unusual naming style) +- `wrap_kvs` exposes the same names without the underscore (`key_of_id` is outgoing, `id_of_key` is incoming), so readers must still translate each name into a read or write direction +- The `_` prefix makes them look private/internal, but they're the main customization points + +**Alternative**: More conventional names such as `encode_key`/`decode_key` and `serialize_value`/`deserialize_value`, or `key_to_id`/`id_to_key` (the common Python `X_to_Y` style). `wrap_kvs` already accepts `key_encoder`/`key_decoder`/`value_encoder`/`value_decoder` aliases in this spirit. + +### 4.
`store_decorator`'s 4-Way Usage + +**Current approach**: One decorator factory that produces a decorator usable as: class-decorator, class-decorator-factory, instance-decorator, instance-decorator-factory. + +**Pros**: Maximum flexibility; no code duplication; same API works in all contexts + +**Cons**: +- The 4-way behavior is non-obvious from the signature alone +- Error messages when misused can be cryptic +- Testing all 4 modes for each decorator adds overhead + +**Alternative**: Keep the 4-way usage but add type hints that make each mode visible in IDEs (e.g., one `typing.overload` signature per mode), or provide separate `as_class_deco` and `as_instance_deco` wrappers. + +### 5. Missing: Async Support + +**Current approach**: Entirely synchronous. + +**Cons**: Cannot be used with `async` backends (asyncio, aiohttp, aiobotocore) without blocking the event loop. + +**Alternative**: An `AsyncKvReader` / `AsyncKvPersister` hierarchy. Python has no asynchronous counterpart to `__getitem__`/`__setitem__`, so such a hierarchy would pair `__aiter__` (a regular method returning an async iterator) with awaitable accessors such as `async def aget(k)` / `async def aset(k, v)` (names illustrative). This would be additive, and the sync hierarchy could stay as-is. + +### 6. No Generic Type Parameters on Classes + +**Current approach**: `KvReader` has no type parameters. + +**Cons**: Type checkers cannot infer key/value types. + +**Alternative**: `KvReader[KT, VT]`, `KvPersister[KT, VT]`, `Store[KT, VT]`, which would improve IDE autocompletion and static analysis. This could be done without breaking changes using `Generic[KT, VT]`. + +### 7. `wrap_kvs` vs Direct Subclassing + +`wrap_kvs` is powerful but creates anonymous classes at runtime, which has implications: +- `type(store).__name__` may not be meaningful +- Pickling can be tricky (though dol handles this via `__reduce__`) +- Debugging stack traces show generic names + +For performance-critical code or when pickling is needed, direct subclassing is still more reliable.
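Critiques 1 and 6 can both be addressed additively with `typing.Protocol` and type parameters. Below is a minimal sketch. It assumes a read-only protocol named `KvReadable`, which is hypothetical; dol does not define it. Any class with the right methods satisfies the protocol without inheriting from a dol or `collections.abc` base.

```python
from typing import Iterator, Protocol, Tuple, TypeVar, runtime_checkable

KT = TypeVar("KT")
VT_co = TypeVar("VT_co", covariant=True)  # values only flow out of a reader
K = TypeVar("K")
V = TypeVar("V")


@runtime_checkable
class KvReadable(Protocol[KT, VT_co]):
    """Hypothetical structural type for a read-only key-value store."""

    def __getitem__(self, k: KT) -> VT_co: ...

    def __iter__(self) -> Iterator[KT]: ...

    def __len__(self) -> int: ...


def head(store: KvReadable[K, V]) -> Tuple[K, V]:
    """Return the first (key, value) pair; a type checker infers K and V from `store`."""
    k = next(iter(store))
    return k, store[k]


class Squares:
    """Inherits from nothing, yet structurally satisfies KvReadable[int, int]."""

    def __init__(self, n: int):
        self.n = n

    def __getitem__(self, k: int) -> int:
        if not 0 <= k < self.n:
            raise KeyError(k)
        return k * k

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.n))

    def __len__(self) -> int:
        return self.n
```

`runtime_checkable` only checks that the methods exist at runtime. It does not check their signatures or key/value types. The type parameters matter only to static checkers, which is why this could sit alongside the existing ABC hierarchy without changing runtime behavior.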
+ +--- + +## Key Idioms Summary + +| Idiom | Where | What it does | +|-------|-------|-------------| +| `Y_of_X` naming | `base.py`, `trans.py` | Explicit directionality for transform functions | +| `@store_decorator` | `trans.py` | 4-way usage for class/instance decorators | +| `@double_up_as_factory` | `trans.py` | Decorator works both directly and as factory | +| `Sig` arithmetic | `signatures.py` | Merge/subtract/compose function signatures | +| `Codec` + `__add__` | `trans.py` | Chain encode/decode pairs with `+` operator | +| `Pipe` | `util.py` | Left-to-right function composition | +| `cache_this` | `caching.py` | Pluggable property/method caching | +| `static_identity_method` | `util.py` | No-op hook default (works as static or instance method) | +| `wrap = classmethod(delegator_wrap)` | `base.py` | Class carries its own wrapping method |
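As a recap of how the `Y_of_X` naming and the identity-hook default fit together, here is a dependency-free sketch of the `Store` hook pattern. `MiniStore`, `JsonFiles`, and `identity` are hypothetical stand-ins for `dol.Store`, a wrapped store, and `static_identity_method`. dol's real class also handles attribute delegation, views, and wrapping.

```python
import json
from collections.abc import MutableMapping


def identity(x):
    return x


class MiniStore(MutableMapping):
    """Hypothetical stand-in for dol.Store: four Y_of_X hooks around an inner mapping."""

    _id_of_key = staticmethod(identity)  # outer key -> inner id (incoming)
    _key_of_id = staticmethod(identity)  # inner id -> outer key (outgoing)
    _data_of_obj = staticmethod(identity)  # python obj -> stored data (incoming)
    _obj_of_data = staticmethod(identity)  # stored data -> python obj (outgoing)

    def __init__(self, store=None):
        self.store = {} if store is None else store

    def __getitem__(self, k):
        return self._obj_of_data(self.store[self._id_of_key(k)])

    def __setitem__(self, k, obj):
        self.store[self._id_of_key(k)] = self._data_of_obj(obj)

    def __delitem__(self, k):
        del self.store[self._id_of_key(k)]

    def __iter__(self):
        return (self._key_of_id(_id) for _id in self.store)

    def __len__(self):
        return len(self.store)


class JsonFiles(MiniStore):
    """Keys gain a '.json' suffix; values are stored as JSON strings."""

    _id_of_key = staticmethod(lambda k: k + '.json')
    _key_of_id = staticmethod(lambda _id: _id[: -len('.json')])
    _data_of_obj = staticmethod(json.dumps)
    _obj_of_data = staticmethod(json.loads)


raw = {}
s = JsonFiles(raw)
s['report'] = {'x': 1}  # raw == {'report.json': '{"x": 1}'}
```

Wrapping each hook in `staticmethod` keeps Python from passing `self` as the first argument when the hook is looked up on an instance. That is the concern a helper like `static_identity_method` addresses for dol's defaults.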