Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ icechunk = { version = "2.0.0", default-features = false }
ndarray = "0.17.2"
numpy = { version = "0.29", features = ["half"] }
object_store = "0.13.2"
pyo3 = { version = "0.29", features = ["macros", "abi3-py311"] }
pyo3 = { version = "0.29", features = ["macros", "abi3-py311", "multiple-pymethods"] }
pyo3-arrow = "0.19"
pyo3-async-runtimes = { version = "0.29", features = ["tokio-runtime"] }
pyo3-bytes = "0.7.1"
Expand Down
112 changes: 112 additions & 0 deletions dev-docs/specs/2026-06-22-shared-pymethods-macros-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
# Shared `#[pymethods]` macros for sync/async Array & Group

**Date:** 2026-06-22
**Status:** Approved

## Problem

`PyArray`/`PyAsyncArray` (and `PyGroup`/`PyAsyncGroup`) duplicate a set of
metadata accessors that are byte-for-byte identical between the sync and async
variants. These accessors only read from `self.inner` and perform no I/O, so
they do not differ across the sync/async split — yet they are copy-pasted into
both `#[pymethods]` blocks. As more upstream `zarrs` `Group` methods get
exposed, this duplication will grow.

## Constraints / why not the obvious alternatives

- **No generic Rust base pyclass.** `#[pyclass]` cannot be generic; pyo3 must
register one concrete type with the interpreter. `#[pyclass] struct
PyBaseArray<S: Storage>` will not compile.
- **pyo3 inheritance does not share logic here.** A base `#[pymethods]` block
can only access *base* fields, but the storage-typed `inner` lives in the
subclass. A Rust base could provide `isinstance` but could not implement
`shape`/`dtype`/etc.
- **Python API stays unchanged.** Goal is internal Rust DRY only. A
Python-visible base class / Protocol is explicitly *not* a requirement and is
out of scope for this change (could be added later as a pure-Python layer).

## Design

Use a `macro_rules!` macro per node type that expands to a **complete, separate
`#[pymethods] impl $ty` block**. The macro is parameterized over the concrete
type (`$ty:ty`) and invoked at module level (not inside an existing
`#[pymethods]` block), so it re-expands against each concrete type and sidesteps
the generics limitation. This works for both sync and async because async's
`inner: Arc<Array<…>>` derefs to `Array<…>`, so `self.inner.shape()` etc.
compile identically.

### Why a whole `#[pymethods]` block (and `multiple-pymethods`)

A macro invocation *inside* a `#[pymethods]` block does not work: pyo3's
proc-macro processes the impl block before the `macro_rules!` invocation is
expanded, so it never sees the generated methods (`error: macros cannot be used
as items in #[pymethods] impl blocks`). The macro must therefore emit its own
`#[::pyo3::pymethods] impl $ty { … }` block.

Having two `#[pymethods]` blocks for one type (the macro-generated one plus the
hand-written type-specific one) requires the **`multiple-pymethods`** pyo3
feature. This is enabled in `Cargo.toml`. It is verified compatible with the
crate's `abi3-py311` feature, and pulls in `inventory`, which is already a
transitive dependency via pyo3 — so no meaningful new dependency.

### `src/array/shared.rs`

Defines `array_metadata_accessors!($ty)` emitting a `#[pymethods] impl $ty`
block with these 9 `#[getter]` accessors:

- `attrs`, `chunk_grid`, `codecs`, `dimension_names`, `dtype`, `metadata`,
`ndim`, `path`, `shape`

`metadata` returns `PyArrayMetadata` (the metadata newtype handles
pythonization via its `IntoPyObject` impl) rather than calling `pythonize`
inline, mirroring the group accessor.

Exported via `pub(crate) use array_metadata_accessors;`. Uses `$crate::`-rooted
paths for referenced types (`PyChunkGrid`, `PyCodecChain`, `PyDataType`,
`PyArrayMetadata`) and fully-qualified `::pyo3::` / `::pythonize::` paths so the
macro is hygienic regardless of the caller's imports.

### `src/group/shared.rs`

Defines `group_metadata_accessors!($ty)` emitting a `#[pymethods] impl $ty`
block with the storage-agnostic, no-I/O `Group` accessors:

- `attrs` (`#[getter]`, pythonized attributes map)
- `metadata` (`#[getter]`, returns `PyGroupMetadata` — the newtype's
`IntoPyObject` impl handles pythonization, so the getter just clones
`self.inner.metadata()` and `.into()`s it; no inline `pythonize`)
- `consolidated_metadata` (`#[getter]`, returns
`Option<PyConsolidatedMetadata>` — `self.inner.consolidated_metadata()`
already returns an owned `Option`, so it just `.map(Into::into)`s)
- `path` (`#[getter]`, `self.inner.path().as_str()`)

`PyConsolidatedMetadata` is added to `src/metadata.rs` via the existing
`pythonized_metadata!` macro (wrapping
`zarrs::metadata_ext::group::consolidated_metadata::ConsolidatedMetadata`,
which derives `Serialize`/`Deserialize`/`Clone`).

Grows further as more storage-agnostic `zarrs` `Group` methods are exposed.
I/O methods (`array_keys`/`group_keys`, child navigation) stay per-type.

### Wiring

- `src/array/mod.rs` and `src/group/mod.rs` each gain `mod shared;`.
- `sync.rs` / `async.rs` for both array and group `use` the macro and invoke it
at module level (between the struct's inherent `impl` and the type-specific
`#[pymethods]` block), with a one-line comment pointing at `shared.rs`. The
duplicated accessors are deleted from both files.

## Out of scope (stays per-type)

- `__repr__` — differs by type-name string (`"Array"` vs `"AsyncArray"`).
- All I/O methods — `open`/`open_async`, `retrieve_*`, `__getitem__`,
`array_keys`/`group_keys` — different signatures and bodies (sync returns a
value; async takes `py` and returns `future_into_py(...)`).
- Any Python-visible base class / Protocol.

## Success criteria

- `cargo build` succeeds; the macro-generated methods are present on all four
Python classes (verified via existing behavior / a quick attribute check).
- No change to the Python-visible API surface.
- The 9 array accessors and group `attrs` exist in exactly one place each.
20 changes: 19 additions & 1 deletion python/zarrista/_group.pyi
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from zarr_metadata import JSONValue
from zarr_metadata import ConsolidatedMetadataV3, GroupMetadataV3, JSONValue

from ._array import Array, AsyncArray
from ._store import AsyncStore, FilesystemStore, MemoryStore
Expand All @@ -12,6 +12,15 @@ class Group:
@property
def attrs(self) -> dict[str, JSONValue]:
"""The group's user attributes as a dict."""
@property
def metadata(self) -> GroupMetadataV3:
"""The group's full Zarr v3 metadata."""
@property
def consolidated_metadata(self) -> ConsolidatedMetadataV3 | None:
"""The consolidated metadata, if present in the group metadata."""
@property
def path(self) -> str:
"""The group's path in the store."""
def array_keys(self) -> list[str]:
"""Names of the direct child arrays."""
def group_keys(self) -> list[str]:
Expand All @@ -31,6 +40,15 @@ class AsyncGroup:
@property
def attrs(self) -> dict[str, JSONValue]:
"""The group's user attributes as a dict."""
@property
def metadata(self) -> GroupMetadataV3:
"""The group's full Zarr v3 metadata."""
@property
def consolidated_metadata(self) -> ConsolidatedMetadataV3 | None:
"""The consolidated metadata, if present in the group metadata."""
@property
def path(self) -> str:
"""The group's path in the store."""
async def array_keys(self) -> list[str]:
"""Names of the direct child arrays."""
async def group_keys(self) -> list[str]:
Expand Down
61 changes: 5 additions & 56 deletions src/array/async.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,19 +3,16 @@
use std::sync::Arc;

use crate::array::selection::PySelection;
use crate::array::shared::array_metadata_accessors;
use crate::array::util::PyChunkIndices;
use crate::chunks::PyChunkGrid;
use crate::codec::{PyCodecChain, PyCodecOptions};
use crate::codec::PyCodecOptions;
use crate::decoded_array::DecodedArray;
use crate::dtype::PyDataType;
use crate::error::ZarristaError;
use crate::node::PyNodePath;
use crate::storage::PyAsyncStorage;
use pyo3::prelude::*;
use pyo3_async_runtimes::tokio::future_into_py;
use pyo3_bytes::PyBytes;
use pythonize::pythonize;
use pythonize::Result as PythonizeResult;
use zarrs::array::Array;
use zarrs::storage::AsyncReadableWritableListableStorageTraits;

Expand All @@ -31,6 +28,9 @@ impl PyAsyncArray {
}
}

// Metadata accessors shared with `PyArray`; see `array/shared.rs`.
array_metadata_accessors!(PyAsyncArray);

#[pymethods]
impl PyAsyncArray {
/// Read a region with numpy-style basic indexing, e.g. `await arr[0:10, :, 5]`.
Expand Down Expand Up @@ -70,51 +70,6 @@ impl PyAsyncArray {
})
}

/// The array's user attributes as a dict.
#[getter]
fn attrs<'py>(&self, py: Python<'py>) -> PythonizeResult<Bound<'py, PyAny>> {
pythonize(py, self.inner.attributes())
}

#[getter]
fn chunk_grid(&self) -> PyChunkGrid {
self.inner.chunk_grid().clone().into()
}

#[getter]
fn codecs(&self) -> PyCodecChain {
self.inner.codecs().into()
}

/// The dimension names, if any were specified.
#[getter]
fn dimension_names(&self) -> &Option<Vec<Option<String>>> {
self.inner.dimension_names()
}

/// The Zarr data-type
#[getter]
fn dtype(&self) -> PyDataType {
self.inner.data_type().clone().into()
}

#[getter]
fn metadata<'py>(&self, py: Python<'py>) -> Bound<'py, PyAny> {
pythonize(py, self.inner.metadata()).unwrap()
}

/// The number of dimensions.
#[getter]
fn ndim(&self) -> usize {
self.inner.dimensionality()
}

/// The array's path in the store.
#[getter]
fn path(&self) -> &str {
self.inner.path().as_str()
}

/// Read a region of the array as `Data`, using numpy-style basic indexing.
fn retrieve_array_subset<'py>(
&self,
Expand Down Expand Up @@ -168,10 +123,4 @@ impl PyAsyncArray {
Ok(encoded.map(PyBytes::new))
})
}

/// The array shape.
#[getter]
fn shape(&self) -> &[u64] {
self.inner.shape()
}
}
1 change: 1 addition & 0 deletions src/array/mod.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
mod r#async;
mod selection;
mod shared;
mod sync;
mod util;

Expand Down
67 changes: 67 additions & 0 deletions src/array/shared.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
//! Shared `#[pymethods]` for `PyArray` / `PyAsyncArray`.
//!
//! These accessors only read from `self.inner` and perform no I/O, so they are
//! identical between the sync and async variants. The macro emits a separate
//! `#[pymethods]` block (requires the `multiple-pymethods` pyo3 feature).
macro_rules! array_metadata_accessors {
($ty:ty) => {
#[::pyo3::pymethods]
impl $ty {
/// The array's user attributes as a dict.
#[getter]
fn attrs<'py>(
&self,
py: ::pyo3::Python<'py>,
) -> ::pythonize::Result<::pyo3::Bound<'py, ::pyo3::PyAny>> {
::pythonize::pythonize(py, self.inner.attributes())
}

#[getter]
fn chunk_grid(&self) -> $crate::chunks::PyChunkGrid {
self.inner.chunk_grid().clone().into()
}

#[getter]
fn codecs(&self) -> $crate::codec::PyCodecChain {
self.inner.codecs().into()
}

/// The dimension names, if any were specified.
#[getter]
fn dimension_names(&self) -> &Option<Vec<Option<String>>> {
self.inner.dimension_names()
}

/// The Zarr data-type
#[getter]
fn dtype(&self) -> $crate::dtype::PyDataType {
self.inner.data_type().clone().into()
}

#[getter]
fn metadata(&self) -> $crate::metadata::PyArrayMetadata {
self.inner.metadata().clone().into()
}

/// The number of dimensions.
#[getter]
fn ndim(&self) -> usize {
self.inner.dimensionality()
}

/// The array's path in the store.
#[getter]
fn path(&self) -> &str {
self.inner.path().as_str()
}

/// The array shape.
#[getter]
fn shape(&self) -> &[u64] {
self.inner.shape()
}
}
};
}

pub(crate) use array_metadata_accessors;
Loading
Loading