Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions .docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,9 +51,9 @@

# -- Options to show "Edit on GitHub" button ---------------------------------
html_context = {
"display_github": True, # Integrate GitHub
"github_user": "aneoconsulting", # Username
"github_repo": "ArmoniK.Api", # Repo name
"github_version": "main", # Version
"conf_py_path": "/.docs/", # Path in the checkout to the docs root
"display_github": True,
"github_user": "aneoconsulting",
"github_repo": "PymoniK",
"github_version": "main",
"conf_py_path": "/.docs/",
}
121 changes: 102 additions & 19 deletions .docs/development/contribution.md
Original file line number Diff line number Diff line change
@@ -1,21 +1,104 @@
# Contributing

This doesn't differ from our other projects ([Read ArmoniK.CLI's CONTRIBUTING.md](https://github.com/aneoconsulting/ArmoniK.CLI/blob/main/CONTRIBUTING.md)).

## Open Issues

Here's a non-exhaustive list of things that are outright/partially missing from PymoniK that we'd like to see added in:

- **Unit tests, end-to-end tests** : This project doesn't have any testing associated to it, 0% code coverage. We'd like to change this.
- **More sophisticated examples** : We'd like to add even more examples and tutorials of common use cases that use ArmoniK under the hood.
- **PymoniK logger** : The session created/closed/cancelled prints and the different errors should be logged instead of being printed.
- **Local PymoniK** : Once of the big advantages of the way things have been coded is being able to switch from/to a remote/local context by just removing the `invoke` methods. This could be done even better, by adding a `local=True` flag to PymoniK that makes it so invokes are executed as regular function calls that run locally. The challenge is mainly handling `Pymonik.put`s and the map_invoke.
- **Rename `ResultHandle` to `ObjectHandle`** : The naming of `ResultHandle` was choosen because invocations return results, but as it turns out, these results are also then served as inputs. It'd be better for naming (especially since you can put things onto ArmoniK) to rename `ResultHandle` to the more generic `ObjectHandle`.
- **Cleaner sub-tasking** : Subtasking requires the user to pass in a `delegate=True` flag to invokes, this isn't particularly clean or nice. There must be a better way of doing it.
- **Tasks returning multiple results**: As of right now, a task can only return a single result object (even when you return a tuple). There should be support for cases where you'd like to return multiple results from a task and not have them be grouped up into one (multiple smaller objects being passed onto multiple tasks). We think this change would be easier to implement once sub-tasking is in place, since it also involves analyzing what the user will return. There should be pre-execution tests to check if the user is returning a different number of results in different branches of the task and that should result in a failure (invalid task).
- **`results.as_completed`** : For more sophisticated and better time-to-execute, we'd like to implement a method for `MultiResultHandle` that allows the user to for instance loop through a `MultiResultHandle` and have the code execute as the result is done/retrieved. Moreover, as a side-effect, having this feature would allow for the usage of `tqdm` to create progress bars which is also really nice.
- **Intermediate Objects** : Support being able to create/download ArmoniK Objects (Intermediate results) within tasks. The download part would require `GetDirectData` in the Python API.
- **Remote to local error propagation** : Supply an additional "error name" to created tasks, when a result creation fails, we create this result, locally we can retrieve the remote stack trace using `my_result.error()` if we try-except, either that are we enrich the local result failure grpc exception with the remote one.
- **Test JIT-ing** : jitting tasks using namba/taichi if they're pure for additional performance. (Should just test if it'd work as intended..)
- **More sophisticated result deserialization**: Right now it's just first-level depickling, it'd be nice to be able to pass in a dict that has ResultHandle values for example and be able to dynamically fetch those but that'd add a lot of complexity to data dependencies that'd need to be handled. There has to be a nice way of doing this and it's worth exploring
- **PymoniK Visualizer**: With the current implementation of invoke/map_invoke, we can make it so you can surround your PymoniK context with a Grapher context that dynamically builds up a visualization of your task graph that you can then save and look at/analyze/vizualize later.
Thank you for considering a contribution. PymoniK is a small,
opinionated SDK and we'd like to keep it that way — but there's
plenty to do, and outside perspectives are welcome.

For repo-wide conventions, see ANEO's
[contribution guidelines on ArmoniK.CLI](https://github.com/aneoconsulting/ArmoniK.CLI/blob/main/CONTRIBUTING.md);
PymoniK follows the same shape.

## Before you start

- For non-trivial changes, open an issue to discuss the approach
before writing code. Saves rounds.

## What we'd particularly like help with

Roughly in priority order — none of these are claimed; happy to
talk through any of them.

### Production-readiness

- **`pymonik image build` CLI.** Read the user's `pyproject.toml`,
render a Dockerfile from a template, run `docker build`, print the
tag. The most-asked-for missing piece.
- **OIDC / bearer-token credentials.** The `Credentials` class only
handles mTLS. Ship a `BearerCredentials(token_provider=...)` that
plugs into the gRPC channel via the metadata callback.
- **`pymonik doctor` CLI.** Hits the cluster's `Versions` and
`Health` services, reports cluster compatibility with the local
pymonik version, surfaces obvious misconfigs (events stream
reachable, partition exists, AKCONFIG sane).

### Observability

- **Wire OTel into ArmoniK upstream** so cluster-side spans (polling
agent, control plane, agent sidecar) chain into PymoniK's. The W3C
trace context already propagates; the cluster just needs to emit
spans under it. This is an upstream contribution, not a PymoniK PR.
- **Notebook display hooks.** `Future.__repr__` rendering a progress
bar in Jupyter; `FutureList` showing a per-task heatmap.

### Performance

- **Cross-session blob reuse.** Use ArmoniK's `Results.import_data` to
bind a fresh result id to data already uploaded in a prior session,
driven by a local `~/.cache/pymonik/blobs/` hash-to-opaque-id index.
- **Per-session warm subprocess** for `deps=` + `isolate=True`. Spawn
one child Python at session-open time, feed tasks through a Unix
socket. Drops per-task startup from ~500 ms to ~1 ms while
preserving subprocess isolation.

### Async core

- Drop the threading completion loop, port the events stream to
`grpc.aio`, unify `Future` on a single `anyio.Event`. The threading
bridge in `Future` is the largest piece of accidental complexity in
the codebase.

### Tests and examples

- More end-to-end tests against a `testcontainers`-spun ArmoniK.
- `hypothesis` round-trip tests for the envelope and refs.

### Documentation

- This doc tree is a fresh rewrite; it'll have rough edges. Reading
through any of the guides and filing an issue (or PR draft) for
things that confused you is genuinely valuable.
- Worked examples for fault tolerance — show what happens when a
worker pod gets evicted mid-task, and how `retries=` covers it.

## Small but appreciated

- Typo fixes, dead-link fixes, doctest fixes.
- Ruff / pyright cleanups in `_internal/`.
- More attribute coverage on existing OTel spans (anything that'd
help filter in a UI).

## What we generally don't want

- **Major API churn** — the public surface (decorator, `.spawn`,
`.map`, `Future`, `Blob`, `Materialize`) is mostly settled.
Suggest naming changes via an issue first; don't rename in a PR.
- **Adding heavy dependencies** to the runtime. The current set is
deliberate. New deps need a strong "it would be much worse to
hand-roll this" argument.
- **Hiding ArmoniK from users.** PymoniK wraps the lower-level
`armonik` package; it doesn't try to replace it. Anywhere you'd
reach for `armonik.client.*`, that should still work alongside the
PymoniK API.

## How to ship a PR (when you have permissions)

The maintainer's workflow:

1. Local commits, no force pushes to shared branches.
2. Run the fast suite: `uv run pytest -m "not slow"`.
3. Run pyright: `uv run basedpyright src/pymonik`.
4. Run ruff: `uv run ruff check && uv run ruff format`.
5. If you touched `worker.py` or anything in `_internal/`, rebuild
the worker image and restart the partition; rerun a
representative example end-to-end against the rebuilt cluster.
6. Open the PR with a clear description.
115 changes: 101 additions & 14 deletions .docs/development/development.md
Original file line number Diff line number Diff line change
@@ -1,31 +1,118 @@
# Developing PymoniK

We'll be covering some basic information to help you in working on and developing PymoniK
This page covers what you need to know to work *on* PymoniK itself —
not on top of it.

## Requirements
## Prerequisites

We're using `uv` throughout the project, so please make sure that you have it installed. You can refer to their [official `uv` installation guide](https://docs.astral.sh/uv/getting-started/installation/)
- Python 3.11. PymoniK pins to 3.11 because cloudpickle isn't
cross-minor-compatible with the worker image — tests and
`LocalCluster` need to match what the worker runs.
- [`uv`](https://docs.astral.sh/uv/) for project management.
- Docker, if you'll touch worker images or run integration tests
against a real cluster.

## Test client
## Layout

The test client contains some basic examples of working with PymoniK, PymoniK is installed in editable mode `uv add ../pymonik --editable`, it's useful to just create a python file there for testing and then `uv run`ning it to quickly iterate on PymoniK. Keep in mind that if you make changes that affect how the worker functions (obviously like making a change to the `worker.py` file), you'll have to reload the worker image. You can do this by running the following command:
```
src/pymonik/ # the package
__init__.py # public API re-exports
client.py # PymonikClient
session.py # Session + completion loops
task.py # @task decorator, Task wrapper
future.py # Future, FutureList
options.py # TaskOpts, merge semantics
envelope.py # wire format (msgspec)
blob.py # Blob, Materialize
worker.py # pymonik-worker entrypoint
worker_session.py # from-inside-a-worker submission
context.py # pymonik.current() / WorkerContext
errors.py # PymonikError hierarchy
composition.py # gather, as_completed
testing/ # LocalCluster
cli/ # pymonik CLI (stub today)
_internal/ # not part of the public API
submit.py # shared submission pipeline
refs.py # FutureRef / BlobRef / MaterializeRef
env_builder.py # uv venv + flock for runtime deps
subprocess_dispatch.py # deps + isolate=True path
task_runner.py # subprocess child entrypoint
exec_cache.py # local result cache
query.py # fluent introspection
info.py # TaskInfo / ResultInfo / ...
channel.py # gRPC channel helpers
_otel.py # OpenTelemetry helper
_logging.py # opt-in structlog setup
examples/ # runnable examples (also CI-gated)
tests/ # pytest suite
.docs/ # this documentation (Sphinx + MyST)
worker-image/ # Dockerfile for the harmonic_snake worker
```

The `_internal/` prefix marks code that may change without notice.
Anything re-exported from `pymonik/__init__.py` is part of the public
API and follows semver-ish rules within the alpha.

```bash
kubectl rollout restart deployment/compute-plane-pymonik #(1) -n armonik #(2)
## Running tests

```sh
uv sync # one-time install
uv run pytest # everything
uv run pytest -m "not slow" # skip slow integration tests (no `uv` install)
uv run pytest tests/test_otel.py -v # one file
```

1. This should be compute-plane-(NAME OF YOUR PYMONIK PARTITION).
2. If you're deploying locally the namespace is typically armonik, otherwise use the namespace of your kubernetes cluster
The test suite is divided:

- **Fast tests** (~30) — pure unit tests, no network, no `uv venv`
builds. Run in seconds. These are what CI runs on every push.
- **Slow tests** marked `@pytest.mark.slow` — exercise the runtime
deps path with a real `uv` install. Need `uv` on `PATH`. Skip on
Windows (the subprocess wire is POSIX-only for now).

The `LocalCluster` exercises the same submission pipeline as the
real client, so most behavioural tests don't need a cluster. Only
tests that depend on cluster-side behaviour (partition scheduling,
events stream over the network) need a live ArmoniK; mark those
`@pytest.mark.e2e` and run them separately when you have a deploy.

## Type checking

## Automation Script (`automation.py`)
```sh
uv run basedpyright src/pymonik
```

The `automation.py` script at the root of the project should help you realize most of your development tasks, it also auto-installs development dependencies.
The codebase aims for `typeCheckingMode = "standard"`. New code
should be fully annotated; private helpers may skip annotations when
obvious.

For example, if you want to access the documentation offline of if you're working on it (thank you!) then you can use the `serve-docs` command.
A few upstream-typing quirks (anyio's `to_thread.run_sync` overload
resolution, armonik's `Result` field types) produce false positives
in `Session` / `WorkerSession` / `task.py`. These predate the
revamp and aren't from new changes — leave them be unless you're
fixing them upstream.

To see a list of all available commands and their general descriptions, you can run:
## Linting and formatting

```sh
uv run ruff check
uv run ruff format
```
uv run automation.py --help

Ruff replaces black + flake8 + isort. Configuration is in
`pyproject.toml` under `[tool.ruff]`.

## Working against a cluster

If you're touching code that affects the worker (anything in
`worker.py` or `_internal/`), you'll need to rebuild the image and
restart the partition:

```sh
docker build -t my-org/harmonic_snake:dev worker-image/
docker push my-org/harmonic_snake:dev # or load directly into your kind cluster
kubectl rollout restart deployment/compute-plane-pymonik -n armonik
```

For client-only changes, just `uv sync` (or run with the editable
install) and re-run your client script — no image rebuild needed.
105 changes: 103 additions & 2 deletions .docs/examples/monte_carlo.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,104 @@
# Distributed Monte Carlo for PI Estimation
# Monte Carlo: estimating π

This page hasn't been written yet, but the example code for it is in `test_client/estimate_pi.py`.
A short example that hits every basic primitive: a worker function,
`map` for fan-out, `spawn` for fan-in.

The full source lives at `examples/estimate_pi.py`.

## The idea

Monte Carlo estimation of π: throw N random points in the unit square;
the fraction inside the unit quarter-circle is approximately π/4. The
more points, the better the estimate.

It's embarrassingly parallel — every chunk of points is independent,
and the only fan-in is a sum. Perfect for ArmoniK.

## The code

```python
import random
from pymonik import PymonikClient, task


@task
def count_inside(n: int, seed: int) -> int:
"""How many of `n` random points fall inside the unit quarter circle?"""
rng = random.Random(seed)
inside = 0
for _ in range(n):
x = rng.random()
y = rng.random()
if x * x + y * y <= 1.0:
inside += 1
return inside


@task
def estimate(total_inside: int, total_points: int) -> float:
"""Combine the per-shard counts into a single π estimate."""
return 4.0 * total_inside / total_points


@task
def add_all(xs: list[int]) -> int:
return sum(xs)


def run(total_points: int = 10_000_000, num_tasks: int = 32) -> float:
points_per_task = total_points // num_tasks

with PymonikClient() as client:
with client.session(partition="pymonik") as s:
shards = count_inside.starmap(
(points_per_task, i) for i in range(num_tasks)
)
total_inside = add_all.spawn(shards)
pi = estimate.spawn(total_inside, num_tasks * points_per_task)
return pi.result(timeout=120)


if __name__ == "__main__":
print(f"π ≈ {run()}")
```

## What's happening

- `count_inside.starmap(...)` submits 32 tasks in one gRPC call. Each
one runs in parallel on whatever workers ArmoniK schedules. Returns
a `FutureList[int]`. We use `starmap` because we already have arg
tuples; if the per-task args were single values, `count_inside.map(iter)`
would be cleaner.
- `add_all.spawn(shards)` passes the `FutureList` directly. PymoniK
rewrites each upstream future as a data dependency — `add_all` won't
run until every `count_inside` has finished. The client doesn't
block.
- `estimate.spawn(total_inside, ...)` chains again: `estimate` waits
for `add_all` via the same mechanism.
- Only `pi.result(timeout=120)` blocks. By the time the client wakes
up, the entire DAG (32 + 1 + 1 = 34 tasks) has run.

## Tweaking

- **More accuracy?** Bigger `total_points`. Each shard is independent,
so increasing the count just makes the leaves heavier.
- **More parallelism?** Bigger `num_tasks`. Each shard is small enough
that the overhead of submission per task starts to matter at a few
hundred — you'll see diminishing returns.
- **Reproducible?** Drop the seed indirection and pass a fixed seed.
The worker uses Python's stdlib `random`, which is deterministic
given a seed.

## Variations worth trying

1. Replace `count_inside` with one that uses `numpy.random` for
speed — declare `client.session(deps=["numpy"])` to make numpy
available without rebuilding the image. See
[Runtime environment](../guides/runtime-environment.md).
2. Use the local exec cache to skip re-running shards:
`PymonikClient(cache=True)` + `@task(cache=True)` on
`count_inside`. Re-running the script with the same args returns
instantly on the second invocation.
3. Use `LocalCluster` to run the whole thing in-process for a unit
test — no cluster needed. See
[Local testing](../guides/local-testing.md).
Loading