331 changes: 331 additions & 0 deletions ATTRIBUTION_CLARITY.md
@@ -0,0 +1,331 @@
# Attribution Clarity

**Purpose:** Establish, with code-level evidence, exactly which parts of Elume are
ported from upstream sources and which parts are original Elume engineering. This
document is the authoritative reference for academic citations, licensing
conversations, and any external claim that touches Elume's relationship to
upstream work.

**Audience:** Reviewers, academic readers, license auditors, future maintainers,
and the author when answering "did you write this or did they?"

**Last reviewed against:**
- `bingreeky/MemEvolve` reference repo at `/Volumes/Asylum/repos/MemEvolve/Flash-Searcher-main/`
- LinOSS reference repo (Rusch & Rus 2024, JAX/Equinox) at `/Volumes/Asylum/repos/linoss/`

---

## TL;DR

Three categories of code in Elume relative to upstream:

1. **Direct port** — code copied (with adaptations) from an upstream Apache-2.0
source. License attribution preserved. Line citations exist in the file
header.
2. **Interface conformance** — original code that implements an upstream public
interface so Elume can be plugged into upstream's framework. No code shared.
3. **Framing-only** — original code that adopts a *conceptual framing* from an
upstream paper but shares no code with the upstream implementation. Standard
primitives, original engineering.

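Every Elume module that relates to upstream work sits in exactly one of these
three categories; the boundaries are not fuzzy. Modules with no upstream
relationship at all are marked **Original** in the lineage table.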

---

## Lineage Table

| Elume code | Category | Upstream source | License obligation |
|---|---|---|---|
| `elume.adapters.memevolve.shaping` (415 L) | **Direct port** | bingreeky/MemEvolve `EvolveLab/providers/dionysus_memory_provider.py` lines 37–40, 262–319; `EvolveLab/providers/entity_extractor.py` lines 84–92, 135–216, 342–346 | Apache-2.0 attribution preserved at top of file |
| `elume.adapters.memevolve.provider.ElumeMemoryProvider` (534 L) | **Interface conformance** | Implements `EvolveLab.base_memory.BaseMemoryProvider` interface | None (interface implementation; no shared code) |
| `elume.adapters.memevolve.{ingest,encode,retrieve,records}` | **Original** | None — supporting code for the adapter | None |
| `elume.evolution.engine.EvolutionEngine` (192 L) | **Framing-only** | Conceptual framing from Zhang et al. 2025 (arXiv:2512.18746); no code shared with `MemEvolve.core.memory_evolver.MemoryEvolver` | None |
| `elume.evolution.auto_evolver.AutoStrategyEvolver` (110 L) | **Framing-only** | Inspired by the multi-round evolution concept in MemEvolve; no code shared with `MemEvolve.core.auto_evolver.AutoEvolver` | None |
| `elume.evolution.operators.*` (186 L) | **Original** | Standard GA primitives | None |
| `elume.evolution.selection.fitness_tournament` (76 L) | **Original** | Standard GA tournament | None |
| `elume.linoss.*` (593 L total) | **Framing-only** | Algorithm from Rusch & Rus 2024 (arXiv:2410.03943); no code shared with the upstream JAX/Equinox reference | None |
| `linoss-dynamics` package (258 L; separate repo) | **Framing-only** | Same paper as above; pure NumPy implementation | None |
| `elume.basins.*` | **Framing-only** | Hopfield 1982; standard associative memory primitives | None |

---

## Per-Module Evidence

### 1. `elume.adapters.memevolve.shaping` — Direct port

**Status:** Code adapted from `bingreeky/MemEvolve` (Apache-2.0).

**Header in source file:**
```python
# Portions adapted from bingreeky/MemEvolve (Apache-2.0); see ATTRIBUTION.md.
"""
Sources adapted (Apache-2.0):
- EvolveLab/providers/dionysus_memory_provider.py lines 37–40, 262–319
- EvolveLab/providers/entity_extractor.py lines 84–92, 135–216, 342–346
"""
```

**Verified against upstream:**

Upstream `EvolveLab/providers/dionysus_memory_provider.py:37-40`:
```python
MEMEVOLVE_PHASE_MEMORY_TYPES = {
    MemoryStatus.BEGIN: ["strategic", "procedural", "semantic"],
    MemoryStatus.IN: ["episodic", "context", "semantic"],
}
```

Elume `src/elume/adapters/memevolve/shaping.py`:
```python
PHASE_MEMORY_TYPES: dict[str, list[str]] = {
    "begin": ["strategic", "procedural", "semantic"],
    "in": ["episodic", "context", "semantic"],
}
```

The data is identical. The variable name was de-prefixed (the surrounding module
path already says `memevolve`) and the enum keys were stringified to remove the
upstream `MemoryStatus` dependency. **This is a port. Attribution is preserved.**

The same verification holds for `PII_PATTERNS` (upstream
`entity_extractor.py:84-92`) and for the tool/source patterns.
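
The key-stringification step described above can be sketched as follows. This is an illustration only: the `MemoryStatus` enum here is a hypothetical stand-in whose member values are assumed, and `stringify_keys` is a helper invented for the example, not a function in either codebase.

```python
from enum import Enum


class MemoryStatus(Enum):
    """Hypothetical stand-in for the upstream enum; member values are assumed."""

    BEGIN = "begin"
    IN = "in"


# Enum-keyed mapping in the shape of the upstream constant.
UPSTREAM_PHASE_TYPES = {
    MemoryStatus.BEGIN: ["strategic", "procedural", "semantic"],
    MemoryStatus.IN: ["episodic", "context", "semantic"],
}


def stringify_keys(mapping):
    """Replace enum keys with their string values and copy each list."""
    return {status.value: list(types) for status, types in mapping.items()}


# String-keyed result: same data, no dependency on the enum at lookup time.
PHASE_MEMORY_TYPES = stringify_keys(UPSTREAM_PHASE_TYPES)
```

The data survives unchanged; only the key type differs, which is why the module is classified as a port rather than original work.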

**Approved phrasing:**
- "We adapt the phase-memory-type mapping and PII patterns from MemEvolve under
Apache-2.0 (see ATTRIBUTION.md)."
- "The shaping helpers in `elume.adapters.memevolve.shaping` are ported from
bingreeky/MemEvolve."

**Prohibited phrasing:**
- "We wrote our own PII patterns." (False — they're ported.)

---

### 2. `elume.adapters.memevolve.provider.ElumeMemoryProvider` — Interface conformance

**Status:** Original code that implements upstream's `BaseMemoryProvider`
interface so Elume can be plugged into MemEvolve's benchmark suite as a
`--memory_provider` choice.

**Upstream interface** (`EvolveLab/base_memory.py:10`):
```python
class BaseMemoryProvider(ABC):
    ...
```

**Elume implementation:** 534 lines, original, no shared code with any upstream
provider. Distinguished from upstream's own providers (`DionysusMemoryProvider`,
`AgentKBProvider`, `VoyagerMemoryProvider`, etc.) by the `Elume` prefix.

The relationship is identical to writing a `dict`-compatible class in Python:
you implement the interface methods; you do not port `dict`.
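
To make the analogy concrete, here is a minimal sketch of interface conformance using the standard library's `collections.abc.MutableMapping`. The class `LowercaseKeyMap` is invented for this example; it implements the five abstract methods and shares no code with `dict`.

```python
from collections.abc import MutableMapping


class LowercaseKeyMap(MutableMapping):
    """Mapping that normalizes string keys to lowercase (illustrative only)."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = LowercaseKeyMap()
m["Alpha"] = 1
```

Any caller that expects a mapping can use `m`, including inherited helpers such as `get` and `items`, even though none of the code above was copied from `dict`. `ElumeMemoryProvider` stands in the same relation to `BaseMemoryProvider`.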

**Approved phrasing:**
- "ElumeMemoryProvider conforms to MemEvolve's `BaseMemoryProvider` interface."
- "The first deterministic baseline in MemEvolve's `--memory_provider` list."

**Prohibited phrasing:**
- "ElumeMemoryProvider is a port of MemEvolve's provider." (False — it's
original code that implements the upstream public interface.)

---

### 3. `elume.evolution.engine.EvolutionEngine` — Framing-only

**Status:** Original Elume engineering. Adopts the conceptual *framing* from
Zhang et al. 2025 — "agent memory as an evolvable population rather than policy
weights" — but shares no code with the upstream implementation.

**Upstream class with the closest functional role**
(`MemEvolve/core/memory_evolver.py:27`):
```python
class MemoryEvolver:
    """Core memory evolution orchestrator"""
    # Imports: openai, dotenv, MemorySystemCreator, PhaseAnalyzer,
    # PhaseGenerator, PhaseValidator
```

Upstream `MemoryEvolver` is an **LLM-driven multi-phase orchestrator**: it calls
GPT-class models to analyze, generate, and validate memory systems. It depends
on `openai` and a multi-phase pipeline of `PhaseAnalyzer` /
`PhaseGenerator` / `PhaseValidator`.

**Elume `EvolutionEngine`** is a **pure deterministic GA stepper**: it reads
strategies from a `MemoryProvider`, applies tournament selection plus mutation
operators, and writes children back. It depends on `numpy` only. There is no
LLM call, no multi-phase pipeline, no analyzer/validator concept.
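
A deterministic tournament step of the kind described here can be sketched in a few lines. This is not Elume's code: the function names are hypothetical, and the standard library's `random.Random` is swapped in for `numpy.random.Generator` so the sketch has no dependencies. The point it illustrates is that an injected, seeded generator makes selection replayable.

```python
import random


def tournament_select(names, fitness, rng, k=2):
    """Pick k distinct contenders at random and return the fittest one."""
    contenders = rng.sample(names, k)
    return max(contenders, key=lambda name: fitness[name])


def select_parents(names, fitness, seed, rounds):
    """Run several tournaments from a freshly seeded generator."""
    rng = random.Random(seed)
    return [tournament_select(names, fitness, rng) for _ in range(rounds)]


names = ["a", "b", "c", "d"]
# Fitness scores are supplied by the caller, keyed by strategy name.
fitness = {"a": 0.1, "b": 0.7, "c": 0.4, "d": 0.9}

first = select_parents(names, fitness, seed=42, rounds=5)
second = select_parents(names, fitness, seed=42, rounds=5)
```

Because both runs start from the same seed, `first` and `second` are identical, and the lowest-fitness strategy can never win a two-way tournament.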

**Code shared between the two:** zero lines.

**Engineering contributions specific to Elume `EvolutionEngine`:**
- Byte-identical replay (injectable `numpy.random.Generator`, deterministic
seed)
- Immutable `Strategy` records — `frozen=True`, every mutation produces a new
child via `.evolved()` with `parent_name` set
- Provider boundary — engine holds no in-memory population; all reads/writes go
through `MemoryProvider`
- Composable `MutationOperator` / `CrossoverOperator` protocols
- Fitness scores live outside the `Strategy` model (in caller-supplied dicts)
to preserve immutability invariants
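
The immutability and lineage points in the list above can be sketched with a frozen dataclass. The field names other than `parent_name`, and the signature of `evolved()`, are assumptions made for illustration; Elume's actual `Strategy` model may differ.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Strategy:
    """Illustrative immutable record; fields are assumed, not Elume's schema."""

    name: str
    params: Tuple[float, ...]
    parent_name: Optional[str] = None

    def evolved(self, name, params):
        """Return a new child record that points back to this one."""
        return replace(self, name=name, params=tuple(params), parent_name=self.name)


parent = Strategy(name="s0", params=(0.5, 1.0))
child = parent.evolved("s1", [0.6, 1.0])

# Fitness lives outside the record, so scoring never mutates a Strategy.
fitness = {parent.name: 0.42, child.name: 0.57}

try:
    parent.name = "renamed"  # frozen dataclasses reject attribute assignment
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Keeping scores in a caller-owned dict means a strategy can be re-scored under a different fitness function without producing a new record or breaking the lineage chain.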

**The framing reference**, which is what Elume adopts from MemEvolve, is one
sentence at the conceptual level: *agent memory should be evolvable rather than
just trained*. Elume's implementation of that idea uses standard genetic
algorithm primitives (tournament selection, elitism, mutation operators) that
predate MemEvolve by decades.

**Approved phrasing:**
- "We implement a MemEvolve-style evolution loop."
- "Elume adopts the evolvable-memory-population framing from Zhang et al. 2025;
the implementation is original Elume work using standard GA primitives."
- "Elume contributes the engineering substrate: byte-identical replay, immutable
lineage, provider abstraction, and composable operator protocols."

**Prohibited phrasing:**
- "We ported MemEvolve's evolution engine." (**False.** No code is shared.)
- "Elume reimplements MemoryEvolver." (**False.** Different abstraction —
upstream is LLM/multi-phase; Elume is GA/deterministic.)
- "Elume's `EvolutionEngine` is based on MemEvolve's." (**Misleading.** The
framing is, the code is not.)

---

### 4. `elume.evolution.auto_evolver.AutoStrategyEvolver` — Framing-only (renamed)

**Status:** Original Elume engineering. **Renamed from `AutoEvolver` to
`AutoStrategyEvolver` in v0.3.0** to eliminate the name collision with upstream's
`MemEvolve.core.auto_evolver.AutoEvolver`, which has an identical name but
materially different semantics: upstream is a heavyweight benchmark
orchestrator, while Elume's class is a thin GA loop wrapper.

**Why the rename was necessary:**

| | Upstream `AutoEvolver` | Elume `AutoStrategyEvolver` |
|---|---|---|
| Role | Dataset benchmark orchestrator | Pure GA loop wrapper |
| Dependencies | LLMs (GPT-5), ThreadPoolExecutor, dotenv, datasets, dataset configs | numpy only |
| Constructor args | 15+ (`analysis_model_id`, `gen_model_id`, `work_root`, `dataset_name`, `run_provider`, `creativity_index`, `task_batch_x`, `top_t`, `max_workers`, `use_pareto_selection`, …) | 4 (`engine`, `fitness_fn`, `max_generations`, `stop_on_plateau`) |
| Behavior | Runs N rounds against eval datasets with concurrent workers | Loops `evolve_one_generation` until plateau |
| Lines | ~80 just for constructor signatures | 110 total |

A reader who saw `AutoEvolver(engine, fitness_fn)` and assumed it behaved like
the paper's `AutoEvolver(analysis_model_id=..., dataset_name="gaia")` would get
wildly different results. Namespace alone (`elume.evolution.auto_evolver` vs
`MemEvolve.core.auto_evolver`) handles import disambiguation but does not
prevent reader confusion when the class name appears in prose, talks, papers,
or stack traces.

**Renaming policy applied:** Where Elume's class name collides with an upstream
class of the same name with materially different semantics, the Elume name is
disambiguated. Where the name does not collide, namespace alone is sufficient
(the `pandas.DataFrame` vs `polars.DataFrame` precedent).

**Code shared between upstream `AutoEvolver` and Elume `AutoStrategyEvolver`:**
zero lines.

**Approved phrasing:**
- "Elume's `AutoStrategyEvolver` runs a deterministic generation loop over an
`EvolutionEngine`."
- "The class is a thin wrapper that evaluates fitness and detects plateaus —
it is not equivalent to MemEvolve's heavyweight `AutoEvolver`."
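
For readers unfamiliar with the pattern, a plateau-detecting generation loop can be sketched as below. The function name, the `patience` and `tol` parameters, and the plateau rule itself are assumptions for illustration; the exact semantics of `stop_on_plateau` in `AutoStrategyEvolver` are not specified here.

```python
def run_until_plateau(evolve_one_generation, max_generations, patience=2, tol=1e-9):
    """Call the step function until the best score stops improving.

    evolve_one_generation: zero-argument callable returning that generation's scores.
    patience: consecutive non-improving generations tolerated before stopping.
    """
    best_so_far = float("-inf")
    stale = 0
    history = []
    for _ in range(max_generations):
        best = max(evolve_one_generation())
        history.append(best)
        if best > best_so_far + tol:
            best_so_far = best
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history


# Canned per-generation scores stand in for a real engine.
canned = iter([[1.0], [2.0], [2.0], [2.0], [5.0]])
history = run_until_plateau(lambda: next(canned), max_generations=10)
```

With the canned scores, the loop stops after the second non-improving generation, so the final `5.0` is never evaluated. Nothing in the loop touches datasets, worker pools, or LLMs, which is the contrast with upstream's `AutoEvolver` that the table draws.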

---

### 5. `elume.linoss` and `linoss-dynamics` — Framing-only

**Status:** Original implementations of the LinOSS algorithm from Rusch & Rus
2024 (arXiv:2410.03943). No code shared with the JAX/Equinox upstream reference
implementation.

**Upstream reference** (`/Volumes/Asylum/repos/linoss/models/LinOSS.py`):
- `class GLU(eqx.Module)`, `class LinOSSLayer(eqx.Module)`,
`class LinOSSBlock(eqx.Module)`, `class LinOSS(eqx.Module)` — Equinox neural
network layers
- `def apply_linoss_im(...)`, `def apply_linoss_imex(...)` — JAX scan operations
- Dependencies: `equinox`, `jax`, `jax.numpy`

**Elume / linoss-dynamics:**
- `def linoss_step(...)`, `def damped_linoss_step(...)`, `def linoss_step_impl(...)`
— pure NumPy step functions
- `def energy(...)`, `def delta_energy(...)`, `def convergence_window(...)` —
energy diagnostics not present in upstream
- `class LinOSSError`, `class InvalidShapeError`, `class InvalidDampingError`,
`class UnsupportedModeError` — error hierarchy not in upstream
- Dependencies: `numpy` only
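
To show what a step function plus an energy diagnostic looks like in this style, here is a minimal sketch of a backward-Euler (implicit) update for the forced oscillator y' = z, z' = -a·y + b·u, which is the kind of second-order system LinOSS is built on. The names `implicit_step` and `oscillator_energy` and their signatures are hypothetical and are not the `linoss-dynamics` API. The code uses plain arithmetic, so it works on floats and would also work elementwise on NumPy arrays.

```python
def implicit_step(y, z, a, b, u, dt):
    """One backward-Euler step for y' = z, z' = -a*y + b*u.

    Solving z_new = z + dt*(-a*y_new + b*u) together with
    y_new = y + dt*z_new gives the closed form below.
    """
    z_new = (z + dt * (-a * y + b * u)) / (1.0 + dt * dt * a)
    y_new = y + dt * z_new
    return y_new, z_new


def oscillator_energy(y, z, a):
    """Kinetic plus potential energy of the unforced oscillator."""
    return 0.5 * (z * z + a * y * y)


y0, z0, a, dt = 1.0, 0.0, 1.0, 0.1
e0 = oscillator_energy(y0, z0, a)
y1, z1 = implicit_step(y0, z0, a, b=0.0, u=0.0, dt=dt)
e1 = oscillator_energy(y1, z1, a)
```

With no forcing, this update scales the energy by exactly 1 / (1 + dt²·a) per step, so an energy diagnostic gives a direct check that a run is dissipating as expected.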

**Class name collision:** None. Upstream uses `LinOSSLayer`, `LinOSSBlock`,
`LinOSS` (Equinox modules); Elume / linoss-dynamics uses functions and error
classes. The two APIs do not overlap.

**Approved phrasing:**
- "Elume implements the LinOSS algorithm from Rusch & Rus 2024."
- "`linoss-dynamics` provides a pure-NumPy reference implementation with energy
diagnostics."

**Prohibited phrasing:**
- "Elume ports the LinOSS reference implementation." (**False.** Upstream is
JAX/Equinox; Elume is NumPy. Same algorithm, different code.)

---

## Naming Policy (canonical)

When Elume code shares a name with upstream, the policy is:

1. **If Elume code is a port** → use the upstream name (after de-prefixing for
namespace cleanliness). Lineage should be obvious.
2. **If Elume code conforms to an upstream interface** → distinguish with a
clear prefix (`Elume…`).
3. **If Elume code is original but the name happens to match upstream**:
- Same general semantics → namespace handles it (e.g. `pandas.DataFrame`
precedent).
- Materially different semantics → **rename**. The collision is the bug.

The `AutoEvolver` → `AutoStrategyEvolver` rename in v0.3.0 is the only case
where this policy required action. Namespace handles all other cases.

---

## Defensive Phrasings (paste into papers / READMEs verbatim)

### Acceptable

> Elume's evolution module is a deterministic, replay-safe genetic algorithm
> operating on immutable `Strategy` records through a provider boundary. The
> framing — agent memory as an evolvable population rather than policy weights
> — is adopted from MemEvolve (Zhang et al. 2025, arXiv:2512.18746). The
> implementation is original Elume work using standard GA primitives. What
> Elume contributes is the engineering substrate: byte-identical replay,
> immutable lineage, provider abstraction, and composable operator protocols.

> The `elume.adapters.memevolve.shaping` module includes helpers ported from
> bingreeky/MemEvolve under Apache-2.0 (see `ATTRIBUTION.md`). All other
> evolution code in Elume is original.

> Elume implements the LinOSS algorithm (Rusch & Rus 2024) in pure NumPy with
> energy diagnostics. The reference upstream implementation is in JAX/Equinox;
> no code is shared between the two.

### Unacceptable

- "Elume is a port of MemEvolve." (False.)
- "Elume reimplements the MemEvolve evolution engine." (False: only the
framing is adopted.)
- "Elume contains MemEvolve code." (Misleading without qualification: only
`shaping.py` contains ported MemEvolve code; everything else is original.)

---

## Maintenance

- When adding new code to `elume.evolution`, `elume.linoss`, `elume.basins`, or
`elume.adapters.memevolve`, update the lineage table above.
- When adopting new code from any upstream source, add the file header citation
pattern used in `shaping.py` and update this document and `ATTRIBUTION.md`.
- When a new collision-risk class name is introduced, run the audit:
`grep -rn "^class <Name>" /Volumes/Asylum/repos/MemEvolve/`. If a match appears,
apply the naming policy above.
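
The collision audit can also be scripted. The sketch below is one possible shape, not an existing Elume tool: `find_class_definitions` is a hypothetical helper that walks a directory tree and reports every Python file and line where a class of the given name is defined. It is demonstrated against a temporary directory rather than the real upstream checkout.

```python
import re
import tempfile
from pathlib import Path


def find_class_definitions(root, class_name):
    """Return (path, line number) pairs where `class <class_name>` is defined."""
    pattern = re.compile(rf"^\s*class\s+{re.escape(class_name)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if pattern.match(line):
                hits.append((str(path), lineno))
    return hits


# Demo against a throwaway tree that mimics an upstream layout.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "core" / "auto_evolver.py"
    sample.parent.mkdir()
    sample.write_text("import os\n\nclass AutoEvolver:\n    pass\n")
    hits = find_class_definitions(tmp, "AutoEvolver")
    misses = find_class_definitions(tmp, "AutoStrategyEvolver")
```

A non-empty result for a proposed name is the signal to apply the naming policy; an empty result means namespace disambiguation is sufficient.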

---

*Authored: 2026-05-05. Author: Mani Saint-Victor (drmani215@gmail.com).*