MemoryFlow audits the dynamic memory of LLM agents from declared event streams. It answers operational questions such as:
- Did the agent use newly written memory?
- Did it keep using stale, deleted, or superseded memory?
- Were memory updates verified before their deadlines?
- Were failed verifications corrected in time?
- Which metrics are valid, degraded, non-comparable, or not computable?
MemoryFlow is not a memory algorithm, vector database, embedding model, retrieval strategy, prompt template, or tracing dashboard. It is a telemetry verifier: it measures what your agent declares through deterministic memory events, and it does not claim to know hidden truth inside your memory store.
Research basis: Takahashi, K. (2026). MemoryFlow: Real-Time, Implementation-Agnostic Telemetry for Measuring Dynamic Memory Quality in LLM Agents. Zenodo. https://doi.org/10.5281/zenodo.18136347
Most agent observability tells you that retrieval happened. MemoryFlow checks whether the declared memory behavior is auditable:
- deterministic event ordering, independent of arrival order
- exact rational arithmetic for weighted metrics
- profile-specific fail-closed or downgrade behavior
- version binding with `entry_id`, `update_id`, and `content_digest`
- explicit statuses instead of silently reporting weak telemetry as valid
- no dependency on any LLM framework, vector DB, memory store, or network service
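As an illustration of the first point, the sketch below sorts events by a fixed key so that the result is the same whatever order the lines arrived in. The key used here (`obs_time`, then `collector_id`, then `collector_seq`) is an assumption made for this example, and `tool-runner` is a hypothetical second collector; the event schema documentation defines the actual ordering rule.

```python
import random


def order_key(event: dict) -> tuple:
    # Assumed ordering key, for illustration only.
    # Fixed-width UTC timestamps in one format compare correctly as strings.
    return (event["obs_time"], event["collector_id"], event["collector_seq"])


def canonical_order(events: list) -> list:
    # sorted() returns a new list; the input (arrival) order is left untouched.
    return sorted(events, key=order_key)


events = [
    {"collector_id": "agent-runtime", "collector_seq": 1,
     "obs_time": "2026-01-03T01:00:00.000Z"},
    {"collector_id": "agent-runtime", "collector_seq": 2,
     "obs_time": "2026-01-03T01:00:01.000Z"},
    {"collector_id": "tool-runner", "collector_seq": 1,  # hypothetical collector
     "obs_time": "2026-01-03T01:00:01.000Z"},
]

# Simulate out-of-order arrival with a seeded shuffle.
shuffled = events[:]
random.Random(7).shuffle(shuffled)

# Both arrival orders yield the same canonical sequence.
same = canonical_order(shuffled) == canonical_order(events)
print(same)
```

The property that matters is that the key gives a total order: as long as each `(collector_id, collector_seq)` pair is unique, no two events tie, so the sorted result does not depend on the input order.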
Use it as a local CLI, a CI check, a library module, or an adapter layer inside a larger evaluation system. The Python package is the reference verifier, not a framework requirement. Any agent runtime in any language can use MemoryFlow by emitting the documented JSONL event stream.
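Because the contract is the JSONL stream rather than the Python package, a runtime can produce events with nothing more than a JSON encoder. The sketch below uses only the standard library and the field names from the event examples in this README. The helper names `make_digest`, `make_event`, and `to_jsonl` are hypothetical, the `sha256:<hex>` digest format is inferred from the example value, and the assumption that `MOS_DECLARE` carries an `entry_id` is not confirmed here; check the event schema documentation before relying on any of them.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_digest(content: str) -> str:
    # Assumed digest format: "sha256:" followed by the hex digest.
    return "sha256:" + hashlib.sha256(content.encode("utf-8")).hexdigest()


def make_event(event_type: str, seq: int, **fields) -> dict:
    # Build the common envelope, then add event-specific fields.
    now = datetime.now(timezone.utc)
    event = {
        "schema": "memoryflow/1.0",
        "event_type": event_type,
        "collector_id": "agent-runtime",
        "collector_seq": seq,
        "event_id": f"evt-{seq:04d}",
        "obs_time": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "skew_budget_ms": 200,
    }
    event.update(fields)
    return event


def to_jsonl(events: list) -> str:
    # One JSON object per line, no trailing whitespace.
    return "".join(json.dumps(e, sort_keys=True) + "\n" for e in events)


events = [
    make_event("MOS_DECLARE", 1, entry_id="customer-fact-17"),  # field name assumed
    make_event(
        "MEM_WRITE", 2,
        entry_id="customer-fact-17",
        content_digest=make_digest("Customer prefers email contact."),
        update_id="u-001",
        weight={"num": "1", "den": "1"},
        ttl_ms=300000,
        risk_level=2,
    ),
]
print(to_jsonl(events), end="")
```

Writing the returned string to a file such as `events.jsonl` gives a trace that can be passed to `memoryflow validate`, which is the authoritative check that the fields match the schema.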
Use MemoryFlow if you build or evaluate LLM agents with mutable memory and need to know whether memory writes, reads, uses, deletions, replacements, verification failures, and corrections are auditable from declared telemetry.
Do not use MemoryFlow as a memory store, retrieval engine, embedding system, or truth oracle.
MemoryFlow does not store, retrieve, summarize, or rank memories. Systems such as vector-memory stores, graph-memory systems, or long-term agent memory frameworks decide what should be remembered. MemoryFlow asks a different question: whether the memory behavior declared by such systems is auditable, version-bound, stale-aware, correction-aware, and comparable under a declared conformance profile.
For end users after package publication:
pip install memoryflow-agent-memory-auditor
pip install "memoryflow-agent-memory-auditor[server]"
pipx install memoryflow-agent-memory-auditor

For installation directly from GitHub before a package release:
pip install "memoryflow-agent-memory-auditor @ git+https://github.com/kadubon/memoryflow-agent-memory-auditor.git"

For contributors, this project uses uv.
uv sync --all-extras
uv run memoryflow --version

For optional local HTTP endpoints:
uv sync --extra server

The core package has no runtime dependencies and performs no outbound network calls.
Create a sample trace, validate it, and audit it:
uv run memoryflow sample --case stale-memory --out events.jsonl
uv run memoryflow validate events.jsonl
uv run memoryflow audit events.jsonl --profile P1 --format json
uv run memoryflow audit events.jsonl --profile P1 --out report.html

Initialize a project-local config and examples:
uv run memoryflow init
uv run memoryflow audit memoryflow-examples/stale-memory.jsonl \
  --config .memoryflow/config.json \
  --format json

Read a compact score:
uv run memoryflow score events.jsonl --profile P1 --format json

Compare two traces:
uv run memoryflow diff before.jsonl after.jsonl --profile P1

Explain a JSON report:
uv run memoryflow explain-report report.json

MemoryFlow consumes JSONL. Each line is one memory event. The eight core event types are:
- `MOS_DECLARE`
- `MEM_WRITE`
- `MEM_REPLACE`
- `MEM_DELETE`
- `MEM_READ`
- `MEM_USE`
- `MEM_VERIFY`
- `MEM_CORRECT`
Every event uses this envelope:
{
  "schema": "memoryflow/1.0",
  "event_type": "MEM_WRITE",
  "collector_id": "agent-runtime",
  "collector_seq": 1,
  "event_id": "evt-0001",
  "obs_time": "2026-01-03T01:00:00.000Z",
  "skew_budget_ms": 200
}

Example write:
{
  "schema": "memoryflow/1.0",
  "event_type": "MEM_WRITE",
  "collector_id": "agent-runtime",
  "collector_seq": 2,
  "event_id": "evt-0002",
  "obs_time": "2026-01-03T01:00:01.000Z",
  "skew_budget_ms": 200,
  "entry_id": "customer-fact-17",
  "content_digest": "sha256:abc123",
  "update_id": "u-001",
  "weight": {"num": "1", "den": "1"},
  "ttl_ms": 300000,
  "risk_level": 2
}

Rational values are exact JSON objects such as `{"num": "3", "den": "10"}`.
Normative metrics do not use floating point arithmetic.
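To show why exact rationals matter, here is a minimal sketch that parses the `{"num": ..., "den": ...}` objects into Python's `fractions.Fraction` and computes a weighted fraction without any floating point. The helper names and the sample weights and outcomes are hypothetical, and the calculation is a generic weighted ratio, not the definition of any particular MemoryFlow metric.

```python
from fractions import Fraction


def parse_rational(obj: dict) -> Fraction:
    # Numerator and denominator are carried as strings, so arbitrarily
    # large integers survive JSON encoding unchanged.
    return Fraction(int(obj["num"]), int(obj["den"]))


def to_rational(value: Fraction) -> dict:
    # Fraction keeps values in lowest terms, so the output is canonical.
    return {"num": str(value.numerator), "den": str(value.denominator)}


# Hypothetical per-entry weights and outcomes (True = counted in the numerator).
weights = [
    {"num": "3", "den": "10"},
    {"num": "1", "den": "5"},
    {"num": "1", "den": "2"},
]
outcomes = [True, False, True]

total = sum((parse_rational(w) for w in weights), Fraction(0))
counted = sum(
    (parse_rational(w) for w, ok in zip(weights, outcomes) if ok),
    Fraction(0),
)
ratio = counted / total

print(to_rational(ratio))  # {'num': '4', 'den': '5'}

# The same sums in binary floating point are not exact:
print(0.1 + 0.2 == 0.3)                                    # False
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
```

Because the result is an exact rational in lowest terms, two verifiers that process the same events produce byte-identical values, which floating-point summation does not guarantee.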
Use the weakest profile that matches the telemetry you can actually emit.
| Profile | Use When | Behavior |
|---|---|---|
| P0 | You need strict, comparable audit evidence | Fails closed on missing declares, missing version bindings, skew/order violations, and impossible transitions |
| P1 | You run production agents and can emit update ids and digests | Allows implicit creation on `MEM_WRITE`, but downgrades comparability where needed |
| P2 | You are onboarding legacy logs | Allows unknown-version `MEM_READ`; version-sensitive metrics become `BEST_EFFORT` or `NONCOMPARABLE` |
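The difference between failing closed and downgrading can be sketched with a single case: a `MEM_WRITE` for an entry that was never declared. The code below is a toy illustration of the behavior described in the table, not the verifier's logic. The `Profile` enum is a local stand-in for the package's `ConformanceProfile`, `classify_write` is a hypothetical helper, and two choices are assumptions: that the downgrade surfaces as `DEGRADED`, and that P2 is at least as permissive as P1 for this case.

```python
from enum import Enum


class Profile(Enum):
    # Local stand-in for illustration; not memoryflow.ConformanceProfile.
    P0 = "P0"
    P1 = "P1"
    P2 = "P2"


def classify_write(profile: Profile, entry_id: str, declared: set) -> str:
    """Classify a MEM_WRITE against the set of declared entries."""
    if entry_id in declared:
        return "VALID"
    if profile is Profile.P0:
        # Strict profile: a missing declare is a fail-closed condition.
        return "INVALID"
    # P1 (and, by assumption, P2): accept the write as an implicit creation,
    # remember the entry, and mark the result as downgraded.
    declared.add(entry_id)
    return "DEGRADED"


print(classify_write(Profile.P0, "customer-fact-17", set()))                 # INVALID
print(classify_write(Profile.P1, "customer-fact-17", set()))                 # DEGRADED
print(classify_write(Profile.P0, "customer-fact-17", {"customer-fact-17"}))  # VALID
```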
Audit output is a stable JSON certificate plus optional HTML report. Results use explicit statuses:
| Status | Meaning |
|---|---|
| `VALID` | Required telemetry and bindings support the result |
| `INVALID` | Fail-closed condition; do not trust the affected result |
| `DEGRADED` | Computed with a known comparability limitation |
| `NONCOMPARABLE` | Value exists, but should not be compared across systems/runs |
| `NOT_COMPUTABLE` | Required config or telemetry is absent |
| `BEST_EFFORT` | Legacy or P2 result with explicit limitations |
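In a CI check, these statuses can drive a simple gate. The sketch below assumes only the report shape that the Python API example in this README reads from `certificate.to_dict()`: a top-level `status` and a `metrics` list whose items have `name` and `status`. The `gate` function, the policy of blocking only on `INVALID`, and the hand-written sample report are all hypothetical; a real pipeline would load the JSON produced by `memoryflow audit`.

```python
# Hypothetical policy: INVALID blocks the build, weaker statuses only warn.
BLOCKING = {"INVALID"}
WARNING = {"DEGRADED", "NONCOMPARABLE", "NOT_COMPUTABLE", "BEST_EFFORT"}


def gate(report: dict) -> tuple:
    """Return (exit_code, messages) for a report dictionary."""
    messages = []
    exit_code = 0
    if report["status"] in BLOCKING:
        messages.append(f"FAIL overall status is {report['status']}")
        exit_code = 1
    for metric in report["metrics"]:
        status = metric["status"]
        if status in BLOCKING:
            messages.append(f"FAIL {metric['name']}: {status}")
            exit_code = 1
        elif status in WARNING:
            messages.append(f"WARN {metric['name']}: {status}")
    return exit_code, messages


# Hand-written sample; the metric names are placeholders.
sample_report = {
    "status": "DEGRADED",
    "metrics": [
        {"name": "metric-a", "status": "VALID"},
        {"name": "metric-b", "status": "NONCOMPARABLE"},
        {"name": "metric-c", "status": "INVALID"},
    ],
}

code, messages = gate(sample_report)
for line in messages:
    print(line)
print("exit code:", code)
```

Keeping the warning statuses visible rather than silently passing them matches the intent of explicit statuses: a `NONCOMPARABLE` value should not quietly end up in a cross-run comparison.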
Implemented metrics include churn, uptake, read-uptake, correction latency, verified/proven write fractions (VUF/PUF), staleness/risk exposure, zombie exposure, and supersedence exposure.
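To give a feel for what a staleness check looks at, the sketch below flags `MEM_USE` events that occur after the `ttl_ms` of the most recent `MEM_WRITE` for the same entry has elapsed. This is a simplified illustration, not how MemoryFlow defines staleness exposure: it ignores weights, risk levels, skew budgets, replacements, and deletions. It also assumes that `MEM_USE` events carry an `entry_id` and that the events are already in canonical order; `stale_uses` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def parse_time(value: str) -> datetime:
    # Convert the trailing "Z" so older Python versions can parse the timestamp.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_uses(events: list) -> list:
    """Return event_ids of MEM_USE events that fall after the entry's TTL."""
    expiry = {}  # entry_id -> time at which the last write expires
    stale = []
    for event in events:  # assumed to be in canonical order already
        kind = event["event_type"]
        if kind == "MEM_WRITE":
            written = parse_time(event["obs_time"])
            expiry[event["entry_id"]] = written + timedelta(
                milliseconds=event["ttl_ms"]
            )
        elif kind == "MEM_USE":
            deadline = expiry.get(event["entry_id"])
            if deadline is not None and parse_time(event["obs_time"]) > deadline:
                stale.append(event["event_id"])
    return stale


events = [
    {"event_type": "MEM_WRITE", "event_id": "evt-0002",
     "entry_id": "customer-fact-17", "ttl_ms": 300000,
     "obs_time": "2026-01-03T01:00:01.000Z"},
    {"event_type": "MEM_USE", "event_id": "evt-0003",
     "entry_id": "customer-fact-17",
     "obs_time": "2026-01-03T01:03:00.000Z"},  # within the 5-minute TTL
    {"event_type": "MEM_USE", "event_id": "evt-0004",
     "entry_id": "customer-fact-17",
     "obs_time": "2026-01-03T01:06:00.000Z"},  # after the TTL has elapsed
]

print(stale_uses(events))  # ['evt-0004']
```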
The public API is small and stable enough for other tools and AI agents to call.
from pathlib import Path

from memoryflow import AuditConfig, ConformanceProfile, audit_jsonl_file

# All horizons and deadlines are given in milliseconds.
certificate = audit_jsonl_file(
    Path("events.jsonl"),
    config=AuditConfig(
        profile=ConformanceProfile.P1,
        uptake_horizon_ms=300_000,
        correction_horizon_ms=300_000,
        verify_deadline_ms=300_000,
    ),
)
data = certificate.to_dict()
# Print the overall status, then the status of each metric by name.
print(data["status"])
print({metric["name"]: metric["status"] for metric in data["metrics"]})

Emit events locally:
from memoryflow.adapters import MemoryFlowEmitter, write_jsonl

# Collect emitted events in memory, then write them out as JSONL.
events = []
emitter = MemoryFlowEmitter("agent-runtime", skew_budget_ms=200, sink=events.append)
emitter.mos_declare("customer-fact-17")
emitter.mem_write(
    "customer-fact-17",
    content_digest="sha256:abc123",
    update_id="u-001",
    weight={"num": "1", "den": "1"},
    ttl_ms=300000,
    risk_level=2,
)
write_jsonl(events, "events.jsonl")

The verifier is independent of the CLI, reports, adapters, dashboard, server, LLM frameworks, and vector stores.
MemoryFlow provides dependency-light helpers:
- plain JSONL reader/writer
- Python SDK emitter
- OpenTelemetry dictionary converter
- OpenInference-style custom attributes
- LangChain/LangGraph-style duck-typed callback helper
- SQLite example adapter
- generic vector metadata helper
- optional FastAPI/Uvicorn local server
Run the optional local server:
uv run --extra server memoryflow serve --host 127.0.0.1 --port 8765

Server mode is unauthenticated and intended for local or private trusted networks. The CLI prints a warning if you bind to a non-loopback host.
MemoryFlow verifies declared telemetry only. It cannot prove:
- that the underlying memory store did not mutate without emitting events
- that an implementer did not fabricate structurally valid events
- that an external provenance source is true
- that a risk level is semantically correct beyond the declared risk map
This boundary is intentional. MemoryFlow makes unsupported claims explicit instead of filling gaps with inference.
- Documentation index
- Event schema
- Conformance profiles
- Metrics
- Configuration
- Integrations and module API
- Theory-to-code mapping
- Security guide
uv sync --all-extras
uv run pytest
uv run ruff check .
uv run mypy src/memoryflow
uv run bandit -q -r src/memoryflow
uv build

License: Apache-2.0.