Thin integration between Karpathy autoresearch (submodule at upstream/autoresearch) and AIngram for persistent experimental memory across loops.
- User guide — Full walkthrough (setup → train → results.tsv → remember/recall), architecture and pipeline diagram, CLI reference (including consolidate).
- Multi-agent modes — Three concurrency modes (mock / solo / swarm) that share the AIngram memory layer, plus per-round / post-run consolidation (AIngram ≥ 1.2.1).
- program.aingram.md — Instructions for coding agents running the recall/edit/train/remember loop.
Upstream uses Python 3.10 in .python-version; this integration targets Python 3.11+ because AIngram requires it. Training has been smoke-tested on 3.11+ in many environments; use the same interpreter family for both the memory CLI and training when possible.
| Piece | Behavior |
|---|---|
| uv run train.py (once) | Runs one timed experiment (default 5 minutes wall-clock training per prepare.py), prints val_bpb and friends, then exits. It does not loop forever and does not call AIngram. |
| Full “autoresearch” | A coding agent (or you manually) follows program.aingram.md: recall → edit train.py → commit → run training → log results.tsv → autoresearch-memory remember → repeat (LOOP FOREVER in that doc). |
| AIngram | Written only when you run autoresearch-memory remember (or remember-from-log after capturing output). Skipping that step leaves the DB empty even if training succeeded. |
For a quick manual run, redirect output and push one memory entry:
cd upstream/autoresearch
uv run train.py > run.log 2>&1
cd ../..
autoresearch-memory remember-from-log --log upstream/autoresearch/run.log --cwd upstream/autoresearch

(Full loops should still use results.tsv + remember as in program.aingram.md.)
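As a rough illustration of what a remember-from-log style step has to do, here is a hypothetical Python helper that pulls the last val_bpb value out of a captured log. The regex, log format, and function name are assumptions for illustration; the real command's parsing may differ.

```python
import re
from typing import Optional

# Hypothetical helper: extract the last val_bpb reported in a run log.
# The regex and expected log format are illustrative assumptions; the real
# remember-from-log parser may look for something different.
VAL_BPB_RE = re.compile(r"val_bpb[:=\s]+([0-9]*\.?[0-9]+)")

def last_val_bpb(log_text: str) -> Optional[float]:
    """Return the final val_bpb mentioned in the log, or None if absent."""
    matches = VAL_BPB_RE.findall(log_text)
    return float(matches[-1]) if matches else None

log = "step 100 val_bpb: 1.2345\nstep 200 val_bpb: 1.1010\n"
print(last_val_bpb(log))  # 1.101
```

Taking the last match (rather than the first) mirrors the intent of recording the end-of-run metric, not an intermediate one.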
- Python 3.11+
- Git
- uv (for upstream train.py / prepare.py) — optional but matches upstream docs
- GPU setup per upstream autoresearch
git clone https://github.com/jaybizz/aingram-ar autoresearch-aingram
cd autoresearch-aingram
git submodule update --init --recursive

This repo uses two separate Python environments:
| Location | Purpose | Typical Python |
|---|---|---|
| Repo root .venv | autoresearch-memory + AIngram (pip install -e .) | 3.11+ |
| upstream/autoresearch/.venv | Karpathy train.py / prepare.py (uv sync) | 3.10 (per upstream) |
Do not keep the root .venv activated when you run uv sync or uv run inside upstream/autoresearch. If VIRTUAL_ENV points at the parent repo, uv will warn that it does not match the submodule’s project .venv and will ignore the active env.
Before uv sync in the submodule: run deactivate (or open a new terminal), cd upstream/autoresearch, then uv sync. Use a separate terminal tab for root (pip / autoresearch-memory) vs upstream (uv run train.py) if that helps.
cd /path/to/autoresearch-aingram
python -m venv .venv
# Windows:
.venv\Scripts\activate
# Unix:
# source .venv/bin/activate
pip install -e .

Or with uv:

uv venv && uv pip install -e .

This installs autoresearch-memory and pulls in aingram.
- Default file: <integration-root>/.aingram/autoresearch.db (created on first use; the path is resolved from the process cwd, so run the CLI from the integration root unless you override it).
- Override: set AINGRAM_DB to an absolute path to the SQLite file.
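The lookup order above can be sketched in a few lines of Python. resolve_db_path is a hypothetical name; only the precedence (AINGRAM_DB override first, then a DB under the process cwd) comes from this document.

```python
from pathlib import Path

# Hypothetical sketch of the documented lookup order: an AINGRAM_DB
# override wins, otherwise the DB lives under the process cwd.
def resolve_db_path(env: dict) -> Path:
    override = env.get("AINGRAM_DB")
    if override:
        return Path(override)
    return Path.cwd() / ".aingram" / "autoresearch.db"
```

Passing the environment in as a dict keeps the sketch self-contained and testable; the real CLI presumably reads os.environ directly.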
AIngram stores an optional graph in the same SQLite file. Each remember enqueues entity-extraction work, but nothing processes that queue until you run a worker. This repo wires that through autoresearch-memory graph-drain (uses GLiNER on CPU; first run may download urchade/gliner_medium-v2.1).
From the integration root (with root .venv / uv run):
# Build/update graph nodes (and optional edges if Ollama is up)
autoresearch-memory graph-drain
# optional: also ask Ollama for relationship JSON when ≥2 entities appear in one entry
autoresearch-memory graph-drain --with-ollama
# Inspect (stock AIngram CLI; same --db path)
uv run aingram --no-telemetry --db .aingram/autoresearch.db entities
uv run aingram --no-telemetry --db .aingram/autoresearch.db graph "val_bpb"

If you use a custom DB path, set AINGRAM_DB or pass --db to both graph-drain and aingram.
AIngram ≥ 1.2.1 can consolidate the SQLite store (decay, contradiction detection, merge/synthesis). The autoresearch-memory consolidate command runs that on demand. Mock and solo orchestration call consolidation after each iteration; swarm calls it once after the run — see docs/MULTI_AGENT.md for defaults (deberta vs llm) and --consolidation-backend. Ordinary single-agent remember / recall loops do not consolidate unless you run consolidate yourself.
From a shell where the integration root .venv is not active (see Two virtual environments above):
cd upstream/autoresearch
uv sync # or follow upstream README
uv run prepare.py --num-shards 10 # first-time data/tokenizer
uv run train.py

TinyStories (Karpathy tinystories-gpt4-clean, recommended for smaller GPUs): the submodule prepare.py can switch caches when AUTORESEARCH_DATASET=tinystories (or uv run prepare.py --dataset tinystories; the flag is read from argv before imports). Data and tokenizer go under ~/.cache/autoresearch/data_tinystories and tokenizer_tinystories. Run train.py with the same env so import prepare picks the same paths.
# Windows PowerShell
$env:AUTORESEARCH_DATASET = "tinystories"
uv run prepare.py --dataset tinystories
uv run train.py

# Unix
export AUTORESEARCH_DATASET=tinystories
uv run prepare.py --dataset tinystories
uv run train.py

If you see VIRTUAL_ENV=... does not match the project environment path .venv, you still have the wrong venv activated. Run deactivate, then cd upstream/autoresearch and uv sync again; the warning should disappear.
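The cache switch described above can be summarized in a short sketch. cache_dirs and the non-TinyStories directory names are assumptions made for illustration; the submodule's prepare.py remains the source of truth.

```python
from pathlib import Path

# Sketch of the documented cache switch: AUTORESEARCH_DATASET=tinystories
# redirects data and tokenizer to *_tinystories directories. The default
# (non-tinystories) directory names here are assumptions.
def cache_dirs(env: dict) -> tuple:
    base = Path.home() / ".cache" / "autoresearch"
    if env.get("AUTORESEARCH_DATASET") == "tinystories":
        return base / "data_tinystories", base / "tokenizer_tinystories"
    return base / "data", base / "tokenizer"  # assumed default names
```

Because both prepare.py and train.py would call the same function with the same environment, setting the variable for only one of them is exactly the mismatch the docs warn about.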
Run experiments from upstream/autoresearch as in the upstream README.
Stock Karpathy train.py loads Flash Attention 3 via the kernels package; Hugging Face artifacts do not include a working Windows build variant, so you may see FileNotFoundError: ... torch29-cu128-x86_64-windows.
This repo’s submodule copy of upstream/autoresearch/train.py is patched to:
- Use torch.nn.functional.scaled_dot_product_attention with sliding-window + GQA semantics on Windows (and when AUTORESEARCH_DISABLE_FA3=1 on Linux).
- Skip torch.compile on the model and on the fused AdamW/Muon optimizer steps (both use Inductor and require Triton, which Windows CUDA builds do not ship). Set AUTORESEARCH_DISABLE_COMPILE=1 on Linux to force the same eager behavior.
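To make the first bullet concrete, here is a dependency-free sketch of the boolean mask a scaled_dot_product_attention fallback needs for sliding-window causal attention. This is not the patched train.py code; the function name and window size are illustrative.

```python
# Illustrative only (not the patched train.py): the attention mask for
# sliding-window causal attention. Query position i may attend to key
# position j only when j <= i (causal) and i - j < window (sliding window).
def sliding_window_mask(seq_len: int, window: int) -> list:
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(5, 3)
# The last query (row 4) sees keys 2, 3, 4 but not 0 or 1:
print(mask[4])  # [False, False, True, True, True]
```

In the real SDPA call this mask would be passed as attn_mask (as a boolean tensor); GQA additionally repeats key/value heads across query-head groups, which is orthogonal to the masking shown here.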
Upstream also lists jsegov/autoresearch-win-rtx for Windows-focused work. If you git submodule update to a clean upstream commit, re-apply or rebase these train.py changes (or maintain a fork of the submodule).
Upstream defaults (train.py) assume a large GPU (Karpathy’s README targets H100-class). Activation memory scales with DEVICE_BATCH_SIZE * MAX_SEQ_LEN (plus model width, depth, and optimizer state). DEVICE_BATCH_SIZE = 128 with MAX_SEQ_LEN = 2048 from prepare.py is often too much for 8 GB and can CUDA OOM.
Primary fix: lower DEVICE_BATCH_SIZE in train.py until training runs (try 8–32 before changing the model). Global tokens per optimizer step are fixed by TOTAL_BATCH_SIZE (2**19 by default); microbatch tokens are DEVICE_BATCH_SIZE * MAX_SEQ_LEN. Gradient accumulation is TOTAL_BATCH_SIZE // (DEVICE_BATCH_SIZE * MAX_SEQ_LEN), so you must keep:
TOTAL_BATCH_SIZE % (DEVICE_BATCH_SIZE * MAX_SEQ_LEN) == 0.
Example: MAX_SEQ_LEN = 2048, DEVICE_BATCH_SIZE = 16 → 16 * 2048 = 32768 tokens per microbatch; 524288 / 32768 = 16 accumulation steps (valid).
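The divisibility constraint and the worked example above can be checked mechanically:

```python
# Check the batch arithmetic from the text: microbatch tokens must divide
# TOTAL_BATCH_SIZE exactly; the quotient is the gradient-accumulation count.
TOTAL_BATCH_SIZE = 2**19   # 524288 tokens per optimizer step (upstream default)
MAX_SEQ_LEN = 2048
DEVICE_BATCH_SIZE = 16     # lowered from 128 for a small GPU

micro_tokens = DEVICE_BATCH_SIZE * MAX_SEQ_LEN       # 32768
assert TOTAL_BATCH_SIZE % micro_tokens == 0, "DEVICE_BATCH_SIZE must divide evenly"
grad_accum_steps = TOTAL_BATCH_SIZE // micro_tokens  # 16
print(micro_tokens, grad_accum_steps)  # 32768 16
```

Re-running this check after any edit to DEVICE_BATCH_SIZE (or MAX_SEQ_LEN in prepare.py) catches invalid combinations before a training run is wasted.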
If still tight: reduce DEPTH, set WINDOW_PATTERN = "L" (full causal only; cheaper than "SSSL"). See upstream README “smaller compute” for more knobs.
Optional (prepare.py): lowering MAX_SEQ_LEN and EVAL_TOKENS frees the most activation memory but changes the setup; re-run prepare.py / align tokenizer as needed. For fully autonomous agent runs, program.aingram.md keeps prepare.py read-only unless the human explicitly asks for a laptop memory profile.
Docs note: Mentions of “RTX 3080 12 GB” in the platform documentation refer to AIngram embedding search scale, not a promise that Karpathy train.py defaults fit that GPU.
Onboard AMD NPU: train.py uses PyTorch CUDA on the NVIDIA GPU only. The integrated NPU is not used by this stack; seeing 0% NPU in Task Manager during training is expected. AIngram embeddings use ONNX (CPU/CUDA by default); targeting an NPU execution provider would be separate experimentation.
Use program.aingram.md instead of upstream/autoresearch/program.md when you want recall/remember steps. Copy it into the agent prompt or replace program.md locally in the upstream folder if your workflow expects that filename.
From the integration root with venv activated:
set AINGRAM_DB=%CD%\.aingram\test.db
# Unix: export AINGRAM_DB="$PWD/.aingram/test.db"
autoresearch-memory remember --commit deadbeef --val-bpb 1.0 --status keep --description "cli smoke" --lesson "smoke test ok"
autoresearch-memory recall "smoke test" --limit 3

If you use uv at the integration root:

uv lock

Commit uv.lock when present. If uv is unavailable, pip install -e . still works using pyproject.toml only.
Contributions are welcome. See CONTRIBUTING.md for development setup, test commands, code style, and PR guidelines.
MIT for integration scaffolding. See LICENSE. Upstream autoresearch and AIngram have their own licenses.