autoresearch + AIngram


Thin integration between Karpathy autoresearch (submodule at upstream/autoresearch) and AIngram for persistent experimental memory across loops.

Documentation

  • User guide — Full walkthrough (setup → train → results.tsv → remember/recall), architecture and pipeline diagram, CLI reference (including consolidate).
  • Multi-agent modes — Three concurrency modes (mock / solo / swarm) that share the AIngram memory layer, plus per-round / post-run consolidation (AIngram ≥ 1.2.1).
  • program.aingram.md — Instructions for coding agents running the recall/edit/train/remember loop.

Upstream uses Python 3.10 in .python-version; this integration targets Python 3.11+ because AIngram requires it. Training has been smoke-tested on 3.11+ in many environments; use the same interpreter family for both the memory CLI and training when possible.

What train.py does vs full autoresearch

| Piece | Behavior |
| --- | --- |
| uv run train.py (once) | Runs one timed experiment (default 5 minutes wall-clock training per prepare.py), prints val_bpb and friends, then exits. It does not loop forever and does not call AIngram. |
| Full “autoresearch” loop | A coding agent (or you manually) follows program.aingram.md: recall → edit train.py → commit → run training → log results.tsv → autoresearch-memory remember → repeat (LOOP FOREVER in that doc). |
| AIngram | Written only when you run autoresearch-memory remember (or remember-from-log after capturing output). Skipping that step leaves the DB empty even if training succeeded. |

For a quick manual run, redirect output and push one memory entry:

cd upstream/autoresearch
uv run train.py > run.log 2>&1
cd ../..
autoresearch-memory remember-from-log --log upstream/autoresearch/run.log --cwd upstream/autoresearch

(Full loops should still use results.tsv + remember as in program.aingram.md.)

Prerequisites

  • Python 3.11+
  • Git
  • uv (for upstream train.py / prepare.py) — optional but matches upstream docs
  • GPU setup per upstream autoresearch

Clone

git clone https://github.com/jaybizz/aingram-ar autoresearch-aingram
cd autoresearch-aingram
git submodule update --init --recursive

Two virtual environments (important)

This repo uses two separate Python environments:

| Location | Purpose | Typical Python |
| --- | --- | --- |
| Repo root .venv | autoresearch-memory + AIngram (pip install -e .) | 3.11+ |
| upstream/autoresearch/.venv | Karpathy train.py / prepare.py (uv sync) | 3.10 (per upstream) |

Do not keep the root .venv activated when you run uv sync or uv run inside upstream/autoresearch. If VIRTUAL_ENV points at the parent repo, uv will warn that it does not match the submodule’s project .venv and will ignore the active env.

Before uv sync in the submodule: run deactivate (or open a new terminal), cd upstream/autoresearch, then uv sync. Use a separate terminal tab for root (pip / autoresearch-memory) vs upstream (uv run train.py) if that helps.

Install the memory CLI (integration root)

cd /path/to/autoresearch-aingram
python -m venv .venv
# Windows:
.venv\Scripts\activate
# Unix:
# source .venv/bin/activate

pip install -e .

Or with uv:

uv venv && uv pip install -e .

This installs autoresearch-memory and pulls in aingram.

Database path

  • Default file: <integration-root>/.aingram/autoresearch.db (created on first use; the path is resolved against the process working directory, so run the CLI from the integration root unless you override it).
  • Override: set AINGRAM_DB to an absolute path to the SQLite file.
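The resolution order above can be sketched in a few lines (an assumption-level illustration of the documented behavior, not the actual aingram source):

```python
import os
from pathlib import Path

def resolve_db_path() -> Path:
    """Sketch: AINGRAM_DB override wins; otherwise the default file
    lands under the *process* working directory."""
    override = os.environ.get("AINGRAM_DB")
    if override:
        return Path(override)  # absolute path to the SQLite file
    return Path.cwd() / ".aingram" / "autoresearch.db"
```

This is why running the CLI from a different directory without AINGRAM_DB set silently creates a second, empty database.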

Knowledge graph (entities + relationships)

AIngram stores an optional graph in the same SQLite file. Each remember enqueues entity-extraction work, but nothing processes that queue until you run a worker. This repo wires that through autoresearch-memory graph-drain (uses GLiNER on CPU; first run may download urchade/gliner_medium-v2.1).

From the integration root (with root .venv / uv run):

# Build/update graph nodes (and optional edges if Ollama is up)
autoresearch-memory graph-drain
# optional: also ask Ollama for relationship JSON when ≥2 entities appear in one entry
autoresearch-memory graph-drain --with-ollama

# Inspect (stock AIngram CLI; same --db path)
uv run aingram --no-telemetry --db .aingram/autoresearch.db entities
uv run aingram --no-telemetry --db .aingram/autoresearch.db graph "val_bpb"

If you use a custom DB path, set AINGRAM_DB or pass --db to both graph-drain and aingram.
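Conceptually, graph-drain is a queue worker: pull pending remember-entries, run entity extraction, write nodes back to the shared store. A toy sketch of that shape (all names here are hypothetical, not the actual autoresearch-memory internals; the real command uses GLiNER for extraction and SQLite for storage):

```python
def drain_queue(pending, extract_entities, graph):
    """Process queued entries into graph nodes (hypothetical shape)."""
    processed = 0
    while pending:
        entry = pending.pop(0)
        for entity in extract_entities(entry):
            # each entity node remembers which entries mentioned it
            graph.setdefault(entity, []).append(entry)
        processed += 1
    return processed

# toy extractor: treat capitalized words as "entities"
toy = lambda text: [w for w in text.split() if w[0].isupper()]
graph = {}
n = drain_queue(["Muon beats AdamW here", "val_bpb improved"], toy, graph)
```

The key property this illustrates: remember only enqueues; nothing appears in the graph until a drain pass runs.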

Memory consolidation (contradictions and decay)

AIngram ≥ 1.2.1 can consolidate the SQLite store (decay, contradiction detection, merge/synthesis). The autoresearch-memory consolidate command runs that on demand. Mock and solo orchestration call consolidation after each iteration; swarm calls it once after the run — see docs/MULTI_AGENT.md for defaults (deberta vs llm) and --consolidation-backend. Ordinary single-agent remember / recall loops do not consolidate unless you run consolidate yourself.

Upstream training (submodule)

From a shell where the integration root .venv is not active (see Two virtual environments above):

cd upstream/autoresearch
uv sync   # or follow upstream README
uv run prepare.py --num-shards 10   # first-time data/tokenizer
uv run train.py

TinyStories (Karpathy tinystories-gpt4-clean, recommended for smaller GPUs): the submodule prepare.py can switch caches when AUTORESEARCH_DATASET=tinystories (or uv run prepare.py --dataset tinystories; the flag is read from argv before imports). Data and tokenizer go under ~/.cache/autoresearch/data_tinystories and tokenizer_tinystories. Run train.py with the same env so import prepare picks the same paths.

# Windows PowerShell
$env:AUTORESEARCH_DATASET = "tinystories"
uv run prepare.py --dataset tinystories
uv run train.py
# Unix
export AUTORESEARCH_DATASET=tinystories
uv run prepare.py --dataset tinystories
uv run train.py
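The argv-before-imports trick described above can be sketched like this (illustrative only, not the actual prepare.py source; function names are assumptions):

```python
import os
import sys
from pathlib import Path

def pick_dataset() -> str:
    """Read --dataset from argv before heavy imports; fall back to the
    AUTORESEARCH_DATASET env var so `import prepare` from train.py
    resolves the same choice."""
    if "--dataset" in sys.argv:
        return sys.argv[sys.argv.index("--dataset") + 1]
    return os.environ.get("AUTORESEARCH_DATASET", "default")

def cache_dirs(dataset: str) -> tuple[Path, Path]:
    """Data and tokenizer caches get a per-dataset suffix."""
    base = Path.home() / ".cache" / "autoresearch"
    suffix = "" if dataset == "default" else f"_{dataset}"
    return base / f"data{suffix}", base / f"tokenizer{suffix}"
```

This is why the env var must also be set for train.py: with no flag and no env var, the import falls back to the default caches.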

If you see: VIRTUAL_ENV=... does not match the project environment path .venv, you still have the wrong venv activated. Run deactivate, then cd upstream/autoresearch and uv sync again — the warning should disappear.

Run experiments from upstream/autoresearch as in the upstream README.

Windows (native CUDA)

Stock Karpathy train.py loads Flash Attention 3 via the kernels package; Hugging Face artifacts do not include a working Windows build variant, so you may see FileNotFoundError: ... torch29-cu128-x86_64-windows.

This repo’s submodule copy of upstream/autoresearch/train.py is patched to:

  1. Use torch.nn.functional.scaled_dot_product_attention with sliding-window + GQA semantics on Windows (and when AUTORESEARCH_DISABLE_FA3=1 on Linux).
  2. Skip torch.compile on the model and on the fused AdamW/Muon optimizer steps (both use Inductor and require Triton, which Windows CUDA builds do not ship). Set AUTORESEARCH_DISABLE_COMPILE=1 on Linux to force the same eager behavior.
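The sliding-window semantics the SDPA fallback has to preserve amount to a boolean attention mask: token i may attend to token j only if j is not in the future and within the window. A dependency-free sketch of that mask (illustrative; the patched train.py builds an equivalent mask for torch.nn.functional.scaled_dot_product_attention, and GQA head-sharing is omitted here):

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Boolean mask (True = may attend) for sliding-window causal attention.

    A full causal layer ("L" in the window pattern) corresponds to
    window >= seq_len; smaller windows give the cheaper "S" layers.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]

# window=2: each token sees itself and one predecessor, never the future
for row in sliding_window_causal_mask(4, 2):
    print(["x" if allowed else "." for allowed in row])
```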

Upstream also lists jsegov/autoresearch-win-rtx for Windows-focused work. If you git submodule update to a clean upstream commit, re-apply or rebase these train.py changes (or maintain a fork of the submodule).

Laptop / ~8 GB VRAM (e.g. RTX 4060 Laptop)

Upstream defaults (train.py) assume a large GPU (Karpathy’s README targets H100-class). Activation memory scales with DEVICE_BATCH_SIZE * MAX_SEQ_LEN (plus model width, depth, and optimizer state). DEVICE_BATCH_SIZE = 128 with MAX_SEQ_LEN = 2048 from prepare.py is often too much for 8 GB and commonly triggers a CUDA out-of-memory error.

Primary fix: lower DEVICE_BATCH_SIZE in train.py until training runs (try 8–32 before changing the model). Global tokens per optimizer step are fixed by TOTAL_BATCH_SIZE (2**19 by default); microbatch tokens are DEVICE_BATCH_SIZE * MAX_SEQ_LEN. Gradient accumulation is TOTAL_BATCH_SIZE // (DEVICE_BATCH_SIZE * MAX_SEQ_LEN), so you must keep:

TOTAL_BATCH_SIZE % (DEVICE_BATCH_SIZE * MAX_SEQ_LEN) == 0.

Example: MAX_SEQ_LEN = 2048, DEVICE_BATCH_SIZE = 16 → 16 * 2048 = 32768 tokens per microbatch; 524288 / 32768 = 16 accumulation steps (valid).
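The divisibility constraint is easy to check before launching a run; a small sketch using the defaults quoted above:

```python
TOTAL_BATCH_SIZE = 2**19  # 524288 tokens per optimizer step (upstream default)
MAX_SEQ_LEN = 2048        # from prepare.py

def accumulation_steps(device_batch_size: int) -> int:
    """Gradient-accumulation steps for a candidate DEVICE_BATCH_SIZE;
    raises if the divisibility constraint is violated."""
    micro = device_batch_size * MAX_SEQ_LEN  # tokens per microbatch
    if TOTAL_BATCH_SIZE % micro != 0:
        raise ValueError(
            f"DEVICE_BATCH_SIZE={device_batch_size} breaks divisibility: "
            f"{TOTAL_BATCH_SIZE} % {micro} != 0"
        )
    return TOTAL_BATCH_SIZE // micro

print(accumulation_steps(16))   # 524288 // 32768 = 16
print(accumulation_steps(128))  # upstream default → 2 steps
```

Lowering DEVICE_BATCH_SIZE only raises the accumulation count; global tokens per optimizer step stay fixed, so results remain comparable.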

If memory is still tight: reduce DEPTH, or set WINDOW_PATTERN = "L" (full causal attention only; cheaper than "SSSL"). See the upstream README’s “smaller compute” section for more knobs.

Optional (prepare.py): lowering MAX_SEQ_LEN and EVAL_TOKENS frees the most activation memory but changes the setup; re-run prepare.py / align tokenizer as needed. For fully autonomous agent runs, program.aingram.md keeps prepare.py read-only unless the human explicitly asks for a laptop memory profile.

Docs note: Mentions of “RTX 3080 12 GB” in the platform documentation refer to AIngram embedding search scale, not a promise that Karpathy train.py defaults fit that GPU.

Onboard AMD NPU: train.py uses PyTorch CUDA on the NVIDIA GPU only. The integrated NPU is not used by this stack; seeing 0% NPU in Task Manager during training is expected. AIngram embeddings use ONNX (CPU/CUDA by default); targeting an NPU execution provider would be separate experimentation.

Agent instructions

Use program.aingram.md instead of upstream/autoresearch/program.md when you want recall/remember steps. Copy it into the agent prompt or replace program.md locally in the upstream folder if your workflow expects that filename.

CLI smoke test

From the integration root with venv activated:

set AINGRAM_DB=%CD%\.aingram\test.db
# Unix: export AINGRAM_DB="$PWD/.aingram/test.db"

autoresearch-memory remember --commit deadbeef --val-bpb 1.0 --status keep --description "cli smoke" --lesson "smoke test ok"
autoresearch-memory recall "smoke test" --limit 3

Lockfile

If you use uv at the integration root:

uv lock

Commit uv.lock when present. If uv is unavailable, pip install -e . still works using pyproject.toml only.

Contributing

Contributions are welcome. See CONTRIBUTING.md for development setup, test commands, code style, and PR guidelines.

License

MIT for integration scaffolding. See LICENSE. Upstream autoresearch and AIngram have their own licenses.

About

Fork of karpathy/autoresearch that can run on consumer hardware; tested on a laptop with an RTX 4060 (8 GB VRAM).
