Your agents deserve fresh context.


CocoIndex turns codebases, meeting notes, inboxes, Slack, PDFs, and videos into live, continuously fresh context for your AI agents and LLM apps to reason over, with minimal incremental processing. Get your AI agent production-ready in 10 minutes with reliable, continuously fresh data: no stale batches, no context gaps.

Incremental · only the delta · Any scale · parallel by default · Declarative · Python, 5 min

Built with CocoIndex ❤️

See all 20+ examples · updated every week →

Get started

pip install -U --pre cocoindex     # v1 is in preview — the --pre flag is required

Declare what should be in your target — CocoIndex keeps it in sync forever, recomputing only the Δ.

import cocoindex as coco
from cocoindex.connectors import localfs, postgres
from cocoindex.ops.text import RecursiveSplitter
# Note: `embed` (text -> vector) and `PG` (the Postgres handle) are not defined in this snippet; supply your own.

@coco.fn(memo=True)                          # ← cached by hash(input) + hash(code)
async def index_file(file, table):
    for chunk in RecursiveSplitter().split(await file.read_text()):
        table.declare_row(text=chunk.text, embedding=embed(chunk.text))

@coco.fn
async def main(src):
    table = await postgres.mount_table_target(PG, table_name="docs")
    table.declare_vector_index(column="embedding")
    await coco.mount_each(index_file, localfs.walk_dir(src).items(), table)

coco.App(coco.AppConfig(name="docs"), main, src="./docs").update_blocking()

Run once to backfill. Re-run anytime — only the changed files re-embed.
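
To make the "cached by hash(input) + hash(code)" idea concrete, here is a minimal standard-library sketch of that kind of memoization. It is an illustration of the general technique only, not CocoIndex's implementation: `MemoStore`, `fingerprint`, and `word_count` are hypothetical names, the cache lives in memory, and the code hash covers only the function's bytecode.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Stable content hash used as part of a cache key."""
    return hashlib.sha256(data).hexdigest()


class MemoStore:
    """Hypothetical in-memory cache keyed by (hash of code, hash of input)."""

    def __init__(self):
        self.results = {}
        self.runs = 0  # how many times a function actually executed

    def call(self, fn, text: str):
        # Assumption: bytecode stands in for "the code". It ignores constants,
        # so a real engine would need a more robust fingerprint.
        key = (fingerprint(fn.__code__.co_code), fingerprint(text.encode("utf-8")))
        if key not in self.results:
            self.runs += 1
            self.results[key] = fn(text)
        return self.results[key]


def word_count(text: str) -> int:
    return len(text.split())


store = MemoStore()
files = {"a.md": "hello world", "b.md": "fresh context for agents"}

# First pass: every file is new, so every file is processed.
for body in files.values():
    store.call(word_count, body)
first_pass_runs = store.runs

# Second pass: only b.md changed, so only b.md is processed again.
files["b.md"] = "fresh context for agents, always"
for body in files.values():
    store.call(word_count, body)
second_pass_runs = store.runs - first_pass_runs
```

In this sketch the first pass runs `word_count` twice and the second pass runs it once, which mirrors the "only the changed files re-embed" behavior described above.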

React — for data engineering

See the React ↔ CocoIndex mental model →

Incremental engine for long-horizon agents

Data transformation for any engineer, designed for AI workloads —
with a smart incremental engine for always-fresh, explainable data.

Why incremental?

Your agents are only as good as the data they see.
Batch pipelines drift stale. CocoIndex stays live — and only runs the Δ.

Sub-second fresh: source changes propagate to the target in under a second, so agents see the world as it is, not as it was yesterday.
10× cheaper at scale: when only a thin Δ changes (for example 0.1% of a 10,000-row corpus), only that slice re-runs and the other 99.9% stays cached, so you pay a fraction of the compute, embedding, and LLM bill.
Explainable by default: every vector, row, or graph node in the target traces back to its exact source byte (for example handbook.md L42), for debuggable, auditable, regulator-friendly AI pipelines.
Production-grade: a Rust core with retries, exponential back-off, dead-letter queues, and no-data-loss guarantees, ready for long-horizon AI agents.
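
The "explainable by default" claim, that each target item traces back to its source, can be pictured with a small sketch. This is a hypothetical illustration, not CocoIndex's lineage format: `Chunk`, `split_with_lineage`, and `trace` are made-up names, the splitter is a naive fixed-size one, and offsets are character positions rather than bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # path of the source file
    start: int   # character offset where the chunk begins
    end: int     # character offset one past the chunk's last character
    text: str


def split_with_lineage(source: str, body: str, size: int) -> list[Chunk]:
    """Fixed-size splitter that records where each chunk came from."""
    return [
        Chunk(source, i, min(i + size, len(body)), body[i:i + size])
        for i in range(0, len(body), size)
    ]


def trace(chunk: Chunk, corpus: dict[str, str]) -> str:
    """Follow a chunk's lineage back to the exact span of its source."""
    return corpus[chunk.source][chunk.start:chunk.end]


corpus = {"handbook.md": "Refunds are issued within 30 days of purchase."}
chunks = split_with_lineage("handbook.md", corpus["handbook.md"], size=16)

# Every chunk can be checked against the span it claims to come from.
all_traceable = all(trace(c, corpus) == c.text for c in chunks)
```

Carrying `(source, start, end)` alongside each derived row is what makes a later question like "where did this vector come from?" answerable without re-running the pipeline.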

What can you build?


Working starters from the examples tree — clone, plug your source, ship.

Share what you build

Building something with CocoIndex? We want to see it.
Tag @cocoindex_io on X or drop a link in #showcase on Discord. We'll boost it. 🥥

Community

Read the CocoIndex blog for engineering deep dives, release notes, RAG and knowledge graph tutorials, and case studies

Follow @cocoindex_io on X (formerly Twitter) for release notes, demos, launches, and AI data pipeline updates

We are so excited to meet you.
Every typo fix, new connector, doc tweak, or full-on rewrite makes CocoIndex better.
Come hang out — big PRs and small ones, both welcome.

📝 Read the contributing guide · 🐛 good first issues · 💬 Say hi on Discord

CocoIndex Enterprise

Large corpus — built for enterprise scale.

Incremental compute is the only way to keep large corpora fresh without re-embedding them every cycle.
CocoIndex scales from a single repo to petabyte-scale stores — parallel by default, delta-only by design.

Process once. Reconcile forever.

When a source changes, CocoIndex identifies the affected records, propagates the change
across joins and lookups, updates the target, and retires stale rows —
without touching anything that didn't change.
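
The reconcile step described above (update what changed, retire what is stale, leave the rest alone) can be sketched as a diff between the declared state and the current target. This is a simplified illustration under stated assumptions: `reconcile` is a hypothetical helper, the target is a plain dict, and propagation across joins and lookups is not modeled.

```python
def reconcile(target: dict, desired: dict) -> dict:
    """Bring `target` in line with `desired`, touching only rows that differ."""
    upserts = {
        key: value
        for key, value in desired.items()
        if key not in target or target[key] != value
    }
    stale = [key for key in target if key not in desired]

    target.update(upserts)  # insert new rows and overwrite changed ones
    for key in stale:       # retire rows whose source no longer exists
        del target[key]

    return {"upserted": sorted(upserts), "deleted": sorted(stale)}


# Rows keyed by a hypothetical "file#chunk" identifier.
target = {"a.md#0": "alpha", "b.md#0": "beta", "c.md#0": "gamma"}
desired = {"a.md#0": "alpha", "b.md#0": "beta v2", "d.md#0": "delta"}

first = reconcile(target, desired)   # b.md changed, d.md is new, c.md is gone
second = reconcile(target, desired)  # nothing changed, so nothing to do
```

Here the unchanged row `a.md#0` is never written, and a second run against the same declared state is a no-op.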

Built on a Rust engine.

The core is Rust — production-grade from day zero.
Parallel chunking, zero-copy transforms where possible, and failure isolation
so one bad record doesn't stall the flow.
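
As an illustration of failure isolation, the sketch below retries each record with exponential back-off and moves records that keep failing to a dead-letter list, so the rest of the batch still completes. It is written in Python for readability and does not reflect the Rust engine's actual code; `process_all`, `handler`, and the default parameters are assumptions.

```python
import time


def process_all(records, handler, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Process records independently; a failing record never blocks the others."""
    done, dead_letters = [], []
    for record in records:
        for attempt in range(max_attempts):
            try:
                done.append(handler(record))
                break
            except Exception as exc:
                if attempt == max_attempts - 1:
                    # Out of retries: park the record for later inspection.
                    dead_letters.append((record, repr(exc)))
                else:
                    # Exponential back-off: base_delay, then 2x, then 4x, ...
                    sleep(base_delay * 2 ** attempt)
    return done, dead_letters


attempts = {}


def handler(record):
    attempts[record] = attempts.get(record, 0) + 1
    if record == "bad":
        raise ValueError("unparseable record")   # always fails
    if record == "flaky" and attempts[record] == 1:
        raise TimeoutError("transient failure")  # fails once, then succeeds
    return record.upper()


delays = []  # capture back-off delays instead of actually sleeping
done, dead = process_all(["ok", "bad", "flaky", "also-ok"], handler, sleep=delays.append)
```

In this run the permanently failing record ends up in `dead`, the transient failure succeeds on retry, and the records after them are still processed.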
