Star us ❤️ →
CocoIndex turns codebases, meeting notes, inboxes, Slack, PDFs, and videos into live, continuously fresh context for your AI agents and LLM apps to reason over effectively, with minimal incremental processing. Get your production AI agent ready in 10 minutes with reliable, continuously fresh data: no stale batches, no context gaps.
Incremental · only the delta · Any scale · parallel by default · Declarative · Python, 5 min
Deutsch | English | Español | français | 日本語 | 한국어 | Português | Русский | 中文
See all 20+ examples · updated every week →
```shell
pip install -U --pre cocoindex  # v1 is in preview: the --pre flag is required
```

Declare what should be in your target; CocoIndex keeps it in sync forever, recomputing only the Δ.
```python
import cocoindex as coco
from cocoindex.connectors import localfs, postgres
from cocoindex.ops.text import RecursiveSplitter

@coco.fn(memo=True)  # ← cached by hash(input) + hash(code)
async def index_file(file, table):
    for chunk in RecursiveSplitter().split(await file.read_text()):
        table.declare_row(text=chunk.text, embedding=embed(chunk.text))

@coco.fn
async def main(src):
    table = await postgres.mount_table_target(PG, table_name="docs")
    table.declare_vector_index(column="embedding")
    await coco.mount_each(index_file, localfs.walk_dir(src).items(), table)

coco.App(coco.AppConfig(name="docs"), main, src="./docs").update_blocking()
```

Run once to backfill. Re-run anytime: only the changed files re-embed.
See the React → CocoIndex mental model →
Data transformation for any engineer, designed for AI workloads,
with a smart incremental engine for always-fresh, explainable data.
Your agents are only as good as the data they see.
Batch pipelines go stale. CocoIndex stays live, and runs only the Δ.
See all 20+ examples · updated every week →
Working starters from the examples tree: clone, plug in your source, ship.
Building something with CocoIndex? We want to see it.
Tag @cocoindex_io on X or drop a link in #showcase on Discord. We'll boost it. 🥥
We are so excited to meet you.
Every typo fix, new connector, doc tweak, or full-on rewrite makes CocoIndex better.
Come hang out: big PRs and small ones, both welcome.
Read the contributing guide · good first issues · Say hi on Discord
Incremental compute is the only way to keep large corpora fresh without re-embedding them every cycle.
CocoIndex scales from a single repo to petabyte-scale stores: parallel by default, delta-only by design.
When a source changes, CocoIndex identifies the affected records, propagates the change
across joins and lookups, updates the target, and retires stale rows,
without touching anything that didn't change.
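That update-and-retire step can be sketched as a diff between the rows the source currently implies and the rows already in the target. A toy reconcile loop over a hypothetical in-memory `target` dict (an illustration of the idea, not CocoIndex's actual engine):

```python
def reconcile(target: dict, desired: dict) -> dict:
    """Bring target in line with desired state, touching only the delta."""
    stats = {"upserted": 0, "retired": 0, "untouched": 0}
    for key, value in desired.items():
        if target.get(key) != value:
            target[key] = value          # new or changed row: upsert
            stats["upserted"] += 1
        else:
            stats["untouched"] += 1      # identical row: leave it alone
    for key in list(target.keys() - desired.keys()):
        del target[key]                  # no longer derivable: retire it
        stats["retired"] += 1
    return stats

target = {"a.md": 1, "b.md": 2, "c.md": 3}
stats = reconcile(target, {"a.md": 1, "b.md": 9, "d.md": 4})
# a.md untouched, b.md and d.md upserted, c.md retired
```

Unchanged rows are never rewritten, which is what keeps incremental runs cheap even on large targets.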
The core is Rust: production-grade from day zero.
Parallel chunking, zero-copy transforms where possible, and failure isolation
so one bad record doesn't stall the flow.
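Failure isolation of that kind can be sketched as per-record error capture: a bad input becomes an error entry instead of aborting the whole batch. A minimal sketch with a hypothetical `process_all` helper (not the Rust internals):

```python
def process_all(records, transform):
    """Apply transform to every record; a failure isolates to that record."""
    results, errors = [], []
    for record in records:
        try:
            results.append(transform(record))
        except Exception as exc:         # capture the failure, keep going
            errors.append((record, str(exc)))
    return results, errors

results, errors = process_all(["4", "oops", "7"], int)
# "oops" lands in errors; "4" and "7" still flow through
```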
Apache 2.0 · © CocoIndex contributors 🥥