Architecture

Overview

Engram is an event-sourced memory system for AI agents. It uses a two-layer design where the episode log is the source of truth and any derived structures (graphs, indices) are materialized views that can be rebuilt from scratch.

Problem Statement

Existing AI memory systems (Graphiti, Memento, etc.) couple write reliability to LLM API availability by performing entity extraction at write time. This creates fragile systems where:

  • Writes fail silently when LLM endpoints are unavailable
  • Multiple model variants produce inconsistent entity resolution
  • The derived knowledge graph becomes the source of truth instead of raw data
  • Infrastructure changes break the write path

Engram solves this by keeping the episode store as the foundation — no LLM in the write path. Every write succeeds if the database is up.

Layer 1: Episode Store (Core)

Data Model

INSTALL vss;
LOAD vss;

CREATE TABLE episodes (
    id VARCHAR PRIMARY KEY,
    content TEXT NOT NULL,
    name VARCHAR,
    source VARCHAR NOT NULL,
    source_model VARCHAR,
    source_description TEXT,
    group_id VARCHAR DEFAULT 'default',
    tags VARCHAR[],
    embedding FLOAT[768],
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    valid_at TIMESTAMP,
    expired_at TIMESTAMP,
    metadata JSON
);

-- Vector similarity index (HNSW)
CREATE INDEX idx_episodes_embedding ON episodes USING HNSW (embedding);

-- Standard indices for common query patterns
CREATE INDEX idx_episodes_created_at ON episodes (created_at DESC);
CREATE INDEX idx_episodes_group_id ON episodes (group_id);
CREATE INDEX idx_episodes_valid_at ON episodes (valid_at);
CREATE INDEX idx_episodes_expired_at ON episodes (expired_at);
CREATE INDEX idx_episodes_source ON episodes (source);
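
To illustrate how this schema is queried, here is a rough Go sketch of a top-k vector lookup (the helper names and query shape are illustrative, not the actual Engram query builder; it assumes a database/sql handle backed by a DuckDB driver):

// Sketch only: top-k vector search against the episodes table.
package engram

import (
	"database/sql"
	"fmt"
	"strconv"
	"strings"
)

// vectorLiteral renders an embedding as a DuckDB FLOAT[768] literal,
// e.g. [0.1, -0.2, ...]::FLOAT[768], so it can be spliced into SQL text.
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, f := range v {
		parts[i] = strconv.FormatFloat(float64(f), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ", ") + "]::FLOAT[768]"
}

// nearestEpisodes returns the ids of the k unexpired episodes in a group
// that are most similar to the query embedding, ranked by
// array_cosine_similarity() as described in the search section below.
func nearestEpisodes(db *sql.DB, queryEmb []float32, groupID string, k int) ([]string, error) {
	q := fmt.Sprintf(`
		SELECT id
		FROM episodes
		WHERE group_id = ?
		  AND embedding IS NOT NULL
		  AND expired_at IS NULL
		ORDER BY array_cosine_similarity(embedding, %s) DESC
		LIMIT ?`, vectorLiteral(queryEmb))

	rows, err := db.Query(q, groupID, k)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}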

Write Path

  1. Receive episode text + metadata
  2. Generate embedding via Ollama (e.g., nomic-embed-text, 768 dimensions)
  3. Insert into DuckDB
  4. If embedding service is unavailable: insert with NULL embedding, queue for retry
  5. Return success to caller immediately
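
A condensed Go sketch of this path (the helper names, the Ollama address, and the retry queue are illustrative assumptions, not the actual Engram code; it reuses the vectorLiteral helper from the data-model sketch above):

// Sketch of the write path: embed if possible, insert either way.
package engram

import (
	"bytes"
	"database/sql"
	"encoding/json"
	"fmt"
	"net/http"
)

// retryQueue holds ids of episodes whose embeddings still need to be
// generated; a real implementation might instead scan for rows where
// embedding IS NULL.
var retryQueue = make(chan string, 1024)

// embedText requests a 768-dim embedding from Ollama's OpenAI-compatible
// /v1/embeddings endpoint (default local address assumed here).
func embedText(text string) ([]float32, error) {
	body, _ := json.Marshal(map[string]any{"model": "nomic-embed-text", "input": text})
	resp, err := http.Post("http://localhost:11434/v1/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Data []struct {
			Embedding []float32 `json:"embedding"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Data) == 0 {
		return nil, fmt.Errorf("empty embedding response")
	}
	return out.Data[0].Embedding, nil
}

// AddEpisode implements steps 1-5: the insert happens whether or not the
// embedding call succeeded, so the write never depends on the LLM stack.
func AddEpisode(db *sql.DB, id, content, source, groupID string) error {
	embSQL := "NULL"
	if emb, err := embedText(content); err == nil {
		embSQL = vectorLiteral(emb) // helper from the data-model sketch above
	} else {
		select { // queue for a later embedding retry; never block the write
		case retryQueue <- id:
		default:
		}
	}
	q := fmt.Sprintf(
		`INSERT INTO episodes (id, content, source, group_id, embedding)
		 VALUES (?, ?, ?, ?, %s)`, embSQL)
	_, err := db.Exec(q, id, content, source, groupID)
	return err
}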

Search

Three search modes, selectable via the search_mode parameter:

  • Vector (default): Finds memories by meaning. Uses HNSW vector index with cosine similarity — "deployment preferences" matches memories about CI/CD even without that exact phrase.
  • Keyword: Finds memories by exact words. Uses DuckDB's FTS extension (BM25 scoring) on content and name fields. No embedding required — works even when Ollama is down.
  • Hybrid: Combines both approaches with configurable weighting (alpha, default 0.7 favoring semantic). BM25 scores are min-max normalized to [0,1] before combining with cosine similarity.

All modes support additional filters:

  • Temporal: Filter by created_at, valid_at, expired_at ranges
  • Tag-based: List containment queries
  • Combined: All of the above in a single query

When a search query is received in vector or hybrid mode, the query text is embedded and results are ranked by array_cosine_similarity(). If embedding generation fails (e.g., Ollama is down), vector search falls back to chronological ordering and hybrid degrades to keyword-only.
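
The hybrid weighting can be sketched as follows (hypothetical types and function names, not the actual scoring code):

// Sketch of hybrid score blending.
package engram

import "math"

// scored pairs a candidate episode with its raw scores from both
// retrieval paths.
type scored struct {
	ID     string
	Cosine float64 // cosine similarity from the vector path
	BM25   float64 // raw BM25 score from the keyword path (unbounded)
}

// hybridScores min-max normalizes BM25 into [0, 1] and blends it with
// cosine similarity: alpha (default 0.7) weights the semantic score.
// When no query embedding is available, callers skip this entirely and
// rank by BM25 alone (keyword-only degradation).
func hybridScores(cands []scored, alpha float64) map[string]float64 {
	lo, hi := math.Inf(1), math.Inf(-1)
	for _, c := range cands {
		lo = math.Min(lo, c.BM25)
		hi = math.Max(hi, c.BM25)
	}
	out := make(map[string]float64, len(cands))
	for _, c := range cands {
		norm := 0.0
		if hi > lo {
			norm = (c.BM25 - lo) / (hi - lo) // min-max normalization
		}
		out[c.ID] = alpha*c.Cosine + (1-alpha)*norm
	}
	return out
}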

Layer 2: Derived Knowledge Graph (Future)

A periodic batch process that reads episodes and builds entity/relationship structures. Not currently implemented. The episode store alone with semantic search provides the majority of the value.

When built:

  • Runs as a background job, not in the write path
  • Can use any graph backend
  • Failures don't lose data — just means the graph is stale until the next successful run
  • Can be rebuilt from scratch at any time from the episode log
  • Entity resolution happens here, with human review capability

MCP Server

Go service using the official MCP SDK, exposing tools over SSE:

Tool              Description                                   LLM Required
add_memory        Store a new episode                           No
search            Semantic + temporal + tag search              No
get_episodes      Retrieve by time range, source, or group      No
update_episode    Modify metadata/tags/expiration               No
get_status        Health check                                  No

Episodes can be marked as expired but not deleted. This prevents accidental memory loss.
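
Expiration is a plain metadata update on the episode row; a minimal sketch (not the actual handler):

// Sketch: expire, don't delete.
package engram

import (
	"database/sql"
	"time"
)

// expireEpisode marks an episode as expired without removing the row,
// so the memory can still be audited or recovered later. Search paths
// that filter on expired_at IS NULL stop returning it.
func expireEpisode(db *sql.DB, id string) error {
	_, err := db.Exec(
		`UPDATE episodes SET expired_at = ? WHERE id = ?`,
		time.Now(), id)
	return err
}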

Transport

Engram uses a server-first architecture to avoid DuckDB's single-writer file lock:

  • engram serve (default) — HTTP server owning the DuckDB database, exposing MCP over SSE at /mcp/sse, REST API at /api/v1/*, health probes at /health and /ready. This is the only process that touches the database.
  • engram stdio — Thin stateless proxy that bridges stdin/stdout JSON-RPC to the server via SSE. For clients that don't support SSE natively (e.g., Claude Desktop). Uses mcp-go/client/transport.SSE for robust endpoint discovery and session management.

SSE is the primary transport. Clients that support it (Cursor, Claude Code) connect directly to http://localhost:3490/mcp/sse. The stdio proxy is a compatibility shim for clients that only speak stdio.

Infrastructure

  • Database: DuckDB with VSS and FTS extensions — single-file, portable, HNSW indexing for vector search, BM25 indexing for full-text search, native LIST and JSON support
  • Application: Go with official MCP SDK — single static binary, cross-platform
  • Embeddings: Ollama — local generation, OpenAI-compatible /v1/embeddings endpoint, no external API costs
  • Default port: 3490 (configurable via ENGRAM_PORT)

Deployment Options

Native binary: Single executable + DuckDB file. Run engram serve as a background service. See MCP Integration Guide for platform-specific instructions.

Docker container: Multi-stage build using Debian Bookworm (glibc compatibility). See the Dockerfile.

Kubernetes: Deployment with PersistentVolume for the .duckdb file. Requires ingress configuration for SSE support (no buffering, long timeouts).

Design Principles

  1. Writes never fail (if the database is up)
  2. No LLM in the write path — embeddings only, and those are retryable
  3. Episode log is source of truth — everything else is derived
  4. Rebuild over repair — if derived data is wrong, rebuild from episodes
  5. Simple over clever — vector search covers 80% of use cases without a graph
  6. Multi-tenant by default: group_id supports multiple users/contexts
  7. Observable — every episode records which client and model wrote it

Current Limitations

  • Hardcoded embedding dimension: Schema uses FLOAT[768] (tied to nomic-embed-text)
  • FTS index rebuild scales linearly: DuckDB FTS doesn't support incremental updates, so the full-text index is rebuilt lazily (on the next keyword/hybrid search after a write). This is imperceptible under 1K episodes, takes 1–5 seconds at 1K–10K, and may need a different strategy beyond 10K.
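
A sketch of how such a lazy rebuild can work, using DuckDB's create_fts_index pragma (illustrative bookkeeping, not the actual Engram code):

// Sketch of lazy FTS index maintenance.
package engram

import (
	"database/sql"
	"sync"
)

// ftsState tracks whether the full-text index is stale.
type ftsState struct {
	mu    sync.Mutex
	dirty bool
}

// MarkDirty is called after every write so the next keyword or hybrid
// search knows the FTS index needs rebuilding.
func (s *ftsState) MarkDirty() {
	s.mu.Lock()
	s.dirty = true
	s.mu.Unlock()
}

// EnsureFresh rebuilds the full-text index lazily, only when a keyword
// or hybrid search actually needs it. DuckDB FTS has no incremental
// updates, so the whole index over content and name is recreated.
func (s *ftsState) EnsureFresh(db *sql.DB) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.dirty {
		return nil
	}
	_, err := db.Exec(
		`PRAGMA create_fts_index('episodes', 'id', 'content', 'name', overwrite = 1)`)
	if err == nil {
		s.dirty = false
	}
	return err
}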

Future Roadmap

  • Layer 2 knowledge graph with entity extraction (v3)
  • Memory consolidation and summarization via Dreamer service (v3)
  • Support for multiple embedding models and dimensions
  • Batch embedding generation for bulk imports