PaperMem is a local-first memory system and explainable dashboard for AI agents and chatbots, built for Georgia Tech CSE6242 Data and Visual Analytics. It combines an Electron desktop shell, a React graph-and-chat interface, a FastAPI memory backend, Postgres + pgvector for retrieval, and Neo4j for logical relation graphs.
The current implementation is optimized for two memory sources:
- conversation history generated inside the app
- uploaded documents such as PDF, Markdown, and plain text files
The system is designed to make retrieval visible. Every chat response is backed by retrieved memory units, graph expansion, and evidence cards that the user can inspect directly in the UI.
- stores chat turns and uploaded file chunks as unified memory units
- embeds all memory units into pgvector for semantic retrieval
- extracts lightweight semantic structure with an LLM-first pipeline plus local fallback
- builds a logical knowledge graph in Neo4j using claims, concepts, entities, and relations
- streams answers from the backend to the chat UI
- renders a large interactive memory graph with semantic zoom and retrieval-driven overlays
- supports a desktop quick-capture workflow through Electron
User
|
v
Electron desktop shell
|- main window: PaperMem dashboard
|- floating upload capsule
|- desktop integrations / IPC bridge
|
v
React frontend (Vite + Tailwind + vis-network)
|- project sidebar
|- chat sessions and streaming messages
|- evidence / focus display
|- semantic zoom graph
|
v
FastAPI backend
|- project + chat session APIs
|- file ingestion APIs
|- chat streaming API
|- graph retrieval APIs
|
+--> Postgres + pgvector
| |- projects
| |- chat sessions / messages
| |- source files
| |- memory units + embeddings
| |- retrieval / reasoning records
|
+--> Neo4j
|- semantic nodes
|- logical relations
|- overlay graph for retrieval visualization
- A user uploads a file from the Electron capsule or desktop UI.
- The backend parses the file locally.
- The file is chunked into memory-sized segments.
- Each chunk is embedded and stored in Postgres.
- The semantic extraction service produces claims, concepts, entities, and display labels.
- The graph service writes semantic nodes and logical edges into Neo4j.
- The Electron shell sends an event back to the renderer so the graph can refresh automatically.
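The chunking step above can be sketched as a simple overlapping splitter. This is a minimal illustration only; the function name, chunk size, and overlap are assumptions, not PaperMem's actual parameters:

```python
def chunk_text(text: str, max_chars: int = 1200, overlap: int = 150) -> list[str]:
    """Split sanitized text into memory-sized segments with a small overlap,
    so a fact spanning a chunk boundary is still retrievable from one chunk.
    (Illustrative sketch; sizes are hypothetical.)"""
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so adjacent chunks share context
    return chunks
```

Each resulting chunk then becomes one memory unit with its own embedding row in Postgres.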
- A user sends a chat message in the dashboard.
- The backend embeds the query and retrieves nearby memory units from pgvector.
- The backend expands the retrieved set using logical graph relations in Neo4j.
- The backend streams an answer from the configured LLM.
- The user turn and assistant turn are persisted as memory units.
- The conversation is folded back into the same memory + graph pipeline used for files.
- The frontend requests a graph snapshot for the active project.
- The backend computes graph scores and returns graph nodes and edges.
- The frontend applies client-side semantic zoom.
- Important nodes remain visible first, while lower-priority nodes appear as the user zooms in.
- Retrieval-related nodes and evidence appear in the chat area for the corresponding assistant response.
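The zoom-dependent node retention described above can be sketched as a client-side filter. The function, field names, and threshold rule are illustrative assumptions; PaperMem's actual logic lives in the React frontend:

```python
def visible_nodes(nodes: list[dict], zoom: float) -> list[dict]:
    """Semantic zoom sketch: each node carries an importance score in [0, 1];
    low zoom levels show only the most important nodes, and the threshold
    relaxes as the user zooms in. (Hypothetical rule, not the real frontend.)"""
    threshold = max(0.0, 1.0 - zoom)  # zoom >= 1.0 shows everything
    return [n for n in nodes if n["importance"] >= threshold]
```

A node with importance 0.9 survives at zoom 0.2, while a 0.3-importance node only appears once the user zooms in further.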
The frontend lives in frontend and is a single-page React application served by Vite.
- render the split layout: graph area + chat area
- manage project selection and chat sessions
- stream assistant responses in real time
- display evidence and retrieval focus for each assistant message
- render the graph with vis-network
- apply semantic zoom behavior client-side
- refresh the graph after uploads or manual user action
frontend/src/App.jsx
- main application state
- chat message rendering
- evidence / focus UI
- project and session management
- graph refresh and semantic zoom logic
frontend/src/index.css
- global layout rules
- Tailwind entrypoint
- shared utility styles such as hidden scrollbars
frontend/src/main.jsx
- app bootstrapping and error boundary mounting
The graph is not rendered as a raw dump of every memory unit. Instead, the frontend applies a layered view:
- high-importance nodes stay visible at low zoom
- bridge / gatekeeper nodes are preserved to keep communities connected
- node size reflects local importance rather than all nodes shrinking uniformly
- the graph can be refreshed after new ingestion or further conversation
The UI also distinguishes between:
- the global project memory map
- retrieval-driven overlays relevant to the current conversation
The Electron shell lives in electron.
- launch the desktop window that hosts the PaperMem dashboard
- expose a preload bridge so the renderer can call trusted native features
- manage the floating upload capsule
- notify the frontend when file ingestion completes
- keep the desktop experience available outside a plain browser workflow
electron/main.js
- window lifecycle
- IPC handlers
- upload completion notifications
electron/preload.js
- safe bridge exposed on window.paperMem
electron/dropzone.html
- floating upload capsule UI
The backend lives in backend and is a FastAPI service that owns memory ingestion, retrieval, graph updates, and chat streaming.
- manage projects, sessions, messages, and files
- parse uploaded files locally
- chunk and embed text
- persist memory units into Postgres
- build and query logical graphs in Neo4j
- retrieve evidence for chat queries
- stream assistant responses and persist retrieval metadata
backend/app/main.py
- FastAPI application
- API routes for projects, sessions, files, graphs, and chat
backend/app/config.py
- central settings model
- all required environment configuration
backend/app/database.py
- SQLAlchemy engine/session setup
- pgvector extension initialization
backend/app/models.py
- SQLAlchemy models for messages, memory units, files, and retrieval events
backend/app/schemas.py
- request / response schema definitions
backend/app/openrouter_client.py
- LLM client wrapper
backend/app/reasoner_agent.py
- answer generation layer
backend/app/services/file_parser.py
- parses PDF / Markdown / TXT locally
- sanitizes text for storage
backend/app/services/embedding_service.py
- generates embeddings
- normalizes vector dimensions to the configured size
backend/app/services/semantic_service.py
- extracts claims, concepts, entities, and display labels
- uses an LLM-first approach with local fallback
backend/app/services/graph_service.py
- upserts semantic nodes and logical relations in Neo4j
- builds graph payloads for the frontend
backend/app/services/memory_service.py
- orchestrates ingestion, persistence, retrieval, evidence creation, and trace metadata
Postgres is the system of record for structured application data and semantic retrieval.
It stores:
- projects
- chat sessions
- chat messages
- source file records
- memory units
- embeddings
- retrieval events
- reasoning trace metadata
The key design choice is to unify file chunks and conversation turns under the same MemoryUnit abstraction, so the same retrieval and graph-building pipeline can operate on both.
Neo4j stores the semantic graph used for explainable reasoning and large-scale graph visualization.
The graph contains:
- semantic nodes such as claims, concepts, and entities
- links from memory units to semantic nodes
- logical relations such as support, causality, contradiction, elaboration, and association
- retrieval overlays used to visualize the current reasoning context
This graph is intentionally more logical than provenance-heavy so that the UI feels closer to a mind map than a file tree.
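A relation upsert of the kind listed above might be built roughly like this. The node label, property names, relation vocabulary, and helper function are assumptions for illustration, not the actual graph_service implementation:

```python
def build_relation_upsert(src_id: str, dst_id: str, rel_type: str) -> tuple[str, dict]:
    """Build a parameterized Cypher MERGE linking two semantic nodes with one
    logical relation. Relation types are whitelisted because Cypher cannot
    parameterize relationship types. (Hypothetical sketch.)"""
    allowed = {"SUPPORTS", "CAUSES", "CONTRADICTS", "ELABORATES", "ASSOCIATED_WITH"}
    if rel_type not in allowed:
        raise ValueError(f"unknown relation type: {rel_type}")
    query = (
        "MERGE (a:SemanticNode {id: $src}) "
        "MERGE (b:SemanticNode {id: $dst}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": src_id, "dst": dst_id}
```

Using MERGE rather than CREATE keeps re-ingestion idempotent: uploading the same file twice does not duplicate nodes or edges.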
The current memory pipeline is intentionally lighter than the original GauzRag pipeline.
- parse or receive text
- sanitize text
- chunk text
- embed chunks
- select a deterministic subset of chunks for LLM semantic labeling
- extract semantic bundles with either LLM labels or local fallback labels
- store memory units in Postgres
- write semantic nodes and relations to Neo4j
- embed query
- optionally scope retrieval to a selected source file
- combine lexical exact-match recall with semantic nearest-neighbor search in pgvector
- expand through adjacent source chunks and graph relations
- collect evidence anchors under a context budget
- stream grounded answer generation
- persist retrieval metadata and reasoning records
This keeps the system generalizable and fast enough for interactive use, while still surfacing semantic structure for visualization.
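The lexical-plus-semantic recall and context-budget steps can be sketched as below. Scoring weights, field names, and the budgeting rule are illustrative assumptions; the real logic (and the Neo4j expansion step, omitted here) lives in memory_service:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_retrieve(query: str, query_vec: list[float],
                    units: list[dict], budget_chars: int) -> list[dict]:
    """Blend lexical exact-match recall with vector similarity, then keep the
    best evidence under a character budget. (Sketch; weights are hypothetical.)"""
    terms = set(query.lower().split())
    scored = []
    for u in units:
        lexical = len(terms & set(u["text"].lower().split())) / max(len(terms), 1)
        semantic = cosine(query_vec, u["embedding"])
        scored.append((0.4 * lexical + 0.6 * semantic, u))
    scored.sort(key=lambda s: s[0], reverse=True)
    selected, used = [], 0
    for _, u in scored:  # greedy fill under the context budget
        if used + len(u["text"]) <= budget_chars:
            selected.append(u)
            used += len(u["text"])
    return selected
```

The budget cap is what keeps prompts bounded even when many memory units match a query.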
The semantic extraction stage produces:
- claims
- concepts
- entities
- relation candidates
- display_label values for graph-friendly rendering
PaperMem uses a hybrid semantic-labeling strategy so ingestion remains practical for long papers, reports, notes, contracts, and deployed environments.
Every uploaded file chunk becomes a MemoryUnit, receives an embedding, is stored in Postgres, and is written into the Neo4j graph. The system does not drop unsampled chunks from graph retrieval. Instead, chunks differ only in how their semantic labels are produced:
- roughly one fifth of file chunks receive higher-quality LLM semantic labels
- the remaining chunks receive fast local fallback labels
- all chunks still participate in vector retrieval, graph rendering, graph expansion, and evidence selection
The LLM-labeled subset is selected with deterministic MMR-style sampling rather than pure randomness or a paper-specific section heuristic. PaperMem first keeps the first and last chunk, then scores candidate chunks using local salience signals such as length, numeric values, acronyms, named entities, heading-like text, and words such as summary, result, method, limitation, risk, requirement, decision, experiment, and evaluation. It then uses chunk embeddings to prefer candidates that cover different topics from chunks already selected. A small position-spread term prevents the selected chunks from collapsing into one nearby part of the document when embeddings are similar.
This is useful because user uploads are not always papers. A position-only strategy assumes structures like intro, method, results, and conclusion, which can fail for meeting notes, product specs, legal documents, or mixed research material. The MMR strategy instead tries to spend LLM calls on chunks that are both information-dense and semantically diverse, while local fallback keeps the rest of the graph complete.
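The deterministic MMR-style selection described above can be sketched as a greedy loop. Salience here is precomputed per chunk, and the position-spread term is omitted; names and the lambda weight are assumptions, not PaperMem's exact formula:

```python
import math

def mmr_select(chunks: list[dict], k: int, lam: float = 0.7) -> list[int]:
    """Pick k chunk indices for LLM labeling: always keep the first and last
    chunk, then greedily add chunks that balance salience against embedding
    similarity to chunks already chosen. Assumes at least two chunks."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    selected = [0, len(chunks) - 1]
    while len(selected) < min(k, len(chunks)):
        best, best_score = None, float("-inf")
        for i, c in enumerate(chunks):
            if i in selected:
                continue
            # Penalize similarity to the most similar already-selected chunk.
            redundancy = max(cos(c["emb"], chunks[j]["emb"]) for j in selected)
            score = lam * c["salience"] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return sorted(selected)
```

Because the scoring is deterministic, re-ingesting the same file selects the same chunks for LLM labeling.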
The local fallback path is rule-based and runs without remote LLM calls. It splits text into candidate claims, extracts entities from uppercase/title-like spans, extracts concepts from frequent non-stopword tokens, looks for relation cues such as because, therefore, however, and for example, and creates compact display labels. These labels are rougher than LLM labels but are fast enough to apply to every chunk.
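A minimal version of that rule-based fallback might look like the following. The regexes, stopword list, and cue set are simplified illustrations of the approach, not the actual semantic_service rules:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "that", "it"}
RELATION_CUES = ("because", "therefore", "however", "for example")

def fallback_labels(text: str) -> dict:
    """Rule-based semantic labels with no LLM call: sentences become claims,
    title-case spans become entities, frequent tokens become concepts, and
    cue words become relation hints. (Illustrative sketch.)"""
    claims = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    entities = re.findall(r"\b[A-Z][A-Za-z0-9]+(?:\s+[A-Z][A-Za-z0-9]+)*\b", text)
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    concepts = [w for w, _ in Counter(tokens).most_common(5)]
    cues = [c for c in RELATION_CUES if c in text.lower()]
    return {"claims": claims, "entities": entities, "concepts": concepts,
            "relation_cues": cues,
            "display_label": claims[0][:60] if claims else ""}
```

Because it is pure regex and counting, this path costs microseconds per chunk and can be applied to every chunk a document produces.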
The main controls are:
- SEMANTIC_LLM_FILE_SAMPLE_RATIO, default 0.20, controls the target fraction of file chunks that receive LLM labels
- SEMANTIC_LLM_FILE_SAMPLE_MIN, default 8, keeps small files from being under-labeled
- SEMANTIC_LLM_FILE_SAMPLE_MAX, default 0, means no maximum cap
- SEMANTIC_LLM_SELECTION_STRATEGY=mmr enables salience plus embedding-diversity selection
- SEMANTIC_LLM_CONCURRENCY and SEMANTIC_LLM_TIMEOUT_SECONDS bound ingestion latency
- RELATION_LINK_TOP_K bounds cross-chunk relation linking work
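For example, a backend/.env fragment that labels roughly a third of file chunks with the LLM could look like this (all values here are illustrative, not recommended defaults):

```
SEMANTIC_LLM_FILE_SAMPLE_RATIO=0.33
SEMANTIC_LLM_FILE_SAMPLE_MIN=8
SEMANTIC_LLM_FILE_SAMPLE_MAX=0
SEMANTIC_LLM_SELECTION_STRATEGY=mmr
SEMANTIC_LLM_CONCURRENCY=4
SEMANTIC_LLM_TIMEOUT_SECONDS=30
RELATION_LINK_TOP_K=5
```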
The LLM-as-a-judge benchmark lives in llm_as_judge, with the downloaded paper set in eval_paper2.
The current benchmark uses 10 long arXiv papers about LLMs. GPT-5.4 generated 50 paper-specific questions and golden-standard answers, five per paper. The systems under test answer the same questions:
- papermem: the PaperMem backend with source-aware memory retrieval, lexical + vector recall, adjacent chunk expansion, graph expansion, and GPT-4o mini answer generation
- rag_gpt4o_mini: a traditional vector-RAG baseline over the same 10 papers, using GPT-4o mini
- bare_gpt4o_mini: GPT-4o mini without retrieval context
GPT-5.4 then judges each answer against the golden answer on:
- accuracy: factual match to the golden answer
- recall: coverage of required details
- faithfulness: whether the answer avoids unsupported or contradictory claims
- completeness: whether the answer fully resolves the question
- specificity: whether the answer uses paper-specific details rather than generic statements
- overall: holistic score from 0 to 5
- time_efficiency: latency-normalized utility score
Latest 50-question average metrics:
| system | n | accuracy | recall | faithfulness | completeness | specificity | overall | time_efficiency | avg_latency_seconds | avg_context_chars |
|---|---|---|---|---|---|---|---|---|---|---|
| bare_gpt4o_mini | 50 | 0.04 | 0.04 | 4.80 | 0.04 | 0.04 | 1.02 | 4.989 | 1.245 | 0 |
| papermem | 50 | 4.28 | 4.26 | 4.00 | 4.32 | 4.38 | 4.24 | 0.679 | 9.471 | 14000 |
| rag_gpt4o_mini | 50 | 4.08 | 4.18 | 3.64 | 4.22 | 4.28 | 4.10 | 2.068 | 3.391 | 13997 |
PaperMem outperforms the traditional RAG baseline on overall answer quality, accuracy, faithfulness, completeness, and specificity. The main tradeoff is latency: PaperMem spends more time assembling explainable memory evidence and graph-expanded context.
To reproduce:
python -m llm_as_judge.run_eval --download --generate-questions
python -m llm_as_judge.run_eval --run-systems --systems papermem,rag,bare --papermem-ingest --judge

PaperMem is built to visualize more data than a naive full-graph rendering can handle.
The graph uses semantic zoom rather than simple geometric scaling:
- zoomed-out views show a selective subgraph
- node retention is driven by graph importance signals
- more nodes appear as the user zooms in
- bridge nodes are preserved so communities do not visually disconnect too early
The frontend and backend rely on graph-theoretic signals including:
- centrality-style scores
- betweenness-like bridge behavior
- local connectivity / degree
- retrieval relevance
This is inspired by the CSE6242 graph algorithms material and is used to keep the large graph readable.
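One way to blend those signals into a single retention score is sketched below. The weights and field names are assumptions for illustration, not the formula PaperMem actually uses:

```python
def importance(node: dict, max_degree: int) -> float:
    """Combine graph-theoretic signals into one node-retention score in [0, 1]:
    normalized degree, a betweenness-like bridge score, and retrieval relevance.
    (Hypothetical weighting; the real scoring is split across backend and frontend.)"""
    degree = node["degree"] / max_degree if max_degree else 0.0
    return (0.4 * degree
            + 0.3 * node.get("bridge", 0.0)      # betweenness-like bridge signal
            + 0.3 * node.get("relevance", 0.0))  # relevance to the current query
```

Nodes are then sorted by this score, so hubs and bridges survive aggressive zoom-out while leaf nodes appear later.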
The most important backend capabilities exposed to the app are:
- project creation / listing / deletion
- chat session creation / activation / deletion
- message retrieval for a session
- file ingestion
- graph retrieval for a project
- streaming chat responses
The chat stream returns not only text tokens, but also retrieval metadata that the frontend can attach to the assistant reply, such as:
- evidence items
- retrieval focus terms
- graph overlay data
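The wire shape of such a stream can be sketched as a generator that yields token events followed by one metadata event. The event names and fields here are assumptions about the shape, not the actual API contract:

```python
import json

def stream_chat(tokens: list[str], evidence: list[dict]):
    """Yield a chat stream the frontend can consume incrementally: token
    events first, then a final metadata event carrying the evidence the
    answer was grounded in. (Illustrative sketch of the stream shape.)"""
    for t in tokens:
        yield json.dumps({"type": "token", "text": t})
    yield json.dumps({"type": "metadata", "evidence": evidence})
```

The frontend appends token events to the visible reply as they arrive, then attaches the trailing metadata event to the finished assistant message.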
- Python 3.11+ (3.13 recommended; matches production). On macOS, use Homebrew Python: do not create the venv with Xcode's bundled Python, or python3 may not land on your PATH after activate.
- Node.js 18+
- Docker Desktop
docker compose up -d

This starts:
- Postgres on 127.0.0.1:5432
- Neo4j on 127.0.0.1:7474 and 127.0.0.1:7687
Create the virtualenv with a known interpreter (adjust the path if your Homebrew prefix differs):
cd backend
/opt/homebrew/bin/python3.13 -m venv .venv_local
source .venv_local/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
cp env.example .env

After source .venv_local/bin/activate, use python and pip (not bare python3 from Xcode) so which python resolves to .venv_local/bin/python.
If you ever see python: not found after activating, open a new terminal and run source .venv_local/bin/activate again, or invoke the venv explicitly: ./.venv_local/bin/python -m uvicorn ....
Fill in the required keys in backend/.env:
LLM_API_KEY=
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_MODEL=openai/gpt-4o-mini
EMBEDDING_API_KEY=
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=256
NEO4J_URI=bolt://127.0.0.1:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=papermemneo4j
NEO4J_REQUIRED=false
POSTGRES_HOST=127.0.0.1
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DATABASE=papermem

For Railway or AuraDB Free deployments, keep NEO4J_REQUIRED=false unless the API should refuse to start without Neo4j. AuraDB Free instances can pause and spend time in RESUMING; during that window the Bolt hostname may fail DNS resolution. With NEO4J_REQUIRED=false, PaperMem starts with Postgres/vector retrieval and automatically disables graph calls until the process is restarted against an available Neo4j endpoint.
cd backend
source .venv_local/bin/activate
python -m uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload

npm install
npm run dev

This script does both:
- starts the Vite renderer on 127.0.0.1:5173
- waits for the renderer, then launches Electron
.
├── backend/
├── electron/
├── frontend/
├── package.json
├── package-lock.json
├── README.md
├── docker-compose.yml
└── ...
You do not need a separate public website for the Electron app: npm run build:renderer produces frontend/dist, which electron-builder packs into the desktop installer. Point the UI at your deployed API with the same URL for both the renderer and the main process:
export VITE_API_BASE_URL="https://YOUR-APP.up.railway.app"
npm run dist:mac # macOS → release/*.dmg
npm run dist:win # Windows → release/*.exe (run on Windows or use CI)

scripts/write-electron-api-base.js writes electron/api-base.json from that URL so Electron IPC calls (/extract, file ingest) hit the same host as the React app.
To publish builds on GitHub: push a version tag (v1.0.0). The workflow .github/workflows/release-desktop.yml builds macOS and Windows artifacts and attaches them to a Release. Set variable PAPERMEM_API_BASE_URL on the production environment (repository Settings → Environments → production), or the workflow falls back to a default Railway URL.
Note: Chat and graph work against a remote API. The Electron dropzone uploads file bytes with POST /files/ingest_upload (multipart), so ingestion works against a deployed backend. The JSON-only POST /files/ingest (local path on the API machine) remains useful for server-side debugging or when the API runs on the same machine as the files.
The repository intentionally ignores local/demo-heavy assets such as:
- generated virtual environments
- local .env files
- untracked demo directories like papermem-demo/
- large local demo media such as demo.gif
- local PDF test assets unless already tracked in git history
Georgia Tech CSE6242 Team, Spring 2026