PaperMem is a local-first memory system and explainable dashboard for AI agents and chatbots, built for Georgia Tech CSE6242 Data and Visual Analytics. It combines an Electron desktop shell, a React graph-and-chat interface, a FastAPI memory backend, Postgres + pgvector for retrieval, and Neo4j for logical relation graphs.
The current implementation is optimized for two memory sources:
- conversation history generated inside the app
- uploaded documents such as PDF, Markdown, and plain text files
The system is designed to make retrieval visible. Every chat response is backed by retrieved memory units, graph expansion, and evidence cards that the user can inspect directly in the UI.
- stores chat turns and uploaded file chunks as unified memory units
- embeds all memory units into pgvector for semantic retrieval
- extracts lightweight semantic structure with an LLM-first pipeline plus local fallback
- builds a logical knowledge graph in Neo4j using claims, concepts, entities, and relations
- streams answers from the backend to the chat UI
- renders a large interactive memory graph with semantic zoom and retrieval-driven overlays
- supports a desktop quick-capture workflow through Electron
User
|
v
Electron desktop shell
|- main window: PaperMem dashboard
|- floating upload capsule
|- desktop integrations / IPC bridge
|
v
React frontend (Vite + Tailwind + vis-network)
|- project sidebar
|- chat sessions and streaming messages
|- evidence / focus display
|- semantic zoom graph
|
v
FastAPI backend
|- project + chat session APIs
|- file ingestion APIs
|- chat streaming API
|- graph retrieval APIs
|
+--> Postgres + pgvector
| |- projects
| |- chat sessions / messages
| |- source files
| |- memory units + embeddings
| |- retrieval / reasoning records
|
+--> Neo4j
|- semantic nodes
|- logical relations
|- overlay graph for retrieval visualization
- A user uploads a file from the Electron capsule or desktop UI.
- The backend parses the file locally.
- The file is chunked into memory-sized segments.
- Each chunk is embedded and stored in Postgres.
- The semantic extraction service produces claims, concepts, entities, and display labels.
- The graph service writes semantic nodes and logical edges into Neo4j.
- The Electron shell sends an event back to the renderer so the graph can refresh automatically.
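The chunking step above can be sketched as a simple overlapping splitter. This is a minimal illustration only; the function name, chunk size, and overlap are assumptions, not PaperMem's actual parameters:

```python
def chunk_text(text: str, max_chars: int = 1200, overlap: int = 150) -> list[str]:
    """Split sanitized text into memory-sized segments with a small overlap,
    so a fact spanning a chunk boundary is still retrievable from one chunk.
    (Illustrative sketch; sizes are hypothetical.)"""
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so adjacent chunks share context
    return chunks
```

Each resulting chunk then becomes one memory unit with its own embedding row in Postgres.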
- A user sends a chat message in the dashboard.
- The backend embeds the query and retrieves nearby memory units from pgvector.
- The backend expands the retrieved set using logical graph relations in Neo4j.
- The backend streams an answer from the configured LLM.
- The user turn and assistant turn are persisted as memory units.
- The conversation is folded back into the same memory + graph pipeline used for files.
- The frontend requests a graph snapshot for the active project.
- The backend computes graph scores and returns graph nodes and edges.
- The frontend applies client-side semantic zoom.
- Important nodes remain visible first, while lower-priority nodes appear as the user zooms in.
- Retrieval-related nodes and evidence appear in the chat area for the corresponding assistant response.
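The zoom-dependent node retention described above can be sketched as a client-side filter. The function, field names, and threshold rule are illustrative assumptions; PaperMem's actual logic lives in the React frontend:

```python
def visible_nodes(nodes: list[dict], zoom: float) -> list[dict]:
    """Semantic zoom sketch: each node carries an importance score in [0, 1];
    low zoom levels show only the most important nodes, and the threshold
    relaxes as the user zooms in. (Hypothetical rule, not the real frontend.)"""
    threshold = max(0.0, 1.0 - zoom)  # zoom >= 1.0 shows everything
    return [n for n in nodes if n["importance"] >= threshold]
```

A node with importance 0.9 survives at zoom 0.2, while a 0.3-importance node only appears once the user zooms in further.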
The frontend lives in frontend and is a single-page React application served by Vite.
- render the split layout: graph area + chat area
- manage project selection and chat sessions
- stream assistant responses in real time
- display evidence and retrieval focus for each assistant message
- render the graph with vis-network
- apply semantic zoom behavior client-side
- refresh the graph after uploads or manual user action
frontend/src/App.jsx
- main application state
- chat message rendering
- evidence / focus UI
- project and session management
- graph refresh and semantic zoom logic
frontend/src/index.css
- global layout rules
- Tailwind entrypoint
- shared utility styles such as hidden scrollbars
frontend/src/main.jsx
- app bootstrapping and error boundary mounting
The graph is not rendered as a raw dump of every memory unit. Instead, the frontend applies a layered view:
- high-importance nodes stay visible at low zoom
- bridge / gatekeeper nodes are preserved to keep communities connected
- node size reflects local importance rather than all nodes shrinking uniformly
- the graph can be refreshed after new ingestion or further conversation
The UI also distinguishes between:
- the global project memory map
- retrieval-driven overlays relevant to the current conversation
The Electron shell lives in electron.
- launch the desktop window that hosts the PaperMem dashboard
- expose a preload bridge so the renderer can call trusted native features
- manage the floating upload capsule
- notify the frontend when file ingestion completes
- keep the desktop experience available outside a plain browser workflow
electron/main.js
- window lifecycle
- IPC handlers
- upload completion notifications
electron/preload.js
- safe bridge exposed on window.paperMem
electron/dropzone.html
- floating upload capsule UI
The backend lives in backend and is a FastAPI service that owns memory ingestion, retrieval, graph updates, and chat streaming.
- manage projects, sessions, messages, and files
- parse uploaded files locally
- chunk and embed text
- persist memory units into Postgres
- build and query logical graphs in Neo4j
- retrieve evidence for chat queries
- stream assistant responses and persist retrieval metadata
backend/app/main.py
- FastAPI application
- API routes for projects, sessions, files, graphs, and chat
backend/app/config.py
- central settings model
- all required environment configuration
backend/app/database.py
- SQLAlchemy engine/session setup
- pgvector extension initialization
backend/app/models.py
- SQLAlchemy models for messages, memory units, files, and retrieval events
backend/app/schemas.py
- request / response schema definitions
backend/app/openrouter_client.py
- LLM client wrapper
backend/app/reasoner_agent.py
- answer generation layer
backend/app/services/file_parser.py
- parses PDF / Markdown / TXT locally
- sanitizes text for storage
backend/app/services/embedding_service.py
- generates embeddings
- normalizes vector dimensions to the configured size
backend/app/services/semantic_service.py
- extracts claims, concepts, entities, and display labels
- uses an LLM-first approach with local fallback
backend/app/services/graph_service.py
- upserts semantic nodes and logical relations in Neo4j
- builds graph payloads for the frontend
backend/app/services/memory_service.py
- orchestrates ingestion, persistence, retrieval, evidence creation, and trace metadata
Postgres is the system of record for structured application data and semantic retrieval.
It stores:
- projects
- chat sessions
- chat messages
- source file records
- memory units
- embeddings
- retrieval events
- reasoning trace metadata
The key design choice is to unify file chunks and conversation turns under the same MemoryUnit abstraction, so the same retrieval and graph-building pipeline can operate on both.
Neo4j stores the semantic graph used for explainable reasoning and large-scale graph visualization.
The graph contains:
- semantic nodes such as claims, concepts, and entities
- links from memory units to semantic nodes
- logical relations such as support, causality, contradiction, elaboration, and association
- retrieval overlays used to visualize the current reasoning context
This graph is intentionally more logical than provenance-heavy so that the UI feels closer to a mind map than a file tree.
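A relation upsert of the kind listed above might be built roughly like this. The node label, property names, relation vocabulary, and helper function are assumptions for illustration, not the actual graph_service implementation:

```python
def build_relation_upsert(src_id: str, dst_id: str, rel_type: str) -> tuple[str, dict]:
    """Build a parameterized Cypher MERGE linking two semantic nodes with one
    logical relation. Relation types are whitelisted because Cypher cannot
    parameterize relationship types. (Hypothetical sketch.)"""
    allowed = {"SUPPORTS", "CAUSES", "CONTRADICTS", "ELABORATES", "ASSOCIATED_WITH"}
    if rel_type not in allowed:
        raise ValueError(f"unknown relation type: {rel_type}")
    query = (
        "MERGE (a:SemanticNode {id: $src}) "
        "MERGE (b:SemanticNode {id: $dst}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": src_id, "dst": dst_id}
```

Using MERGE rather than CREATE keeps re-ingestion idempotent: uploading the same file twice does not duplicate nodes or edges.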
The current memory pipeline is intentionally lighter than the original GauzRag pipeline.
- parse or receive text
- sanitize text
- chunk text
- embed chunks
- select a deterministic subset of chunks for LLM semantic labeling
- extract semantic bundles with either LLM labels or local fallback labels
- store memory units in Postgres
- write semantic nodes and relations to Neo4j
- embed query
- optionally scope retrieval to a selected source file
- combine lexical exact-match recall with semantic nearest-neighbor search in pgvector
- expand through adjacent source chunks and graph relations
- collect evidence anchors under a context budget
- stream grounded answer generation
- persist retrieval metadata and reasoning records
This keeps the system generalizable and fast enough for interactive use, while still surfacing semantic structure for visualization.
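The lexical-plus-semantic recall and context-budget steps can be sketched as below. Scoring weights, field names, and the budgeting rule are illustrative assumptions; the real logic (and the Neo4j expansion step, omitted here) lives in memory_service:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_retrieve(query: str, query_vec: list[float],
                    units: list[dict], budget_chars: int) -> list[dict]:
    """Blend lexical exact-match recall with vector similarity, then keep the
    best evidence under a character budget. (Sketch; weights are hypothetical.)"""
    terms = set(query.lower().split())
    scored = []
    for u in units:
        lexical = len(terms & set(u["text"].lower().split())) / max(len(terms), 1)
        semantic = cosine(query_vec, u["embedding"])
        scored.append((0.4 * lexical + 0.6 * semantic, u))
    scored.sort(key=lambda s: s[0], reverse=True)
    selected, used = [], 0
    for _, u in scored:  # greedy fill under the context budget
        if used + len(u["text"]) <= budget_chars:
            selected.append(u)
            used += len(u["text"])
    return selected
```

The budget cap is what keeps prompts bounded even when many memory units match a query.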
The semantic extraction stage produces:
- claims
- concepts
- entities
- relation candidates
- display_label values for graph-friendly rendering
PaperMem uses a hybrid semantic-labeling strategy so ingestion remains practical for long papers, reports, notes, contracts, and deployed environments.
Every uploaded file chunk becomes a MemoryUnit, receives an embedding, is stored in Postgres, and is written into the Neo4j graph. The system does not drop unsampled chunks from graph retrieval. Instead, chunks differ only in how their semantic labels are produced:
- roughly one fifth of file chunks receive higher-quality LLM semantic labels
- the remaining chunks receive fast local fallback labels
- all chunks still participate in vector retrieval, graph rendering, graph expansion, and evidence selection
The LLM-labeled subset is selected with deterministic MMR-style sampling rather than pure randomness or a paper-specific section heuristic. PaperMem first keeps the first and last chunk, then scores candidate chunks using local salience signals such as length, numeric values, acronyms, named entities, heading-like text, and words such as summary, result, method, limitation, risk, requirement, decision, experiment, and evaluation. It then uses chunk embeddings to prefer candidates that cover different topics from chunks already selected. A small position-spread term prevents the selected chunks from collapsing into one nearby part of the document when embeddings are similar.
This is useful because user uploads are not always papers. A position-only strategy assumes structures like intro, method, results, and conclusion, which can fail for meeting notes, product specs, legal documents, or mixed research material. The MMR strategy instead tries to spend LLM calls on chunks that are both information-dense and semantically diverse, while local fallback keeps the rest of the graph complete.
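The deterministic MMR-style selection described above can be sketched as a greedy loop. Salience here is precomputed per chunk, and the position-spread term is omitted; names and the lambda weight are assumptions, not PaperMem's exact formula:

```python
import math

def mmr_select(chunks: list[dict], k: int, lam: float = 0.7) -> list[int]:
    """Pick k chunk indices for LLM labeling: always keep the first and last
    chunk, then greedily add chunks that balance salience against embedding
    similarity to chunks already chosen. Assumes at least two chunks."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    selected = [0, len(chunks) - 1]
    while len(selected) < min(k, len(chunks)):
        best, best_score = None, float("-inf")
        for i, c in enumerate(chunks):
            if i in selected:
                continue
            # Penalize similarity to the most similar already-selected chunk.
            redundancy = max(cos(c["emb"], chunks[j]["emb"]) for j in selected)
            score = lam * c["salience"] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return sorted(selected)
```

Because the scoring is deterministic, re-ingesting the same file selects the same chunks for LLM labeling.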
The local fallback path is rule-based and runs without remote LLM calls. It splits text into candidate claims, extracts entities from uppercase/title-like spans, extracts concepts from frequent non-stopword tokens, looks for relation cues such as because, therefore, however, and for example, and creates compact display labels. These labels are rougher than LLM labels but are fast enough to apply to every chunk.
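A minimal version of that rule-based fallback might look like the following. The regexes, stopword list, and cue set are simplified illustrations of the approach, not the actual semantic_service rules:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "that", "it"}
RELATION_CUES = ("because", "therefore", "however", "for example")

def fallback_labels(text: str) -> dict:
    """Rule-based semantic labels with no LLM call: sentences become claims,
    title-case spans become entities, frequent tokens become concepts, and
    cue words become relation hints. (Illustrative sketch.)"""
    claims = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    entities = re.findall(r"\b[A-Z][A-Za-z0-9]+(?:\s+[A-Z][A-Za-z0-9]+)*\b", text)
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    concepts = [w for w, _ in Counter(tokens).most_common(5)]
    cues = [c for c in RELATION_CUES if c in text.lower()]
    return {"claims": claims, "entities": entities, "concepts": concepts,
            "relation_cues": cues,
            "display_label": claims[0][:60] if claims else ""}
```

Because it is pure regex and counting, this path costs microseconds per chunk and can be applied to every chunk a document produces.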
The main controls are:
- SEMANTIC_LLM_FILE_SAMPLE_RATIO, default 0.20, controls the target fraction of file chunks that receive LLM labels
- SEMANTIC_LLM_FILE_SAMPLE_MIN, default 8, keeps small files from being under-labeled
- SEMANTIC_LLM_FILE_SAMPLE_MAX, default 0, means no maximum cap
- SEMANTIC_LLM_SELECTION_STRATEGY=mmr enables salience plus embedding-diversity selection
- SEMANTIC_LLM_CONCURRENCY and SEMANTIC_LLM_TIMEOUT_SECONDS bound ingestion latency
- RELATION_LINK_TOP_K bounds cross-chunk relation linking work
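For example, a backend/.env fragment that labels roughly a third of file chunks with the LLM could look like this (all values here are illustrative, not recommended defaults):

```
SEMANTIC_LLM_FILE_SAMPLE_RATIO=0.33
SEMANTIC_LLM_FILE_SAMPLE_MIN=8
SEMANTIC_LLM_FILE_SAMPLE_MAX=0
SEMANTIC_LLM_SELECTION_STRATEGY=mmr
SEMANTIC_LLM_CONCURRENCY=4
SEMANTIC_LLM_TIMEOUT_SECONDS=30
RELATION_LINK_TOP_K=5
```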
The LLM-as-a-judge benchmark lives in llm_as_judge, with the downloaded paper set in eval_paper2.
The current benchmark uses 10 long arXiv papers about LLMs. GPT-5.4 generated 50 paper-specific questions and golden-standard answers, five per paper. The systems under test answer the same questions:
- papermem: the PaperMem backend with source-aware memory retrieval, lexical + vector recall, adjacent chunk expansion, graph expansion, and GPT-4o mini answer generation
- rag_gpt4o_mini: a traditional vector-RAG baseline over the same 10 papers, using GPT-4o mini
- bare_gpt4o_mini: GPT-4o mini without retrieval context
GPT-5.4 then judges each answer against the golden answer on:
- accuracy: factual match to the golden answer
- recall: coverage of required details
- faithfulness: whether the answer avoids unsupported or contradictory claims
- completeness: whether the answer fully resolves the question
- specificity: whether the answer uses paper-specific details rather than generic statements
- overall: holistic score from 0 to 5
- time_efficiency: latency-normalized utility score
Latest 50-question average metrics:
| system | n | accuracy | recall | faithfulness | completeness | specificity | overall | time_efficiency | avg_latency_seconds | avg_context_chars |
|---|---|---|---|---|---|---|---|---|---|---|
| bare_gpt4o_mini | 50 | 0.04 | 0.04 | 4.80 | 0.04 | 0.04 | 1.02 | 4.989 | 1.245 | 0 |
| papermem | 50 | 4.28 | 4.26 | 4.00 | 4.32 | 4.38 | 4.24 | 0.679 | 9.471 | 14000 |
| rag_gpt4o_mini | 50 | 4.08 | 4.18 | 3.64 | 4.22 | 4.28 | 4.10 | 2.068 | 3.391 | 13997 |
PaperMem outperforms the traditional RAG baseline on overall answer quality, accuracy, faithfulness, completeness, and specificity. The main tradeoff is latency: PaperMem spends more time assembling explainable memory evidence and graph-expanded context.
To reproduce:
python -m llm_as_judge.run_eval --download --generate-questions
python -m llm_as_judge.run_eval --run-systems --systems papermem,rag,bare --papermem-ingest --judge

PaperMem is built to visualize more data than a naive full-graph rendering can handle.
The graph uses semantic zoom rather than simple geometric scaling:
- zoomed-out views show a selective subgraph
- node retention is driven by graph importance signals
- more nodes appear as the user zooms in
- bridge nodes are preserved so communities do not visually disconnect too early
The frontend and backend rely on graph-theoretic signals including:
- centrality-style scores
- betweenness-like bridge behavior
- local connectivity / degree
- retrieval relevance
This is inspired by the CSE6242 graph algorithms material and is used to keep the large graph readable.
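One way to blend those signals into a single retention score is sketched below. The weights and field names are assumptions for illustration, not the formula PaperMem actually uses:

```python
def importance(node: dict, max_degree: int) -> float:
    """Combine graph-theoretic signals into one node-retention score in [0, 1]:
    normalized degree, a betweenness-like bridge score, and retrieval relevance.
    (Hypothetical weighting; the real scoring is split across backend and frontend.)"""
    degree = node["degree"] / max_degree if max_degree else 0.0
    return (0.4 * degree
            + 0.3 * node.get("bridge", 0.0)      # betweenness-like bridge signal
            + 0.3 * node.get("relevance", 0.0))  # relevance to the current query
```

Nodes are then sorted by this score, so hubs and bridges survive aggressive zoom-out while leaf nodes appear later.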
The most important backend capabilities exposed to the app are:
- project creation / listing / deletion
- chat session creation / activation / deletion
- message retrieval for a session
- file ingestion
- graph retrieval for a project
- streaming chat responses
The chat stream returns not only text tokens, but also retrieval metadata that the frontend can attach to the assistant reply, such as:
- evidence items
- retrieval focus terms
- graph overlay data
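The wire shape of such a stream can be sketched as a generator that yields token events followed by one metadata event. The event names and fields here are assumptions about the shape, not the actual API contract:

```python
import json

def stream_chat(tokens: list[str], evidence: list[dict]):
    """Yield a chat stream the frontend can consume incrementally: token
    events first, then a final metadata event carrying the evidence the
    answer was grounded in. (Illustrative sketch of the stream shape.)"""
    for t in tokens:
        yield json.dumps({"type": "token", "text": t})
    yield json.dumps({"type": "metadata", "evidence": evidence})
```

The frontend appends token events to the visible reply as they arrive, then attaches the trailing metadata event to the finished assistant message.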
- Python 3.11+ (3.13 recommended; matches production). On macOS, use Homebrew Python: do not create the venv with Xcode's bundled Python, or python3 may not land on your PATH after activate.
- Node.js 18+
- Docker Desktop
docker compose up -d

This starts:
- Postgres on 127.0.0.1:5432
- Neo4j on 127.0.0.1:7474 and 127.0.0.1:7687
Create the virtualenv with a known interpreter (adjust the path if your Homebrew prefix differs):
cd backend
/opt/homebrew/bin/python3.13 -m venv .venv_local
source .venv_local/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
cp env.example .env

After source .venv_local/bin/activate, use python and pip (not bare python3 from Xcode) so which python resolves to .venv_local/bin/python.
If you ever see python: not found after activating, open a new terminal and run source .venv_local/bin/activate again, or invoke the venv explicitly: ./.venv_local/bin/python -m uvicorn ....
Fill in the required keys in backend/.env:
LLM_API_KEY=
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_MODEL=openai/gpt-4o-mini
EMBEDDING_API_KEY=
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=256
NEO4J_URI=bolt://127.0.0.1:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=papermemneo4j
NEO4J_REQUIRED=false
POSTGRES_HOST=127.0.0.1
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DATABASE=papermem

For Railway or AuraDB Free deployments, keep NEO4J_REQUIRED=false unless the API should refuse to start without Neo4j. AuraDB Free instances can pause and spend time in RESUMING; during that window the Bolt hostname may fail DNS resolution. With NEO4J_REQUIRED=false, PaperMem starts with Postgres/vector retrieval and automatically disables graph calls until the process is restarted against an available Neo4j endpoint.
cd backend
source .venv_local/bin/activate
python -m uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload

npm install
npm run dev

This script does both:
- starts the Vite renderer on 127.0.0.1:5173
- waits for the renderer, then launches Electron
.
├── backend/
├── electron/
├── frontend/
├── package.json
├── package-lock.json
├── README.md
├── docker-compose.yml
└── ...
You do not need a separate public website for the Electron app: npm run build:renderer produces frontend/dist, which electron-builder packs into the desktop installer. Point the UI at your deployed API with the same URL for both the renderer and the main process:
export VITE_API_BASE_URL="https://YOUR-APP.up.railway.app"
npm run dist:mac # macOS → release/*.dmg
npm run dist:win # Windows → release/*.exe (run on Windows or use CI)

scripts/write-electron-api-base.js writes electron/api-base.json from that URL so Electron IPC calls (/extract, file ingest) hit the same host as the React app.
To publish builds on GitHub: push a version tag (v1.0.0). The workflow .github/workflows/release-desktop.yml builds macOS and Windows artifacts and attaches them to a Release. Set variable PAPERMEM_API_BASE_URL on the production environment (repository Settings → Environments → production), or the workflow falls back to a default Railway URL.
Note: Chat and graph work against a remote API. The Electron dropzone uploads file bytes with POST /files/ingest_upload (multipart), so ingestion works against a deployed backend. The JSON-only POST /files/ingest (local path on the API machine) remains useful for server-side debugging or when the API runs on the same machine as the files.
The repository intentionally ignores local/demo-heavy assets such as:
- generated virtual environments
- local .env files
- untracked demo directories like papermem-demo/
- large local demo media such as demo.gif
- local PDF test assets unless already tracked in git history
Georgia Tech CSE6242 Team, Spring 2026