Commit 467e062

claude authored and marcelsamyn committed
fix: Improve memory storage to prevent incorrect facts and redundancies
This commit addresses several issues with memory extraction and storage:

1. **Distinguish user facts from assistant statements**: Updated the extraction prompt to only extract facts explicitly stated by the user, not speculative statements or assumptions made by the assistant. This prevents the system from storing assistant guesses as if they were user-provided facts.

2. **Enhanced graph cleanup for redundancy removal**: Improved the cleanup prompt with detailed instructions for identifying and removing:
   - Redundant/duplicate nodes
   - Speculative or incorrect information
   - Outdated information
   - Contradictions (keeping most recent/specific information)

3. **Atlas-based contradiction detection**: Modified the cleanup process to use the User Atlas as the source of truth. When user corrections are made (reflected in the atlas), the nightly cleanup now detects and removes contradicting graph nodes.

4. **Fixed missing import**: Added missing `env` import in routes/dream.ts

These changes ensure that:
- Only user-stated facts are extracted and stored
- The nightly cleanup actively removes redundant and incorrect information
- User corrections (via atlas updates) propagate to the graph during cleanup
- The memory system maintains higher accuracy and data quality over time
1 parent 9fbd221 commit 467e062

3 files changed

Lines changed: 82 additions & 13 deletions

src/lib/extract-graph.ts

Lines changed: 24 additions & 12 deletions
@@ -122,17 +122,28 @@ Extract the graph from the following ${sourceType}:
 ${content}
 </${sourceType}>
 
+CRITICAL EXTRACTION RULES - READ CAREFULLY:
+
+When extracting from conversations:
+- ONLY extract facts that the USER explicitly stated, confirmed, or provided
+- DO NOT extract speculative statements, suggestions, or assumptions made by the assistant
+- DO NOT treat assistant's questions as facts (e.g., "Are you working on X?" is NOT a fact that the user is working on X)
+- DO NOT extract assistant's interpretations unless the user confirmed them
+- If the assistant makes a statement and the user corrects it, ONLY extract the user's correction as fact
+- Prioritize user's own statements about themselves, their experiences, preferences, and circumstances
+- Be especially cautious with information only mentioned by the assistant - verify if the user confirmed it
+
 Extract, for example, the following elements:
-1. People mentioned (real or fictional)
-2. Locations discussed
-3. Events that occurred or were mentioned
-4. Objects or items of significance
-5. Emotions expressed or discussed
-6. Concepts or ideas explored
-7. Media mentioned (books, movies, articles, etc.)
-8. Temporal references (dates, times, periods)
-9. The assistant's emotions and feelings
-10. The assistant's internal insights or discoveries about the user
+1. People mentioned by the user (real or fictional)
+2. Locations the user discussed or mentioned
+3. Events that the user stated occurred or experienced
+4. Objects or items the user mentioned as significant
+5. Emotions the user expressed or discussed
+6. Concepts or ideas the user explored or mentioned
+7. Media the user mentioned (books, movies, articles, etc.)
+8. Temporal references the user provided (dates, times, periods)
+9. The user's preferences, goals, projects, and plans
+10. Facts the user stated about other people or entities
 
 For each element, create a node with:
 - A unique temporary ID (format: "temp_[type]_[number]", e.g., "temp_person_1") if it's a NEW node.
@@ -142,6 +153,7 @@ For each element, create a node with:
 
 Then, link these nodes with edges.
 - Edges are mainly used to represent "facts" about nodes. For example, if you have a Person node and an Event node, you can create an edge from the Person node to the Event node to represent the fact that the person participated in the event.
+- ONLY create edges for facts explicitly stated by the user, not assistant assumptions
 - Edges are unique by source node, target node, and edge type
 - In the edge description for facts, give a succinct description of the fact. Add some minimal context to aid retrieval, but keep it concise.
 - Ideally, edges link to already-existing nodes. If the node isn't existing, create it.
@@ -153,11 +165,11 @@ Rules of the graph:
 - Omit unnecessary details in node names, eg. "John Doe" instead of "John Doe (person)"
 - Nodes are independent of context and represent a *single* thing. Bad example: "John - the person taking a walk". Good example: "John" (Person node, no description) linked to [PARTICIPATED_IN] "John's walk on 2025-05-18" (Event node), linked to [OCCURRED_ON] "2025-05-18" (Temporal node).
 - Don't create nodes for things that should be represented by edges.
-
+- Avoid redundant or duplicate information - if a fact is already represented, don't create another node or edge for it
 
 Then create edges between these nodes to represent their relationships using the appropriate edge types.
 
-Focus on extracting the most significant and meaningful information. Quality is more important than quantity.`;
+Focus on extracting the most significant and meaningful information that the USER provided. Quality and accuracy are more important than quantity.`;
 
   const completion = await client.beta.chat.completions.parse({
     messages: [{ role: "user", content: prompt }],
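
The extraction rules above are enforced only through prompt wording. As a rough illustration of the same rule expressed in code, the sketch below filters candidate facts by the role of the turn they came from. The `Turn` and `CandidateFact` types, the `sourceTurn` field, and `keepUserStatedFacts` are all hypothetical and do not appear in this commit; the repository may track provenance differently or not at all.

```typescript
// Hypothetical shapes for illustration only; not taken from the repository.
type Role = "user" | "assistant";

interface Turn {
  role: Role;
  text: string;
}

interface CandidateFact {
  description: string;
  // Assumed field: index of the conversation turn the fact was taken from.
  sourceTurn: number;
}

// Keep only facts whose source turn was spoken by the user.
// Facts pointing at a missing turn are dropped as well.
function keepUserStatedFacts(
  turns: Turn[],
  facts: CandidateFact[],
): CandidateFact[] {
  return facts.filter((f) => turns[f.sourceTurn]?.role === "user");
}

const turns: Turn[] = [
  { role: "assistant", text: "Are you working on a novel?" },
  { role: "user", text: "No, I'm writing a screenplay." },
];

const facts: CandidateFact[] = [
  { description: "User is working on a novel", sourceTurn: 0 },
  { description: "User is writing a screenplay", sourceTurn: 1 },
];

const kept = keepUserStatedFacts(turns, facts);
console.log(kept.map((f) => f.description));
// prints [ 'User is writing a screenplay' ]
```

A filter like this only covers the simplest case. The prompt also allows facts the user confirmed after the assistant raised them, which a per-turn role check cannot detect on its own.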

src/lib/jobs/cleanup-graph.ts

Lines changed: 57 additions & 1 deletion
@@ -369,6 +369,12 @@ async function proposeGraphCleanup(
   modelId: string,
 ): Promise<CleanupProposal> {
   const client = await createCompletionClient(userId);
+
+  // Fetch user atlas for context about user corrections and current state
+  const { getAtlas } = await import("../atlas");
+  const db = await useDatabase();
+  const { description: userAtlas } = await getAtlas(db, userId);
+
   const nodesList = temp.nodes
     .map(
       (n) =>
@@ -381,9 +387,59 @@ async function proposeGraphCleanup(
         `<edge source="${e.sourceTemp}" target="${e.targetTemp}" type="${e.type}">${e.description}</edge>`,
     )
     .join("\n");
-  const prompt = `You are a graph cleaning assistant. Given this subgraph, propose merges (pairs of temp IDs to merge), deletes (temp IDs to remove), additions (new edges, each with a concise description of its meaning), and any new nodes. Delete any unclear or non-useful nodes entirely and ensure all edges linked to a deleted node are removed.
+  const prompt = `You are a graph cleaning assistant. Your task is to analyze this subgraph and propose improvements to ensure accuracy, remove redundancies, and maintain data quality.
+${
+  userAtlas
+    ? `
+**User Atlas Context:**
+The following is the current User Atlas, which represents the most up-to-date factual information about the user. Use this to identify any nodes in the graph that contradict or are outdated compared to the atlas.
+
+<user_atlas>
+${userAtlas}
+</user_atlas>
+`
+    : ""
+}
+
+**Critical Cleaning Rules:**
+
+1. **Check Against User Atlas**: If a User Atlas is provided above, use it as the source of truth:
+- Delete any nodes that contradict information in the atlas
+- Remove nodes that represent outdated information superseded by the atlas
+- The atlas reflects user corrections and the most current factual information
+
+2. **Remove Redundant Nodes**: Identify and merge nodes that represent the same entity or concept. Look for:
+- Duplicate entities with slightly different labels (e.g., "John Smith" and "John")
+- Multiple nodes describing the same event or concept
+- Nodes that could be consolidated without losing information
+
+3. **Delete Incorrect or Speculative Information**: Remove nodes that represent:
+- Speculative or assumed information (not explicitly stated facts)
+- Outdated information that has been superseded or contradicted
+- Unclear or non-useful nodes with vague descriptions
+- Nodes that don't represent factual information about the user
+
+4. **Identify Contradictions**: When you find contradicting information:
+- Keep the most recent or most specific information
+- Delete older or vaguer contradicting nodes
+- Prefer user-stated facts over inferred information
+- Always prefer atlas information over graph nodes when they conflict
+
+5. **Improve Connections**: Add missing edges between related nodes that should be connected
+
+6. **Remove Redundant Edges**: Don't create edges that duplicate existing relationships
+
+**Your Response Should Include:**
+- **merges**: Pairs of temp IDs where nodes are duplicates (remove will be merged into keep)
+- **deletes**: Temp IDs of nodes to completely remove (speculative, outdated, or unclear)
+- **additions**: New edges to add (with concise, factual descriptions)
+- **newNodes**: Any new nodes needed (only if genuinely missing and factual)
+
+**Important**: Be aggressive about removing redundant and speculative information. Quality and accuracy are more important than quantity.
+
 Nodes:
 ${nodesList}
+
 Edges:
 ${edgesList}
 `;
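
The prompt above asks the model for `merges` and `deletes` but this diff does not show how a proposal is applied. The sketch below is one plausible way to do it on an in-memory graph, assuming each merge is a `{ keep, remove }` pair of temp IDs and that edges stay unique by source, target, and type as the extraction prompt requires. The `GraphNode`, `GraphEdge`, and `Proposal` types, `applyProposal`, and the `RELATED_TO` edge type are hypothetical; the real `CleanupProposal` type may differ.

```typescript
// Hypothetical shapes for illustration only; not taken from the repository.
interface GraphNode {
  id: string;
  label: string;
}

interface GraphEdge {
  source: string;
  target: string;
  type: string;
  description: string;
}

interface Proposal {
  merges: { keep: string; remove: string }[];
  deletes: string[];
}

function applyProposal(
  nodes: GraphNode[],
  edges: GraphEdge[],
  proposal: Proposal,
): { nodes: GraphNode[]; edges: GraphEdge[] } {
  // Map each merged-away node to the node that survives.
  // Assumes merges are not chained (a "keep" is never also a "remove").
  const redirect = new Map<string, string>();
  for (const m of proposal.merges) redirect.set(m.remove, m.keep);
  const resolve = (id: string): string => redirect.get(id) ?? id;

  const deleted = new Set(proposal.deletes);
  const keptNodes = nodes.filter(
    (n) => !deleted.has(n.id) && !redirect.has(n.id),
  );

  // Re-point edges at surviving nodes, drop edges that touch a deleted node,
  // then deduplicate by (source, target, type).
  const seen = new Set<string>();
  const keptEdges: GraphEdge[] = [];
  for (const e of edges) {
    const source = resolve(e.source);
    const target = resolve(e.target);
    if (deleted.has(source) || deleted.has(target)) continue;
    const key = JSON.stringify([source, target, e.type]);
    if (seen.has(key)) continue;
    seen.add(key);
    keptEdges.push({ ...e, source, target });
  }
  return { nodes: keptNodes, edges: keptEdges };
}

const result = applyProposal(
  [
    { id: "temp_person_1", label: "John Smith" },
    { id: "temp_person_2", label: "John" },
    { id: "temp_event_1", label: "John's walk on 2025-05-18" },
    { id: "temp_concept_1", label: "Possible interest in jazz" },
  ],
  [
    { source: "temp_person_1", target: "temp_event_1", type: "PARTICIPATED_IN", description: "John Smith took a walk" },
    { source: "temp_person_2", target: "temp_event_1", type: "PARTICIPATED_IN", description: "John took a walk" },
    { source: "temp_person_2", target: "temp_concept_1", type: "RELATED_TO", description: "Assistant guessed an interest in jazz" },
  ],
  {
    merges: [{ keep: "temp_person_1", remove: "temp_person_2" }],
    deletes: ["temp_concept_1"],
  },
);

console.log(result.nodes.map((n) => n.id));
// prints [ 'temp_person_1', 'temp_event_1' ]
console.log(result.edges.length);
// prints 1
```

Re-pointing edges before deduplicating matters here: the two `PARTICIPATED_IN` edges only become duplicates once "John" has been merged into "John Smith".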

src/routes/dream.ts

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ import { defineEventHandler, readBody } from "h3";
 import { CleanupGraphParams } from "~/lib/jobs/cleanup-graph";
 import { batchQueue, DreamJobData, flowProducer } from "~/lib/queues";
 import { dreamRequestSchema, dreamResponseSchema } from "~/lib/schemas/dream";
+import { env } from "~/utils/env";
 
 export default defineEventHandler(async (event) => {
   const { userId, assistantId, assistantDescription } =
