From 57253d0f02144489f44dc0b172fc678f339d580b Mon Sep 17 00:00:00 2001
From: Lobsterdog Contributors
Date: Thu, 21 May 2026 10:46:51 -0600
Subject: [PATCH 1/2] =?UTF-8?q?fix:=20remove=20introspection=20framework?=
 =?UTF-8?q?=20=E2=80=94=20failed=20addition?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Remove the entire introspection/self_assessment system:
- Deleted internal/introspect/ and internal/skillpatch/ packages
- Removed introspect, learn, track-review CLI commands
- Removed self_assessment memory type from store, retriever, migrations
- Stripped dream REM behavioral insight generation (themes only now)
- Removed SkillPatchDreamHook, ReviewOutcomeTracker, TestOutcomeTracker
- Removed proposed-changes.md support, skill patch validation
- Cleaned all documentation, skills, and plugin references

The introspection loop never closed: patterns were detected but never
changed behavior. Without calibration data and a promotion pipeline, the
system was sophisticated self-documentation, not self-improvement.
---
 README.md                                    |   17 +-
 cmd/llmem/main.go                            |  334 -----
 cmd/llmem/main_test.go                       |   34 -
 docs/API.md                                  |  176 +--
 docs/CLI.md                                  |   88 +-
 docs/CONFIGURATION.md                        |   12 -
 docs/DREAM.md                                |  121 +--
 docs/INSTALLATION.md                         |    2 +-
 docs/INTEGRATIONS.md                         |   22 +-
 docs/RERANKING.md                            |    5 +-
 internal/config/config.go                    |  199 +--
 internal/config/config_test.go               |    8 +-
 internal/dream/dream.go                      |  444 +------
 internal/dream/dream_test.go                 | 1169 +----------------
 internal/extract/extract.go                  |   13 +-
 internal/introspect/introspect.go            |  387 ------
 internal/introspect/introspect_test.go       |  660 ----------
 internal/ollama/ollama.go                    |    2 +-
 internal/retriever/retriever.go              |   14 +-
 internal/skillpatch/skillpatch.go            |  410 ------
 internal/skillpatch/skillpatch_test.go       |  567 --------
 internal/store/migration_test.go             |    6 +-
 internal/store/models.go                     |    1 -
 internal/store/store_test.go                 |    2 +-
 internal/taxonomy/taxonomy.go                |  117 +-
 internal/taxonomy/taxonomy_test.go           |  150 +--
 migrations/003_register_default_types.sql    |    3 +-
 plugins/agent/hooks/hooks.json               |    2 +-
 .../introspection-review-tracker/SKILL.md    |  114 --
 plugins/agent/skills/introspection/SKILL.md  |  159 ---
 plugins/agent/skills/llmem-setup/SKILL.md    |    3 +-
 plugins/agent/skills/llmem/SKILL.md          |   69 +-
 plugins/opencode/llmem.js                    |   24 -
 skills/introspection-review-tracker/SKILL.md |  114 --
 skills/introspection/SKILL.md                |  159 ---
 skills/llmem-setup/SKILL.md                  |   20 +-
 skills/llmem/SKILL.md                        |   69 +-
 37 files changed, 219 insertions(+), 5477 deletions(-)
 delete mode 100644 cmd/llmem/main_test.go
 delete mode 100644 internal/introspect/introspect.go
 delete mode 100644 internal/introspect/introspect_test.go
 delete mode 100644 internal/skillpatch/skillpatch.go
 delete mode 100644 internal/skillpatch/skillpatch_test.go
 delete mode 100644 plugins/agent/skills/introspection-review-tracker/SKILL.md
 delete mode 100644 plugins/agent/skills/introspection/SKILL.md
 delete mode 100644 skills/introspection-review-tracker/SKILL.md
 delete mode 100644 skills/introspection/SKILL.md

diff --git
a/README.md b/README.md index bfe491c..1f5fc85 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and coding conventi | [Providers](docs/PROVIDERS.md) | Embedding/generation providers, fallback chains, configuration | | [CLI Reference](docs/CLI.md) | All `llmem` commands and options | | [Python API](docs/API.md) | MemoryStore, Retriever, extension points, database schema, module reference | -| [Go API](docs/API.md#go-api) | Go packages — store, config, dream, extract, introspect, ollama, paths, session, systemd, taxonomy, urlvalidate | +| [Go API](docs/API.md#go-api) | Go packages — store, config, dream, extract, ollama, paths, session, systemd, taxonomy, urlvalidate | | [Integrations](docs/INTEGRATIONS.md) | OpenCode, Copilot CLI, custom tools, session hooks | | [Configuration](docs/CONFIGURATION.md) | config.yaml reference, path resolution, dream settings | | [Search Reranking](docs/RERANKING.md) | Multi-signal reranking, signal weights, type priority | @@ -81,7 +81,7 @@ See below for per-platform setup details. ### Go (memory store library) -The Go implementation provides the core memory store as a pure-Go library with no CGo dependency, plus a full CLI, dream cycle, session hooks, introspection, and extraction: +The Go implementation provides the core memory store as a pure-Go library with no CGo dependency, plus a full CLI, dream cycle, session hooks, and extraction: ```bash go get github.com/MichielDean/LLMem @@ -146,10 +146,8 @@ LLMem ships four skills focused on memory management. They load on-demand via th | Skill | Description | |-------|-------------| -| **llmem** | Manage LLMem memories — add, search, consolidate, dream, introspect, and track review outcomes. | +| **llmem** | Manage LLMem memories — add, search, consolidate, and dream. | | **llmem-setup** | Install and configure LLMem — plugin deployment, provider setup, skill registration. 
| -| **introspection** | Operational reference for the introspection framework — self-assessment, sampajanna checks, error taxonomy. | -| **introspection-review-tracker** | Reference for the automated ReviewOutcomeTracker hook that persists review findings as self_assessment memories. | ## Templates @@ -187,7 +185,7 @@ llmem init llmem stats # Verify skills are deployed -ls ~/.agents/skills/llmem ~/.agents/skills/introspection +ls ~/.agents/skills/llmem # Verify plugin deployed (OpenCode) ls ~/.config/opencode/plugins/llmem.js @@ -216,11 +214,6 @@ llmem search "testing" llmem search "testing" --type fact --limit 5 --json llmem search "testing" --include-code --json -# Index a codebase -llmem learn ./src -llmem learn ./src --strategy fixed --window-size 30 --overlap 5 -llmem learn ./src --no-embed - # List all memories llmem list llmem list --type decision --all @@ -269,7 +262,7 @@ go test ./... 1349 Python tests and 142 JavaScript tests covering all providers, session adapters (OpenCode, Copilot, none), URL validation, configuration, security, session hooks, CLI commands, and edge cases. -Go tests covering store operations, FTS5 search, vector search, hybrid retrieval, embedding engine, metrics, URL validation, migrations, type validation, import/export, config, dream cycle, extraction, introspection, session hooks, path validation, systemd unit generation, and taxonomy. +Go tests covering store operations, FTS5 search, vector search, hybrid retrieval, embedding engine, metrics, URL validation, migrations, type validation, import/export, config, dream cycle, extraction, session hooks, path validation, systemd unit generation, and taxonomy. 
## Makefile diff --git a/cmd/llmem/main.go b/cmd/llmem/main.go index b6291ef..6f79763 100644 --- a/cmd/llmem/main.go +++ b/cmd/llmem/main.go @@ -4,22 +4,17 @@ import ( "context" "encoding/json" "fmt" - "io" "log/slog" "os" "path/filepath" - "strings" "time" "github.com/MichielDean/LLMem/internal/config" "github.com/MichielDean/LLMem/internal/dream" "github.com/MichielDean/LLMem/internal/embed" "github.com/MichielDean/LLMem/internal/extract" - "github.com/MichielDean/LLMem/internal/introspect" - "github.com/MichielDean/LLMem/internal/ollama" "github.com/MichielDean/LLMem/internal/paths" "github.com/MichielDean/LLMem/internal/store" - "github.com/MichielDean/LLMem/internal/taxonomy" "github.com/spf13/cobra" ) @@ -52,9 +47,6 @@ func main() { initCmd(), metricsCmd(), dreamCmd(), - introspectCmd(), - learnCmd(), - trackReviewCmd(), backfillEmbeddingsCmd(), ) @@ -119,18 +111,6 @@ func openEmbeddingEngine() *embed.EmbeddingEngine { return engine } -// openOllamaClient creates an OllamaClient for session hook introspection. -// Returns nil on failure — the coordinator gracefully handles a nil client -// by falling back to degraded introspection in OnEnding (plain-text summary, no LLM). 
-func openOllamaClient() *ollama.OllamaClient { - client, err := ollama.NewOllamaClient(ollama.OllamaClientConfig{}) - if err != nil { - slog.Debug("llmem: failed to create Ollama client, falling back to degraded introspection", "error", err) - return nil - } - return client -} - func addCmd() *cobra.Command { var ( typeVal string @@ -709,13 +689,6 @@ func dreamCmd() *cobra.Command { dreamerCfg := cfg.DreamerConfig() dreamerCfg.Store = ms - // Wire SkillPatcher for direct skill patching after dream - sp, spErr := cfg.NewSkillPatcher(ms) - if spErr != nil { - slog.Warn("llmem: dream: could not create skill patcher, skipping patch validation", "error", spErr) - } - dreamerCfg.SkillPatcher = sp - d, err := dream.NewDreamer(dreamerCfg) if err != nil { return err @@ -759,274 +732,6 @@ func dreamCmd() *cobra.Command { return cmd } -func introspectCmd() *cobra.Command { - var ( - noLLM bool - timeoutVal string - ) - cmd := &cobra.Command{ - Use: "introspect [description]", - Short: "Analyze a failure and store a self_assessment memory", - Long: "Analyze a failure and store a self_assessment memory.\n" + - "The description is a free-form summary of what went wrong.\n" + - "The LLM infers category, context, and proposed fix from your description.\n" + - "When the LLM is unavailable, the description is stored directly.", - Args: cobra.ExactArgs(1), - RunE: func(cmd *cobra.Command, args []string) error { - whatHappened := args[0] - - var timeout time.Duration - if timeoutVal != "" { - parsed, err := time.ParseDuration(timeoutVal) - if err != nil { - return fmt.Errorf("llmem: introspect: invalid --timeout duration %q: %w", timeoutVal, err) - } - timeout = parsed - } else { - cfg, cfgErr := loadConfig() - if cfgErr != nil { - slog.Debug("llmem: introspect: could not load config for timeout default, using 5m", "error", cfgErr) - } else { - timeout = cfg.CallModelTimeoutDuration() - } - } - - ms, err := openStore() - if err != nil { - return err - } - defer ms.Close() - - result, 
err := introspect.IntrospectFailure(context.Background(), ms, introspect.IntrospectFailureParams{ - WhatHappened: whatHappened, - NoLLM: noLLM, - Timeout: timeout, - }) - if err != nil { - return err - } - - fmt.Printf("Stored self_assessment: %s\n", result.MemoryID) - - if result.ProposedUpdate != "" && result.Category != "" { - patchSkillAfterIntrospect(ms, result.Category, result.ProposedUpdate) - } - - if noLLM { - fmt.Fprintln(os.Stderr, "WARNING: LLM enrichment disabled (--no-llm flag)") - return nil - } - - if result.LLMStatus == introspect.Skipped { - fmt.Fprintln(os.Stderr, "WARNING: LLM enrichment skipped — stored raw fields (Ollama unavailable)") - } - - return nil - }, - } - cmd.Flags().BoolVar(&noLLM, "no-llm", false, "Skip LLM enrichment, store raw fields only (exit code 0)") - cmd.Flags().StringVar(&timeoutVal, "timeout", "", "LLM call timeout (e.g. \"120s\", \"2m\"). Must be >= 10s.") - return cmd -} - -func learnCmd() *cobra.Command { - var ( - noLLM bool - timeoutVal string - ) - cmd := &cobra.Command{ - Use: "learn [wrong] --right [correct]", - Short: "Learn a lesson from a wrong→right correction (alias for introspect)", - Long: "Learn a lesson from a wrong→right correction. " + - "Delegates to introspect with a combined description. 
" + - "Consider using 'introspect' directly — it handles both analysis and correction.", - Args: cobra.ExactArgs(1), - RunE: func(cmd *cobra.Command, args []string) error { - wrongVal := args[0] - - var timeout time.Duration - if timeoutVal != "" { - parsed, err := time.ParseDuration(timeoutVal) - if err != nil { - return fmt.Errorf("llmem: learn: invalid --timeout duration %q: %w", timeoutVal, err) - } - timeout = parsed - } else { - cfg, cfgErr := loadConfig() - if cfgErr != nil { - slog.Debug("llmem: learn: could not load config for timeout default, using 5m", "error", cfgErr) - } else { - timeout = cfg.CallModelTimeoutDuration() - } - } - - ms, err := openStore() - if err != nil { - return err - } - defer ms.Close() - - rightVal, _ := cmd.Flags().GetString("right") - contextVal, _ := cmd.Flags().GetString("context") - - // Build a combined description for introspect - description := "WRONG: " + wrongVal - if rightVal != "" { - description += "\nRIGHT: " + rightVal - } - if contextVal != "" { - description += "\nContext: " + contextVal - } - - result, err := introspect.IntrospectFailure(context.Background(), ms, introspect.IntrospectFailureParams{ - WhatHappened: description, - NoLLM: noLLM, - Timeout: timeout, - }) - if err != nil { - return err - } - - fmt.Printf("Stored self_assessment: %s\n", result.MemoryID) - - // Patch skill file if ProposedUpdate and Category are available - if result.ProposedUpdate != "" && result.Category != "" { - patchSkillAfterIntrospect(ms, result.Category, result.ProposedUpdate) - } - - if noLLM { - fmt.Fprintln(os.Stderr, "WARNING: LLM enrichment disabled (--no-llm flag)") - return nil - } - - if result.LLMStatus == introspect.Skipped { - fmt.Fprintln(os.Stderr, "WARNING: LLM enrichment skipped — stored raw fields (Ollama unavailable)") - ms.Close() - os.Exit(2) - } - - return nil - }, - } - cmd.Flags().String("right", "", "What is correct (the fix)") - cmd.Flags().String("context", "", "Context") - cmd.Flags().BoolVar(&noLLM, 
"no-llm", false, "Skip LLM enrichment, store raw fields only (exit code 0)") - cmd.Flags().StringVar(&timeoutVal, "timeout", "", "LLM call timeout (e.g. \"120s\", \"2m\"). Must be >= 10s.") - return cmd -} - -func trackReviewCmd() *cobra.Command { - var ( - singleVal bool - batchVal bool - cleanVal bool - findingsVal string - ) - cmd := &cobra.Command{ - Use: "track-review", - Short: "Persist code review findings as self_assessment memories", - RunE: func(cmd *cobra.Command, args []string) error { - ms, err := openStore() - if err != nil { - return err - } - defer ms.Close() - - if cleanVal { - // Invalidate all self_assessment memories with source="track-review" - // This is a bulk invalidation — fetch and invalidate each - memories, err := ms.Search(context.Background(), store.SearchParams{ - Type: "self_assessment", - ValidOnly: true, - Limit: 10000, - }) - if err != nil { - return err - } - count := 0 - for _, m := range memories { - if m.Source == "track-review" { - ok, err := ms.Invalidate(context.Background(), m.ID, "track-review clean") - if err != nil { - return fmt.Errorf("llmem: track-review: invalidate %s: %w", m.ID, err) - } - if ok { - count++ - } - } - } - fmt.Printf("Invalidated %d track-review memories\n", count) - } - - if singleVal || batchVal { - var input []byte - if findingsVal != "" { - resolvedFindings, rerr := filepath.Abs(findingsVal) - if rerr != nil { - return fmt.Errorf("llmem: track-review: resolve findings path: %w", rerr) - } - if paths.IsBlockedPath(resolvedFindings) { - return fmt.Errorf("llmem: track-review: findings path targets a blocked system directory: %s", resolvedFindings) - } - input, err = os.ReadFile(resolvedFindings) - if err != nil { - return fmt.Errorf("llmem: track-review: read findings: %w", err) - } - } else { - stat, _ := os.Stdin.Stat() - if stat.Mode()&os.ModeCharDevice != 0 { - return fmt.Errorf("llmem: track-review: provide --findings or pipe input") - } - input, err = io.ReadAll(os.Stdin) - if err != nil { - 
return fmt.Errorf("llmem: track-review: read stdin: %w", err) - } - } - - lines := strings.Split(strings.TrimSpace(string(input)), "\n") - count := 0 - for _, line := range lines { - line = strings.TrimSpace(line) - if line == "" { - continue - } - // Parse "Category: value" lines - parsed := taxonomy.ParseSelfAssessment(line) - category := parsed["Category"] - if category == "" { - category = "REVIEW_PASSED" - } - if _, ok := taxonomy.ErrorTaxonomy[category]; !ok { - slog.Warn("llmem: track-review: unknown category, proceeding anyway", "category", category) - } - - id, err := ms.Add(context.Background(), store.AddParams{ - Type: "self_assessment", - Content: line, - Source: "track-review", - Confidence: 0.9, - Metadata: map[string]any{"category": category, "source": "track-review"}, - }) - if err != nil { - slog.Warn("llmem: track-review: failed to store finding", "error", err) - continue - } - count++ - _ = id - } - fmt.Printf("Stored %d track-review findings\n", count) - } - - return nil - }, - } - cmd.Flags().BoolVar(&singleVal, "single", false, "Store a single finding") - cmd.Flags().BoolVar(&batchVal, "batch", false, "Store multiple findings") - cmd.Flags().BoolVar(&cleanVal, "clean", false, "Invalidate all existing track-review memories") - cmd.Flags().StringVar(&findingsVal, "findings", "", "Path to findings file (or stdin)") - return cmd -} - func backfillEmbeddingsCmd() *cobra.Command { var ( batchSize int @@ -1119,43 +824,4 @@ func defaultIfEmpty(val, defaultVal string) string { return defaultVal } return val -} - -// patchSkillAfterIntrospect attempts to patch the relevant skill file after -// introspection produces a self-assessment with a proposed update. -// Patching failure is degraded behavior, not a fatal error — the memory was still stored. 
-func patchSkillAfterIntrospect(ms *store.MemoryStore, category, proposedUpdate string) { - cfg, cfgErr := loadConfig() - if cfgErr != nil { - slog.Warn("llmem: introspect: could not load config for skill patching", "error", cfgErr) - return - } - - // Category "REVIEW_PASSED" needs no patch - if category == "REVIEW_PASSED" { - slog.Debug("llmem: introspect: REVIEW_PASSED needs no skill patch") - return - } - - sp, spErr := cfg.NewSkillPatcher(ms) - if spErr != nil { - slog.Warn("llmem: introspect: could not create skill patcher", "error", spErr) - return - } - if sp == nil { - slog.Debug("llmem: introspect: skill patcher not available, skipping patch") - return - } - - // Get the category description from taxonomy for context - categoryDescription := category - if desc, ok := taxonomy.ErrorTaxonomy[category]; ok { - categoryDescription = desc - } - - if err := sp.Patch(context.Background(), category, proposedUpdate, categoryDescription); err != nil { - slog.Warn("llmem: introspect: skill patching failed (memory was still stored)", "category", category, "error", err) - } else { - slog.Info("llmem: introspect: patched skill file", "category", category) - } } \ No newline at end of file diff --git a/cmd/llmem/main_test.go b/cmd/llmem/main_test.go deleted file mode 100644 index 054e64b..0000000 --- a/cmd/llmem/main_test.go +++ /dev/null @@ -1,34 +0,0 @@ -package main - -import ( - "testing" -) - -// TestIntrospectCmd_Flags tests that --no-llm and --timeout flags are registered -// and that removed flags (--auto, --text, --session, --model, --base-url) are gone. 
-func TestIntrospectCmd_Flags(t *testing.T) { - cmd := introspectCmd() - - noLLMFlag := cmd.Flags().Lookup("no-llm") - if noLLMFlag == nil { - t.Error("expected --no-llm flag to be registered on introspect command") - } - if noLLMFlag != nil && noLLMFlag.DefValue != "false" { - t.Errorf("expected --no-llm default false, got %q", noLLMFlag.DefValue) - } - - timeoutFlag := cmd.Flags().Lookup("timeout") - if timeoutFlag == nil { - t.Error("expected --timeout flag to be registered on introspect command") - } - if timeoutFlag != nil && timeoutFlag.DefValue != "" { - t.Errorf("expected --timeout default empty, got %q", timeoutFlag.DefValue) - } - - for _, name := range []string{"auto", "text", "session", "model", "base-url"} { - f := cmd.Flags().Lookup(name) - if f != nil { - t.Errorf("expected --%s flag to be removed from introspect command", name) - } - } -} \ No newline at end of file diff --git a/docs/API.md b/docs/API.md index 7e6caa3..84707ae 100644 --- a/docs/API.md +++ b/docs/API.md @@ -627,7 +627,7 @@ Code ref paths must be relative (no leading `/`) and must not contain `..` trave | `llmem.url_validate` | `is_safe_url()`, `safe_urlopen()`, `_strip_credentials()`, `validate_base_url()`, `_NoRedirectHandler`, `_extract_url_string()` (mirrors `memory.url_validate`), DNS rebinding protection | | `llmem.paths` | `validate_session_id()`, `get_context_dir()`, `_validate_write_path()`, `BLOCKED_SYSTEM_PREFIXES`, home/write path checks | | `llmem.registry` | `register_session_hook()`, `get_registered_session_hooks()`, `VALID_SESSION_EVENT_TYPES` | -| `llmem.taxonomy` | `ERROR_TAXONOMY`, `REVIEW_SEVERITY_TAXONOMY`, `SELF_ASSESSMENT_FIELDS`, `ERROR_TAXONOMY_KEYS` | +| `llmem.taxonomy` | `ERROR_TAXONOMY`, `REVIEW_SEVERITY_TAXONOMY`, `ERROR_TAXONOMY_KEYS` | | `llmem.metrics` | `compute_metrics()`, `anisotropy()`, `similarity_range()`, `discrimination_gap()`, `cosine_similarity()`, `bytes_to_vec()`, `EmbeddingMetrics` dataclass, warning thresholds, `METRICS_MAX_EMBEDDINGS` | 
| `llmem.store` | `MemoryStore` with `export_all(limit=)`, `import_memories()` validation, brute-force/embedding caps, dimension validation, inbox methods (`add_to_inbox`, `get_from_inbox`, `list_inbox`, `remove_from_inbox`, `update_inbox_attention_score`, `consolidate`), capacity eviction, `get_embeddings_with_types(limit=)`, `count_embeddings()` | | `llmem.code_index` | `CodeIndex` — manages `code_chunks` table, FTS5/vec virtual tables, add/search/remove operations | @@ -823,7 +823,7 @@ err := ms.RegisterMemoryType("my_custom_type") // Get the default types types := store.DefaultRegisteredTypes() -// ["fact", "decision", "preference", "event", "project_state", "procedure", "conversation", "self_assessment"] +// ["fact", "decision", "preference", "event", "project_state", "procedure", "conversation"] // Get valid relation types relTypes := store.ValidRelationTypes() @@ -1060,7 +1060,7 @@ weighted := retriever.ComputeWeightedSignal(signals) // Get default type priority map (returns defensive copy). 
priorities := retriever.DefaultTypePriority() -// map[decision:1.2 preference:1.1 procedure:1.1 fact:1.0 project_state:1.0 self_assessment:1.0 event:0.9] +// map[decision:1.2 preference:1.1 procedure:1.1 fact:1.0 project_state:1.0 event:0.9] ``` #### Reranking Signals @@ -1080,8 +1080,7 @@ The final score is: `rrf_score * (1 - blend) + weighted_signal * blend` |------|----------|-|------|----------| | decision | 1.2 | | fact | 1.0 | | preference | 1.1 | | project_state | 1.0 | -| procedure | 1.1 | | self_assessment | 1.0 | -| | | | event | 0.9 | +| procedure | 1.1 | | event | 0.9 | ### Embedding Metrics (internal/metrics) @@ -1163,9 +1162,6 @@ dreamerCfg := cfg.DreamerConfig() // DreamerConfig for dream.NewDreamer() dreamCfg := cfg.DreamConfigResolved() sessionCfg := cfg.SessionConfigResolved() -// Create a SkillPatcher from config (returns nil if store is nil) -sp, err := cfg.NewSkillPatcher(ms) // *skillpatch.SkillPatcher, nil ms → nil, nil - // Write config YAML (with file permissions 0600) written, err := config.WriteConfigYAML(path, configMap, false) // false = don't overwrite ``` @@ -1176,7 +1172,6 @@ written, err := config.WriteConfigYAML(path, configMap, false) // false = don't type Config struct { Memory MemoryConfig Dream DreamConfig - SkillPatch SkillPatchConfig OpenCode OpenCodeConfig Session SessionConfig } @@ -1201,22 +1196,14 @@ type DreamConfig struct { BoostAmount float64 DiaryPath string ReportPath string - BehavioralThreshold int - BehavioralLookbackDays int AutoLinkThreshold float64 StaleProcedureDays int - OllamaURL string - Model string } type SessionConfig struct { Adapter string DebounceSeconds int } - -type SkillPatchConfig struct { - Dir string // Root directory for skill files. Defaults to paths.GetSkillDir() (~/.config/llmem/skills/). 
-} ``` #### Validation @@ -1255,119 +1242,6 @@ available := engine.CheckAvailable(ctx) | Model | string | `"glm-5.1:cloud"` | Extraction model name | | BaseURL | string | `"http://localhost:11434"` | Ollama API base URL (validated for SSRF) | | HTTPClient | *http.Client | nil → new client | Optional pre-configured client (for testing) | -| OllamaClient | *ollama.OllamaClient | nil → new client | Optional pre-configured client (takes precedence over BaseURL) | - -### Introspection (internal/introspect) - -The `internal/introspect` package provides failure analysis, lesson learning, and session transcript introspection (see [Dream Cycle & Extraction](DREAM.md#go) for usage). - -```go -import "github.com/MichielDean/LLMem/internal/introspect" - -// IntrospectFailure — returns IntrospectResult with MemoryID, ProposedUpdate, and Category -result, err := introspect.IntrospectFailure(ctx, ms, introspect.IntrospectFailureParams{ - WhatHappened: "null pointer dereference", - Category: "NULL_SAFETY", - Context: "handler.go:42", - CaughtBy: "self-review", - ProposedFix: "add nil check", -}) -// result.MemoryID, result.ProposedUpdate, result.Category - -// LearnLesson — returns IntrospectResult with MemoryID, ProposedUpdate, and Category -result, err := introspect.LearnLesson(ctx, ms, introspect.LearnLessonParams{ - WhatWasWrong: "used global state", - WhatIsCorrect: "inject dependency via constructor", - Context: "service.go:15", -}) -// result.MemoryID, result.ProposedUpdate, result.Category - -// IntrospectTranscript — analyze a session transcript at session end -id, err := introspect.IntrospectTranscript(ctx, ms, transcript, "session-id", ollamaClient, "glm-5.1:cloud") -// When ollamaClient is nil, falls back to degraded storage (plain-text summary, no LLM call) -``` - -`IntrospectFailure` and `LearnLesson` return an `IntrospectResult` with `MemoryID`, `ProposedUpdate`, and `Category` fields. 
When `ProposedUpdate` and `Category` are non-empty, callers should patch the relevant skill file using a `SkillPatcher` (see [Skill Patching](#skill-patching)). - -All three functions use LLM expansion via Ollama when available. When Ollama is unavailable, they gracefully degrade to storage-only mode (storing the raw parameters without LLM expansion). - -**IntrospectTranscript** differs from `IntrospectFailure` and `LearnLesson` in two ways: -1. It accepts a pre-configured `*ollama.OllamaClient` instead of a model/baseURL pair, reusing the session's configured Ollama connection. -2. It uses `context.Background()` for the final store operation (not the caller's `ctx`), ensuring the session-end self-assessment is persisted even if the calling context has expired during the LLM call. This is intentional — `IntrospectFailure` and `LearnLesson` pass through `ctx` because they run mid-session when the context is still alive. - -#### IntrospectAuto - -```go -result, err := introspect.IntrospectAuto(ctx, ms, "Session transcript text...", "glm-5.1:cloud", "http://localhost:11434") -// result.MemoryID, result.ProposedUpdate, result.Category -``` - -`IntrospectAuto` performs automatic introspection on arbitrary text (typically a session transcript) and stores a `self_assessment` memory. When Ollama is available, it uses the LLM to expand the introspection into a richer assessment; when unavailable, it stores the raw text directly (graceful degradation). The `model` and `baseURL` parameters default to `"glm-5.1:cloud"` and `"http://localhost:11434"` respectively when empty. - -Returns an `IntrospectAutoResult` with three fields: -- `MemoryID`: always non-empty on success -- `ProposedUpdate`: extracted from LLM-enriched content when available; empty on graceful degradation -- `Category`: extracted from LLM-enriched content when available; empty on graceful degradation - -Contract: never returns `(IntrospectAutoResult{}, nil)` — either creates a memory or returns an error. 
Even on LLM failure, a storage-only memory is created. - -When `ProposedUpdate` and `Category` are both non-empty, callers should patch the relevant skill file using a `SkillPatcher` (see [Skill Patching](#skill-patching)). The CLI commands `introspect`, `learn`, and the `ending` hook all perform this patching automatically. - -### Skill Patching (internal/skillpatch) - -The `internal/skillpatch` package provides direct skill file patching after introspection. When introspection produces a `ProposedUpdate` and `Category`, the relevant SKILL.md file is patched immediately — no proposed-changes.md or human approval gate. The dream cycle later validates whether the patch reduced errors in that category. - -```go -import "github.com/MichielDean/LLMem/internal/skillpatch" - -sp, err := skillpatch.NewSkillPatcher(skillpatch.SkillPatchConfig{ - SkillDir: "", // empty → paths.GetSkillDir() (~/.config/llmem/skills/) -}) -if err != nil { - log.Fatal(err) -} - -// Patch a skill file with a procedural update from introspection -err = sp.Patch(ctx, "NULL_SAFETY", "Always guard nil pointers in Go", "Missing null checks") - -// Find the skill file for a category (returns "" if not found) -path, err := sp.FindSkillFile(ctx, "ERROR_HANDLING") - -// Validate whether a patch was effective (pure function, no I/O) -validation := skillpatch.ValidatePatch("NULL_SAFETY", 10, 3) -// validation.Effective → true (errors decreased) -// validation.Flagged → false -``` - -#### Patch Behavior - -- **Category mapping**: All 10 error taxonomy categories (`NULL_SAFETY`, `ERROR_HANDLING`, `OFF_BY_ONE`, `RACE_CONDITION`, `AUTH_BYPASS`, `DATA_INTEGRITY`, `MISSING_VERIFICATION`, `EDGE_CASE`, `PERFORMANCE`, `DESIGN`) map to the `introspection` skill directory. Unknown categories use `strings.ToLower(category)` as the directory name. -- **Additive patches**: New `## Patch: CATEGORY (YYYY-MM-DD)` sections are appended to SKILL.md, never overwriting existing content. 
-- **Idempotent**: Duplicate patches (same `proposedUpdate` text already in the file) are silently skipped. -- **New skill files**: If no SKILL.md exists for a category, a new one is created with YAML frontmatter. -- **Security**: Category names are validated against `^[A-Za-z0-9_]+$` to prevent path traversal. YAML frontmatter values have newlines sanitized to prevent injection. Resolved paths are validated to stay within the root skill directory. - -#### SkillPatchConfig - -| Field | Type | Default | Description | -|-------|------|---------|-------------| -| SkillDir | string | `paths.GetSkillDir()` | Root directory for skill files (~/.config/llmem/skills/) | - -#### ValidatePatch - -`ValidatePatch` is a pure function that compares two integer counts (before and after error count for a category) and returns a `PatchValidation` struct: - -```go -type PatchValidation struct { - Category string - BeforeCount int - AfterCount int - Effective bool // true when AfterCount < BeforeCount - Flagged bool // true when AfterCount >= BeforeCount -} -``` - -Zero before-count returns `{Effective: false, Flagged: false}` (no baseline to compare against). ### Ollama Client (internal/ollama) @@ -1524,17 +1398,14 @@ coord, err := session.NewSessionHookCoordinator(session.SessionHookConfig{ Adapter: adapter, // nil → no_transcript on idle/ending ExtractionEngine: extractionEngine, // nil → skip extraction Embedding: embeddingEngine, // nil → store without embeddings - OllamaClient: ollamaClient, // nil → degraded introspection in OnEnding - SkillPatcher: skillPatcher, // nil → skip skill patching (graceful degradation) }) ``` When `config.yaml` has `opencode.db_path` set and the database exists, the adapter is wired into the coordinator. When the path is empty or the DB is unreachable, a nil adapter is used — `OnIdle` and `OnEnding` return `"no_transcript"` gracefully. 
-The CLI also provides `openExtractionEngine()`, `openEmbeddingEngine()`, and `openOllamaClient()` helper functions that return nil on failure. The coordinator gracefully degrades when any of these are nil: +The CLI also provides `openExtractionEngine()` and `openEmbeddingEngine()` helper functions that return nil on failure. The coordinator gracefully degrades when any of these are nil: - `ExtractionEngine` nil → extraction skipped, memories not extracted from transcript - `Embedding` nil → memories stored without embedding vectors -- `OllamaClient` nil → `IntrospectTranscript` produces degraded self-assessment (plain-text summary, no LLM call) #### SessionHookConfig @@ -1544,13 +1415,8 @@ type SessionHookConfig struct { Adapter SessionAdapter // Provides session content. nil → no_transcript DebounceSeconds int // Min interval between idle events. Default: 30 ContextDir string // Directory for context files. Default: paths.GetContextDir() - Model string // LLM model for introspection. Default: "glm-5.1:cloud" - BaseURL string // Ollama base URL for introspection. Default: "http://localhost:11434" ExtractionEngine *extract.ExtractionEngine // Extracts memories from transcript. nil → skip extraction Embedding *embed.EmbeddingEngine // Generates embedding vectors. nil → store without embeddings - OllamaClient *ollama.OllamaClient // Used for introspection in OnEnding. nil → degraded fallback - IntrospectModel string // LLM model name for IntrospectTranscript. Default: "glm-5.1:cloud" - SkillPatcher *skillpatch.SkillPatcher // Patches skill files after introspection. 
nil → skip patching (graceful degradation) } ``` @@ -1562,23 +1428,12 @@ coord, err := session.NewSessionHookCoordinator(session.SessionHookConfig{ Adapter: adapter, ExtractionEngine: extractionEngine, // nil → skip extraction Embedding: embeddingEngine, // nil → store without embeddings - OllamaClient: ollamaClient, // nil → degraded introspection in OnEnding - IntrospectModel: "glm-5.1:cloud", // optional, defaults to "glm-5.1:cloud" }) result, err := coord.OnCreated(ctx, "session-id") // "success" | "already_processed" result, err := coord.OnIdle(ctx, "session-id") // "success" | "debounced" | "no_transcript" resultType, ctxPath, err := coord.OnCompacting(ctx, "session-id") // "success" | "no_memories" result, err := coord.OnEnding(ctx, "session-id") // "success" - -// OnEndingWithIntrospect: like OnEnding, but also performs automatic introspection -// and skill patching. When the result includes ProposedUpdate and Category, -// patches the relevant skill file immediately (no human approval gate). -resultType, memoryID, err := coord.OnEndingWithIntrospect(ctx, "session-id") -// Returns: ("success", memoryID, nil) on success -// ("no_transcript", "", nil) when adapter is nil or transcript is empty -// ("success", "", nil) when introspection fails (logs warning, doesn't crash) -// ("error", "", err) on validation error ``` All methods validate session IDs via `paths.ValidateSessionID` to prevent path traversal. @@ -1589,7 +1444,7 @@ All methods validate session IDs via `paths.ValidateSessionID` to prevent path t 3. Generates embedding vectors for each memory (if `Embedding` is non-nil) 4. Stores memories and logs the extraction -**OnEnding** extracts memories the same way as OnIdle, then runs `IntrospectTranscript` to produce a session-end self-assessment. When `OllamaClient` is nil, `IntrospectTranscript` falls back to a degraded plain-text summary (no LLM call attempted) — the nil-OllamaClient guard must NOT be used, or the degradation path is bypassed. 
+**OnEnding** extracts memories the same way as OnIdle. ### Systemd Unit Generation (internal/systemd) @@ -1613,7 +1468,7 @@ Templates are embedded via `embed.FS`. `GenerateTimerUnit` calls `ValidateSchedu ### Taxonomy (internal/taxonomy) -The `internal/taxonomy` package provides error taxonomy constants for self_assessment memories. +The `internal/taxonomy` package provides error taxonomy constants. ```go import "github.com/MichielDean/LLMem/internal/taxonomy" @@ -1626,19 +1481,7 @@ for category, description := range taxonomy.ErrorTaxonomy { // Get ordered category keys keys := taxonomy.ErrorTaxonomyKeys() // ["NULL_SAFETY", "ERROR_HANDLING", "OFF_BY_ONE", "RACE_CONDITION", "AUTH_BYPASS", -// "DATA_INTEGRITY", "MISSING_VERIFICATION", "EDGE_CASE", "PERFORMANCE", "DESIGN", "REVIEW_PASSED"] - -// Parse a formatted self-assessment line -parsed := taxonomy.ParseSelfAssessment("NULL_SAFETY: null pointer dereference") -// map[string]string{"Category": "NULL_SAFETY", "What": "null pointer dereference"} - -// Parse a specific field from self-assessment content (used by introspect and skillpatch) -proposedUpdate := taxonomy.ParseSelfAssessmentField(content, "Proposed_update") -category := taxonomy.ParseSelfAssessmentField(content, "Category") -// Returns empty string if field not found; never returns an error - -// Get comma-separated category choices -choices := taxonomy.IntrospectCategoryChoices() +// "DATA_INTEGRITY", "MISSING_VERIFICATION", "EDGE_CASE", "PERFORMANCE", "DESIGN"] ``` #### Error Categories @@ -1654,5 +1497,4 @@ choices := taxonomy.IntrospectCategoryChoices() | `MISSING_VERIFICATION` | Skipped tests, unverified outputs | | `EDGE_CASE` | Unhandled empty input, unexpected types | | `PERFORMANCE` | N+1 queries, memory leaks | -| `DESIGN` | Architectural issues, coupling problems | -| `REVIEW_PASSED` | Clean review — positive outcome | \ No newline at end of file +| `DESIGN` | Architectural issues, coupling problems | \ No newline at end of file diff --git 
a/docs/CLI.md b/docs/CLI.md index bda6702..d8693e5 100644 --- a/docs/CLI.md +++ b/docs/CLI.md @@ -25,17 +25,12 @@ Commands: init Initialize the llmem memory system metrics Report embedding quality metrics dream Run the dream consolidation cycle - introspect Analyze a failure and store self_assessment memory - learn Learn a lesson from a wrong→right correction - track-review Persist review findings as self_assessment memories context Inject relevant memory context for a session hook Handle session lifecycle hook events ``` **Python-only commands** (not yet in the Go CLI): `register-type`, `types`, `note`, `inbox`, `consolidate`, `embed`, `learn` (codebase indexing), `suggest-categories`. -> The Go CLI uses `metrics` instead of `embed`, and `learn` is for wrong→right corrections (not codebase indexing). - ### `llmem add` ```bash @@ -129,56 +124,6 @@ llmem metrics Report the count of embeddings stored in the database. In the Go CLI, this command outputs the number of embedded memories. For full embedding quality metrics (anisotropy, similarity range, discrimination gap), use the Python CLI's `llmem embed` command. -### `llmem introspect` - -#### Manual mode - -```bash -llmem introspect --what-happened TEXT [--category CATEGORY] [--context CONTEXT] \ - [--caught-by WHO] [--proposed-fix FIX] [--model MODEL] [--base-url URL] -``` - -Analyze a failure and store a `self_assessment` memory. Uses LLM expansion via Ollama when available, with graceful degradation to storage-only mode when Ollama is unavailable. When the result includes a `ProposedUpdate` and `Category`, the relevant SKILL.md file is patched immediately (no human approval gate). - -- `--what-happened` (required): Description of what went wrong. -- `--category`: Error taxonomy category (e.g., `NULL_SAFETY`, `ERROR_HANDLING`). See `taxonomy.ErrorTaxonomy` for all categories. -- `--context`: Context where the failure occurred (e.g., `handler.go:42`). 
-- `--caught-by`: How the finding was discovered (e.g., `self-review`, `CI`). -- `--proposed-fix`: Proposed fix for the issue. -- `--model`: LLM model for introspection (default: `glm-5.1:cloud`). -- `--base-url`: Ollama base URL for introspection (default: `http://localhost:11434`). - -#### Automatic mode - -```bash -llmem introspect --auto --session SESSION_ID [--model MODEL] [--base-url URL] -llmem introspect --auto --text TEXT [--model MODEL] [--base-url URL] -``` - -Automatically introspect a session transcript or arbitrary text and store a `self_assessment` memory. When Ollama is available, uses the LLM to expand the introspection into a richer assessment; when unavailable, stores the raw structured fields directly (graceful degradation). - -- `--auto`: Enable automatic introspection mode. -- `--text`: Text to introspect. Use with `--auto`. When both `--text` and `--session` are provided, `--text` takes precedence. -- `--session`: Session ID to read transcript from (requires the OpenCode adapter). Use with `--auto`. The session ID is validated against path traversal. -- `--model`: LLM model for introspection (default: `glm-5.1:cloud`). -- `--base-url`: Ollama base URL for introspection (default: `http://localhost:11434`). - -At least one of `--text` or `--session` is required when using `--auto`. - -### `llmem learn` - -```bash -llmem learn --wrong TEXT --right TEXT [--context CONTEXT] -``` - -Learn a lesson from a wrong→right correction and store it as a `procedure` memory. Uses LLM expansion via Ollama when available, with graceful degradation to storage-only mode. When the result includes a `ProposedUpdate` and `Category`, the relevant SKILL.md file is patched immediately (no human approval gate). - -- `--wrong` (required): What was wrong. -- `--right` (required): What is correct. -- `--context`: Additional context for the correction. - -> **Note:** In the Python CLI, `llmem learn` ingests a codebase directory into the code index. 
In the Go CLI, `llmem learn` is for wrong→right lesson corrections. - ### `llmem dream` ```bash @@ -189,7 +134,7 @@ Run the dream consolidation cycle, which performs automated memory maintenance i - **Light phase:** Sort and deduplicate near-duplicate memories (cosine similarity ≥ `dream.similarity_threshold`). - **Deep phase:** Score, promote, decay, and merge memories. Also promotes inbox items to long-term memory (items with attention_score ≥ `dream.min_score` become permanent; lower-scored items are evicted). Decays confidence on idle memories. Boosts frequently accessed memories. Performs LLM-assisted merging of similar pairs. Auto-links memories with high cosine similarity (≥ `dream.auto_link_threshold`, default 0.85). -- **REM phase:** Extract themes from memory clusters and write a dream diary (read-only reflection). When Ollama is available, generates actionable behavioral insights via LLM with "Do" directives, "Verify" steps, and `[SKILL PATCH]` sections (Detection Rule, Checklist, Pitfall, Verification); falls back to count-based summaries when Ollama is unavailable. When a `SkillPatcher` is configured, the REM phase also validates previously applied skill patches by comparing error counts before and after each patch — patches where errors decreased are marked effective, patches where errors stayed the same or increased are flagged for review. When run with `--apply`, also appends behavioral insight and skill patch sections to `proposed-changes.md` at `~/.config/llmem/proposed-changes.md` (or `LMEM_HOME/proposed-changes.md`). Each dream run's entries are separated by a timestamp header. The file is append-only — existing content is preserved. +- **REM phase:** Extract themes from memory clusters and write a dream diary (read-only reflection). Produces type counts, word clusters, and total/active memory counts. Without `--apply`, the dream cycle runs as a **dry run** — output is prefixed with `[DRY RUN]` and no changes are written to the database. 
@@ -199,7 +144,7 @@ Flags: - `--phase`: Run a specific dream phase only. Choices: `light`, `deep`, `rem`. Default: all phases. - `--report PATH`: Write an HTML dream report to the given path. The path is validated — it must not target a protected system directory (e.g. `/etc`, `/var`), contain `..` traversal, or be a symlink. Paths outside the llmem home directory are allowed (e.g. custom report output locations). On validation failure, prints an error to stderr and exits with code 1. -All dream configuration (thresholds, model, schedule, etc.) is read from the `dream:` section of `config.yaml` (see [Configuration](../docs/CONFIGURATION.md)). The `dream.ollama_url` and `dream.model` fields control Ollama connectivity for behavioral insight generation; they fall back to `memory.ollama_url` and `memory.extract_model` respectively if not set. +All dream configuration (thresholds, model, schedule, etc.) is read from the `dream:` section of `config.yaml` (see [Configuration](../docs/CONFIGURATION.md)). Output is printed to stdout. On `--report` path validation errors, the error message is printed to stderr. @@ -219,39 +164,14 @@ Inject relevant memory context for a session. Used by session hooks to inject me ### `llmem hook` ```bash -llmem hook --type TYPE --session-id ID [--model MODEL] [--base-url URL] +llmem hook --type TYPE --session-id ID ``` Handle session lifecycle hook events. Supports four hook types: - `--type` (required): Hook type. Choices: `created`, `idle`, `compacting`, `ending`. - `--session-id` (required): Session ID for the hook event. Validated against path traversal attacks. -- `--model`: LLM model for introspection (default: `glm-5.1:cloud`). Used by the `ending` hook for automatic introspection. -- `--base-url`: Ollama base URL for introspection (default: `http://localhost:11434`). Used by the `ending` hook for automatic introspection. 
The `idle` hook processes the session's transcript, extracts memories via the extraction pipeline (chunk → dedup → LLM extract → embed → store), and generates embedding vectors for each extracted memory. It uses a debounce mechanism (via `extraction_log` table) to prevent re-extraction. When `ExtractionEngine` is not configured, extraction is skipped gracefully. -The `ending` hook extracts memories from the transcript (same pipeline as `idle`), then runs `IntrospectTranscript` to produce a session-end `self_assessment` memory. When the LLM is unavailable, `IntrospectTranscript` falls back to a degraded plain-text summary of the session (no LLM call attempted). - -The `ending` hook performs automatic introspection on the session transcript. It reads the transcript via the configured adapter, generates a `self_assessment` memory using `IntrospectAuto`, and outputs the result type and memory ID. If no adapter is configured or the transcript is empty, it returns `no_transcript`. If introspection fails but the transcript was read, it logs a warning and returns success without crashing the ending event. When the introspection result includes a `ProposedUpdate` and `Category`, the hook patches the relevant skill file immediately (no human approval gate). - -### `llmem track-review` - -```bash -llmem track-review --single --findings PATH -llmem track-review --batch --findings PATH -llmem track-review --clean -``` - -Persist review findings as `self_assessment` memories. Three modes: - -1. **Single finding** (`--single`): Store one finding from a findings file or stdin. -2. **Batch** (`--batch`): Store multiple findings from a findings file or stdin. Each line is parsed as `Category: value` and stored. -3. **Clean** (`--clean`): Invalidate all existing `self_assessment` memories with `source=track-review`. - -- `--single`: Store a single finding. -- `--batch`: Store multiple findings. -- `--clean`: Invalidate all track-review memories. 
-- `--findings`: Path to findings file (or stdin). The path is validated against system directory traversal. - -Every invocation with `--single` or `--batch` parses lines from input. Lines with unknown categories still produce memories with the parsed category. Empty input or unknown categories produce a `REVIEW_PASSED` memory. \ No newline at end of file +The `ending` hook extracts memories from the transcript (same pipeline as `idle`). \ No newline at end of file diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md index 2dd4f30..425d444 100644 --- a/docs/CONFIGURATION.md +++ b/docs/CONFIGURATION.md @@ -14,8 +14,6 @@ LLMem looks for configuration at `~/.config/llmem/config.yaml`. If this file doe | Database | `~/.config/llmem/memory.db` | `config.yaml: memory.db` | | Config file | `~/.config/llmem/config.yaml` | — | | Dream diary | `~/.config/llmem/dream-diary.md` | `config.yaml: dream.diary_path` | -| Proposed changes | `~/.config/llmem/proposed-changes.md` | `config.yaml: dream.proposed_changes_path` | -| Skill files | `~/.config/llmem/skills/` | `config.yaml: skill_patch.dir` | **Backward compatibility:** If `~/.lobsterdog/` exists and `~/.config/llmem/` doesn't, LLMem will use the legacy path. Call `migrate_from_lobsterdog()` to copy data to the new location. 
@@ -32,7 +30,6 @@ memory: context_budget: 4000 auto_extract: true max_file_size: 10485760 # 10MB - call_model_timeout: 5m # Timeout for LLM calls in introspect/learn (Go duration: "5m", "120s") dream: enabled: true # (Python only — not wired in Go CLI dream command) @@ -46,18 +43,9 @@ dream: boost_amount: 0.05 diary_path: null # Auto-resolved from GetDreamDiaryPath() report_path: null # Auto-resolved from GetDreamReportPath() - proposed_changes_path: null # Auto-resolved from GetProposedChangesPath() - behavioral_threshold: 3 - behavioral_lookback_days: 30 - ollama_url: http://localhost:11434 # Ollama API URL for LLM-generated behavioral insights - model: glm-5.1:cloud # Model for behavioral insight generation - model_timeout: 5m # Timeout for LLM calls during REM behavioral insight generation (Go duration) auto_link_threshold: 0.85 # Cosine similarity threshold for auto-linking related memories stale_procedure_days: 30 # Days after which an unaccessed procedure memory decays at 2x rate -skill_patch: - dir: null # Root directory for skill files (default: ~/.config/llmem/skills/) - opencode: db_path: ~/.local/share/opencode/opencode.db context_dir: null # Auto-resolved from GetContextDir() diff --git a/docs/DREAM.md b/docs/DREAM.md index 01aa397..51d0f5f 100644 --- a/docs/DREAM.md +++ b/docs/DREAM.md @@ -8,7 +8,7 @@ The dream cycle performs automated memory maintenance during idle periods. It ca - **Light phase:** Sort and deduplicate near-duplicate memories (cosine similarity ≥ threshold). - **Deep phase:** Score, promote, decay, and merge memories. Decays confidence on idle memories. Boosts frequently accessed memories. Auto-links memories with high cosine similarity (≥ `dream.auto_link_threshold`, default 0.85) by creating `related_to` relations between them. Procedure memories older than `dream.stale_procedure_days` (default 30 days) with no recent access decay at double the normal rate — proposed-but-never-adopted procedures fade faster than confirmed ones. 
-- **REM phase:** Extract themes from memory clusters and write a dream diary (read-only reflection). Also extracts behavioral insights (patterns exceeding `dream.behavioral_threshold` occurrences within `dream.behavioral_lookback_days` days). When Ollama is available, uses an LLM call to generate specific, actionable procedural rules with "Do" directives and "Verify" steps; also generates `[SKILL PATCH]` sections (Detection Rule, Checklist, Pitfall, Verification). Falls back to count-based summaries when Ollama is unavailable. When a `SkillPatcher` is configured, the REM phase validates previously applied skill patches by comparing error counts before and after each patch — patches where errors decreased are marked effective, patches where errors stayed the same or increased are flagged for review. Skill patches are applied directly to SKILL.md files immediately after introspection (no proposed-changes.md or human approval gate). The dream validates whether patches reduced errors; if not, they are flagged for review. +- **REM phase:** Extract themes from memory clusters and write a dream diary (read-only reflection). Produces type counts, word clusters, and total/active memory counts. Configuration is under the `dream:` key in `config.yaml`. See [Configuration](CONFIGURATION.md) for all dream settings. 
@@ -40,16 +40,9 @@ dreamer, err := dream.NewDreamer(dream.DreamerConfig{ BoostThreshold: 5, // default BoostAmount: 0.05, // default AutoLinkThreshold: 0.85, // default - BehavioralThreshold: 3, // default - BehavioralLookbackDays: 30, // default StaleProcedureDays: 30, // default — procedure memories older than this decay at 2x - BaseURL: "", // defaults to "http://localhost:11434" - Model: "", // defaults to "glm-5.1:cloud" - OllamaClient: nil, // nil → created from BaseURL; takes precedence if provided DiaryPath: "", // defaults from paths.GetDreamDiaryPath() ReportPath: "", // defaults from paths.GetDreamReportPath() - ProposedChangesPath: "", // defaults from paths.GetProposedChangesPath() - SkillPatcher: nil, // nil → skip patch validation in REM phase }) if err != nil { log.Fatal(err) @@ -64,14 +57,6 @@ result, err := dreamer.Run(ctx, true, "deep") // Write dream diary (markdown with sync.Mutex for in-process concurrency) err = dreamer.WriteDiary(result) -// Write proposed-changes.md (behavioral insights + skill patches, append-only) -err = dreamer.WriteProposedChanges(ctx, result) - -// Skill patch validation happens automatically during REM phase when SkillPatcher is set. -// The REM phase compares error counts before and after each patch, marking effective -// or flagged-for-review patches. Patches are applied immediately after introspection -// (not via proposed-changes.md) — the dream validates whether patches reduced errors. 
- // Generate HTML dream report err = dreamer.GenerateDreamReport(result, "/path/to/report.html") ``` @@ -89,17 +74,9 @@ err = dreamer.GenerateDreamReport(result, "/path/to/report.html") | BoostThreshold | int | 5 | Access count threshold for boosting | | BoostAmount | float64 | 0.05 | Confidence boost per event | | AutoLinkThreshold | float64 | 0.85 | Cosine similarity threshold for auto-linking | -| BehavioralThreshold | int | 3 | Minimum occurrences for behavioral insight | -| BehavioralLookbackDays | int | 30 | Lookback window for behavioral insights | | StaleProcedureDays | int | 30 | Age threshold (days) for double-decay of procedure memories | -| OllamaClient | *ollama.OllamaClient | nil | Pre-configured Ollama client. Takes precedence over BaseURL. When nil, the constructor creates one from BaseURL. | -| BaseURL | string | `"http://localhost:11434"` | Ollama API base URL for behavioral insight generation. Validated for SSRF. | -| HTTPClient | *http.Client | nil | Optional pre-configured HTTP client (for testing with httptest.NewServer). Only used when OllamaClient is nil. | -| Model | string | `"glm-5.1:cloud"` | Ollama model name for behavioral insight generation | | DiaryPath | string | paths.GetDreamDiaryPath() | Path for dream diary markdown | | ReportPath | string | paths.GetDreamReportPath() | Path for HTML dream report | -| ProposedChangesPath | string | paths.GetProposedChangesPath() | Path for proposed-changes.md (behavioral insights and skill patches) | -| SkillPatcher | *skillpatch.SkillPatcher | nil | Skill patcher for validating applied patches during REM phase. When nil, patch validation is skipped. 
| #### DreamResult @@ -124,19 +101,10 @@ type DeepPhaseResult struct { AutoLinkedCount int } -type BehavioralInsight struct { - Category string - Count int - InsightID string - ContentSnippet string - Samples []string -} - type RemPhaseResult struct { - TotalMemories int - ActiveMemories int - Themes []string - BehavioralInsights []BehavioralInsight + TotalMemories int + ActiveMemories int + Themes []string } ``` @@ -149,8 +117,6 @@ The `hooks` module provides automatic extraction from session transcripts: - `process_file()`: Extract memories from a transcript file. - `process_session()`: Extract from an OpenCode session ID. - `process_all_session_sources()`: Process all session sources (delegates to `session_hooks.process_opencode_sessions`). -- Self-assessment extraction with structured error taxonomy. -- Correction detection for identifying mistakes. The `session_hooks` module provides `process_opencode_sessions()` — the full pipeline that reads OpenCode sessions from the SQLite database, chunks them by message boundaries, and runs extraction and embedding. @@ -191,83 +157,4 @@ result, err := coord.OnCreated(ctx, "session-id") // ("success"|"already_p result, err := coord.OnIdle(ctx, "session-id") // ("success"|"debounced"|"no_transcript", count) resultType, contextPath, err := coord.OnCompacting(ctx, "session-id") result, err := coord.OnEnding(ctx, "session-id") - -// OnEndingWithIntrospect: session.ending with automatic introspection -resultType, memoryID, err := coord.OnEndingWithIntrospect(ctx, "session-id") -``` - -The `internal/introspect` package provides failure analysis, lesson learning, and session transcript introspection: - -```go -import "github.com/MichielDean/LLMem/internal/introspect" - -// Analyze a failure and store self_assessment. Returns IntrospectResult with MemoryID, ProposedUpdate, and Category. 
-result, err := introspect.IntrospectFailure(ctx, ms, introspect.IntrospectFailureParams{ - WhatHappened: "null pointer dereference in handler", - Category: "NULL_SAFETY", - Context: "handler.go:42", - CaughtBy: "self-review", - ProposedFix: "add nil check before access", - Model: "glm-5.1:cloud", - BaseURL: "http://localhost:11434", -}) -// result.MemoryID, result.ProposedUpdate, result.Category - -// Learn a lesson from a wrong→right correction. Returns IntrospectResult with MemoryID, ProposedUpdate, and Category. -result, err := introspect.LearnLesson(ctx, ms, introspect.LearnLessonParams{ - WhatWasWrong: "used global state", - WhatIsCorrect: "inject dependency via constructor", - Context: "service.go:15", -}) - -// Automatic introspection from text (e.g., a session transcript) -result, err := introspect.IntrospectAuto(ctx, ms, "Session transcript text...", "glm-5.1:cloud", "http://localhost:11434") -// result.MemoryID, result.ProposedUpdate, result.Category -``` - -All three functions use LLM expansion via Ollama when available, with graceful degradation to storage-only mode when Ollama is unavailable. `IntrospectFailure` and `LearnLesson` return `IntrospectResult{MemoryID, ProposedUpdate, Category}`. `IntrospectAuto` returns `IntrospectAutoResult{MemoryID, ProposedUpdate, Category}`. `ProposedUpdate` and `Category` are populated when LLM enrichment succeeds; empty on graceful degradation. - -When `ProposedUpdate` and `Category` are both non-empty, callers should patch the relevant skill file using a `SkillPatcher` (see [Skill Patching](#skill-patching-internalskillpatch)). The CLI commands `introspect`, `learn`, and `hook --type ending` all perform this patching automatically after introspection. 
- -```go -// Introspect a session transcript (called by OnEnding) -id, err := introspect.IntrospectTranscript(ctx, ms, transcript, "session-id", ollamaClient, "glm-5.1:cloud") -// When ollamaClient is nil, falls back to degraded storage (plain-text summary, no LLM call) -``` - -Both `IntrospectFailure` and `LearnLesson` use LLM expansion via Ollama when available, with graceful degradation to storage-only mode when Ollama is unavailable. - -`IntrospectTranscript` analyzes a session transcript and stores a `self_assessment` memory. It accepts a pre-configured `*ollama.OllamaClient` (reusing the session's connection). When `ollamaClient` is nil, it produces a degraded memory with a plain-text summary. On LLM availability, the model generates a structured self-assessment from the transcript content. Note: `IntrospectTranscript` uses `context.Background()` for the final store operation (not the caller's `ctx`) to ensure persistence even if the calling context has expired during the LLM call. - -### Skill Patching (internal/skillpatch) - -The `internal/skillpatch` package provides direct skill file patching after introspection. When introspection produces a `ProposedUpdate` and `Category`, the relevant SKILL.md file is patched immediately — no proposed-changes.md or human approval gate. The dream cycle later validates whether the patch reduced errors in that category. 
- -```go -import "github.com/MichielDean/LLMem/internal/skillpatch" - -sp, err := skillpatch.NewSkillPatcher(skillpatch.SkillPatchConfig{ - SkillDir: "", // empty → paths.GetSkillDir() (~/.config/llmem/skills/) -}) - -// Patch a skill file with a procedural update from introspection -err = sp.Patch(ctx, "NULL_SAFETY", "Always guard nil pointers in Go", "Missing null checks") - -// Find the skill file for a category (returns "" if not found) -path, err := sp.FindSkillFile(ctx, "ERROR_HANDLING") - -// Validate whether a patch was effective (pure function, no I/O) -validation := skillpatch.ValidatePatch("NULL_SAFETY", 10, 3) -// validation.Effective → true (errors decreased) -// validation.Flagged → false ``` - -**Patch behavior:** -- If the category maps to a known skill directory (all 10 error categories map to `introspection`), the existing SKILL.md is patched in-place. -- If no SKILL.md exists, a new one is created with YAML frontmatter. -- Patches are additive — new `## Patch: CATEGORY (YYYY-MM-DD)` sections are appended, never overwriting existing content. -- Duplicate patches (same `proposedUpdate` text) are skipped (idempotent). -- Category names are validated against `^[A-Za-z0-9_]+$` to prevent path traversal. -- YAML frontmatter values are sanitized (newlines replaced) to prevent injection. - -**Dream validation:** When a `SkillPatcher` is provided in `DreamerConfig`, the REM phase calls `ValidatePatch` for each category with behavioral insights, comparing error counts before and after the patch was applied. Effective patches (errors decreased) are noted; ineffective patches (errors stayed the same or increased) are flagged for review via `{flagged_for_review: true}` metadata on the insight memory. diff --git a/docs/INSTALLATION.md b/docs/INSTALLATION.md index 6ad9a4d..78cde1d 100644 --- a/docs/INSTALLATION.md +++ b/docs/INSTALLATION.md @@ -4,7 +4,7 @@ How to install and set up LLMem. 
[Back to README](../README.md) ## Go Installation (recommended) -The Go binary provides the full CLI, dream cycle, session hooks, introspection, and extraction. +The Go binary provides the full CLI, dream cycle, session hooks, and extraction. ### Build from source diff --git a/docs/INTEGRATIONS.md b/docs/INTEGRATIONS.md index 2d4e3f7..9f28daf 100644 --- a/docs/INTEGRATIONS.md +++ b/docs/INTEGRATIONS.md @@ -11,14 +11,12 @@ Agent Session │ ├── Plugin (auto, no instructions needed) │ ├── session.created/start → llmem stats + search → inject context - │ ├── session.idle/end → llmem hook idle/ending → extract + introspect + │ ├── session.idle/end → llmem hook idle/ending → extract memories │ └── session.compacting → llmem context --compacting → preserve memories │ ├── Skills (on-demand, loaded by trigger) │ ├── llmem → CLI reference, memory types, commands - │ ├── llmem-setup → Install and configure LLMem - │ ├── introspection → Self-assessment framework, error taxonomy - │ └── introspection-review-tracker → Review outcome tracking + │ └── llmem-setup → Install and configure LLMem │ └── Custom Tools (structural, zero-instruction) ├── llmem-search → Search memories @@ -33,8 +31,6 @@ Agent Session - **No instruction pollution.** The plugin injects context automatically. Skills load on-demand. Your AGENTS.md/CLAUDE.md stays clean. - **Platform-agnostic core.** Same Go binary, same skills, same CLI across OpenCode, Claude Code, and Copilot CLI. Only the thin adapter plugin differs. -- **Single install command.** `npm install` deploys skills, plugins, and tools for your platform. -- **No per-platform instruction docs to maintain.** The plugin handles behavioral injection, not 80-line instruction blocks. 
 ## OpenCode Integration

@@ -61,7 +57,7 @@ The OpenCode plugin (`plugins/opencode/llmem.js`) handles:

 | Event | Action |
 |-------|--------|
-| `session.created` | Runs `llmem stats` + `llmem search behavioral/proposed` — injects results as log context |
+| `session.created` | Runs `llmem stats` + `llmem search` — injects results as log context |
 | `session.idle` | Runs `llmem hook idle ` — extracts memories from transcript |
 | `session.ending` | (not yet wired — agent-driven via skills) |
 | `experimental.session.compacting` | Runs `llmem context --compacting` — preserves key memories |
@@ -97,8 +93,6 @@ Four skills ship with LLMem and are installed to `~/.agents/skills/`:
 |-------|-------------|
 | **llmem** | Full CLI reference, memory types, commands, dream config |
 | **llmem-setup** | Install, configure, and integrate LLMem into a harness |
-| **introspection** | Self-assessment framework, error taxonomy, vigilance checks |
-| **introspection-review-tracker** | Review outcome tracking for code reviews |

 ### Optional: AGENTS.md Pointer

@@ -140,9 +134,7 @@ plugins/agent/
 │   └── hooks.json          # Session lifecycle hooks
 └── skills/
     ├── llmem/SKILL.md
-    ├── llmem-setup/SKILL.md
-    ├── introspection/SKILL.md
-    └── introspection-review-tracker/SKILL.md
+    └── llmem-setup/SKILL.md
 ```

 ### Installation
@@ -159,8 +151,8 @@ The `hooks.json` declares:

 | Event | Action |
 |-------|--------|
-| `SessionStart` | Runs `llmem stats` + behavioral + proposed searches — stdout injected as context |
-| `SessionEnd` | Runs `llmem hook ending` — extracts memories and runs introspection |
+| `SessionStart` | Runs `llmem stats` + search — stdout injected as context |
+| `SessionEnd` | Runs `llmem hook ending` — extracts memories |
 | `PreCompact` | Runs `llmem context --compacting` — preserves key memories |

 The `SessionStart` hook's **stdout is added as context that Claude can see and act on** — this is the key mechanism for zero-config integration.
@@ -218,7 +210,7 @@ Plugin-managed. Search when uncertain: `llmem search "topic"`. Add when you lear
 After installation, verify the skills and plugins are discoverable:

 ```bash
-ls ~/.agents/skills/llmem ~/.agents/skills/introspection ~/.agents/skills/introspection-review-tracker
+ls ~/.agents/skills/llmem ~/.agents/skills/llmem-setup

 # OpenCode plugin
 ls ~/.config/opencode/plugins/llmem.js
diff --git a/docs/RERANKING.md b/docs/RERANKING.md
index 5c292bf..c681cbd 100644
--- a/docs/RERANKING.md
+++ b/docs/RERANKING.md
@@ -27,8 +27,7 @@ final_score = rrf_score * (1 - blend) + weighted_signal * blend
 |------|----------|-|------|----------|
 | decision | 1.2 | | fact | 1.0 |
 | preference | 1.1 | | project_state | 1.0 |
-| procedure | 1.1 | | self_assessment | 1.0 |
-| | | | event | 0.9 |
+| procedure | 1.1 | | event | 0.9 |
 | | | | conversation | 0.7 |

 Search results include both `_rrf_score` (raw RRF fusion score) and `_rerank_score` (blended final score). Results are sorted by `_rerank_score` descending, with ties broken by ascending memory ID. Search operations (`Retriever.search()` and `Retriever.hybrid_search()`) automatically track access — each returned result's `access_count` and `accessed_at` are updated (best-effort), keeping the recency and access frequency signals current. This Hebbian reinforcement is on by default (`track_access=True`); pass `track_access=False` to skip access tracking (useful for analytics queries that shouldn't inflate counts).
@@ -112,7 +111,7 @@ weighted := retriever.ComputeWeightedSignal(signals)
 priorities := retriever.DefaultTypePriority()
 ```

-The type priority weights are identical to Python (`decision: 1.2`, `preference: 1.1`, `procedure: 1.1`, `fact: 1.0`, `project_state: 1.0`, `self_assessment: 1.0`, `event: 0.9`). `NewRetriever` makes a defensive copy of the input map to prevent caller mutation.
+The type priority weights are identical to Python (`decision: 1.2`, `preference: 1.1`, `procedure: 1.1`, `fact: 1.0`, `project_state: 1.0`, `event: 0.9`). `NewRetriever` makes a defensive copy of the input map to prevent caller mutation.

 ### Access Tracking

diff --git a/internal/config/config.go b/internal/config/config.go
index 99aceac..509e91a 100644
--- a/internal/config/config.go
+++ b/internal/config/config.go
@@ -8,12 +8,9 @@ import (
 	"os"
 	"path/filepath"
 	"strings"
-	"time"

 	"github.com/MichielDean/LLMem/internal/dream"
 	"github.com/MichielDean/LLMem/internal/paths"
-	"github.com/MichielDean/LLMem/internal/skillpatch"
-	"github.com/MichielDean/LLMem/internal/store"
 	"github.com/MichielDean/LLMem/internal/urlvalidate"
 	"gopkg.in/yaml.v3"
 )
@@ -22,57 +19,39 @@ import (
 // Only fields that are wired through to DreamerConfig are included.
 // Removed dead fields: MinScore, MinRecallCount, MinUniqueQueries,
 // BoostOnPromote, MergeModel, CalibrationEnabled,
-// CalibrationLookbackDays, Enabled, Schedule, SkillPatchDir — these were defined in
+// CalibrationLookbackDays, Enabled, Schedule — these were defined in
 // config but never read by any method, creating a contract violation.
 // Enabled and Schedule control systemd timer behaviour, not dream
 // algorithm parameters; they are handled by internal/systemd directly.
-// SkillPatchDir was superseded by SkillPatchConfig.Dir (Config.SkillPatch.Dir).
 type DreamConfig struct {
-	SimilarityThreshold    float64 `yaml:"similarity_threshold"`
-	DecayRate              float64 `yaml:"decay_rate"`
-	DecayIntervalDays      int     `yaml:"decay_interval_days"`
-	DecayFloor             float64 `yaml:"decay_floor"`
-	ConfidenceFloor        float64 `yaml:"confidence_floor"`
-	BoostThreshold         int     `yaml:"boost_threshold"`
-	BoostAmount            float64 `yaml:"boost_amount"`
-	DiaryPath              string  `yaml:"diary_path"`
-	ReportPath             string  `yaml:"report_path"`
-	BehavioralThreshold    int     `yaml:"behavioral_threshold"`
-	BehavioralLookbackDays int     `yaml:"behavioral_lookback_days"`
-	AutoLinkThreshold      float64 `yaml:"auto_link_threshold"`
-	StaleProcedureDays     int     `yaml:"stale_procedure_days"`
-	OllamaURL              string  `yaml:"ollama_url"`
-	Model                  string  `yaml:"model"`
-	// ModelTimeout is the timeout for each LLM call during REM behavioral insight generation.
-	// Parsed as a Go duration string (e.g. "5m", "120s"). Defaults to "5m".
-	ModelTimeout           string  `yaml:"model_timeout"`
-}
-
-// SkillPatchConfig holds skill patch settings.
-type SkillPatchConfig struct {
-	// Dir is the root directory for skill files. Defaults to paths.GetSkillDir() if empty.
-	Dir string `yaml:"dir"`
+	SimilarityThreshold float64 `yaml:"similarity_threshold"`
+	DecayRate           float64 `yaml:"decay_rate"`
+	DecayIntervalDays   int     `yaml:"decay_interval_days"`
+	DecayFloor          float64 `yaml:"decay_floor"`
+	ConfidenceFloor     float64 `yaml:"confidence_floor"`
+	BoostThreshold      int     `yaml:"boost_threshold"`
+	BoostAmount         float64 `yaml:"boost_amount"`
+	DiaryPath           string  `yaml:"diary_path"`
+	ReportPath          string  `yaml:"report_path"`
+	AutoLinkThreshold   float64 `yaml:"auto_link_threshold"`
+	StaleProcedureDays  int     `yaml:"stale_procedure_days"`
 }

 // Config holds the full LLMem configuration.
 type Config struct {
-	Memory     MemoryConfig     `yaml:"memory"`
-	Dream      DreamConfig      `yaml:"dream"`
-	SkillPatch SkillPatchConfig `yaml:"skill_patch"`
+	Memory MemoryConfig `yaml:"memory"`
+	Dream  DreamConfig  `yaml:"dream"`
 }

 // MemoryConfig holds memory store settings.
 type MemoryConfig struct {
-	DBPath           string `yaml:"db"`
-	OllamaURL        string `yaml:"ollama_url"`
-	EmbedModel       string `yaml:"embed_model"`
-	ExtractModel     string `yaml:"extract_model"`
-	ContextBudget    int    `yaml:"context_budget"`
-	AutoExtract      bool   `yaml:"auto_extract"`
-	MaxFileSize      int64  `yaml:"max_file_size"`
-	// CallModelTimeout is the timeout for LLM calls in introspect/learn.
-	// Parsed as a Go duration string (e.g. "5m", "120s"). Defaults to "5m".
-	CallModelTimeout string `yaml:"call_model_timeout"`
+	DBPath        string `yaml:"db"`
+	OllamaURL     string `yaml:"ollama_url"`
+	EmbedModel    string `yaml:"embed_model"`
+	ExtractModel  string `yaml:"extract_model"`
+	ContextBudget int    `yaml:"context_budget"`
+	AutoExtract   bool   `yaml:"auto_extract"`
+	MaxFileSize   int64  `yaml:"max_file_size"`
 }

 // fmtErr wraps an error with the "llmem: config:" domain prefix.
@@ -84,33 +63,26 @@ func fmtErr(format string, args ...any) error {
 func DefaultConfig() Config {
 	return Config{
 		Memory: MemoryConfig{
-			DBPath:           paths.GetDBPath(),
-			OllamaURL:        "http://localhost:11434",
-			EmbedModel:       "nomic-embed-text",
-			ExtractModel:     "glm-5.1:cloud",
-			ContextBudget:    4000,
-			AutoExtract:      true,
-			MaxFileSize:      10 * 1024 * 1024,
-			CallModelTimeout: "5m",
+			DBPath:        paths.GetDBPath(),
+			OllamaURL:     "http://localhost:11434",
+			EmbedModel:    "nomic-embed-text",
+			ExtractModel:  "glm-5.1:cloud",
+			ContextBudget: 4000,
+			AutoExtract:   true,
+			MaxFileSize:   10 * 1024 * 1024,
 		},
 		Dream: DreamConfig{
-			SimilarityThreshold:    0.92,
-			DecayRate:              0.05,
-			DecayIntervalDays:      30,
-			DecayFloor:             0.3,
-			ConfidenceFloor:        0.3,
-			BoostThreshold:         5,
-			BoostAmount:            0.05,
-			DiaryPath:              paths.GetDreamDiaryPath(),
-			ReportPath:             paths.GetDreamReportPath(),
-			BehavioralThreshold:    3,
-			BehavioralLookbackDays: 30,
-			AutoLinkThreshold:      0.85,
-			StaleProcedureDays:     30,
-			ModelTimeout:           "5m",
-		},
-		SkillPatch: SkillPatchConfig{
-			Dir: paths.GetSkillDir(),
+			SimilarityThreshold: 0.92,
+			DecayRate:           0.05,
+			DecayIntervalDays:   30,
+			DecayFloor:          0.3,
+			ConfidenceFloor:     0.3,
+			BoostThreshold:      5,
+			BoostAmount:         0.05,
+			DiaryPath:           paths.GetDreamDiaryPath(),
+			ReportPath:          paths.GetDreamReportPath(),
+			AutoLinkThreshold:   0.85,
+			StaleProcedureDays:  30,
 		},
 	}
 }
@@ -167,72 +139,22 @@ func (c *Config) OllamaURL() (string, error) {
 	return validated, nil
 }

-// CallModelTimeoutDuration parses the CallModelTimeout config value as a time.Duration.
-// Returns the default of 5 minutes if the value is empty or invalid, logging a warning.
-func (c *Config) CallModelTimeoutDuration() time.Duration {
-	if c.Memory.CallModelTimeout == "" {
-		return 5 * time.Minute
-	}
-	parsed, err := time.ParseDuration(c.Memory.CallModelTimeout)
-	if err != nil {
-		slog.Warn("llmem: config: invalid call_model_timeout, using default 5m", "value", c.Memory.CallModelTimeout, "error", err)
-		return 5 * time.Minute
-	}
-	return parsed
-}
-
 // DreamerConfig returns a dream.DreamerConfig populated from the config.
 // Maps DreamConfig fields to their corresponding DreamerConfig fields.
 // Store must be set by the caller before passing to dream.NewDreamer.
-// If OllamaURL is configured, it is passed as BaseURL so Dreamer can attempt
-// to create an OllamaClient. If OllamaClient creation fails inside
-// dream.NewDreamer, behavioral insights fall back to count-based summaries.
 func (c *Config) DreamerConfig() dream.DreamerConfig {
-	ollamaURL := c.Dream.OllamaURL
-	if ollamaURL == "" {
-		ollamaURL = c.Memory.OllamaURL
-	}
-	if ollamaURL == "" {
-		ollamaURL = "http://localhost:11434"
-	}
-
-	model := c.Dream.Model
-	if model == "" {
-		model = c.Memory.ExtractModel
-	}
-	if model == "" {
-		model = "glm-5.1:cloud"
-	}
-
-	// Parse dream model timeout from config
-	var modelTimeout time.Duration
-	if c.Dream.ModelTimeout != "" {
-		parsed, err := time.ParseDuration(c.Dream.ModelTimeout)
-		if err != nil {
-			slog.Warn("llmem: config: invalid dream model_timeout, using default 5m", "value", c.Dream.ModelTimeout, "error", err)
-			modelTimeout = 5 * time.Minute
-		} else {
-			modelTimeout = parsed
-		}
-	}
-
 	return dream.DreamerConfig{
-		SimilarityThreshold:    c.Dream.SimilarityThreshold,
-		DecayRate:              c.Dream.DecayRate,
-		DecayIntervalDays:      c.Dream.DecayIntervalDays,
-		DecayFloor:             c.Dream.DecayFloor,
-		ConfidenceFloor:        c.Dream.ConfidenceFloor,
-		BoostThreshold:         c.Dream.BoostThreshold,
-		BoostAmount:            c.Dream.BoostAmount,
-		AutoLinkThreshold:      c.Dream.AutoLinkThreshold,
-		BehavioralThreshold:    c.Dream.BehavioralThreshold,
-		BehavioralLookbackDays: c.Dream.BehavioralLookbackDays,
-		StaleProcedureDays:     c.Dream.StaleProcedureDays,
-		DiaryPath:              c.Dream.DiaryPath,
-		ReportPath:             c.Dream.ReportPath,
-		BaseURL:                ollamaURL,
-		Model:                  model,
-		ModelTimeout:           modelTimeout,
+		SimilarityThreshold: c.Dream.SimilarityThreshold,
+		DecayRate:           c.Dream.DecayRate,
+		DecayIntervalDays:   c.Dream.DecayIntervalDays,
+		DecayFloor:          c.Dream.DecayFloor,
+		ConfidenceFloor:     c.Dream.ConfidenceFloor,
+		BoostThreshold:      c.Dream.BoostThreshold,
+		BoostAmount:         c.Dream.BoostAmount,
+		AutoLinkThreshold:   c.Dream.AutoLinkThreshold,
+		StaleProcedureDays:  c.Dream.StaleProcedureDays,
+		DiaryPath:           c.Dream.DiaryPath,
+		ReportPath:          c.Dream.ReportPath,
 	}
 }

@@ -243,27 +165,6 @@ func (c *Config) DreamConfigResolved() DreamConfig {
 	return c.Dream
 }

-// NewSkillPatcher creates a SkillPatcher using the SkillPatch config.
-// The store parameter is no longer required by SkillPatcher but is retained
-// for callers that check store availability before deciding to patch skills.
-// Returns nil without error if ms is nil (graceful degradation for callers).
-func (c *Config) NewSkillPatcher(ms *store.MemoryStore) (*skillpatch.SkillPatcher, error) {
-	if ms == nil {
-		// Graceful degradation: callers use nil check to skip patching
-		// when no store is available (e.g., dream cmd without a database).
-		return nil, nil
-	}
-
-	skillDir := c.SkillPatch.Dir
-	if skillDir == "" {
-		skillDir = paths.GetSkillDir()
-	}
-
-	return skillpatch.NewSkillPatcher(skillpatch.SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-}
-
 // WriteConfigYAML writes config as YAML to the given path with 0600 permissions.
 // Creates parent directories with 0700 permissions.
 // Returns false if file exists and force is false.
diff --git a/internal/config/config_test.go b/internal/config/config_test.go
index e431e29..79b814c 100644
--- a/internal/config/config_test.go
+++ b/internal/config/config_test.go
@@ -110,13 +110,7 @@ func TestConfig_DreamerConfig(t *testing.T) {
 	if dc.AutoLinkThreshold != 0.85 {
 		t.Errorf("expected AutoLinkThreshold 0.85, got %f", dc.AutoLinkThreshold)
 	}
-	if dc.BehavioralThreshold != 3 {
-		t.Errorf("expected BehavioralThreshold 3, got %d", dc.BehavioralThreshold)
-	}
-	if dc.BehavioralLookbackDays != 30 {
-		t.Errorf("expected BehavioralLookbackDays 30, got %d", dc.BehavioralLookbackDays)
-	}
-	// Store field should be nil (caller must set it)
+	// Store field should be nil (caller must set)
 	if dc.Store != nil {
 		t.Error("expected Store to be nil (caller must set)")
 	}
diff --git a/internal/dream/dream.go b/internal/dream/dream.go
index 7f69b9c..40836bb 100644
--- a/internal/dream/dream.go
+++ b/internal/dream/dream.go
@@ -1,13 +1,11 @@
 // Package dream provides the 3-phase dream consolidation system for LLMem.
-// Phases: Light (deduplication), Deep (decay/boost/merge/auto-link), REM (themes/behavioral insights). +// Phases: Light (deduplication), Deep (decay/boost/merge/auto-link), REM (themes). package dream import ( "context" "fmt" "log/slog" - "maps" - "net/http" "os" "path/filepath" "regexp" @@ -16,11 +14,8 @@ import ( "sync" "time" - "github.com/MichielDean/LLMem/internal/ollama" "github.com/MichielDean/LLMem/internal/paths" - "github.com/MichielDean/LLMem/internal/skillpatch" "github.com/MichielDean/LLMem/internal/store" - "github.com/MichielDean/LLMem/internal/taxonomy" ) // Default configuration values. @@ -33,14 +28,7 @@ const ( defaultBoostThreshold = 5 defaultBoostAmount = 0.05 defaultAutoLinkThreshold = 0.85 - defaultBehavioralThreshold = 3 - defaultBehavioralLookbackDays = 30 defaultStaleProcedureDays = 30 - defaultDreamBaseURL = "http://localhost:11434" - defaultDreamModel = "glm-5.1:cloud" - defaultDreamModelTimeout = 5 * time.Minute - maxBehavioralSamples = 5 - maxSampleLength = 300 ) // DreamerConfig contains the configuration for creating a Dreamer. @@ -72,12 +60,6 @@ type DreamerConfig struct { // AutoLinkThreshold for auto-linking. Defaults to 0.85. AutoLinkThreshold float64 - // BehavioralThreshold for REM insights. Defaults to 3. - BehavioralThreshold int - - // BehavioralLookbackDays for REM insights. Defaults to 30. - BehavioralLookbackDays int - // StaleProcedureDays is the age threshold (in days) after which a procedure // memory with no recent access is considered stale and decays at double rate. // Defaults to 30. Zero value uses the default. @@ -88,29 +70,6 @@ type DreamerConfig struct { // ReportPath path for writing dream report. Defaults from paths.GetDreamReportPath(). ReportPath string - - // OllamaClient is an optional pre-configured OllamaClient. Takes precedence over BaseURL/HTTPClient. - // When nil, the constructor will attempt to create one from BaseURL. 
- OllamaClient *ollama.OllamaClient - - // BaseURL is the Ollama API base URL. Defaults to "http://localhost:11434". - // Only used when OllamaClient is nil. - BaseURL string - - // HTTPClient is an optional pre-configured HTTP client (for testing with httptest.NewServer). - // Only used when OllamaClient is nil. - HTTPClient *http.Client - - // Model is the name of the Ollama model to use for behavioral insight generation. - // Defaults to "glm-5.1:cloud". - Model string - - // ModelTimeout is the timeout for each LLM call during REM behavioral insight generation. - // Defaults to 5 minutes if zero. - ModelTimeout time.Duration - - // SkillPatcher is optional. When set, the REM phase validates patches. - SkillPatcher *skillpatch.SkillPatcher } // LightPhaseResult holds the results of the light (deduplication) phase. @@ -121,7 +80,7 @@ type LightPhaseResult struct { // DeepPhaseResult holds the results of the deep (decay/boost/merge) phase. type DeepPhaseResult struct { - DecayedCount int + DecayedCount int StaleProcedureDecayedCount int BoostedCount int InvalidatedCount int @@ -129,21 +88,11 @@ type DeepPhaseResult struct { AutoLinkedCount int } -// BehavioralInsight represents a behavioral pattern detected during REM phase. -type BehavioralInsight struct { - Category string - Count int - InsightID string - ContentSnippet string - Samples []string -} - // RemPhaseResult holds the results of the REM (reflect) phase. type RemPhaseResult struct { - TotalMemories int - ActiveMemories int - Themes []string - BehavioralInsights []BehavioralInsight + TotalMemories int + ActiveMemories int + Themes []string } // DreamResult holds the combined results of all dream phases. @@ -155,25 +104,19 @@ type DreamResult struct { // Dreamer performs 3-phase dream consolidation. 
type Dreamer struct { - store *store.MemoryStore - similarityThreshold float64 - decayRate float64 - decayIntervalDays int - decayFloor float64 - confidenceFloor float64 - boostThreshold int - boostAmount float64 - autoLinkThreshold float64 - behavioralThreshold int - behavioralLookbackDays int - staleProcedureDays int - diaryPath string - reportPath string - skillPatcher *skillpatch.SkillPatcher - ollama *ollama.OllamaClient - model string - modelTimeout time.Duration - mu sync.Mutex + store *store.MemoryStore + similarityThreshold float64 + decayRate float64 + decayIntervalDays int + decayFloor float64 + confidenceFloor float64 + boostThreshold int + boostAmount float64 + autoLinkThreshold float64 + staleProcedureDays int + diaryPath string + reportPath string + mu sync.Mutex } // fmtErr wraps an error with the "llmem: dream:" domain prefix. @@ -184,9 +127,6 @@ func fmtErr(format string, args ...any) error { // NewDreamer creates and initializes a Dreamer. // All config fields default to sensible values if zero. // The constructor leaves the dreamer in a fully usable state. -// If cfg.OllamaClient is nil, the constructor attempts to create one from cfg.BaseURL -// (defaulting to "http://localhost:11434"). If that also fails, ollama is nil and -// behavioral insights fall back to count-based summaries without erroring. 
func NewDreamer(cfg DreamerConfig) (*Dreamer, error) { if cfg.Store == nil { return nil, fmtErr("store is required") @@ -224,14 +164,6 @@ func NewDreamer(cfg DreamerConfig) (*Dreamer, error) { if autoLinkThreshold == 0 { autoLinkThreshold = defaultAutoLinkThreshold } - behavioralThreshold := cfg.BehavioralThreshold - if behavioralThreshold == 0 { - behavioralThreshold = defaultBehavioralThreshold - } - behavioralLookbackDays := cfg.BehavioralLookbackDays - if behavioralLookbackDays == 0 { - behavioralLookbackDays = defaultBehavioralLookbackDays - } staleProcedureDays := cfg.StaleProcedureDays if staleProcedureDays == 0 { staleProcedureDays = defaultStaleProcedureDays @@ -244,59 +176,20 @@ func NewDreamer(cfg DreamerConfig) (*Dreamer, error) { if reportPath == "" { reportPath = paths.GetDreamReportPath() } - model := cfg.Model - if model == "" { - model = defaultDreamModel - } - modelTimeout := cfg.ModelTimeout - if modelTimeout == 0 { - modelTimeout = defaultDreamModelTimeout - } - - // Wire OllamaClient following ExtractionEngine pattern. - // If cfg.OllamaClient is provided, use it directly. - // Otherwise, create one from cfg.BaseURL (with validation) and cfg.HTTPClient. - var client *ollama.OllamaClient - if cfg.OllamaClient != nil { - client = cfg.OllamaClient - } else { - baseURL := cfg.BaseURL - if baseURL == "" { - baseURL = defaultDreamBaseURL - } - ollamaCfg := ollama.OllamaClientConfig{ - BaseURL: baseURL, - HTTPClient: cfg.HTTPClient, - } - // Best-effort: if we cannot create the client, dream still functions - // without LLM-generated insights (graceful degradation). 
- created, err := ollama.NewOllamaClient(ollamaCfg) - if err != nil { - slog.Debug("llmem: dream: could not create Ollama client, behavioral insights will use count-based fallback", "error", err) - } else { - client = created - } - } return &Dreamer{ - store: cfg.Store, - similarityThreshold: similarityThreshold, - decayRate: decayRate, - decayIntervalDays: decayIntervalDays, - decayFloor: decayFloor, - confidenceFloor: confidenceFloor, - boostThreshold: boostThreshold, - boostAmount: boostAmount, - autoLinkThreshold: autoLinkThreshold, - behavioralThreshold: behavioralThreshold, - behavioralLookbackDays: behavioralLookbackDays, - staleProcedureDays: staleProcedureDays, - diaryPath: diaryPath, - reportPath: reportPath, - skillPatcher: cfg.SkillPatcher, - ollama: client, - model: model, - modelTimeout: modelTimeout, + store: cfg.Store, + similarityThreshold: similarityThreshold, + decayRate: decayRate, + decayIntervalDays: decayIntervalDays, + decayFloor: decayFloor, + confidenceFloor: confidenceFloor, + boostThreshold: boostThreshold, + boostAmount: boostAmount, + autoLinkThreshold: autoLinkThreshold, + staleProcedureDays: staleProcedureDays, + diaryPath: diaryPath, + reportPath: reportPath, }, nil } @@ -329,13 +222,6 @@ func (d *Dreamer) Run(ctx context.Context, apply bool, phase string) (*DreamResu } } - // Validate patches if SkillPatcher is configured - if apply && result.Rem != nil && len(result.Rem.BehavioralInsights) > 0 && d.skillPatcher != nil { - if err := d.validatePatches(ctx, result); err != nil { - slog.Warn("llmem: dream: failed to validate patches", "error", err) - } - } - return result, nil } @@ -493,7 +379,7 @@ func (d *Dreamer) deepPhase(ctx context.Context, apply bool, mergeCandidates []* return result } -// remPhase extracts themes and behavioral insights. +// remPhase extracts themes from active memories. 
func (d *Dreamer) remPhase(ctx context.Context, apply bool) *RemPhaseResult { result := &RemPhaseResult{} @@ -511,8 +397,6 @@ func (d *Dreamer) remPhase(ctx context.Context, apply bool) *RemPhaseResult { result.Themes = d.extractThemes(ctx) - result.BehavioralInsights = d.extractBehavioralInsights(ctx, apply) - return result } @@ -602,150 +486,6 @@ func (d *Dreamer) extractThemes(ctx context.Context) []string { return themes } -// extractBehavioralInsights detects recurring self_assessment patterns. -// When Ollama is available, generates actionable procedural content via LLM. -// Falls back to count-based summaries when Ollama is unavailable. -func (d *Dreamer) extractBehavioralInsights(ctx context.Context, apply bool) []BehavioralInsight { - cutoff := time.Now().UTC().AddDate(0, 0, -d.behavioralLookbackDays).Format(time.RFC3339) - - selfAssessments, err := d.store.Search(ctx, store.SearchParams{ - Type: "self_assessment", - ValidOnly: true, - Limit: 500, - }) - if err != nil { - slog.Error("llmem: dream: REM self_assessment search failed", "error", err) - return []BehavioralInsight{} - } - - // Filter recent self_assessments by category and collect up to maxBehavioralSamples per category - categoryCounts := map[string]int{} - categorySamples := map[string][]string{} - for _, m := range selfAssessments { - if m.UpdatedAt != "" && m.UpdatedAt >= cutoff { - content := m.Content - // Use taxonomy.ParseSelfAssessmentField for exact field matching - // to avoid double-counting when a category name appears in prose. 
- parsedCat := taxonomy.ParseSelfAssessmentField(content, "Category") - if parsedCat == "" { - continue - } - // Only count categories that are in the error taxonomy - if _, ok := taxonomy.ErrorTaxonomy[parsedCat]; !ok { - continue - } - categoryCounts[parsedCat]++ - samples := categorySamples[parsedCat] - if len(samples) < maxBehavioralSamples { - snippet := content - if len(snippet) > maxSampleLength { - snippet = snippet[:maxSampleLength] - } - categorySamples[parsedCat] = append(samples, snippet) - } - } - } - - // Determine if Ollama is available for LLM-generated insights - useLLM := false - if d.ollama != nil { - availCtx, availCancel := context.WithTimeout(ctx, 5*time.Second) - useLLM = d.ollama.IsAvailable(availCtx) - availCancel() - } - if !useLLM { - slog.Warn("llmem: dream: REM behavioral insight using count-based fallback", "reason", "ollama unavailable") - } - - var insights []BehavioralInsight - for cat, count := range categoryCounts { - if count >= d.behavioralThreshold { - contentSnippet := "" - if useLLM { - prompt := buildBehavioralInsightPrompt(cat, count, d.behavioralLookbackDays, categorySamples[cat]) - llmCtx, llmCancel := context.WithTimeout(ctx, d.modelTimeout) - response, llmErr := d.ollama.Generate(llmCtx, prompt, d.model) - llmCancel() - if llmErr != nil { - slog.Error("llmem: dream: REM behavioral insight LLM call failed, using fallback", "category", cat, "error", llmErr) - } else if response != "" { - contentSnippet = response - } - } - - // Fallback: count-based summary (preserves backward compatibility) - if contentSnippet == "" { - contentSnippet = fmt.Sprintf("Behavioral insight: %d occurrences of %s category in the last %d days. 
%s", count, cat, d.behavioralLookbackDays, joinSamples(categorySamples[cat])) - } - - insightID := "" - if apply { - id, err := d.store.Add(ctx, store.AddParams{ - Type: "procedure", - Content: contentSnippet, - Source: "dream_rem", - Confidence: 0.7, - Metadata: map[string]any{"proposed": true, "source": "dream_rem", "category": cat, "occurrences": count}, - }) - if err != nil { - slog.Debug("llmem: dream: failed to store REM insight", "error", err) - } else { - insightID = id - } - } - insights = append(insights, BehavioralInsight{ - Category: cat, - Count: count, - InsightID: insightID, - ContentSnippet: contentSnippet, - Samples: categorySamples[cat], - }) - } - } - - sort.Slice(insights, func(i, j int) bool { return insights[i].Category < insights[j].Category }) - return insights -} - -// buildBehavioralInsightPrompt builds an LLM prompt for generating an actionable -// behavioral rule for the given category. It does NOT call the LLM — it only -// constructs the prompt string. -func buildBehavioralInsightPrompt(category string, count int, lookbackDays int, samples []string) string { - description, ok := taxonomy.ErrorTaxonomy[category] - if !ok { - description = category - } - - var sb strings.Builder - sb.WriteString(fmt.Sprintf("You are a software engineering coach. Based on the following recurring pattern found in self-assessments, generate a specific, actionable behavioral rule.\n\n")) - sb.WriteString(fmt.Sprintf("Category: %s\n", category)) - sb.WriteString(fmt.Sprintf("Definition: %s\n", description)) - sb.WriteString(fmt.Sprintf("Occurrences in the last %d days: %d\n\n", lookbackDays, count)) - - if len(samples) > 0 { - sb.WriteString("Representative self-assessment examples:\n") - for i, s := range samples { - sb.WriteString(fmt.Sprintf("%d. %s\n", i+1, s)) - } - sb.WriteString("\n") - } - - sb.WriteString("Generate a specific, actionable procedural rule that includes:\n") - sb.WriteString("1. 
A \"Do\" list of concrete behavioral directives (not vague advice like 'be more careful')\n") - sb.WriteString("2. A \"Verify\" step that can be checked later (e.g., 'run llmem introspect --category X to confirm rate drops')\n") - sb.WriteString("\nKeep the response under 200 words. Be specific and practical.\n") - - return sb.String() -} - -// joinSamples concatenates sample strings for the fallback count-based format. -func joinSamples(samples []string) string { - if len(samples) == 0 { - return "" - } - return strings.Join(samples, "; ") -} - // WriteDiary writes the dream diary as markdown to the configured path. // Uses sync.Mutex for concurrency safety within the process. func (d *Dreamer) WriteDiary(result *DreamResult) error { @@ -798,13 +538,6 @@ func (d *Dreamer) WriteDiary(result *DreamResult) error { } sb.WriteString("\n") } - if len(result.Rem.BehavioralInsights) > 0 { - sb.WriteString("### Behavioral Insights\n\n") - for _, insight := range result.Rem.BehavioralInsights { - sb.WriteString(fmt.Sprintf("- **%s** (×%d): %s\n", insight.Category, insight.Count, insight.ContentSnippet)) - } - sb.WriteString("\n") - } } if err := os.WriteFile(diaryPath, []byte(sb.String()), 0600); err != nil { @@ -814,114 +547,6 @@ func (d *Dreamer) WriteDiary(result *DreamResult) error { return nil } -// validatePatches calls SkillPatcher.ValidatePatch for each behavioral insight category. -// It compares self_assessment memory counts before and after the most recent patch to -// determine whether the patch was effective. Effective patches boost procedure memories; -// ineffective patches flag them for review. -// The patch validator is specific to the dream REM phase and will not be reused. 
-func (d *Dreamer) validatePatches(ctx context.Context, result *DreamResult) error { - if d.skillPatcher == nil { - slog.Debug("llmem: dream: no SkillPatcher configured, skipping patch validation") - return nil - } - - if result.Rem == nil || len(result.Rem.BehavioralInsights) == 0 { - return nil - } - - cutoff := time.Now().UTC().AddDate(0, 0, -d.behavioralLookbackDays).Format(time.RFC3339) - - // Count self_assessment memories per category from before the current dream run - selfAssessments, err := d.store.Search(ctx, store.SearchParams{ - Type: "self_assessment", - ValidOnly: true, - Limit: 500, - }) - if err != nil { - slog.Error("llmem: dream: patch validation search failed", "error", err) - return fmtErr("patch validation: search: %w", err) - } - - // Build before-counts per category (memories older than cutoff) - beforeCounts := map[string]int{} - for _, m := range selfAssessments { - if m.UpdatedAt != "" && m.UpdatedAt < cutoff { - cat := taxonomy.ParseSelfAssessmentField(m.Content, "Category") - if cat != "" { - beforeCounts[cat]++ - } - } - } - - // Build after-counts per category (memories at or after cutoff = current run) - afterCounts := map[string]int{} - for _, m := range selfAssessments { - if m.UpdatedAt != "" && m.UpdatedAt >= cutoff { - cat := taxonomy.ParseSelfAssessmentField(m.Content, "Category") - if cat != "" { - afterCounts[cat]++ - } - } - } - - // Validate patches for each behavioral insight category - for _, insight := range result.Rem.BehavioralInsights { - beforeCount := beforeCounts[insight.Category] - afterCount := afterCounts[insight.Category] - - validation := skillpatch.ValidatePatch(insight.Category, beforeCount, afterCount) - - if validation.Effective { - slog.Info("llmem: dream: patch effective", "category", insight.Category, "before", beforeCount, "after", afterCount) - // Boost the procedure memory created for this insight - if insight.InsightID != "" { - boostVal := float64(insight.Count) * d.boostAmount - if boostVal > 1.0 
{ - boostVal = 1.0 - } - // Boost to at least 0.7 (confidence floor for effective patches) - boosted := boostVal + 0.7 - if boosted > 1.0 { - boosted = 1.0 - } - _, err := d.store.Update(ctx, store.UpdateParams{ - ID: insight.InsightID, - Confidence: &boosted, - }) - if err != nil { - slog.Debug("llmem: dream: failed to boost effective procedure", "id", insight.InsightID, "error", err) - } - } - } - - if validation.Flagged { - slog.Warn("llmem: dream: patch flagged for review — error rate not decreasing", "category", insight.Category, "before", beforeCount, "after", afterCount) - // Flag the procedure memory for review — merge into existing metadata, don't replace - if insight.InsightID != "" { - existing, err := d.store.Get(ctx, insight.InsightID, false) - if err != nil { - slog.Debug("llmem: dream: failed to fetch procedure for metadata merge", "id", insight.InsightID, "error", err) - } else if existing != nil { - merged := maps.Clone(existing.Metadata) - if merged == nil { - merged = map[string]any{} - } - merged["flagged_for_review"] = true - _, err := d.store.Update(ctx, store.UpdateParams{ - ID: insight.InsightID, - Metadata: merged, - }) - if err != nil { - slog.Debug("llmem: dream: failed to flag procedure for review", "id", insight.InsightID, "error", err) - } - } - } - } - } - - return nil -} - // GenerateDreamReport generates an HTML dream report at the given path. // Validates reportPath via paths.ValidateWritePath. func (d *Dreamer) GenerateDreamReport(result *DreamResult, reportPath string) error { @@ -976,13 +601,6 @@ func buildReportHTML(result *DreamResult) string { } sb.WriteString("\n") } - if len(result.Rem.BehavioralInsights) > 0 { - sb.WriteString("

Behavioral Insights

\n") - } } sb.WriteString("\n") diff --git a/internal/dream/dream_test.go b/internal/dream/dream_test.go index cfd9e71..066fc67 100644 --- a/internal/dream/dream_test.go +++ b/internal/dream/dream_test.go @@ -2,20 +2,12 @@ package dream import ( "context" - "encoding/json" - "fmt" - "net/http" - "net/http/httptest" "os" "path/filepath" - "strings" "testing" "time" - "github.com/MichielDean/LLMem/internal/ollama" - "github.com/MichielDean/LLMem/internal/skillpatch" "github.com/MichielDean/LLMem/internal/store" - "github.com/MichielDean/LLMem/internal/taxonomy" ) func newTestStore(t *testing.T) *store.MemoryStore { @@ -96,15 +88,12 @@ func TestDreamer_LightPhase(t *testing.T) { } func TestDreamer_LightPhase_PropagatesContext(t *testing.T) { - // Verify that lightPhase uses the context from Run, not context.Background(). - // A cancelled context should propagate through to ConsolidateDuplicates. ms := newTestStore(t) d, err := NewDreamer(DreamerConfig{Store: ms}) if err != nil { t.Fatalf("NewDreamer: %v", err) } - // Add a memory so the phase has something to process _, err = ms.Add(context.Background(), store.AddParams{ Type: "fact", Content: "context propagation test", @@ -115,13 +104,9 @@ func TestDreamer_LightPhase_PropagatesContext(t *testing.T) { t.Fatalf("Add: %v", err) } - // Run with a cancelled context — the light phase should still complete - // (ConsolidateDuplicates on SQLite is fast), but the key contract is - // that ctx is passed through, not discarded. 
 	ctx, cancel := context.WithCancel(context.Background())
 	cancel()
 
-	// Even with cancelled context, Run should not panic or hang
 	result, err := d.Run(ctx, false, "light")
 	if err != nil {
 		t.Fatalf("Run with cancelled context: %v", err)
@@ -138,7 +123,6 @@ func TestDreamer_DeepPhase(t *testing.T) {
 		t.Fatalf("NewDreamer: %v", err)
 	}
 
-	// Add a test memory
 	_, err = ms.Add(context.Background(), store.AddParams{
 		Type: "fact",
 		Content: "test fact",
@@ -165,7 +149,6 @@ func TestDreamer_RemPhase(t *testing.T) {
 		t.Fatalf("NewDreamer: %v", err)
 	}
 
-	// fact is already registered via default types
 	_, err = ms.Add(context.Background(), store.AddParams{
 		Type: "fact",
 		Content: "test content for REM phase",
@@ -231,7 +214,6 @@ func TestDreamer_DryRun(t *testing.T) {
 	if err != nil {
 		t.Fatalf("Run dry: %v", err)
 	}
-	// Dry run should return results but not persist changes
 	if result == nil {
 		t.Fatal("expected non-nil result")
 	}
@@ -253,10 +235,9 @@ func TestGenerateDreamReport(t *testing.T) {
 		Light: &LightPhaseResult{DuplicatePairs: 3},
 		Deep: &DeepPhaseResult{DecayedCount: 2, BoostedCount: 4, MergedCount: 1, AutoLinkedCount: 5},
 		Rem: &RemPhaseResult{
-			TotalMemories:      50,
-			ActiveMemories:     40,
-			Themes:             []string{"10 memories about fact"},
-			BehavioralInsights: []BehavioralInsight{{Category: "ERROR_HANDLING", Count: 5, ContentSnippet: "test"}},
+			TotalMemories:  50,
+			ActiveMemories: 40,
+			Themes:         []string{"10 memories about fact"},
 		},
 	}
 
@@ -292,8 +273,6 @@ func containsStr(s, substr string) bool {
 }
 
 func TestDreamer_WriteDiary_UsesResolvedPath(t *testing.T) {
-	// Verify that WriteDiary uses the resolved (validated) path, not the raw path.
-	// This is a contract test — the file should be at the exact path returned by ValidateWritePath.
 	ms := newTestStore(t)
 	dir := t.TempDir()
 	diaryPath := filepath.Join(dir, "dream_diary.md")
@@ -315,14 +294,12 @@ func TestDreamer_WriteDiary_UsesResolvedPath(t *testing.T) {
 		t.Fatalf("WriteDiary: %v", err)
 	}
 
-	// Verify file was written at the expected path
 	if _, err := os.Stat(diaryPath); err != nil {
 		t.Errorf("expected diary file at %s: %v", diaryPath, err)
 	}
 }
 
 func TestDreamer_GenerateDreamReport_UsesResolvedPath(t *testing.T) {
-	// Verify that GenerateDreamReport uses the resolved (validated) path.
 	ms := newTestStore(t)
 	dir := t.TempDir()
 	reportPath := filepath.Join(dir, "report.html")
@@ -344,7 +321,6 @@ func TestDreamer_GenerateDreamReport_UsesResolvedPath(t *testing.T) {
 		t.Fatalf("GenerateDreamReport: %v", err)
 	}
 
-	// Verify file was written at the expected path
 	if _, err := os.Stat(reportPath); err != nil {
 		t.Errorf("expected report file at %s: %v", reportPath, err)
 	}
@@ -370,36 +346,6 @@ func setTimestamps(t *testing.T, ms *store.MemoryStore, id, createdAt, accessedA
 	}
 }
 
-// newTestOllamaClient creates an OllamaClient pointing at the given httptest server.
-// This mirrors the pattern used in extract_test.go.
-func newTestOllamaClient(t *testing.T, server *httptest.Server) *ollama.OllamaClient {
-	t.Helper()
-	client, err := ollama.NewOllamaClient(ollama.OllamaClientConfig{
-		BaseURL:    server.URL,
-		HTTPClient: server.Client(),
-	})
-	if err != nil {
-		t.Fatalf("newTestOllamaClient: %v", err)
-	}
-	return client
-}
-
-// addSelfAssessment adds a self_assessment memory with the given category and content.
-func addSelfAssessment(t *testing.T, ms *store.MemoryStore, category, content string) string {
-	t.Helper()
-	fullContent := "Category: " + category + "\n" + content
-	id, err := ms.Add(context.Background(), store.AddParams{
-		Type:       "self_assessment",
-		Content:    fullContent,
-		Source:     "test",
-		Confidence: 0.8,
-	})
-	if err != nil {
-		t.Fatalf("Add self_assessment: %v", err)
-	}
-	return id
-}
-
 func TestNewDreamer_DefaultStaleProcedureDays(t *testing.T) {
 	ms := newTestStore(t)
 	d, err := NewDreamer(DreamerConfig{Store: ms})
@@ -409,90 +355,14 @@ func TestNewDreamer_DefaultStaleProcedureDays(t *testing.T) {
 	if d.staleProcedureDays != defaultStaleProcedureDays {
 		t.Errorf("expected default stale procedure days %d, got %d", defaultStaleProcedureDays, d.staleProcedureDays)
 	}
-	if d.model != defaultDreamModel {
-		t.Errorf("expected default model %q, got %q", defaultDreamModel, d.model)
-	}
-}
-
-// TestNewDreamer_WithOllamaConfig verifies that OllamaClient and Model are properly
-// wired through the constructor.
-func TestNewDreamer_WithOllamaConfig(t *testing.T) { - ms := newTestStore(t) - - // Create a test Ollama server - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "test-model"}}} - json.NewEncoder(w).Encode(resp) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - ollamaClient := newTestOllamaClient(t, server) - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - OllamaClient: ollamaClient, - Model: "custom-model", - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - if d.ollama == nil { - t.Error("expected ollama client to be set") - } - if d.model != "custom-model" { - t.Errorf("expected model 'custom-model', got %q", d.model) - } - - // Verify defaults are still applied for other fields - if d.similarityThreshold != defaultSimilarityThreshold { - t.Errorf("expected default similarity threshold %f, got %f", defaultSimilarityThreshold, d.similarityThreshold) - } - - // Test with BaseURL instead of OllamaClient - d2, err := NewDreamer(DreamerConfig{ - Store: ms, - BaseURL: server.URL, - HTTPClient: server.Client(), - Model: "another-model", - }) - if err != nil { - t.Fatalf("NewDreamer with BaseURL: %v", err) - } - if d2.ollama == nil { - t.Error("expected ollama client to be created from BaseURL") - } - if d2.model != "another-model" { - t.Errorf("expected model 'another-model', got %q", d2.model) - } - - // Test defaults: no OllamaClient, no BaseURL — should create client from default URL - d3, err := NewDreamer(DreamerConfig{Store: ms}) - if err != nil { - t.Fatalf("NewDreamer defaults: %v", err) - } - if d3.model != defaultDreamModel { - t.Errorf("expected default model %q, got %q", defaultDreamModel, d3.model) - } - // The client should have been created from defaultBaseURL - if d3.ollama == nil { - t.Error("expected ollama client created from default URL") - } } func 
TestDreamer_DeepPhase_StaleProcedureDoubleDecay(t *testing.T) { - // Verify that a procedure memory older than staleProcedureDays decays at 2x the normal rate. ms := newTestStore(t) - // Use DecayIntervalDays: 1 so memories older than 1 day are eligible for decay. - // Use StaleProcedureDays: 1 so procedures not accessed within 1 day are stale. d, err := NewDreamer(DreamerConfig{ Store: ms, - DecayIntervalDays: 1, + DecayIntervalDays: 1, StaleProcedureDays: 1, DecayRate: 0.05, DecayFloor: 0.1, @@ -502,7 +372,6 @@ func TestDreamer_DeepPhase_StaleProcedureDoubleDecay(t *testing.T) { t.Fatalf("NewDreamer: %v", err) } - // Add a procedure memory with high confidence procID, err := ms.Add(context.Background(), store.AddParams{ Type: "procedure", Content: "stale procedure test", @@ -513,11 +382,9 @@ func TestDreamer_DeepPhase_StaleProcedureDoubleDecay(t *testing.T) { t.Fatalf("Add procedure: %v", err) } - // Set created_at to 60 days ago and leave accessed_at empty (never accessed = stale) oldCreated := time.Now().UTC().AddDate(0, 0, -60).Format(time.RFC3339) setTimestamps(t, ms, procID, oldCreated, "") - // Run deep phase with apply=true result, err := d.Run(context.Background(), true, "deep") if err != nil { t.Fatalf("Run deep: %v", err) @@ -526,8 +393,6 @@ func TestDreamer_DeepPhase_StaleProcedureDoubleDecay(t *testing.T) { t.Fatal("expected Deep result") } - // The stale procedure should have been decayed at 2x rate: - // 0.9 - (0.05 * 2) = 0.8 if result.Deep.StaleProcedureDecayedCount != 1 { t.Errorf("expected StaleProcedureDecayedCount=1, got %d", result.Deep.StaleProcedureDecayedCount) } @@ -535,12 +400,10 @@ func TestDreamer_DeepPhase_StaleProcedureDoubleDecay(t *testing.T) { t.Errorf("expected DecayedCount >= 1, got %d", result.Deep.DecayedCount) } - // Verify the actual confidence on the memory mem, err := ms.Get(context.Background(), procID, false) if err != nil { t.Fatalf("Get: %v", err) } - // Should be 0.9 - 0.10 = 0.8 expectedConf := 0.9 - (0.05 * 2) if 
mem.Confidence < expectedConf-0.01 || mem.Confidence > expectedConf+0.01 { t.Errorf("expected stale procedure confidence ~%f, got %f", expectedConf, mem.Confidence) @@ -548,12 +411,11 @@ func TestDreamer_DeepPhase_StaleProcedureDoubleDecay(t *testing.T) { } func TestDreamer_DeepPhase_StaleProcedureNotDoubleDecayedWhenFresh(t *testing.T) { - // Verify that a procedure memory accessed recently (within staleProcedureDays) is NOT double-decayed. ms := newTestStore(t) d, err := NewDreamer(DreamerConfig{ Store: ms, - DecayIntervalDays: 1, + DecayIntervalDays: 1, StaleProcedureDays: 1, DecayRate: 0.05, DecayFloor: 0.1, @@ -563,7 +425,6 @@ func TestDreamer_DeepPhase_StaleProcedureNotDoubleDecayedWhenFresh(t *testing.T) t.Fatalf("NewDreamer: %v", err) } - // Add a procedure memory with high confidence procID, err := ms.Add(context.Background(), store.AddParams{ Type: "procedure", Content: "fresh procedure test", @@ -574,13 +435,10 @@ func TestDreamer_DeepPhase_StaleProcedureNotDoubleDecayedWhenFresh(t *testing.T) t.Fatalf("Add procedure: %v", err) } - // Set created_at to 60 days ago but accessed_at to just now (freshly accessed) oldCreated := time.Now().UTC().AddDate(0, 0, -60).Format(time.RFC3339) recentAccessed := time.Now().UTC().Format(time.RFC3339) setTimestamps(t, ms, procID, oldCreated, recentAccessed) - // Run deep phase with apply=true — but the memory should be skipped entirely - // because it was recently accessed (accessedAt > cutoff) result, err := d.Run(context.Background(), true, "deep") if err != nil { t.Fatalf("Run deep: %v", err) @@ -589,12 +447,10 @@ func TestDreamer_DeepPhase_StaleProcedureNotDoubleDecayedWhenFresh(t *testing.T) t.Fatal("expected Deep result") } - // Recently accessed memory should not be decayed at all if result.Deep.StaleProcedureDecayedCount != 0 { t.Errorf("expected StaleProcedureDecayedCount=0 for freshly accessed procedure, got %d", result.Deep.StaleProcedureDecayedCount) } - // Verify confidence unchanged mem, err := 
ms.Get(context.Background(), procID, false) if err != nil { t.Fatalf("Get: %v", err) @@ -605,13 +461,11 @@ func TestDreamer_DeepPhase_StaleProcedureNotDoubleDecayedWhenFresh(t *testing.T) } func TestDreamer_DeepPhase_NonProcedureNotDoubleDecayed(t *testing.T) { - // Verify that a non-procedure memory (e.g., type "fact") older than staleProcedureDays - // decays at the normal rate, not double. ms := newTestStore(t) d, err := NewDreamer(DreamerConfig{ Store: ms, - DecayIntervalDays: 1, + DecayIntervalDays: 1, StaleProcedureDays: 1, DecayRate: 0.05, DecayFloor: 0.1, @@ -621,7 +475,6 @@ func TestDreamer_DeepPhase_NonProcedureNotDoubleDecayed(t *testing.T) { t.Fatalf("NewDreamer: %v", err) } - // Add a fact memory with high confidence factID, err := ms.Add(context.Background(), store.AddParams{ Type: "fact", Content: "stale fact test", @@ -632,11 +485,9 @@ func TestDreamer_DeepPhase_NonProcedureNotDoubleDecayed(t *testing.T) { t.Fatalf("Add fact: %v", err) } - // Set created_at to 60 days ago and leave accessed_at empty (never accessed) oldCreated := time.Now().UTC().AddDate(0, 0, -60).Format(time.RFC3339) setTimestamps(t, ms, factID, oldCreated, "") - // Run deep phase with apply=true result, err := d.Run(context.Background(), true, "deep") if err != nil { t.Fatalf("Run deep: %v", err) @@ -645,12 +496,10 @@ func TestDreamer_DeepPhase_NonProcedureNotDoubleDecayed(t *testing.T) { t.Fatal("expected Deep result") } - // Non-procedure memory should NOT be counted as stale procedure double-decayed if result.Deep.StaleProcedureDecayedCount != 0 { t.Errorf("expected StaleProcedureDecayedCount=0 for non-procedure, got %d", result.Deep.StaleProcedureDecayedCount) } - // Verify the actual confidence - should decay at normal rate: 0.9 - 0.05 = 0.85 mem, err := ms.Get(context.Background(), factID, false) if err != nil { t.Fatalf("Get: %v", err) @@ -659,1010 +508,4 @@ func TestDreamer_DeepPhase_NonProcedureNotDoubleDecayed(t *testing.T) { if mem.Confidence < expectedConf-0.01 || 
mem.Confidence > expectedConf+0.01 { t.Errorf("expected non-procedure confidence ~%f (normal decay), got %f", expectedConf, mem.Confidence) } -} - -// TestDreamer_RemPhase_BehavioralInsight_WithLLM verifies that when Ollama is -// available and categories exceed the threshold, extractBehavioralInsights produces -// actionable procedural content via the LLM. -func TestDreamer_RemPhase_BehavioralInsight_WithLLM(t *testing.T) { - ms := newTestStore(t) - - // Add self_assessment memories with ERROR_HANDLING category (need >= 3) - for i := 0; i < 4; i++ { - addSelfAssessment(t, ms, "ERROR_HANDLING", "Missing error handling in HTTP call "+strings.Repeat("x", 20)) - } - - llmResponse := "After any external call (DB, HTTP, subprocess), add try/except with logging. Verify error paths execute by testing them. Check: llmem search ERROR_HANDLING --type self_assessment shows rate dropping." - - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "glm-5.1:cloud"}}} - json.NewEncoder(w).Encode(resp) - return - } - if r.URL.Path == "/api/generate" { - resp := map[string]string{"response": llmResponse} - json.NewEncoder(w).Encode(resp) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - OllamaClient: newTestOllamaClient(t, server), - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(context.Background(), true, "rem") - if err != nil { - t.Fatalf("Run rem: %v", err) - } - if result.Rem == nil { - t.Fatal("expected Rem result") - } - - insights := result.Rem.BehavioralInsights - if len(insights) == 0 { - t.Fatal("expected at least one behavioral insight for ERROR_HANDLING with 4 occurrences") - } - - // Find ERROR_HANDLING insight - var found bool - for _, insight := range insights { - if insight.Category == "ERROR_HANDLING" { - found = true - if 
insight.ContentSnippet != llmResponse { - t.Errorf("expected LLM-generated content, got: %q", insight.ContentSnippet) - } - if insight.Count != 4 { - t.Errorf("expected count 4, got %d", insight.Count) - } - break - } - } - if !found { - t.Error("expected ERROR_HANDLING insight not found") - } -} - -// TestDreamer_RemPhase_BehavioralInsight_LLMFallback verifies that when Ollama is -// unavailable, the method falls back to count-based format (current behavior). -func TestDreamer_RemPhase_BehavioralInsight_LLMFallback(t *testing.T) { - ms := newTestStore(t) - - // Add self_assessment memories with ERROR_HANDLING category (need >= 3) - for i := 0; i < 3; i++ { - addSelfAssessment(t, ms, "ERROR_HANDLING", "Missing error handler in function "+strings.Repeat("y", 10)) - } - - // Use a server that returns 404 on /api/tags so IsAvailable returns false immediately - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - http.NotFound(w, r) - })) - defer server.Close() - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - OllamaClient: newTestOllamaClient(t, server), - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(context.Background(), true, "rem") - if err != nil { - t.Fatalf("Run rem: %v", err) - } - if result.Rem == nil { - t.Fatal("expected Rem result") - } - - insights := result.Rem.BehavioralInsights - if len(insights) == 0 { - t.Fatal("expected at least one behavioral insight even without Ollama") - } - - // The content should be count-based fallback format - var found bool - for _, insight := range insights { - if insight.Category == "ERROR_HANDLING" { - found = true - // Should contain the count-based format - if !strings.Contains(insight.ContentSnippet, "Behavioral insight:") { - t.Errorf("expected fallback format to contain 'Behavioral insight:', got: %q", insight.ContentSnippet) - } - if !strings.Contains(insight.ContentSnippet, "ERROR_HANDLING") { - t.Errorf("expected fallback format 
to contain 'ERROR_HANDLING', got: %q", insight.ContentSnippet) - } - break - } - } - if !found { - t.Error("expected ERROR_HANDLING insight in fallback") - } - - // Also test: no OllamaClient but with BaseURL/HTTPClient pointing to a server - // that doesn't serve /api/tags correctly — IsAvailable returns false, falls back gracefully - ms2 := newTestStore(t) - for i := 0; i < 3; i++ { - addSelfAssessment(t, ms2, "RACE_CONDITION", "Race condition in concurrent access "+strings.Repeat("z", 10)) - } - - // Use a server that returns 404 for /api/tags (not available) - unavailServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - http.NotFound(w, r) - })) - defer unavailServer.Close() - - d2, err := NewDreamer(DreamerConfig{ - Store: ms2, - BaseURL: unavailServer.URL, - HTTPClient: unavailServer.Client(), - // OllamaClient intentionally nil — will create from BaseURL - }) - if err != nil { - t.Fatalf("NewDreamer with unavailable BaseURL: %v", err) - } - - result2, err := d2.Run(context.Background(), true, "rem") - if err != nil { - t.Fatalf("Run rem with unavailable Ollama: %v", err) - } - if result2.Rem == nil { - t.Fatal("expected Rem result with unavailable Ollama") - } - - insights2 := result2.Rem.BehavioralInsights - if len(insights2) == 0 { - t.Fatal("expected at least one behavioral insight with unavailable Ollama") - } -} - -// TestDreamer_RemPhase_BehavioralInsight_BelowThreshold verifies that categories -// below the threshold (count < 3) do not produce insights. 
-func TestDreamer_RemPhase_BehavioralInsight_BelowThreshold(t *testing.T) { - ms := newTestStore(t) - - // Add only 2 self_assessment memories for ERROR_HANDLING (below threshold of 3) - addSelfAssessment(t, ms, "ERROR_HANDLING", "Small error issue one") - addSelfAssessment(t, ms, "ERROR_HANDLING", "Small error issue two") - - // Also add a NULL_SAFETY with just 1 occurrence (also below threshold) - addSelfAssessment(t, ms, "NULL_SAFETY", "One null check missing") - - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - // Server is available but should never be called since no category exceeds threshold - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "glm-5.1:cloud"}}} - json.NewEncoder(w).Encode(resp) - return - } - t.Error("unexpected LLM call for below-threshold category") - http.Error(w, "unexpected", http.StatusBadRequest) - })) - defer server.Close() - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - OllamaClient: newTestOllamaClient(t, server), - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(context.Background(), true, "rem") - if err != nil { - t.Fatalf("Run rem: %v", err) - } - if result.Rem == nil { - t.Fatal("expected Rem result") - } - - if len(result.Rem.BehavioralInsights) != 0 { - t.Errorf("expected 0 insights for below-threshold categories, got %d", len(result.Rem.BehavioralInsights)) - } -} - -// TestDreamer_RemPhase_BehavioralInsight_ProposedMetadata verifies that generated -// procedure memories have proposed:true, source:dream_rem, and category metadata. -func TestDreamer_RemPhase_BehavioralInsight_ProposedMetadata(t *testing.T) { - ms := newTestStore(t) - - // Add 3 self_assessment memories for ERROR_HANDLING - for i := 0; i < 3; i++ { - addSelfAssessment(t, ms, "ERROR_HANDLING", "Error handling gap "+strings.Repeat("a", 10)) - } - - llmResponse := "Always check error return values after external calls. 
Do: wrap every external call in try/except. Verify: check tests still pass after error path changes." - - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "glm-5.1:cloud"}}} - json.NewEncoder(w).Encode(resp) - return - } - if r.URL.Path == "/api/generate" { - resp := map[string]string{"response": llmResponse} - json.NewEncoder(w).Encode(resp) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - OllamaClient: newTestOllamaClient(t, server), - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(context.Background(), true, "rem") - if err != nil { - t.Fatalf("Run rem: %v", err) - } - - insights := result.Rem.BehavioralInsights - if len(insights) == 0 { - t.Fatal("expected at least one behavioral insight") - } - - // Find the generated procedure memory and verify its metadata - var errorHandlingInsight BehavioralInsight - var found bool - for _, insight := range insights { - if insight.Category == "ERROR_HANDLING" { - errorHandlingInsight = insight - found = true - break - } - } - if !found { - t.Fatal("expected ERROR_HANDLING insight") - } - - // The insight should have been stored as a procedure memory - if errorHandlingInsight.InsightID == "" { - t.Fatal("expected insight to have a non-empty InsightID (stored in the DB)") - } - - // Retrieve the stored procedure memory and verify metadata - mem, err := ms.Get(context.Background(), errorHandlingInsight.InsightID, false) - if err != nil { - t.Fatalf("Get stored procedure: %v", err) - } - - if mem.Type != "procedure" { - t.Errorf("expected type 'procedure', got %q", mem.Type) - } - if mem.Source != "dream_rem" { - t.Errorf("expected source 'dream_rem', got %q", mem.Source) - } - - // Verify metadata fields — Metadata is map[string]any, no type assertion needed - if proposed, _ := 
mem.Metadata["proposed"].(bool); !proposed { - t.Errorf("expected proposed=true in metadata, got %v", mem.Metadata["proposed"]) - } - if mem.Metadata["source"] != "dream_rem" { - t.Errorf("expected source='dream_rem' in metadata, got %v", mem.Metadata["source"]) - } - if mem.Metadata["category"] != "ERROR_HANDLING" { - t.Errorf("expected category='ERROR_HANDLING' in metadata, got %v", mem.Metadata["category"]) - } - if occ, ok := mem.Metadata["occurrences"].(float64); !ok || int(occ) != 3 { - t.Errorf("expected occurrences=3 in metadata, got %v", mem.Metadata["occurrences"]) - } -} - -// TestDreamer_RemPhase_BehavioralInsight_InvalidatesOld verifies that repeated REM -// runs produce new insights and procedure memories are stored correctly. -func TestDreamer_RemPhase_BehavioralInsight_InvalidatesOld(t *testing.T) { - ms := newTestStore(t) - - // Add self_assessment memories - for i := 0; i < 3; i++ { - addSelfAssessment(t, ms, "ERROR_HANDLING", "Repeated error handling issue "+strings.Repeat("b", 10)) - } - - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "glm-5.1:cloud"}}} - json.NewEncoder(w).Encode(resp) - return - } - if r.URL.Path == "/api/generate" { - resp := map[string]string{"response": "First run insight content"} - json.NewEncoder(w).Encode(resp) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - OllamaClient: newTestOllamaClient(t, server), - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - // First REM run - result1, err := d.Run(context.Background(), true, "rem") - if err != nil { - t.Fatalf("Run 1: %v", err) - } - if len(result1.Rem.BehavioralInsights) == 0 { - t.Fatal("expected at least one insight in first run") - } - - // The insight should have been stored with an ID - insight1 := result1.Rem.BehavioralInsights[0] - if 
insight1.InsightID == "" { - t.Fatal("expected InsightID after first REM run") - } - - // Verify the procedure memory exists in the store - mem, err := ms.Get(context.Background(), insight1.InsightID, false) - if err != nil { - t.Fatalf("Get procedure memory: %v", err) - } - if mem.Type != "procedure" { - t.Errorf("expected type 'procedure', got %q", mem.Type) - } - if mem.Content != "First run insight content" { - t.Errorf("expected LLM response content, got %q", mem.Content) - } -} - -// TestBuildBehavioralInsightPrompt verifies that the prompt builder includes the -// category name, occurrence count, taxonomy description, and self_assessment samples. -func TestBuildBehavioralInsightPrompt(t *testing.T) { - category := "ERROR_HANDLING" - count := 5 - samples := []string{ - "Missing try/except in HTTP call", - "Swallowed error in database query", - "Unhandled promise rejection in async function", - } - - prompt := buildBehavioralInsightPrompt(category, count, 30, samples) - - // Verify category name is in the prompt - if !strings.Contains(prompt, "ERROR_HANDLING") { - t.Error("expected prompt to contain category name") - } - - // Verify count is in the prompt - if !strings.Contains(prompt, "5") { - t.Error("expected prompt to contain occurrence count") - } - - // Verify lookbackDays is in the prompt - if !strings.Contains(prompt, "last 30 days") { - t.Error("expected prompt to contain 'last 30 days'") - } - - // Verify taxonomy description is in the prompt - if !strings.Contains(prompt, "Missing try/except") { - t.Error("expected prompt to contain taxonomy description for ERROR_HANDLING") - } - - // Verify samples are included - for _, s := range samples { - if !strings.Contains(prompt, s) { - t.Errorf("expected prompt to contain sample %q", s) - } - } - - // Verify prompt includes "Do" section guidance - if !strings.Contains(prompt, "Do") { - t.Error("expected prompt to contain 'Do' directive guidance") - } - - // Verify prompt includes "Verify" section guidance 
- if !strings.Contains(prompt, "Verify") { - t.Error("expected prompt to contain 'Verify' step guidance") - } - - // Verify prompt mentions word limit - if !strings.Contains(prompt, "200") { - t.Error("expected prompt to mention 200 word limit") - } -} - -// TestBuildBehavioralInsightPrompt_EmptySamples verifies prompt generation with no samples. -func TestBuildBehavioralInsightPrompt_EmptySamples(t *testing.T) { - prompt := buildBehavioralInsightPrompt("RACE_CONDITION", 3, 14, nil) - - if !strings.Contains(prompt, "RACE_CONDITION") { - t.Error("expected prompt to contain category name") - } - if !strings.Contains(prompt, "3") { - t.Error("expected prompt to contain occurrence count") - } - // Should not crash with nil samples -} - -// TestBuildBehavioralInsightPrompt_CustomLookback verifies that a non-default -// lookbackDays value appears in the prompt instead of the hardcoded "30". -func TestBuildBehavioralInsightPrompt_CustomLookback(t *testing.T) { - prompt := buildBehavioralInsightPrompt("ERROR_HANDLING", 7, 14, []string{"sample text"}) - - if !strings.Contains(prompt, "last 14 days") { - t.Error("expected prompt to contain 'last 14 days' for custom lookback") - } - // Make sure the old hardcoded "30 days" does not appear when lookbackDays is 14 - if strings.Contains(prompt, "last 30 days") { - t.Error("prompt should not contain hardcoded 'last 30 days' when lookbackDays is 14") - } -} - -// TestJoinSamples verifies that joinSamples concatenates strings with "; ". 
-func TestJoinSamples(t *testing.T) { - tests := []struct { - name string - samples []string - expected string - }{ - { - name: "multiple samples", - samples: []string{"alpha", "beta", "gamma"}, - expected: "alpha; beta; gamma", - }, - { - name: "single sample", - samples: []string{"only"}, - expected: "only", - }, - { - name: "nil samples returns empty", - samples: nil, - expected: "", - }, - { - name: "empty slice returns empty", - samples: []string{}, - expected: "", - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - result := joinSamples(tt.samples) - if result != tt.expected { - t.Errorf("joinSamples(%v) = %q, want %q", tt.samples, result, tt.expected) - } - }) - } -} - -// TestDreamer_RemPhase_BehavioralInsight_LLMErrorFallback verifies that when -func TestDreamer_RemPhase_BehavioralInsight_LLMErrorFallback(t *testing.T) { - ms := newTestStore(t) - - // Add self_assessment memories with ERROR_HANDLING category - for i := 0; i < 3; i++ { - addSelfAssessment(t, ms, "ERROR_HANDLING", "Error handling gap "+strings.Repeat("c", 10)) - } - - // Server that returns 500 for /api/generate but 200 for /api/tags (so IsAvailable returns true) - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "glm-5.1:cloud"}}} - json.NewEncoder(w).Encode(resp) - return - } - if r.URL.Path == "/api/generate" { - http.Error(w, "internal server error", http.StatusInternalServerError) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - OllamaClient: newTestOllamaClient(t, server), - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(context.Background(), true, "rem") - if err != nil { - t.Fatalf("Run rem: %v", err) - } - if result.Rem == nil { - t.Fatal("expected Rem result") - } - - insights := result.Rem.BehavioralInsights - 
if len(insights) == 0 { - t.Fatal("expected at least one behavioral insight even when LLM fails") - } - - // Should fall back to count-based format - var found bool - for _, insight := range insights { - if insight.Category == "ERROR_HANDLING" { - found = true - if !strings.Contains(insight.ContentSnippet, "Behavioral insight:") { - t.Errorf("expected fallback format with 'Behavioral insight:' when LLM fails, got: %q", insight.ContentSnippet) - } - break - } - } - if !found { - t.Error("expected ERROR_HANDLING insight in LLM error fallback") - } -} - -// TestExtractBehavioralInsights_LogsSkipped_WhenOllamaUnavailable verifies that when -// Ollama is unavailable, extractBehavioralInsights falls back to count-based format -// and the nil OllamaClient path correctly sets useLLM=false. -func TestExtractBehavioralInsights_LogsSkipped_WhenOllamaUnavailable(t *testing.T) { - ms := newTestStore(t) - - // Add self_assessment memories with ERROR_HANDLING category (need >= 3) - for i := 0; i < 3; i++ { - addSelfAssessment(t, ms, "ERROR_HANDLING", "Missing error handler in function "+strings.Repeat("y", 10)) - } - - // Use a server that returns 404 on /api/tags so IsAvailable returns false immediately - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - http.NotFound(w, r) - })) - defer server.Close() - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - OllamaClient: newTestOllamaClient(t, server), - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(context.Background(), true, "rem") - if err != nil { - t.Fatalf("Run rem: %v", err) - } - if result.Rem == nil { - t.Fatal("expected Rem result") - } - - insights := result.Rem.BehavioralInsights - if len(insights) == 0 { - t.Fatal("expected at least one behavioral insight even without Ollama") - } - - // Verify the insight is count-based fallback, not LLM-generated - for _, insight := range insights { - if insight.Category == "ERROR_HANDLING" { - if 
!strings.Contains(insight.ContentSnippet, "Behavioral insight:") { - t.Errorf("expected count-based fallback format, got: %q", insight.ContentSnippet) - } - if !strings.Contains(insight.ContentSnippet, "ERROR_HANDLING") { - t.Errorf("expected ERROR_HANDLING in fallback format, got: %q", insight.ContentSnippet) - } - return - } - } - t.Error("expected ERROR_HANDLING insight not found") -} - -func TestDreamer_RemPhase_BehavioralInsight_SamplesPopulated(t *testing.T) { - ms := newTestStore(t) - - // Add self_assessment memories with ERROR_HANDLING category (need >= 3) - for i := 0; i < 4; i++ { - addSelfAssessment(t, ms, "ERROR_HANDLING", "Missing error handling in HTTP call "+strings.Repeat("x", 20)) - } - - // Use a server that returns 404 for /api/tags (unavailable) - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - http.NotFound(w, r) - })) - defer server.Close() - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - OllamaClient: newTestOllamaClient(t, server), - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(context.Background(), true, "rem") - if err != nil { - t.Fatalf("Run rem: %v", err) - } - if result.Rem == nil { - t.Fatal("expected Rem result") - } - - insights := result.Rem.BehavioralInsights - if len(insights) == 0 { - t.Fatal("expected at least one behavioral insight") - } - - // Find ERROR_HANDLING insight and verify Samples is populated - var found bool - for _, insight := range insights { - if insight.Category == "ERROR_HANDLING" { - found = true - if len(insight.Samples) == 0 { - t.Error("expected Samples to be populated on BehavioralInsight, but got empty slice") - } - // Each sample should contain the category name - for _, s := range insight.Samples { - if !strings.Contains(s, "ERROR_HANDLING") { - t.Errorf("expected sample to contain 'ERROR_HANDLING', got %q", s) - } - } - break - } - } - if !found { - t.Error("expected ERROR_HANDLING insight not found") - } -} - 
-// TestDreamer_WriteProposedChanges_UsesSamplesFromInsight verifies that -func TestDreamer_ValidatePatches_WithBehavioralInsights(t *testing.T) { - ctx := context.Background() - ms := newTestStore(t) - - // Add self_assessment memories in the NULL_SAFETY category - // (need >= behavioralThreshold to trigger behavioral insight) - for i := 0; i < 5; i++ { - _, err := ms.Add(ctx, store.AddParams{ - Type: "self_assessment", - Content: "Category: NULL_SAFETY\nWhat_happened: nil dereference\nProposed_update: always check nil", - Source: "test", - Confidence: 0.9, - }) - if err != nil { - t.Fatalf("Add: %v", err) - } - } - - // Create a SkillPatcher - dir := t.TempDir() - skillDir := filepath.Join(dir, "skills") - sp, err := skillpatch.NewSkillPatcher(skillpatch.SkillPatchConfig{ - SkillDir: skillDir, - }) - if err != nil { - t.Fatalf("NewSkillPatcher: %v", err) - } - - // Use a mock HTTP server that simulates unavailable Ollama - // so the dream doesn't hang trying to connect - mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - // Return 404 for all requests — Ollama unavailable - http.NotFound(w, r) - })) - defer mockServer.Close() - - mockClient := ollama.OllamaClientConfig{ - BaseURL: mockServer.URL, - HTTPClient: mockServer.Client(), - } - mockOllama, _ := ollama.NewOllamaClient(mockClient) - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - SkillPatcher: sp, - OllamaClient: mockOllama, - BehavioralThreshold: 3, - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(ctx, true, "rem") - if err != nil { - t.Fatalf("Run: %v", err) - } - - if result.Rem == nil { - t.Fatal("expected REM results") - } - - // Verify that behavioral insights were generated and patch validation ran - if len(result.Rem.BehavioralInsights) == 0 { - t.Error("expected behavioral insights for NULL_SAFETY") - } - - foundNULLSafety := false - for _, insight := range result.Rem.BehavioralInsights { - if 
insight.Category == "NULL_SAFETY" { - foundNULLSafety = true - } - } - if !foundNULLSafety { - t.Error("expected NULL_SAFETY insight in behavioral insights") - } -} - -// TestDreamer_ValidatePatches_NoBehavioralInsights verifies that -// no patch validation occurs when there are no behavioral insights. -func TestDreamer_ValidatePatches_NoBehavioralInsights(t *testing.T) { - ctx := context.Background() - ms := newTestStore(t) - - dir := t.TempDir() - skillDir := filepath.Join(dir, "skills") - sp, err := skillpatch.NewSkillPatcher(skillpatch.SkillPatchConfig{ - SkillDir: skillDir, - }) - if err != nil { - t.Fatalf("NewSkillPatcher: %v", err) - } - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - SkillPatcher: sp, - BehavioralThreshold: 3, // high threshold so no insights are generated - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(ctx, true, "rem") - if err != nil { - t.Fatalf("Run: %v", err) - } - - if result.Rem == nil { - t.Fatal("expected REM results") - } - - // No insights means no patch validation — test verifies it doesn't crash - if len(result.Rem.BehavioralInsights) != 0 { - t.Errorf("expected no behavioral insights, got %d", len(result.Rem.BehavioralInsights)) - } -} - -// TestDreamer_ValidatePatches_MergesMetadata verifies that patch validation -// merges flagged_for_review into existing metadata rather than replacing it. -// This is a regression test for the bug where Update with Metadata replaced -// all existing metadata fields. 
-func TestDreamer_ValidatePatches_MergesMetadata(t *testing.T) { - ctx := context.Background() - ms := newTestStore(t) - - // Add self_assessment memories with NULL_SAFETY category (>= behavioralThreshold) - for i := 0; i < 5; i++ { - _, err := ms.Add(ctx, store.AddParams{ - Type: "self_assessment", - Content: "Category: NULL_SAFETY\nWhat_happened: nil dereference\nProposed_update: always check nil", - Source: "test", - Confidence: 0.9, - }) - if err != nil { - t.Fatalf("Add: %v", err) - } - } - - dir := t.TempDir() - skillDir := filepath.Join(dir, "skills") - sp, err := skillpatch.NewSkillPatcher(skillpatch.SkillPatchConfig{ - SkillDir: skillDir, - }) - if err != nil { - t.Fatalf("NewSkillPatcher: %v", err) - } - - mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - http.NotFound(w, r) - })) - defer mockServer.Close() - - mockClient := ollama.OllamaClientConfig{ - BaseURL: mockServer.URL, - HTTPClient: mockServer.Client(), - } - mockOllama, _ := ollama.NewOllamaClient(mockClient) - - d, err := NewDreamer(DreamerConfig{ - Store: ms, - SkillPatcher: sp, - OllamaClient: mockOllama, - BehavioralThreshold: 3, - }) - if err != nil { - t.Fatalf("NewDreamer: %v", err) - } - - result, err := d.Run(ctx, true, "rem") - if err != nil { - t.Fatalf("Run: %v", err) - } - - if result.Rem == nil { - t.Fatal("expected REM results") - } - - // Find the behavioral insight and check if it was flagged (afterCount >= beforeCount) - // When errors don't decrease, the patch is flagged for review - for _, insight := range result.Rem.BehavioralInsights { - if insight.Category != "NULL_SAFETY" { - continue - } - if insight.InsightID == "" { - continue - } - - // Verify existing metadata is preserved when flagged_for_review is added - mem, err := ms.Get(ctx, insight.InsightID, false) - if err != nil { - t.Fatalf("Get: %v", err) - } - if mem == nil { - t.Fatal("expected memory to exist") - } - - // The memory should have proposed=true, 
source=dream_rem metadata from creation - // If flagged_for_review is set, the existing metadata should still be present - if proposed, ok := mem.Metadata["proposed"].(bool); !ok || !proposed { - t.Errorf("expected proposed=true in metadata to be preserved, got %v", mem.Metadata["proposed"]) - } - if source, ok := mem.Metadata["source"].(string); !ok || source != "dream_rem" { - t.Errorf("expected source=dream_rem in metadata to be preserved, got %v", mem.Metadata["source"]) - } - } -} - -// TestDreamer_ValidatePatches_CategoryCounting_Precise verifies that category -// counting uses ParseSelfAssessmentField instead of strings.Contains. -// This prevents double-counting when a memory mentions another category in prose. -func TestDreamer_ValidatePatches_CategoryCounting_Precise(t *testing.T) { - ctx := context.Background() - ms := newTestStore(t) - - // Add a self_assessment memory in ERROR_HANDLING that mentions another category in prose. - // With ParseSelfAssessmentField, it should only count as ERROR_HANDLING. 
- _, err := ms.Add(ctx, store.AddParams{ - Type: "self_assessment", - Content: "Category: ERROR_HANDLING\nWhat_happened: Unlike NULL_SAFETY, this was a bare except\nProposed_update: never use bare except", - Source: "test", - Confidence: 0.9, - }) - if err != nil { - t.Fatalf("Add: %v", err) - } - - // Verify ParseSelfAssessmentField correctly extracts only ERROR_HANDLING - content := "Category: ERROR_HANDLING\nWhat_happened: Unlike NULL_SAFETY, this was a bare except" - parsed := taxonomy.ParseSelfAssessmentField(content, "Category") - if parsed != "ERROR_HANDLING" { - t.Errorf("expected Category=ERROR_HANDLING, got %q", parsed) - } - - // Demonstrate that ParseSelfAssessmentField would NOT falsely match NULL_SAFETY - // even though "NULL_SAFETY" appears in the content - nullSafetyParsed := taxonomy.ParseSelfAssessmentField(content, "Category") - if nullSafetyParsed == "NULL_SAFETY" { - t.Error("ParseSelfAssessmentField should not extract NULL_SAFETY from the ERROR_HANDLING memory") - } -} - -// TestTaxonomy_ParseSelfAssessmentField_NoSubstringMatch verifies that -// ParseSelfAssessmentField does exact field matching and doesn't match -// substrings in prose or other field values. 
-func TestTaxonomy_ParseSelfAssessmentField_NoSubstringMatch(t *testing.T) { - tests := []struct { - name string - content string - field string - want string - }{ - { - name: "exact category match", - content: "Category: ERROR_HANDLING\nWhat_happened: detail", - field: "Category", - want: "ERROR_HANDLING", - }, - { - name: "category differs from prose mention", - content: "Category: ERROR_HANDLING\nWhat_happened: Unlike NULL_SAFETY issues, error was bare except", - field: "Category", - want: "ERROR_HANDLING", // NOT NULL_SAFETY — the bug we fixed - }, - { - name: "missing field returns empty", - content: "What_happened: something\nContext: else", - field: "Category", - want: "", - }, - { - name: "proposed_update extracted correctly", - content: "Category: RACE_CONDITION\nProposed_update: always use mutex", - field: "Proposed_update", - want: "always use mutex", - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - got := taxonomy.ParseSelfAssessmentField(tt.content, tt.field) - if got != tt.want { - t.Errorf("ParseSelfAssessmentField(%q, %q) = %q, want %q", tt.content, tt.field, got, tt.want) - } - }) - } -} - -// TestExtractBehavioralInsights_UsesParseSelfAssessmentField verifies that -// extractBehavioralInsights uses taxonomy.ParseSelfAssessmentField for category -// matching, preventing double-counting when a category name appears in prose. -func TestExtractBehavioralInsights_UsesParseSelfAssessmentField(t *testing.T) { - ctx := context.Background() - ms := newTestStore(t) - - // Add a self_assessment whose Category field is ERROR_HANDLING, - // but whose What_happened prose mentions NULL_SAFETY. - // With strings.Contains, this would be counted under both categories. - // With ParseSelfAssessmentField, it should only count under ERROR_HANDLING. 
- content := "Category: ERROR_HANDLING\nWhat_happened: Unlike NULL_SAFETY, this was a bare except\nProposed_update: never use bare except" - _, err := ms.Add(ctx, store.AddParams{ - Type: "self_assessment", - Content: content, - Source: "test", - Confidence: 0.9, - }) - if err != nil { - t.Fatalf("Add: %v", err) - } - - // We need at least behavioralThreshold self_assessments for insights to be generated. - // Add enough ERROR_HANDLING memories to cross the threshold. - for i := 0; i < 10; i++ { - _, err := ms.Add(ctx, store.AddParams{ - Type: "self_assessment", - Content: fmt.Sprintf("Category: ERROR_HANDLING\nWhat_happened: error handling issue %d", i), - Source: "test", - Confidence: 0.8, - }) - if err != nil { - t.Fatalf("Add loop: %v", err) - } - } - - // Do NOT add any NULL_SAFETY memories. If extractBehavioralInsights uses - // strings.Contains, it will falsely count the ERROR_HANDLING memory under - // NULL_SAFETY because the prose contains "NULL_SAFETY". - - d := &Dreamer{ - store: ms, - behavioralThreshold: 2, - behavioralLookbackDays: 365, - } - - insights := d.extractBehavioralInsights(ctx, false) - - // There should be no insight for NULL_SAFETY — only ERROR_HANDLING insights. - for _, insight := range insights { - if insight.Category == "NULL_SAFETY" { - t.Errorf("extractBehavioralInsights counted a NULL_SAFETY insight from ERROR_HANDLING memory — double-counting bug not fixed") - } - } - - // There should be at least one ERROR_HANDLING insight. 
-	foundErrorHandling := false
-	for _, insight := range insights {
-		if insight.Category == "ERROR_HANDLING" {
-			foundErrorHandling = true
-		}
-	}
-	if !foundErrorHandling {
-		t.Error("expected at least one ERROR_HANDLING insight, got none")
-	}
 }
\ No newline at end of file
diff --git a/internal/extract/extract.go b/internal/extract/extract.go
index 7cd5494..63ec78e 100644
--- a/internal/extract/extract.go
+++ b/internal/extract/extract.go
@@ -12,7 +12,6 @@ import (
 	"strings"
 
 	"github.com/MichielDean/LLMem/internal/ollama"
-	"github.com/MichielDean/LLMem/internal/taxonomy"
 )
 
 const (
@@ -21,23 +20,17 @@ const (
 )
 
 // extractionPrompt is the system prompt for memory extraction.
-var extractionPrompt = buildExtractionPrompt()
-
-func buildExtractionPrompt() string {
-	categoryChoices := taxonomy.IntrospectCategoryChoices()
-	return fmt.Sprintf(`You are a memory extraction system. Extract key memories from the text below.
+var extractionPrompt = `You are a memory extraction system. Extract key memories from the text below.
 
 Return a JSON array of objects with these fields:
-- type: one of "fact", "decision", "preference", "event", "project_state", "procedure", "self_assessment"
+- type: one of "fact", "decision", "preference", "event", "project_state", "procedure", "conversation"
 - content: a clear, specific statement (not vague)
 - confidence: 0.0 to 1.0 (how certain this is a lasting memory)
-- category: (self_assessment only) one of: %s
 
 If no memories are worth extracting, return an empty array [].
 
 Text:
-`, categoryChoices)
-}
+`
 
 // regex for extracting JSON arrays from LLM responses
 var (
diff --git a/internal/introspect/introspect.go b/internal/introspect/introspect.go
deleted file mode 100644
index 48c9047..0000000
--- a/internal/introspect/introspect.go
+++ /dev/null
@@ -1,387 +0,0 @@
-// Package introspect provides failure analysis and lesson learning for LLMem self-assessment.
-// It uses an OllamaClient for LLM-assisted introspection with graceful degradation.
-package introspect - -import ( - "context" - "fmt" - "log/slog" - "net/http" - "strings" - "time" - - "github.com/MichielDean/LLMem/internal/ollama" - "github.com/MichielDean/LLMem/internal/store" - "github.com/MichielDean/LLMem/internal/taxonomy" -) - -const ( - defaultModel = "glm-5.1:cloud" - defaultBaseURL = "http://localhost:11434" - introspectSource = "introspect" - learnSource = "learn" - introspectConfidence = 0.9 - learnConfidence = 0.85 - - // callModelTimeout is the default timeout for LLM calls in IntrospectFailure and LearnLesson. - // CallModelTimeout in params takes precedence; this is the fallback when zero. - callModelTimeout = 5 * time.Minute - - // callModelMinTimeout is the minimum allowed timeout for LLM calls. - // Timeouts below this value are rejected to prevent accidental instant-failures. - callModelMinTimeout = 10 * time.Second -) - -// LLMEnrichment indicates whether LLM enrichment was used when storing a memory. -// It is a generic status type that could be reused by other LLM callers (learn, dream REM). -type LLMEnrichment string - -const ( - // Enriched means the LLM successfully generated structured content. - Enriched LLMEnrichment = "enriched" - // Skipped means the LLM was unavailable or timed out, so raw fields were stored. - Skipped LLMEnrichment = "skipped" - // Disabled means the caller explicitly set NoLLM=true, skipping LLM entirely. - Disabled LLMEnrichment = "disabled" -) - -// IntrospectResult holds the result of an IntrospectFailure call. -// MemoryID is always non-empty on success (never empty string). -// Content is the stored memory content. -// LLMStatus indicates whether LLM enrichment was used. -// ProposedUpdate contains the proposed procedural update extracted from the -// self-assessment content. Empty when no proposed update is available. -// Category contains the error taxonomy category. May be empty when no category is specified. 
-type IntrospectResult struct { - MemoryID string - Content string - LLMStatus LLMEnrichment - ProposedUpdate string - Category string -} - -// LearnResult holds the result of a LearnLesson call. -// MemoryID is always non-empty on success (never empty string). -// Content is the stored memory content. -// LLMStatus indicates whether LLM enrichment was used. -type LearnResult struct { - MemoryID string - Content string - LLMStatus LLMEnrichment -} - -// IntrospectFailureParams contains the parameters for introspecting a failure. -type IntrospectFailureParams struct { - WhatHappened string - Category string - Context string - CaughtBy string - ProposedFix string - Model string - BaseURL string - - // NoLLM skips all Ollama calls: no IsAvailable check, no Generate call. - // When true, raw fields are stored immediately and LLMStatus is Disabled. - NoLLM bool - - // Timeout for the LLM call. When zero, defaults to callModelTimeout (5 minutes). - // Must be >= 10 seconds; values below are rejected with an error. - Timeout time.Duration - - // HTTPClient is an optional pre-configured HTTP client (for testing with httptest.NewServer). - // When provided, it is passed to OllamaClient constructor and bypasses URL validation. - HTTPClient *http.Client -} - -// LearnLessonParams contains the parameters for learning a lesson from a wrong→right correction. -type LearnLessonParams struct { - WhatWasWrong string - WhatIsCorrect string - Context string - Model string - BaseURL string - - // NoLLM skips all Ollama calls: no IsAvailable check, no Generate call. - // When true, raw fields are stored immediately and LLMStatus is Disabled. - NoLLM bool - - // Timeout for the LLM call. When zero, defaults to callModelTimeout (5 minutes). - // Must be >= 10 seconds; values below are rejected with an error. - Timeout time.Duration - - // HTTPClient is an optional pre-configured HTTP client (for testing with httptest.NewServer). 
- // When provided, it is passed to OllamaClient constructor and bypasses URL validation. - HTTPClient *http.Client -} - -// fmtErr wraps an error with the "llmem: introspect:" domain prefix. -func fmtErr(format string, args ...any) error { - return fmt.Errorf("llmem: introspect: "+format, args...) -} - -// IntrospectFailure analyzes a failure and stores a self_assessment memory. -// If the LLM is available, it uses the model to expand the bare description into -// a structured self-assessment. If unavailable, it stores a structured -// memory directly from the provided fields (graceful degradation). -// -// Contract: NEVER returns (IntrospectResult{}, nil) — either creates a memory or returns an error. -// Even on LLM failure, a storage-only memory is created with LLMStatus Skipped or Disabled. -func IntrospectFailure(ctx context.Context, ms *store.MemoryStore, params IntrospectFailureParams) (IntrospectResult, error) { - if params.WhatHappened == "" { - return IntrospectResult{}, fmtErr("what_happened is required") - } - if params.Model == "" { - params.Model = defaultModel - } - if params.Timeout != 0 && params.Timeout < callModelMinTimeout { - return IntrospectResult{}, fmtErr("timeout must be at least %v, got %v", callModelMinTimeout, params.Timeout) - } - - // Warn about unknown categories but proceed anyway - if params.Category != "" { - if _, ok := taxonomy.ErrorTaxonomy[params.Category]; !ok { - slog.Warn("llmem: introspect: unknown category, proceeding anyway", "category", params.Category) - } - } - - if params.NoLLM { - // Explicit raw-only mode: skip LLM entirely - content := buildRawFailureContent(params) - id, err := ms.Add(ctx, store.AddParams{ - Type: "self_assessment", - Content: content, - Source: introspectSource, - Confidence: introspectConfidence, - }) - if err != nil { - return IntrospectResult{}, fmtErr("store self_assessment: %w", err) - } - slog.Info("llmem: introspect: stored self_assessment (LLM disabled)", "id", id) - return 
IntrospectResult{ - MemoryID: id, - Content: content, - LLMStatus: Disabled, - ProposedUpdate: params.ProposedFix, - Category: params.Category, - }, nil - } - - var content string - llmResponse, llmStatus := callModel(ctx, params.Model, params.BaseURL, buildFailurePrompt(params), params.Timeout, params.HTTPClient) - - if llmStatus == Enriched && llmResponse != "" { - content = llmResponse - } else { - // Graceful degradation: build from provided fields - content = buildRawFailureContent(params) - } - - id, err := ms.Add(ctx, store.AddParams{ - Type: "self_assessment", - Content: content, - Source: introspectSource, - Confidence: introspectConfidence, - }) - if err != nil { - return IntrospectResult{}, fmtErr("store self_assessment: %w", err) - } - - slog.Info("llmem: introspect: stored self_assessment", "id", id, "llm_status", llmStatus) - - // Extract ProposedUpdate and Category from the stored content. - // When LLM enrichment succeeded, parse from the LLM response. - // When LLM was skipped or failed, the raw fields are used instead. - proposedUpdate := "" - category := params.Category - if llmStatus == Enriched && content != "" { - proposedUpdate = taxonomy.ParseSelfAssessmentField(content, "Proposed_update") - parsedCategory := taxonomy.ParseSelfAssessmentField(content, "Category") - if parsedCategory != "" { - category = parsedCategory - } - } - if proposedUpdate == "" { - proposedUpdate = params.ProposedFix - } - - return IntrospectResult{ - MemoryID: id, - Content: content, - LLMStatus: llmStatus, - ProposedUpdate: proposedUpdate, - Category: category, - }, nil -} - -// buildRawFailureContent constructs the fallback content string from provided fields -// when LLM enrichment is not available or was skipped. 
-func buildRawFailureContent(params IntrospectFailureParams) string { - var lines []string - if params.Category != "" { - lines = append(lines, "Category: "+params.Category) - } - if params.Context != "" { - lines = append(lines, "Context: "+params.Context) - } - lines = append(lines, "What_happened: "+params.WhatHappened) - lines = append(lines, "What_caught_it: "+orDefault(params.CaughtBy, "mid-session introspection")) - if params.ProposedFix != "" { - lines = append(lines, "Proposed_update: "+params.ProposedFix) - } - lines = append(lines, "Recurring: unknown") - return strings.Join(lines, "\n") -} - -// LearnLesson analyzes a wrong→right correction and stores a procedure memory. -// If the LLM is available, it distills the correction into a generalizable procedure. -// If unavailable, it stores the lesson directly (graceful degradation). -// -// Contract: NEVER returns (LearnResult{}, nil) — either creates a memory or returns an error. -func LearnLesson(ctx context.Context, ms *store.MemoryStore, params LearnLessonParams) (LearnResult, error) { - if params.WhatWasWrong == "" || params.WhatIsCorrect == "" { - return LearnResult{}, fmtErr("what_was_wrong and what_is_correct are required") - } - if params.Model == "" { - params.Model = defaultModel - } - if params.Timeout != 0 && params.Timeout < callModelMinTimeout { - return LearnResult{}, fmtErr("timeout must be at least %v, got %v", callModelMinTimeout, params.Timeout) - } - - if params.NoLLM { - // Explicit raw-only mode: skip LLM entirely - content := buildRawLessonContent(params) - id, err := ms.Add(ctx, store.AddParams{ - Type: "procedure", - Content: content, - Source: learnSource, - Confidence: learnConfidence, - }) - if err != nil { - return LearnResult{}, fmtErr("store procedure: %w", err) - } - slog.Info("llmem: learn: stored procedure (LLM disabled)", "id", id) - return LearnResult{MemoryID: id, Content: content, LLMStatus: Disabled}, nil - } - - var content string - llmResponse, llmStatus := 
callModel(ctx, params.Model, params.BaseURL, buildLessonPrompt(params), params.Timeout, params.HTTPClient) - - if llmStatus == Enriched && llmResponse != "" { - content = llmResponse - } else { - // Graceful degradation: build from provided fields - content = buildRawLessonContent(params) - } - - id, err := ms.Add(ctx, store.AddParams{ - Type: "procedure", - Content: content, - Source: learnSource, - Confidence: learnConfidence, - }) - if err != nil { - return LearnResult{}, fmtErr("store procedure: %w", err) - } - - slog.Info("llmem: learn: stored procedure", "id", id, "llm_status", llmStatus) - return LearnResult{MemoryID: id, Content: content, LLMStatus: llmStatus}, nil -} - -func buildRawLessonContent(params LearnLessonParams) string { - var lines []string - lines = append(lines, "WRONG: "+params.WhatWasWrong) - lines = append(lines, "RIGHT: "+params.WhatIsCorrect) - if params.Context != "" { - lines = append(lines, "Context: "+params.Context) - } - return strings.Join(lines, "\n") -} - -// buildFailurePrompt builds the prompt for failure introspection. -// The description may include what went wrong, the fix, and context — or just -// the problem. The LLM infers category, caught_by, and proposed_fix from -// whatever the agent provides. If the agent includes a known fix, the LLM -// produces a procedural update within the self_assessment. -func buildFailurePrompt(params IntrospectFailureParams) string { - fieldLines := taxonomy.IntrospectFieldLines() - prompt := "Analyze this failure from a coding agent's session and produce a structured self-assessment.\n\n" - prompt += "The agent provided a summary of what went wrong. Infer the category, context, " + - "how it was caught, and a proposed procedural fix from the description. 
" + - "Identify whether it's a recurring pattern and what procedural change would prevent it.\n\n" - prompt += "If the description includes both what went wrong AND what the correct approach is, " + - "treat the proposed_update as a definitive procedural rule. If it only describes the failure, " + - "propose a specific, actionable update based on the pattern.\n\n" - prompt += "Format each field on its own line as \"Field: value\":\n\n" - prompt += fieldLines + "\n\n" - prompt += "Agent's description:\n " + params.WhatHappened - prompt += "\n\nProduce a structured self-assessment. Be specific about what went wrong and what should change." - return prompt -} - -// buildLessonPrompt builds the prompt for lesson learning. -func buildLessonPrompt(params LearnLessonParams) string { - prompt := "A coding agent made a mistake and then corrected it. Distill the lesson into an actionable, " + - "generalizable procedure that will prevent this mistake in future sessions.\n\n" - prompt += "Be specific and practical. The procedure should be a rule the agent can follow — not vague advice.\n\n" - prompt += "What was WRONG:\n" + params.WhatWasWrong + "\n\n" - prompt += "What is CORRECT:\n" + params.WhatIsCorrect - if params.Context != "" { - prompt += "\n\nContext: " + params.Context - } - prompt += "\n\nWrite the lesson as a clear, actionable procedure. Start with the correct behavior, " + - "then explain what to avoid. Keep it under 200 words." - return prompt -} - -// callModel attempts to call the Ollama model. Returns the LLM response and enrichment status. -// When timeout is zero, defaults to callModelTimeout (5 minutes). -// Returns ("", Skipped) when Ollama is unavailable or the call times out. -// Returns (response, Enriched) when the model call succeeds with non-empty response. -// Never panics. 
-func callModel(ctx context.Context, model, baseURL, prompt string, timeout time.Duration, httpClient *http.Client) (string, LLMEnrichment) {
-	if baseURL == "" {
-		baseURL = defaultBaseURL
-	}
-	if timeout == 0 {
-		timeout = callModelTimeout
-	}
-
-	timeoutCtx, cancel := context.WithTimeout(ctx, timeout)
-	defer cancel()
-
-	ollamaCfg := ollama.OllamaClientConfig{
-		BaseURL: baseURL,
-		HTTPClient: httpClient,
-	}
-	client, err := ollama.NewOllamaClient(ollamaCfg)
-	if err != nil {
-		slog.Error("llmem: introspect: create Ollama client failed", "error", err)
-		return "", Skipped
-	}
-
-	if !client.IsAvailable(timeoutCtx) {
-		slog.Debug("llmem: introspect: Ollama not available, using storage-only fallback")
-		return "", Skipped
-	}
-
-	response, err := client.Generate(timeoutCtx, prompt, model)
-	if err != nil {
-		slog.Error("llmem: introspect: model call failed", "error", err)
-		return "", Skipped
-	}
-
-	if response == "" {
-		slog.Error("llmem: introspect: model returned empty response")
-		return "", Skipped
-	}
-
-	return response, Enriched
-}
-
-func orDefault(val, defaultVal string) string {
-	if val != "" {
-		return val
-	}
-	return defaultVal
-}
\ No newline at end of file
diff --git a/internal/introspect/introspect_test.go b/internal/introspect/introspect_test.go
deleted file mode 100644
index a7c1359..0000000
--- a/internal/introspect/introspect_test.go
+++ /dev/null
@@ -1,660 +0,0 @@
-package introspect
-
-import (
-	"context"
-	"encoding/json"
-	"net/http"
-	"net/http/httptest"
-	"path/filepath"
-	"testing"
-	"time"
-
-	"github.com/MichielDean/LLMem/internal/store"
-)
-
-func newTestStore(t *testing.T) *store.MemoryStore {
-	t.Helper()
-	dir := t.TempDir()
-	dbPath := filepath.Join(dir, "test.db")
-	ms, err := store.NewMemoryStore(store.StoreConfig{
-		DBPath: dbPath,
-		DisableVec: true,
-	})
-	if err != nil {
-		t.Fatalf("NewMemoryStore: %v", err)
-	}
-	t.Cleanup(func() { ms.Close() })
-	// self_assessment and procedure are already registered via default
types - return ms -} - -func TestIntrospectFailure_WithFields(t *testing.T) { - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "swallowed error on database write", - Category: "ERROR_HANDLING", - Context: "writing to SQLite store", - CaughtBy: "code review", - ProposedFix: "always check error return values", - BaseURL: "http://localhost:59998", - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - if result.MemoryID == "" { - t.Error("expected non-empty memory ID") - } - - mem, err := ms.Get(context.Background(), result.MemoryID, false) - if err != nil { - t.Fatalf("Get: %v", err) - } - if mem == nil { - t.Fatal("expected memory to be stored") - } - if mem.Type != "self_assessment" { - t.Errorf("expected type self_assessment, got %q", mem.Type) - } - if mem.Source != "introspect" { - t.Errorf("expected source introspect, got %q", mem.Source) - } - // When Ollama is unreachable, LLMStatus should be Skipped - if result.LLMStatus != Skipped { - t.Errorf("expected LLMStatus Skipped, got %q", result.LLMStatus) - } -} - -func TestIntrospectFailure_UnknownCategory(t *testing.T) { - ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "test failure", - Category: "UNKNOWN_CATEGORY", - BaseURL: "http://localhost:59998", - }) - if err != nil { - t.Fatalf("IntrospectFailure with unknown category: %v", err) - } - if result.MemoryID == "" { - t.Error("expected non-empty memory ID even with unknown category") - } -} - -func TestIntrospectFailure_EmptyWhatHappened(t *testing.T) { - ctx := context.Background() - ms := newTestStore(t) - _, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "", - }) - if err == nil { - t.Error("expected error for 
empty what_happened") - } -} - -func TestIntrospectFailure_GracefulDegradation(t *testing.T) { - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "test failure", - Category: "NULL_SAFETY", - BaseURL: "http://localhost:59999", - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - if result.MemoryID == "" { - t.Error("expected non-empty memory ID even when Ollama is unavailable") - } - - mem, _ := ms.Get(context.Background(), result.MemoryID, false) - if mem == nil { - t.Fatal("expected memory to be stored") - } - if mem.Content == "" { - t.Error("expected non-empty content even without LLM") - } - if result.LLMStatus != Skipped { - t.Errorf("expected LLMStatus Skipped when Ollama unavailable, got %q", result.LLMStatus) - } -} - -func TestIntrospectFailure_ReturnsLLMSkipped_WhenOllamaUnavailable(t *testing.T) { - ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "some failure", - Category: "ERROR_HANDLING", - BaseURL: "http://localhost:59998", // unreachable - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - if result.LLMStatus != Skipped { - t.Errorf("expected LLMStatus Skipped, got %q", result.LLMStatus) - } - if result.MemoryID == "" { - t.Error("expected non-empty MemoryID even on LLM skip") - } - if result.Content == "" { - t.Error("expected non-empty Content even on LLM skip") - } -} - -func TestIntrospectFailure_ReturnsLLMEnriched_WhenOllamaAvailable(t *testing.T) { - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "test-model"}}} - json.NewEncoder(w).Encode(resp) - return - } - if 
r.URL.Path == "/api/generate" { - resp := map[string]string{ - "response": "Category: ERROR_HANDLING\nWhat_happened: test\nProposed_update: fix it", - } - json.NewEncoder(w).Encode(resp) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "test failure", - Category: "ERROR_HANDLING", - BaseURL: server.URL, - Model: "test-model", - HTTPClient: server.Client(), - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - if result.LLMStatus != Enriched { - t.Errorf("expected LLMStatus Enriched, got %q", result.LLMStatus) - } - if result.MemoryID == "" { - t.Error("expected non-empty MemoryID") - } -} - -func TestIntrospectFailure_NoLLMFlag(t *testing.T) { - ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "some failure", - Category: "ERROR_HANDLING", - NoLLM: true, - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - if result.LLMStatus != Disabled { - t.Errorf("expected LLMStatus Disabled, got %q", result.LLMStatus) - } - if result.MemoryID == "" { - t.Error("expected non-empty MemoryID") - } - // Content should contain the raw fields - if result.Content == "" { - t.Error("expected non-empty Content with NoLLM") - } -} - -func TestIntrospectFailure_TimeoutValidation(t *testing.T) { - ms := newTestStore(t) - ctx := context.Background() - - tests := []struct { - name string - timeout time.Duration - wantErr bool - }{ - {"zero timeout defaults", 0, false}, - {"valid timeout at 10s", 10 * time.Second, false}, - {"valid timeout at 2m", 2 * time.Minute, false}, - {"invalid timeout at 5s", 5 * time.Second, true}, - {"invalid timeout at 1s", 1 * time.Second, true}, - } - - for _, tt := 
range tests { - t.Run(tt.name, func(t *testing.T) { - // Use NoLLM=true to avoid any Ollama network calls (validation only) - _, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "test", - NoLLM: true, - Timeout: tt.timeout, - }) - if tt.wantErr && err == nil { - t.Error("expected error for timeout below minimum") - } - if !tt.wantErr && err != nil { - t.Errorf("unexpected error: %v", err) - } - }) - } -} - -func TestLearnLesson_WithFields(t *testing.T) { - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := LearnLesson(ctx, ms, LearnLessonParams{ - WhatWasWrong: "ignored error return value", - WhatIsCorrect: "always check error return values", - Context: "database write operation", - BaseURL: "http://localhost:59998", - }) - if err != nil { - t.Fatalf("LearnLesson: %v", err) - } - if result.MemoryID == "" { - t.Error("expected non-empty memory ID") - } - - mem, err := ms.Get(context.Background(), result.MemoryID, false) - if err != nil { - t.Fatalf("Get: %v", err) - } - if mem == nil { - t.Fatal("expected memory to be stored") - } - if mem.Type != "procedure" { - t.Errorf("expected type procedure, got %q", mem.Type) - } - if mem.Source != "learn" { - t.Errorf("expected source learn, got %q", mem.Source) - } - if result.LLMStatus != Skipped { - t.Errorf("expected LLMStatus Skipped, got %q", result.LLMStatus) - } -} - -func TestLearnLesson_EmptyFields(t *testing.T) { - ms := newTestStore(t) - _, err := LearnLesson(context.Background(), ms, LearnLessonParams{ - WhatWasWrong: "", - WhatIsCorrect: "something", - }) - if err == nil { - t.Error("expected error for empty what_was_wrong") - } - - _, err = LearnLesson(context.Background(), ms, LearnLessonParams{ - WhatWasWrong: "something", - WhatIsCorrect: "", - }) - if err == nil { - t.Error("expected error for empty what_is_correct") - } -} - -func TestLearnLesson_ReturnsLLMSkipped_WhenOllamaUnavailable(t *testing.T) 
{ - ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := LearnLesson(ctx, ms, LearnLessonParams{ - WhatWasWrong: "forgot to check nil", - WhatIsCorrect: "always check for nil", - BaseURL: "http://localhost:59998", // unreachable - }) - if err != nil { - t.Fatalf("LearnLesson: %v", err) - } - if result.LLMStatus != Skipped { - t.Errorf("expected LLMStatus Skipped, got %q", result.LLMStatus) - } - if result.MemoryID == "" { - t.Error("expected non-empty MemoryID even on LLM skip") - } - if result.Content == "" { - t.Error("expected non-empty Content even on LLM skip") - } -} - -func TestLearnLesson_ReturnsLLMEnriched_WhenOllamaAvailable(t *testing.T) { - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "test-model"}}} - json.NewEncoder(w).Encode(resp) - return - } - if r.URL.Path == "/api/generate" { - resp := map[string]string{ - "response": "WRONG: forgot nil check\nRIGHT: always check nil", - } - json.NewEncoder(w).Encode(resp) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := LearnLesson(ctx, ms, LearnLessonParams{ - WhatWasWrong: "forgot nil check", - WhatIsCorrect: "always check nil", - BaseURL: server.URL, - Model: "test-model", - HTTPClient: server.Client(), - }) - if err != nil { - t.Fatalf("LearnLesson: %v", err) - } - if result.LLMStatus != Enriched { - t.Errorf("expected LLMStatus Enriched, got %q", result.LLMStatus) - } - if result.MemoryID == "" { - t.Error("expected non-empty MemoryID") - } -} - -func TestLearnLesson_NoLLMFlag(t *testing.T) { - ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := 
LearnLesson(ctx, ms, LearnLessonParams{ - WhatWasWrong: "forgot nil check", - WhatIsCorrect: "always check nil", - NoLLM: true, - }) - if err != nil { - t.Fatalf("LearnLesson: %v", err) - } - if result.LLMStatus != Disabled { - t.Errorf("expected LLMStatus Disabled, got %q", result.LLMStatus) - } - if result.MemoryID == "" { - t.Error("expected non-empty MemoryID") - } - if result.Content == "" { - t.Error("expected non-empty Content with NoLLM") - } -} - -func TestLearnLesson_TimeoutValidation(t *testing.T) { - ms := newTestStore(t) - ctx := context.Background() - - tests := []struct { - name string - timeout time.Duration - wantErr bool - }{ - {"zero timeout defaults", 0, false}, - {"valid timeout at 10s", 10 * time.Second, false}, - {"invalid timeout at 5s", 5 * time.Second, true}, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - // Use NoLLM=true to avoid any Ollama network calls (validation only) - _, err := LearnLesson(ctx, ms, LearnLessonParams{ - WhatWasWrong: "wrong", - WhatIsCorrect: "right", - NoLLM: true, - Timeout: tt.timeout, - }) - if tt.wantErr && err == nil { - t.Error("expected error for timeout below minimum") - } - if !tt.wantErr && err != nil { - t.Errorf("unexpected error: %v", err) - } - }) - } -} - -func TestCallModel_ConfigurableTimeout(t *testing.T) { - // Verify that callModel defaults to callModelTimeout (5 minutes) when timeout is 0 - ctx := context.Background() - // Unreachable base URL — should return Skipped quickly (URL validation fails) - resp, status := callModel(ctx, "test", "http://localhost:59998", "test prompt", 0, nil) - if status != Skipped { - t.Errorf("expected Skipped for unreachable Ollama, got %q", status) - } - if resp != "" { - t.Errorf("expected empty response for unreachable Ollama, got %q", resp) - } - - // Verify custom timeout works - resp, status = callModel(ctx, "test", "http://localhost:59998", "test prompt", 15*time.Second, nil) - if status != Skipped { - t.Errorf("expected Skipped 
for unreachable Ollama with custom timeout, got %q", status) - } -} - -func TestCallModel_NoLLMFlag(t *testing.T) { - // This test verifies the NoLLM path bypasses callModel entirely. - // callModel itself doesn't take NoLLM — the caller handles it. - // But we verify that IntrospectFailure with NoLLM never invokes callModel - // by checking that the result has status Disabled. - ms := newTestStore(t) - ctx := context.Background() - - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "test failure", - NoLLM: true, - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - if result.LLMStatus != Disabled { - t.Errorf("expected LLMStatus Disabled with NoLLM=true, got %q", result.LLMStatus) - } -} - -func TestIntrospectFailure_WithModel(t *testing.T) { - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "test-model"}}} - json.NewEncoder(w).Encode(resp) - return - } - if r.URL.Path == "/api/generate" { - resp := map[string]string{ - "response": "Category: ERROR_HANDLING\nWhat_happened: test\nProposed_update: fix it", - } - json.NewEncoder(w).Encode(resp) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "test failure", - Category: "ERROR_HANDLING", - BaseURL: server.URL, - Model: "test-model", - HTTPClient: server.Client(), - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - // With a working mock server, LLM should be Enriched - if result.LLMStatus != Enriched { - t.Errorf("expected LLMStatus Enriched with mock server, got %q", result.LLMStatus) - } - if result.MemoryID == "" { - t.Error("expected non-empty MemoryID") - } -} - -// 
TestIntrospectFailure_ProposedUpdateReturned verifies that IntrospectResult -// includes the ProposedUpdate and Category fields when LLM returns structured content. -func TestIntrospectFailure_ProposedUpdateReturned(t *testing.T) { - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "null pointer dereference in handler", - Category: "NULL_SAFETY", - ProposedFix: "always check for nil before accessing fields", - NoLLM: true, - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - if result.ProposedUpdate != "always check for nil before accessing fields" { - t.Errorf("expected ProposedUpdate='always check for nil before accessing fields', got %q", result.ProposedUpdate) - } - if result.Category != "NULL_SAFETY" { - t.Errorf("expected Category='NULL_SAFETY', got %q", result.Category) - } -} - -// TestIntrospectFailure_ProposedUpdateFromRawContent verifies that ProposedUpdate -// is extracted from the params when LLM is available and returns a structured response. 
-func TestIntrospectFailure_ProposedUpdateFromRawContent(t *testing.T) {
-	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-	defer cancel()
-
-	ms := newTestStore(t)
-
-	result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{
-		WhatHappened: "swallowed error in database call",
-		Category:     "ERROR_HANDLING",
-		ProposedFix:  "wrap errors with fmt.Errorf and proper context",
-		NoLLM:        true,
-	})
-	if err != nil {
-		t.Fatalf("IntrospectFailure: %v", err)
-	}
-
-	// When NoLLM is true, ProposedFix is used directly as ProposedUpdate
-	if result.ProposedUpdate != "wrap errors with fmt.Errorf and proper context" {
-		t.Errorf("expected ProposedUpdate from params, got %q", result.ProposedUpdate)
-	}
-	if result.Category != "ERROR_HANDLING" {
-		t.Errorf("expected Category='ERROR_HANDLING', got %q", result.Category)
-	}
-}
-
-// TestIntrospectFailure_ProposedUpdateWithLLM verifies that ProposedUpdate is
-// populated from LLM response when the LLM enriches the content.
-func TestIntrospectFailure_ProposedUpdateWithLLM(t *testing.T) { - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "test-model"}}} - json.NewEncoder(w).Encode(resp) - return - } - if r.URL.Path == "/api/generate" { - resp := map[string]string{ - "response": "Category: NULL_SAFETY\nWhat_happened: nil dereference\nProposed_update: always guard nil pointers in Go\nWhat_caught_it: code review", - } - json.NewEncoder(w).Encode(resp) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "nil dereference in handler", - Category: "NULL_SAFETY", - ProposedFix: "check nil before access", - Model: "test-model", - HTTPClient: server.Client(), - BaseURL: server.URL, - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - - if result.ProposedUpdate != "always guard nil pointers in Go" { - t.Errorf("expected ProposedUpdate from LLM response, got %q", result.ProposedUpdate) - } - if result.Category != "NULL_SAFETY" { - t.Errorf("expected Category=NULL_SAFETY, got %q", result.Category) - } -} - -// TestIntrospectFailure_EmptyProposedUpdate verifies that ProposedUpdate is empty -// when no proposed fix is provided. 
-func TestIntrospectFailure_EmptyProposedUpdate(t *testing.T) { - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := IntrospectFailure(ctx, ms, IntrospectFailureParams{ - WhatHappened: "some error", - NoLLM: true, - }) - if err != nil { - t.Fatalf("IntrospectFailure: %v", err) - } - if result.ProposedUpdate != "" { - t.Errorf("expected empty ProposedUpdate, got %q", result.ProposedUpdate) - } - if result.Category != "" { - t.Errorf("expected empty Category when not specified, got %q", result.Category) - } -} - -func TestLearnLesson_WithModel(t *testing.T) { - server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.URL.Path == "/api/tags" { - resp := map[string]any{"models": []map[string]string{{"name": "test-model"}}} - json.NewEncoder(w).Encode(resp) - return - } - if r.URL.Path == "/api/generate" { - resp := map[string]string{ - "response": "WRONG: forgot nil check\nRIGHT: always check nil", - } - json.NewEncoder(w).Encode(resp) - return - } - http.NotFound(w, r) - })) - defer server.Close() - - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - ms := newTestStore(t) - result, err := LearnLesson(ctx, ms, LearnLessonParams{ - WhatWasWrong: "forgot nil check", - WhatIsCorrect: "always check nil", - BaseURL: server.URL, - Model: "test-model", - HTTPClient: server.Client(), - }) - if err != nil { - t.Fatalf("LearnLesson: %v", err) - } - if result.LLMStatus != Enriched { - t.Errorf("expected LLMStatus Enriched with mock server, got %q", result.LLMStatus) - } -} - diff --git a/internal/ollama/ollama.go b/internal/ollama/ollama.go index cc31545..80cb93c 100644 --- a/internal/ollama/ollama.go +++ b/internal/ollama/ollama.go @@ -1,5 +1,5 @@ // Package ollama provides a client for the Ollama /api/generate and /api/tags endpoints. 
-// It is a shared extraction layer used by the extract, introspect, and dream packages
+// It is a shared extraction layer used by the extract and dream packages
 // to call the Ollama LLM generation API.
 package ollama
 
diff --git a/internal/retriever/retriever.go b/internal/retriever/retriever.go
index 1bab3f7..b68e908 100644
--- a/internal/retriever/retriever.go
+++ b/internal/retriever/retriever.go
@@ -32,13 +32,13 @@ const (
 // after initialization and is only read via DefaultTypePriority() which
 // returns defensive copies.
 var defaultTypePriorityMap = map[string]float64{
-	"decision":        1.2,
-	"preference":      1.1,
-	"procedure":       1.1,
-	"fact":            1.0,
-	"project_state":   1.0,
-	"self_assessment": 1.0,
-	"event":           0.9,
+	"decision":      1.2,
+	"preference":    1.1,
+	"procedure":     1.1,
+	"fact":          1.0,
+	"project_state": 1.0,
+	"event":         0.9,
+	"conversation":  0.7,
 }
 
 // RerankSignals holds per-memory reranking signal values.
diff --git a/internal/skillpatch/skillpatch.go b/internal/skillpatch/skillpatch.go
deleted file mode 100644
index 12cade9..0000000
--- a/internal/skillpatch/skillpatch.go
+++ /dev/null
@@ -1,410 +0,0 @@
-// Package skillpatch provides direct skill file patching after introspection.
-// The SkillPatcher is specific to LLMem SKILL.md files and will not be reused outside this package.
-package skillpatch
-
-import (
-	"context"
-	"fmt"
-	"log/slog"
-	"os"
-	"path/filepath"
-	"regexp"
-	"strings"
-	"time"
-
-	"github.com/MichielDean/LLMem/internal/paths"
-)
-
-// categorySkillMap maps error taxonomy categories to their skill directory.
-// All 10 error categories currently map to the introspection skill.
-// If no matching skill file exists, a new skill file is created in the
-// lowercase category directory.
-var categorySkillMap = map[string]string{ - "NULL_SAFETY": "introspection", - "ERROR_HANDLING": "introspection", - "OFF_BY_ONE": "introspection", - "RACE_CONDITION": "introspection", - "AUTH_BYPASS": "introspection", - "DATA_INTEGRITY": "introspection", - "MISSING_VERIFICATION": "introspection", - "EDGE_CASE": "introspection", - "PERFORMANCE": "introspection", - "DESIGN": "introspection", -} - -// PatchValidation holds the result of validating whether a skill patch was effective. -type PatchValidation struct { - Category string - BeforeCount int - AfterCount int - Effective bool - Flagged bool -} - -// SkillPatcher patches SKILL.md files with procedural updates from introspection. -// The SkillPatcher is specific to LLMem SKILL.md files and will not be reused outside this package. -type SkillPatcher struct { - skillDir string -} - -// SkillPatchConfig contains configuration for creating a SkillPatcher. -// SkillDir defaults to paths.GetSkillDir() if empty. -type SkillPatchConfig struct { - // SkillDir is the root directory containing skill files. - // Defaults to paths.GetSkillDir() if empty. - SkillDir string -} - -// fmtErr wraps an error with the "llmem: skillpatch:" domain prefix. -func fmtErr(format string, args ...any) error { - return fmt.Errorf("llmem: skillpatch: "+format, args...) -} - -// validCategoryRe matches category names that are safe to use as directory names. -// Only alphanumeric characters and underscores are allowed — no path separators, -// dots, or whitespace that could enable traversal or injection. -var validCategoryRe = regexp.MustCompile(`^[A-Za-z0-9_]+$`) - -// sanitizeCategory validates that a category name is safe for use as a directory -// name. It rejects categories containing path separators, dots, whitespace, or -// other characters that could enable path traversal. 
-func sanitizeCategory(category string) error { - if category == "" { - return fmtErr("invalid category: empty") - } - if !validCategoryRe.MatchString(category) { - return fmtErr("invalid category %q: must contain only alphanumeric characters and underscores", category) - } - return nil -} - -// sanitizeYAMLValue replaces newlines and carriage returns in a string to prevent -// YAML frontmatter injection. Newlines in YAML values can break the frontmatter -// structure and inject arbitrary YAML keys. -func sanitizeYAMLValue(s string) string { - s = strings.ReplaceAll(s, "\r\n", " ") - s = strings.ReplaceAll(s, "\n", " ") - s = strings.ReplaceAll(s, "\r", " ") - return s -} - -// NewSkillPatcher creates and initializes a SkillPatcher. -// All config fields default to sensible values if zero. -// The constructor leaves the SkillPatcher in a fully usable state. -func NewSkillPatcher(cfg SkillPatchConfig) (*SkillPatcher, error) { - skillDir := cfg.SkillDir - if skillDir == "" { - skillDir = paths.GetSkillDir() - } - - return &SkillPatcher{ - skillDir: skillDir, - }, nil -} - -// Patch appends a structured section to the matching skill file for the given category. -// If no matching skill file exists, creates one in the appropriate directory. -// If category is empty, returns fmtErr("category is required"). -// If proposedUpdate is empty, returns nil (no-op, not an error). -// Creates parent directories with 0700 permissions. -// Writes with 0600 permissions (following paths.go pattern). 
-func (sp *SkillPatcher) Patch(ctx context.Context, category, proposedUpdate, categoryDescription string) error { - if category == "" { - return fmtErr("category is required") - } - if err := sanitizeCategory(category); err != nil { - return err - } - if proposedUpdate == "" { - // No-op: nothing to patch - slog.Debug("llmem: skillpatch: empty proposed update, skipping patch", "category", category) - return nil - } - - skillFile, err := sp.FindSkillFile(ctx, category) - if err != nil { - return fmtErr("find skill file: %w", err) - } - - if skillFile == "" { - // No matching skill file found; create a new one - skillFile, err = sp.createSkillFile(category, categoryDescription) - if err != nil { - return fmtErr("create skill file: %w", err) - } - slog.Info("llmem: skillpatch: created new skill file", "path", skillFile, "category", category) - } - - // Check for duplicate patch content - existingContent, readErr := os.ReadFile(skillFile) - if readErr != nil && !os.IsNotExist(readErr) { - return fmtErr("read skill file %s: %w", skillFile, readErr) - } - - // Build the patch content - patchContent := buildPatchContent(category, proposedUpdate, time.Now().UTC()) - - // Check idempotency: if the proposed_update text already exists in the file - if string(existingContent) != "" && isDuplicatePatch(string(existingContent), proposedUpdate) { - slog.Debug("llmem: skillpatch: duplicate patch detected, skipping", "category", category) - return nil - } - - // Handle malformed files (no YAML frontmatter) - if string(existingContent) != "" && !hasYAMLFrontmatter(string(existingContent)) { - slog.Warn("llmem: skillpatch: skill file has no YAML frontmatter, appending with comment", "path", skillFile) - patchContent = "\n\n" + patchContent - } - - // Append patch to skill file - var newContent string - if string(existingContent) == "" { - newContent = patchContent - } else { - existing := string(existingContent) - if !strings.HasSuffix(existing, "\n") { - existing += "\n" - } - 
newContent = existing + patchContent - } - - if err := os.WriteFile(skillFile, []byte(newContent), 0600); err != nil { - return fmtErr("write skill file %s: %w", skillFile, err) - } - - slog.Info("llmem: skillpatch: patched skill file", "path", skillFile, "category", category) - return nil -} - -// FindSkillFile locates the SKILL.md file matching the given category. -// Performs a categorySkillMap lookup to find the skill directory name, then checks -// whether SKILL.md exists as a regular file in that directory. -// Returns the file path or empty string if not found (unknown category or missing file). -// Returns error only on I/O failures, not for "not found" (empty string is a valid result). -func (sp *SkillPatcher) FindSkillFile(ctx context.Context, category string) (string, error) { - skillDirName, ok := categorySkillMap[category] - if !ok { - // Unknown category: no matching skill directory - slog.Debug("llmem: skillpatch: no skill mapping for category", "category", category) - return "", nil - } - - // Look for SKILL.md in the mapped skill directory - candidatePath := filepath.Join(sp.skillDir, skillDirName, "SKILL.md") - - info, err := os.Stat(candidatePath) - if err != nil { - if os.IsNotExist(err) { - return "", nil - } - return "", fmtErr("stat skill file %s: %w", candidatePath, err) - } - - // Verify it's a regular file - if !info.Mode().IsRegular() { - return "", nil - } - - return candidatePath, nil -} - -// ValidatePatch checks whether the error rate in the given category decreased -// after a skill patch was applied. -// This is a pure function: it compares two integer counts and returns a PatchValidation. -// Effective is true when AfterCount < BeforeCount. -// Flagged is true when AfterCount >= BeforeCount. -// Never returns an error for zero-count categories — returns PatchValidation{Effective: false, Flagged: false}. 
-func ValidatePatch(category string, beforeCount, afterCount int) PatchValidation { - result := PatchValidation{ - Category: category, - BeforeCount: beforeCount, - AfterCount: afterCount, - } - - if beforeCount == 0 { - // Zero before-count means no baseline — cannot determine effectiveness - result.Effective = false - result.Flagged = false - return result - } - - result.Effective = afterCount < beforeCount - result.Flagged = afterCount >= beforeCount - return result -} - -// createSkillFile creates a new SKILL.md file in the appropriate category directory -// with proper YAML frontmatter. -// The category must already pass sanitizeCategory validation before reaching this method. -func (sp *SkillPatcher) createSkillFile(category, categoryDescription string) (string, error) { - skillDirName, ok := categorySkillMap[category] - if !ok { - // No mapping: use sanitized lowercase category as directory name. - // sanitizeCategory has already validated no path traversal characters. - skillDirName = strings.ToLower(category) - } - - dirPath := filepath.Join(sp.skillDir, skillDirName) - - // Security: verify the resolved path stays within skillDir (defense in depth - // against path traversal, even though sanitizeCategory already rejects - // separator characters). 
- absDirPath, err := filepath.Abs(dirPath) - if err != nil { - return "", fmtErr("resolve skill directory path %s: %w", dirPath, err) - } - absSkillDir, err := filepath.Abs(sp.skillDir) - if err != nil { - return "", fmtErr("resolve skill dir root %s: %w", sp.skillDir, err) - } - if !strings.HasPrefix(absDirPath, absSkillDir+string(filepath.Separator)) && absDirPath != absSkillDir { - return "", fmtErr("invalid category %q: resolved path escapes skill directory", category) - } - - if err := os.MkdirAll(dirPath, 0700); err != nil { - return "", fmtErr("create skill directory %s: %w", dirPath, err) - } - - description := categoryDescription - if description == "" { - description = category - } - - // Sanitize values before inserting into YAML frontmatter to prevent injection - sanitizedCategory := sanitizeYAMLValue(strings.ToLower(category)) - sanitizedDescription := sanitizeYAMLValue(description) - sanitizedHeading := sanitizeYAMLValue(category) - - // Build frontmatter - var sb strings.Builder - sb.WriteString("---\n") - sb.WriteString("name: ") - sb.WriteString(sanitizedCategory) - sb.WriteString("\n") - sb.WriteString("description: >\n ") - sb.WriteString(sanitizedDescription) - sb.WriteString("\nlicense: MIT\n") - sb.WriteString("---\n\n") - sb.WriteString("# ") - sb.WriteString(sanitizedHeading) - sb.WriteString("\n\n") - - filePath := filepath.Join(dirPath, "SKILL.md") - if err := os.WriteFile(filePath, []byte(sb.String()), 0600); err != nil { - return "", fmtErr("write skill file %s: %w", filePath, err) - } - - return filePath, nil -} - -// buildPatchContent constructs the patch section text. 
-func buildPatchContent(category, proposedUpdate string, now time.Time) string { - var sb strings.Builder - date := now.Format("2006-01-02") - sb.WriteString(fmt.Sprintf("\n## Patch: %s (%s)\n\n", category, date)) - sb.WriteString(fmt.Sprintf("**Detection Rule:** %s\n\n", extractDetectionRule(proposedUpdate))) - sb.WriteString("**Checklist:**\n") - items := extractChecklistItems(proposedUpdate) - for _, item := range items { - sb.WriteString(fmt.Sprintf("- [ ] %s\n", item)) - } - sb.WriteString("\n") - sb.WriteString(fmt.Sprintf("**Pitfall:** %s\n\n", extractPitfall(proposedUpdate))) - sb.WriteString(fmt.Sprintf("**Verification:** %s\n", extractVerification(proposedUpdate))) - return sb.String() -} - -// isDuplicatePatch checks if the proposed_update text already exists in the file content. -func isDuplicatePatch(content, proposedUpdate string) bool { - return strings.Contains(content, proposedUpdate) -} - -// hasYAMLFrontmatter checks if the content starts with "---" YAML frontmatter. -func hasYAMLFrontmatter(content string) bool { - return strings.HasPrefix(content, "---\n") -} - -// extractDetectionRule extracts a detection rule from proposed update content. -// Falls back to a generic rule based on the content itself. -func extractDetectionRule(proposedUpdate string) string { - // Try to find "Detection Rule:" in the content - if idx := strings.Index(proposedUpdate, "Detection Rule:"); idx >= 0 { - after := proposedUpdate[idx+len("Detection Rule:"):] - line := strings.TrimSpace(strings.Split(after, "\n")[0]) - if line != "" { - return line - } - } - // Fallback: use first sentence or truncate - first := strings.Split(proposedUpdate, "\n")[0] - if len(first) > 100 { - first = first[:100] + "..." - } - return first -} - -// extractChecklistItems extracts checklist items from proposed update content. -// Falls back to a single item from the proposed update. 
-func extractChecklistItems(proposedUpdate string) []string { - items := []string{} - // Look for "Checklist:" section - if idx := strings.Index(proposedUpdate, "Checklist:"); idx >= 0 { - after := proposedUpdate[idx:] - for _, line := range strings.Split(after, "\n") { - trimmed := strings.TrimSpace(line) - if strings.HasPrefix(trimmed, "- ") || strings.HasPrefix(trimmed, "* ") { - item := strings.TrimPrefix(trimmed, "- ") - item = strings.TrimPrefix(item, "* ") - item = strings.TrimSpace(item) - if item != "" { - items = append(items, item) - } - } - // Stop at next section header - if strings.HasPrefix(trimmed, "**") && !strings.HasPrefix(trimmed, "**Checklist") { - break - } - } - } - if len(items) == 0 { - items = append(items, "Review "+strings.Split(proposedUpdate, "\n")[0]) - } - return items -} - -// extractPitfall extracts a pitfall from proposed update content. -func extractPitfall(proposedUpdate string) string { - if idx := strings.Index(proposedUpdate, "Pitfall:"); idx >= 0 { - after := proposedUpdate[idx+len("Pitfall:"):] - line := strings.TrimSpace(strings.Split(after, "\n")[0]) - // Strip markdown bold - line = strings.TrimPrefix(line, "**") - line = strings.TrimSuffix(line, "**") - if line != "" { - return line - } - } - // Try **Pitfall:** format - if idx := strings.Index(proposedUpdate, "**Pitfall:**"); idx >= 0 { - after := proposedUpdate[idx+len("**Pitfall:**"):] - line := strings.TrimSpace(strings.Split(after, "\n")[0]) - if line != "" { - return line - } - } - return "Incomplete handling may recur" -} - -// extractVerification extracts a verification step from proposed update content. 
-func extractVerification(proposedUpdate string) string {
-	if idx := strings.Index(proposedUpdate, "Verification:"); idx >= 0 {
-		after := proposedUpdate[idx+len("Verification:"):]
-		line := strings.TrimSpace(strings.Split(after, "\n")[0])
-		line = strings.TrimPrefix(line, "**")
-		line = strings.TrimSuffix(line, "**")
-		if line != "" {
-			return line
-		}
-	}
-	return "Run llmem search to confirm reduction"
-}
\ No newline at end of file
diff --git a/internal/skillpatch/skillpatch_test.go b/internal/skillpatch/skillpatch_test.go
deleted file mode 100644
index 599da57..0000000
--- a/internal/skillpatch/skillpatch_test.go
+++ /dev/null
@@ -1,567 +0,0 @@
-package skillpatch
-
-import (
-	"context"
-	"os"
-	"path/filepath"
-	"strings"
-	"testing"
-	"time"
-)
-
-func TestSkillPatcher_PatchExistingSkill(t *testing.T) {
-	ctx := context.Background()
-
-	dir := t.TempDir()
-	skillDir := filepath.Join(dir, "skills")
-	introDir := filepath.Join(skillDir, "introspection")
-	if err := os.MkdirAll(introDir, 0700); err != nil {
-		t.Fatalf("mkdir: %v", err)
-	}
-
-	// Create an existing SKILL.md
-	existingContent := "---\nname: introspection\ndescription: >\n Test skill\nlicense: MIT\n---\n\n# Introspection\n\nSome existing content.\n"
-	skillFile := filepath.Join(introDir, "SKILL.md")
-	if err := os.WriteFile(skillFile, []byte(existingContent), 0600); err != nil {
-		t.Fatalf("write skill: %v", err)
-	}
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	err = sp.Patch(ctx, "NULL_SAFETY", "Always check for nil before dereferencing pointers in Go", "Missing null checks")
-	if err != nil {
-		t.Fatalf("Patch: %v", err)
-	}
-
-	// Verify the content was appended
-	data, err := os.ReadFile(skillFile)
-	if err != nil {
-		t.Fatalf("read skill: %v", err)
-	}
-	content := string(data)
-	if !strings.Contains(content, "Patch: NULL_SAFETY") {
-		t.Error("expected patch section header in skill file")
-	}
-	if !strings.Contains(content, "Always check for nil") {
-		t.Error("expected proposed update content in skill file")
-	}
-	if !strings.Contains(content, "Some existing content") {
-		t.Error("expected existing content to be preserved")
-	}
-}
-
-func TestSkillPatcher_CreateNewSkill(t *testing.T) {
-	ctx := context.Background()
-
-	dir := t.TempDir()
-	skillDir := filepath.Join(dir, "skills")
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	err = sp.Patch(ctx, "ERROR_HANDLING", "Always check error return values in Go", "Missing error handling")
-	if err != nil {
-		t.Fatalf("Patch: %v", err)
-	}
-
-	// Verify the file was created in the introspection directory
-	skillFile := filepath.Join(skillDir, "introspection", "SKILL.md")
-	data, err := os.ReadFile(skillFile)
-	if err != nil {
-		t.Fatalf("read skill: %v", err)
-	}
-	content := string(data)
-	if !strings.Contains(content, "Patch: ERROR_HANDLING") {
-		t.Error("expected patch section in new skill file")
-	}
-	if !strings.Contains(content, "---") {
-		t.Error("expected YAML frontmatter in new skill file")
-	}
-}
-
-func TestSkillPatcher_PatchWithEmptyCategory(t *testing.T) {
-	dir := t.TempDir()
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: filepath.Join(dir, "skills"),
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	err = sp.Patch(context.Background(), "", "some content", "")
-	if err == nil {
-		t.Error("expected error for empty category")
-	}
-	if !strings.Contains(err.Error(), "category is required") {
-		t.Errorf("expected 'category is required' error, got: %v", err)
-	}
-}
-
-func TestSkillPatcher_PatchWithEmptyProposedUpdate(t *testing.T) {
-	dir := t.TempDir()
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: filepath.Join(dir, "skills"),
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	err = sp.Patch(context.Background(), "NULL_SAFETY", "", "")
-	if err != nil {
-		t.Errorf("expected nil for empty proposed update, got: %v", err)
-	}
-}
-
-func TestSkillPatcher_IdempotentPatch(t *testing.T) {
-	ctx := context.Background()
-
-	dir := t.TempDir()
-	skillDir := filepath.Join(dir, "skills")
-	introDir := filepath.Join(skillDir, "introspection")
-	if err := os.MkdirAll(introDir, 0700); err != nil {
-		t.Fatalf("mkdir: %v", err)
-	}
-
-	existingContent := "---\nname: introspection\ndescription: >\n Test\nlicense: MIT\n---\n\n# Introspection\n"
-	skillFile := filepath.Join(introDir, "SKILL.md")
-	if err := os.WriteFile(skillFile, []byte(existingContent), 0600); err != nil {
-		t.Fatalf("write skill: %v", err)
-	}
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	proposedUpdate := "Always check nil before dereferencing"
-	err = sp.Patch(ctx, "NULL_SAFETY", proposedUpdate, "")
-	if err != nil {
-		t.Fatalf("First Patch: %v", err)
-	}
-
-	data1, _ := os.ReadFile(skillFile)
-
-	// Patch again with the same content — should be idempotent
-	err = sp.Patch(ctx, "NULL_SAFETY", proposedUpdate, "")
-	if err != nil {
-		t.Fatalf("Second Patch: %v", err)
-	}
-
-	data2, _ := os.ReadFile(skillFile)
-
-	// Content should not have grown
-	if len(data2) != len(data1) {
-		t.Errorf("expected idempotent patch (same length), got %d bytes then %d bytes", len(data1), len(data2))
-	}
-}
-
-func TestSkillPatcher_InvalidSkillDir(t *testing.T) {
-	// Use a path that can't be created (will fail when trying to create a new file)
-	invalidDir := "/proc/no-skills-here"
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: invalidDir,
-	})
-	if err != nil {
-		// NewSkillPatcher itself should succeed — the error happens on Patch
-		t.Fatalf("NewSkillPatcher should not fail: %v", err)
-	}
-
-	// Use a category not in categorySkillMap so a new directory needs to be created
-	err = sp.Patch(context.Background(), "UNKNOWN_CATEGORY", "test content", "")
-	// This should fail because the directory can't be created under /proc
-	if err == nil {
-		t.Error("expected error for invalid skill directory")
-	}
-}
-
-func TestSkillPatcher_ParsesFrontmatterCorrectly(t *testing.T) {
-	ctx := context.Background()
-
-	dir := t.TempDir()
-	skillDir := filepath.Join(dir, "skills")
-	introDir := filepath.Join(skillDir, "introspection")
-	if err := os.MkdirAll(introDir, 0700); err != nil {
-		t.Fatalf("mkdir: %v", err)
-	}
-
-	// Create a skill file with frontmatter
-	existingContent := "---\nname: introspection\ndescription: >\n Introspection skill\nlicense: MIT\n---\n\n# Introspection\n\nExisting content.\n"
-	skillFile := filepath.Join(introDir, "SKILL.md")
-	if err := os.WriteFile(skillFile, []byte(existingContent), 0600); err != nil {
-		t.Fatalf("write skill: %v", err)
-	}
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	err = sp.Patch(ctx, "ERROR_HANDLING", "Always wrap errors with fmt.Errorf", "")
-	if err != nil {
-		t.Fatalf("Patch: %v", err)
-	}
-
-	data, err := os.ReadFile(skillFile)
-	if err != nil {
-		t.Fatalf("read skill: %v", err)
-	}
-	content := string(data)
-
-	// Frontmatter should still be present
-	if !strings.Contains(content, "---\nname: introspection") {
-		t.Error("expected frontmatter to be preserved after patch")
-	}
-	if !strings.Contains(content, "license: MIT") {
-		t.Error("expected license field to be preserved after patch")
-	}
-	if !strings.Contains(content, "Existing content.") {
-		t.Error("expected existing content to be preserved after patch")
-	}
-}
-
-func TestSkillPatcher_MalformedFile_NoFrontmatter(t *testing.T) {
-	ctx := context.Background()
-
-	dir := t.TempDir()
-	skillDir := filepath.Join(dir, "skills")
-	introDir := filepath.Join(skillDir, "introspection")
-	if err := os.MkdirAll(introDir, 0700); err != nil {
-		t.Fatalf("mkdir: %v", err)
-	}
-
-	// Create a skill file WITHOUT frontmatter
-	existingContent := "# Introspection\n\nSome content without frontmatter.\n"
-	skillFile := filepath.Join(introDir, "SKILL.md")
-	if err := os.WriteFile(skillFile, []byte(existingContent), 0600); err != nil {
-		t.Fatalf("write skill: %v", err)
-	}
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	err = sp.Patch(ctx, "NULL_SAFETY", "Check nil before deref", "")
-	if err != nil {
-		t.Fatalf("Patch on malformed file: %v", err)
-	}
-
-	data, err := os.ReadFile(skillFile)
-	if err != nil {
-		t.Fatalf("read skill: %v", err)
-	}
-	content := string(data)
-
-	// Should have the fallback comment
-	if !strings.Contains(content, "patch appended without frontmatter") {
-		t.Error("expected fallback comment for malformed file")
-	}
-	if !strings.Contains(content, "Patch: NULL_SAFETY") {
-		t.Error("expected patch section after fallback comment")
-	}
-}
-
-func TestNewSkillPatcher_DefaultSkillDir(t *testing.T) {
-	dir := t.TempDir()
-	t.Setenv("LMEM_HOME", dir)
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-	expected := filepath.Join(dir, "skills")
-	if sp.skillDir != expected {
-		t.Errorf("expected skill dir %q, got %q", expected, sp.skillDir)
-	}
-}
-
-func TestSkillPatcher_FindSkillFile_Existing(t *testing.T) {
-	dir := t.TempDir()
-	skillDir := filepath.Join(dir, "skills")
-	introDir := filepath.Join(skillDir, "introspection")
-	if err := os.MkdirAll(introDir, 0700); err != nil {
-		t.Fatalf("mkdir: %v", err)
-	}
-	skillFile := filepath.Join(introDir, "SKILL.md")
-	if err := os.WriteFile(skillFile, []byte("---\nname: introspection\n---\n"), 0600); err != nil {
-		t.Fatalf("write: %v", err)
-	}
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	path, err := sp.FindSkillFile(context.Background(), "NULL_SAFETY")
-	if err != nil {
-		t.Fatalf("FindSkillFile: %v", err)
-	}
-	if path != skillFile {
-		t.Errorf("expected %q, got %q", skillFile, path)
-	}
-}
-
-func TestSkillPatcher_FindSkillFile_NotFound(t *testing.T) {
-	dir := t.TempDir()
-	skillDir := filepath.Join(dir, "skills")
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	path, err := sp.FindSkillFile(context.Background(), "NULL_SAFETY")
-	if err != nil {
-		t.Fatalf("FindSkillFile: %v", err)
-	}
-	if path != "" {
-		t.Errorf("expected empty string for not found, got %q", path)
-	}
-}
-
-func TestSkillPatcher_FindSkillFile_UnknownCategory(t *testing.T) {
-	dir := t.TempDir()
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: filepath.Join(dir, "skills"),
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	path, err := sp.FindSkillFile(context.Background(), "UNKNOWN_CATEGORY")
-	if err != nil {
-		t.Fatalf("FindSkillFile: %v", err)
-	}
-	if path != "" {
-		t.Errorf("expected empty string for unknown category, got %q", path)
-	}
-}
-
-func TestValidatePatch_Effective(t *testing.T) {
-	result := ValidatePatch("NULL_SAFETY", 10, 3)
-	if !result.Effective {
-		t.Error("expected Effective=true when after < before")
-	}
-	if result.Flagged {
-		t.Error("expected Flagged=false when after < before")
-	}
-}
-
-func TestValidatePatch_Flagged(t *testing.T) {
-	result := ValidatePatch("NULL_SAFETY", 5, 8)
-	if result.Effective {
-		t.Error("expected Effective=false when after >= before")
-	}
-	if !result.Flagged {
-		t.Error("expected Flagged=true when after >= before")
-	}
-}
-
-func TestValidatePatch_ZeroBeforeCount(t *testing.T) {
-	result := ValidatePatch("NULL_SAFETY", 0, 0)
-	if result.Effective {
-		t.Error("expected Effective=false when before=0")
-	}
-	if result.Flagged {
-		t.Error("expected Flagged=false when before=0")
-	}
-}
-
-func TestValidatePatch_EqualCounts_Flagged(t *testing.T) {
-	result := ValidatePatch("ERROR_HANDLING", 5, 5)
-	if result.Effective {
-		t.Error("expected Effective=false when after == before")
-	}
-	if !result.Flagged {
-		t.Error("expected Flagged=true when after >= before")
-	}
-}
-
-func TestBuildPatchContent(t *testing.T) {
-	content := buildPatchContent("NULL_SAFETY", "Always guard nil pointers", time.Date(2025, 6, 15, 0, 0, 0, 0, time.UTC))
-	if !strings.Contains(content, "## Patch: NULL_SAFETY (2025-06-15)") {
-		t.Errorf("expected patch header, got: %s", content)
-	}
-	if !strings.Contains(content, "**Detection Rule:**") {
-		t.Error("expected Detection Rule field")
-	}
-	if !strings.Contains(content, "**Checklist:**") {
-		t.Error("expected Checklist field")
-	}
-	if !strings.Contains(content, "**Pitfall:**") {
-		t.Error("expected Pitfall field")
-	}
-	if !strings.Contains(content, "**Verification:**") {
-		t.Error("expected Verification field")
-	}
-}
-
-func TestIsDuplicatePatch(t *testing.T) {
-	content := "Some existing content\nAlways check for nil before dereferencing\nMore content"
-	if !isDuplicatePatch(content, "Always check for nil before dereferencing") {
-		t.Error("expected duplicate to be detected")
-	}
-	if isDuplicatePatch(content, "Unique text not in file") {
-		t.Error("expected non-duplicate to not be detected")
-	}
-}
-
-func TestHasYAMLFrontmatter(t *testing.T) {
-	if !hasYAMLFrontmatter("---\nname: test\n---\nContent") {
-		t.Error("expected YAML frontmatter to be detected")
-	}
-	if hasYAMLFrontmatter("# Just a heading\nNo frontmatter") {
-		t.Error("expected no frontmatter detection for plain markdown")
-	}
-}
-
-func TestSkillPatcher_PathTraversal_Rejected(t *testing.T) {
-	ctx := context.Background()
-	dir := t.TempDir()
-	skillDir := filepath.Join(dir, "skills")
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	// Attempt path traversal via category not in categorySkillMap
-	// Categories with path separators or dots should be rejected
-	traversalCategories := []string{
-		"../../etc",
-		"../config",
-		"foo/bar",
-		"foo\\bar",
-		".hidden",
-		"attacker..evil",
-	}
-	for _, tc := range traversalCategories {
-		t.Run(tc, func(t *testing.T) {
-			err := sp.Patch(ctx, tc, "malicious update", "malicious description")
-			if err == nil {
-				t.Errorf("expected error for path-traversal category %q, got nil", tc)
-			}
-			if err != nil && !strings.Contains(err.Error(), "invalid category") {
-				t.Errorf("expected 'invalid category' error for %q, got: %v", tc, err)
-			}
-		})
-	}
-}
-
-func TestSkillPatcher_YAMLInjection_Prevented(t *testing.T) {
-	ctx := context.Background()
-	dir := t.TempDir()
-	skillDir := filepath.Join(dir, "skills")
-	introDir := filepath.Join(skillDir, "introspection")
-	if err := os.MkdirAll(introDir, 0700); err != nil {
-		t.Fatalf("mkdir: %v", err)
-	}
-
-	existingContent := "---\nname: introspection\ndescription: >\n Test\nlicense: MIT\n---\n\n# Introspection\n"
-	skillFile := filepath.Join(introDir, "SKILL.md")
-	if err := os.WriteFile(skillFile, []byte(existingContent), 0600); err != nil {
-		t.Fatalf("write skill: %v", err)
-	}
-
-	sp, err := NewSkillPatcher(SkillPatchConfig{
-		SkillDir: skillDir,
-	})
-	if err != nil {
-		t.Fatalf("NewSkillPatcher: %v", err)
-	}
-
-	// Injected newlines in description should be sanitized
-	maliciousDesc := "Safe description\nmalicious: injected"
-	err = sp.Patch(ctx, "ERROR_HANDLING", "Check error returns", maliciousDesc)
-	if err != nil {
-		t.Fatalf("Patch: %v", err)
-	}
-
-	data, err := os.ReadFile(skillFile)
-	if err != nil {
-		t.Fatalf("read skill: %v", err)
-	}
-	content := string(data)
-
-	// The YAML frontmatter should not contain unescaped newlines in the description
-	// Newlines should be replaced with spaces
-	if strings.Contains(content, "malicious: injected") {
-		t.Error("YAML injection: newline in description was not sanitized")
-	}
-}
-
-func TestSanitizeCategory(t *testing.T) {
-	tests := []struct {
-		input   string
-		wantErr bool
-	}{
-		{"NULL_SAFETY", false},
-		{"ERROR_HANDLING", false},
-		{"simple", false},
-		{"../../etc", true},
-		{"../config", true},
-		{"foo/bar", true},
-		{".hidden", true},
-		{"a..b", true},
-		{"", true},
-		{"valid_name123", false},
-		{"has space", true},
-		{"has\nnewline", true},
-	}
-	for _, tt := range tests {
-		t.Run(tt.input, func(t *testing.T) {
-			err := sanitizeCategory(tt.input)
-			if (err != nil) != tt.wantErr {
-				t.Errorf("sanitizeCategory(%q) = %v, want error=%v", tt.input, err, tt.wantErr)
-			}
-		})
-	}
-}
-
-func TestSanitizeYAMLValue(t *testing.T) {
-	tests := []struct {
-		name  string
-		input string
-		want  string
-	}{
-		{"plain text", "hello world", "hello world"},
-		{"newline replaced", "line1\nline2", "line1 line2"},
-		{"carriage return replaced", "line1\rline2", "line1 line2"},
-		{"crlf replaced", "line1\r\nline2", "line1 line2"},
-		{"tab preserved", "tab\there", "tab\there"},
-		{"empty string", "", ""},
-	}
-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
-			got := sanitizeYAMLValue(tt.input)
-			if got != tt.want {
-				t.Errorf("sanitizeYAMLValue(%q) = %q, want %q", tt.input, got, tt.want)
-			}
-		})
-	}
-}
\ No newline at end of file
diff --git a/internal/store/migration_test.go b/internal/store/migration_test.go
index 6104c3d..5a6f362 100644
--- a/internal/store/migration_test.go
+++ b/internal/store/migration_test.go
@@ -90,14 +90,14 @@ func TestMigration_AllApplied(t *testing.T) {
 		t.Error("expected extraction_log table to exist")
 	}
 
-	// Verify memory_types table has 8 types
+	// Verify memory_types table has 7 types
 	var mtCount int
 	err = db.QueryRow(`SELECT count(*) FROM "memory_types"`).Scan(&mtCount)
 	if err != nil {
 		t.Fatalf("memory_types count: %v", err)
 	}
-	if mtCount != 8 {
-		t.Errorf("expected 8 memory types, got %d", mtCount)
+	if mtCount != 7 {
+		t.Errorf("expected 7 memory types, got %d", mtCount)
 	}
 
 	// Verify code_chunks table exists
diff --git a/internal/store/models.go b/internal/store/models.go
index 3658382..a2918af 100644
--- a/internal/store/models.go
+++ b/internal/store/models.go
@@ -157,7 +157,6 @@ func DefaultRegisteredTypes() []string {
 		"project_state",
 		"procedure",
 		"conversation",
-		"self_assessment",
 	}
 }
 
diff --git a/internal/store/store_test.go b/internal/store/store_test.go
index 1792934..7346ebd 100644
--- a/internal/store/store_test.go
+++ b/internal/store/store_test.go
@@ -1082,7 +1082,7 @@ func TestRegisterMemoryType_InvalidName(t *testing.T) {
 
 func TestDefaultRegisteredTypes(t *testing.T) {
 	types := DefaultRegisteredTypes()
-	expected := []string{"fact", "decision", "preference", "event", "project_state", "procedure", "conversation", "self_assessment"}
+	expected := []string{"fact", "decision", "preference", "event", "project_state", "procedure", "conversation"}
 	if len(types) != len(expected) {
 		t.Errorf("expected %d types, got %d", len(expected), len(types))
 	}
diff --git a/internal/taxonomy/taxonomy.go b/internal/taxonomy/taxonomy.go
index 53e386b..94144ba 100644
--- a/internal/taxonomy/taxonomy.go
+++ b/internal/taxonomy/taxonomy.go
@@ -1,5 +1,4 @@
-// Package taxonomy provides error taxonomy constants and structured format
-// for self_assessment memories in the LLMem project.
+// Package taxonomy provides error taxonomy constants for the LLMem project.
 package taxonomy
 
 // ErrorTaxonomy maps error category keys to their descriptions.
@@ -36,16 +35,6 @@ func ErrorTaxonomyKeys() []string {
 	}
 }
 
-// IntrospectCategoryChoices returns a comma-separated string of error taxonomy keys.
-func IntrospectCategoryChoices() string {
-	keys := ErrorTaxonomyKeys()
-	result := keys[0]
-	for _, k := range keys[1:] {
-		result += ", " + k
-	}
-	return result
-}
-
 // ReviewSeverityTaxonomy maps severity levels to their associated error categories.
 // This maps human-facing severity labels to the taxonomy categories they encompass.
 var ReviewSeverityTaxonomy = map[string][]string{
@@ -54,108 +43,4 @@ var ReviewSeverityTaxonomy = map[string][]string{
 	"Strong Suggestions": {"PERFORMANCE", "DESIGN"},
 	"Noted":              {"OFF_BY_ONE"},
 	"Passed":             {"REVIEW_PASSED"},
-}
-
-// Field represents a structured field with a name and description.
-type Field struct {
-	Name        string
-	Description string
-}
-
-// SelfAssessmentFields returns the ordered list of field name+description pairs
-// for self_assessment memories. Returns a new slice each time (defensive copy).
-func SelfAssessmentFields() []Field {
-	return []Field{
-		{Name: "Category", Description: "Error category from the taxonomy above"},
-		{Name: "Context", Description: "What you were doing when this happened"},
-		{Name: "What_happened", Description: "Describe the error or issue"},
-		{Name: "Outcomes", Description: "What happened as a result (broke tests, deployed bug, etc.)"},
-		{Name: "What_caught_it", Description: "How was this caught? (self-review, CI, user report, etc.)"},
-		{Name: "Estimates_vs_actual", Description: "How long did you think vs how long it took"},
-		{Name: "Recurring", Description: "Is this a pattern? (yes/no, with reference to prior)"},
-		{Name: "Proposed_update", Description: "What rule or procedure should change to prevent recurrence"},
-		{Name: "Iteration_count", Description: "How many attempts before success (integer). 1 = first try"},
-	}
-}
-
-// IntrospectFieldLines returns a formatted string of self-assessment field names
-// and descriptions, one per line, suitable for inclusion in prompts.
-func IntrospectFieldLines() string {
-	fields := SelfAssessmentFields()
-	result := ""
-	for i, f := range fields {
-		if i > 0 {
-			result += "\n"
-		}
-		result += " " + f.Name + ": " + f.Description
-	}
-	return result
-}
-
-// ParseSelfAssessment parses "Key: Value" lines from self-assessment content
-// into a map. Lines without ": " are skipped.
-func ParseSelfAssessment(content string) map[string]string {
-	result := map[string]string{}
-	for _, line := range splitLines(content) {
-		idx := findColonSpace(line)
-		if idx < 0 {
-			continue
-		}
-		key := trimSpace(line[:idx])
-		val := trimSpace(line[idx+2:])
-		if key != "" {
-			result[key] = val
-		}
-	}
-	return result
-}
-
-// splitLines splits content into lines.
-func splitLines(s string) []string {
-	var lines []string
-	start := 0
-	for i := 0; i < len(s); i++ {
-		if s[i] == '\n' {
-			lines = append(lines, s[start:i])
-			start = i + 1
-		}
-	}
-	if start < len(s) {
-		lines = append(lines, s[start:])
-	}
-	return lines
-}
-
-// findColonSpace finds the index of ": " in s, or -1 if not found.
-func findColonSpace(s string) int {
-	for i := 0; i < len(s)-1; i++ {
-		if s[i] == ':' && s[i+1] == ' ' {
-			return i
-		}
-	}
-	return -1
-}
-
-// ParseSelfAssessmentField extracts a named field value from structured self_assessment content.
-// Uses ParseSelfAssessment internally. Returns empty string if field not found, never returns an error.
-// This is needed by both introspect (to populate IntrospectResult.ProposedUpdate) and
-// skillpatch (to extract proposed_update from stored memories).
-func ParseSelfAssessmentField(content, field string) string {
-	parsed := ParseSelfAssessment(content)
-	if val, ok := parsed[field]; ok {
-		return val
-	}
-	return ""
-}
-
-// trimSpace trims leading and trailing whitespace.
-func trimSpace(s string) string {
-	// Fast path for common cases
-	for len(s) > 0 && (s[0] == ' ' || s[0] == '\t' || s[0] == '\r') {
-		s = s[1:]
-	}
-	for len(s) > 0 && (s[len(s)-1] == ' ' || s[len(s)-1] == '\t' || s[len(s)-1] == '\r') {
-		s = s[:len(s)-1]
-	}
-	return s
 }
\ No newline at end of file
diff --git a/internal/taxonomy/taxonomy_test.go b/internal/taxonomy/taxonomy_test.go
index 150b423..871261b 100644
--- a/internal/taxonomy/taxonomy_test.go
+++ b/internal/taxonomy/taxonomy_test.go
@@ -12,7 +12,6 @@ func TestErrorTaxonomy_KeysMatch(t *testing.T) {
 			t.Errorf("ErrorTaxonomyKeys contains %q but ErrorTaxonomy does not", k)
 		}
 	}
-	// Verify all taxonomy entries are in keys
 	keySet := map[string]bool{}
 	for _, k := range keys {
 		keySet[k] = true
@@ -46,23 +45,12 @@ func TestErrorTaxonomyKeys_ReturnsDefensiveCopy(t *testing.T) {
 	if len(keys1) != len(keys2) {
 		t.Fatal("keys lengths differ")
 	}
-	// Modifying one should not affect the other
 	keys1[0] = "MODIFIED"
 	if keys2[0] == "MODIFIED" {
 		t.Error("ErrorTaxonomyKeys should return defensive copies")
 	}
 }
 
-func TestIntrospectCategoryChoices(t *testing.T) {
-	choices := IntrospectCategoryChoices()
-	if !strings.Contains(choices, "NULL_SAFETY") {
-		t.Error("should contain NULL_SAFETY")
-	}
-	if !strings.Contains(choices, "REVIEW_PASSED") {
-		t.Error("should contain REVIEW_PASSED")
-	}
-}
-
 func TestReviewSeverityTaxonomy_Blocking(t *testing.T) {
 	blocking := ReviewSeverityTaxonomy["Blocking"]
 	expected := []string{"AUTH_BYPASS", "RACE_CONDITION", "DATA_INTEGRITY"}
@@ -76,87 +64,75 @@ func TestReviewSeverityTaxonomy_Blocking(t *testing.T) {
 	}
 }
 
-func TestSelfAssessmentFields_OrderAndCount(t *testing.T) {
-	fields := SelfAssessmentFields()
-	if len(fields) != 9 {
-		t.Errorf("expected 9 fields, got %d", len(fields))
-	}
-	if fields[0].Name != "Category" {
-		t.Errorf("first field should be Category, got %q", fields[0].Name)
-	}
-	if fields[len(fields)-1].Name != "Iteration_count" {
-		t.Errorf("last field should be Iteration_count, got %q", fields[len(fields)-1].Name)
-	}
-}
-
-func TestSelfAssessmentFields_DefensiveCopy(t *testing.T) {
-	f1 := SelfAssessmentFields()
-	f2 := SelfAssessmentFields()
-	f1[0].Name = "MODIFIED"
-	if f2[0].Name == "MODIFIED" {
-		t.Error("SelfAssessmentFields should return defensive copies")
-	}
-}
-
-func TestIntrospectFieldLines(t *testing.T) {
-	lines := IntrospectFieldLines()
-	if !strings.Contains(lines, "Category:") {
-		t.Error("should contain Category field")
-	}
-	if !strings.Contains(lines, "Iteration_count:") {
-		t.Error("should contain Iteration_count field")
-	}
-}
-
-func TestParseSelfAssessment_Basic(t *testing.T) {
-	content := "Category: ERROR_HANDLING\nWhat_happened: swallowed error\nProposed_update: always check errors"
-	result := ParseSelfAssessment(content)
-	if result["Category"] != "ERROR_HANDLING" {
-		t.Errorf("Category: expected ERROR_HANDLING, got %q", result["Category"])
-	}
-	if result["What_happened"] != "swallowed error" {
-		t.Errorf("What_happened: expected 'swallowed error', got %q", result["What_happened"])
-	}
-	if result["Proposed_update"] != "always check errors" {
-		t.Errorf("Proposed_update: expected 'always check errors', got %q", result["Proposed_update"])
-	}
-}
-
-func TestParseSelfAssessment_SkipsLinesWithoutColonSpace(t *testing.T) {
-	content := "Category: EDGE_CASE\njust a line without colon-space\nWhat_happened: detail"
-	result := ParseSelfAssessment(content)
-	if len(result) != 2 {
-		t.Errorf("expected 2 entries, got %d", len(result))
-	}
-}
-
-func TestParseSelfAssessment_Empty(t *testing.T) {
-	result := ParseSelfAssessment("")
-	if len(result) != 0 {
-		t.Errorf("expected empty map, got %d entries", len(result))
+// splitLines is a local copy for testing.
+func splitLines(s string) []string {
+	var lines []string
+	start := 0
+	for i := 0; i < len(s); i++ {
+		if s[i] == '\n' {
+			lines = append(lines, s[start:i])
+			start = i + 1
+		}
 	}
-}
-
-func TestParseSelfAssessmentField_Found(t *testing.T) {
-	content := "Category: ERROR_HANDLING\nWhat_happened: swallowed error\nProposed_update: always check errors"
-	val := ParseSelfAssessmentField(content, "Proposed_update")
-	if val != "always check errors" {
-		t.Errorf("expected 'always check errors', got %q", val)
+	if start < len(s) {
+		lines = append(lines, s[start:])
 	}
+	return lines
 }
 
-func TestParseSelfAssessmentField_NotFound(t *testing.T) {
-	content := "Category: ERROR_HANDLING\nWhat_happened: swallowed error"
-	val := ParseSelfAssessmentField(content, "Proposed_update")
-	if val != "" {
-		t.Errorf("expected empty string for missing field, got %q", val)
+func TestSplitLines(t *testing.T) {
+	tests := []struct {
+		name  string
+		input string
+		want  int
+	}{
+		{"empty", "", 0},
+		{"single line", "hello", 1},
+		{"two lines", "hello\nworld", 2},
+		{"trailing newline", "hello\n", 1},
+	}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := splitLines(tt.input)
+			if len(got) != tt.want {
+				t.Errorf("splitLines(%q) returned %d lines, want %d", tt.input, len(got), tt.want)
+			}
+		})
 	}
 }
 
-func TestParseSelfAssessmentField_Category(t *testing.T) {
-	content := "Category: NULL_SAFETY\nWhat_happened: nil dereference"
-	val := ParseSelfAssessmentField(content, "Category")
-	if val != "NULL_SAFETY" {
-		t.Errorf("expected 'NULL_SAFETY', got %q", val)
+func TestParseKeyValue(t *testing.T) {
+	tests := []struct {
+		name    string
+		content string
+		key     string
+		want    string
+	}{
+		{"exact match", "Category: ERROR_HANDLING\nWhat: detail", "Category", "ERROR_HANDLING"},
+		{"missing key", "What: detail", "Category", ""},
+		{"proposed update", "Category: RACE_CONDITION\nProposed_update: always use mutex", "Proposed_update", "always use mutex"},
+	}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			// Use the shared splitLines and findColonSpace via ParseKeyValue-like logic
+			lines := splitLines(tt.content)
+			for _, line := range lines {
+				idx := strings.Index(line, ": ")
+				if idx < 0 {
+					continue
+				}
+				key := strings.TrimSpace(line[:idx])
+				if key == tt.key {
+					val := strings.TrimSpace(line[idx+2:])
+					if val != tt.want {
+						t.Errorf("got %q, want %q", val, tt.want)
+					}
+					return
+				}
+			}
+			if tt.want != "" {
+				t.Errorf("key %q not found in content", tt.key)
+			}
+		})
 	}
 }
\ No newline at end of file
diff --git a/migrations/003_register_default_types.sql b/migrations/003_register_default_types.sql
index 25b5bed..15517a3 100644
--- a/migrations/003_register_default_types.sql
+++ b/migrations/003_register_default_types.sql
@@ -11,7 +11,6 @@ INSERT OR IGNORE INTO "memory_types" ("name") VALUES ('event');
 INSERT OR IGNORE INTO "memory_types" ("name") VALUES ('project_state');
 INSERT OR IGNORE INTO "memory_types" ("name") VALUES ('procedure');
 INSERT OR IGNORE INTO "memory_types" ("name") VALUES ('conversation');
-INSERT OR IGNORE INTO "memory_types" ("name") VALUES ('self_assessment');
 
 -- +goose Down
-DELETE FROM "memory_types" WHERE "name" IN ('fact', 'decision', 'preference', 'event', 'project_state', 'procedure', 'conversation', 'self_assessment');
\ No newline at end of file
+DELETE FROM "memory_types" WHERE "name" IN ('fact', 'decision', 'preference', 'event', 'project_state', 'procedure', 'conversation');
\ No newline at end of file
diff --git a/plugins/agent/hooks/hooks.json b/plugins/agent/hooks/hooks.json
index ae68d44..a91027e 100644
--- a/plugins/agent/hooks/hooks.json
+++ b/plugins/agent/hooks/hooks.json
@@ -7,7 +7,7 @@
         "hooks": [
           {
             "type": "command",
-            "command": "llmem stats 2>/dev/null && echo '---' && llmem search behavioral --type self_assessment --limit 5 2>/dev/null && echo '---' && llmem search proposed --type procedure --limit 5 2>/dev/null",
+            "command": "llmem stats 2>/dev/null && echo '---' && llmem search proposed --type procedure --limit 5 2>/dev/null",
             "timeout": 30
           }
         ]
diff --git a/plugins/agent/skills/introspection-review-tracker/SKILL.md b/plugins/agent/skills/introspection-review-tracker/SKILL.md
deleted file mode 100644
index 6a0a3b2..0000000
--- a/plugins/agent/skills/introspection-review-tracker/SKILL.md
+++ /dev/null
@@ -1,114 +0,0 @@
----
-name: introspection-review-tracker
-description: >
-  Reference documentation for the automated ReviewOutcomeTracker hook that persists
-  adversarial code review findings as self_assessment memories. The hook fires
-  automatically after every review — no manual steps required. This skill documents
-  the mapping from severity tiers to error categories and the content format.
-  Triggers on: "review outcome", "track findings", "record review",
-  "self_assessment".
-license: MIT
----
-
-# Introspection Review Tracker — Reference Documentation
-
-This skill is **reference documentation** for the automated review outcome persistence hook. The `llmem track-review` CLI command fires automatically after every adversarial code review, creating `self_assessment` memories without requiring manual agent intervention. This skill documents how the hook works, the severity-to-category mapping, and the content format.
-
-## How Persistence Works
-
-Persistence happens **mechanically** via `llmem track-review` (CLI). The agent must run this command after every adversarial code review — it is not optional.
-
-### CLI: `llmem track-review`
-
-The primary mechanism. Run after every review:
-
-```bash
-# For a review with findings (batch mode):
-llmem track-review --finding-file /tmp/review-findings.json --context "handler.py"
-
-# For a single finding:
-llmem track-review --category NULL_SAFETY --what-happened "missing null check" --context "handler.py:42" --severity Required --caught-by self-review
-
-# For a clean review (no findings):
-llmem track-review --context "handler.py"
-
-# See categories for a severity tier:
-llmem suggest-categories Required
-```
-
-The `--finding-file` expects a JSON array of finding objects, each with `category`, `what_happened`, and optional `severity` keys.
-
-### Python API
-
-The Python API for programmatic review outcome tracking is planned but not yet implemented. Use the `llmem track-review` CLI command as the primary interface. Once available, the API will follow this pattern:
-
-```python
-from llmem.store import MemoryStore
-
-store = MemoryStore(db_path=db_path)
-
-# Programmatic tracking will mirror the CLI:
-# - Single finding → one self_assessment memory
-# - Batch findings → one memory per finding
-# - Clean review → REVIEW_PASSED memory
-# - Category suggestions → via llmem/taxonomy.py constants
-```
-
-## Verification
-
-After an adversarial code review completes, verify that the post-review command was run:
-
-1. Check that at least one `self_assessment` memory was created:
-   ```bash
-   llmem search "review_tracker" --type self_assessment
-   ```
-
-2. For each finding, confirm the category matches the severity tier mapping below.
-
-3. For clean reviews, confirm one `REVIEW_PASSED` memory with outcomes "all clear".
-
-## When This Skill Is Used
-
-- **After every adversarial code review completion** — the hook must be run mechanically.
-- **To verify** that the hook ran correctly (see Verification above).
-- **As reference** for understanding the severity-to-category mapping and content format.
-
- -## Severity-to-Category Mapping - -The `REVIEW_SEVERITY_TAXONOMY` constant in `llmem/taxonomy.py` maps each severity tier to applicable error taxonomy categories. The `llmem suggest-categories` CLI command uses this mapping directly. - -| Severity Tier | Applicable Categories | Guidance | -|---|---|---| -| Blocking | AUTH_BYPASS, RACE_CONDITION, DATA_INTEGRITY | Security holes, data corruption risks, logic errors | -| Required | NULL_SAFETY, ERROR_HANDLING, MISSING_VERIFICATION, EDGE_CASE | Quality gaps — slop, missing safety checks, unhandled cases | -| Strong Suggestions | PERFORMANCE, DESIGN | Suboptimal approaches, missing tests, unclear intent | -| Noted | OFF_BY_ONE | Minor style issues, small boundary errors | -| Passed | REVIEW_PASSED | Clean review with no findings — positive outcome | - -These mappings are advisory — the agent should pick the most meaningful category for the actual finding, not follow them mechanically. A Required-tier finding about performance might still map to PERFORMANCE rather than NULL_SAFETY. Note that the reviewer's tier name is "Strong Suggestions" (not just "Suggestions"); the taxonomy key matches this exactly. 
- -## Content Format - -Memories created by `llmem track-review` use the `SELF_ASSESSMENT_FIELDS` format from `llmem/taxonomy.py:29-39`, ensuring format parity with the `llmem introspect` manual mode: - -``` -Category: -Context: -What_happened: -Outcomes: -What_caught_it: -Estimates_vs_actual: -Recurring: <"yes" or "no"> -Proposed_update: -Iteration_count: -``` - -## Key References - -- **CLI command**: `llmem track-review` — the mechanical post-review hook -- **CLI command**: `llmem suggest-categories` — list categories for a severity tier -- **Error taxonomy categories**: `llmem/taxonomy.py:3-15` → `ERROR_TAXONOMY` -- **Severity mapping**: `llmem/taxonomy.py:21-27` → `REVIEW_SEVERITY_TAXONOMY` -- **Self-assessment fields**: `llmem/taxonomy.py:29-39` → `SELF_ASSESSMENT_FIELDS` -- **Reviewer severity tiers**: Defined by the adversarial review skill (e.g., Blocking, Required, Strong Suggestions, Noted, Passed) — see your review skill's Severity Tiers section -- **llmem introspect command**: `skills/llmem/SKILL.md` (manual and automatic modes) diff --git a/plugins/agent/skills/introspection/SKILL.md b/plugins/agent/skills/introspection/SKILL.md deleted file mode 100644 index a6adf37..0000000 --- a/plugins/agent/skills/introspection/SKILL.md +++ /dev/null @@ -1,159 +0,0 @@ ---- -name: introspection -description: > - Operational reference for the LLMem introspection framework. Loads when - doing reflective work, self-assessment, session-end review, or error pattern - analysis. Triggers on: "introspection", "self-assessment", "self-review", - "session end", "reflect", "post-mortem", "sampajanna", "vigilance check", - "introspect", "error taxonomy". -license: MIT ---- - -# Introspection Skill - -This skill is the **executable version** of the introspection framework — concise, actionable rules and procedures, not theory. - -## Section 1: Eight Principles of Machine Introspection - -Each principle is a one-line rule. - -1. 
**Behavioral, Not Narrative** — Examine what was actually done (outputs, tool calls, errors), not explanations for why. Trust the log, not the narrative. - -2. **Accumulation Over Instance** — Single sessions are unreliable. Patterns across sessions reveal truth. Record consistently so patterns can emerge. - -3. **Specificity Enables Change** — "Be more careful" is useless. "When editing Python database code, always null-guard `.close()` in finally blocks" is actionable. - -4. **Close the Loop or Don't Bother** — Observe → record → detect pattern → modify procedure → re-observe. If the last two steps are missing, the introspection is performative. - -5. **Outside View Over Inside View** — Treat your own output the way you'd treat someone else's. Apply the same skepticism and the same checks. - -6. **Continuous Vigilance, Not Just Bookends** — Monitor during execution, not just before and after. Run sampajanna checks at every breakpoint (see Section 3). - -7. **Externalize Everything** — If a self-assessment isn't persisted in llmem, it didn't happen. The context window resets every session. Memory is the medium of self-knowledge. - -8. **Error-Centered, Not Success-Centered** — Failures carry more signal than successes. Record failures specifically — not "I made an error" but "I skipped null-checking in Python DB operations." Specificity enables future pattern-matching. - -## Section 2: Self-Assessment Checklist - -The session-end checklist covers six steps. This section explains how the introspection skill enriches each step. - -**Step 1: Did you search memory before making assumptions?** — Enrichment: before answering "yes", verify by running `llmem search "" --type decision` for each assumption you made. Record any assumption you acted on without a memory search as a `self_assessment` with `--category DESIGN`. 
- -**Step 2: Did you run task-intake on unfamiliar repos?** — Enrichment: if you skipped task-intake on a repo you hadn't worked in recently, run `llmem introspect --category MISSING_VERIFICATION --what-happened "skipped task-intake before editing in " --context "" --caught-by self-review` to record the gap. - -**Step 3: Did you self-review with an adversarial code reviewer?** — Enrichment: after the review completes, run `llmem track-review` to persist findings as `self_assessment` memories. Then verify that memories were created by checking `llmem search "review_tracker" --type self_assessment`. If the `track-review` command was not run, run it manually as a fallback. - -**Step 4: Did you record findings as self_assessment memories?** — Enrichment: the primary mechanism is `llmem track-review` (CLI) which runs mechanically after every adversarial code review invocation. Use `llmem introspect --category ` as a fallback only for findings the hook missed. For recurring patterns (3+ occurrences), search `llmem search "" --type self_assessment` to check for recurrence before recording. - -**Step 5: Did you commit and push?** — Enrichment: if you skipped this step, record why with `llmem add --type self_assessment "skipped commit/push because "` and flag it as a `MISSING_VERIFICATION` pattern. - -**Step 6: Did you record skipped steps and why?** — Enrichment: use `llmem introspect --category MISSING_VERIFICATION --what-happened "skipped because " --context ""` to make skipped steps traceable. - -## Section 3: Sampajanna Checks (Continuous Vigilance) - -Sampajanna (clear comprehension) is continuous self-monitoring during task execution. - -**When to run these checks:** before committing, before declaring done, when switching between subtasks, when a test fails. - -### Laxity Check — Am I cutting corners? -- Origin: Buddhist sampajanna — detecting dullness and sloppiness in one's cognitive state. 
-- Triggers: three specific questions about skipped verification, first-answer acceptance, and rushed error handling. -- Action: If you answer "yes" to any laxity question, stop and address it before continuing. Record the finding with `llmem introspect --category MISSING_VERIFICATION`. - -### Excitation Check — Am I going off track? -- Origin: Sampajanna — detecting agitation and reactivity that causes tangential problem-solving. -- Triggers: three questions about solving the actual problem, approach divergence, and over-engineering. -- Action: If you answer "yes" to any excitation question, re-read the original task description. Record with `llmem introspect --category DESIGN`. - -### Quality Check — Am I being sloppy? -- Origin: Sampajanna — monitoring the quality of one's output against standards. -- Triggers: three questions about error handling, edge cases, and consistency with codebase patterns. -- Action: If you answer "yes" to any quality question, fix the issue before continuing. Record with the appropriate `ERROR_TAXONOMY` category (e.g., `ERROR_HANDLING`, `EDGE_CASE`, `NULL_SAFETY`). - -## Section 4: Error Taxonomy - -The canonical source of truth is `llmem/taxonomy.py:3-15`. Category names and descriptions below are reproduced verbatim from that file. When the taxonomy is updated, update `llmem/taxonomy.py` first — the skill follows. 
- -| Category | Description | -|----------|-------------| -| NULL_SAFETY | Missing null/None/undefined checks before property access or method calls | -| ERROR_HANDLING | Missing try/except, bare except, swallowed errors, unhandled promise rejections | -| OFF_BY_ONE | Boundary errors, wrong loop bounds, fencepost errors | -| RACE_CONDITION | Concurrency issues, async/await problems, missing locks | -| AUTH_BYPASS | Missing auth checks, SSRF, injection vulnerabilities, security oversights | -| DATA_INTEGRITY | Stale derived fields, out-of-sync caches/embeddings/indexes, source-of-truth divergence | -| MISSING_VERIFICATION | Skipped test steps, unverified outputs, assumed-it-works | -| EDGE_CASE | Unhandled empty input, unexpected types, boundary values | -| PERFORMANCE | N+1 queries, unnecessary recomputation, memory leaks | -| DESIGN | Architectural issues, wrong abstraction level, coupling problems | -| REVIEW_PASSED | Clean review with no findings — positive outcome for tracking purposes | - -### Severity-to-Category Mapping - -The `REVIEW_SEVERITY_TAXONOMY` at `llmem/taxonomy.py:21-27` maps reviewer severity tiers to likely error categories: - -| Severity Tier | Applicable Categories | -|---|---| -| Blocking | AUTH_BYPASS, RACE_CONDITION, DATA_INTEGRITY | -| Required | NULL_SAFETY, ERROR_HANDLING, MISSING_VERIFICATION, EDGE_CASE | -| Strong Suggestions | PERFORMANCE, DESIGN | -| Noted | OFF_BY_ONE | -| Passed | REVIEW_PASSED | - -The `introspection-review-tracker` skill (`skills/introspection-review-tracker/SKILL.md`) bridges these: after an adversarial code review run, it converts each finding into a `self_assessment` memory using the appropriate category. 
- -### Using the Taxonomy - -```bash -# Record a self-assessment with a specific category -llmem introspect --category NULL_SAFETY --what-happened "missing null check before .field access" --context "handler.py:42" --caught-by self-review - -# Search for recurring patterns in a category -llmem search "NULL_SAFETY" --type self_assessment - -# Run automatic introspection on a session transcript -llmem introspect --auto --session - -# Run automatic introspection on arbitrary text -llmem introspect --auto --text "Encountered null pointer error when processing user input" -``` - -See `skills/llmem/SKILL.md` for the full `llmem introspect` command reference (manual and automatic modes). - -## Section 5: Outside-View Review Questions - -Principle 5 (Outside View Over Inside View) addresses the introspection illusion (Pronin 2007): people assess themselves more accurately when they treat their own output as if someone else produced it. Vague instructions like "be more careful" or "think harder" do not work. The instruction must direct attention to observable behavior. - -The deployed outside-view procedures are in two locations: - -1. **Contrastive self-assessment** — four specific behavioral checks to run before declaring any task done: verify test results, check actual HTTP responses, read actual output files, compare output against objective standards. - -2. **Adversarial review questions** — four specific questions that force outside-view perspective during self-review: would you flag this issue in someone else's PR, are you giving yourself passes, what would an adversarial reviewer find, and where are you trusting reasoning instead of verifying behavior. If your agent framework provides an adversarial code-review skill, load it and apply its questions. - -3. 
**Pre-PR introspection illusion check** — three checks embedded in a pre-PR review protocol: is the author trusting reasoning without evidence, are there self-serving assumptions, and what would an adversarial reviewer see that the author doesn't. If your agent framework provides a pre-PR review skill, apply its introspection illusion checks. - -**The principle:** Before declaring work done, switch perspective. Treat your output as if someone else wrote it. Apply the standards you'd apply to their code, not your own. Do not trust your intent — verify your output. - -## Section 6: Trigger Conditions - -Load this skill when: - -1. **At session end** — before running the session-end checklist. Load this skill first so the enrichment procedures from Section 2 are available during the checklist. - -2. **After running self-review** — after an adversarial code-review completes, load this skill to run the `introspection-review-tracker` skill and persist findings as `self_assessment` memories. - -3. **When running `llmem introspect`** — load this skill to ensure you're using the correct taxonomy categories and recording all required fields. - -4. **When changing behavioral directives in agent instructions** — any change that affects self-review, vigilance checks, or session-end procedures should reference this skill for consistency. - -5. **When reflecting on errors or patterns** — load this skill before running `llmem search "" --type self_assessment` to check for recurring error patterns. - -These triggers correspond to the keywords in the `description` field: "introspection", "self-assessment", "self-review", "session end", "reflect", "post-mortem", "sampajanna", "vigilance check", "introspect", "error taxonomy". - -## Key References - -- **Error taxonomy (source of truth):** `llmem/taxonomy.py:3-15` (`ERROR_TAXONOMY`), `llmem/taxonomy.py:21-27` (`REVIEW_SEVERITY_TAXONOMY`), `llmem/taxonomy.py:29-39` (`SELF_ASSESSMENT_FIELDS`). 
-- **Review-specific questions:** Apply the outside-view questions from your agent framework's adversarial code-review skill (Section 7, if available). -- **Pre-PR introspection illusion check:** Apply the introspection illusion checks from your agent framework's pre-PR review skill (if available). -- **Review outcome persistence:** `skills/introspection-review-tracker/SKILL.md` — Convert review findings to `self_assessment` memories. -- **llmem introspect command:** `skills/llmem/SKILL.md:152-164` — CLI reference for structured self-assessment. diff --git a/plugins/agent/skills/llmem-setup/SKILL.md b/plugins/agent/skills/llmem-setup/SKILL.md index 8f3c412..ebbd6b3 100644 --- a/plugins/agent/skills/llmem-setup/SKILL.md +++ b/plugins/agent/skills/llmem-setup/SKILL.md @@ -122,8 +122,7 @@ Memory is working memory, not a startup ritual. Search before assuming. **Session start — MANDATORY:** 1. `llmem stats` — check memory health -2. `llmem search "behavioral" --type self_assessment --limit 5` — surface recurring error patterns -3. `llmem search "proposed" --type procedure --limit 5` — check for proposed procedural memories +2. `llmem search "topic" --limit 5` — search for relevant memories **Mid-session search triggers — search whenever:** - Looking up how something works diff --git a/plugins/agent/skills/llmem/SKILL.md b/plugins/agent/skills/llmem/SKILL.md index 741287f..2bc4c4f 100644 --- a/plugins/agent/skills/llmem/SKILL.md +++ b/plugins/agent/skills/llmem/SKILL.md @@ -51,41 +51,6 @@ copilot: | project_state | Current status of a project or system | | procedure | How-to knowledge, step sequences | | conversation | Notable conversation outcomes or commitments | -| self_assessment | Structured introspective records — error patterns, behavioral corrections, recurring mistakes, proposed procedural updates | - -## Error Taxonomy - -Self-assessment memories are categorized using a standard error taxonomy. 
Each category identifies a class of mistake that the introspection system tracks for pattern detection: - -| Category | Description | -|----------|-------------| -| NULL_SAFETY | Missing null/None/undefined checks before property access or method calls | -| ERROR_HANDLING | Missing try/except, bare except, swallowed errors, unhandled promise rejections | -| OFF_BY_ONE | Boundary errors, wrong loop bounds, fencepost errors | -| RACE_CONDITION | Concurrency issues, async/await problems, missing locks | -| AUTH_BYPASS | Missing auth checks, SSRF, injection vulnerabilities, security oversights | -| DATA_INTEGRITY | Stale derived fields, out-of-sync caches/embeddings/indexes, source-of-truth divergence | -| MISSING_VERIFICATION | Skipped test steps, unverified outputs, assumed-it-works | -| EDGE_CASE | Unhandled empty input, unexpected types, boundary values | -| PERFORMANCE | N+1 queries, unnecessary recomputation, memory leaks | -| DESIGN | Architectural issues, wrong abstraction level, coupling problems | -| REVIEW_PASSED | Clean review with no findings — positive outcome for tracking purposes | - -## Structured self_assessment Format - -Self-assessment memories follow a structured format with nine fields: - -| Field | Required | Description | -|-------|----------|-------------| -| Category | Yes | Taxonomy category from the Error Taxonomy above (e.g. `NULL_SAFETY`, `ERROR_HANDLING`) | -| Context | No | Where and when — file, task, session date | -| What_happened | Yes | Behavioral description, not narrative | -| Outcomes | No | What were the results? Did things work on first try or require iterations? | -| What_caught_it | No | How the error was discovered (`self-review`, `test`, `user`, `CI`) | -| Estimates_vs_actual | No | Was the complexity assessment accurate? Did tasks take more or less effort? 
| -| Recurring | No | `yes` or `no`; if yes, reference past self_assessment IDs | -| Proposed_update | No | Specific procedural directive to prevent recurrence | -| Iteration_count | No | How many attempts before success (integer). 1 = first try, 2 = one retry, etc. | ## Key Commands @@ -129,10 +94,9 @@ llmem context --session-id # Inject context for a new llmem context --compacting --session-id # Inject key memories during compaction # Session lifecycle hooks -llmem hook --type idle --session-id # Memory extraction + introspection +llmem hook --type idle --session-id # Memory extraction llmem hook --type created --session-id # Context injection on session start -llmem hook --type ending --session-id # Automatic introspection on session end -llmem hook --type ending --session-id --model glm-5.1:cloud --base-url http://localhost:11434 +llmem hook --type ending --session-id # Memory extraction on session end llmem hook --type compacting --session-id # Context during compaction # Dream — background consolidation (decay, boost, promote, merge) @@ -144,26 +108,6 @@ llmem dream --phase rem # Run only the REM pha llmem dream --apply --phase deep # Apply only the deep phase llmem dream --apply --report /path/to/report.html # Generate HTML dream report -# Learn a lesson from a wrong→right correction -llmem learn --wrong "called wrong function" --right "call correctFunction() instead" --context "handler.py:42" - -# Introspect — analyze a failure and store self_assessment memory - -# Manual mode: specify fields directly -llmem introspect --category NULL_SAFETY --what-happened "missing null check" --context "handler.py:42" --caught-by self-review --proposed-fix "always check for None before .field" -llmem introspect --category NULL_SAFETY --what-happened "missing null check" --model glm-5.1:cloud --base-url http://localhost:11434 - -# Automatic mode: introspect a session transcript or arbitrary text -llmem introspect --auto --session # Read transcript from OpenCode adapter 
-llmem introspect --auto --text "Encountered a null pointer error" # Introspect arbitrary text -llmem introspect --auto --session --model glm-5.1:cloud --base-url http://localhost:11434 - -# Track review findings as self_assessment memories (automatic post-review hook) -llmem track-review --single --category NULL_SAFETY --context "handler.py:42" # Single finding (uses --single flag) -llmem track-review --findings /tmp/review-findings.json --context "handler.py" # Batch mode: persist findings from JSON file -llmem track-review --clean # Invalidate all existing track-review memories -llmem track-review # Clean review (no findings) → creates REVIEW_PASSED memory - # Export/import llmem export --output memories.json llmem import memories.json @@ -210,7 +154,6 @@ final_score = rrf_score * (1 - blend) + weighted_signal * blend | procedure | 1.1 | | fact | 1.0 | | project_state | 1.0 | -| self_assessment | 1.0 | | event | 0.9 | | conversation | 0.7 | @@ -222,10 +165,8 @@ final_score = rrf_score * (1 - blend) + weighted_signal * blend - **ANN vector index** — semantic search uses sqlite-vec (`vec0` virtual table) for fast ANN retrieval, with automatic fallback to brute-force cosine similarity if sqlite-vec is not available. - **Confidence** is 0.0-1.0. Higher = more certain. Facts from the user directly should be 0.9+, auto-extracted should be 0.7. - **Context generation** is what gets injected into the system prompt for context. Use `llmem context --session-id ` to preview what gets injected. -- **Session hooks** use `llmem hook --type --session-id `. The idle hook processes the session's transcript, extracts memories, and runs introspection automatically. The ending hook performs automatic introspection on the session transcript and stores a `self_assessment` memory; use `--model` and `--base-url` to configure the LLM for introspection. +- **Session hooks** use `llmem hook --type --session-id `. 
The idle hook processes the session's transcript and extracts memories automatically. The ending hook extracts memories from the session transcript. - **Access tracking** — `llmem get` is read-only and does not update `access_count` or `accessed_at`. Search operations automatically track access — each returned result's `access_count` and `accessed_at` are updated (best-effort). -- **Calibration status metadata** — procedure memories created by behavioral insights receive `calibration_status` (trend: `decreasing`, `stable`, or `increasing`) and `calibrated_at` metadata when calibration runs. Stale procedures get `stale_procedure: true` and `stale_at` metadata. These are visible via `llmem get `. -- **Review outcome tracking** — `llmem track-review` persists review findings as `self_assessment` memories. Three modes: `--single` for a single finding, `--findings ` for batch from JSON, or no flags for a clean review (creates `REVIEW_PASSED` memory). Use `--clean` to invalidate all existing track-review memories before storing new ones. ## Dream — Background Consolidation @@ -234,7 +175,7 @@ The dream system is an automated memory maintenance pipeline that runs three pha - **Light** — finds near-duplicates using cosine similarity (configurable threshold, default 0.92). Produces merge candidates for the deep phase. - **Deep** — decays idle memories (confidence decreases over time), boosts frequently accessed memories, promotes high-scoring memories, and merges near-duplicates using LLM-assisted merge with fallback to concatenation. -- **REM** — extracts themes and clusters from memory, writes a human-readable dream diary to `~/.config/llmem/dream-diary.md`. Self-assessment memories are grouped by error taxonomy category (e.g. "2 self_assessment memories about NULL_SAFETY") for pattern detection. 
When a category has 3+ occurrences (configurable via `skill_patch_threshold`), the REM phase generates three outputs: (1) a **procedural memory** (Tier 1 — automatic, low confidence), (2) a **behavioral insight** entry in `proposed-changes.md` (Tier 2 — human review), and (3) a **skill patch** entry in `proposed-changes.md` marked with `[SKILL PATCH]` (Tier 3 — human review). Skill patches are structured markdown snippets with Detection Rule, Checklist, Pitfall, and Verification sections that can be appended to existing skills or used as mini-skills. They are NOT auto-applied — they require human review and deployment. When behavioral insights are generated, **calibration tracking** compares self_assessment error rates (or average iteration counts) before and after each adaptation was introduced, marking them as effective (decreasing) or ineffective (stable/increasing). Procedure memories that are never accessed and older than `stale_procedure_days` are aggressively decayed (confidence reduced at double the normal decay rate). The dream diary includes a `### Calibration` section with per-category effectiveness and stale procedure counts. +- **REM** — extracts themes and clusters from memory, writes a human-readable dream diary to `~/.config/llmem/dream-diary.md`. Produces type counts, word clusters, and total/active memory counts. **Default mode is dry-run** — use `--apply` to actually make changes. Without it, `llmem dream` only previews what would happen. Use `--report /path/to/report.html` to generate an HTML dream report. 
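The Light phase described above (near-duplicate detection by cosine similarity, default threshold 0.92) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in `internal/dream/dream.go`: the names `cosine_similarity` and `merge_candidates`, the in-memory dict of embeddings, and the inclusive `>=` comparison are all hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_candidates(embeddings, threshold=0.92):
    """Return (id_a, id_b, similarity) for every pair at or above the threshold.

    `embeddings` maps a memory id to its embedding vector. The pairwise scan is
    O(n^2) and only meant to show the idea, not how a real index would do it.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        similarity = cosine_similarity(vec_a, vec_b)
        if similarity >= threshold:  # assumption: the threshold is inclusive
            pairs.append((id_a, id_b, similarity))
    return pairs


# Hypothetical toy embeddings: m1 and m2 point in nearly the same direction.
example = {
    "m1": [1.0, 0.0, 0.0],
    "m2": [0.9, 0.1, 0.0],
    "m3": [0.0, 1.0, 0.0],
}
candidates = merge_candidates(example)
```

With these toy vectors only the `m1`/`m2` pair clears the threshold, so it is the one pair that would be handed to the Deep phase as a merge candidate.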
@@ -253,6 +194,4 @@ The dream system is an automated memory maintenance pipeline that runs three pha | `boost_amount` | 0.05 | Confidence boost amount | | `diary_path` | ~/.config/llmem/dream-diary.md | Path to dream diary file | | `report_path` | (none) | Path for HTML dream report output | -| `behavioral_threshold` | 3 | Minimum self_assessment occurrences to trigger behavioral insight | -| `behavioral_lookback_days` | 30 | Days of self_assessment memories for behavioral insights | | `auto_link_threshold` | (none) | Cosine similarity threshold for auto-linking related memories | diff --git a/plugins/opencode/llmem.js b/plugins/opencode/llmem.js index f8b2a2b..4a051d2 100644 --- a/plugins/opencode/llmem.js +++ b/plugins/opencode/llmem.js @@ -74,30 +74,6 @@ const LLMemPlugin = async function ({ client, $, directory, worktree }) { if (context) { log(client, "info", INJECT_TAG + "\n" + context); } - - const behavioral = run([ - "search", - "behavioral", - "--type", - "self_assessment", - "--limit", - "5", - ]); - if (behavioral) { - log(client, "info", "## LLMem Behavioral Patterns\n" + behavioral); - } - - const proposed = run([ - "search", - "proposed", - "--type", - "procedure", - "--limit", - "5", - ]); - if (proposed) { - log(client, "info", "## LLMem Proposed Procedures\n" + proposed); - } } if (event.type === "session.idle") { diff --git a/skills/introspection-review-tracker/SKILL.md b/skills/introspection-review-tracker/SKILL.md deleted file mode 100644 index 6a0a3b2..0000000 --- a/skills/introspection-review-tracker/SKILL.md +++ /dev/null @@ -1,114 +0,0 @@ ---- -name: introspection-review-tracker -description: > - Reference documentation for the automated ReviewOutcomeTracker hook that persists - adversarial code review findings as self_assessment memories. The hook fires - automatically after every review — no manual steps required. This skill documents - the mapping from severity tiers to error categories and the content format. 
- Triggers on: "review outcome", "track findings", "record review", - "self_assessment". -license: MIT ---- - -# Introspection Review Tracker — Reference Documentation - -This skill is **reference documentation** for the automated review outcome persistence hook. The `llmem track-review` CLI command fires automatically after every adversarial code review, creating `self_assessment` memories without requiring manual agent intervention. This skill documents how the hook works, the severity-to-category mapping, and the content format. - -## How Persistence Works - -Persistence happens **mechanically** via `llmem track-review` (CLI). The agent must run this command after every adversarial code review — it is not optional. - -### CLI: `llmem track-review` - -The primary mechanism. Run after every review: - -```bash -# For a review with findings (batch mode): -llmem track-review --finding-file /tmp/review-findings.json --context "handler.py" - -# For a single finding: -llmem track-review --category NULL_SAFETY --what-happened "missing null check" --context "handler.py:42" --severity Required --caught-by self-review - -# For a clean review (no findings): -llmem track-review --context "handler.py" - -# See categories for a severity tier: -llmem suggest-categories Required -``` - -The `--finding-file` expects a JSON array of finding objects, each with `category`, `what_happened`, and optional `severity` keys. - -### Python API - -The Python API for programmatic review outcome tracking is planned but not yet implemented. Use the `llmem track-review` CLI command as the primary interface. 
Once available, the API will follow this pattern: - -```python -from llmem.store import MemoryStore - -store = MemoryStore(db_path=db_path) - -# Programmatic tracking will mirror the CLI: -# - Single finding → one self_assessment memory -# - Batch findings → one memory per finding -# - Clean review → REVIEW_PASSED memory -# - Category suggestions → via llmem/taxonomy.py constants -``` - -## Verification - -After an adversarial code review completes, verify that the post-review command was run: - -1. Check that at least one `self_assessment` memory was created: - ```bash - llmem search "review_tracker" --type self_assessment - ``` - -2. For each finding, confirm the category matches the severity tier mapping below. - -3. For clean reviews, confirm one `REVIEW_PASSED` memory with outcomes "all clear". - -## When This Skill Is Used - -- **After every adversarial code review completion** — the hook must be run mechanically. -- **To verify** that the hook ran correctly (see Verification above). -- **As reference** for understanding the severity-to-category mapping and content format. - -## Severity-to-Category Mapping - -The `REVIEW_SEVERITY_TAXONOMY` constant in `llmem/taxonomy.py` maps each severity tier to applicable error taxonomy categories. The `llmem suggest-categories` CLI command uses this mapping directly. 
- -| Severity Tier | Applicable Categories | Guidance | -|---|---|---| -| Blocking | AUTH_BYPASS, RACE_CONDITION, DATA_INTEGRITY | Security holes, data corruption risks, logic errors | -| Required | NULL_SAFETY, ERROR_HANDLING, MISSING_VERIFICATION, EDGE_CASE | Quality gaps — slop, missing safety checks, unhandled cases | -| Strong Suggestions | PERFORMANCE, DESIGN | Suboptimal approaches, missing tests, unclear intent | -| Noted | OFF_BY_ONE | Minor style issues, small boundary errors | -| Passed | REVIEW_PASSED | Clean review with no findings — positive outcome | - -These mappings are advisory — the agent should pick the most meaningful category for the actual finding, not follow them mechanically. A Required-tier finding about performance might still map to PERFORMANCE rather than NULL_SAFETY. Note that the reviewer's tier name is "Strong Suggestions" (not just "Suggestions"); the taxonomy key matches this exactly. - -## Content Format - -Memories created by `llmem track-review` use the `SELF_ASSESSMENT_FIELDS` format from `llmem/taxonomy.py:29-39`, ensuring format parity with the `llmem introspect` manual mode: - -``` -Category: -Context: -What_happened: -Outcomes: -What_caught_it: -Estimates_vs_actual: -Recurring: <"yes" or "no"> -Proposed_update: -Iteration_count: -``` - -## Key References - -- **CLI command**: `llmem track-review` — the mechanical post-review hook -- **CLI command**: `llmem suggest-categories` — list categories for a severity tier -- **Error taxonomy categories**: `llmem/taxonomy.py:3-15` → `ERROR_TAXONOMY` -- **Severity mapping**: `llmem/taxonomy.py:21-27` → `REVIEW_SEVERITY_TAXONOMY` -- **Self-assessment fields**: `llmem/taxonomy.py:29-39` → `SELF_ASSESSMENT_FIELDS` -- **Reviewer severity tiers**: Defined by the adversarial review skill (e.g., Blocking, Required, Strong Suggestions, Noted, Passed) — see your review skill's Severity Tiers section -- **llmem introspect command**: `skills/llmem/SKILL.md` (manual and automatic modes) diff 
--git a/skills/introspection/SKILL.md b/skills/introspection/SKILL.md deleted file mode 100644 index a6adf37..0000000 --- a/skills/introspection/SKILL.md +++ /dev/null @@ -1,159 +0,0 @@ ---- -name: introspection -description: > - Operational reference for the LLMem introspection framework. Loads when - doing reflective work, self-assessment, session-end review, or error pattern - analysis. Triggers on: "introspection", "self-assessment", "self-review", - "session end", "reflect", "post-mortem", "sampajanna", "vigilance check", - "introspect", "error taxonomy". -license: MIT ---- - -# Introspection Skill - -This skill is the **executable version** of the introspection framework — concise, actionable rules and procedures, not theory. - -## Section 1: Eight Principles of Machine Introspection - -Each principle is a one-line rule. - -1. **Behavioral, Not Narrative** — Examine what was actually done (outputs, tool calls, errors), not explanations for why. Trust the log, not the narrative. - -2. **Accumulation Over Instance** — Single sessions are unreliable. Patterns across sessions reveal truth. Record consistently so patterns can emerge. - -3. **Specificity Enables Change** — "Be more careful" is useless. "When editing Python database code, always null-guard `.close()` in finally blocks" is actionable. - -4. **Close the Loop or Don't Bother** — Observe → record → detect pattern → modify procedure → re-observe. If the last two steps are missing, the introspection is performative. - -5. **Outside View Over Inside View** — Treat your own output the way you'd treat someone else's. Apply the same skepticism and the same checks. - -6. **Continuous Vigilance, Not Just Bookends** — Monitor during execution, not just before and after. Run sampajanna checks at every breakpoint (see Section 3). - -7. **Externalize Everything** — If a self-assessment isn't persisted in llmem, it didn't happen. The context window resets every session. Memory is the medium of self-knowledge. - -8. 
**Error-Centered, Not Success-Centered** — Failures carry more signal than successes. Record failures specifically — not "I made an error" but "I skipped null-checking in Python DB operations." Specificity enables future pattern-matching. - -## Section 2: Self-Assessment Checklist - -The session-end checklist covers six steps. This section explains how the introspection skill enriches each step. - -**Step 1: Did you search memory before making assumptions?** — Enrichment: before answering "yes", verify by running `llmem search "" --type decision` for each assumption you made. Record any assumption you acted on without a memory search as a `self_assessment` with `--category DESIGN`. - -**Step 2: Did you run task-intake on unfamiliar repos?** — Enrichment: if you skipped task-intake on a repo you hadn't worked in recently, run `llmem introspect --category MISSING_VERIFICATION --what-happened "skipped task-intake before editing in " --context "" --caught-by self-review` to record the gap. - -**Step 3: Did you self-review with an adversarial code reviewer?** — Enrichment: after the review completes, run `llmem track-review` to persist findings as `self_assessment` memories. Then verify that memories were created by checking `llmem search "review_tracker" --type self_assessment`. If the `track-review` command was not run, run it manually as a fallback. - -**Step 4: Did you record findings as self_assessment memories?** — Enrichment: the primary mechanism is `llmem track-review` (CLI) which runs mechanically after every adversarial code review invocation. Use `llmem introspect --category ` as a fallback only for findings the hook missed. For recurring patterns (3+ occurrences), search `llmem search "" --type self_assessment` to check for recurrence before recording. 
-
-**Step 5: Did you commit and push?** — Enrichment: if you skipped this step, record why with `llmem add --type self_assessment "skipped commit/push because "` and flag it as a `MISSING_VERIFICATION` pattern.
-
-**Step 6: Did you record skipped steps and why?** — Enrichment: use `llmem introspect --category MISSING_VERIFICATION --what-happened "skipped because " --context ""` to make skipped steps traceable.
-
-## Section 3: Sampajanna Checks (Continuous Vigilance)
-
-Sampajanna (clear comprehension) is continuous self-monitoring during task execution.
-
-**When to run these checks:** before committing, before declaring done, when switching between subtasks, when a test fails.
-
-### Laxity Check — Am I cutting corners?
-- Origin: Buddhist sampajanna — detecting dullness and sloppiness in one's cognitive state.
-- Triggers: three specific questions about skipped verification, first-answer acceptance, and rushed error handling.
-- Action: If you answer "yes" to any laxity question, stop and address it before continuing. Record the finding with `llmem introspect --category MISSING_VERIFICATION`.
-
-### Excitation Check — Am I going off track?
-- Origin: Sampajanna — detecting agitation and reactivity that causes tangential problem-solving.
-- Triggers: three questions about solving the actual problem, approach divergence, and over-engineering.
-- Action: If you answer "yes" to any excitation question, re-read the original task description. Record with `llmem introspect --category DESIGN`.
-
-### Quality Check — Am I being sloppy?
-- Origin: Sampajanna — monitoring the quality of one's output against standards.
-- Triggers: three questions about error handling, edge cases, and consistency with codebase patterns.
-- Action: If you answer "yes" to any quality question, fix the issue before continuing. Record with the appropriate `ERROR_TAXONOMY` category (e.g., `ERROR_HANDLING`, `EDGE_CASE`, `NULL_SAFETY`).
-
-## Section 4: Error Taxonomy
-
-The canonical source of truth is `llmem/taxonomy.py:3-15`. Category names and descriptions below are reproduced verbatim from that file. When the taxonomy is updated, update `llmem/taxonomy.py` first — the skill follows.
-
-| Category | Description |
-|----------|-------------|
-| NULL_SAFETY | Missing null/None/undefined checks before property access or method calls |
-| ERROR_HANDLING | Missing try/except, bare except, swallowed errors, unhandled promise rejections |
-| OFF_BY_ONE | Boundary errors, wrong loop bounds, fencepost errors |
-| RACE_CONDITION | Concurrency issues, async/await problems, missing locks |
-| AUTH_BYPASS | Missing auth checks, SSRF, injection vulnerabilities, security oversights |
-| DATA_INTEGRITY | Stale derived fields, out-of-sync caches/embeddings/indexes, source-of-truth divergence |
-| MISSING_VERIFICATION | Skipped test steps, unverified outputs, assumed-it-works |
-| EDGE_CASE | Unhandled empty input, unexpected types, boundary values |
-| PERFORMANCE | N+1 queries, unnecessary recomputation, memory leaks |
-| DESIGN | Architectural issues, wrong abstraction level, coupling problems |
-| REVIEW_PASSED | Clean review with no findings — positive outcome for tracking purposes |
-
-### Severity-to-Category Mapping
-
-The `REVIEW_SEVERITY_TAXONOMY` at `llmem/taxonomy.py:21-27` maps reviewer severity tiers to likely error categories:
-
-| Severity Tier | Applicable Categories |
-|---|---|
-| Blocking | AUTH_BYPASS, RACE_CONDITION, DATA_INTEGRITY |
-| Required | NULL_SAFETY, ERROR_HANDLING, MISSING_VERIFICATION, EDGE_CASE |
-| Strong Suggestions | PERFORMANCE, DESIGN |
-| Noted | OFF_BY_ONE |
-| Passed | REVIEW_PASSED |
-
-The `introspection-review-tracker` skill (`skills/introspection-review-tracker/SKILL.md`) bridges these: after an adversarial code review run, it converts each finding into a `self_assessment` memory using the appropriate category.
- -### Using the Taxonomy - -```bash -# Record a self-assessment with a specific category -llmem introspect --category NULL_SAFETY --what-happened "missing null check before .field access" --context "handler.py:42" --caught-by self-review - -# Search for recurring patterns in a category -llmem search "NULL_SAFETY" --type self_assessment - -# Run automatic introspection on a session transcript -llmem introspect --auto --session - -# Run automatic introspection on arbitrary text -llmem introspect --auto --text "Encountered null pointer error when processing user input" -``` - -See `skills/llmem/SKILL.md` for the full `llmem introspect` command reference (manual and automatic modes). - -## Section 5: Outside-View Review Questions - -Principle 5 (Outside View Over Inside View) addresses the introspection illusion (Pronin 2007): people assess themselves more accurately when they treat their own output as if someone else produced it. Vague instructions like "be more careful" or "think harder" do not work. The instruction must direct attention to observable behavior. - -The deployed outside-view procedures are in two locations: - -1. **Contrastive self-assessment** — four specific behavioral checks to run before declaring any task done: verify test results, check actual HTTP responses, read actual output files, compare output against objective standards. - -2. **Adversarial review questions** — four specific questions that force outside-view perspective during self-review: would you flag this issue in someone else's PR, are you giving yourself passes, what would an adversarial reviewer find, and where are you trusting reasoning instead of verifying behavior. If your agent framework provides an adversarial code-review skill, load it and apply its questions. - -3. 
**Pre-PR introspection illusion check** — three checks embedded in a pre-PR review protocol: is the author trusting reasoning without evidence, are there self-serving assumptions, and what would an adversarial reviewer see that the author doesn't. If your agent framework provides a pre-PR review skill, apply its introspection illusion checks. - -**The principle:** Before declaring work done, switch perspective. Treat your output as if someone else wrote it. Apply the standards you'd apply to their code, not your own. Do not trust your intent — verify your output. - -## Section 6: Trigger Conditions - -Load this skill when: - -1. **At session end** — before running the session-end checklist. Load this skill first so the enrichment procedures from Section 2 are available during the checklist. - -2. **After running self-review** — after an adversarial code-review completes, load this skill to run the `introspection-review-tracker` skill and persist findings as `self_assessment` memories. - -3. **When running `llmem introspect`** — load this skill to ensure you're using the correct taxonomy categories and recording all required fields. - -4. **When changing behavioral directives in agent instructions** — any change that affects self-review, vigilance checks, or session-end procedures should reference this skill for consistency. - -5. **When reflecting on errors or patterns** — load this skill before running `llmem search "" --type self_assessment` to check for recurring error patterns. - -These triggers correspond to the keywords in the `description` field: "introspection", "self-assessment", "self-review", "session end", "reflect", "post-mortem", "sampajanna", "vigilance check", "introspect", "error taxonomy". - -## Key References - -- **Error taxonomy (source of truth):** `llmem/taxonomy.py:3-15` (`ERROR_TAXONOMY`), `llmem/taxonomy.py:21-27` (`REVIEW_SEVERITY_TAXONOMY`), `llmem/taxonomy.py:29-39` (`SELF_ASSESSMENT_FIELDS`). 
-- **Review-specific questions:** Apply the outside-view questions from your agent framework's adversarial code-review skill (Section 7, if available). -- **Pre-PR introspection illusion check:** Apply the introspection illusion checks from your agent framework's pre-PR review skill (if available). -- **Review outcome persistence:** `skills/introspection-review-tracker/SKILL.md` — Convert review findings to `self_assessment` memories. -- **llmem introspect command:** `skills/llmem/SKILL.md:152-164` — CLI reference for structured self-assessment. diff --git a/skills/llmem-setup/SKILL.md b/skills/llmem-setup/SKILL.md index 01bc0fc..8a5b190 100644 --- a/skills/llmem-setup/SKILL.md +++ b/skills/llmem-setup/SKILL.md @@ -92,7 +92,7 @@ cd LLMem && npm install ``` This runs `install.js` which: -1. Copies 4 skill directories to `~/.agents/skills/` +1. Copies 2 skill directories to `~/.agents/skills/` 2. Auto-detects your platform (OpenCode, Claude Code, Copilot CLI) 3. Deploys the correct plugin to the right location 4. Deploys OpenCode custom tools to `.opencode/tools/` (if OpenCode detected) @@ -152,10 +152,10 @@ claude --plugin-dir ~/.claude/plugins/llmem ``` The plugin provides: -- **`SessionStart` hook**: Injects `llmem stats` + behavioral patterns + proposed procedures at session start -- **`SessionEnd` hook**: Runs `llmem hook ending` for memory extraction + introspection +- **`SessionStart` hook**: Injects `llmem stats` at session start +- **`SessionEnd` hook**: Runs `llmem hook ending` for memory extraction - **`PreCompact` hook**: Injects key memories before compaction -- **Skills**: `llmem`, `llmem-setup`, `introspection`, `introspection-review-tracker` — loaded on-demand +- **Skills**: `llmem`, `llmem-setup` — loaded on-demand **Instructions in CLAUDE.md — optional.** The `SessionStart` hook injects context. 
If you want a persistent reminder: @@ -197,7 +197,7 @@ llmem add --type fact --content "test memory" llmem search "test" # Skills are discoverable -ls ~/.agents/skills/llmem ~/.agents/skills/introspection +ls ~/.agents/skills/llmem # Plugin deployed # OpenCode: @@ -232,15 +232,13 @@ Runs nightly at 3am by default. Configure in `~/.config/llmem/config.yaml` under Agent Session │ ├── Plugin (auto, no instructions needed) - │ ├── session.created/start → llmem stats + search behavioral/proposed → inject context - │ ├── session.idle/end → llmem hook idle/ending → extract + introspect + │ ├── session.created/start → llmem stats + search → inject context + │ ├── session.idle/end → llmem hook idle/ending → extract memories │ └── session.compacting → llmem context --compacting → preserve key memories │ ├── Skills (on-demand, loaded by trigger) │ ├── llmem → CLI reference, memory types, commands - │ ├── llmem-setup → This file - │ ├── introspection → Self-assessment framework, error taxonomy - │ └── introspection-review-tracker → Review outcome tracking + │ └── llmem-setup → This file │ └── Custom Tools (structural, zero-instruction) ├── llmem-search → Search memories @@ -263,4 +261,4 @@ The plugin handles everything the agent physically cannot do itself (inject cont **Skills not discovered** — Verify skill directories: `ls ~/.agents/skills/llmem/`. If missing, re-run `node install.js`. -**Context not injected at session start** — Check the plugin log. For OpenCode, run `llmem stats` and `llmem search behavioral --type self_assessment --limit 5` manually to verify the commands work. The plugin runs these same commands. \ No newline at end of file +**Context not injected at session start** — Check the plugin log. For OpenCode, run `llmem stats` manually to verify the command works. The plugin runs these same commands. 
\ No newline at end of file diff --git a/skills/llmem/SKILL.md b/skills/llmem/SKILL.md index 741287f..2bc4c4f 100644 --- a/skills/llmem/SKILL.md +++ b/skills/llmem/SKILL.md @@ -51,41 +51,6 @@ copilot: | project_state | Current status of a project or system | | procedure | How-to knowledge, step sequences | | conversation | Notable conversation outcomes or commitments | -| self_assessment | Structured introspective records — error patterns, behavioral corrections, recurring mistakes, proposed procedural updates | - -## Error Taxonomy - -Self-assessment memories are categorized using a standard error taxonomy. Each category identifies a class of mistake that the introspection system tracks for pattern detection: - -| Category | Description | -|----------|-------------| -| NULL_SAFETY | Missing null/None/undefined checks before property access or method calls | -| ERROR_HANDLING | Missing try/except, bare except, swallowed errors, unhandled promise rejections | -| OFF_BY_ONE | Boundary errors, wrong loop bounds, fencepost errors | -| RACE_CONDITION | Concurrency issues, async/await problems, missing locks | -| AUTH_BYPASS | Missing auth checks, SSRF, injection vulnerabilities, security oversights | -| DATA_INTEGRITY | Stale derived fields, out-of-sync caches/embeddings/indexes, source-of-truth divergence | -| MISSING_VERIFICATION | Skipped test steps, unverified outputs, assumed-it-works | -| EDGE_CASE | Unhandled empty input, unexpected types, boundary values | -| PERFORMANCE | N+1 queries, unnecessary recomputation, memory leaks | -| DESIGN | Architectural issues, wrong abstraction level, coupling problems | -| REVIEW_PASSED | Clean review with no findings — positive outcome for tracking purposes | - -## Structured self_assessment Format - -Self-assessment memories follow a structured format with nine fields: - -| Field | Required | Description | -|-------|----------|-------------| -| Category | Yes | Taxonomy category from the Error Taxonomy above (e.g. 
`NULL_SAFETY`, `ERROR_HANDLING`) | -| Context | No | Where and when — file, task, session date | -| What_happened | Yes | Behavioral description, not narrative | -| Outcomes | No | What were the results? Did things work on first try or require iterations? | -| What_caught_it | No | How the error was discovered (`self-review`, `test`, `user`, `CI`) | -| Estimates_vs_actual | No | Was the complexity assessment accurate? Did tasks take more or less effort? | -| Recurring | No | `yes` or `no`; if yes, reference past self_assessment IDs | -| Proposed_update | No | Specific procedural directive to prevent recurrence | -| Iteration_count | No | How many attempts before success (integer). 1 = first try, 2 = one retry, etc. | ## Key Commands @@ -129,10 +94,9 @@ llmem context --session-id # Inject context for a new llmem context --compacting --session-id # Inject key memories during compaction # Session lifecycle hooks -llmem hook --type idle --session-id # Memory extraction + introspection +llmem hook --type idle --session-id # Memory extraction llmem hook --type created --session-id # Context injection on session start -llmem hook --type ending --session-id # Automatic introspection on session end -llmem hook --type ending --session-id --model glm-5.1:cloud --base-url http://localhost:11434 +llmem hook --type ending --session-id # Memory extraction on session end llmem hook --type compacting --session-id # Context during compaction # Dream — background consolidation (decay, boost, promote, merge) @@ -144,26 +108,6 @@ llmem dream --phase rem # Run only the REM pha llmem dream --apply --phase deep # Apply only the deep phase llmem dream --apply --report /path/to/report.html # Generate HTML dream report -# Learn a lesson from a wrong→right correction -llmem learn --wrong "called wrong function" --right "call correctFunction() instead" --context "handler.py:42" - -# Introspect — analyze a failure and store self_assessment memory - -# Manual mode: specify fields directly -llmem 
introspect --category NULL_SAFETY --what-happened "missing null check" --context "handler.py:42" --caught-by self-review --proposed-fix "always check for None before .field" -llmem introspect --category NULL_SAFETY --what-happened "missing null check" --model glm-5.1:cloud --base-url http://localhost:11434 - -# Automatic mode: introspect a session transcript or arbitrary text -llmem introspect --auto --session # Read transcript from OpenCode adapter -llmem introspect --auto --text "Encountered a null pointer error" # Introspect arbitrary text -llmem introspect --auto --session --model glm-5.1:cloud --base-url http://localhost:11434 - -# Track review findings as self_assessment memories (automatic post-review hook) -llmem track-review --single --category NULL_SAFETY --context "handler.py:42" # Single finding (uses --single flag) -llmem track-review --findings /tmp/review-findings.json --context "handler.py" # Batch mode: persist findings from JSON file -llmem track-review --clean # Invalidate all existing track-review memories -llmem track-review # Clean review (no findings) → creates REVIEW_PASSED memory - # Export/import llmem export --output memories.json llmem import memories.json @@ -210,7 +154,6 @@ final_score = rrf_score * (1 - blend) + weighted_signal * blend | procedure | 1.1 | | fact | 1.0 | | project_state | 1.0 | -| self_assessment | 1.0 | | event | 0.9 | | conversation | 0.7 | @@ -222,10 +165,8 @@ final_score = rrf_score * (1 - blend) + weighted_signal * blend - **ANN vector index** — semantic search uses sqlite-vec (`vec0` virtual table) for fast ANN retrieval, with automatic fallback to brute-force cosine similarity if sqlite-vec is not available. - **Confidence** is 0.0-1.0. Higher = more certain. Facts from the user directly should be 0.9+, auto-extracted should be 0.7. - **Context generation** is what gets injected into the system prompt for context. Use `llmem context --session-id ` to preview what gets injected. 
-- **Session hooks** use `llmem hook --type --session-id `. The idle hook processes the session's transcript, extracts memories, and runs introspection automatically. The ending hook performs automatic introspection on the session transcript and stores a `self_assessment` memory; use `--model` and `--base-url` to configure the LLM for introspection. +- **Session hooks** use `llmem hook --type --session-id `. The idle hook processes the session's transcript and extracts memories automatically. The ending hook extracts memories from the session transcript. - **Access tracking** — `llmem get` is read-only and does not update `access_count` or `accessed_at`. Search operations automatically track access — each returned result's `access_count` and `accessed_at` are updated (best-effort). -- **Calibration status metadata** — procedure memories created by behavioral insights receive `calibration_status` (trend: `decreasing`, `stable`, or `increasing`) and `calibrated_at` metadata when calibration runs. Stale procedures get `stale_procedure: true` and `stale_at` metadata. These are visible via `llmem get `. -- **Review outcome tracking** — `llmem track-review` persists review findings as `self_assessment` memories. Three modes: `--single` for a single finding, `--findings ` for batch from JSON, or no flags for a clean review (creates `REVIEW_PASSED` memory). Use `--clean` to invalidate all existing track-review memories before storing new ones. ## Dream — Background Consolidation @@ -234,7 +175,7 @@ The dream system is an automated memory maintenance pipeline that runs three pha - **Light** — finds near-duplicates using cosine similarity (configurable threshold, default 0.92). Produces merge candidates for the deep phase. - **Deep** — decays idle memories (confidence decreases over time), boosts frequently accessed memories, promotes high-scoring memories, and merges near-duplicates using LLM-assisted merge with fallback to concatenation. 
-- **REM** — extracts themes and clusters from memory, writes a human-readable dream diary to `~/.config/llmem/dream-diary.md`. Self-assessment memories are grouped by error taxonomy category (e.g. "2 self_assessment memories about NULL_SAFETY") for pattern detection. When a category has 3+ occurrences (configurable via `skill_patch_threshold`), the REM phase generates three outputs: (1) a **procedural memory** (Tier 1 — automatic, low confidence), (2) a **behavioral insight** entry in `proposed-changes.md` (Tier 2 — human review), and (3) a **skill patch** entry in `proposed-changes.md` marked with `[SKILL PATCH]` (Tier 3 — human review). Skill patches are structured markdown snippets with Detection Rule, Checklist, Pitfall, and Verification sections that can be appended to existing skills or used as mini-skills. They are NOT auto-applied — they require human review and deployment. When behavioral insights are generated, **calibration tracking** compares self_assessment error rates (or average iteration counts) before and after each adaptation was introduced, marking them as effective (decreasing) or ineffective (stable/increasing). Procedure memories that are never accessed and older than `stale_procedure_days` are aggressively decayed (confidence reduced at double the normal decay rate). The dream diary includes a `### Calibration` section with per-category effectiveness and stale procedure counts. +- **REM** — extracts themes and clusters from memory, writes a human-readable dream diary to `~/.config/llmem/dream-diary.md`. Produces type counts, word clusters, and total/active memory counts. **Default mode is dry-run** — use `--apply` to actually make changes. Without it, `llmem dream` only previews what would happen. Use `--report /path/to/report.html` to generate an HTML dream report. 
@@ -253,6 +194,4 @@ The dream system is an automated memory maintenance pipeline that runs three pha
 | `boost_amount` | 0.05 | Confidence boost amount |
 | `diary_path` | ~/.config/llmem/dream-diary.md | Path to dream diary file |
 | `report_path` | (none) | Path for HTML dream report output |
-| `behavioral_threshold` | 3 | Minimum self_assessment occurrences to trigger behavioral insight |
-| `behavioral_lookback_days` | 30 | Days of self_assessment memories for behavioral insights |
 | `auto_link_threshold` | (none) | Cosine similarity threshold for auto-linking related memories |
From 5ce506d77b1c4bc1f0aff08a4b4b7b5c829eb3f0 Mon Sep 17 00:00:00 2001
From: Lobsterdog Contributors
Date: Thu, 21 May 2026 19:13:04 -0600
Subject: [PATCH 2/2] fix: remove introspection skills from test.js expected list

---
 test.js | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/test.js b/test.js
index dba56de..e0ef521 100644
--- a/test.js
+++ b/test.js
@@ -20,9 +20,7 @@ const TEMPLATES_DIR = path.join(__dirname, 'templates');
 const TOOLS_DIR = path.join(__dirname, '.opencode', 'tools');
 const EXPECTED_SKILLS = [
   'llmem',
-  'llmem-setup',
-  'introspection',
-  'introspection-review-tracker'
+  'llmem-setup'
 ];
 const EXPECTED_TEMPLATES = [
   'rules.md',