diff --git a/docs/.vitepress/rfd-summaries.json b/docs/.vitepress/rfd-summaries.json index 0b9167af..3e269cfc 100644 --- a/docs/.vitepress/rfd-summaries.json +++ b/docs/.vitepress/rfd-summaries.json @@ -72,7 +72,7 @@ "summary": "Abandoned RFD split into RFD 048 and RFD 049; preserved for historical context on non-interactive mode design." }, "020-parallel-conversations.md": { - "hash": "81206c64fcf58682afb91178c17bfc945489ea89ecc380428798a5fa1a197724", + "hash": "75ceb2a99600c700dc1fd304961cc5fdd5cdbe3391017b515a1496bab49caf9e", "summary": "Replace global active conversation with per-session tracking and add conversation locks for parallel terminal work." }, "021-printer-live-redirection.md": { @@ -206,5 +206,9 @@ "052-workspace-data-store-sanitization.md": { "hash": "199bf135a50791b21bf4c5ef6028012334cef5f8be9acb9cea14c54f1f461621", "summary": "Workspace sanitization validates and repairs corrupted conversations, trashing broken data with recovery explanations." + }, + "053-auto-refresh-conversation-titles.md": { + "hash": "42b4f764ddcb1a2f2177a2b9d61d56177b723ce40bba39038a0a19370f6c0dfd", + "summary": "Auto-refresh stale conversation titles periodically by re-running LLM generation when accumulated turns exceed a configured threshold." } } diff --git a/docs/rfd/020-parallel-conversations.md b/docs/rfd/020-parallel-conversations.md index c77220a3..21cb0073 100644 --- a/docs/rfd/020-parallel-conversations.md +++ b/docs/rfd/020-parallel-conversations.md @@ -302,8 +302,8 @@ pub struct ConversationLock { obtain a `ConversationLock` is through this method. The lock is acquired once at the start of `jp query` and lives in a long-lived scope (e.g., `Ctx`). -Mutable access to a conversation's event stream requires a reference to the -lock: +Mutable access to a conversation's event stream or metadata requires a reference +to the lock: ```rust impl Workspace { @@ -312,26 +312,35 @@ impl Workspace { id: &ConversationId, _lock: &ConversationLock, ) -> Option<&mut ConversationStream> { /* ... */ } + + pub fn get_conversation_mut( + &mut self, + id: &ConversationId, + _lock: &ConversationLock, + ) -> Option> { /* ... */ } } ``` -All existing mutation methods stay on `ConversationStream`. The only change is -that `get_events_mut` (and `try_get_events_mut`) take a `&ConversationLock` -parameter. Call sites add one argument: +All existing mutation methods stay on `ConversationStream` and `Conversation`. +The change is that `get_events_mut`, `try_get_events_mut`, +`get_conversation_mut`, and `try_get_conversation_mut` take a +`&ConversationLock` parameter. Call sites add one argument: ```rust // before workspace.try_get_events_mut(&cid)?.add_config_delta(delta); +workspace.try_get_conversation_mut(&cid)?.title = Some(title); // after workspace.try_get_events_mut(&cid, &lock)?.add_config_delta(delta); +workspace.try_get_conversation_mut(&cid, &lock)?.title = Some(title); ``` This enforces the lock-before-mutate invariant at the API boundary. You cannot -call `get_events_mut` without proof that the process holds the lock. -`ConversationLock` is held for the entire `jp query` run, so any `&mut -ConversationStream` obtained through it is guaranteed to be protected by the -lock for its entire lifetime. +call `get_events_mut` or `get_conversation_mut` without proof that the process +holds the lock. `ConversationLock` is held for the entire `jp query` run, so any +mutable reference obtained through it is guaranteed to be protected by the lock +for its entire lifetime. The lock file is deleted in the `Drop` implementation of `ConversationLock`. diff --git a/docs/rfd/053-auto-refresh-conversation-titles.md b/docs/rfd/053-auto-refresh-conversation-titles.md new file mode 100644 index 00000000..7ff52384 --- /dev/null +++ b/docs/rfd/053-auto-refresh-conversation-titles.md @@ -0,0 +1,426 @@ +# RFD 053: Auto-Refresh Conversation Titles + +- **Status**: Discussion +- **Category**: Design +- **Authors**: Jean Mertz +- **Date**: 2026-03-18 + +## Summary + +Conversation titles are generated once after the first turn and never +automatically updated. This RFD adds periodic refresh: when +`conversation.title.generate.auto_refresh.turn_interval` is set to a positive +integer N (default 5), the least recently activated conversations that have +accumulated N new turns since their titles were last generated are re-titled as +background tasks on the next `jp query` run. + +## Motivation + +A title is generated on the first turn of a new conversation and then frozen. +This works well for short, focused conversations, but longer ones take +unexpected turns and end up with a title that describes only the opening +exchange. The user is left with a list of conversations whose titles no longer +reflect what they contain. + +The user can already run `conversation edit --title` to manually regenerate a +title, but this requires noticing the problem and taking action. Periodic +automatic refresh should be transparent. + +The fix needs to be careful about cost. Triggering a new LLM request on every +`jp query` invocation for every stale conversation in the workspace would spike +API usage and potentially delay the CLI on exit. This design processes a bounded +number of conversations per run in the background — the same pattern already +used for initial title generation. + +## Design + +### Configuration + +A new `auto_refresh` sub-table is added to `conversation.title.generate`: + +```toml +[conversation.title.generate] +auto = true +model = ... + +[conversation.title.generate.auto_refresh] +turn_interval = 5 # refresh every N turns; 0 = disabled (default = 5) +batch_size = 1 # max conversations to refresh per run; or "all" (default = 1) +turn_context = 10 # max turns sent to LLM for re-titling (default = 10) +``` + +`turn_interval = 0` disables the feature entirely. + +`batch_size` controls how many stale conversations are refreshed per `jp query` +invocation. The default of `1` spreads the work across runs. Setting it to +`"all"` refreshes every stale conversation in a single run — useful for catching +up after enabling the feature on a workspace with many long-running +conversations, at the cost of more LLM requests. + +`turn_context` limits how many recent turns are sent to the LLM when +re-generating a title. For long conversations, earlier turns are often +irrelevant to what the conversation is currently about. The default of `10` +keeps costs predictable and focuses the title on recent activity. Setting it to +`false` disables the limit and sends the full conversation. + +These fields map to a new nested `AutoRefreshConfig` on `GenerateConfig`: + +```rust +pub struct AutoRefreshConfig { + pub turn_interval: usize, // default 5 + pub batch_size: BatchSize, // default Count(1) + pub turn_context: Option, // default Some(10), None = unlimited +} + +enum BatchSize { + Count(usize), + All, +} +``` + +When `conversation.title.generate.auto = false`, auto-refresh is also disabled +regardless of `turn_interval`. + +### Persisted state: `title_generated_at_turn` + +To determine staleness, the system needs to know how many turns existed when the +title was last generated. A new optional field is added to `Conversation` in +`metadata.json`: + +```rust +/// Turn count when the title was last auto-generated. +/// +/// - `None` — legacy conversation; treated as baseline 0 (auto-enrolls in +/// refresh once it has accumulated `turn_interval` turns). +/// - `Some(n)` — title was auto-generated when the conversation had `n` turns. +#[serde(default, skip_serializing_if = "Option::is_none")] +pub title_generated_at_turn: Option, +``` + +This field is a pure watermark, not configuration. It lives in `metadata.json` +alongside `title` because they are tightly coupled — one is the output, the +other is the checkpoint that governs when it is regenerated. + +| Value | Meaning | Auto-refresh eligible? | +|-----------|-----------------------------------|------------------------| +| `None` | Legacy conversation (pre-feature) | Yes, baseline 0 | +| `Some(n)` | Auto-generated at turn `n` | Yes, baseline `n` | + +### Interaction with `conversation edit --title` + +`conversation edit --title` (no argument) regenerates the title via LLM. This +sets `title_generated_at_turn = Some(current_turn_count)`, enrolling the +conversation in future auto-refresh with the current position as baseline. + +`conversation edit --title "My Custom Title"` sets a user-provided title and +writes a `ConfigDelta` event with `auto_refresh.turn_interval = 0` to the +conversation's event stream, disabling auto-refresh for that conversation. The +user explicitly chose this title; the system should not overwrite it. To +re-enable auto-refresh later, the user can run `config set +conversation.title.generate.auto_refresh.turn_interval 5` (or any positive +value) to write a new `ConfigDelta` that re-enrolls the conversation. + +`conversation edit --no-title` removes the title entirely and writes a +`ConfigDelta` event with `auto_refresh.turn_interval = 0`, disabling +auto-refresh for that conversation. The user explicitly chose to have no title; +the system should not generate one later. + +### Background task + +A new `TitleRefreshTask` runs the full refresh pipeline in the background: +candidate scanning, turn counting, stream loading, and LLM calls. The main +thread's only responsibility is spawning the task — all heavy I/O happens off +the critical path. + +#### Spawn (main thread) + +At the start of a `jp query` invocation, after the workspace is loaded, a single +`TitleRefreshTask` is spawned if `turn_interval > 0` and `auto = true`. The task +receives: + +- A `Storage` handle for filesystem access. `Storage` encapsulates directory + structure, dual-root resolution (workspace + user), and file I/O. The task + uses its existing methods (`load_all_conversation_ids`, + `load_conversation_metadata`, `load_conversation_stream`) rather than + performing raw filesystem walks. +- The active conversation ID (to exclude it from candidacy). +- The `AutoRefreshConfig` from configuration. +- Provider and model configuration for the LLM call. + +No conversation scanning or event file reading happens on the main thread. + +#### Run (background) + +The task performs the following steps inside `run()`: + +1. List all conversation IDs via `Storage::load_all_conversation_ids`. +2. For each conversation, load metadata via + `Storage::load_conversation_metadata` to get `title`, + `title_generated_at_turn`, `last_activated_at`, and `turn_count`. + + `load_conversation_metadata` already calls the lightweight + `load_count_and_timestamp_events` to populate `events_count` and + `last_event_at` from `events.json`. This function is extended to also + deserialize the `type` field and count `turn_start` events, populating a new + `turn_count` field on `Conversation`. The extension adds one field to the + internal `RawEvent` struct: + + ```rust + #[derive(serde::Deserialize)] + struct RawEvent { + timestamp: Box, + #[serde(rename = "type")] + event_type: Box, + } + ``` + + A `turn_start` event is counted when `event_type.get()` equals + `"\"turn_start\""`. This avoids a second pass over `events.json`. + +3. Skip the active conversation (it is being actively worked on). +4. Compute staleness: a conversation is stale if `turn_count >= baseline + + turn_interval`, where `baseline` is `title_generated_at_turn.unwrap_or(0)`. +5. Sort stale conversations by `last_activated_at` ascending (least recently + active first). +6. Take up to `batch_size` candidates. +7. For each candidate, load the full `ConversationStream` via + `Storage::load_conversation_stream`. Check the conversation's merged config + via `stream.config()` — if `turn_interval` has been overridden to `0` via a + `ConfigDelta` (e.g., by `conversation edit --title "..."`), skip this + candidate. Otherwise, scope the stream to the last `turn_context` turns (if + configured) and make the LLM title generation call. + +If a file read fails (e.g., partially written by a concurrent session), the task +skips that conversation and moves to the next. + +The task checks its `CancellationToken` between each candidate. If cancelled +(e.g., the user's query has finished and `TaskHandler` is shutting down), the +task stops immediately — it does not wait for an in-flight LLM response or +process remaining candidates. Any titles already generated by completed +candidates are kept and synced; incomplete work is discarded. This avoids +delaying CLI exit when `batch_size` is large or set to `"all"`. + +#### Title retention schema + +To avoid unnecessary title churn, the title generation schema is extended with a +`retain_current` field. The LLM receives the current title in its prompt and can +indicate that it is still adequate: + +```json +{ + "retain_current": false, + "titles": [ + "New title suggestion" + ] +} +``` + +The prompt includes: + +> The conversation currently has the title: "{current_title}". If this title +> still accurately describes the conversation, set `retain_current` to `true`. +> Only generate new titles if the conversation has meaningfully changed +> direction. + +When `retain_current` is `true`, the task advances the `title_generated_at_turn` +checkpoint (recording that the title was evaluated) but leaves `title` +unchanged. This prevents the same conversation from being re-evaluated on every +run while keeping its perfectly good title. + +#### Context window safety + +The `turn_context` setting (default 10) provides the first line of defense +against oversized requests: only the most recent N turns are sent to the LLM. +This scoping happens before any token-level checks. + +The title generation model may still have a smaller context window than those N +turns require. The inquiry system already solves this problem: it estimates +char-based token counts and drops older events to fit the model's context +window. + +The core truncation logic — estimate chars, compare to budget, drop oldest +events, re-sanitize — is extracted from `jp_cli::cmd::query::tool::inquiry` into +a shared utility (in `jp_llm` or `jp_conversation`) that both the inquiry +backend and the title generator can use. Each caller computes its own overhead +(the inquiry system accounts for tools, attachments, and cache-preserving +granularity; the title generator only needs system prompt and title +instructions). + +The pipeline for each candidate is: scope to last `turn_context` turns > +estimate chars > truncate if over budget > send to LLM. + +#### Sync (main thread) + +After the background task completes, `sync` writes results back to the +workspace. For each successfully refreshed candidate: + +1. Update `conversation.title` (unless `retain_current` was `true`). +2. Set `conversation.title_generated_at_turn = Some(turn_count_at_evaluation)`, + where `turn_count_at_evaluation` is the turn count observed by the background + task when it read the conversation's events. + +Using the count at evaluation time rather than at sync time means the checkpoint +advances by what was true when the decision was made, not by any turns added +during the current session. + +If a candidate fails (LLM error, parse failure), the task logs a warning and +skips it. Successful candidates are still synced. + +### Interaction with conversation locks ([RFD 020]) + +Once [RFD 020] lands, conversations are protected by exclusive file locks during +write operations. The title refresh task interacts with locks as follows: + +**Read phase (`run`):** The task reads `metadata.json` and `events.json` through +`Storage` without acquiring a lock. Read-only access does not require a lock per +[RFD 020]'s model. If a concurrent session is writing to a conversation and the +file is partially written, the JSON parse fails and the task skips that +conversation. + +**Write phase (`sync`):** The task attempts a non-blocking lock (`try_lock`) on +each candidate conversation before writing. If the lock is held by another +session, the title update for that conversation is discarded — the work is lost +but no corruption occurs. The conversation will be retried on the next eligible +run. If the lock is free, the task acquires it, writes the metadata update, and +releases it. + +This approach avoids blocking the CLI exit on lock contention and naturally +handles the common case: stale conversations are by definition idle, so lock +contention on them is rare. + +### Spawn location + +The `TitleRefreshTask` is spawned in `query.rs`, alongside the existing +first-turn title spawn. This restricts title refresh to `jp query` — the only +command with a meaningful conversation lifetime and where an LLM call is already +expected. Short-lived commands (`conversation ls`, `conversation edit`, etc.) do +not trigger it. + +The existing first-turn title spawn in `query.rs` is updated to set +`title_generated_at_turn = Some(1)`, so all new conversations have a baseline +and become eligible for future auto-refresh. + +## Drawbacks + +Each `jp query` run spawns a background task that loads metadata for every +conversation via `Storage`. Since `load_conversation_metadata` already reads +`events.json` (for `events_count` and `last_event_at`), the turn counting +extension adds no extra file reads — it piggybacks on the existing lightweight +parse. For workspaces with hundreds of conversations this is nonzero I/O, though +it happens entirely in the background and does not delay the user's query. + +The `retain_current` schema adds a small amount of complexity to the title +generation prompt and response handling. Models may occasionally set +`retain_current = false` and produce a title that is semantically identical to +the original, causing cosmetic churn. This is a minor nuisance, not a +correctness issue. + +## Alternatives + +**Timestamp-based staleness.** Track when the title was generated and refresh if +enough time has elapsed. Rejected: time is a weaker signal than turns. A +conversation that receives one turn per day and one that receives twenty turns +per hour have the same time-based staleness but very different content drift. + +**Use `events_count` as a proxy for turns.** Already computed and readily +available. Rejected: it's imprecise. A single turn with heavy tool use generates +many events; the threshold would behave inconsistently across different usage +patterns. Turn count is the right unit. + +**Cache `turn_count` in `metadata.json`.** Avoids reading `events.json` during +candidacy checks. Rejected: this introduces derived state from `events.json` +into `metadata.json`, breaking the convention that all conversation-level +behavioral state flows through the event stream's `ConfigDelta`. The background +task architecture makes this optimization unnecessary — the I/O happens off the +critical path. + +**Scan on the main thread, load streams in the background.** Perform candidate +selection synchronously and only push the LLM call to the background. Rejected: +candidate scanning requires reading `metadata.json` for every conversation and +`events.json` for stale candidates. This forces eager loading of all +conversation metadata on the main thread, changing `jp query` startup from O(1) +disk reads (active conversation only) to O(N). Moving the entire pipeline to the +background keeps startup cost at O(1). + +## Non-Goals + +This RFD does not change when or how the initial title is generated. The +first-turn behavior is unchanged except for setting `title_generated_at_turn`. + +It does not add any user-visible indication that a title was refreshed in the +background. + +## Risks and Open Questions + +**Concurrent CLI runs.** Two simultaneous `jp query` invocations could both +spawn a `TitleRefreshTask` that selects the same stale conversation. The result +is two LLM requests producing the same (or a slightly different) title — no data +corruption, just a wasted request. Once [RFD 020] lands, the `sync`-phase +locking prevents concurrent metadata writes; the second task's `try_lock` fails +and the update is discarded. + +**Token cost of re-titling long conversations.** The `turn_context` default of +10 bounds the typical cost, but users who set `turn_context = false` (unlimited) +or have very long individual turns may still send large payloads. The context +window truncation utility provides a hard safety net, but the cost scales with +the retained context size. Worth monitoring once the feature ships. + +**Title quality on truncated context.** Both `turn_context` scoping and context +window truncation mean the LLM sees only a suffix of the conversation. The +generated title will reflect recent activity rather than the full arc. This is +an acceptable trade-off — recent activity is usually more relevant to what the +user is currently working on — but users should be aware that titles may shift +focus as the conversation evolves. + +## Implementation Plan + +### Phase 0: Shared truncation utility (independent) + +- Extract the core truncation logic (estimate chars, compare to budget, drop + oldest events, re-sanitize) from `jp_cli::cmd::query::tool::inquiry` into a + shared utility in `jp_llm` or `jp_conversation`. +- Update the inquiry backend to use the shared utility. +- Update `TitleGeneratorTask::update_title` to truncate the event stream when + the title model's context window is smaller than the conversation. + +### Phase 1: State (independent) + +- Add `title_generated_at_turn: Option` to `Conversation` in + `jp_conversation`. +- Add `turn_count: usize` (computed, `#[serde(skip)]`) to `Conversation`. +- Extend `load_count_and_timestamp_events` in `jp_storage` to count `turn_start` + events and populate `turn_count`. +- Update `conversation edit --title` (no argument) to set + `title_generated_at_turn = Some(current_turn_count)` after LLM generation. +- Update `conversation edit --title "..."` (user-provided) to write a + `ConfigDelta` with `auto_refresh.turn_interval = 0`. +- Update `conversation edit --no-title` to write a `ConfigDelta` with + `auto_refresh.turn_interval = 0`. + +### Phase 2: Configuration (independent) + +- Add `AutoRefreshConfig` (with `turn_interval`, `batch_size`, `turn_context`) + as a nested config on `GenerateConfig` in `jp_config`. +- Wire through `AssignKeyValue`, `PartialConfigDelta`, and `ToPartial` impls. + +### Phase 3: Title retention schema (independent) + +- Extend `title_schema` and `title_instructions` in `jp_llm::title` with the + `retain_current` field and current-title prompt context. +- Update `extract_titles` (or add a companion function) to handle the + `retain_current` response. + +### Phase 4: Task and spawn (depends on Phase 0, 1, 2, 3) + +- Implement `TitleRefreshTask` with the full background pipeline: directory + walk, metadata reading, turn counting, stream loading, LLM calls, and + per-candidate sync with `try_lock`. +- Update the existing first-turn title spawn in `query.rs` to set + `title_generated_at_turn = Some(1)`. +- Spawn `TitleRefreshTask` in `query.rs` when `turn_interval > 0` and `auto = + true`. + +Phases 0–3 can be reviewed and merged independently. Phase 4 depends on all of +them. + +[RFD 020]: 020-parallel-conversations.md