Handle JSONL compaction events in file watcher #2
Description
Problem
The JSONL file watcher (server/watcher.py) tracks read positions per file using a simple offset dictionary:

```python
# Line 16
_file_positions: dict[str, int] = {}
```

When reading new content, it seeks to the last known position and reads forward:
```python
# Lines 265-269
with open(file_path, "r", encoding="utf-8", errors="replace") as f:
    f.seek(last_pos)
    new_lines = f.readlines()
    _file_positions[file_path] = f.tell()
```

The problem: When Claude Code compacts a session, it rewrites the JSONL file. The file's structure changes: it may be shorter than the previously recorded position, entries may be consolidated, and the byte offsets shift entirely.
What happens today during compaction
1. Claude Code rewrites/truncates the JSONL file (the file becomes shorter).
2. `watchfiles` (using kqueue on macOS / inotify on Linux) detects the modification.
3. `_schedule_process()` is called, which debounces for 1 second (`DEBOUNCE_SECONDS = 1.0` at line 19).
4. `_process_file_changes()` runs:
   - `last_pos` is the old position (e.g., byte 50000)
   - `f.seek(50000)` is executed on a file that's now only 20000 bytes
   - `f.readlines()` returns an empty list (we're past EOF)
   - `_file_positions[file_path]` is updated to the end of the shorter file
5. Result: All new content in the compacted file is silently missed. The watcher thinks it's caught up, but it skipped everything.
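The silent failure in step 4 is easy to reproduce outside the watcher. A minimal sketch (the JSONL contents here are illustrative, not real Claude Code entries):

```python
import os
import tempfile

# Write a "long" JSONL file and pretend the watcher has consumed all of it.
fd, path = tempfile.mkstemp(suffix=".jsonl")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write('{"type": "user", "text": "hello"}\n' * 100)
last_pos = os.path.getsize(path)  # watcher is "caught up" at EOF

# Simulate compaction: the file is rewritten and is now much shorter.
with open(path, "w", encoding="utf-8") as f:
    f.write('{"type": "summary", "text": "compacted"}\n')

# The watcher's read logic: seeking past EOF raises no error, and
# readlines() simply returns nothing.
with open(path, "r", encoding="utf-8") as f:
    f.seek(last_pos)
    new_lines = f.readlines()

print(new_lines)  # [] -- the compacted content is silently missed
os.unlink(path)
```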
Additional compaction scenarios
- File replacement (write to temp, rename): On some systems, `watchfiles` may see this as a `Change.added` event for the new file. The old position is still stored under the same path key, leading to the same seek-past-EOF issue.
- Multiple rapid compactions: If compaction happens during the debounce window, only the final state is processed; intermediate states are lost (which is actually fine for compaction, but the position tracking is still broken).
What needs to happen
1. Detect file truncation/rewrite
Before seeking to last_pos, check if the file has been truncated:
```python
import os

file_size = os.path.getsize(file_path)
last_pos = _file_positions.get(file_path, 0)
if file_size < last_pos:
    # File was truncated/rewritten -- reset position and re-read from start
    logger.info("File %s appears compacted (size %d < last_pos %d), re-reading",
                file_path, file_size, last_pos)
    last_pos = 0
```

This is the minimum fix. If `file_size < last_pos`, the file has been rewritten and we need to re-read from the beginning.
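Folded into the watcher's read path, the whole fix could look like this (a sketch; `read_new_lines` is a hypothetical helper for illustration, not the actual watcher function):

```python
import logging
import os

logger = logging.getLogger(__name__)
_file_positions: dict[str, int] = {}

def read_new_lines(file_path: str) -> list[str]:
    """Read lines appended since the last call, resetting on truncation."""
    last_pos = _file_positions.get(file_path, 0)
    file_size = os.path.getsize(file_path)
    if file_size < last_pos:
        # File shrank: it was compacted/rewritten, so re-read from the start.
        logger.info("File %s appears compacted (size %d < last_pos %d), re-reading",
                    file_path, file_size, last_pos)
        last_pos = 0
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        f.seek(last_pos)
        new_lines = f.readlines()
        _file_positions[file_path] = f.tell()
    return new_lines
```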
2. Handle duplicate transcript entries on re-read
When we reset to position 0 and re-read the entire compacted file, we'll encounter entries that were already stored in the database from previous reads. The transcript storage logic needs to handle this gracefully:
- Option A: Use upsert/ignore semantics when writing transcripts — if a transcript entry with the same session_id + timestamp + content already exists, skip it.
- Option B: Clear existing transcripts for the session before re-ingesting (destructive, simpler).
- Option C: Track a content hash or message ID alongside the position, and use that to detect which entries are new.
Recommendation: Option A (upsert/ignore) is the safest and most general approach.
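In SQLite terms, Option A can be sketched with a uniqueness constraint plus insert-or-ignore semantics (the table name, columns, and schema here are hypothetical; the real transcript storage may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transcripts (
        session_id TEXT NOT NULL,
        timestamp  TEXT NOT NULL,
        content    TEXT NOT NULL,
        UNIQUE (session_id, timestamp, content)
    )
""")

def store_transcript(session_id: str, timestamp: str, content: str) -> None:
    # INSERT OR IGNORE skips rows that violate the UNIQUE constraint,
    # so re-ingesting a compacted file cannot create duplicates.
    conn.execute(
        "INSERT OR IGNORE INTO transcripts VALUES (?, ?, ?)",
        (session_id, timestamp, content),
    )
    conn.commit()

# Re-ingesting the same entry twice stores it only once.
store_transcript("s1", "2024-01-01T00:00:00Z", "hello")
store_transcript("s1", "2024-01-01T00:00:00Z", "hello")
```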
3. Handle file-history-snapshot entries
The JSONL parser (watcher.py:25-124) currently skips "file-history-snapshot" type entries. After compaction, Claude Code may include different metadata entries. Verify that the parser handles any new entry types that appear in compacted files gracefully (skips unknown types without crashing).
4. Test coverage
- Truncation test: Write a JSONL file, process it, truncate and rewrite with new content, process again — verify new content is captured and position is reset.
- Shorter-file test: Write a large JSONL, process it, replace with a shorter JSONL, process — verify no data is silently lost.
- Duplicate handling test: Process a file, simulate compaction that keeps some of the same entries, re-process — verify no duplicate transcripts in the database.
- Rapid compaction test: Trigger multiple file rewrites within the debounce window (1 second) — verify the final state is correctly captured.
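The truncation test could be sketched with pytest's `tmp_path` fixture. The reader below is a self-contained stand-in for the watcher's read logic with the size check applied, not the real implementation:

```python
import os

def _read_new(path: str, positions: dict[str, int]) -> list[str]:
    """Minimal stand-in for the watcher's read logic, with truncation check."""
    last_pos = positions.get(path, 0)
    if os.path.getsize(path) < last_pos:
        last_pos = 0  # file shrank: treat as compacted, re-read from start
    with open(path, "r", encoding="utf-8") as f:
        f.seek(last_pos)
        lines = f.readlines()
        positions[path] = f.tell()
    return lines

def test_truncation_resets_position(tmp_path):
    jsonl = tmp_path / "session.jsonl"
    positions: dict[str, int] = {}
    jsonl.write_text('{"n": 1}\n' * 100)
    _read_new(str(jsonl), positions)        # consume the long file
    jsonl.write_text('{"n": 2}\n')          # compaction: shorter rewrite
    lines = _read_new(str(jsonl), positions)
    assert lines == ['{"n": 2}\n']          # new content is captured
    assert positions[str(jsonl)] == len('{"n": 2}\n')  # position was reset
```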
5. Logging and observability
Add info-level logging when a compaction is detected so operators can correlate any dashboard anomalies with compaction events. Include the old position, new file size, and session ID.
Acceptance criteria
- File truncation/rewrite is detected by comparing file size to stored position
- Position is reset to 0 when truncation is detected
- Re-reading after compaction does not create duplicate transcript entries
- Unknown JSONL entry types in compacted files are handled gracefully
- Unit tests for truncation detection, duplicate handling, and rapid compaction
- Info-level logging when compaction is detected
Technical context
- Watcher uses the `watchfiles` library (`awatch`) with recursive monitoring of `~/.claude/projects/`
- Only `Change.added` and `Change.modified` events are processed (line 399)
- 1-second debounce per file path (watcher.py:18-20, 368-381)
- Position tracking is in-memory only; it is lost on server restart (which is fine, but means a restart also triggers a full re-read)