feat: Store raw JSONL entries for future re-parsing #98

Merged
evansenter merged 3 commits into main from feat/raw-entries-storage on Jan 27, 2026

@evansenter

Owner

Summary

  • Add raw_entries table to store unparsed JSONL entries
  • upload_entries now stores both raw and parsed data
  • Add --force flag to push command for re-sending all data
  • Add docs/TAILSCALE_SETUP.md deployment guide

Why

Raw entries enable re-parsing historical data when the parser improves, without losing the original source material.

Changes

  • storage.py: Migration 13 adds raw_entries table, add_raw_entries_batch() method
  • server.py: upload_entries stores raw JSON alongside parsed events
  • cli.py: --force flag skips incremental sync, tracks raw_entries_added
  • guide.md: Documents --force flag and raw entry storage
  • docs/TAILSCALE_SETUP.md: Full deployment guide for multi-machine setup
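
To make the storage change above concrete, here is a minimal sketch of a raw_entries table with a batch insert that ignores duplicates. It is not the code from storage.py: the column set, the function signature, and the use of `INSERT OR IGNORE` are assumptions for illustration. Only the table name, the method name add_raw_entries_batch, and the (session_id, timestamp, entry_json) uniqueness key come from this PR.

```python
import json
import sqlite3


def create_raw_entries_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: only the UNIQUE key columns are taken from the PR.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS raw_entries (
            id INTEGER PRIMARY KEY,
            session_id TEXT NOT NULL,
            timestamp TEXT NOT NULL,
            entry_json TEXT NOT NULL,
            UNIQUE (session_id, timestamp, entry_json)
        )
        """
    )


def add_raw_entries_batch(conn: sqlite3.Connection, session_id: str, entries: list) -> int:
    """Store each entry as serialized JSON; return how many rows were new."""
    before = conn.total_changes
    rows = [
        # sort_keys keeps serialization stable so re-sent entries deduplicate.
        (session_id, entry.get("timestamp", ""), json.dumps(entry, sort_keys=True))
        for entry in entries
    ]
    conn.executemany(
        "INSERT OR IGNORE INTO raw_entries (session_id, timestamp, entry_json) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    # Rows skipped by OR IGNORE do not count toward total_changes.
    return conn.total_changes - before
```

With a uniqueness key like this, re-sending the same entries (for example via push --force) would leave the table unchanged rather than duplicating rows.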

Test plan

  • All 391 tests pass
  • Test push --force re-sends all entries and populates raw_entries
  • Verify raw_entries table on server after push

🤖 Generated with Claude Code

- Add raw_entries table to store unparsed JSONL entries
- upload_entries now stores both raw and parsed data
- Add --force flag to push command for re-sending all data
- Add docs/TAILSCALE_SETUP.md for deployment guide

Raw entries enable re-parsing historical data when the parser improves,
without losing the original source material.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Jan 27, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds raw JSONL entry storage for future re-parsing capabilities, with a new raw_entries table, --force flag for the push command, and Tailscale deployment documentation.

Issues Found

Critical

None

Important

  • storage.py:180 - SCHEMA_VERSION is still 12 but migration 13 was added. Existing databases check current_version < SCHEMA_VERSION before migrating, and that check is false when current_version=12 and SCHEMA_VERSION=12, so migration 13 will never run on them. Update to SCHEMA_VERSION = 13.
  • docs/SCHEMA.md - The new raw_entries table is not documented. Per CLAUDE.md: "Schema Design: See docs/SCHEMA.md for database tables, indexes, and migration history." Add documentation for the table structure, indexes, and add migration 13 to the Migration History table.
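
The SCHEMA_VERSION issue above can be reproduced with a small sketch of a version-gated migration runner. This is an illustration only: the `MIGRATIONS` dict, the `migrate` function, and storing the version in `PRAGMA user_version` are assumptions, and the project's actual runner (described later as a migration decorator pattern) may look quite different.

```python
import sqlite3

SCHEMA_VERSION = 13  # must equal the highest migration number


def migration_13(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal body for migration 13.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_entries ("
        "session_id TEXT, timestamp TEXT, entry_json TEXT)"
    )


MIGRATIONS = {13: migration_13}


def get_version(conn: sqlite3.Connection) -> int:
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection, target: int = SCHEMA_VERSION) -> list:
    """Apply every migration numbered above the current version, up to target."""
    ran = []
    current = get_version(conn)
    # If target still equals current (12 vs 12), this loop body never executes.
    for version in range(current + 1, target + 1):
        step = MIGRATIONS.get(version)
        if step is not None:
            step(conn)
        conn.execute(f"PRAGMA user_version = {int(version)}")
        ran.append(version)
    conn.commit()
    return ran
```

Calling `migrate(conn, target=12)` on a database already at version 12 applies nothing, which is the failure mode the review describes; with the target at 13 the new migration runs once.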

Suggestions

  • tests/test_server.py:427-445 - The existing test_upload_entries doesn't verify the new raw_entries_added field in the response. Consider adding an assertion like assert "raw_entries_added" in result.
  • tests/test_storage.py - No tests for the new add_raw_entries_batch() or get_raw_entry_count() storage methods. These are public API methods that should have test coverage.
  • storage.py:611 - The UNIQUE constraint on (session_id, timestamp, entry_json) includes the full JSON blob. This is unusual and could cause performance issues with large entries. Consider using a hash of entry_json or (session_id, timestamp, uuid) where uuid is extracted from the JSON, since entries already have UUIDs.
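
For reference, the hash-based alternative suggested for the UNIQUE constraint could look roughly like the sketch below. The table name `raw_entries_hashed`, the `entry_hash` column, and the helper names are hypothetical, and SHA-256 is just one reasonable choice of digest.

```python
import hashlib
import sqlite3


def entry_hash(entry_json: str) -> str:
    # Fixed-length digest stands in for the full JSON blob in the unique index.
    return hashlib.sha256(entry_json.encode("utf-8")).hexdigest()


def create_hashed_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS raw_entries_hashed (
            id INTEGER PRIMARY KEY,
            session_id TEXT NOT NULL,
            timestamp TEXT NOT NULL,
            entry_hash TEXT NOT NULL,
            entry_json TEXT NOT NULL,
            UNIQUE (session_id, timestamp, entry_hash)
        )
        """
    )


def add_entry(conn: sqlite3.Connection, session_id: str, timestamp: str, entry_json: str) -> bool:
    """Insert one entry; return True if it was new, False if deduplicated."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO raw_entries_hashed "
        "(session_id, timestamp, entry_hash, entry_json) VALUES (?, ?, ?, ?)",
        (session_id, timestamp, entry_hash(entry_json), entry_json),
    )
    conn.commit()
    return cur.rowcount == 1
```

The trade-off is a smaller unique index in exchange for an extra column and a hash computation on every insert.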

Verdict

REQUEST_CHANGES - SCHEMA_VERSION must be updated to 13 for the migration to run, and SCHEMA.md needs the new table documented per project conventions.


Automated review by Claude Code

- Update SCHEMA_VERSION from 12 to 13 (critical: migration wouldn't run)
- Add raw_entries table documentation to SCHEMA.md
- Add tests for add_raw_entries_batch and get_raw_entry_count

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Jan 27, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds raw JSONL entry storage for future re-parsing capabilities, including a new raw_entries table (migration 13), a --force flag for the push command to re-send all entries, and comprehensive Tailscale deployment documentation.

Issues Found

Critical

None

Important

None

Suggestions

  • tests/test_server.py:439-445 - The test_upload_entries test verifies entries_received, events_parsed, events_added, and sessions_updated fields but does not verify the new raw_entries_added field in the response. Consider adding assert "raw_entries_added" in result for completeness.

Previously Addressed (Filtered)

  • storage.py:180 - SCHEMA_VERSION updated to 13 (was raised in prior review)
  • docs/SCHEMA.md - raw_entries table now fully documented with schema, indexes, and migration history
  • tests/test_storage.py - Tests for add_raw_entries_batch() and get_raw_entry_count() now exist in TestRawEntries class
  • storage.py UNIQUE constraint on entry_json - Design decision documented in SCHEMA.md with rationale

4 items from prior feedback rounds were not re-raised.

Verdict

REQUEST_CHANGES - Minor test coverage gap for the new response field.


Automated review by Claude Code

Use unique identifiers to avoid dedup across test runs.

Addresses claude-review suggestion on PR #98.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@evansenter

Owner Author

Feedback Addressed

Implemented

  • [Suggestion] test_upload_entries - Added assertion for raw_entries_added field. Used unique identifiers to avoid dedup issues across test runs.
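
A rough sketch of why unique identifiers matter here: if a test re-uses fixed session IDs and entries against a store that deduplicates, a second run can report raw_entries_added as 0. Everything below is a stand-in for illustration. `fake_upload_entries` and `make_unique_entries` are hypothetical helpers, not the project's upload_entries or its test code, and only two of the response fields are modeled.

```python
import uuid

_seen = set()  # stand-in for dedup state that persists across test runs


def fake_upload_entries(session_id: str, entries: list) -> dict:
    """Hypothetical stand-in returning two of the response fields named in this PR."""
    added = 0
    for entry in entries:
        key = (session_id, entry["timestamp"], entry["uuid"])
        if key not in _seen:
            _seen.add(key)
            added += 1
    return {"entries_received": len(entries), "raw_entries_added": added}


def make_unique_entries(count: int = 2):
    """Build a session ID and entries that cannot collide with earlier runs."""
    session_id = f"test-session-{uuid.uuid4()}"
    entries = [
        {"uuid": str(uuid.uuid4()), "timestamp": f"2026-01-27T00:00:0{i}Z"}
        for i in range(count)
    ]
    return session_id, entries


def test_upload_entries_reports_raw_entries_added():
    session_id, entries = make_unique_entries()
    result = fake_upload_entries(session_id, entries)
    assert "raw_entries_added" in result
    assert result["raw_entries_added"] == len(entries)
```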

@claude

claude Bot commented Jan 27, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a raw_entries table to store unparsed JSONL entries for future re-parsing, implements a --force flag for the push command to re-send all entries, and includes comprehensive Tailscale deployment documentation.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

  • storage.py:180 - SCHEMA_VERSION updated to 13 (Implemented in prior round)
  • docs/SCHEMA.md - raw_entries table now fully documented (Implemented in prior round)
  • tests/test_storage.py - Tests for add_raw_entries_batch() and get_raw_entry_count() added (Implemented in prior round)
  • storage.py UNIQUE constraint - Design decision documented in SCHEMA.md (Implemented in prior round)
  • tests/test_server.py - Added assertion for raw_entries_added field with unique IDs (Implemented per Feedback Addressed)

5 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - All prior feedback has been addressed. The implementation follows project conventions (storage API, migration decorator pattern), includes comprehensive test coverage, and documentation is complete.


Automated review by Claude Code

@evansenter evansenter merged commit 49c84b3 into main Jan 27, 2026
3 checks passed
@evansenter evansenter deleted the feat/raw-entries-storage branch January 27, 2026 07:41