This document explains the internal architecture and sync behavior.
- `sync`: run from inside a project repo; syncs only conversations from that repo into it.
- `backup`: full export of all conversations into a dedicated backup repo (`--output-path` required).
- `stats`: shows index totals for a given repo.
- `explore`: TUI to browse and search exported conversations.
- `hooks`: install or uninstall a pre-commit hook that runs `sync`.
The exporter is split into four layers:
- `cli.py` - Typer entrypoint and option parsing.
- `adapters/` - Source-specific parsing and normalization.
  - `codex.py`, `claude.py`, and `cursor.py` handle Codex JSONL, Claude projects, and Cursor `workspaceStorage` sources, respectively.
- `engine.py` - Idempotent sync loop, index management, path mapping, and output writes.
- `render.py` - Markdown transcript rendering and JSON serialization.
`NormalizedSession` stores:
- source metadata (`source_system`, `session_id`, `source_path`)
- routing metadata (`user`, `system_name`, `cwd`, `started_at`)
- content (`messages`)
- stable key (`session_key`)
`NormalizedMessage` stores:
- `role`
- `text`
- optional `timestamp`
This model lets new adapters (for example, Gemini) plug in without changing the engine or output logic.
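A minimal sketch of the two models as Python dataclasses follows. The field names come from the lists above. The field types and the `"<source_system>:<session_id>"` scheme used to fill `session_key` are assumptions for illustration, not the project's confirmed definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class NormalizedMessage:
    role: str                              # e.g. "user" or "assistant"
    text: str
    timestamp: Optional[datetime] = None   # optional per-message time


@dataclass
class NormalizedSession:
    # source metadata
    source_system: str                     # e.g. "codex", "claude", "cursor"
    session_id: str
    source_path: str
    # routing metadata
    user: str
    system_name: str
    cwd: str
    started_at: Optional[datetime]
    # content
    messages: list[NormalizedMessage] = field(default_factory=list)
    # stable key; the "<source_system>:<session_id>" scheme is assumed
    session_key: str = ""

    def __post_init__(self) -> None:
        if not self.session_key:
            self.session_key = f"{self.source_system}:{self.session_id}"
```

Because an adapter only has to produce these two shapes, the engine and renderer never need to know which source a session came from.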
`backup` (full export):
`history/<user>/<source-system>/<system-name>/<cwd-relative-to-home>/`
- If `cwd` is under `/Users/<name>/`, the path is made relative to that home.
- If `cwd` is under `/home/<name>/`, the path is made relative to that home.
- Otherwise, the absolute path is sanitized and used.
- Path segments are sanitized to filesystem-safe slugs.
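The home-relative mapping above can be sketched as follows. This assumes POSIX-style `cwd` strings. The slug rule (replace runs of characters outside `[A-Za-z0-9._-]` with `-`, and turn empty, `.`, and `..` into `_`) is a hypothetical stand-in for the project's actual sanitizer. `backup_dir` is likewise an illustrative helper name.

```python
import re
from pathlib import PurePosixPath


def slugify(segment: str) -> str:
    # Hypothetical rule: collapse unsafe character runs to "-" and
    # never emit "", "." or "..", which would be unsafe as path segments.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", segment).strip("-")
    return "_" if slug in ("", ".", "..") else slug


def cwd_relative_to_home(cwd: str) -> PurePosixPath:
    parts = PurePosixPath(cwd).parts  # e.g. ("/", "Users", "alice", "code")
    if len(parts) >= 3 and parts[0] == "/" and parts[1] in ("Users", "home"):
        rel = parts[3:]  # drop "/", "Users" or "home", and "<name>"
    else:
        rel = parts[1:] if parts and parts[0] == "/" else parts  # drop the root
    return PurePosixPath(*(slugify(p) for p in rel))


def backup_dir(user: str, source_system: str, system_name: str, cwd: str) -> PurePosixPath:
    base = PurePosixPath("history", slugify(user), slugify(source_system), slugify(system_name))
    return base / cwd_relative_to_home(cwd)
```

For example, a session started in `/Users/alice/code/my app` on a machine named `laptop` would land under `history/alice/codex/laptop/code/my-app` with these assumed rules.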
`sync` (repo-scoped):
`history/<user>/<source-system>/`
Sessions are written flat, with no machine-name or cwd nesting.
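A side-by-side sketch of the two layouts may make the difference clearer. `output_dir` and its `mode` parameter are hypothetical. The function assumes its segments have already been sanitized and that `cwd_rel` is the home-relative path computed for `backup`. The returned path is relative to the repo root for `sync` and to `--output-path` for `backup`.

```python
from pathlib import PurePosixPath


def output_dir(
    mode: str,
    user: str,
    source_system: str,
    system_name: str = "",
    cwd_rel: str = "",
) -> PurePosixPath:
    base = PurePosixPath("history", user, source_system)
    if mode == "sync":
        # Repo-scoped: flat, machine name and cwd are ignored.
        return base
    if mode == "backup":
        # Full export: nest by machine, then by home-relative cwd.
        return base / system_name / cwd_rel
    raise ValueError(f"unknown mode: {mode}")
```

The flat `sync` layout makes sense because every session in a repo-scoped export already shares the same project. Extra machine or cwd levels would mostly add noise there.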
Basename:
`YYYY-MM-DD-HHMM-<slug>`
Where:
- date/time is from the session start timestamp (`session_meta.payload.timestamp`, falling back to the event timestamp)
- slug is derived from the first user message text (falling back to the session id tail)
Outputs per session:
- Markdown transcript: `<basename>.md`
- Hidden normalized JSON: `.<basename>.json`
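The naming rules above can be sketched in a few lines. The slug rule (first six lowercase alphanumeric words joined by `-`) and the 8-character session id tail are assumptions. The real slug derivation may differ.

```python
from __future__ import annotations

import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def session_start(meta_ts: Optional[datetime], event_ts: Optional[datetime]) -> datetime:
    # Prefer session_meta.payload.timestamp; fall back to the event timestamp.
    start = meta_ts if meta_ts is not None else event_ts
    if start is None:
        raise ValueError("session has no usable timestamp")
    return start


def make_basename(
    started_at: datetime,
    first_user_text: Optional[str],
    session_id: str,
    max_words: int = 6,  # assumed slug length
) -> str:
    words = re.findall(r"[a-z0-9]+", (first_user_text or "").lower())
    slug = "-".join(words[:max_words]) or session_id[-8:]  # assumed tail length
    return f"{started_at:%Y-%m-%d-%H%M}-{slug}"


def output_paths(out_dir: Path, basename: str) -> tuple[Path, Path]:
    # Visible Markdown transcript plus a dot-prefixed (hidden) JSON twin.
    return out_dir / f"{basename}.md", out_dir / f".{basename}.json"
```

Under these assumptions, a session started at 2024-05-01 09:07 whose first message is "Fix the flaky CI build, please!" gets the basename `2024-05-01-0907-fix-the-flaky-ci-build-please`.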
Index location:
`<output-repo>/.convx/index.json`
Index record per session key:
- `fingerprint`: SHA-256 of the source file bytes
- `source_path`
- `markdown_path`
- `json_path`
- `basename`
- `updated_at`
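A sketch of the fingerprint and the index record is shown below. The record fields match the list above. The ISO 8601 UTC format for `updated_at` and the `{session_key: record}` top-level shape of `index.json` are assumptions. `make_record` and `dump_index` are hypothetical helper names.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


def fingerprint(source: Path) -> str:
    """SHA-256 of the raw source file bytes, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with source.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


@dataclass
class IndexRecord:
    fingerprint: str
    source_path: str
    markdown_path: str
    json_path: str
    basename: str
    updated_at: str  # assumed: ISO 8601 timestamp in UTC


def make_record(source: Path, markdown: Path, json_file: Path, basename: str) -> dict:
    return asdict(IndexRecord(
        fingerprint=fingerprint(source),
        source_path=str(source),
        markdown_path=str(markdown),
        json_path=str(json_file),
        basename=basename,
        updated_at=datetime.now(timezone.utc).isoformat(),
    ))


# Assumed on-disk shape of .convx/index.json: {session_key: record, ...}
def dump_index(index: dict[str, dict]) -> str:
    return json.dumps(index, indent=2, sort_keys=True)
```

Hashing the raw bytes, rather than file metadata such as mtime, means a file that is touched but not changed is still skipped.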
Sync algorithm:
- Discover candidate session files with the adapter.
- Compute the source fingerprint.
- Peek the session key (cheap first-line parse).
- If the key is already in the index and its fingerprint is unchanged, skip the session.
- Otherwise, parse, normalize, render, and atomically overwrite the output files.
- Update the index atomically.
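The loop above can be sketched as a single function. It takes the adapter's steps as plain callables so it stays self-contained. `sync_once`, `peek_key`, and `export` are hypothetical names. `export` stands in for the parse, normalize, render, and write steps and returns the new index record. The caller is assumed to persist `index` atomically afterwards.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Callable, Iterable


def sync_once(
    files: Iterable[Path],
    peek_key: Callable[[Path], str],
    export: Callable[[Path], dict],
    index: dict[str, dict],
) -> list[str]:
    """Return the session keys that were (re)exported on this pass."""
    exported: list[str] = []
    for path in files:
        fp = hashlib.sha256(path.read_bytes()).hexdigest()
        key = peek_key(path)  # cheap: reads only enough to find the key
        prev = index.get(key)
        if prev is not None and prev.get("fingerprint") == fp:
            continue  # unchanged since the last sync
        record = export(path)  # parse + normalize + render + write outputs
        record["fingerprint"] = fp
        index[key] = record
        exported.append(key)
    return exported
```

Running it twice in a row re-exports nothing on the second pass. That is the idempotence the engine relies on, for example when the pre-commit hook runs `sync` on every commit.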
Repo-scoped sync (`sync`):
- Uses the current working directory as the git repo (`repo_filter_path=cwd`).
- A session is eligible when its `cwd` resolves under the repo path.
- Fallback matching also accepts sessions whose `cwd` contains the repo folder name, to tolerate path differences across machines.
- Conversations are written into the same repo they are filtered by.
Atomic writing:
- Write to a `*.tmp` file, then move it into place with `Path.replace()`.
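A minimal sketch of the write-then-replace pattern follows. The temporary file sits next to the target so that `Path.replace()` stays on one filesystem, where a rename is atomic on POSIX. The `flush`/`fsync` before the rename is an extra durability step that the description above does not mention. `atomic_write_text` is a hypothetical name.

```python
import os
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".tmp")  # same directory as target
    with tmp.open("w", encoding="utf-8") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())  # optional: push bytes to disk before the rename
    tmp.replace(target)  # readers see either the old file or the new one
```

The same helper covers Markdown transcripts, hidden JSON files, and `index.json`. An interrupted run can leave a stray `*.tmp` file, but it never leaves a half-written output.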
Implement a new adapter that exposes:
- `discover_files(input_path)`
- `peek_session(file_path, source_system)`
- `parse_session(file_path, source_system, user, system_name)`

Then register it in `adapters/__init__.py`.
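The adapter contract can be sketched as a `typing.Protocol`, followed by a toy implementation. Only the method names and parameters come from the list above. The return types, the toy JSONL source format (line 1 holds `session_id` and `cwd`, later lines hold `role` and `text`), the dict returned in place of a `NormalizedSession`, and the `ADAPTERS` registry dict are all hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterable, Protocol


class Adapter(Protocol):
    """Method names/parameters from the docs; types are assumed."""

    def discover_files(self, input_path: Path) -> Iterable[Path]: ...

    def peek_session(self, file_path: Path, source_system: str) -> str: ...

    def parse_session(
        self, file_path: Path, source_system: str, user: str, system_name: str
    ) -> dict[str, Any]: ...


class ToyJsonlAdapter:
    """Hypothetical source: line 1 is {"session_id", "cwd"}; later lines are {"role", "text"}."""

    def discover_files(self, input_path: Path) -> Iterable[Path]:
        return sorted(input_path.rglob("*.jsonl"))

    def peek_session(self, file_path: Path, source_system: str) -> str:
        # Parse only the first line so unchanged sessions stay cheap to check.
        with file_path.open(encoding="utf-8") as f:
            meta = json.loads(f.readline())
        return f"{source_system}:{meta['session_id']}"

    def parse_session(
        self, file_path: Path, source_system: str, user: str, system_name: str
    ) -> dict[str, Any]:
        text = file_path.read_text(encoding="utf-8")
        lines = [line for line in text.splitlines() if line.strip()]
        meta = json.loads(lines[0])
        messages = [
            {"role": m["role"], "text": m["text"]} for m in map(json.loads, lines[1:])
        ]
        return {  # stand-in for a NormalizedSession
            "source_system": source_system,
            "session_id": meta["session_id"],
            "source_path": str(file_path),
            "user": user,
            "system_name": system_name,
            "cwd": meta.get("cwd", ""),
            "messages": messages,
        }


# Hypothetical registry; the real adapters/__init__.py may be organized differently.
ADAPTERS: dict[str, Adapter] = {"toy": ToyJsonlAdapter()}
```

Keeping `peek_session` separate from `parse_session` is what lets the engine skip unchanged sessions without paying for a full parse.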