Releases: robertoshimizu/session-graph
v0.6.0 — Setup Script & Onboarding
Highlights
One-command setup: `git clone ... && cd session-graph && ./setup.sh`
The interactive setup script handles prerequisites, environment configuration, Python virtualenv, Docker services, and a smoke test — all idempotent.
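Each step of the script follows the usual check-before-act pattern that makes a setup idempotent. A minimal Python sketch of that pattern (the marker-file approach and names here are illustrative, not `setup.sh`'s actual mechanism):

```python
import os

def idempotent_step(done_marker: str, action) -> str:
    """Run action only if its completion marker is absent; safe to re-run."""
    if os.path.exists(done_marker):
        return "skipped"          # already done on a previous run
    action()                      # do the real work (e.g. create venv)
    open(done_marker, "w").close()  # record completion
    return "done"
```

Because every step checks its own post-condition first, re-running the whole script after a partial failure resumes where it left off.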
Added
- Interactive `setup.sh` (7-step, idempotent)
- `--auth` CLI flag on `load_fuseki.py` (fixes Docker Fuseki 401)
- Sample session fixture for smoke testing (no LLM calls needed)
- README troubleshooting table
Changed
- `stop_hook.sh` uses dynamic paths (no more hardcoded `/Users/...`)
- README Quick Start simplified to `./setup.sh`
- `hooks/` directory now tracked in git
See CHANGELOG.md for full details.
v0.5.0 — Triple Extraction Cache
Added
- Triple extraction cache (`.triple_cache.db`) — SQLite cache keyed by message UUID prevents redundant Gemini API calls when the stop hook re-processes sessions. Re-runs rebuild the full RDF graph but skip all cached messages (0 API calls).
- Docker Fuseki promoted to primary triplestore on port 3030.
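The cache-aside pattern described above can be sketched in a few lines. This is an illustration of the idea, not the project's actual schema; the table name, columns, and `extract` callback are assumptions:

```python
import json
import sqlite3

def get_or_extract(conn, uuid: str, text: str, extract):
    """Return cached triples for a message UUID, calling extract() only on a miss."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triple_cache (uuid TEXT PRIMARY KEY, triples TEXT)"
    )
    row = conn.execute(
        "SELECT triples FROM triple_cache WHERE uuid = ?", (uuid,)
    ).fetchone()
    if row:                        # cache hit: zero API calls
        return json.loads(row[0])
    triples = extract(text)        # cache miss: one Gemini call
    conn.execute(
        "INSERT INTO triple_cache VALUES (?, ?)", (uuid, json.dumps(triples))
    )
    conn.commit()
    return triples
```

Keying on the message UUID rather than file mtime is what lets a full re-run rebuild the RDF graph while skipping every already-extracted message.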
Changed
- `docker/queue_consumer.py` — removed mtime-based watermark check (cache makes it unnecessary).
- `docker-compose.yml` — added volume mount for `.triple_cache.db` persistence between container and host.
- Fixed stale `_init_vertex_credentials` import in `link_entities.py`.
Scale
- 1,334,432 RDF triples | 607+ sessions | 47,868 knowledge triples | 4,774 Wikidata-linked entities
v0.4.0 — Real-time Pipeline Automation
Added
- RabbitMQ-based pipeline automation — stop hook publishes to RabbitMQ (~33ms) instead of running Python in a background subshell. A long-running `pipeline-runner` container consumes the queue independently.
- Docker Compose stack — `rabbitmq` (management UI on :15672), `pipeline-runner` (pika consumer), and `fuseki` in a single `docker compose up -d`.
- Vertex AI credentials in container — `queue_consumer.py` decodes `GOOGLE_APPLICATION_CREDENTIALS_BASE64` from `.env` at startup.
- Fuseki auth support — `load_fuseki.py` functions accept an optional `auth` tuple.
- Integration test (`tests/test_integration.sh`) — 16-point end-to-end test.
- CI workflow — GitHub Actions with lint + tests, triggered by PRs to both `main` and `open-source/session-graph`.
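The consumer side of this design can be sketched with the message handling split out for clarity. The payload field, queue name, and wiring below are assumptions for illustration, not `queue_consumer.py`'s real names:

```python
import json

def handle_event(body: bytes, process_session) -> str:
    """Decode one stop-hook message and run the pipeline on its transcript."""
    event = json.loads(body)
    path = event["transcript_path"]   # assumed payload field
    process_session(path)
    return path

# Wiring with pika would look roughly like this (needs a live broker):
#   ch.basic_qos(prefetch_count=1)        # process one session at a time
#   ch.basic_consume(
#       queue="session_events",
#       on_message_callback=lambda ch, m, p, body: (
#           handle_event(body, run_pipeline),
#           ch.basic_ack(m.delivery_tag),  # ack only after success
#       ),
#   )
#   ch.start_consuming()                  # long-running loop
```

Decoupling publish from consume is why the stop hook returns in ~33ms: all heavy work happens later, inside the container.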
Changed
- `hooks/stop_hook.sh` rewritten: a `curl` POST to the RabbitMQ HTTP API replaces the background Python process.
v0.3.0 — Entity filtering & cost optimization
What's new
Two-level entity filtering
- Level 1 (`is_valid_entity()` in `triple_extraction.py`) — 13 filter groups reject garbage at extraction time: filenames, hex colors, CLI flags, ICD codes, snake_case identifiers, DOM selectors, version strings, CSS dimensions, issue refs, function calls, npm scopes, percentage values.
- Level 2 (`is_linkable_entity()` in `link_entities.py`) — pre-filters before Wikidata API calls, catching anything that slipped through Level 1.
- 48 whitelisted short terms (`ai`, `api`, `llm`, `rdf`, `sql`, etc.) bypass all filters.
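A minimal sketch of how such a Level-1 filter might look. The patterns and the whitelist subset below are illustrative examples, not the project's actual 13 filter groups:

```python
import re

WHITELIST = {"ai", "api", "llm", "rdf", "sql"}  # subset of the 48 short terms

REJECT_PATTERNS = [
    re.compile(r"^#[0-9a-fA-F]{6}$"),       # hex colors
    re.compile(r"^--[\w-]+$"),              # CLI flags
    re.compile(r"^v?\d+\.\d+(\.\d+)?$"),    # version strings
    re.compile(r"^[a-z]+(_[a-z]+)+$"),      # snake_case identifiers
    re.compile(r"\.\w{1,4}$"),              # filenames by extension
]

def is_valid_entity(name: str) -> bool:
    """Reject obviously non-conceptual strings before they enter the graph."""
    term = name.strip().lower()
    if term in WHITELIST:
        return True                # short domain terms bypass all filters
    if len(term) < 3:
        return False
    return not any(p.search(term) for p in REJECT_PATTERNS)
```

Cheap regex rejection at extraction time is what keeps garbage out of both the graph and the downstream Wikidata linking budget.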
Top-10 extraction cap
The prompt now extracts at most 10 triples per message, prioritizing architectural decisions and technology choices, with a hard cap enforced in parsing. The median extraction rate (~1.4 triples/msg) is unaffected; the cap only trims the noisy long tail.
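Enforcing the cap at parse time might look like the following sketch (the line format and constant name are assumptions):

```python
MAX_TRIPLES_PER_MESSAGE = 10  # hard cap, regardless of what the model emits

def parse_triples(lines):
    """Parse 'subject | predicate | object' lines, stopping at the cap."""
    triples = []
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
        if len(triples) >= MAX_TRIPLES_PER_MESSAGE:
            break
    return triples
```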
Frequency-based entity linking
A new `--min-sessions N` flag (default: 2) links only entities appearing in N+ sessions; ~77% of entities are single-session noise.
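The frequency filter amounts to counting, per entity, the number of distinct sessions it appears in. A small sketch of the idea (function name and input shape are illustrative):

```python
from collections import Counter

def linkable_entities(sessions, min_sessions: int = 2) -> set:
    """Keep entities seen in at least min_sessions distinct sessions."""
    counts = Counter()
    for entities in sessions:          # one iterable of entities per session
        counts.update(set(entities))   # count sessions, not total mentions
    return {e for e, n in counts.items() if n >= min_sessions}
```

Counting distinct sessions rather than raw mentions is the key choice: an entity repeated heavily inside a single session is still treated as single-session noise.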
Impact
| Metric | Before | After |
|---|---|---|
| Triples per full extraction | 75,743 | 43,949 (-42%) |
| Entities sent to Wikidata linker | ~28,000 | ~3,729 (-87%) |
| Linking time (estimated) | ~54 hours | ~7 hours |
| Linking cost | ~$2.80 | ~$0.37 |
Full changelog
See CHANGELOG.md for complete version history.