
Releases: robertoshimizu/session-graph

v0.6.0 — Setup Script & Onboarding

21 Feb 19:18


Highlights

One-command setup: git clone ... && cd session-graph && ./setup.sh

The interactive setup script handles prerequisites, environment configuration, Python virtualenv, Docker services, and a smoke test — all idempotent.
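Idempotent here means each step checks for its own completed state before acting, so re-running ./setup.sh after a partial failure is safe. A minimal Python sketch of that pattern, using virtualenv creation as the example (the real script is shell; the function name is hypothetical):

```python
import subprocess
import sys
from pathlib import Path

def ensure_venv(path: str = ".venv") -> str:
    """Create a virtualenv only if it doesn't already exist (idempotent)."""
    marker = Path(path) / "bin" / "python"   # POSIX layout; Windows uses Scripts/
    if marker.exists():
        return "already present"             # completed step: skip, don't redo
    subprocess.run([sys.executable, "-m", "venv", path], check=True)
    return "created"
```

Every other step (env file, Docker services, smoke test) follows the same check-then-act shape, which is what makes repeated runs harmless.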

Added

  • Interactive setup.sh (7-step, idempotent)
  • --auth CLI flag on load_fuseki.py (fixes Docker Fuseki 401)
  • Sample session fixture for smoke testing (no LLM calls needed)
  • README troubleshooting table

Changed

  • stop_hook.sh uses dynamic paths (no more hardcoded /Users/...)
  • README Quick Start simplified to ./setup.sh
  • hooks/ directory now tracked in git

See CHANGELOG.md for full details.

v0.5.0 — Triple Extraction Cache

21 Feb 18:51
cfd1b34


Added

  • Triple extraction cache (.triple_cache.db) — SQLite cache keyed by message UUID prevents redundant Gemini API calls when the stop hook re-processes sessions. Re-runs rebuild the full RDF graph but skip all cached messages (0 API calls).
  • Docker Fuseki promoted to primary triplestore on port 3030.
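The cache pattern can be sketched roughly as follows. Class name and schema are assumptions for illustration; the actual layout of .triple_cache.db may differ:

```python
import sqlite3

class TripleCache:
    """UUID-keyed cache of extracted triples; hits skip the Gemini API call."""

    def __init__(self, path: str = ".triple_cache.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS triples "
            "(message_uuid TEXT PRIMARY KEY, triples_json TEXT)"
        )

    def get(self, message_uuid: str):
        row = self.conn.execute(
            "SELECT triples_json FROM triples WHERE message_uuid = ?",
            (message_uuid,),
        ).fetchone()
        return row[0] if row else None   # None -> cache miss, call the API

    def put(self, message_uuid: str, triples_json: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO triples VALUES (?, ?)",
            (message_uuid, triples_json),
        )
        self.conn.commit()
```

On a re-run the hook still walks every message to rebuild the RDF graph, but each cached UUID resolves from SQLite instead of the API, which is how a full re-process costs 0 API calls.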

Changed

  • docker/queue_consumer.py — removed mtime-based watermark check (cache makes it unnecessary).
  • docker-compose.yml — added volume mount for .triple_cache.db persistence between container and host.
  • Fixed stale _init_vertex_credentials import in link_entities.py.

Scale

  • 1,334,432 RDF triples | 607+ sessions | 47,868 knowledge triples | 4,774 Wikidata-linked entities

v0.4.0 — Real-time Pipeline Automation

21 Feb 18:51
cfd1b34


Added

  • RabbitMQ-based pipeline automation — stop hook publishes to RabbitMQ (~33ms) instead of running Python in a background subshell. A long-running pipeline-runner container consumes the queue independently.
  • Docker Compose stack — rabbitmq (management UI on :15672), pipeline-runner (pika consumer), and fuseki in a single docker compose up -d.
  • Vertex AI credentials in container — queue_consumer.py decodes GOOGLE_APPLICATION_CREDENTIALS_BASE64 from .env at startup.
  • Fuseki auth support — load_fuseki.py functions accept optional auth tuple.
  • Integration test (tests/test_integration.sh) — 16-point end-to-end test.
  • CI workflow — GitHub Actions with lint + tests, triggers on PRs to both main and open-source/session-graph.
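The auth tuple threads straight through to the HTTP layer. A sketch of the idea, assuming a requests-style client (the function name and defaults here are hypothetical, not load_fuseki.py's actual signatures):

```python
def fuseki_post_kwargs(ttl_bytes: bytes, auth=None) -> dict:
    """Build requests.post keyword arguments for a Fuseki /data endpoint."""
    kwargs = {
        "data": ttl_bytes,
        "headers": {"Content-Type": "text/turtle"},
    }
    if auth is not None:
        kwargs["auth"] = auth  # ("user", "password") -> HTTP Basic; avoids the 401
    return kwargs

# Usage with requests (not executed here):
#   requests.post("http://localhost:3030/sessions/data",
#                 **fuseki_post_kwargs(graph_bytes, auth=("admin", "secret")))
```

Passing auth=None keeps the old behavior for an unsecured Fuseki, so existing callers are unaffected.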

Changed

  • hooks/stop_hook.sh rewritten: curl POST to RabbitMQ HTTP API replaces background Python process.
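The publish step maps onto RabbitMQ's management-API endpoint (POST /api/exchanges/&lt;vhost&gt;/&lt;exchange&gt;/publish). A sketch of the JSON body the hook would send; the routing key and payload shape are assumptions, not the hook's exact wire format:

```python
import json

def build_publish_body(session_path: str, routing_key: str = "session_graph") -> str:
    """JSON body for RabbitMQ's management HTTP API publish endpoint."""
    return json.dumps({
        "properties": {},
        "routing_key": routing_key,
        "payload": session_path,         # consumer reads the session file itself
        "payload_encoding": "string",
    })

# The hook then POSTs this body with curl, roughly:
#   curl -u guest:guest -H 'Content-Type: application/json' -d "$BODY" \
#     http://localhost:15672/api/exchanges/%2F/amq.default/publish
```

Because the hook only enqueues a pointer and returns, the ~33ms latency is just the HTTP round-trip; all extraction work happens later in the pipeline-runner container.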

v0.3.0 — Entity Filtering & Cost Optimization

21 Feb 11:34
8686d45


What's new

Two-level entity filtering

  • Level 1 (is_valid_entity() in triple_extraction.py) — 13 filter groups reject garbage at extraction time: filenames, hex colors, CLI flags, ICD codes, snake_case identifiers, DOM selectors, version strings, CSS dimensions, issue refs, function calls, npm scopes, percentage values.
  • Level 2 (is_linkable_entity() in link_entities.py) — pre-filters before Wikidata API calls, catching anything that slipped through Level 1.
  • 48 whitelisted short terms (ai, api, llm, rdf, sql, etc.) bypass all filters.
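An illustrative sketch of the Level 1 check, showing a handful of the filter groups and whitelist terms (the real triple_extraction.py implements all 13 groups and 48 terms; patterns below are simplified stand-ins):

```python
import re

SHORT_TERM_WHITELIST = {"ai", "api", "llm", "rdf", "sql"}  # sample of the 48 terms

REJECT_PATTERNS = [
    re.compile(r".*\.\w{1,4}$"),           # filenames like setup.sh
    re.compile(r"^#[0-9a-fA-F]{3,8}$"),    # hex colors
    re.compile(r"^--[\w-]+$"),             # CLI flags
    re.compile(r"^[a-z]+(_[a-z0-9]+)+$"),  # snake_case identifiers
    re.compile(r"^v?\d+(\.\d+)+$"),        # version strings
    re.compile(r"^\d+(\.\d+)?%$"),         # percentage values
]

def is_valid_entity(name: str) -> bool:
    token = name.strip()
    if token.lower() in SHORT_TERM_WHITELIST:
        return True                        # whitelisted short terms bypass all filters
    if len(token) < 3:
        return False
    return not any(p.match(token) for p in REJECT_PATTERNS)
```

Level 2 applies the same style of check again right before the Wikidata lookup, so a pattern added to either level closes the gap for both extraction and linking.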

Top-10 extraction cap

The prompt now extracts at most 10 triples per message, prioritizing architectural decisions and technology choices, with the hard cap enforced during parsing. The median extraction rate (~1.4 triples/msg) is unaffected; the cap only trims the noisy long tail.

Frequency-based entity linking

The new --min-sessions N flag (default: 2) links only entities that appear in N or more sessions; ~77% of entities are single-session noise.
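The frequency cut can be sketched as a simple pre-pass over an entity-to-sessions mapping (the mapping shape and function name are assumptions about link_entities.py internals):

```python
def entities_to_link(entity_sessions: dict, min_sessions: int = 2) -> list:
    """Keep only entities seen in at least `min_sessions` distinct sessions."""
    return sorted(
        entity
        for entity, sessions in entity_sessions.items()
        if len(set(sessions)) >= min_sessions   # distinct sessions, not mentions
    )
```

Counting distinct sessions rather than raw mentions is the key design choice: an entity repeated fifty times in one session is still treated as single-session noise.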

Impact

Metric                              Before      After
Triples per full extraction         75,743      43,949 (-42%)
Entities sent to Wikidata linker    ~28,000     ~3,729 (-87%)
Linking time (estimated)            ~54 hours   ~7 hours
Linking cost                        ~$2.80      ~$0.37

Full changelog

See CHANGELOG.md for complete version history.