Releases: robertoshimizu/session-graph
v0.6.0 — Setup Script & Onboarding
Highlights
One-command setup: `git clone ... && cd session-graph && ./setup.sh`
The interactive setup script handles prerequisites, environment configuration, Python virtualenv, Docker services, and a smoke test — all idempotent.
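Each step of the script follows the usual check-before-act pattern that makes a setup idempotent. A minimal Python sketch of that pattern (the marker-file approach and names here are illustrative, not `setup.sh`'s actual mechanism):

```python
import os

def idempotent_step(done_marker: str, action) -> str:
    """Run action only if its completion marker is absent; safe to re-run."""
    if os.path.exists(done_marker):
        return "skipped"          # already done on a previous run
    action()                      # do the real work (e.g. create venv)
    open(done_marker, "w").close()  # record completion
    return "done"
```

Because every step checks its own post-condition first, re-running the whole script after a partial failure resumes where it left off.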
Added
- Interactive `setup.sh` (7-step, idempotent)
- `--auth` CLI flag on `load_fuseki.py` (fixes Docker Fuseki 401)
- Sample session fixture for smoke testing (no LLM calls needed)
- README troubleshooting table
Changed
- `stop_hook.sh` uses dynamic paths (no more hardcoded `/Users/...`)
- README Quick Start simplified to `./setup.sh`
- `hooks/` directory now tracked in git
See CHANGELOG.md for full details.
v0.5.0 — Triple Extraction Cache
Added
- Triple extraction cache (`.triple_cache.db`) — SQLite cache keyed by message UUID prevents redundant Gemini API calls when the stop hook re-processes sessions. Re-runs rebuild the full RDF graph but skip all cached messages (0 API calls).
- Docker Fuseki promoted to primary triplestore on port 3030.
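The cache-aside pattern described above can be sketched in a few lines. This is an illustration of the idea, not the project's actual schema; the table name, columns, and `extract` callback are assumptions:

```python
import json
import sqlite3

def get_or_extract(conn, uuid: str, text: str, extract):
    """Return cached triples for a message UUID, calling extract() only on a miss."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triple_cache (uuid TEXT PRIMARY KEY, triples TEXT)"
    )
    row = conn.execute(
        "SELECT triples FROM triple_cache WHERE uuid = ?", (uuid,)
    ).fetchone()
    if row:                        # cache hit: zero API calls
        return json.loads(row[0])
    triples = extract(text)        # cache miss: one Gemini call
    conn.execute(
        "INSERT INTO triple_cache VALUES (?, ?)", (uuid, json.dumps(triples))
    )
    conn.commit()
    return triples
```

Keying on the message UUID rather than file mtime is what lets a full re-run rebuild the RDF graph while skipping every already-extracted message.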
Changed
- `docker/queue_consumer.py` — removed mtime-based watermark check (cache makes it unnecessary).
- `docker-compose.yml` — added volume mount for `.triple_cache.db` persistence between container and host.
- Fixed stale `_init_vertex_credentials` import in `link_entities.py`.
Scale
- 1,334,432 RDF triples | 607+ sessions | 47,868 knowledge triples | 4,774 Wikidata-linked entities
v0.4.0 — Real-time Pipeline Automation
Added
- RabbitMQ-based pipeline automation — stop hook publishes to RabbitMQ (~33ms) instead of running Python in a background subshell. A long-running `pipeline-runner` container consumes the queue independently.
- Docker Compose stack — `rabbitmq` (management UI on :15672), `pipeline-runner` (pika consumer), and `fuseki` in a single `docker compose up -d`.
- Vertex AI credentials in container — `queue_consumer.py` decodes `GOOGLE_APPLICATION_CREDENTIALS_BASE64` from `.env` at startup.
- Fuseki auth support — `load_fuseki.py` functions accept an optional `auth` tuple.
- Integration test (`tests/test_integration.sh`) — 16-point end-to-end test.
- CI workflow — GitHub Actions with lint + tests, triggered by PRs to both `main` and `open-source/session-graph`.
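The consumer side of this design can be sketched with the message handling split out for clarity. The payload field, queue name, and wiring below are assumptions for illustration, not `queue_consumer.py`'s real names:

```python
import json

def handle_event(body: bytes, process_session) -> str:
    """Decode one stop-hook message and run the pipeline on its transcript."""
    event = json.loads(body)
    path = event["transcript_path"]   # assumed payload field
    process_session(path)
    return path

# Wiring with pika would look roughly like this (needs a live broker):
#   ch.basic_qos(prefetch_count=1)        # process one session at a time
#   ch.basic_consume(
#       queue="session_events",
#       on_message_callback=lambda ch, m, p, body: (
#           handle_event(body, run_pipeline),
#           ch.basic_ack(m.delivery_tag),  # ack only after success
#       ),
#   )
#   ch.start_consuming()                  # long-running loop
```

Decoupling publish from consume is why the stop hook returns in ~33ms: all heavy work happens later, inside the container.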
Changed
- `hooks/stop_hook.sh` rewritten: a `curl` POST to the RabbitMQ HTTP API replaces the background Python process.
v0.3.0 — Entity filtering & cost optimization
What's new
Two-level entity filtering
- Level 1 (`is_valid_entity()` in `triple_extraction.py`) — 13 filter groups reject garbage at extraction time: filenames, hex colors, CLI flags, ICD codes, snake_case identifiers, DOM selectors, version strings, CSS dimensions, issue refs, function calls, npm scopes, percentage values.
- Level 2 (`is_linkable_entity()` in `link_entities.py`) — pre-filters before Wikidata API calls, catching anything that slipped through Level 1.
- 48 whitelisted short terms (`ai`, `api`, `llm`, `rdf`, `sql`, etc.) bypass all filters.
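A minimal sketch of how such a Level-1 filter might look. The patterns and the whitelist subset below are illustrative examples, not the project's actual 13 filter groups:

```python
import re

WHITELIST = {"ai", "api", "llm", "rdf", "sql"}  # subset of the 48 short terms

REJECT_PATTERNS = [
    re.compile(r"^#[0-9a-fA-F]{6}$"),       # hex colors
    re.compile(r"^--[\w-]+$"),              # CLI flags
    re.compile(r"^v?\d+\.\d+(\.\d+)?$"),    # version strings
    re.compile(r"^[a-z]+(_[a-z]+)+$"),      # snake_case identifiers
    re.compile(r"\.\w{1,4}$"),              # filenames by extension
]

def is_valid_entity(name: str) -> bool:
    """Reject obviously non-conceptual strings before they enter the graph."""
    term = name.strip().lower()
    if term in WHITELIST:
        return True                # short domain terms bypass all filters
    if len(term) < 3:
        return False
    return not any(p.search(term) for p in REJECT_PATTERNS)
```

Cheap regex rejection at extraction time is what keeps garbage out of both the graph and the downstream Wikidata linking budget.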
Top-10 extraction cap
The prompt now extracts at most 10 triples per message, prioritizing architectural decisions and technology choices, with a hard cap enforced in parsing. The median extraction rate (~1.4 triples/msg) is unaffected; the cap only trims the noisy long tail.
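Enforcing the cap at parse time might look like the following sketch (the line format and constant name are assumptions):

```python
MAX_TRIPLES_PER_MESSAGE = 10  # hard cap, regardless of what the model emits

def parse_triples(lines):
    """Parse 'subject | predicate | object' lines, stopping at the cap."""
    triples = []
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
        if len(triples) >= MAX_TRIPLES_PER_MESSAGE:
            break
    return triples
```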
Frequency-based entity linking
A new `--min-sessions N` flag (default: 2) links only entities appearing in N+ sessions; ~77% of entities are single-session noise.
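The frequency filter amounts to counting, per entity, the number of distinct sessions it appears in. A small sketch of the idea (function name and input shape are illustrative):

```python
from collections import Counter

def linkable_entities(sessions, min_sessions: int = 2) -> set:
    """Keep entities seen in at least min_sessions distinct sessions."""
    counts = Counter()
    for entities in sessions:          # one iterable of entities per session
        counts.update(set(entities))   # count sessions, not total mentions
    return {e for e, n in counts.items() if n >= min_sessions}
```

Counting distinct sessions rather than raw mentions is the key choice: an entity repeated heavily inside a single session is still treated as single-session noise.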
Impact
| Metric | Before | After |
|---|---|---|
| Triples per full extraction | 75,743 | 43,949 (-42%) |
| Entities sent to Wikidata linker | ~28,000 | ~3,729 (-87%) |
| Linking time (estimated) | ~54 hours | ~7 hours |
| Linking cost | ~$2.80 | ~$0.37 |
Full changelog
See CHANGELOG.md for complete version history.