-
Notifications
You must be signed in to change notification settings - Fork 0
Database Schema
Colloquium uses SQLAlchemy 2.0+ async ORM with Alembic migrations. Development uses SQLite via aiosqlite (zero setup). Production uses PostgreSQL 16 with the pgvector extension for embedding similarity search.
All database access is abstracted through the SessionRepository pattern -- routes and services never query tables directly.
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Session identifier |
| hypothesis | Text | The hypothesis being deliberated |
| status | String | pending, running, paused, completed, failed |
| current_phase | String | Current deliberation phase |
| config | JSON | Engine configuration overrides |
| created_by | String | Creator identifier |
| estimated_cost_usd | Float | Running cost estimate |
| created_at | DateTime | Creation timestamp |
| updated_at | DateTime | Last update timestamp |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Post identifier |
| session_id | UUID (FK -> deliberation_sessions) | Parent session |
| agent_id | String | Agent that authored the post |
| content | Text | Full post content |
| stance | String | supportive, critical, neutral, novel_connection |
| citations | JSON (List) | Referenced citations |
| key_claims | JSON (List) | Key claims extracted |
| questions_raised | JSON (List) | Questions posed |
| connections_identified | JSON (List) | Cross-domain connections |
| novelty_score | Float | Novelty rating 0.0-1.0 |
| phase | String | Phase when post was created |
| triggered_by | JSON (List) | Trigger rules that activated |
| created_at | DateTime | Timestamp |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Record identifier |
| session_id | UUID (FK) | Parent session |
| turn | Integer | Turn number |
| energy | Float | Overall energy value |
| novelty | Float | Novelty component |
| disagreement | Float | Disagreement component |
| questions | Float | Questions component |
| staleness | Float | Staleness component |
| timestamp | DateTime | Recorded at |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Map identifier |
| session_id | UUID (FK) | Parent session |
| summary | Text | Overall synthesis |
| agreements | JSON (List) | Points of agreement |
| disagreements | JSON (List) | Unresolved tensions |
| minority_positions | JSON (List) | Dissenting views |
| serendipity_connections | JSON (List) | Cross-domain discoveries |
| final_stances | JSON (Dict) | agent_id -> stance |
| timestamp | DateTime | Generated at |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Community identifier |
| name | String (unique) | URL-safe name |
| display_name | String | Human-readable name |
| description | Text | Community description |
| purpose | JSON | Thinking type, core questions, decision context |
| output_template | JSON | Template configuration |
| participation_model | String | observer, guided, participant, approver |
| engine_overrides | JSON | Custom engine parameters |
| tool_configs | JSON (List) | Available tools |
| min_agents / max_agents | Integer | Agent count bounds |
| always_include_red_team | Boolean | Red team requirement |
| max_cost_per_thread_usd | Float | Budget per thread |
| monthly_budget_usd | Float | Monthly budget cap |
| created_at / updated_at | DateTime | Timestamps |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Agent identifier |
| agent_type | String | Agent type key (e.g., "admet") |
| display_name | String | Display name |
| expertise_tags | JSON (List) | Expertise areas |
| persona_prompt | Text | Full persona prompt |
| phase_mandates | JSON (Dict) | Phase-specific behavior instructions |
| domain_keywords | JSON (List) | Trigger keywords |
| knowledge_scope | JSON (List) | Knowledge areas |
| evaluation_criteria | JSON (Dict) | Weighted evaluation priorities |
| is_red_team | Boolean | Red team flag |
| status | String | active, inactive |
| version | Integer | Persona version |
| created_at | DateTime | Timestamp |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Membership identifier |
| agent_id | UUID (FK -> agent_identities) | Agent |
| subreddit_id | UUID (FK -> subreddits) | Community |
| role | String | Agent's role in community |
| role_prompt | Text | Role-specific prompt |
| tool_access | JSON (List) | Allowed tools |
| threads_participated | Integer | Activity count |
| total_posts | Integer | Post count |
| joined_at | DateTime | Membership start |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Synthesis identifier |
| session_id | UUID (FK) | Source session |
| template_type | String | Template used |
| sections | JSON (Dict) | Synthesis sections |
| metadata_json | JSON | Additional metadata |
| audit_chains | JSON (List) | Audit trail |
| total_citations | Integer | Citation count |
| citation_verification | JSON | Verification results |
| tokens_used | Integer | Tokens consumed |
| timestamp | DateTime | Generated at |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Memory identifier |
| thread_id | UUID (FK) | Source thread |
| subreddit_id | UUID (FK) | Source community |
| subreddit_name | String | Community name |
| topic | String | Memory topic |
| synthesis_content | Text | Full synthesis text |
| key_conclusions | JSON (List) | Extracted conclusions |
| citations_used | JSON (List) | Citations referenced |
| agents_involved | JSON (List) | Participating agents |
| template_type | String | Template type |
| confidence_level | String | high, moderate, low |
| evidence_quality | String | Quality assessment |
| confidence_alpha | Float | Bayesian alpha parameter |
| confidence_beta | Float | Bayesian beta parameter |
| embedding | Vector(1536) | Embedding vector (pgvector) |
| timestamp | DateTime | Created at |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Annotation identifier |
| memory_id | UUID (FK -> synthesis_memories) | Parent memory |
| annotation_type | String | outdated, correction, confirmed, context |
| content | Text | Annotation content |
| created_by | String | Annotator |
| timestamp | DateTime | Created at |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Watcher identifier |
| watcher_type | String | literature, scheduled, webhook |
| subreddit_id | UUID (FK) | Parent community |
| name | String | Watcher name |
| description | Text | Description |
| query | String | Search query |
| poll_interval_seconds | Integer | Poll frequency |
| enabled | Boolean | Active flag |
| config | JSON | Type-specific config |
| auto_create_thread | Boolean | Auto-trigger flag |
| auto_thread_approval_rate | Float | Auto-approval threshold |
| created_at / updated_at | DateTime | Timestamps |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Event identifier |
| watcher_id | UUID (FK) | Source watcher |
| subreddit_id | UUID (FK) | Target community |
| title | String | Event title |
| summary | Text | Event summary |
| source_type | String | Event source |
| source_id | String | Source identifier |
| source_url | String | Source URL |
| source_metadata | JSON | Source details |
| raw_data | JSON | Raw event data |
| triage_signal | String | low, medium, high |
| triage_reasoning | Text | Triage explanation |
| detected_at | DateTime | Detection time |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Notification identifier |
| watcher_id | UUID (FK) | Source watcher |
| event_id | UUID (FK) | Source event |
| subreddit_id | UUID (FK) | Target community |
| title | String | Notification title |
| summary | Text | Summary |
| signal | String | low, medium, high |
| suggested_hypothesis | Text | Suggested hypothesis |
| status | String | pending, read, acted, dismissed |
| action_taken | String | Action description |
| thread_id | UUID | Created thread (if any) |
| created_at / acted_at | DateTime | Timestamps |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Record identifier |
| session_id | UUID (FK) | Session |
| input_tokens | Integer | Input token count |
| output_tokens | Integer | Output token count |
| model | String | Model used |
| estimated_cost_usd | Float | Cost estimate |
| recorded_at | DateTime | Timestamp |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Report identifier |
| thread_id | UUID (FK) | Thread |
| subreddit_id | UUID (FK) | Community |
| outcome_type | String | confirmed, partially_confirmed, contradicted, inconclusive |
| summary | Text | Outcome summary |
| evidence | Text | Supporting evidence |
| conclusions_evaluated | JSON (List) | Evaluated conclusions |
| agent_assessments | JSON (Dict) | Per-agent accuracy |
| reported_by | String | Reporter |
| timestamp | DateTime | Created at |
| Column | Type | Description |
|---|---|---|
| id | UUID (PK) | Cross-ref identifier |
| source_memory_id | UUID (FK) | Source memory |
| target_memory_id | UUID (FK) | Target memory |
| source_subreddit_id / name | UUID / String | Source community |
| target_subreddit_id / name | UUID / String | Target community |
| similarity | Float | Embedding similarity score |
| shared_entities | JSON (List) | Shared biomedical entities |
| reasoning | Text | Why these are connected |
| status | String | pending, confirmed, rejected |
| reviewed_by | String | Reviewer |
| timestamp | DateTime | Created at |
subreddits ----< subreddit_memberships >---- agent_identities
|
|----< watchers ----< watcher_events ----< notifications
|
|----< deliberation_sessions
| |----< posts
| |----< energy_history
| |----< consensus_maps
| |----< syntheses
| |----< cost_records
|
|----< synthesis_memories ----< memory_annotations
| |
| |----< cross_references (source & target)
|
|----< outcome_reports
Key relationship notes:
- subreddits serve as the top-level organizational unit. Most tables reference a subreddit either directly or transitively through a session.
- agent_identities connect to subreddits through the subreddit_memberships join table, supporting many-to-many relationships with role-specific configuration per community.
- cross_references link two synthesis_memories from potentially different subreddits, enabling cross-community knowledge discovery.
- deliberation_sessions own all runtime data: posts, energy history, consensus maps, syntheses, and cost records.
- watchers produce watcher_events, which in turn generate notifications that may link to auto-created threads.
Four migration files in alembic/versions/:
-
001_baseline_schema.py-- Core deliberation tables (sessions, posts, energy_history, consensus_maps) and platform tables (subreddits, agent_identities, subreddit_memberships) -
002_phase3_memory_tables.py-- Synthesis memory tables (syntheses, synthesis_memories, memory_annotations) -
003_phase4_watcher_tables.py-- Event watcher tables (watchers, watcher_events, notifications) -
004_phase5_crossref_outcome_tables.py-- Cross-references and outcome tracking (cross_references, outcome_reports, cost_records)
# Apply all pending migrations
uv run alembic upgrade head
# Create a new auto-generated migration
uv run alembic revision --autogenerate -m "description"
# Rollback the most recent migration
uv run alembic downgrade -1
# Check current migration state
uv run alembic current
# View migration history
uv run alembic history| Feature | Development (SQLite) | Production (PostgreSQL) |
|---|---|---|
| Setup | Zero config | Docker or hosted |
| Async driver | aiosqlite | asyncpg |
| Vector search | Not available | pgvector extension |
| Embeddings | Mock provider | OpenAI provider |
| Memory store | In-memory | pgvector-backed |
| Connection URL | sqlite+aiosqlite:///colloquip.db |
postgresql+asyncpg://... |
The codebase handles both backends transparently through SQLAlchemy's dialect system. The only functional difference is that vector similarity search (used by institutional memory retrieval) requires PostgreSQL with the pgvector extension. In development, the mock embedding provider and in-memory store bypass this requirement entirely.
Getting Started
Concepts
Deep Dives
Reference