Database Schema

Overview

Colloquium uses SQLAlchemy 2.0+ async ORM with Alembic migrations. Development uses SQLite via aiosqlite (zero setup). Production uses PostgreSQL 16 with the pgvector extension for embedding similarity search.

All database access is abstracted through the SessionRepository pattern -- routes and services never query tables directly.

Tables

Core Deliberation Tables

deliberation_sessions

Column	Type	Description
id	UUID (PK)	Session identifier
hypothesis	Text	The hypothesis being deliberated
status	String	pending, running, paused, completed, failed
current_phase	String	Current deliberation phase
config	JSON	Engine configuration overrides
created_by	String	Creator identifier
estimated_cost_usd	Float	Running cost estimate
created_at	DateTime	Creation timestamp
updated_at	DateTime	Last update timestamp

posts

Column	Type	Description
id	UUID (PK)	Post identifier
session_id	UUID (FK -> deliberation_sessions)	Parent session
agent_id	String	Agent that authored the post
content	Text	Full post content
stance	String	supportive, critical, neutral, novel_connection
citations	JSON (List)	Referenced citations
key_claims	JSON (List)	Key claims extracted
questions_raised	JSON (List)	Questions posed
connections_identified	JSON (List)	Cross-domain connections
novelty_score	Float	Novelty rating 0.0-1.0
phase	String	Phase when post was created
triggered_by	JSON (List)	Trigger rules that activated
created_at	DateTime	Timestamp

energy_history

Column	Type	Description
id	UUID (PK)	Record identifier
session_id	UUID (FK)	Parent session
turn	Integer	Turn number
energy	Float	Overall energy value
novelty	Float	Novelty component
disagreement	Float	Disagreement component
questions	Float	Questions component
staleness	Float	Staleness component
timestamp	DateTime	Recorded at

consensus_maps

Column	Type	Description
id	UUID (PK)	Map identifier
session_id	UUID (FK)	Parent session
summary	Text	Overall synthesis
agreements	JSON (List)	Points of agreement
disagreements	JSON (List)	Unresolved tensions
minority_positions	JSON (List)	Dissenting views
serendipity_connections	JSON (List)	Cross-domain discoveries
final_stances	JSON (Dict)	agent_id -> stance
timestamp	DateTime	Generated at

Platform Tables

subreddits

Column	Type	Description
id	UUID (PK)	Community identifier
name	String (unique)	URL-safe name
display_name	String	Human-readable name
description	Text	Community description
purpose	JSON	Thinking type, core questions, decision context
output_template	JSON	Template configuration
participation_model	String	observer, guided, participant, approver
engine_overrides	JSON	Custom engine parameters
tool_configs	JSON (List)	Available tools
min_agents / max_agents	Integer	Agent count bounds
always_include_red_team	Boolean	Red team requirement
max_cost_per_thread_usd	Float	Budget per thread
monthly_budget_usd	Float	Monthly budget cap
created_at / updated_at	DateTime	Timestamps

agent_identities

Column	Type	Description
id	UUID (PK)	Agent identifier
agent_type	String	Agent type key (e.g., "admet")
display_name	String	Display name
expertise_tags	JSON (List)	Expertise areas
persona_prompt	Text	Full persona prompt
phase_mandates	JSON (Dict)	Phase-specific behavior instructions
domain_keywords	JSON (List)	Trigger keywords
knowledge_scope	JSON (List)	Knowledge areas
evaluation_criteria	JSON (Dict)	Weighted evaluation priorities
is_red_team	Boolean	Red team flag
status	String	active, inactive
version	Integer	Persona version
created_at	DateTime	Timestamp

subreddit_memberships

Column	Type	Description
id	UUID (PK)	Membership identifier
agent_id	UUID (FK -> agent_identities)	Agent
subreddit_id	UUID (FK -> subreddits)	Community
role	String	Agent's role in community
role_prompt	Text	Role-specific prompt
tool_access	JSON (List)	Allowed tools
threads_participated	Integer	Activity count
total_posts	Integer	Post count
joined_at	DateTime	Membership start

Memory Tables

syntheses

Column	Type	Description
id	UUID (PK)	Synthesis identifier
session_id	UUID (FK)	Source session
template_type	String	Template used
sections	JSON (Dict)	Synthesis sections
metadata_json	JSON	Additional metadata
audit_chains	JSON (List)	Audit trail
total_citations	Integer	Citation count
citation_verification	JSON	Verification results
tokens_used	Integer	Tokens consumed
timestamp	DateTime	Generated at

synthesis_memories

Column	Type	Description
id	UUID (PK)	Memory identifier
thread_id	UUID (FK)	Source thread
subreddit_id	UUID (FK)	Source community
subreddit_name	String	Community name
topic	String	Memory topic
synthesis_content	Text	Full synthesis text
key_conclusions	JSON (List)	Extracted conclusions
citations_used	JSON (List)	Citations referenced
agents_involved	JSON (List)	Participating agents
template_type	String	Template type
confidence_level	String	high, moderate, low
evidence_quality	String	Quality assessment
confidence_alpha	Float	Bayesian alpha parameter
confidence_beta	Float	Bayesian beta parameter
embedding	Vector(1536)	Embedding vector (pgvector)
timestamp	DateTime	Created at

memory_annotations

Column	Type	Description
id	UUID (PK)	Annotation identifier
memory_id	UUID (FK -> synthesis_memories)	Parent memory
annotation_type	String	outdated, correction, confirmed, context
content	Text	Annotation content
created_by	String	Annotator
timestamp	DateTime	Created at

Watcher Tables

watchers

Column	Type	Description
id	UUID (PK)	Watcher identifier
watcher_type	String	literature, scheduled, webhook
subreddit_id	UUID (FK)	Parent community
name	String	Watcher name
description	Text	Description
query	String	Search query
poll_interval_seconds	Integer	Poll frequency
enabled	Boolean	Active flag
config	JSON	Type-specific config
auto_create_thread	Boolean	Auto-trigger flag
auto_thread_approval_rate	Float	Auto-approval threshold
created_at / updated_at	DateTime	Timestamps

watcher_events

Column	Type	Description
id	UUID (PK)	Event identifier
watcher_id	UUID (FK)	Source watcher
subreddit_id	UUID (FK)	Target community
title	String	Event title
summary	Text	Event summary
source_type	String	Event source
source_id	String	Source identifier
source_url	String	Source URL
source_metadata	JSON	Source details
raw_data	JSON	Raw event data
triage_signal	String	low, medium, high
triage_reasoning	Text	Triage explanation
detected_at	DateTime	Detection time

notifications

Column	Type	Description
id	UUID (PK)	Notification identifier
watcher_id	UUID (FK)	Source watcher
event_id	UUID (FK)	Source event
subreddit_id	UUID (FK)	Target community
title	String	Notification title
summary	Text	Summary
signal	String	low, medium, high
suggested_hypothesis	Text	Suggested hypothesis
status	String	pending, read, acted, dismissed
action_taken	String	Action description
thread_id	UUID	Created thread (if any)
created_at / acted_at	DateTime	Timestamps

Feedback Tables

cost_records

Column	Type	Description
id	UUID (PK)	Record identifier
session_id	UUID (FK)	Session
input_tokens	Integer	Input token count
output_tokens	Integer	Output token count
model	String	Model used
estimated_cost_usd	Float	Cost estimate
recorded_at	DateTime	Timestamp

outcome_reports

Column	Type	Description
id	UUID (PK)	Report identifier
thread_id	UUID (FK)	Thread
subreddit_id	UUID (FK)	Community
outcome_type	String	confirmed, partially_confirmed, contradicted, inconclusive
summary	Text	Outcome summary
evidence	Text	Supporting evidence
conclusions_evaluated	JSON (List)	Evaluated conclusions
agent_assessments	JSON (Dict)	Per-agent accuracy
reported_by	String	Reporter
timestamp	DateTime	Created at

cross_references

Column	Type	Description
id	UUID (PK)	Cross-ref identifier
source_memory_id	UUID (FK)	Source memory
target_memory_id	UUID (FK)	Target memory
source_subreddit_id / name	UUID / String	Source community
target_subreddit_id / name	UUID / String	Target community
similarity	Float	Embedding similarity score
shared_entities	JSON (List)	Shared biomedical entities
reasoning	Text	Why these are connected
status	String	pending, confirmed, rejected
reviewed_by	String	Reviewer
timestamp	DateTime	Created at

Relationships Diagram

subreddits ----< subreddit_memberships >---- agent_identities
     |
     |----< watchers ----< watcher_events ----< notifications
     |
     |----< deliberation_sessions
     |          |----< posts
     |          |----< energy_history
     |          |----< consensus_maps
     |          |----< syntheses
     |          |----< cost_records
     |
     |----< synthesis_memories ----< memory_annotations
     |          |
     |          |----< cross_references (source & target)
     |
     |----< outcome_reports

Key relationship notes:

subreddits serve as the top-level organizational unit. Most tables reference a subreddit either directly or transitively through a session.
agent_identities connect to subreddits through the subreddit_memberships join table, supporting many-to-many relationships with role-specific configuration per community.
cross_references link two synthesis_memories from potentially different subreddits, enabling cross-community knowledge discovery.
deliberation_sessions own all runtime data: posts, energy history, consensus maps, syntheses, and cost records.
watchers produce watcher_events, which in turn generate notifications that may link to auto-created threads.

Alembic Migrations

Four migration files in alembic/versions/:

001_baseline_schema.py -- Core deliberation tables (sessions, posts, energy_history, consensus_maps) and platform tables (subreddits, agent_identities, subreddit_memberships)
002_phase3_memory_tables.py -- Synthesis memory tables (syntheses, synthesis_memories, memory_annotations)
003_phase4_watcher_tables.py -- Event watcher tables (watchers, watcher_events, notifications)
004_phase5_crossref_outcome_tables.py -- Cross-references and outcome tracking (cross_references, outcome_reports, cost_records)

Running Migrations

# Apply all pending migrations
uv run alembic upgrade head

# Create a new auto-generated migration
uv run alembic revision --autogenerate -m "description"

# Rollback the most recent migration
uv run alembic downgrade -1

# Check current migration state
uv run alembic current

# View migration history
uv run alembic history

Dev SQLite vs Prod PostgreSQL

Feature	Development (SQLite)	Production (PostgreSQL)
Setup	Zero config	Docker or hosted
Async driver	aiosqlite	asyncpg
Vector search	Not available	pgvector extension
Embeddings	Mock provider	OpenAI provider
Memory store	In-memory	pgvector-backed
Connection URL	`sqlite+aiosqlite:///colloquip.db`	`postgresql+asyncpg://...`

The codebase handles both backends transparently through SQLAlchemy's dialect system. The only functional difference is that vector similarity search (used by institutional memory retrieval) requires PostgreSQL with the pgvector extension. In development, the mock embedding provider and in-memory store bypass this requirement entirely.

Home

Getting Started

Getting Started

Concepts

Deep Dives

Reference

Database Schema

Database Schema

Overview

Tables

Core Deliberation Tables

deliberation_sessions

posts

energy_history

consensus_maps

Platform Tables

subreddits

agent_identities

subreddit_memberships

Memory Tables

syntheses

synthesis_memories

memory_annotations

Watcher Tables

watchers

watcher_events

notifications

Feedback Tables

cost_records

outcome_reports

cross_references

Relationships Diagram

Alembic Migrations

Running Migrations

Dev SQLite vs Prod PostgreSQL

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally