
Consilium - Product Requirements Document

"Seeking Truth Through AI Councils" - A Multi-LLM Council Interface for Comparison, Deliberation, and Evaluation


🎯 Overview

Consilium is a web-based interface that allows users to query multiple Large Language Models simultaneously, compare their responses side-by-side, and optionally have another LLM evaluate and rank the responses.

Core Value Proposition

  • See how different AI models approach the same question
  • Get diverse perspectives on complex problems
  • Evaluate response quality with structured criteria
  • Save and export sessions for reference

🏗️ Architecture

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 18 + TypeScript + Vite |
| Styling | Tailwind CSS |
| State | Zustand |
| Backend | Node.js + Express |
| Database | SQLite (better-sqlite3) |
| API | OpenRouter (unified LLM access) |

Ports

  • Frontend: http://localhost:3800
  • Backend: http://localhost:3801

🎨 User Interface

Layout Options

  1. Grid View - 2x2, 2x3, or 3x3 panels based on council size
  2. Tab View - One model per tab, summary tab
  3. Summary View - Overview with scores, expand to see full responses

Theme

  • Light/Dark mode toggle
  • System preference detection

Panel Components

  • Model name/icon header
  • Streaming response area (Markdown rendered)
  • Copy button
  • Response time indicator
  • Token count (when available)

🤖 Modes of Operation

Mode 1: Council Compare (MVP)

  • All selected models receive the same prompt simultaneously
  • Responses stream in side-by-side
  • Optional: Judge evaluates all responses

Mode 2: Single Evaluator

  • Multiple models answer
  • One designated judge evaluates all responses
  • Produces scores + prose analysis

Mode 3: Panel of Judges (Phase 2)

  • One model answers
  • Multiple models evaluate that single response
  • Consensus scoring

Mode 4: Consensus Debate (Phase 2)

  • Round-robin discussion
  • Models agree/disagree with previous responses
  • 3 rounds max to reach consensus
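
As a sketch, the round-robin flow could look like the loop below. The `callModel` signature and transcript format are illustrative assumptions, not part of this spec:

```typescript
type CallModel = (modelId: string, transcript: string) => Promise<string>;

// Hypothetical round-robin loop: each round, every model sees the
// running transcript and appends its agreement/disagreement.
async function runDebate(
  prompt: string,
  models: string[],
  callModel: CallModel,
  maxRounds = 3,
): Promise<string[]> {
  const transcript: string[] = [`User: ${prompt}`];
  for (let round = 1; round <= maxRounds; round++) {
    for (const model of models) {
      const reply = await callModel(model, transcript.join("\n"));
      transcript.push(`[Round ${round}] ${model}: ${reply}`);
    }
  }
  return transcript;
}
```

A real implementation would also need a stopping check (e.g. a judge detecting consensus before round 3); this sketch simply runs the fixed number of rounds.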

📊 Evaluation Criteria

Customizable checklist (all enabled by default):

  • Accuracy
  • Clarity
  • Completeness
  • Creativity
  • Adherence to Instructions
  • Safety/Appropriateness
  • Reasoning Quality
  • Conciseness

Output:

  • Numerical scores (1-10 per criterion)
  • Overall ranking
  • Prose analysis/justification
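
One possible shape for the judge's structured output, and how an overall ranking could be derived from the per-criterion scores. The types and the simple-mean aggregation are assumptions of this sketch, not fixed by the PRD:

```typescript
// Per-model scores: criterion name -> 1-10 score (illustrative shape).
type Scores = Record<string, number>;

interface Evaluation {
  judgeModel: string;
  scores: Record<string, Scores>; // modelId -> per-criterion scores
  analysis: string;
}

// Rank models by the mean of their criterion scores, highest first.
function rankModels(scores: Record<string, Scores>): string[] {
  const mean = (s: Scores) => {
    const vals = Object.values(s);
    return vals.reduce((a, b) => a + b, 0) / vals.length;
  };
  return Object.keys(scores).sort((a, b) => mean(scores[b]) - mean(scores[a]));
}
```

Weighted criteria (e.g. accuracy counting double) would only change the `mean` helper.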

💾 Data Persistence

SQLite Schema

```sql
-- Sessions table
CREATE TABLE sessions (
  id TEXT PRIMARY KEY,
  name TEXT,
  mode TEXT,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  council_config JSON,
  tags JSON
);

-- Messages table
CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  session_id TEXT,
  role TEXT, -- 'user', 'assistant', 'evaluation'
  model_id TEXT,
  content TEXT,
  tokens_used INTEGER,
  latency_ms INTEGER,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (session_id) REFERENCES sessions(id)
);

-- Evaluations table
CREATE TABLE evaluations (
  id TEXT PRIMARY KEY,
  session_id TEXT,
  judge_model TEXT,
  criteria JSON,
  scores JSON,
  ranking JSON,
  analysis TEXT,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (session_id) REFERENCES sessions(id)
);

-- Presets table
CREATE TABLE presets (
  id TEXT PRIMARY KEY,
  name TEXT,
  models JSON,
  judge_model TEXT,
  system_prompts JSON,
  is_default BOOLEAN DEFAULT FALSE,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
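
Since SQLite stores the JSON columns as text, the backend has to serialize them before inserting. A minimal helper for the sessions table (column names follow the schema above; the UUID id scheme and `CouncilConfig` shape are assumptions):

```typescript
import { randomUUID } from "node:crypto";

interface CouncilConfig {
  models: string[];
  judgeModel?: string;
}

// Build a row for the sessions table; JSON columns become strings.
function newSessionRow(name: string, mode: string, config: CouncilConfig) {
  return {
    id: randomUUID(),
    name,
    mode,
    council_config: JSON.stringify(config),
    tags: JSON.stringify([]),
  };
}
```

With better-sqlite3, such a row can be bound via named parameters, e.g. `db.prepare("INSERT INTO sessions (id, name, mode, council_config, tags) VALUES (@id, @name, @mode, @council_config, @tags)").run(row)`.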

Export Formats

  • JSON (full data)
  • Markdown (human-readable)
  • CSV (comparison tables)
  • PDF (formatted report) - Phase 2
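
The Markdown export can be as simple as flattening a session's messages into headed sections. The `ExportMessage` shape mirrors the messages table; the exact layout below is a sketch, not a fixed format:

```typescript
interface ExportMessage {
  role: string;     // 'user' | 'assistant' | 'evaluation'
  modelId?: string; // set for assistant/evaluation rows
  content: string;
}

// Render a session as human-readable Markdown.
function toMarkdown(sessionName: string, messages: ExportMessage[]): string {
  const lines = [`# ${sessionName}`, ""];
  for (const m of messages) {
    const heading = m.modelId ? `${m.role} (${m.modelId})` : m.role;
    lines.push(`## ${heading}`, "", m.content, "");
  }
  return lines.join("\n");
}
```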

🔌 API Integration

OpenRouter Configuration

```typescript
const OPENROUTER_BASE_URL = 'https://openrouter.ai/api/v1';
const OPENROUTER_API_KEY = process.env.OPENROUTER_API_KEY;
```
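
OpenRouter exposes an OpenAI-compatible `/chat/completions` endpoint; with `stream: true` it returns SSE `data:` lines. A sketch of the backend call (error handling omitted; `streamCompletion` is defined but not invoked here, and the helper names are illustrative):

```typescript
const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

// Pull the text delta out of one SSE data line, or null for
// comments, keep-alives, and the terminating "[DONE]" sentinel.
function extractDelta(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return null;
  return JSON.parse(payload).choices?.[0]?.delta?.content ?? null;
}

// Stream one model's answer, invoking onDelta per text fragment.
async function streamCompletion(model: string, prompt: string,
    onDelta: (text: string) => void): Promise<void> {
  const res = await fetch(`${OPENROUTER_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  for await (const chunk of res.body as any) {
    for (const line of new TextDecoder().decode(chunk).split("\n")) {
      const delta = extractDelta(line);
      if (delta) onDelta(delta);
    }
  }
}
```

A production version would buffer partial lines across chunk boundaries; this sketch assumes each chunk ends on a line boundary.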

Default Models

| Role | Model ID | Display Name |
| --- | --- | --- |
| Council | anthropic/claude-sonnet-4 | Claude Sonnet |
| Council | openai/gpt-4o | GPT-4o |
| Council | google/gemini-2.0-flash-001 | Gemini 2.0 |
| Council | x-ai/grok-2 | Grok 2 |
| Judge | google/gemini-2.0-flash-001 | Gemini 2.0 |

Streaming

  • Use Server-Sent Events (SSE) for streaming responses
  • Each model streams independently to its panel
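
On the backend, each model's deltas can be relayed as named SSE events so the frontend routes them to the right panel. The event framing below is standard `text/event-stream` syntax; using the model id as the event name is an assumption of this sketch:

```typescript
// Frame one SSE event: the event name carries the model id so the
// client can route each delta to the matching panel.
function sseEvent(modelId: string, data: unknown): string {
  return `event: ${modelId}\ndata: ${JSON.stringify(data)}\n\n`;
}

// In an Express handler this would be written to a response opened with:
//   res.writeHead(200, {
//     "Content-Type": "text/event-stream",
//     "Cache-Control": "no-cache",
//     Connection: "keep-alive",
//   });
//   res.write(sseEvent("openai/gpt-4o", { delta: "Hello" }));
```

On the client, `new EventSource(url).addEventListener(modelId, handler)` then receives only that model's events.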

🛣️ Roadmap

MVP (Phase 1) - Current Sprint

  • Project setup
  • OpenRouter integration
  • Model selection UI
  • Side-by-side streaming panels
  • Basic evaluation mode
  • Chat persistence (SQLite)
  • Export to JSON
  • Light/dark theme

Phase 2

  • Council presets
  • Panel of Judges mode
  • Consensus Debate mode
  • Advanced exports (PDF, CSV)
  • Tagging and search
  • Cost tracking
  • Per-model system prompts

Phase 3

  • MCP tool integration
  • RAG/document upload
  • Prompt templates
  • Analytics dashboard
  • Context window management

🔐 Security Notes

  • API keys stored in environment variables (never in frontend)
  • Backend proxies all LLM requests
  • SQLite file stored locally
  • No external data transmission except to OpenRouter

📁 Project Structure

```
consilium/
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── Layout/
│   │   │   ├── Council/
│   │   │   ├── Chat/
│   │   │   └── Settings/
│   │   ├── stores/
│   │   ├── hooks/
│   │   ├── utils/
│   │   ├── types/
│   │   └── App.tsx
│   ├── index.html
│   ├── tailwind.config.js
│   └── package.json
├── backend/
│   ├── src/
│   │   ├── routes/
│   │   ├── services/
│   │   ├── db/
│   │   └── index.ts
│   ├── consilium.db
│   └── package.json
├── PRD.md
└── README.md
```

Last Updated: December 16, 2024