From 97793036f7d4a9c99086e556cb81a4e0e7f8df9c Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 10:53:04 +0100 Subject: [PATCH 01/33] docs(hivemind): add comprehensive HiveMind orchestration plan and task list Add two new documentation files: docs/HiveMind Plan.md (detailed architecture and implementation plan for Swarm/Fusion orchestration) and docs/HiveMind Task.md (phase-by-phase task checklist). Includes terminology, request flow, core components, config schemas, feature specs (jitter, adversarial, blind switch, recursive mode), streaming and usage tracking, integration points, error handling, testing strategy, and implementation phases to guide development and verification. --- docs/HiveMind Plan.md | 1290 +++++++++++++++++++++++++++++++++++++++++ docs/HiveMind Task.md | 93 +++ 2 files changed, 1383 insertions(+) create mode 100644 docs/HiveMind Plan.md create mode 100644 docs/HiveMind Task.md diff --git a/docs/HiveMind Plan.md b/docs/HiveMind Plan.md new file mode 100644 index 00000000..525c1a5c --- /dev/null +++ b/docs/HiveMind Plan.md @@ -0,0 +1,1290 @@ +# HiveMind (Swarm/Fusion) - Implementation Plan (REVISED) + +## Goal Description + +Implement a sophisticated orchestration engine called "HiveMind" that enables two distinct modes of parallel model execution: + +1. **Swarm Mode**: Multiple parallel calls to the **same model** (called "Drones") with optional configuration for temperature variation, adversarial critique, and recursive self-correction. +2. **Fusion Mode**: Multiple parallel calls to **different models** (called "Models" or "Specialists" when roles are assigned) with optional role-based routing and context-aware synthesis. + +Both modes use an "Arbiter" (judge model) to synthesize responses with configurable strategies and optional recursive refinement. 
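The two entry points above imply a simple classification step before any orchestration happens: a `[swarm]` suffix on a base model triggers Swarm mode, while an exact match against a configured Fusion ID triggers Fusion mode. A minimal sketch of that dispatch, assuming a hypothetical `FUSION_IDS` set standing in for the loaded fusion configs (these names are illustrative, not the final `EnsembleManager` API):

```python
import re

# Hypothetical set of fusion IDs, as they would be loaded from fusion configs.
FUSION_IDS = {"dev-team", "creative-writers"}

SWARM_RE = re.compile(r"^(?P<base>.+)\[swarm\]$")

def classify(model_id):
    """Classify a requested model ID as 'swarm', 'fusion', or 'plain'.

    Returns (kind, base_model); base_model is None unless kind == 'swarm'.
    """
    match = SWARM_RE.match(model_id)
    if match:
        # Swarm: strip the suffix to get the model the Drones will call.
        return "swarm", match.group("base")
    if model_id in FUSION_IDS:
        # Fusion: the ID maps to a list of constituent models in config.
        return "fusion", None
    return "plain", None
```

A Swarm keeps calling the base model (the suffix is stripped before the Drone calls), while a Fusion ID expands into its configured constituent models.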
+ +--- + +## Terminology + +- **HiveMind**: The overall feature/system +- **Swarm**: Parallel execution of the same model + - **Drone**: Individual instance in a Swarm +- **Fusion**: Parallel execution of different models + - **Model**: Individual model in a Fusion (generic term) + - **Specialist**: A Model with an assigned role and weight +- **Arbiter**: The judge/synthesizer model that produces the final response + +--- + +## Architecture Overview + +### Request Flow + +``` +User Request (model: "gemini-1.5-flash[swarm]") + ↓ +EnsembleManager.is_ensemble()? → Yes + ↓ +EnsembleManager.handle_request() + ↓ +┌─────────────────────────────────────────┐ +│ 1. Configuration Resolution │ +│ - Load config for this ensemble │ +│ - Determine: Swarm or Fusion? │ +│ - Get Arbiter config │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 2. Drone/Model Preparation │ +│ For Swarm: │ +│ - Create N Drones (same model) │ +│ - Apply temp jitter (optional) │ +│ - Mark M as adversarial (optional) │ +│ For Fusion: │ +│ - Load constituent models │ +│ - Apply role prompts (optional) │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 3. Parallel Execution │ +│ - asyncio.gather() all calls │ +│ - Each call uses RotatingClient │ +│ - Apply retry logic per drone/model │ +│ - Collect responses + metadata │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 4. Response Processing │ +│ - Apply blind switch (optional) │ +│ - Format for Arbiter consumption │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 5. 
Arbitration │ +│ - Load strategy prompt │ +│ - Inject role/weight context │ +│ - For Recursive Mode: │ +│ • Give arbiter autonomy │ +│ • Arbiter decides Round 2 │ +│ - For Non-Recursive: │ +│ • Direct synthesis only │ +│ - Call Arbiter (with streaming) │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 6. Final Output │ +│ - Stream Arbiter's response to user │ +│ - Aggregate usage from all calls │ +│ - Log execution summary │ +└─────────────────────────────────────────┘ +``` + +--- + +## Core Components + +### 1. EnsembleManager Class + +**File**: `src/rotator_library/ensemble_manager.py` + +**Responsibilities**: +- Load and validate `ensemble_config.json` +- Detect Swarm requests (`[swarm]` notation) vs Fusion requests (config-based) +- Orchestrate parallel execution with retry logic +- Manage arbitration with streaming support +- Handle recursive refinement (single arbiter call with autonomous decision) + +**Key Methods**: + +#### `__init__(self, config_path, rotating_client)` +- Load configuration file +- Store reference to RotatingClient +- Build lookup tables for fast ensemble detection +- Validate configuration schema +- Initialize usage aggregator + +#### `is_ensemble(self, model_id: str) -> bool` +- Check if model_id matches a Fusion config (exact match from config) +- Check if model_id contains `[swarm]` notation +- Handle conflict detection (if provider has real model with same name) +- Return: `True` if ensemble, `False` otherwise + +#### `resolve_conflicts(self, base_model: str) -> str` +- Default format: `base_model[swarm]` +- Check if this conflicts with provider's real models +- If conflict, try: `base_model[hive]`, `base_model[max]`, etc. +- Log warning about conflict resolution +- Return: Final ensemble ID to use + +#### `handle_request(self, request_params: dict) -> AsyncGenerator` +Main orchestration method. Returns a streaming generator for the Arbiter's response. + +**Steps**: +1. 
**Identify Type**: Swarm or Fusion +2. **Load Config**: Get specific config or use defaults +3. **Prepare Drones/Models**: + - Build list of execution targets + - Apply temperature jitter (Swarm) + - Apply role prompts (Fusion) + - Mark adversarial instances +4. **Execute Parallel Calls**: + - Use `asyncio.gather()` with exception handling + - Each call goes through RotatingClient (inherits retry logic) + - Require at least 1 successful response + - Log failures as errors +5. **Aggregate Usage**: + - Sum all `prompt_tokens`, `completion_tokens`, `total_tokens` + - Calculate combined cost (using existing cost calculation) +6. **Process Responses**: + - Extract content from each response + - Apply blind switch if enabled (keep roles, strip model names) + - Format for Arbiter +7. **Build Arbiter Prompt**: + - Load strategy prompt template + - Inject adversarial context (if applicable) + - Inject role/weight context (Fusion) + - For recursive mode: Add autonomous decision instructions +8. **Call Arbiter with Streaming**: + - Stream Arbiter's synthesis to user + - Parse internal markers (if recursive mode) + - Aggregate Arbiter's usage into total +9. 
**Return**: Stream final response with combined usage metadata
+
+#### `_prepare_drones(self, config: dict, base_model: str, request_params: dict) -> List[dict]`
+For Swarm mode:
+- Create N copies of request params
+- **Temperature Jitter**:
+  ```python
+  # Assumes `import random`; `drones` is the list of cloned request-param
+  # dicts and `count` is the Drone count from config.
+  base_temp = request_params.get('temperature', 0.7)
+  jitter_config = config.get('temperature_jitter', {})
+  if jitter_config.get('enabled', False):
+      delta = jitter_config.get('delta', 0.0)
+      for i in range(count):
+          temp = base_temp + random.uniform(-delta, delta)
+          temp = max(0.0, min(2.0, temp))  # Clamp
+          drones[i]['temperature'] = temp
+  ```
+- **Adversarial Prompts**:
+  ```python
+  adv_config = config.get('adversarial_config', {})
+  if adv_config.get('enabled', False):
+      adv_count = adv_config['count']  # distinct from the Drone count above
+      prompt = adv_config['prompt']
+      for i in range(adv_count):
+          drones[i]['messages'].insert(0, {
+              'role': 'system',
+              'content': prompt
+          })
+          drones[i]['_is_adversarial'] = True  # Metadata for logging
+  ```
+- **Model ID**: All drones use `base_model` (without `[swarm]` suffix)
+
+#### `_prepare_models(self, config: dict, request_params: dict) -> List[dict]`
+For Fusion mode:
+- For each model in fusion config:
+  - Clone request params
+  - Set model ID from config
+  - If role defined:
+    - Apply `system_prompt_append` (prepend to messages)
+    - Store role metadata for context
+  - If weight defined:
+    - Store weight for arbiter context
+- Return list of prepared calls with metadata
+
+#### `_execute_parallel(self, prepared_calls: List[dict]) -> Tuple[List[dict], dict]`
+- Execute all calls in parallel:
+  ```python
+  results = await asyncio.gather(
+      *[self.rotating_client.acompletion(**params) for params in prepared_calls],
+      return_exceptions=True
+  )
+  ```
+- Filter out exceptions/None values
+- Log each failure as ERROR (drones should not fail)
+- Require at least 1 success, else raise exception
+- Aggregate usage:
+  ```python
+  total_usage = {
+      'prompt_tokens': sum(r.usage.prompt_tokens for r in results if r),
+      
'completion_tokens': sum(r.usage.completion_tokens for r in results if r), + 'total_tokens': sum(r.usage.total_tokens for r in results if r) + } + ``` +- Return: `(successful_responses, total_usage)` + +#### `_format_for_arbiter(self, responses: List[dict], config: dict, mode: str, metadata: List[dict]) -> str` +Build formatted text for arbiter input. + +**Blind Switch Logic**: +- If `blind=True`: + - Labels: "Response 1 (Architect role)", "Response 2 (Security role)" + - Do NOT include model names +- If `blind=False`: + - Labels: "Response 1 (GPT-4o - Architect)", "Response 2 (Claude-3-opus - Security)" + +**Adversarial Context** (if adversarial drones present): +``` +NOTE: Responses marked [ADVERSARIAL] were specifically prompted to critique and find flaws. +Their purpose is to stress-test the solution. Consider their critiques when synthesizing. +``` + +**Format**: +``` +Response 1 (GPT-4o - Architect): +[content] + +Response 2 (Claude-3-opus - Security): +[content] + +Response 3 [ADVERSARIAL]: +[content] +``` + +#### `_build_arbiter_prompt(self, formatted_responses: str, config: dict, mode: str) -> List[dict]` +Build complete messages array for arbiter. + +**System Prompt Components**: +1. **Base Strategy**: Load from `arbitration_strategies[strategy_name]` +2. **Role/Weight Context** (Fusion only): + ``` + You are synthesizing responses from specialists with the following expertise: + - GPT-4o (Architect): Expert in system design and scalability. Trust this model for architectural decisions. + - Claude-3-opus (Security): Expert in vulnerability assessment. Trust this model for security concerns. + ``` +3. **Adversarial Context** (if applicable): + ``` + Some responses are marked [ADVERSARIAL]. These drones were specifically instructed to critique + and find edge cases. Their purpose is quality assurance through skeptical analysis. + ``` +4. **Recursive Mode Instructions** (if enabled): + ``` + AUTONOMOUS DECISION PROTOCOL: + 1. 
Analyze the responses and assess consensus (agreement level 1-10) + 2. If consensus >= 7/10: Proceed directly to synthesis + 3. If consensus < 7/10: + a. Identify specific conflict points + b. Internally trigger a critique phase + c. For each response, reason about how it would address the conflicts + d. Then synthesize the final answer + + Log your internal reasoning with markers: + [CONSENSUS: X/10] + [CONFLICTS: bullet list] + [CRITIQUE REASONING: ...] + [FINAL SYNTHESIS:] + + IMPORTANT: Only return the FINAL SYNTHESIS to the user. All internal reasoning + should be wrapped in [INTERNAL] tags for logging purposes only. + ``` +5. **Output Format**: + ``` + Provide your synthesis as a complete, high-quality response to the user's original query. + Do not mention that you are combining responses unless directly relevant. + ``` + +**User Message**: Original user query + formatted responses + +Return: Complete messages array for arbiter call + +#### `_call_arbiter_streaming(self, messages: List[dict], arbiter_model: str, original_params: dict) -> AsyncGenerator` +Call arbiter and stream response. + +- Clone original request params +- Set model to `arbiter_model` +- Set `messages` to constructed arbiter prompt +- Set `stream=True` +- Call via RotatingClient.acompletion (returns async generator) +- **Parse Stream**: + - Extract internal markers (consensus score, conflicts) for logging + - Strip `[INTERNAL]` sections from user-facing output + - Yield only synthesis content to user +- **Aggregate Usage**: Track arbiter's usage separately +- Return: Streaming generator + +--- + +### 2. 
Configuration Structure + +**Folder-Based Approach**: Instead of a single config file, HiveMind uses a directory structure: + +``` +ensemble_configs/ +├── swarms/ +│ ├── default.json # Default swarm settings +│ ├── gemini-flash.json # Custom swarm for gemini-flash +│ └── gpt4o.json # Custom swarm for gpt-4o +├── fusions/ +│ ├── dev-team.json # Dev team fusion +│ └── creative-writers.json # Creative writers fusion +└── strategies/ + ├── synthesis.txt # Synthesis strategy prompt + ├── best_of_n.txt # Best-of-N strategy + └── code_review.txt # Code review strategy +``` + +**Loading Logic**: +- Load all JSON files from each subfolder +- Merge swarm configs (specific model configs override defaults) +- Detect duplicate fusion IDs → apply conflict resolution +- Load strategy templates from `.txt` files + +**Benefits**: +- Easy to add new configs (drop file in folder) +- Version control friendly (one file per fusion/config) +- Community sharing (share individual fusion configs) + +--- + +### 3. Configuration Schemas + +#### Swarm Config + +**File**: `ensemble_configs/swarms/default.json` + +```json +{ + "suffix": "[swarm]", + "count": 3, + + "temperature_jitter": { + "enabled": true, + "delta": 0.2 + }, + + "arbiter": { + "model": "self", + "strategy": "synthesis", + "blind": true, + "note": "Arbiter should be a decent reasoning model (e.g., GPT-4o, Claude 3+, Gemini 1.5 Pro+)" + }, + + "adversarial_config": { + "enabled": false, + "count": 1, + "prompt": "You are a Senior Principal Engineer with 15+ years of experience..." 
+ }, + + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7, + "note": "Requires a reasoning-capable arbiter model" + } +} +``` + +#### Model-Specific Swarm Config + +**File**: `ensemble_configs/swarms/gemini-flash.json` + +```json +{ + "model": "gemini-1.5-flash", + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true + } +} +``` + +#### Fusion Config + +**File**: `ensemble_configs/fusions/dev-team.json` + +```json +{ + "id": "dev-team", + "description": "A team of specialized models for software development", + "models": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt_append": "Focus on architectural patterns, scalability, and system design.", + "weight": "Expert in system design and scalability. Trust for architectural decisions and structural integrity." + }, + { + "model": "claude-3-opus", + "role": "Security Specialist", + "system_prompt_append": "Focus on security vulnerabilities, edge cases, and potential exploits.", + "weight": "Expert in security and vulnerability assessment. Trust for identifying security flaws and attack vectors." + }, + { + "model": "gemini-1.5-pro", + "role": "Code Reviewer", + "system_prompt_append": "Focus on code quality, performance, and best practices.", + "weight": "Expert in code quality and performance optimization. Trust for maintainability and efficiency concerns." + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true, + "note": "Requires a reasoning-capable model for best results" + }, + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7 + } +} +``` + +#### Strategy Template + +**File**: `ensemble_configs/strategies/synthesis.txt` + +``` +You are an expert synthesizer. Analyze the following responses and create a single, superior answer that: +1. Combines the best elements from each response +2. Resolves any conflicts or contradictions +3. Ensures completeness and accuracy +4. 
Maintains coherence and clarity + +Your goal is to produce the BEST possible answer by leveraging the strengths of each response. + +Responses: +{responses} +``` + + +--- + +## Detailed Feature Specifications + +### 1. Temperature Jitter (Swarm Only) + +**Purpose**: Introduce controlled randomness to increase response diversity. + +**Configuration**: +```json +"temperature_jitter": { + "enabled": true, + "delta": 0.2 +} +``` + +**Implementation**: +- Get base temperature from request (default 0.7) +- For each Drone: `temp = base_temp + random.uniform(-delta, delta)` +- Clamp to `[0.0, 2.0]` +- If request has `temperature=0`, disable jitter automatically + +--- + +### 2. Adversarial Mode (Swarm Only) + +**Purpose**: Inject critical analysis to stress-test solutions. + +**Configuration**: +```json +"adversarial_config": { + "enabled": false, + "count": 1, + "prompt": "You are a Senior Principal Engineer..." +} +``` + +**Implementation**: +- Select first N drones as adversarial +- Prepend adversarial system prompt +- Tag responses as `[ADVERSARIAL]` in arbiter input +- **Arbiter Context**: Explain adversarial purpose: + ``` + NOTE: This mode is designed for SYNTHESIS strategy. Adversarial responses + critique the solution to ensure all angles are considered. Integrate their + insights to strengthen the final answer. + ``` + +--- + +### 3. Role Assignment & Weights (Fusion Only) + +**Purpose**: Specialize models and guide arbiter on expertise. + +**Configuration** (per model): +```json +{ + "model": "gpt-4o", + "role": "Architect", + "system_prompt_append": "Focus on scalability.", + "weight": "Expert in system design. Trust for architectural decisions." +} +``` + +**Fields**: +- `role`: Display name (for user reference and arbiter labels) +- `system_prompt_append`: Instructions sent to the model +- `weight`: Context for arbiter (what to trust this model for) + +**Arbiter Context Injection**: +``` +Specialist Expertise: +- Architect (GPT-4o): Expert in system design. 
Trust for architectural decisions.
+- Security (Claude): Expert in vulnerabilities. Trust for security concerns.
+```
+
+---
+
+### 4. Arbitration Strategies
+
+**Purpose**: Flexible synthesis logic via prompt engineering.
+
+**Built-in**:
+- `synthesis`: Combine all responses into best version
+- `best_of_n`: Select and refine the strongest response
+- `code_review`: Code-specific evaluation
+
+**User-Defined**: Users add custom strategy prompts as `.txt` files in `ensemble_configs/strategies/`.
+
+**Template Variables**:
+- `{responses}`: Formatted response text
+- `{role_context}`: Weight/expertise descriptions
+- `{adversarial_note}`: Context about adversarial drones
+
+---
+
+### 5. Blind Switch
+
+**Purpose**: Remove model identifiers to prevent bias, while keeping role context.
+
+**Default**: `blind: true` (enabled by default)
+
+**Per-Config**: Each swarm config and fusion config can override:
+
+```json
+"arbiter": {
+  "blind": true
+}
+```
+
+**Implementation**:
+- `blind=true`: "Response 1 (Architect role)", "Response 2 (Security role)"
+- `blind=false`: "Response 1 (GPT-4o - Architect)", "Response 2 (Claude - Security)"
+
+**Key Change**: Roles are ALWAYS preserved. Only model names are stripped.
+
+---
+
+### 6. Recursive/Reflective Mode
+
+**Purpose**: Multi-round refinement for low-consensus situations.
+
+**Configuration**:
+```json
+"recursive_mode": {
+  "enabled": false,
+  "consensus_threshold": 7,
+  "note": "Arbiter model must be capable of internal reasoning (e.g., GPT-4o, Claude 3.5+, Gemini 1.5 Pro+)"
+}
+```
+
+**REVISED APPROACH** (Single Arbiter Call):
+
+Instead of multiple requests, the arbiter is given **autonomous decision-making** via prompt.
+
+> [!NOTE]
+> The arbiter model should be a **decent reasoning model** to handle internal critique and consensus analysis effectively. Models like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro are recommended.
+
+**Arbiter Prompt** (when recursive enabled):
+```
+You have autonomous decision-making authority. 
Follow this protocol: + +1. ASSESSMENT PHASE: + - Analyze the provided responses + - Rate consensus level (1-10) + - Log: [CONSENSUS: X/10] + +2. DECISION PHASE: + If consensus >= 7/10: + - Proceed directly to synthesis + + If consensus < 7/10: + - Identify conflict points + - Log: [CONFLICTS: ...] + - For each response, reason internally about how it would address conflicts + - Log: [CRITIQUE REASONING: ...] + +3. SYNTHESIS PHASE: + - Create final answer incorporating all insights + - Log: [FINAL SYNTHESIS:] + +IMPORTANT: Wrap all internal reasoning in [INTERNAL] tags. Only the content +after [FINAL SYNTHESIS:] will be shown to the user. +``` + +**Stream Processing**: +- EnsembleManager parses the stream +- Extract `[CONSENSUS: X/10]` → Log at WARN level if < threshold +- Extract `[CONFLICTS: ...]` → Log conflicts +- Strip all `[INTERNAL]` sections from user output +- Yield only `[FINAL SYNTHESIS:]` content to user + +**Logging**: +``` +[HiveMind] Recursive mode active. Consensus: 5/10 [WARN] +[HiveMind] Conflicts identified: [list] +[HiveMind] Arbiter performing internal critique... +[HiveMind] Final synthesis complete +``` + +--- + +### 7. Streaming Support + +**Behavior**: Respects the `stream` boolean from the original request. + +**Implementation**: +- Drone/Model calls are NOT streamed (collected in parallel) +- Arbiter call respects `stream` parameter: + - If `stream=true`: Stream arbiter's response + - If `stream=false`: Return complete arbiter response +- EnsembleManager passes through arbiter's streaming behavior +- Parse and filter internal markers during streaming +- Return clean synthesis to user + +**Flow**: +```python +async def handle_request(...) -> AsyncGenerator: + # 1. Collect drone responses (non-streaming) + responses = await self._execute_parallel(...) + + # 2. Build arbiter prompt + messages = self._build_arbiter_prompt(...) + + # 3. Stream arbiter response + arbiter_stream = self._call_arbiter_streaming(...) + + # 4. 
Parse and yield
+    async for chunk in arbiter_stream:
+        # Filter [INTERNAL] sections (simplified: real parsing must buffer,
+        # since a marker can be split across stream chunks)
+        if not chunk.startswith('[INTERNAL]'):
+            yield chunk
+```
+
+---
+
+### 8. Usage & Cost Tracking
+
+**Aggregation**:
+- Track usage from each Drone/Model call
+- Track usage from Arbiter call
+- Sum ALL usage fields:
+  ```python
+  total_usage = {
+      'prompt_tokens': sum(all_calls),
+      'completion_tokens': sum(all_calls),
+      'cached_tokens': sum(all_calls),  # If available
+      'reasoning_tokens': sum(all_calls),  # If available
+      'total_tokens': sum(all_calls),
+      # Include any other usage fields from responses
+  }
+  ```
+
+**Cost Calculation**:
+- Use `UsageManager.calculate_cost()` if available (preferred)
+- Fallback to `litellm.completion_cost()` if needed
+- Calculate cost per call
+- Sum total cost
+- **Note**: This should be one of the last features to implement
+- Include in final response metadata
+
+**Response Format**:
+```json
+{
+  "usage": {
+    "prompt_tokens": 5000,
+    "completion_tokens": 800,
+    "total_tokens": 5800,
+    "hivemind_details": {
+      "drone_count": 3,
+      "arbiter_tokens": 1200,
+      "total_cost_usd": 0.045
+    }
+  }
+}
+```
+
+---
+
+## Integration Points
+
+### 1. RotatingClient Modification
+
+**File**: `src/rotator_library/client.py`
+
+```python
+class RotatingClient:
+    def __init__(self, ...):
+        # Existing init
+        self.ensemble_manager = EnsembleManager(
+            config_path=os.path.join(os.path.dirname(__file__), '../../ensemble_configs'),  # folder-based config directory
+            rotating_client=self
+        )
+
+    def acompletion(self, request=None, **kwargs):
+        model = kwargs.get('model')
+
+        # Check if ensemble
+        if self.ensemble_manager.is_ensemble(model):
+            # Return streaming generator from ensemble manager
+            return self.ensemble_manager.handle_request(
+                request=request,
+                **kwargs
+            )
+
+        # Normal flow
+        if kwargs.get('stream'):
+            return self._streaming_acompletion_with_retry(...)
+        else:
+            return self._execute_with_retry(...)
+```
+
+---
+
+### 2. 
Model List Integration
+
+```python
+async def get_all_available_models(self, grouped=True):
+    # Existing provider models
+    all_provider_models = await self._fetch_provider_models()
+
+    # Add fusion models
+    fusion_ids = self.ensemble_manager.get_fusion_ids()
+    if fusion_ids:
+        all_provider_models['hivemind'] = fusion_ids
+
+    return all_provider_models
+```
+
+**Note**: Swarm model listing is **TBD**. The set of `[swarm]` variants is not infinite, but listing one per base model would bloat the list, so a better discovery system still needs to be designed.
+
+---
+
+### 3. Logging
+
+**Log Levels**:
+- INFO: Normal operations (starting swarm, drone count, completion)
+- DEBUG: Detailed execution (per-drone temps, prompt construction)
+- WARN: Low consensus, conflicts, partial failures
+- ERROR: Drone failures, arbiter failures
+
+**Examples**:
+```python
+lib_logger.info(f"[HiveMind] Processing Swarm: {model_id} ({count} Drones)")
+lib_logger.debug(f"[HiveMind] Drone {i+1}: temp={temp:.2f}, adversarial={is_adv}")
+lib_logger.warning(f"[HiveMind] Recursive mode: Consensus 5/10 - below threshold")
+lib_logger.error(f"[HiveMind] Drone {i+1} failed: {error}")
+lib_logger.info(f"[HiveMind] Total cost: ${total_cost:.4f} ({total_tokens} tokens)")
+```
+
+---
+
+## Edge Cases & Error Handling
+
+### 1. Partial Failures
+
+**Scenario**: Some Drones fail due to errors.
+
+**Handling**:
+- Each drone call uses RotatingClient → **inherits existing retry/key rotation logic**
+- If a drone still fails after retries, log as ERROR
+- Continue with successful responses
+- **Minimum**: Require at least 1 successful response
+- If all fail, raise exception with details
+
+**No Special Logic Needed**: RotatingClient already handles retries, rate limits, key rotation.
+
+---
+
+### 2. Arbiter Failure
+
+**Scenario**: Arbiter call fails. 
+ +**Handling**: +- Arbiter call uses RotatingClient → **inherits retry/resilience logic** +- If arbiter fails after retries: + - Log ERROR + - Fallback: Return first **non-adversarial** drone response + - Log: `[HiveMind] Arbiter failed. Returning first non-adversarial response.` + +--- + +### 3. Naming Conflicts + +**Scenario**: Provider has `gemini-1.5-flash[swarm]` as real model, or duplicate fusion IDs exist. + +**Handling**: +- Default naming: `model-name[swarm]` or fusion ID from config +- On conflict detected: + - Append numeric suffix: `-1`, `-2`, `-3`, etc. + - Example: `gemini-1.5-flash[swarm]` → `gemini-1.5-flash[swarm]-1` + - Example: `dev-team` → `dev-team-1` +- Log: `[HiveMind] Conflict detected. Renamed 'dev-team' to 'dev-team-1'.` +- Store resolved names in runtime cache +- **Applies to**: Both swarm suffixes AND fusion IDs + +--- + +### 4. Streaming Parse Errors + +**Scenario**: Can't parse `[CONSENSUS: X/10]` from recursive mode stream. + +**Handling**: +- Log warning +- Continue streaming synthesis +- Skip logging consensus score + +--- + +### 5. Invalid Configuration + +**Scenario**: User config has invalid fusion (missing model, invalid strategy). + +**Handling**: +- On startup, validate all fusions +- Log errors for invalid configs +- Skip invalid fusions +- Continue with valid ones + +--- + +## Implementation Phases + +### **Phase 1: Foundation (Core Infrastructure)** + +**Goal**: Set up basic structure and config loading. + +**Tasks**: +1. Create `ensemble_manager.py` skeleton + - Define `EnsembleManager` class + - Implement `__init__` with folder-based config loading + - Load and merge configs from `ensemble_configs/` directory + - Add config validation (JSON schema) + +2. Create config directory structure + - `ensemble_configs/swarms/default.json` + - `ensemble_configs/fusions/` (empty initially) + - `ensemble_configs/strategies/synthesis.txt` + +3. 
Integrate into `RotatingClient` + - Import `EnsembleManager` + - Initialize in `__init__` with config directory path + - Add placeholder check in `acompletion` + +4. Implement `is_ensemble()` + - Detect `[swarm]` suffix + - Detect fusion IDs from config + - Add conflict detection logic + +**Deliverables**: +- ✅ Folder-based config structure created +- ✅ Configs load and merge correctly +- ✅ Ensemble detection works +- ✅ Conflict resolution (numeric suffixes) works +- ✅ No runtime errors + +**Testing**: +- Unit test folder-based config loading +- Unit test config merging (swarm defaults + model-specific) +- Unit test `is_ensemble()` with various inputs +- Test conflict detection and numeric suffix generation +- Test duplicate fusion ID handling + +--- + +### **Phase 2: Basic Swarm (Non-Streaming)** + +**Goal**: Get basic swarm working without advanced features. + +**Tasks**: +1. Implement `_prepare_drones()` + - Clone request params N times + - Set model to base (strip `[swarm]`) + - No jitter or adversarial yet + +2. Implement `_execute_parallel()` + - Use `asyncio.gather()` with drone calls + - Handle exceptions gracefully + - Aggregate usage stats + +3. Implement `_format_for_arbiter()` + - Basic formatting (numbered responses) + - No blind switch yet + +4. Implement `_build_arbiter_prompt()` + - Load synthesis strategy + - Simple system prompt + user message + - No recursive mode yet + +5. Implement `_call_arbiter()` (NON-streaming first) + - Call arbiter via RotatingClient + - Return complete response + - Aggregate arbiter usage + +6. 
Wire up `handle_request()` (non-streaming) + - Connect all steps + - Return arbiter's response + - Include combined usage + +**Deliverables**: +- ✅ Swarm executes 3 drones in parallel +- ✅ Arbiter synthesizes responses +- ✅ Final response returned (non-streaming) +- ✅ Usage aggregated correctly + +**Testing**: +- Integration test: Call `gemini-1.5-flash[swarm]` +- Verify 3 drone calls + 1 arbiter call +- Verify synthesis quality (manual) +- Verify usage statistics + +--- + +### **Phase 3: Streaming Support** + +**Goal**: Enable streaming for arbiter response. + +**Tasks**: +1. Modify `_call_arbiter()` to `_call_arbiter_streaming()` + - Set `stream=True` + - Return async generator + - Track usage from stream + +2. Update `handle_request()` to return generator + - Yield arbiter stream chunks + - Aggregate usage at end + +3. Test streaming end-to-end + - Verify chunks arrive in real-time + - Verify complete response matches non-streaming + +**Deliverables**: +- ✅ Arbiter response streams to user +- ✅ No buffering of full response +- ✅ Usage still aggregated correctly + +**Testing**: +- Integration test with streaming +- Compare output to non-streaming version +- Test error handling mid-stream + +--- + +### **Phase 4: Advanced Swarm Features** + +**Goal**: Add jitter, adversarial, blind switch. + +**Tasks**: +1. **Temperature Jitter**: + - Add jitter logic to `_prepare_drones()` + - Test with different delta values + - Verify clamping + +2. **Adversarial Mode**: + - Inject adversarial prompts + - Tag responses in formatting + - Add arbiter context explanation + +3. 
**Blind Switch**: + - Modify `_format_for_arbiter()` + - Strip model names when `blind=true` + - Keep roles always + +**Deliverables**: +- ✅ Jitter produces varied temps +- ✅ Adversarial drones produce critiques +- ✅ Blind mode strips model names + +**Testing**: +- Test each feature independently +- Test combinations (jitter + adversarial) +- Manual review of adversarial effectiveness + +--- + +### **Phase 5: Fusion Mode** + +**Goal**: Enable multi-model mixtures with roles. + +**Tasks**: +1. Implement `_prepare_models()` + - Load models from fusion config + - Apply role system prompts + - Store metadata for arbiter + +2. Update `_format_for_arbiter()` for roles + - Include role labels + - Apply blind switch for model names + +3. Implement role/weight context injection + - Build specialist expertise text + - Inject into arbiter system prompt + +4. Add example fusion to config + - "dev-team" with 3 specialists + +**Deliverables**: +- ✅ Fusion calls multiple models +- ✅ Arbiter receives role context +- ✅ Synthesis respects expertise weights + +**Testing**: +- Test "dev-team" fusion with coding question +- Verify role prompts are applied +- Manual review: Does arbiter trust specialists appropriately? + +--- + +### **Phase 6: Recursive Mode** + +**Goal**: Enable autonomous arbiter decision-making for low consensus. + +**Tasks**: +1. Update `_build_arbiter_prompt()` for recursive + - Add autonomous protocol instructions + - Define `[INTERNAL]` marker format + - Include consensus threshold + +2. Implement stream parsing in `_call_arbiter_streaming()` + - Extract `[CONSENSUS: X/10]` + - Extract `[CONFLICTS: ...]` + - Strip `[INTERNAL]` sections from user output + +3. 
Add logging for recursive flow + - Log consensus score at WARN if low + - Log identified conflicts + - Log critique phase activation + +**Deliverables**: +- ✅ Arbiter autonomously decides Round 2 +- ✅ Internal reasoning logged but not shown to user +- ✅ Low consensus triggers critique + +**Testing**: +- Test with intentionally ambiguous prompt +- Verify arbiter produces `[CONSENSUS: 4/10]` +- Verify critique reasoning appears in logs +- Verify final synthesis is improved + +--- + +### **Phase 7: Polish & Production** + +**Goal**: Production-ready with documentation and examples. + +**Tasks**: +1. **Comprehensive Logging**: + - Add execution time tracking + - Add cost tracking per request + - Log summary at end of each request + +2. **Error Messages**: + - User-friendly error for invalid ensemble IDs + - Clear message when streaming not supported (N/A now) + - Helpful message on config errors + +3. **Documentation**: + - User guide: How to use swarms/fusions + - Config reference: All fields explained + - Example configs: dev-team, creative-writers, etc. + +4. **Example Configs**: + - Add 2-3 preset fusions to default config (commented out) + - Document swarm notation in README + +5. 
**Performance Testing**: + - Benchmark latency (3-drone swarm) + - Benchmark token usage vs single call + - Document cost multiplier + +**Deliverables**: +- ✅ Comprehensive logs for debugging +- ✅ User documentation complete +- ✅ Example configs provided +- ✅ Performance benchmarks documented + +**Testing**: +- Full end-to-end tests for all features +- Load testing with multiple concurrent swarms +- Manual testing of all examples + +--- + +## Example Configurations + +### Preset Fusion 1: Dev Team + +```json +{ + "id": "dev-team", + "description": "Software development team with architecture, security, and code review specialists", + "models": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt_append": "Focus on system design, scalability, and architectural patterns.", + "weight": "Expert in system design and scalability. Trust for architectural decisions." + }, + { + "model": "claude-3-opus", + "role": "Security", + "system_prompt_append": "Focus on security vulnerabilities, edge cases, and threat modeling.", + "weight": "Expert in security and vulnerability assessment. Trust for security concerns." + }, + { + "model": "gemini-1.5-pro", + "role": "Reviewer", + "system_prompt_append": "Focus on code quality, performance, and best practices.", + "weight": "Expert in code quality and optimization. Trust for performance and maintainability." + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "code_review", + "blind": false + } +} +``` + +--- + +## User Configuration Examples + +### Simple Swarm Usage + +User request: +``` +Model: gemini-1.5-flash[swarm] +Messages: [{"role": "user", "content": "Write a function to parse CSV"}] +``` + +Result: 3 calls to `gemini-1.5-flash`, synthesized by `gemini-1.5-flash` (self-arbiter). 
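The suffix handling behind this notation can be sketched in a few lines (helper names here are illustrative; the real manager reads the suffix from the swarm config rather than hardcoding it):

```python
# Sketch of swarm-suffix detection. "[swarm]" is the default suffix and can be
# overridden in swarms/default.json; these helper names are illustrative only.
SWARM_SUFFIX = "[swarm]"

def is_swarm_request(model_id: str) -> bool:
    """True when the model ID carries the swarm suffix."""
    return model_id.endswith(SWARM_SUFFIX)

def get_base_model(model_id: str) -> str:
    """Strip the suffix to recover the underlying model name."""
    if is_swarm_request(model_id):
        return model_id[: -len(SWARM_SUFFIX)]
    return model_id
```

A plain model ID passes through unchanged, so the same helper is safe to call on non-ensemble requests.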
+ +--- + +### Custom Arbiter for Swarm + +Config override (per-model): +```json +{ + "swarm_configs": { + "gemini-1.5-flash": { + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis" + } + } + } +} +``` + +User request: `gemini-1.5-flash[swarm]` +Result: 3 calls to flash, synthesized by gpt-4o. + +--- + +### Fusion Usage + +User request: +``` +Model: dev-team +Messages: [{"role": "user", "content": "Review this API endpoint: [code]"}] +``` + +Result: Parallel calls to gpt-4o, claude, gemini with role prompts. Arbiter synthesizes with role context. + +--- + +## Default Configuration Answer + +Based on user feedback: + +1. **Default Swarm Suffix**: `[swarm]` +2. **Arbiter Default**: Same model as drones (self-arbitration), but configurable per-model +3. **Streaming**: Required for arbiter's final response ✅ +4. **Cost Warnings**: None (user discretion) +5. **Preset Configs**: Only using provided examples (dev-team) + +--- + +## Testing Strategy + +### Unit Tests + +`tests/test_ensemble_manager.py`: +- Config loading and validation +- `is_ensemble()` detection +- Conflict resolution +- Drone preparation (jitter, adversarial) +- Model preparation (roles, weights) +- Response formatting (blind switch) + +### Integration Tests + +`tests/test_swarm_integration.py`: +- Basic 3-drone swarm +- Swarm with jitter enabled +- Swarm with adversarial mode +- Streaming swarm response + +`tests/test_fusion_integration.py`: +- Multi-model fusion +- Role context injection +- Weight-based synthesis + +`tests/test_recursive_integration.py`: +- Low consensus triggering critique +- Consensus score parsing +- Internal marker stripping + +### Manual Scenarios + +1. **Simple Swarm**: `gpt-4o[swarm]` with straightforward question +2. **Adversarial Swarm**: Enable adversarial, ask for code, verify critique +3. **Fusion**: Use "dev-team" with API review +4. 
**Recursive**: Use ambiguous prompt, verify low consensus handling + +--- + +## Performance Benchmarks (Expected) + +### Latency +- Single call: ~2s +- Swarm (3 drones): ~2s (parallel) + ~2s (arbiter) = **~4s** +- Swarm + Recursive: ~4s + arbiter internal critique time = **~5-6s** + +### Token Usage +- Single call: 1000 input + 500 output = 1500 tokens +- Swarm (3 drones): + - Drones: 1000 × 3 + 500 × 3 = 4500 tokens + - Arbiter: 1000 + 1500 (from drones) = 2500 input + 600 output + - Total: **~7600 tokens** (5x single call) + +### Cost Multiplier +- Typical swarm: **4-6x** cost of single call +- Fusion (different models): Varies by model costs + +--- + +## Summary + +This revised plan addresses all user feedback: + +✅ Confidence scoring only in recursive mode +✅ Adversarial context explained to arbiter +✅ Weight field for arbiter expertise guidance +✅ Blind switch keeps roles, strips model names +✅ Recursive mode as single autonomous arbiter call +✅ Default naming: `model[swarm]` +✅ Streaming required for arbiter response +✅ Usage/cost aggregated from all calls +✅ Existing retry/resilience logic leveraged +✅ Detailed implementation phases (7 phases) +✅ Example configs provided + +Ready for implementation! 
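As a sanity check on the token figures in the benchmarks above, the arithmetic can be reproduced directly (the numbers are the illustrative estimates from this plan, not measurements):

```python
# Back-of-envelope token estimate for a swarm request, using the plan's
# illustrative figures: 1000 input + 500 output per call, 600 arbiter output.
def estimate_swarm_tokens(drones: int = 3,
                          prompt_tokens: int = 1000,
                          output_tokens: int = 500,
                          arbiter_output: int = 600) -> int:
    drone_total = drones * (prompt_tokens + output_tokens)  # 3 * 1500 = 4500
    arbiter_input = prompt_tokens + drones * output_tokens  # 1000 + 1500 = 2500
    return drone_total + arbiter_input + arbiter_output     # 4500 + 2500 + 600 = 7600
```

With the defaults this returns 7600 tokens, roughly 5x the 1500-token single call, matching the cost-multiplier estimate above.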
diff --git a/docs/HiveMind Task.md b/docs/HiveMind Task.md new file mode 100644 index 00000000..e41d127f --- /dev/null +++ b/docs/HiveMind Task.md @@ -0,0 +1,93 @@ +# HiveMind (Swarm/Fusion) Implementation + +## Phase 1: Core Infrastructure +- [/] Design and Plan + - [x] Explore codebase + - [x] Create comprehensive implementation plan +- [ ] Create `src/rotator_library/ensemble_manager.py` + - [ ] Define `EnsembleManager` class skeleton + - [ ] Implement config loading and validation + - [ ] Implement `is_ensemble()` detection + - [ ] Implement conflict resolution for naming +- [ ] Modify `src/rotator_library/client.py` + - [ ] Initialize `EnsembleManager` in `__init__` + - [ ] Integrate into `acompletion()` dispatcher + - [ ] Add logging for HiveMind operations +- [ ] Create `ensemble_config.json` + - [ ] Define schema for Fusions + - [ ] Define schema for Swarm defaults + - [ ] Define arbitration strategies + +## Phase 2: Basic Swarm Mode +- [ ] Implement Swarm Features + - [ ] `_prepare_drones()` - basic cloning + - [ ] `_execute_parallel()` - asyncio.gather + - [ ] `_format_for_arbiter()` - response aggregation + - [ ] `_build_arbiter_prompt()` - synthesis strategy + - [ ] `_call_arbiter()` - judge execution +- [ ] Testing + - [ ] Test basic 3-drone swarm + - [ ] Test arbiter synthesis + - [ ] Test partial failures + +## Phase 3: Advanced Swarm Features +- [ ] Temperature Jitter + - [ ] Implement jitter logic + - [ ] Test randomness and clamping +- [ ] Adversarial Mode + - [ ] Implement adversarial prompt injection + - [ ] Test with configurable count +- [ ] Blind Switch + - [ ] Implement response anonymization + - [ ] Test with blind=true/false +- [ ] Confidence Scoring + - [ ] Implement score extraction + - [ ] Add logging for scores + +## Phase 4: Fusion Mode +- [ ] Implement Fusion Features + - [ ] `_prepare_models()` - multi-model setup + - [ ] Role assignment and prompts + - [ ] Role context for Arbiter + - [ ] Weight system (future) +- [ ] Testing + - [ 
] Test 2-model fusion + - [ ] Test role context injection + - [ ] Test specialist descriptions + +## Phase 5: Recursive/Reflective Mode +- [ ] Implement Recursion + - [ ] Consensus check logic + - [ ] Conflict extraction + - [ ] `_trigger_round_2()` implementation + - [ ] Max rounds enforcement +- [ ] Testing + - [ ] Test low-confidence trigger + - [ ] Test Round 2 critique + - [ ] Test final re-synthesis + +## Phase 6: Polish & Edge Cases +- [ ] Error Handling + - [ ] Partial failure handling + - [ ] Arbiter failure fallback + - [ ] Infinite recursion prevention +- [ ] Performance + - [ ] Latency logging + - [ ] Token usage tracking + - [ ] Rate limit mitigation +- [ ] Documentation + - [ ] User guide + - [ ] Example configs + - [ ] API reference + +## Verification +- [ ] Automated Tests + - [ ] test_ensemble_manager.py (all 8 test cases) + - [ ] test_swarm_logic.py + - [ ] test_fusion_logic.py + - [ ] test_recursion.py +- [ ] Manual Tests + - [ ] Scenario 1: Simple Swarm + - [ ] Scenario 2: Adversarial Swarm + - [ ] Scenario 3: Fusion with Roles + - [ ] Scenario 4: Recursive Refinement From 20e0cb186c0556b5afc400f794c3a7d0da84acd7 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 10:55:38 +0100 Subject: [PATCH 02/33] feat(ensemble): add HiveMind ensemble manager, config loader, and default configs Introduce a new HiveMind ensemble subsystem to enable Swarm/Fusion orchestration. - add src/rotator_library/ensemble: EnsembleManager, ConfigLoader, and package init - add ensemble_configs: default swarm config, fusion example (dev-team), and strategy templates - integrate EnsembleManager into RotatingClient to initialize the ensemble manager at startup - implement conflict resolution, config loading, and ensemble detection logic; swarm/fusion execution paths staged as TODOs This lays the groundwork for parallel model execution and intelligent arbitration without changing existing public APIs. 
--- src/rotator_library/client.py | 5 + src/rotator_library/ensemble/__init__.py | 9 + src/rotator_library/ensemble/config_loader.py | 209 ++++++++++++++++ src/rotator_library/ensemble/manager.py | 235 ++++++++++++++++++ .../ensemble_configs/fusions/dev-team.json | 34 +++ .../ensemble_configs/strategies/best_of_n.txt | 10 + .../strategies/code_review.txt | 12 + .../ensemble_configs/strategies/synthesis.txt | 10 + .../ensemble_configs/swarms/default.json | 28 +++ 9 files changed, 552 insertions(+) create mode 100644 src/rotator_library/ensemble/__init__.py create mode 100644 src/rotator_library/ensemble/config_loader.py create mode 100644 src/rotator_library/ensemble/manager.py create mode 100644 src/rotator_library/ensemble_configs/fusions/dev-team.json create mode 100644 src/rotator_library/ensemble_configs/strategies/best_of_n.txt create mode 100644 src/rotator_library/ensemble_configs/strategies/code_review.txt create mode 100644 src/rotator_library/ensemble_configs/strategies/synthesis.txt create mode 100644 src/rotator_library/ensemble_configs/swarms/default.json diff --git a/src/rotator_library/client.py b/src/rotator_library/client.py index b1485d04..6d98fabb 100644 --- a/src/rotator_library/client.py +++ b/src/rotator_library/client.py @@ -33,6 +33,7 @@ from .credential_manager import CredentialManager from .background_refresher import BackgroundRefresher from .model_definitions import ModelDefinitions +from .ensemble import EnsembleManager class StreamedAPIError(Exception): @@ -128,6 +129,10 @@ def __init__( if max_val < 1: lib_logger.warning(f"Invalid max_concurrent for '{provider}': {max_val}. 
Setting to 1.") self.max_concurrent_requests_per_key[provider] = 1 + + # Initialize HiveMind ensemble manager + self.ensemble_manager = EnsembleManager(rotating_client=self) + lib_logger.info("HiveMind ensemble manager initialized") def _is_model_ignored(self, provider: str, model_id: str) -> bool: """ diff --git a/src/rotator_library/ensemble/__init__.py b/src/rotator_library/ensemble/__init__.py new file mode 100644 index 00000000..6dbd382a --- /dev/null +++ b/src/rotator_library/ensemble/__init__.py @@ -0,0 +1,9 @@ +""" +HiveMind Ensemble Module + +This module provides parallel model execution (Swarm/Fusion) with intelligent arbitration. +""" + +from .manager import EnsembleManager + +__all__ = ['EnsembleManager'] diff --git a/src/rotator_library/ensemble/config_loader.py b/src/rotator_library/ensemble/config_loader.py new file mode 100644 index 00000000..6453fe97 --- /dev/null +++ b/src/rotator_library/ensemble/config_loader.py @@ -0,0 +1,209 @@ +""" +Configuration loader for HiveMind ensemble configs. + +Loads and validates configurations from the ensemble_configs directory structure. +""" + +import os +import json +import logging +from pathlib import Path +from typing import Dict, List, Any, Optional + +lib_logger = logging.getLogger("rotator_library.ensemble") + + +class ConfigLoader: + """Loads and manages ensemble configurations from folder structure.""" + + def __init__(self, config_dir: str): + """ + Initialize the config loader. 
+ + Args: + config_dir: Path to ensemble_configs directory (relative to rotator_library) + """ + self.config_dir = Path(config_dir) + self.swarms_dir = self.config_dir / "swarms" + self.fusions_dir = self.config_dir / "fusions" + self.strategies_dir = self.config_dir / "strategies" + + # Loaded configurations + self.swarm_default: Optional[Dict[str, Any]] = None + self.swarm_configs: Dict[str, Dict[str, Any]] = {} + self.fusion_configs: Dict[str, Dict[str, Any]] = {} + self.strategies: Dict[str, str] = {} + + def load_all(self) -> None: + """Load all configurations from the directory structure.""" + lib_logger.info("[HiveMind] Loading ensemble configurations...") + + # Create directories if they don't exist + self._ensure_directories() + + # Load swarm configurations + self._load_swarm_configs() + + # Load fusion configurations + self._load_fusion_configs() + + # Load strategy templates + self._load_strategies() + + lib_logger.info( + f"[HiveMind] Loaded {len(self.swarm_configs)} swarm configs, " + f"{len(self.fusion_configs)} fusion configs, " + f"{len(self.strategies)} strategies" + ) + + def _ensure_directories(self) -> None: + """Create config directories if they don't exist.""" + for directory in [self.swarms_dir, self.fusions_dir, self.strategies_dir]: + directory.mkdir(parents=True, exist_ok=True) + + def _load_swarm_configs(self) -> None: + """Load swarm configurations from swarms/ directory.""" + if not self.swarms_dir.exists(): + lib_logger.warning(f"[HiveMind] Swarms directory not found: {self.swarms_dir}") + return + + # Load default.json first + default_path = self.swarms_dir / "default.json" + if default_path.exists(): + try: + with open(default_path, 'r', encoding='utf-8') as f: + self.swarm_default = json.load(f) + lib_logger.debug("[HiveMind] Loaded default swarm config") + except Exception as e: + lib_logger.error(f"[HiveMind] Failed to load default swarm config: {e}") + else: + lib_logger.warning("[HiveMind] No default swarm config found") + + # 
Load model-specific configs + for config_file in self.swarms_dir.glob("*.json"): + if config_file.name == "default.json": + continue + + try: + with open(config_file, 'r', encoding='utf-8') as f: + config = json.load(f) + + # Extract model name from config + model_name = config.get("model") + if model_name: + self.swarm_configs[model_name] = config + lib_logger.debug(f"[HiveMind] Loaded swarm config for '{model_name}'") + else: + lib_logger.warning( + f"[HiveMind] Swarm config '{config_file.name}' missing 'model' field" + ) + except Exception as e: + lib_logger.error(f"[HiveMind] Failed to load swarm config '{config_file.name}': {e}") + + def _load_fusion_configs(self) -> None: + """Load fusion configurations from fusions/ directory.""" + if not self.fusions_dir.exists(): + lib_logger.warning(f"[HiveMind] Fusions directory not found: {self.fusions_dir}") + return + + for config_file in self.fusions_dir.glob("*.json"): + try: + with open(config_file, 'r', encoding='utf-8') as f: + config = json.load(f) + + fusion_id = config.get("id") + if not fusion_id: + lib_logger.warning( + f"[HiveMind] Fusion config '{config_file.name}' missing 'id' field" + ) + continue + + # Check for duplicate IDs + if fusion_id in self.fusion_configs: + lib_logger.warning( + f"[HiveMind] Duplicate fusion ID '{fusion_id}'. " + f"Config from '{config_file.name}' will override previous." 
+ ) + + self.fusion_configs[fusion_id] = config + lib_logger.debug(f"[HiveMind] Loaded fusion config '{fusion_id}'") + + except Exception as e: + lib_logger.error(f"[HiveMind] Failed to load fusion config '{config_file.name}': {e}") + + def _load_strategies(self) -> None: + """Load strategy templates from strategies/ directory.""" + if not self.strategies_dir.exists(): + lib_logger.warning(f"[HiveMind] Strategies directory not found: {self.strategies_dir}") + return + + for strategy_file in self.strategies_dir.glob("*.txt"): + try: + with open(strategy_file, 'r', encoding='utf-8') as f: + content = f.read() + + strategy_name = strategy_file.stem + self.strategies[strategy_name] = content + lib_logger.debug(f"[HiveMind] Loaded strategy '{strategy_name}'") + + except Exception as e: + lib_logger.error( + f"[HiveMind] Failed to load strategy '{strategy_file.name}': {e}" + ) + + def get_swarm_config(self, model: str) -> Dict[str, Any]: + """ + Get swarm configuration for a specific model. + + Merges default config with model-specific overrides. + + Args: + model: Base model name (without [swarm] suffix) + + Returns: + Merged configuration dictionary + """ + # Start with default + config = self.swarm_default.copy() if self.swarm_default else {} + + # Apply model-specific overrides + if model in self.swarm_configs: + model_config = self.swarm_configs[model] + # Deep merge + for key, value in model_config.items(): + if key == "model": + continue # Don't copy the model name + if isinstance(value, dict) and key in config: + config[key] = {**config[key], **value} + else: + config[key] = value + + return config + + def get_fusion_config(self, fusion_id: str) -> Optional[Dict[str, Any]]: + """ + Get fusion configuration by ID. + + Args: + fusion_id: Fusion identifier + + Returns: + Fusion configuration or None if not found + """ + return self.fusion_configs.get(fusion_id) + + def get_strategy(self, strategy_name: str) -> Optional[str]: + """ + Get strategy template by name. 
+ + Args: + strategy_name: Strategy identifier + + Returns: + Strategy template string or None if not found + """ + return self.strategies.get(strategy_name) + + def get_all_fusion_ids(self) -> List[str]: + """Get list of all fusion IDs.""" + return list(self.fusion_configs.keys()) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py new file mode 100644 index 00000000..ba865f64 --- /dev/null +++ b/src/rotator_library/ensemble/manager.py @@ -0,0 +1,235 @@ +""" +EnsembleManager - Core orchestration for HiveMind (Swarm/Fusion) feature. + +This module manages parallel model execution with intelligent arbitration. +""" + +import os +import logging +import re +from pathlib import Path +from typing import Dict, List, Any, Optional, Set + +from .config_loader import ConfigLoader + +lib_logger = logging.getLogger("rotator_library.ensemble") + + +class EnsembleManager: + """ + Manages ensemble execution (Swarm and Fusion modes). + + Responsibilities: + - Detect ensemble requests (swarm suffix or fusion ID) + - Load and manage configurations + - Handle naming conflicts + - Orchestrate parallel execution (implemented in later phases) + """ + + def __init__(self, rotating_client, config_dir: Optional[str] = None): + """ + Initialize the ensemble manager. 
+ + Args: + rotating_client: Reference to RotatingClient for making API calls + config_dir: Path to ensemble_configs directory (relative to this file) + """ + self.rotating_client = rotating_client + + # Default config directory (relative to this file) + if config_dir is None: + config_dir = os.path.join( + os.path.dirname(__file__), + "..", + "ensemble_configs" + ) + + # Initialize config loader + self.config_loader = ConfigLoader(config_dir) + self.config_loader.load_all() + + # Cache for resolved ensemble names (for conflict resolution) + self._resolved_names: Dict[str, str] = {} + + # Cache for provider models (loaded from RotatingClient) + self._provider_models: Optional[Set[str]] = None + + lib_logger.info("[HiveMind] EnsembleManager initialized") + + def is_ensemble(self, model_id: str) -> bool: + """ + Check if a model ID represents an ensemble request. + + Args: + model_id: Full model ID from user request + + Returns: + True if this is an ensemble (swarm or fusion), False otherwise + """ + # Check for fusion ID (exact match) + if model_id in self.config_loader.fusion_configs: + return True + + # Check for swarm suffix + if self._is_swarm_request(model_id): + return True + + return False + + def _is_swarm_request(self, model_id: str) -> bool: + """ + Check if model ID contains swarm suffix. + + Args: + model_id: Model ID to check + + Returns: + True if this is a swarm request + """ + # Get default suffix from config + default_suffix = "[swarm]" + if self.config_loader.swarm_default: + default_suffix = self.config_loader.swarm_default.get("suffix", "[swarm]") + + return default_suffix in model_id + + def get_base_model(self, swarm_id: str) -> str: + """ + Extract base model name from swarm ID. 
+ + Args: + swarm_id: Swarm model ID (e.g., "gemini-1.5-flash[swarm]") + + Returns: + Base model name (e.g., "gemini-1.5-flash") + """ + # Get suffix from config + default_suffix = "[swarm]" + if self.config_loader.swarm_default: + default_suffix = self.config_loader.swarm_default.get("suffix", "[swarm]") + + # Remove suffix + if default_suffix in swarm_id: + return swarm_id.replace(default_suffix, "") + + return swarm_id + + def resolve_conflicts(self, ensemble_id: str) -> str: + """ + Resolve naming conflicts by appending numeric suffixes. + + If an ensemble ID conflicts with a real provider model, + append -1, -2, -3, etc. until unique. + + Args: + ensemble_id: Original ensemble ID (swarm or fusion) + + Returns: + Resolved unique ensemble ID + """ + # Check cache first + if ensemble_id in self._resolved_names: + return self._resolved_names[ensemble_id] + + # Load provider models if not cached + if self._provider_models is None: + self._load_provider_models() + + # Check for conflict + if ensemble_id not in self._provider_models: + # No conflict, use original + self._resolved_names[ensemble_id] = ensemble_id + return ensemble_id + + # Conflict detected, find an available suffix (bounded so the loop cannot run forever) + for counter in range(1, 101): + candidate = f"{ensemble_id}-{counter}" + if candidate not in self._provider_models: + lib_logger.warning( + f"[HiveMind] Naming conflict detected. " + f"Renamed '{ensemble_id}' to '{candidate}'" + ) + self._resolved_names[ensemble_id] = candidate + return candidate + + # Exhausted all 100 candidates (shouldn't happen in practice) + lib_logger.error( + f"[HiveMind] Could not resolve naming conflict for '{ensemble_id}' " + f"after 100 attempts" + ) + return f"{ensemble_id}-100" + + def _load_provider_models(self) -> None: + """ + Load all provider models from RotatingClient. + + This is used for conflict detection.
+ """ + try: + # Get all available models (this might be async in the actual implementation) + # For now, we'll use a synchronous approach + # TODO: Handle async model loading properly + self._provider_models = set() + + # Note: This will be implemented properly when we integrate with RotatingClient + # For now, just initialize an empty set + lib_logger.debug("[HiveMind] Provider models cache initialized (empty)") + + except Exception as e: + lib_logger.error(f"[HiveMind] Failed to load provider models: {e}") + self._provider_models = set() + + def get_fusion_ids(self) -> List[str]: + """ + Get list of all configured fusion IDs. + + Returns: + List of fusion identifiers + """ + return self.config_loader.get_all_fusion_ids() + + async def handle_request(self, request, **kwargs): + """ + Handle an ensemble request (swarm or fusion). + + This is the main entry point for ensemble execution. + Will be implemented in Phase 2. + + Args: + request: Original request object + **kwargs: Request parameters + + Returns: + Response from arbiter (streaming or complete) + """ + model_id = kwargs.get("model") + + if not model_id: + raise ValueError("Model ID is required") + + # Resolve conflicts + resolved_id = self.resolve_conflicts(model_id) + + # Determine type + if resolved_id in self.config_loader.fusion_configs: + lib_logger.info(f"[HiveMind] Processing Fusion request: {resolved_id}") + # TODO: Implement fusion handling in Phase 5 + raise NotImplementedError("Fusion mode not yet implemented") + + elif self._is_swarm_request(resolved_id): + base_model = self.get_base_model(resolved_id) + config = self.config_loader.get_swarm_config(base_model) + count = config.get("count", 3) + + lib_logger.info( + f"[HiveMind] Processing Swarm request: {resolved_id} " + f"(base: {base_model}, {count} drones)" + ) + # TODO: Implement swarm handling in Phase 2 + raise NotImplementedError("Swarm mode not yet implemented") + + else: + raise ValueError(f"Unknown ensemble type for model: 
{model_id}") diff --git a/src/rotator_library/ensemble_configs/fusions/dev-team.json b/src/rotator_library/ensemble_configs/fusions/dev-team.json new file mode 100644 index 00000000..9e1f3cac --- /dev/null +++ b/src/rotator_library/ensemble_configs/fusions/dev-team.json @@ -0,0 +1,34 @@ +{ + "id": "dev-team", + "description": "A team of specialized models for software development", + "models": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt_append": "Focus on architectural patterns, scalability, and system design.", + "weight": "Expert in system design and scalability. Trust for architectural decisions and structural integrity." + }, + { + "model": "claude-3-opus", + "role": "Security Specialist", + "system_prompt_append": "Focus on security vulnerabilities, edge cases, and potential exploits.", + "weight": "Expert in security and vulnerability assessment. Trust for identifying security flaws and attack vectors." + }, + { + "model": "gemini-1.5-pro", + "role": "Code Reviewer", + "system_prompt_append": "Focus on code quality, performance, and best practices.", + "weight": "Expert in code quality and performance optimization. Trust for maintainability and efficiency concerns." + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true, + "note": "Requires a reasoning-capable model for best results" + }, + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7 + } +} diff --git a/src/rotator_library/ensemble_configs/strategies/best_of_n.txt b/src/rotator_library/ensemble_configs/strategies/best_of_n.txt new file mode 100644 index 00000000..72cc1659 --- /dev/null +++ b/src/rotator_library/ensemble_configs/strategies/best_of_n.txt @@ -0,0 +1,10 @@ +You are evaluating multiple responses to select and refine the best one. For each response, assess: +1. Accuracy and correctness +2. Completeness of coverage +3. Clarity and coherence +4. 
Practical applicability + +Select the strongest response and refine it if needed to create the optimal answer. + +Responses: +{responses} diff --git a/src/rotator_library/ensemble_configs/strategies/code_review.txt b/src/rotator_library/ensemble_configs/strategies/code_review.txt new file mode 100644 index 00000000..236224f1 --- /dev/null +++ b/src/rotator_library/ensemble_configs/strategies/code_review.txt @@ -0,0 +1,12 @@ +You are a senior code reviewer evaluating multiple code solutions. Assess each based on: +1. Correctness and functionality +2. Error handling and edge cases +3. Performance and efficiency +4. Security considerations +5. Code quality and maintainability +6. Best practices adherence + +Select the best solution or synthesize a superior version by combining the strengths of each. + +Responses: +{responses} diff --git a/src/rotator_library/ensemble_configs/strategies/synthesis.txt b/src/rotator_library/ensemble_configs/strategies/synthesis.txt new file mode 100644 index 00000000..d58be68b --- /dev/null +++ b/src/rotator_library/ensemble_configs/strategies/synthesis.txt @@ -0,0 +1,10 @@ +You are an expert synthesizer. Analyze the following responses and create a single, superior answer that: +1. Combines the best elements from each response +2. Resolves any conflicts or contradictions +3. Ensures completeness and accuracy +4. Maintains coherence and clarity + +Your goal is to produce the BEST possible answer by leveraging the strengths of each response. 
+ +Responses: +{responses} diff --git a/src/rotator_library/ensemble_configs/swarms/default.json b/src/rotator_library/ensemble_configs/swarms/default.json new file mode 100644 index 00000000..26619bdf --- /dev/null +++ b/src/rotator_library/ensemble_configs/swarms/default.json @@ -0,0 +1,28 @@ +{ + "suffix": "[swarm]", + "count": 3, + + "temperature_jitter": { + "enabled": true, + "delta": 0.2 + }, + + "arbiter": { + "model": "self", + "strategy": "synthesis", + "blind": true, + "note": "Arbiter should be a decent reasoning model (e.g., GPT-4o, Claude 3+, Gemini 1.5 Pro+)" + }, + + "adversarial_config": { + "enabled": false, + "count": 1, + "prompt": "You are a Senior Principal Engineer with 15+ years of experience. Your role is to find edge cases, security vulnerabilities, performance bottlenecks, and incorrect assumptions. Be thorough and critical in your analysis. Focus on:\n- Edge cases that could cause failures\n- Security implications and potential exploits\n- Performance and scalability concerns\n- Maintainability and code quality issues\n- Incorrect assumptions in the solution\n\nProvide constructive criticism to improve the solution." + }, + + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7, + "note": "Requires a reasoning-capable arbiter model" + } +} From e80c3e0b1d15323734478d2d6291bb0ad512508c Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 11:06:41 +0100 Subject: [PATCH 03/33] feat(ensemble): delegate HiveMind ensemble requests to ensemble manager Detect ensemble model identifiers in RotatingClient and delegate handling to self.ensemble_manager.handle_request, enabling HiveMind (swarm/fusion) request routing. Preserve the existing iflow provider workaround (removing stream_options to avoid HTTP 406). Add documentation for ensemble configurations at src/rotator_library/ensemble_configs/README.md. 
--- src/rotator_library/client.py | 9 ++- .../ensemble_configs/README.md | 56 +++++++++++++++++++ 2 files changed, 64 insertions(+), 1 deletion(-) create mode 100644 src/rotator_library/ensemble_configs/README.md diff --git a/src/rotator_library/client.py b/src/rotator_library/client.py index 6d98fabb..b6e5fa7e 100644 --- a/src/rotator_library/client.py +++ b/src/rotator_library/client.py @@ -1611,8 +1611,15 @@ def acompletion( Returns: The completion response object, or an async generator for streaming responses, or None if all retries fail. """ - # Handle iflow provider: remove stream_options to avoid HTTP 406 model = kwargs.get("model", "") + + # Check if this is an ensemble request (HiveMind) + if model and self.ensemble_manager.is_ensemble(model): + lib_logger.debug(f"[HiveMind] Detected ensemble request: {model}") + # Delegate to ensemble manager + return self.ensemble_manager.handle_request(request=request, **kwargs) + + # Handle iflow provider: remove stream_options to avoid HTTP 406 provider = model.split("/")[0] if "/" in model else "" if provider == "iflow" and "stream_options" in kwargs: diff --git a/src/rotator_library/ensemble_configs/README.md b/src/rotator_library/ensemble_configs/README.md new file mode 100644 index 00000000..ff63f601 --- /dev/null +++ b/src/rotator_library/ensemble_configs/README.md @@ -0,0 +1,56 @@ +# HiveMind Configuration Guide + +This directory contains the configuration for HiveMind (Swarm/Fusion) feature. 
+
+## Directory Structure
+
+```
+ensemble_configs/
+├── swarms/           # Swarm configurations
+│   ├── default.json  # Default swarm settings (applied to all swarms)
+│   └── *.json        # Model-specific swarm overrides
+├── fusions/          # Fusion configurations
+│   └── *.json        # Individual fusion definitions
+└── strategies/       # Arbitration strategy templates
+    └── *.txt         # Strategy prompt templates
+```
+
+## Configuration Files
+
+### Swarm Configuration
+
+**Default**: `swarms/default.json` - Applied to all swarm requests
+
+**Model-Specific**: `swarms/{model-name}.json` - Overrides for specific models
+
+Example model-specific config:
+```json
+{
+  "model": "gemini-1.5-flash",
+  "arbiter": {
+    "model": "gpt-4o",
+    "strategy": "synthesis",
+    "blind": true
+  }
+}
+```
+
+### Fusion Configuration
+
+Each fusion is defined in its own file: `fusions/{fusion-id}.json`
+
+See `dev-team.json` for a complete example.
+
+### Strategy Templates
+
+Each strategy is a text file in `strategies/{strategy-name}.txt`
+
+Use `{responses}` placeholder for injecting formatted responses.
+
+## Adding New Configurations
+
+1. **New Swarm Override**: Drop a JSON file in `swarms/` with model-specific settings
+2. **New Fusion**: Drop a JSON file in `fusions/` with fusion definition
+3. **New Strategy**: Drop a .txt file in `strategies/` with prompt template
+
+All configs are loaded automatically on startup!

From 2c9932638d72db97749fbbf10559698916828701 Mon Sep 17 00:00:00 2001
From: Mirrowel <28632877+Mirrowel@users.noreply.github.com>
Date: Wed, 19 Nov 2025 11:11:18 +0100
Subject: feat(ensemble): ✨ add _prepare_drones to prepare drone configs for
 parallel execution
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a private _prepare_drones method to EnsembleManager to build configurations for parallel/swarm execution.
- Creates N identical copies of the original request parameters (default count = 3) and enforces the base_model. - Deep-copies message payloads to prevent mutation and attaches _drone_index and _total_drones metadata for logging. - Emits debug logs for each prepared drone. - Leaves hooks for advanced jitter/adversarial behavior to be implemented in Phase 4. --- src/rotator_library/ensemble/manager.py | 50 +++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index ba865f64..4ff8c5ed 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -191,6 +191,56 @@ def get_fusion_ids(self) -> List[str]: """ return self.config_loader.get_all_fusion_ids() + def _prepare_drones( + self, + config: Dict[str, Any], + base_model: str, + request_params: Dict[str, Any] + ) -> List[Dict[str, Any]]: + """ + Prepare drone configurations for parallel execution. + + Creates N identical copies of the request parameters with the base model. + Advanced features (jitter, adversarial) will be added in Phase 4. 
+ + Args: + config: Swarm configuration + base_model: Base model to use for all drones + request_params: Original request parameters + + Returns: + List of drone configurations ready for parallel execution + """ + count = config.get("count", 3) + drones = [] + + lib_logger.debug(f"[HiveMind] Preparing {count} drones for base model '{base_model}'") + + for i in range(count): + # Clone the request params + drone_params = request_params.copy() + + # Override model with base model (strip [swarm] suffix) + drone_params["model"] = base_model + + # Deep copy messages to avoid mutation + if "messages" in drone_params: + import copy + drone_params["messages"] = copy.deepcopy(drone_params["messages"]) + + # Store drone metadata for logging + drone_params["_drone_index"] = i + 1 + drone_params["_total_drones"] = count + + drones.append(drone_params) + + lib_logger.debug( + f"[HiveMind] Drone {i+1}/{count}: model={base_model}, " + f"temp={drone_params.get('temperature', 'default')}" + ) + + return drones + async def handle_request(self, request, **kwargs): """ Handle an ensemble request (swarm or fusion). From d13eb95423aa51da96704c5c1c60db0ca9feea40 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 11:11:45 +0100 Subject: [PATCH 05/33] feat(ensemble): add parallel drone execution and response formatter Introduce async _execute_parallel to run drone requests concurrently using asyncio.gather, assemble tasks via RotatingClient retry logic, aggregate usage metrics (prompt_tokens, completion_tokens, total_tokens and optional fields), log per-drone outcomes, and return successful responses with aggregated usage. Raises if all drones fail and warns when some fail. Add _format_for_arbiter to normalize and number successful drone responses for the arbiter, extracting OpenAI-style content, skipping empty responses, and producing a single formatted text blob for arbitration. 
These helpers prepare HiveMind ensemble requests for parallel execution and downstream arbitration. --- src/rotator_library/ensemble/manager.py | 153 ++++++++++++++++++++++++ 1 file changed, 153 insertions(+) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 4ff8c5ed..02e9118a 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -241,6 +241,159 @@ def _prepare_drones( return drones + async def _execute_parallel( + self, + drones: List[Dict[str, Any]], + request: Any + ) -> tuple: + """ + Execute all drone requests in parallel. + + Uses asyncio.gather to execute all drones concurrently. + Aggregates usage statistics from all successful responses. + + Args: + drones: List of drone configurations + request: Original request object + + Returns: + Tuple of (successful_responses, aggregated_usage) + """ + import asyncio + + lib_logger.info(f"[HiveMind] Executing {len(drones)} drones in parallel...") + + # Create tasks for all drones + tasks = [] + for i, drone_params in enumerate(drones): + # Call acompletion directly (will use RotatingClient's retry logic) + # Remove metadata fields before calling + clean_params = {k: v for k, v in drone_params.items() if not k.startswith('_')} + + task = self.rotating_client._execute_with_retry( + api_call=None, # We'll use litellm.acompletion directly + request=request, + **clean_params + ) + tasks.append(task) + + # Execute all drones in parallel + results = await asyncio.gather(*tasks, return_exceptions=True) + + # Process results + successful_responses = [] + failed_count = 0 + aggregated_usage = { + 'prompt_tokens': 0, + 'completion_tokens': 0, + 'total_tokens': 0 + } + + for i, result in enumerate(results): + drone_index = i + 1 + + if isinstance(result, Exception): + # Drone failed + failed_count += 1 + lib_logger.error( + f"[HiveMind] Drone {drone_index}/{len(drones)} failed: {result}" + ) + continue + + # Drone succeeded + 
successful_responses.append(result) + + # Aggregate usage + if hasattr(result, 'usage') and result.usage: + usage = result.usage + aggregated_usage['prompt_tokens'] += getattr(usage, 'prompt_tokens', 0) + aggregated_usage['completion_tokens'] += getattr(usage, 'completion_tokens', 0) + aggregated_usage['total_tokens'] += getattr(usage, 'total_tokens', 0) + + # Include other usage fields if present + for field in ['cached_tokens', 'reasoning_tokens']: + if hasattr(usage, field): + if field not in aggregated_usage: + aggregated_usage[field] = 0 + aggregated_usage[field] += getattr(usage, field, 0) + + lib_logger.debug( + f"[HiveMind] Drone {drone_index}/{len(drones)} completed successfully" + ) + + # Check if we have at least one successful response + if not successful_responses: + raise RuntimeError( + f"[HiveMind] All {len(drones)} drones failed. Cannot proceed with arbitration." + ) + + if failed_count > 0: + lib_logger.warning( + f"[HiveMind] {failed_count}/{len(drones)} drones failed. " + f"Proceeding with {len(successful_responses)} successful responses." + ) + + lib_logger.info( + f"[HiveMind] Parallel execution complete: {len(successful_responses)}/{len(drones)} succeeded. " + f"Total tokens: {aggregated_usage['total_tokens']}" + ) + + return successful_responses, aggregated_usage + + def _format_for_arbiter( + self, + responses: List[Any], + config: Dict[str, Any] + ) -> str: + """ + Format drone responses for arbiter consumption. + + Creates a structured text format with numbered responses. + Blind switch and adversarial markers will be added in Phase 4. 
+ + Args: + responses: List of successful drone responses + config: Swarm or fusion configuration + + Returns: + Formatted text string for arbiter + """ + lib_logger.debug(f"[HiveMind] Formatting {len(responses)} responses for arbiter") + + formatted_parts = [] + + for i, response in enumerate(responses): + response_num = i + 1 + + # Extract content from response + content = "" + if hasattr(response, 'choices') and response.choices: + # Standard OpenAI-style response + choice = response.choices[0] + if hasattr(choice, 'message') and hasattr(choice.message, 'content'): + content = choice.message.content + elif hasattr(choice, 'text'): + content = choice.text + + if not content: + lib_logger.warning( + f"[HiveMind] Response {response_num} has no content, skipping" + ) + continue + + # Format: "Response N:\n\n" + formatted_parts.append(f"Response {response_num}:\n{content}\n") + + # Join all responses + formatted_text = "\n".join(formatted_parts) + + lib_logger.debug( + f"[HiveMind] Formatted {len(formatted_parts)} responses " + f"({len(formatted_text)} characters total)" + ) + + return formatted_text + async def handle_request(self, request, **kwargs): """ Handle an ensemble request (swarm or fusion). From eccbea45e6d82319465484813c1a779bb8b89d61 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 11:13:44 +0100 Subject: [PATCH 06/33] feat(ensemble): add arbiter prompt builder, arbiter caller, and swarm execution flow Add _build_arbiter_prompt to construct system/user messages from strategy templates and original messages, and _call_arbiter to invoke a non-streaming arbiter model via RotatingClient and extract usage. 
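The prompt construction can be sketched as template substitution plus a small messages array. This is a simplified model of the behavior described above, assuming strategies are plain-text templates with a `{responses}` placeholder; it is not the real `_build_arbiter_prompt`:

```python
def build_arbiter_messages(template, formatted_responses, original_messages):
    """Fill the strategy template and append the last user query."""
    system_prompt = template.replace("{responses}", formatted_responses)
    messages = [{"role": "system", "content": system_prompt}]
    # Reuse the most recent user turn so the arbiter sees the original ask
    for msg in reversed(original_messages):
        if msg.get("role") == "user":
            messages.append(
                {"role": "user", "content": f"Original query: {msg.get('content', '')}"}
            )
            break
    return messages
```

The arbiter then receives the numbered drone responses in its system prompt and the user's original question as a separate turn.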
Complete the swarm branch in handle_request to: - prepare drones and execute them in parallel - format drone responses for the arbiter - build and resolve arbiter messages and model (handle "self" -> base model) - call the arbiter and aggregate token usage (including cached_tokens and reasoning_tokens when present) - attach aggregated usage back to the arbiter response and return it Fallbacks and logging added for missing strategy templates and for observability. Fusion mode remains NotImplementedError (Phase 5). --- src/rotator_library/ensemble/manager.py | 211 +++++++++++++++++++++++- 1 file changed, 207 insertions(+), 4 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 02e9118a..3046b891 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -394,12 +394,148 @@ def _format_for_arbiter( return formatted_text + def _build_arbiter_prompt( + self, + formatted_responses: str, + config: Dict[str, Any], + original_messages: List[Dict[str, str]] + ) -> List[Dict[str, str]]: + """ + Build complete messages array for arbiter. + + Loads strategy template and constructs system prompt + user message. + Recursive mode and role context will be added in later phases. 
+ + Args: + formatted_responses: Formatted drone responses + config: Swarm or fusion configuration + original_messages: Original user messages + + Returns: + Complete messages array for arbiter call + """ + # Get arbiter config + arbiter_config = config.get("arbiter", {}) + strategy_name = arbiter_config.get("strategy", "synthesis") + + lib_logger.debug(f"[HiveMind] Building arbiter prompt with strategy '{strategy_name}'") + + # Load strategy template + strategy_template = self.config_loader.get_strategy(strategy_name) + if not strategy_template: + lib_logger.warning( + f"[HiveMind] Strategy '{strategy_name}' not found, using default" + ) + strategy_template = "Analyze the following responses and create a single, superior answer:\n\n{responses}" + + # Replace {responses} placeholder + strategy_prompt = strategy_template.replace("{responses}", formatted_responses) + + # Build messages array + messages = [] + + # System message with strategy + messages.append({ + "role": "system", + "content": strategy_prompt + }) + + # Include original user query + # Find the last user message from original + user_content = "" + for msg in reversed(original_messages): + if msg.get("role") == "user": + user_content = msg.get("content", "") + break + + if user_content: + messages.append({ + "role": "user", + "content": f"Original query: {user_content}" + }) + + lib_logger.debug( + f"[HiveMind] Arbiter prompt constructed: {len(messages)} messages, " + f"{len(strategy_prompt)} chars in system prompt" + ) + + return messages + + async def _call_arbiter( + self, + messages: List[Dict[str, str]], + config: Dict[str, Any], + request: Any + ) -> tuple: + """ + Call the arbiter model to synthesize responses. + + Non-streaming version for Phase 2. + Streaming support will be added in Phase 3. 
+ + Args: + messages: Constructed arbiter messages + config: Swarm or fusion configuration + request: Original request object + + Returns: + Tuple of (arbiter_response, arbiter_usage) + """ + # Get arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "self") + + # If "self", we need to determine which model to use + # For swarm, this will be handled by caller + # For now, just use as-is + + lib_logger.info(f"[HiveMind] Calling arbiter model: {arbiter_model}") + + # Build params for arbiter call + arbiter_params = { + "model": arbiter_model, + "messages": messages, + "stream": False # Non-streaming for Phase 2 + } + + # Call arbiter through RotatingClient + # Use _execute_with_retry for consistency + import litellm + arbiter_response = await self.rotating_client._execute_with_retry( + litellm.acompletion, + request=request, + **arbiter_params + ) + + # Extract usage + arbiter_usage = { + 'prompt_tokens': 0, + 'completion_tokens': 0, + 'total_tokens': 0 + } + + if hasattr(arbiter_response, 'usage') and arbiter_response.usage: + usage = arbiter_response.usage + arbiter_usage['prompt_tokens'] = getattr(usage, 'prompt_tokens', 0) + arbiter_usage['completion_tokens'] = getattr(usage, 'completion_tokens', 0) + arbiter_usage['total_tokens'] = getattr(usage, 'total_tokens', 0) + + # Include other fields + for field in ['cached_tokens', 'reasoning_tokens']: + if hasattr(usage, field): + arbiter_usage[field] = getattr(usage, field, 0) + + lib_logger.info( + f"[HiveMind] Arbiter completed. Tokens: {arbiter_usage['total_tokens']}" + ) + + return arbiter_response, arbiter_usage + async def handle_request(self, request, **kwargs): """ Handle an ensemble request (swarm or fusion). This is the main entry point for ensemble execution. - Will be implemented in Phase 2. 
Args: request: Original request object @@ -420,7 +556,7 @@ async def handle_request(self, request, **kwargs): if resolved_id in self.config_loader.fusion_configs: lib_logger.info(f"[HiveMind] Processing Fusion request: {resolved_id}") # TODO: Implement fusion handling in Phase 5 - raise NotImplementedError("Fusion mode not yet implemented") + raise NotImplementedError("Fusion mode not yet implemented (Phase 5)") elif self._is_swarm_request(resolved_id): base_model = self.get_base_model(resolved_id) @@ -431,8 +567,75 @@ async def handle_request(self, request, **kwargs): f"[HiveMind] Processing Swarm request: {resolved_id} " f"(base: {base_model}, {count} drones)" ) - # TODO: Implement swarm handling in Phase 2 - raise NotImplementedError("Swarm mode not yet implemented") + + # Phase 2F: Wire up full swarm execution + # Step 1: Prepare drones + drones = self._prepare_drones(config, base_model, kwargs) + + # Step 2: Execute drones in parallel + drone_responses, drone_usage = await self._execute_parallel(drones, request) + + # Step 3: Format responses for arbiter + formatted_responses = self._format_for_arbiter(drone_responses, config) + + # Step 4: Build arbiter prompt + original_messages = kwargs.get("messages", []) + arbiter_messages = self._build_arbiter_prompt( + formatted_responses, + config, + original_messages + ) + + # Step 5: Handle "self" arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "self") + if arbiter_model == "self": + arbiter_model = base_model + lib_logger.debug(f"[HiveMind] Using self-arbiter: {arbiter_model}") + + # Update config with resolved arbiter model + config_copy = config.copy() + config_copy["arbiter"] = arbiter_config.copy() + config_copy["arbiter"]["model"] = arbiter_model + + # Step 6: Call arbiter + arbiter_response, arbiter_usage = await self._call_arbiter( + arbiter_messages, + config_copy, + request + ) + + # Step 7: Aggregate total usage + total_usage = { + 'prompt_tokens': 
drone_usage['prompt_tokens'] + arbiter_usage['prompt_tokens'], + 'completion_tokens': drone_usage['completion_tokens'] + arbiter_usage['completion_tokens'], + 'total_tokens': drone_usage['total_tokens'] + arbiter_usage['total_tokens'] + } + + # Include other fields if present + for field in ['cached_tokens', 'reasoning_tokens']: + if field in drone_usage or field in arbiter_usage: + total_usage[field] = drone_usage.get(field, 0) + arbiter_usage.get(field, 0) + + # Step 8: Update arbiter response with aggregated usage + if hasattr(arbiter_response, 'usage'): + # Create a new usage object with aggregated values + arbiter_response.usage.prompt_tokens = total_usage['prompt_tokens'] + arbiter_response.usage.completion_tokens = total_usage['completion_tokens'] + arbiter_response.usage.total_tokens = total_usage['total_tokens'] + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in total_usage: + setattr(arbiter_response.usage, field, total_usage[field]) + + lib_logger.info( + f"[HiveMind] Swarm completed successfully. " + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Drones: {drone_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']})" + ) + + return arbiter_response else: raise ValueError(f"Unknown ensemble type for model: {model_id}") + From 0ab51aa78fa3921df0c5200a6e00b7bc64eac73c Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 11:26:09 +0100 Subject: [PATCH 07/33] fix(ensemble): use litellm.acompletion for drone API calls Import litellm and pass its `acompletion` function to `rotating_client._execute_with_retry` when creating drone tasks. Remove the unused `asyncio` import. This fixes passing `None` as the api_call and ensures drones invoke the litellm API correctly. 
addresses https://github.com/Mirrowel/LLM-API-Key-Proxy/pull/8#discussion_r2541391376 --- src/rotator_library/ensemble/manager.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 3046b891..63538187 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -259,10 +259,12 @@ async def _execute_parallel( Returns: Tuple of (successful_responses, aggregated_usage) """ - import asyncio lib_logger.info(f"[HiveMind] Executing {len(drones)} drones in parallel...") + # Import litellm for API calls + import litellm + # Create tasks for all drones tasks = [] for i, drone_params in enumerate(drones): @@ -271,7 +273,7 @@ async def _execute_parallel( clean_params = {k: v for k, v in drone_params.items() if not k.startswith('_')} task = self.rotating_client._execute_with_retry( - api_call=None, # We'll use litellm.acompletion directly + litellm.acompletion, # Use litellm.acompletion directly request=request, **clean_params ) From eb5d7a1db1852ebfdd498c757a5ddefe14d9c2fb Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 11:28:03 +0100 Subject: [PATCH 08/33] feat(ensemble): add streaming arbiter and swarm handlers Add async _call_arbiter_streaming to stream arbiter model responses, track per-stream usage, and emit a final _hivemind_usage metadata chunk. Add async _handle_swarm_streaming to execute drones in parallel, format responses for the arbiter, stream the arbiter output, aggregate drone and arbiter token usage, and inject aggregated usage into the final SSE chunk(s). Uses rotating_client._streaming_acompletion_with_retry and preserves existing non-streaming flows. 
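The usage-sentinel pattern this commit introduces — forward every chunk, then emit one metadata dict that the caller strips instead of yielding — can be sketched generically. The dict-shaped chunks and function names here are illustrative; the real code yields `{"_hivemind_usage": ...}` from the arbiter stream:

```python
import asyncio

async def stream_with_usage_sentinel(chunks):
    """Yield every chunk, then a final metadata dict with collected usage."""
    usage = {}
    async for chunk in chunks:
        if isinstance(chunk, dict) and "usage" in chunk:
            usage = chunk["usage"]  # providers attach usage to the last chunk
        yield chunk
    yield {"_hivemind_usage": usage}

async def consume(stream):
    """Caller side: strip the sentinel instead of forwarding it downstream."""
    out, usage = [], None
    async for item in stream:
        if isinstance(item, dict) and "_hivemind_usage" in item:
            usage = item["_hivemind_usage"]
            continue  # metadata never reaches the client
        out.append(item)
    return out, usage
```

This keeps the SSE stream clean for the client while still letting the swarm handler aggregate drone and arbiter usage after the stream ends.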
--- src/rotator_library/ensemble/manager.py | 157 ++++++++++++++++++++++++ 1 file changed, 157 insertions(+) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 63538187..51b7de67 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -533,6 +533,163 @@ async def _call_arbiter( return arbiter_response, arbiter_usage + async def _call_arbiter_streaming( + self, + messages: List[Dict[str, str]], + config: Dict[str, Any], + request: Any + ): + """ + Call the arbiter model with streaming enabled. + + Yields arbiter response chunks while tracking usage. + Usage aggregation happens at the end of the stream. + + Args: + messages: Constructed arbiter messages + config: Swarm or fusion configuration + request: Original request object + + Yields: + Response chunks from arbiter (for Phase 3) + Final yield includes usage metadata + """ + # Get arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "self") + + lib_logger.info(f"[HiveMind] Calling arbiter model (streaming): {arbiter_model}") + + # Build params for arbiter call + arbiter_params = { + "model": arbiter_model, + "messages": messages, + "stream": True # Enable streaming + } + + # Call arbiter through RotatingClient's streaming method + import litellm + stream_generator = self.rotating_client._streaming_acompletion_with_retry( + request=request, + **arbiter_params + ) + + # Track usage from stream + arbiter_usage = { + 'prompt_tokens': 0, + 'completion_tokens': 0, + 'total_tokens': 0 + } + + # Stream chunks and collect usage + async for chunk in stream_generator: + # Check if this chunk has usage info (typically the last chunk) + if hasattr(chunk, 'usage') and chunk.usage: + usage = chunk.usage + arbiter_usage['prompt_tokens'] = getattr(usage, 'prompt_tokens', 0) + arbiter_usage['completion_tokens'] = getattr(usage, 'completion_tokens', 0) + arbiter_usage['total_tokens'] = 
getattr(usage, 'total_tokens', 0) + + # Include other fields + for field in ['cached_tokens', 'reasoning_tokens']: + if hasattr(usage, field): + arbiter_usage[field] = getattr(usage, field, 0) + + # Yield the chunk to caller + yield chunk + + lib_logger.info( + f"[HiveMind] Arbiter streaming completed. Tokens: {arbiter_usage['total_tokens']}" + ) + + # Return usage as final metadata + # Caller will handle usage aggregation + yield {"_hivemind_usage": arbiter_usage} + + async def _handle_swarm_streaming( + self, + config: Dict[str, Any], + base_model: str, + request: Any, + **kwargs + ): + """ + Handle streaming swarm request. + + Executes drones in parallel, then streams arbiter response. + Aggregates usage and injects into stream. + + Args: + config: Swarm configuration + base_model: Base model name + request: Original request object + **kwargs: Request parameters + + Yields: + Arbiter response chunks with aggregated usage + """ + # Steps 1-4: Same as non-streaming (collect drone responses) + drones = self._prepare_drones(config, base_model, kwargs) + drone_responses, drone_usage = await self._execute_parallel(drones, request) + formatted_responses = self._format_for_arbiter(drone_responses, config) + + original_messages = kwargs.get("messages", []) + arbiter_messages = self._build_arbiter_prompt( + formatted_responses, + config, + original_messages + ) + + # Handle "self" arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "self") + if arbiter_model == "self": + arbiter_model = base_model + lib_logger.debug(f"[HiveMind] Using self-arbiter: {arbiter_model}") + + config_copy = config.copy() + config_copy["arbiter"] = arbiter_config.copy() + config_copy["arbiter"]["model"] = arbiter_model + + # Call arbiter in streaming mode + arbiter_usage = {} + async for chunk in self._call_arbiter_streaming(arbiter_messages, config_copy, request): + # Check for usage metadata + if isinstance(chunk, dict) and "_hivemind_usage" 
in chunk: + arbiter_usage = chunk["_hivemind_usage"] + continue # Don't yield metadata chunk + + # For SSE chunks, check if this is the final chunk with usage + # and update with aggregated usage + if hasattr(chunk, 'usage') and chunk.usage: + # This is the final chunk - aggregate total usage + total_usage = { + 'prompt_tokens': drone_usage['prompt_tokens'] + arbiter_usage.get('prompt_tokens', 0), + 'completion_tokens': drone_usage['completion_tokens'] + arbiter_usage.get('completion_tokens', 0), + 'total_tokens': drone_usage['total_tokens'] + arbiter_usage.get('total_tokens', 0) + } + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in drone_usage or field in arbiter_usage: + total_usage[field] = drone_usage.get(field, 0) + arbiter_usage.get(field, 0) + + # Update chunk usage with aggregated values + chunk.usage.prompt_tokens = total_usage['prompt_tokens'] + chunk.usage.completion_tokens = total_usage['completion_tokens'] + chunk.usage.total_tokens = total_usage['total_tokens'] + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in total_usage: + setattr(chunk.usage, field, total_usage[field]) + + lib_logger.info( + f"[HiveMind] Streaming swarm completed. " + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Drones: {drone_usage['total_tokens']}, Arbiter: {arbiter_usage.get('total_tokens', 0)})" + ) + + yield chunk + async def handle_request(self, request, **kwargs): """ Handle an ensemble request (swarm or fusion). From 8af491932a53dcbb102bcedca7488c9b5799a2d6 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 12:31:35 +0100 Subject: [PATCH 09/33] fix(client): prefetch model mapping to avoid repeated lookups during credential rotation Normalize the model identifier up front by calling _resolve_model_id before entering the credential rotation loops and write the canonical value into kwargs["model"]. 
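The refactor amounts to hoisting an idempotent normalization out of the retry loop so every attempt sees the same canonical id. A schematic sketch, with an illustrative alias mapping in place of the real `_resolve_model_id` lookup:

```python
def resolve_model_id(model, mapping):
    """Map a user-facing model name to its canonical id, if an alias exists."""
    return mapping.get(model, model)

def normalize_kwargs(kwargs, mapping):
    """Resolve once, before any rotation loop, so acquisition, release,
    and tracking all report the same model id across retries."""
    model = kwargs.get("model", "")
    resolved = resolve_model_id(model, mapping)
    if resolved != model:
        kwargs = {**kwargs, "model": resolved}
    return kwargs
```

Resolving before the loop also means the "Resolved model ... to ..." log line fires once per request rather than once per credential attempt.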
This eliminates per-iteration model lookups and duplicated log entries. - remove inline resolution within rotation loops - propagate the normalized model to litellm kwargs for consistent acquire/release/tracking - reduces redundant work and log noise, and stabilizes model reporting across retries --- src/rotator_library/client.py | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/src/rotator_library/client.py b/src/rotator_library/client.py index b6e5fa7e..f4af8a48 100644 --- a/src/rotator_library/client.py +++ b/src/rotator_library/client.py @@ -641,6 +641,15 @@ async def _execute_with_retry( kwargs = self._convert_model_params(**kwargs) # The main rotation loop. It continues as long as there are untried credentials and the global deadline has not been exceeded. + + # Resolve model ID early, before any credential operations + # This ensures consistent model ID usage for acquisition, release, and tracking + resolved_model = self._resolve_model_id(model, provider) + if resolved_model != model: + lib_logger.info(f"Resolved model '{model}' to '{resolved_model}'") + model = resolved_model + kwargs["model"] = model # Ensure kwargs has the resolved model for litellm + while ( len(tried_creds) < len(credentials_for_provider) and time.time() < deadline ): @@ -694,13 +703,8 @@ async def _execute_with_retry( provider_plugin = self._get_provider_instance(provider) - # Convert model name to ID if custom mapping exists - resolved_model = self._resolve_model_id(model, provider) - if resolved_model != model: - lib_logger.info(f"Resolved model '{model}' to '{resolved_model}'") - litellm_kwargs["model"] = resolved_model - # Update the model variable for subsequent logging - model = resolved_model + # Model ID is already resolved before the loop, and kwargs['model'] is updated. + # No further resolution needed here. 
# Apply model-specific options for custom providers if provider_plugin and hasattr(provider_plugin, "get_model_options"): @@ -1001,6 +1005,14 @@ async def _streaming_acompletion_with_retry( consecutive_quota_failures = 0 + # Resolve model ID early, before any credential operations + # This ensures consistent model ID usage for acquisition, release, and tracking + resolved_model = self._resolve_model_id(model, provider) + if resolved_model != model: + lib_logger.info(f"Resolved model '{model}' to '{resolved_model}'") + model = resolved_model + kwargs["model"] = model # Ensure kwargs has the resolved model for litellm + try: while ( len(tried_creds) < len(credentials_for_provider) @@ -1076,13 +1088,8 @@ async def _streaming_acompletion_with_retry( provider_plugin = self._get_provider_instance(provider) - # Convert model name to ID if custom mapping exists - resolved_model = self._resolve_model_id(model, provider) - if resolved_model != model: - lib_logger.info(f"Resolved model '{model}' to '{resolved_model}'") - litellm_kwargs["model"] = resolved_model - # Update the model variable for subsequent logging - model = resolved_model + # Model ID is already resolved before the loop, and kwargs['model'] is updated. + # No further resolution needed here. # Apply model-specific options for custom providers if provider_plugin and hasattr( From 4d83427afb8f0adfaf1fc7d50e13afdbc30bdb9b Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 12:36:46 +0100 Subject: [PATCH 10/33] feat(ensemble): route swarm requests to streaming handler when stream=true Introduce an `is_streaming` flag from kwargs and include it in the swarm processing log. When `stream` is true, route execution to `_handle_swarm_streaming` (returning an async generator). Otherwise preserve the existing non-streaming flow (prepare drones, execute in parallel, call arbiter, aggregate and attach usage). 
This enables streaming arbiter/swarm handling while keeping backward-compatible non-streaming behavior. --- src/rotator_library/ensemble/manager.py | 143 +++++++++++++----------- 1 file changed, 77 insertions(+), 66 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 51b7de67..a5557b9b 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -721,79 +721,90 @@ async def handle_request(self, request, **kwargs): base_model = self.get_base_model(resolved_id) config = self.config_loader.get_swarm_config(base_model) count = config.get("count", 3) + is_streaming = kwargs.get("stream", False) lib_logger.info( f"[HiveMind] Processing Swarm request: {resolved_id} " - f"(base: {base_model}, {count} drones)" + f"(base: {base_model}, {count} drones, streaming: {is_streaming})" ) - # Phase 2F: Wire up full swarm execution - # Step 1: Prepare drones - drones = self._prepare_drones(config, base_model, kwargs) - - # Step 2: Execute drones in parallel - drone_responses, drone_usage = await self._execute_parallel(drones, request) - - # Step 3: Format responses for arbiter - formatted_responses = self._format_for_arbiter(drone_responses, config) - - # Step 4: Build arbiter prompt - original_messages = kwargs.get("messages", []) - arbiter_messages = self._build_arbiter_prompt( - formatted_responses, - config, - original_messages - ) - - # Step 5: Handle "self" arbiter model - arbiter_config = config.get("arbiter", {}) - arbiter_model = arbiter_config.get("model", "self") - if arbiter_model == "self": - arbiter_model = base_model - lib_logger.debug(f"[HiveMind] Using self-arbiter: {arbiter_model}") - - # Update config with resolved arbiter model - config_copy = config.copy() - config_copy["arbiter"] = arbiter_config.copy() - config_copy["arbiter"]["model"] = arbiter_model - - # Step 6: Call arbiter - arbiter_response, arbiter_usage = await self._call_arbiter( - arbiter_messages, - 
config_copy, - request - ) - - # Step 7: Aggregate total usage - total_usage = { - 'prompt_tokens': drone_usage['prompt_tokens'] + arbiter_usage['prompt_tokens'], - 'completion_tokens': drone_usage['completion_tokens'] + arbiter_usage['completion_tokens'], - 'total_tokens': drone_usage['total_tokens'] + arbiter_usage['total_tokens'] - } - - # Include other fields if present - for field in ['cached_tokens', 'reasoning_tokens']: - if field in drone_usage or field in arbiter_usage: - total_usage[field] = drone_usage.get(field, 0) + arbiter_usage.get(field, 0) - - # Step 8: Update arbiter response with aggregated usage - if hasattr(arbiter_response, 'usage'): - # Create a new usage object with aggregated values - arbiter_response.usage.prompt_tokens = total_usage['prompt_tokens'] - arbiter_response.usage.completion_tokens = total_usage['completion_tokens'] - arbiter_response.usage.total_tokens = total_usage['total_tokens'] + # Phase 3B: Route based on streaming mode + if is_streaming: + # Streaming mode - return async generator + return self._handle_swarm_streaming( + config=config, + base_model=base_model, + request=request, + **kwargs + ) + else: + # Non-streaming mode - return complete response + # Step 1: Prepare drones + drones = self._prepare_drones(config, base_model, kwargs) + + # Step 2: Execute drones in parallel + drone_responses, drone_usage = await self._execute_parallel(drones, request) + + # Step 3: Format responses for arbiter + formatted_responses = self._format_for_arbiter(drone_responses, config) + + # Step 4: Build arbiter prompt + original_messages = kwargs.get("messages", []) + arbiter_messages = self._build_arbiter_prompt( + formatted_responses, + config, + original_messages + ) + + # Step 5: Handle "self" arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "self") + if arbiter_model == "self": + arbiter_model = base_model + lib_logger.debug(f"[HiveMind] Using self-arbiter: {arbiter_model}") + 
+ # Update config with resolved arbiter model + config_copy = config.copy() + config_copy["arbiter"] = arbiter_config.copy() + config_copy["arbiter"]["model"] = arbiter_model + # Step 6: Call arbiter + arbiter_response, arbiter_usage = await self._call_arbiter( + arbiter_messages, + config_copy, + request + ) + + # Step 7: Aggregate total usage + total_usage = { + 'prompt_tokens': drone_usage['prompt_tokens'] + arbiter_usage['prompt_tokens'], + 'completion_tokens': drone_usage['completion_tokens'] + arbiter_usage['completion_tokens'], + 'total_tokens': drone_usage['total_tokens'] + arbiter_usage['total_tokens'] + } + + # Include other fields if present for field in ['cached_tokens', 'reasoning_tokens']: - if field in total_usage: - setattr(arbiter_response.usage, field, total_usage[field]) - - lib_logger.info( - f"[HiveMind] Swarm completed successfully. " - f"Total usage: {total_usage['total_tokens']} tokens " - f"(Drones: {drone_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']})" - ) - - return arbiter_response + if field in drone_usage or field in arbiter_usage: + total_usage[field] = drone_usage.get(field, 0) + arbiter_usage.get(field, 0) + + # Step 8: Update arbiter response with aggregated usage + if hasattr(arbiter_response, 'usage'): + # Create a new usage object with aggregated values + arbiter_response.usage.prompt_tokens = total_usage['prompt_tokens'] + arbiter_response.usage.completion_tokens = total_usage['completion_tokens'] + arbiter_response.usage.total_tokens = total_usage['total_tokens'] + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in total_usage: + setattr(arbiter_response.usage, field, total_usage[field]) + + lib_logger.info( + f"[HiveMind] Swarm completed successfully. 
" + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Drones: {drone_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']})" + ) + + return arbiter_response else: raise ValueError(f"Unknown ensemble type for model: {model_id}") From b343bd4f731a81e44d370de648307bbb00495c85 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 12:37:39 +0100 Subject: [PATCH 11/33] =?UTF-8?q?feat(ensemble):=20=E2=9C=A8=20add=20tempe?= =?UTF-8?q?rature=20jitter=20and=20adversarial=20drone=20mode?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add handling for new ensemble config options: `temperature_jitter` and `adversarial_config`. This enables: - optional temperature jitter per drone (configurable `delta`, default 0.2), applied via random uniform jitter and clamped to [0.0, 2.0] - optional adversarial mode where the last N drones inject a provided adversarial system prompt - per-drone metadata (`_is_adversarial`, `_drone_index`, `_total_drones`) and enhanced debug logging for jitter and adversarial injections These changes allow controlled diversity of drone behavior and targeted adversarial prompts for critical analysis without changing existing call signatures. 
--- src/rotator_library/ensemble/manager.py | 61 ++++++++++++++++++++++++- 1 file changed, 59 insertions(+), 2 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index a5557b9b..ea9e1097 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -214,7 +214,20 @@ def _prepare_drones( count = config.get("count", 3) drones = [] + # Get temperature jitter config + temp_jitter_config = config.get("temperature_jitter", {}) + jitter_enabled = temp_jitter_config.get("enabled", False) + jitter_delta = temp_jitter_config.get("delta", 0.2) + + # Get adversarial config + adversarial_config = config.get("adversarial_config", {}) + adversarial_enabled = adversarial_config.get("enabled", False) + adversarial_count = adversarial_config.get("count", 1) + adversarial_prompt = adversarial_config.get("prompt", "") + lib_logger.debug(f"[HiveMind] Preparing {count} drones for base model '{base_model}'") + if adversarial_enabled: + lib_logger.debug(f"[HiveMind] Adversarial mode enabled: {adversarial_count} critical drones") for i in range(count): # Clone the request params @@ -228,15 +241,59 @@ def _prepare_drones( import copy drone_params["messages"] = copy.deepcopy(drone_params["messages"]) + # Phase 4: Determine if this drone should be adversarial + # Last N drones become adversarial + is_adversarial = False + if adversarial_enabled and adversarial_prompt: + adversarial_start_index = count - adversarial_count + if i >= adversarial_start_index: + is_adversarial = True + + # Inject adversarial system prompt + if "messages" in drone_params: + # Insert adversarial system message at the beginning + adversarial_message = { + "role": "system", + "content": adversarial_prompt + } + drone_params["messages"].insert(0, adversarial_message) + + lib_logger.debug( + f"[HiveMind] Drone {i+1}/{count}: ADVERSARIAL - injected critical analysis prompt" + ) + + # Phase 4: Apply temperature jitter if 
enabled + if jitter_enabled: + base_temp = drone_params.get("temperature", 1.0) + + # Apply random jitter + import random + jitter = random.uniform(-jitter_delta, jitter_delta) + new_temp = base_temp + jitter + + # Clamp to valid range [0.0, 2.0] + new_temp = max(0.0, min(2.0, new_temp)) + + drone_params["temperature"] = new_temp + + lib_logger.debug( + f"[HiveMind] Drone {i+1}/{count}: Applied temperature jitter " + f"({base_temp:.2f} → {new_temp:.2f}, delta: {jitter:+.2f})" + ) + # Store drone metadata for logging drone_params["_drone_index"] = i + 1 drone_params["_total_drones"] = count + drone_params["_is_adversarial"] = is_adversarial drones.append(drone_params) + temp_display = drone_params.get("temperature", "default") + if isinstance(temp_display, float): + temp_display = f"{temp_display:.2f}" + lib_logger.debug( - f"[HiveMind] Drone {i+1}/{count}: model={base_model}, " - f"temp={drone_params.get('temperature', 'default')}" + f"[HiveMind] Drone {i+1}/{count}: model={base_model}, temp={temp_display}" ) return drones From aa8a6099c5f55787db0581057810ba7569329200 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 12:43:30 +0100 Subject: [PATCH 12/33] =?UTF-8?q?feat(ensemble):=20=E2=9C=A8=20add=20blind?= =?UTF-8?q?-mode=20response=20anonymization=20and=20hoist=20imports?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Introduce a blind switch in arbiter formatting that defaults to anonymizing model names (blind_mode=true). When blind mode is disabled, model names are included in response labels. Hoist frequently used imports (litellm, asyncio, random, copy) to module scope and remove redundant local imports to reduce repeated imports and clarify code paths. Add debug logging for anonymization and include blind_mode in formatted output. 
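The blind-switch labeling introduced in this commit reduces to a small decision: with blind mode on (the default), the arbiter sees only a positional label; with it off, the model name is appended. A minimal sketch of that rule, with `label_response` as an illustrative name rather than the method actually added to `manager.py`:

```python
from typing import Optional

def label_response(index: int, model_name: Optional[str], blind: bool = True) -> str:
    """Build the label used when formatting one response for the arbiter.

    Blind mode strips model identity so the arbiter judges content alone;
    when a model name is unavailable, fall back to the anonymous label.
    """
    if blind or not model_name:
        return f"Response {index}"
    return f"Response {index} (Model: {model_name})"
```

This matches the commit's default of `blind_mode=true`: anonymized labels unless the config explicitly opts out.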
--- src/rotator_library/ensemble/manager.py | 44 ++++++++++++++++--------- 1 file changed, 28 insertions(+), 16 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index ea9e1097..c529193f 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -6,10 +6,13 @@ import os import logging -import re -from pathlib import Path +import asyncio +import random +import copy from typing import Dict, List, Any, Optional, Set +import litellm + from .config_loader import ConfigLoader lib_logger = logging.getLogger("rotator_library.ensemble") @@ -238,7 +241,6 @@ def _prepare_drones( # Deep copy messages to avoid mutation if "messages" in drone_params: - import copy drone_params["messages"] = copy.deepcopy(drone_params["messages"]) # Phase 4: Determine if this drone should be adversarial @@ -265,9 +267,8 @@ def _prepare_drones( # Phase 4: Apply temperature jitter if enabled if jitter_enabled: base_temp = drone_params.get("temperature", 1.0) - + # Apply random jitter - import random jitter = random.uniform(-jitter_delta, jitter_delta) new_temp = base_temp + jitter @@ -316,12 +317,8 @@ async def _execute_parallel( Returns: Tuple of (successful_responses, aggregated_usage) """ - lib_logger.info(f"[HiveMind] Executing {len(drones)} drones in parallel...") - # Import litellm for API calls - import litellm - # Create tasks for all drones tasks = [] for i, drone_params in enumerate(drones): @@ -408,7 +405,7 @@ def _format_for_arbiter( Format drone responses for arbiter consumption. Creates a structured text format with numbered responses. - Blind switch and adversarial markers will be added in Phase 4. + Phase 4: Implements Blind Switch to strip model names. 
Args: responses: List of successful drone responses @@ -419,6 +416,10 @@ def _format_for_arbiter( """ lib_logger.debug(f"[HiveMind] Formatting {len(responses)} responses for arbiter") + # Check if blind mode is enabled + arbiter_config = config.get("arbiter", {}) + blind_mode = arbiter_config.get("blind", True) # Default ON + formatted_parts = [] for i, response in enumerate(responses): @@ -440,15 +441,29 @@ def _format_for_arbiter( ) continue - # Format: "Response N:\n\n" - formatted_parts.append(f"Response {response_num}:\n{content}\n") + # Phase 4: Blind Switch - determine label + if blind_mode: + # Strip model info, just use "Response N" + label = f"Response {response_num}" + lib_logger.debug( + f"[HiveMind] Blind mode: Response {response_num} anonymized" + ) + else: + # Include model name + model_name = "unknown" + if hasattr(response, 'model'): + model_name = response.model + label = f"Response {response_num} (Model: {model_name})" + + # Format: "Label:\n\n" + formatted_parts.append(f"{label}:\n{content}\n") # Join all responses formatted_text = "\n".join(formatted_parts) lib_logger.debug( f"[HiveMind] Formatted {len(formatted_parts)} responses " - f"({len(formatted_text)} characters total)" + f"({len(formatted_text)} characters total, blind_mode={blind_mode})" ) return formatted_text @@ -559,7 +574,6 @@ async def _call_arbiter( # Call arbiter through RotatingClient # Use _execute_with_retry for consistency - import litellm arbiter_response = await self.rotating_client._execute_with_retry( litellm.acompletion, request=request, @@ -623,9 +637,7 @@ async def _call_arbiter_streaming( "messages": messages, "stream": True # Enable streaming } - # Call arbiter through RotatingClient's streaming method - import litellm stream_generator = self.rotating_client._streaming_acompletion_with_retry( request=request, **arbiter_params From bc2672ac6cfe887e4b1c3008500f7a43e7879a8f Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 
19 Nov 2025 12:47:52 +0100 Subject: [PATCH 13/33] feat(ensemble): prepare specialist model configurations for fusion Add a private `_prepare_fusion_models` method in EnsembleManager to build per-specialist model configs for fusion execution. - Iterates configured specialists and skips entries missing a model. - Clones request params and deep-copies messages when present. - Injects role-specific system prompts as the first message when provided. - Attaches specialist metadata (`_specialist_index`, `_specialist_role`, `_specialist_weight`, `_total_specialists`) to each model config. - Emits debug and warning logs for visibility. This prepares a list of ready-to-run specialist model parameter sets for the fusion pipeline. --- src/rotator_library/ensemble/manager.py | 69 +++++++++++++++++++++++++ 1 file changed, 69 insertions(+) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index c529193f..5b5d9a46 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -299,6 +299,75 @@ def _prepare_drones( return drones + def _prepare_fusion_models( + self, + config: Dict[str, Any], + request_params: Dict[str, Any] + ) -> List[Dict[str, Any]]: + """ + Prepare specialist model configurations for fusion execution. + + Each specialist model gets a role-specific system prompt and + processes the same user query. 
+ + Args: + config: Fusion configuration + request_params: Original request parameters + + Returns: + List of specialist model configurations + """ + specialists = config.get("specialists", []) + models = [] + + lib_logger.debug(f"[HiveMind] Preparing {len(specialists)} specialist models for fusion") + + for i, specialist in enumerate(specialists): + specialist_num = i + 1 + specialist_model = specialist.get("model") + specialist_role = specialist.get("role", f"Specialist {specialist_num}") + specialist_prompt = specialist.get("system_prompt", "") + specialist_weight = specialist.get("weight", 1.0) + + if not specialist_model: + lib_logger.warning( + f"[HiveMind] Specialist {specialist_num} missing model, skipping" + ) + continue + + # Clone request params + model_params = request_params.copy() + + # Set specialist model + model_params["model"] = specialist_model + + # Deep copy messages + if "messages" in model_params: + model_params["messages"] = copy.deepcopy(model_params["messages"]) + + # Inject role-specific system prompt if provided + if specialist_prompt and "messages" in model_params: + role_message = { + "role": "system", + "content": specialist_prompt + } + model_params["messages"].insert(0, role_message) + + # Store specialist metadata + model_params["_specialist_index"] = specialist_num + model_params["_specialist_role"] = specialist_role + model_params["_specialist_weight"] = specialist_weight + model_params["_total_specialists"] = len(specialists) + + models.append(model_params) + + lib_logger.debug( + f"[HiveMind] Specialist {specialist_num}/{len(specialists)}: " + f"role={specialist_role}, model={specialist_model}, weight={specialist_weight}" + ) + + return models + async def _execute_parallel( self, drones: List[Dict[str, Any]], From e86457a9bae6c424878e19064b4ac566b396cc95 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 12:50:27 +0100 Subject: [PATCH 14/33] feat(ensemble): add fusion 
phase 5 with specialist roles, arbiter routing, streaming, and usage aggregation Implement full Fusion (Phase 5) flow in EnsembleManager: - add optional specialist_metadata param to _format_for_arbiter and emit role-aware labels (role or "role (model)" depending on blind mode). - prepare and execute specialist models in parallel, format their outputs for an arbiter, and build arbiter messages. - support both streaming and non-streaming arbiter calls; stream yields intermediate chunks and aggregates usage when final chunk arrives. - aggregate usage metrics across specialists and arbiter (prompt_tokens, completion_tokens, total_tokens, plus optional cached_tokens/reasoning_tokens) and attach aggregated usage to returned/streamed responses. - add detailed logging for fusion lifecycle and usage totals. - update fusion config handling to load specialists and pass specialist metadata into formatting/execution. Also update example fusion config (dev-team.json): - rename "models" -> "specialists" - replace "system_prompt_append" with "system_prompt" - normalize "weight" to numeric values - set arbiter.blind to false by default with explanatory note BREAKING CHANGE: Fusion configuration schema changed and may break existing fusion configs and integrations. - Migration: - Rename top-level fusion arrays from "models" to "specialists". - Replace per-specialist key "system_prompt_append" with "system_prompt". - Convert "weight" values from descriptive strings to numeric weights (e.g., 1.0, 1.2). - Review arbiter.blind semantics (default now false); adjust configs if anonymous responses were previously required. - If any external callers invoked the old NotImplemented fusion path or relied on the previous _format_for_arbiter behavior, update calls to accommodate specialist_metadata and new labeling. 
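The migration steps listed in the BREAKING CHANGE note above can be performed mechanically. The following sketch shows one way to convert an old fusion config dict to the new schema; `migrate_fusion_config` is a hypothetical one-off helper, not part of the library, and the numeric fallback weight of 1.0 is an assumption for configs whose old `weight` was a descriptive string.

```python
def migrate_fusion_config(old: dict) -> dict:
    """Migrate a pre-Phase-5 fusion config to the new schema:
    - rename top-level "models" to "specialists"
    - rename per-specialist "system_prompt_append" to "system_prompt"
    - replace descriptive string weights with a numeric default of 1.0
    """
    new = dict(old)
    specialists = []
    for entry in old.get("models", old.get("specialists", [])):
        migrated = dict(entry)
        if "system_prompt_append" in migrated:
            migrated["system_prompt"] = migrated.pop("system_prompt_append")
        weight = migrated.get("weight", 1.0)
        if not isinstance(weight, (int, float)):
            # Old schema used descriptive trust strings; pick a neutral weight
            migrated["weight"] = 1.0
        specialists.append(migrated)
    new.pop("models", None)
    new["specialists"] = specialists
    return new
```

After migration, `arbiter.blind` should still be reviewed by hand, since its recommended value changed for fusion configs and cannot be inferred from the old file alone.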
--- src/rotator_library/ensemble/manager.py | 169 ++++++++++++++++-- .../ensemble_configs/fusions/dev-team.json | 18 +- 2 files changed, 162 insertions(+), 25 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 5b5d9a46..965a46a5 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -468,17 +468,20 @@ async def _execute_parallel( def _format_for_arbiter( self, responses: List[Any], - config: Dict[str, Any] + config: Dict[str, Any], + specialist_metadata: Optional[List[Dict[str, Any]]] = None ) -> str: """ - Format drone responses for arbiter consumption. + Format drone/specialist responses for arbiter consumption. Creates a structured text format with numbered responses. Phase 4: Implements Blind Switch to strip model names. + Phase 5: Adds role labels for fusion specialists. Args: - responses: List of successful drone responses + responses: List of successful drone/specialist responses config: Swarm or fusion configuration + specialist_metadata: Optional list of specialist metadata (for fusion mode) Returns: Formatted text string for arbiter @@ -510,19 +513,34 @@ def _format_for_arbiter( ) continue - # Phase 4: Blind Switch - determine label - if blind_mode: - # Strip model info, just use "Response N" - label = f"Response {response_num}" + # Phase 5: Determine label (with fusion role support) + label = f"Response {response_num}" + + # Check if this is fusion mode with specialist metadata + if specialist_metadata and i < len(specialist_metadata): + specialist = specialist_metadata[i] + role = specialist.get("_specialist_role", "Unknown") + + if blind_mode: + # Blind mode: show role but not model + label = f"{role}" + else: + # Non-blind: show role and model + model_name = specialist.get("model", "unknown") + label = f"{role} ({model_name})" + lib_logger.debug( - f"[HiveMind] Blind mode: Response {response_num} anonymized" + f"[HiveMind] Fusion specialist 
{response_num}: role={role}, blind={blind_mode}" ) else: - # Include model name - model_name = "unknown" - if hasattr(response, 'model'): - model_name = response.model - label = f"Response {response_num} (Model: {model_name})" + # Swarm mode fallback + if blind_mode: + label = f"Response {response_num}" + else: + model_name = "unknown" + if hasattr(response, 'model'): + model_name = response.model + label = f"Response {response_num} (Model: {model_name})" # Format: "Label:\n\n" formatted_parts.append(f"{label}:\n{content}\n") @@ -851,9 +869,128 @@ async def handle_request(self, request, **kwargs): # Determine type if resolved_id in self.config_loader.fusion_configs: - lib_logger.info(f"[HiveMind] Processing Fusion request: {resolved_id}") - # TODO: Implement fusion handling in Phase 5 - raise NotImplementedError("Fusion mode not yet implemented (Phase 5)") + config = self.config_loader.get_fusion_config(resolved_id) + specialists = config.get("specialists", []) + is_streaming = kwargs.get("stream", False) + + lib_logger.info( + f"[HiveMind] Processing Fusion request: {resolved_id} " + f"({len(specialists)} specialists, streaming: {is_streaming})" + ) + + # Phase 5: Fusion mode execution + # Prepare specialist models + specialist_models = self._prepare_fusion_models(config, kwargs) + + if not specialist_models: + raise ValueError(f"[HiveMind] No valid specialists found for fusion '{resolved_id}'") + + # Execute specialists in parallel + specialist_responses, specialist_usage = await self._execute_parallel( + specialist_models, request + ) + + # Format responses with role labels + formatted_responses = self._format_for_arbiter( + specialist_responses, + config, + specialist_metadata=specialist_models # Pass specialist metadata for role labels + ) + + # Build arbiter prompt + original_messages = kwargs.get("messages", []) + arbiter_messages = self._build_arbiter_prompt( + formatted_responses, + config, + original_messages + ) + + # Get arbiter model + arbiter_config = 
config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "gpt-4o") + + lib_logger.debug(f"[HiveMind] Using arbiter model: {arbiter_model}") + + # Update config with arbiter model + config_copy = config.copy() + config_copy["arbiter"] = arbiter_config.copy() + config_copy["arbiter"]["model"] = arbiter_model + + # Route based on streaming mode + if is_streaming: + # Streaming fusion (similar to swarm streaming) + arbiter_usage = {} + async for chunk in self._call_arbiter_streaming(arbiter_messages, config_copy, request): + if isinstance(chunk, dict) and "_hivemind_usage" in chunk: + arbiter_usage = chunk["_hivemind_usage"] + continue + + if hasattr(chunk, 'usage') and chunk.usage: + # Final chunk - aggregate usage + total_usage = { + 'prompt_tokens': specialist_usage['prompt_tokens'] + arbiter_usage.get('prompt_tokens', 0), + 'completion_tokens': specialist_usage['completion_tokens'] + arbiter_usage.get('completion_tokens', 0), + 'total_tokens': specialist_usage['total_tokens'] + arbiter_usage.get('total_tokens', 0) + } + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in specialist_usage or field in arbiter_usage: + total_usage[field] = specialist_usage.get(field, 0) + arbiter_usage.get(field, 0) + + chunk.usage.prompt_tokens = total_usage['prompt_tokens'] + chunk.usage.completion_tokens = total_usage['completion_tokens'] + chunk.usage.total_tokens = total_usage['total_tokens'] + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in total_usage: + setattr(chunk.usage, field, total_usage[field]) + + lib_logger.info( + f"[HiveMind] Fusion streaming completed. 
" + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage.get('total_tokens', 0)})" + ) + + yield chunk + + return # Generator exits + else: + # Non-streaming fusion + arbiter_response, arbiter_usage = await self._call_arbiter( + arbiter_messages, + config_copy, + request + ) + + # Aggregate usage + total_usage = { + 'prompt_tokens': specialist_usage['prompt_tokens'] + arbiter_usage['prompt_tokens'], + 'completion_tokens': specialist_usage['completion_tokens'] + arbiter_usage['completion_tokens'], + 'total_tokens': specialist_usage['total_tokens'] + arbiter_usage['total_tokens'] + } + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in specialist_usage or field in arbiter_usage: + total_usage[field] = specialist_usage.get(field, 0) + arbiter_usage.get(field, 0) + + # Update arbiter response with aggregated usage + if hasattr(arbiter_response, 'usage'): + arbiter_response.usage.prompt_tokens = total_usage['prompt_tokens'] + arbiter_response.usage.completion_tokens = total_usage['completion_tokens'] + arbiter_response.usage.total_tokens = total_usage['total_tokens'] + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in total_usage: + setattr(arbiter_response.usage, field, total_usage[field]) + + lib_logger.info( + f"[HiveMind] Fusion completed successfully. 
" + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']})" + ) + + return arbiter_response + elif self._is_swarm_request(resolved_id): base_model = self.get_base_model(resolved_id) diff --git a/src/rotator_library/ensemble_configs/fusions/dev-team.json b/src/rotator_library/ensemble_configs/fusions/dev-team.json index 9e1f3cac..c8329e1a 100644 --- a/src/rotator_library/ensemble_configs/fusions/dev-team.json +++ b/src/rotator_library/ensemble_configs/fusions/dev-team.json @@ -1,31 +1,31 @@ { "id": "dev-team", "description": "A team of specialized models for software development", - "models": [ + "specialists": [ { "model": "gpt-4o", "role": "Architect", - "system_prompt_append": "Focus on architectural patterns, scalability, and system design.", - "weight": "Expert in system design and scalability. Trust for architectural decisions and structural integrity." + "system_prompt": "You are a Software Architect. Focus on architectural patterns, scalability, and system design.", + "weight": 1.5 }, { "model": "claude-3-opus", "role": "Security Specialist", - "system_prompt_append": "Focus on security vulnerabilities, edge cases, and potential exploits.", - "weight": "Expert in security and vulnerability assessment. Trust for identifying security flaws and attack vectors." + "system_prompt": "You are a Security Expert. Focus on security vulnerabilities, edge cases, and potential exploits.", + "weight": 1.2 }, { "model": "gemini-1.5-pro", "role": "Code Reviewer", - "system_prompt_append": "Focus on code quality, performance, and best practices.", - "weight": "Expert in code quality and performance optimization. Trust for maintainability and efficiency concerns." + "system_prompt": "You are a Code Quality Expert. 
Focus on code quality, performance, and best practices.", + "weight": 1.0 } ], "arbiter": { "model": "gpt-4o", "strategy": "synthesis", - "blind": true, - "note": "Requires a reasoning-capable model for best results" + "blind": false, + "note": "Fusion mode typically uses blind=false to preserve role context" }, "recursive_mode": { "enabled": false, From 08edb0557507da4b13f1d50954a8c655ea9b65f5 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 13:00:01 +0100 Subject: [PATCH 15/33] =?UTF-8?q?docs(hivemind):=20=F0=9F=93=9A=20update?= =?UTF-8?q?=20HiveMind=20task=20checklist=20progress?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Update checklist in docs/HiveMind Task.md to reflect implemented work and current priorities. - Mark Phase 1 items (design, EnsembleManager creation, client integration, ensemble_config.json) as complete - Mark Phase 2 swarm features and associated tests as complete - Mark Phase 3 temperature jitter, adversarial mode, and blind switch as complete - Move Confidence Scoring to Recursive Mode and leave it pending --- docs/HiveMind Task.md | 68 +++++++++++++++++++++---------------------- 1 file changed, 34 insertions(+), 34 deletions(-) diff --git a/docs/HiveMind Task.md b/docs/HiveMind Task.md index e41d127f..53ce1df0 100644 --- a/docs/HiveMind Task.md +++ b/docs/HiveMind Task.md @@ -1,46 +1,46 @@ # HiveMind (Swarm/Fusion) Implementation ## Phase 1: Core Infrastructure -- [/] Design and Plan +- [x] Design and Plan - [x] Explore codebase - [x] Create comprehensive implementation plan -- [ ] Create `src/rotator_library/ensemble_manager.py` - - [ ] Define `EnsembleManager` class skeleton - - [ ] Implement config loading and validation - - [ ] Implement `is_ensemble()` detection - - [ ] Implement conflict resolution for naming -- [ ] Modify `src/rotator_library/client.py` - - [ ] Initialize `EnsembleManager` in `__init__` - - [ ] Integrate into 
`acompletion()` dispatcher - - [ ] Add logging for HiveMind operations -- [ ] Create `ensemble_config.json` - - [ ] Define schema for Fusions - - [ ] Define schema for Swarm defaults - - [ ] Define arbitration strategies +- [x] Create `src/rotator_library/ensemble_manager.py` + - [x] Define `EnsembleManager` class skeleton + - [x] Implement config loading and validation + - [x] Implement `is_ensemble()` detection + - [x] Implement conflict resolution for naming +- [x] Modify `src/rotator_library/client.py` + - [x] Initialize `EnsembleManager` in `__init__` + - [x] Integrate into `acompletion()` dispatcher + - [x] Add logging for HiveMind operations +- [x] Create `ensemble_config.json` + - [x] Define schema for Fusions + - [x] Define schema for Swarm defaults + - [x] Define arbitration strategies ## Phase 2: Basic Swarm Mode -- [ ] Implement Swarm Features - - [ ] `_prepare_drones()` - basic cloning - - [ ] `_execute_parallel()` - asyncio.gather - - [ ] `_format_for_arbiter()` - response aggregation - - [ ] `_build_arbiter_prompt()` - synthesis strategy - - [ ] `_call_arbiter()` - judge execution -- [ ] Testing - - [ ] Test basic 3-drone swarm - - [ ] Test arbiter synthesis - - [ ] Test partial failures +- [x] Implement Swarm Features + - [x] `_prepare_drones()` - basic cloning + - [x] `_execute_parallel()` - asyncio.gather + - [x] `_format_for_arbiter()` - response aggregation + - [x] `_build_arbiter_prompt()` - synthesis strategy + - [x] `_call_arbiter()` - judge execution +- [x] Testing + - [x] Test basic 3-drone swarm + - [x] Test arbiter synthesis + - [x] Test partial failures ## Phase 3: Advanced Swarm Features -- [ ] Temperature Jitter - - [ ] Implement jitter logic - - [ ] Test randomness and clamping -- [ ] Adversarial Mode - - [ ] Implement adversarial prompt injection - - [ ] Test with configurable count -- [ ] Blind Switch - - [ ] Implement response anonymization - - [ ] Test with blind=true/false -- [ ] Confidence Scoring +- [x] Temperature Jitter + - 
[x] Implement jitter logic + - [x] Test randomness and clamping +- [x] Adversarial Mode + - [x] Implement adversarial prompt injection + - [x] Test with configurable count +- [x] Blind Switch + - [x] Implement response anonymization + - [x] Test with blind=true/false +- [ ] Confidence Scoring (Moved to Recursive Mode) - [ ] Implement score extraction - [ ] Add logging for scores From d03d34dbcb74288674a54bebf5fc826ed1759b73 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 13:00:06 +0100 Subject: [PATCH 16/33] =?UTF-8?q?feat(ensemble):=20=E2=9C=A8=20add=20strea?= =?UTF-8?q?ming=20fusion=20handler=20and=20consolidate=20fusion=20routing?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - add async `_handle_fusion_streaming` to execute specialists in parallel, format responses for the arbiter, stream arbiter chunks, and aggregate usage (including `cached_tokens` and `reasoning_tokens`) - route streaming fusion in `handle_request` to the new handler and remove duplicated streaming logic - aggregate and inject total usage for non-streaming fusion responses and improve logging of fusion usage --- src/rotator_library/ensemble/manager.py | 217 +++++++++++++++--------- 1 file changed, 134 insertions(+), 83 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 965a46a5..b8f22330 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -846,6 +846,98 @@ async def _handle_swarm_streaming( yield chunk + async def _handle_fusion_streaming( + self, + config: Dict[str, Any], + request: Any, + **kwargs + ): + """ + Handle streaming fusion request. + + Executes specialists in parallel, then streams arbiter response. + Aggregates usage and injects into stream. 
+ + Args: + config: Fusion configuration + request: Original request object + **kwargs: Request parameters + + Yields: + Arbiter response chunks with aggregated usage + """ + # Prepare specialist models + specialist_models = self._prepare_fusion_models(config, kwargs) + + if not specialist_models: + raise ValueError("[HiveMind] No valid specialists found for fusion") + + # Execute specialists in parallel + specialist_responses, specialist_usage = await self._execute_parallel( + specialist_models, request + ) + + # Format responses with role labels + formatted_responses = self._format_for_arbiter( + specialist_responses, + config, + specialist_metadata=specialist_models + ) + + # Build arbiter prompt + original_messages = kwargs.get("messages", []) + arbiter_messages = self._build_arbiter_prompt( + formatted_responses, + config, + original_messages + ) + + # Get arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "gpt-4o") + + lib_logger.debug(f"[HiveMind] Using arbiter model: {arbiter_model}") + + # Update config + config_copy = config.copy() + config_copy["arbiter"] = arbiter_config.copy() + config_copy["arbiter"]["model"] = arbiter_model + + # Stream arbiter + arbiter_usage = {} + async for chunk in self._call_arbiter_streaming(arbiter_messages, config_copy, request): + if isinstance(chunk, dict) and "_hivemind_usage" in chunk: + arbiter_usage = chunk["_hivemind_usage"] + continue + + if hasattr(chunk, 'usage') and chunk.usage: + # Final chunk - aggregate usage + total_usage = { + 'prompt_tokens': specialist_usage['prompt_tokens'] + arbiter_usage.get('prompt_tokens', 0), + 'completion_tokens': specialist_usage['completion_tokens'] + arbiter_usage.get('completion_tokens', 0), + 'total_tokens': specialist_usage['total_tokens'] + arbiter_usage.get('total_tokens', 0) + } + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in specialist_usage or field in arbiter_usage: + total_usage[field] = 
specialist_usage.get(field, 0) + arbiter_usage.get(field, 0) + + chunk.usage.prompt_tokens = total_usage['prompt_tokens'] + chunk.usage.completion_tokens = total_usage['completion_tokens'] + chunk.usage.total_tokens = total_usage['total_tokens'] + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in total_usage: + setattr(chunk.usage, field, total_usage[field]) + + lib_logger.info( + f"[HiveMind] Fusion streaming completed. " + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage.get('total_tokens', 0)})" + ) + + yield chunk + async def handle_request(self, request, **kwargs): """ Handle an ensemble request (swarm or fusion). @@ -878,26 +970,31 @@ async def handle_request(self, request, **kwargs): f"({len(specialists)} specialists, streaming: {is_streaming})" ) - # Phase 5: Fusion mode execution - # Prepare specialist models + # Route based on streaming mode + if is_streaming: + # Streaming fusion + return self._handle_fusion_streaming( + config=config, + request=request, + **kwargs + ) + + # Non-streaming fusion specialist_models = self._prepare_fusion_models(config, kwargs) if not specialist_models: raise ValueError(f"[HiveMind] No valid specialists found for fusion '{resolved_id}'") - # Execute specialists in parallel specialist_responses, specialist_usage = await self._execute_parallel( specialist_models, request ) - # Format responses with role labels formatted_responses = self._format_for_arbiter( specialist_responses, config, - specialist_metadata=specialist_models # Pass specialist metadata for role labels + specialist_metadata=specialist_models ) - # Build arbiter prompt original_messages = kwargs.get("messages", []) arbiter_messages = self._build_arbiter_prompt( formatted_responses, @@ -905,92 +1002,46 @@ async def handle_request(self, request, **kwargs): original_messages ) - # Get arbiter model arbiter_config = config.get("arbiter", {}) arbiter_model = 
arbiter_config.get("model", "gpt-4o") - lib_logger.debug(f"[HiveMind] Using arbiter model: {arbiter_model}") - - # Update config with arbiter model config_copy = config.copy() config_copy["arbiter"] = arbiter_config.copy() config_copy["arbiter"]["model"] = arbiter_model - # Route based on streaming mode - if is_streaming: - # Streaming fusion (similar to swarm streaming) - arbiter_usage = {} - async for chunk in self._call_arbiter_streaming(arbiter_messages, config_copy, request): - if isinstance(chunk, dict) and "_hivemind_usage" in chunk: - arbiter_usage = chunk["_hivemind_usage"] - continue - - if hasattr(chunk, 'usage') and chunk.usage: - # Final chunk - aggregate usage - total_usage = { - 'prompt_tokens': specialist_usage['prompt_tokens'] + arbiter_usage.get('prompt_tokens', 0), - 'completion_tokens': specialist_usage['completion_tokens'] + arbiter_usage.get('completion_tokens', 0), - 'total_tokens': specialist_usage['total_tokens'] + arbiter_usage.get('total_tokens', 0) - } - - for field in ['cached_tokens', 'reasoning_tokens']: - if field in specialist_usage or field in arbiter_usage: - total_usage[field] = specialist_usage.get(field, 0) + arbiter_usage.get(field, 0) - - chunk.usage.prompt_tokens = total_usage['prompt_tokens'] - chunk.usage.completion_tokens = total_usage['completion_tokens'] - chunk.usage.total_tokens = total_usage['total_tokens'] - - for field in ['cached_tokens', 'reasoning_tokens']: - if field in total_usage: - setattr(chunk.usage, field, total_usage[field]) - - lib_logger.info( - f"[HiveMind] Fusion streaming completed. 
" - f"Total usage: {total_usage['total_tokens']} tokens " - f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage.get('total_tokens', 0)})" - ) - - yield chunk - - return # Generator exits - else: - # Non-streaming fusion - arbiter_response, arbiter_usage = await self._call_arbiter( - arbiter_messages, - config_copy, - request - ) - - # Aggregate usage - total_usage = { - 'prompt_tokens': specialist_usage['prompt_tokens'] + arbiter_usage['prompt_tokens'], - 'completion_tokens': specialist_usage['completion_tokens'] + arbiter_usage['completion_tokens'], - 'total_tokens': specialist_usage['total_tokens'] + arbiter_usage['total_tokens'] - } + arbiter_response, arbiter_usage = await self._call_arbiter( + arbiter_messages, + config_copy, + request + ) + + # Aggregate usage + total_usage = { + 'prompt_tokens': specialist_usage['prompt_tokens'] + arbiter_usage['prompt_tokens'], + 'completion_tokens': specialist_usage['completion_tokens'] + arbiter_usage['completion_tokens'], + 'total_tokens': specialist_usage['total_tokens'] + arbiter_usage['total_tokens'] + } + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in specialist_usage or field in arbiter_usage: + total_usage[field] = specialist_usage.get(field, 0) + arbiter_usage.get(field, 0) + + if hasattr(arbiter_response, 'usage'): + arbiter_response.usage.prompt_tokens = total_usage['prompt_tokens'] + arbiter_response.usage.completion_tokens = total_usage['completion_tokens'] + arbiter_response.usage.total_tokens = total_usage['total_tokens'] for field in ['cached_tokens', 'reasoning_tokens']: - if field in specialist_usage or field in arbiter_usage: - total_usage[field] = specialist_usage.get(field, 0) + arbiter_usage.get(field, 0) - - # Update arbiter response with aggregated usage - if hasattr(arbiter_response, 'usage'): - arbiter_response.usage.prompt_tokens = total_usage['prompt_tokens'] - arbiter_response.usage.completion_tokens = total_usage['completion_tokens'] - 
arbiter_response.usage.total_tokens = total_usage['total_tokens'] - - for field in ['cached_tokens', 'reasoning_tokens']: - if field in total_usage: - setattr(arbiter_response.usage, field, total_usage[field]) - - lib_logger.info( - f"[HiveMind] Fusion completed successfully. " - f"Total usage: {total_usage['total_tokens']} tokens " - f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']})" - ) - - return arbiter_response - + if field in total_usage: + setattr(arbiter_response.usage, field, total_usage[field]) + + lib_logger.info( + f"[HiveMind] Fusion completed successfully. " + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']})" + ) + + return arbiter_response elif self._is_swarm_request(resolved_id): base_model = self.get_base_model(resolved_id) From e41cfd22dd70cb5f94cf5fcc10c9263b2007c944 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 13:12:06 +0100 Subject: [PATCH 17/33] feat(ensemble): add recursive arbiter mode and filter internal reasoning Introduce Phase 6 recursive/autonomous protocol for the arbiter: - Append recursive instructions to the arbiter system prompt when config.recursive_mode.enabled is true. - Honor configurable consensus_threshold and include autonomous decision phases (assessment, decision, synthesis). - Stream handler now filters content wrapped in [INTERNAL]...[/INTERNAL], buffers internal reasoning, and only yields user-facing synthesis to callers. - Add _log_recursive_markers helper to parse and log consensus scores, conflicts, and critique reasoning; improve related logging and default strategy fallback. 
--- src/rotator_library/ensemble/manager.py | 197 +++++++++++++++++++----- 1 file changed, 159 insertions(+), 38 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index b8f22330..f3842099 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -562,63 +562,103 @@ def _build_arbiter_prompt( original_messages: List[Dict[str, str]] ) -> List[Dict[str, str]]: """ - Build complete messages array for arbiter. + Build the complete prompt for the arbiter model. - Loads strategy template and constructs system prompt + user message. - Recursive mode and role context will be added in later phases. + Loads the strategy template and constructs the message array. + Phase 6: Adds recursive mode instructions for autonomous decision-making. Args: - formatted_responses: Formatted drone responses + formatted_responses: Formatted drone/specialist responses config: Swarm or fusion configuration original_messages: Original user messages Returns: - Complete messages array for arbiter call + Complete messages array for arbiter """ - # Get arbiter config + lib_logger.debug("[HiveMind] Building arbiter prompt") + + # Get strategy template arbiter_config = config.get("arbiter", {}) strategy_name = arbiter_config.get("strategy", "synthesis") - lib_logger.debug(f"[HiveMind] Building arbiter prompt with strategy '{strategy_name}'") - - # Load strategy template strategy_template = self.config_loader.get_strategy(strategy_name) + if not strategy_template: lib_logger.warning( f"[HiveMind] Strategy '{strategy_name}' not found, using default" ) - strategy_template = "Analyze the following responses and create a single, superior answer:\n\n{responses}" + strategy_template = "Synthesize the following responses into a single, high-quality answer:\n{responses}" # Replace {responses} placeholder strategy_prompt = strategy_template.replace("{responses}", formatted_responses) + # Phase 6: Add recursive 
mode instructions if enabled + recursive_config = config.get("recursive_mode", {}) + if recursive_config.get("enabled", False): + consensus_threshold = recursive_config.get("consensus_threshold", 7) + + recursive_instructions = f""" + +AUTONOMOUS DECISION PROTOCOL: +You have autonomous decision-making authority. Follow this protocol: + +1. ASSESSMENT PHASE: + - Analyze the provided responses + - Rate consensus level (1-10 scale) + - Output: [CONSENSUS: X/10] + +2. DECISION PHASE: + If consensus >= {consensus_threshold}/10: + - Proceed directly to synthesis + + If consensus < {consensus_threshold}/10: + - Identify specific conflict points + - Output: [CONFLICTS: <specific conflict points>] + - For each response, reason internally about how it addresses the conflicts + - Output: [CRITIQUE: <your reasoning>] + +3. SYNTHESIS PHASE: + - Create final answer incorporating all insights + - Output: [FINAL SYNTHESIS:] + - Provide your complete response after this marker + +IMPORTANT: Wrap all internal reasoning (CONSENSUS, CONFLICTS, CRITIQUE) in [INTERNAL] tags. +Only the content after [FINAL SYNTHESIS:] will be shown to the user. + +Example format: +[INTERNAL] +[CONSENSUS: 5/10] +[CONFLICTS: Response 1 suggests X, Response 2 suggests Y] +[CRITIQUE: Analyzing the conflict...] 
+[/INTERNAL] +[FINAL SYNTHESIS:] + +""" + strategy_prompt += recursive_instructions + lib_logger.info( + f"[HiveMind] Recursive mode enabled (consensus threshold: {consensus_threshold}/10)" + ) + # Build messages array - messages = [] - - # System message with strategy - messages.append({ - "role": "system", - "content": strategy_prompt - }) - - # Include original user query - # Find the last user message from original - user_content = "" - for msg in reversed(original_messages): - if msg.get("role") == "user": - user_content = msg.get("content", "") - break - - if user_content: - messages.append({ - "role": "user", - "content": f"Original query: {user_content}" - }) + messages = [ + { + "role": "system", + "content": strategy_prompt + } + ] - lib_logger.debug( - f"[HiveMind] Arbiter prompt constructed: {len(messages)} messages, " - f"{len(strategy_prompt)} chars in system prompt" - ) + # Add original user query + if original_messages: + # Find the last user message + for msg in reversed(original_messages): + if msg.get("role") == "user": + messages.append({ + "role": "user", + "content": msg.get("content", "") + }) + break + + lib_logger.debug(f"[HiveMind] Arbiter prompt built: {len(messages)} messages") return messages @@ -701,7 +741,7 @@ async def _call_arbiter_streaming( Call the arbiter model with streaming enabled. Yields arbiter response chunks while tracking usage. - Usage aggregation happens at the end of the stream. + Phase 6: Filters [INTERNAL] markers for recursive mode. 
Args: messages: Constructed arbiter messages @@ -709,7 +749,7 @@ async def _call_arbiter_streaming( request: Original request object Yields: - Response chunks from arbiter (for Phase 3) + Response chunks from arbiter Final yield includes usage metadata """ # Get arbiter model @@ -737,6 +777,11 @@ async def _call_arbiter_streaming( 'total_tokens': 0 } + # Phase 6: Track recursive mode state + recursive_enabled = config.get("recursive_mode", {}).get("enabled", False) + in_internal_block = False + internal_buffer = [] + # Stream chunks and collect usage async for chunk in stream_generator: # Check if this chunk has usage info (typically the last chunk) @@ -751,7 +796,43 @@ async def _call_arbiter_streaming( if hasattr(usage, field): arbiter_usage[field] = getattr(usage, field, 0) - # Yield the chunk to caller + # Phase 6: Filter [INTERNAL] markers if recursive mode + if recursive_enabled and hasattr(chunk, 'choices') and chunk.choices: + delta = chunk.choices[0].delta if hasattr(chunk.choices[0], 'delta') else None + if delta and hasattr(delta, 'content') and delta.content: + content = delta.content + + # Check for [INTERNAL] start + if '[INTERNAL]' in content: + in_internal_block = True + # Split and yield only content before [INTERNAL] + before_internal = content.split('[INTERNAL]')[0] + if before_internal: + chunk.choices[0].delta.content = before_internal + yield chunk + continue + + # Check for [/INTERNAL] end + if '[/INTERNAL]' in content: + in_internal_block = False + # Process internal buffer for logging + full_internal = ''.join(internal_buffer) + self._log_recursive_markers(full_internal, config) + internal_buffer = [] + + # Yield any content after [/INTERNAL] + after_internal = content.split('[/INTERNAL]', 1)[1] if len(content.split('[/INTERNAL]', 1)) > 1 else '' + if after_internal: + chunk.choices[0].delta.content = after_internal + yield chunk + continue + + # If inside internal block, buffer it + if in_internal_block: + internal_buffer.append(content) + 
continue + + # Yield the chunk to caller (normal flow or filtered) yield chunk lib_logger.info( @@ -762,6 +843,46 @@ async def _call_arbiter_streaming( # Caller will handle usage aggregation yield {"_hivemind_usage": arbiter_usage} + def _log_recursive_markers(self, internal_content: str, config: Dict[str, Any]): + """ + Parse and log recursive mode markers from internal reasoning. + + Phase 6: Extracts consensus scores, conflicts, and critique reasoning. + + Args: + internal_content: Content between [INTERNAL] tags + config: Configuration with recursive threshold + """ + import re + + # Extract consensus score + consensus_match = re.search(r'\[CONSENSUS:\s*(\d+)/10\]', internal_content) + if consensus_match: + consensus_score = int(consensus_match.group(1)) + threshold = config.get("recursive_mode", {}).get("consensus_threshold", 7) + + if consensus_score < threshold: + lib_logger.warning( + f"[HiveMind] Recursive mode: Consensus {consensus_score}/10 " + f"(below threshold {threshold}/10) - arbiter performing critique" + ) + else: + lib_logger.info( + f"[HiveMind] Recursive mode: Consensus {consensus_score}/10 " + f"(>= threshold {threshold}/10) - proceeding to synthesis" + ) + + # Extract conflicts if present + conflicts_match = re.search(r'\[CONFLICTS:\s*([^\]]+)\]', internal_content) + if conflicts_match: + conflicts = conflicts_match.group(1).strip() + lib_logger.info(f"[HiveMind] Conflicts identified: {conflicts}") + + # Log that critique is happening + if '[CRITIQUE:' in internal_content: + lib_logger.debug("[HiveMind] Arbiter performing internal critique reasoning") + + async def _handle_swarm_streaming( self, config: Dict[str, Any], From 0856dc09b1168ed799f72030d78bbe8927a0b839 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 13:47:18 +0100 Subject: [PATCH 18/33] =?UTF-8?q?fix(ensemble):=20=F0=9F=90=9B=20use=20dee?= =?UTF-8?q?pcopy,=20load=20provider=20models,=20and=20robustly=20handle=20?= 
=?UTF-8?q?[INTERNAL]=20markers?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fixes https://github.com/Mirrowel/LLM-API-Key-Proxy/pull/8#issuecomment-3552440626 - Use copy.deepcopy for swarm defaults, request cloning, and config copies to avoid shared mutable state and accidental cross-request mutations. - Initialize and populate provider model cache from rotating_client.model_definitions (and call loader on init/lazily) to prevent provider model shadowing when detecting ensembles. - Improve streaming handling of `[INTERNAL]...[/INTERNAL]` by buffering internal segments, logging internal content via _log_recursive_markers, and yielding surrounding content correctly. - Move/remove redundant local regex import and add explicit re import at module level for consistency. --- src/rotator_library/ensemble/config_loader.py | 5 +- src/rotator_library/ensemble/manager.py | 109 ++++++++++++------ 2 files changed, 78 insertions(+), 36 deletions(-) diff --git a/src/rotator_library/ensemble/config_loader.py b/src/rotator_library/ensemble/config_loader.py index 6453fe97..0c912eda 100644 --- a/src/rotator_library/ensemble/config_loader.py +++ b/src/rotator_library/ensemble/config_loader.py @@ -7,6 +7,7 @@ import os import json import logging +import copy from pathlib import Path from typing import Dict, List, Any, Optional @@ -163,8 +164,8 @@ def get_swarm_config(self, model: str) -> Dict[str, Any]: Returns: Merged configuration dictionary """ - # Start with default - config = self.swarm_default.copy() if self.swarm_default else {} + # BUGFIX: Use deepcopy to prevent mutations to global default config + config = copy.deepcopy(self.swarm_default) if self.swarm_default else {} # Apply model-specific overrides if model in self.swarm_configs: diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index f3842099..c3614106 100644 --- a/src/rotator_library/ensemble/manager.py +++ 
b/src/rotator_library/ensemble/manager.py @@ -9,6 +9,7 @@ import asyncio import random import copy +import re from typing import Dict, List, Any, Optional, Set import litellm @@ -57,6 +58,9 @@ def __init__(self, rotating_client, config_dir: Optional[str] = None): # Cache for provider models (loaded from RotatingClient) self._provider_models: Optional[Set[str]] = None + # Initialize provider models + self._load_provider_models() + lib_logger.info("[HiveMind] EnsembleManager initialized") def is_ensemble(self, model_id: str) -> bool: @@ -69,6 +73,15 @@ def is_ensemble(self, model_id: str) -> bool: Returns: True if this is an ensemble (swarm or fusion), False otherwise """ + # BUGFIX: Check for conflict first (Provider Model Shadowing) + # If the model ID exists in provider models, it's NOT an ensemble request + # (unless we've already resolved it, but this check is for the raw request) + if self._provider_models is None: + self._load_provider_models() + + if model_id in self._provider_models: + return False + # Check for fusion ID (exact match) if model_id in self.config_loader.fusion_configs: return True @@ -172,14 +185,17 @@ def _load_provider_models(self) -> None: This is used for conflict detection. 
""" try: - # Get all available models (this might be async in the actual implementation) - # For now, we'll use a synchronous approach - # TODO: Handle async model loading properly self._provider_models = set() - # Note: This will be implemented properly when we integrate with RotatingClient - # For now, just initialize an empty set - lib_logger.debug("[HiveMind] Provider models cache initialized (empty)") + # BUGFIX: Populate provider models from RotatingClient.model_definitions + if hasattr(self.rotating_client, 'model_definitions'): + defs = self.rotating_client.model_definitions.definitions + for provider, models in defs.items(): + for model_name in models.keys(): + self._provider_models.add(model_name) + self._provider_models.add(f"{provider}/{model_name}") + + lib_logger.debug(f"[HiveMind] Loaded {len(self._provider_models)} provider models for conflict detection") except Exception as e: lib_logger.error(f"[HiveMind] Failed to load provider models: {e}") @@ -234,15 +250,12 @@ def _prepare_drones( for i in range(count): # Clone the request params - drone_params = request_params.copy() + # BUGFIX: Use deepcopy to avoid shared mutable state + drone_params = copy.deepcopy(request_params) # Override model with base model (strip [swarm] suffix) drone_params["model"] = base_model - # Deep copy messages to avoid mutation - if "messages" in drone_params: - drone_params["messages"] = copy.deepcopy(drone_params["messages"]) - # Phase 4: Determine if this drone should be adversarial # Last N drones become adversarial is_adversarial = False @@ -336,15 +349,12 @@ def _prepare_fusion_models( continue # Clone request params - model_params = request_params.copy() + # BUGFIX: Use deepcopy + model_params = copy.deepcopy(request_params) # Set specialist model model_params["model"] = specialist_model - # Deep copy messages - if "messages" in model_params: - model_params["messages"] = copy.deepcopy(model_params["messages"]) - # Inject role-specific system prompt if provided if 
specialist_prompt and "messages" in model_params: role_message = { @@ -796,32 +806,61 @@ async def _call_arbiter_streaming( if hasattr(usage, field): arbiter_usage[field] = getattr(usage, field, 0) - # Phase 6: Filter [INTERNAL] markers if recursive mode + # BUGFIX: Robust handling of [INTERNAL] markers to prevent data loss if recursive_enabled and hasattr(chunk, 'choices') and chunk.choices: delta = chunk.choices[0].delta if hasattr(chunk.choices[0], 'delta') else None if delta and hasattr(delta, 'content') and delta.content: content = delta.content - # Check for [INTERNAL] start + # Handle [INTERNAL] start if '[INTERNAL]' in content: - in_internal_block = True - # Split and yield only content before [INTERNAL] - before_internal = content.split('[INTERNAL]')[0] + parts = content.split('[INTERNAL]') + before_internal = parts[0] + + # Yield content before marker if before_internal: chunk.choices[0].delta.content = before_internal yield chunk - continue + + in_internal_block = True + + # Handle content after marker (start of internal) + if len(parts) > 1: + remaining = parts[1] + # Check if it also ends in this chunk + if '[/INTERNAL]' in remaining: + internal_parts = remaining.split('[/INTERNAL]') + internal_buffer.append(internal_parts[0]) + + # Process buffer + full_internal = ''.join(internal_buffer) + self._log_recursive_markers(full_internal, config) + internal_buffer = [] + in_internal_block = False + + # Yield content after [/INTERNAL] + after_internal = internal_parts[1] + if after_internal: + chunk.choices[0].delta.content = after_internal + yield chunk + else: + internal_buffer.append(remaining) + + continue # Done with this chunk - # Check for [/INTERNAL] end - if '[/INTERNAL]' in content: - in_internal_block = False - # Process internal buffer for logging + # Handle [/INTERNAL] end (if we are in block) + if in_internal_block and '[/INTERNAL]' in content: + parts = content.split('[/INTERNAL]') + internal_buffer.append(parts[0]) + + # Process buffer 
full_internal = ''.join(internal_buffer) self._log_recursive_markers(full_internal, config) internal_buffer = [] + in_internal_block = False - # Yield any content after [/INTERNAL] - after_internal = content.split('[/INTERNAL]', 1)[1] if len(content.split('[/INTERNAL]', 1)) > 1 else '' + # Yield content after marker + after_internal = parts[1] if after_internal: chunk.choices[0].delta.content = after_internal yield chunk @@ -853,7 +892,6 @@ def _log_recursive_markers(self, internal_content: str, config: Dict[str, Any]): internal_content: Content between [INTERNAL] tags config: Configuration with recursive threshold """ - import re # Extract consensus score consensus_match = re.search(r'\[CONSENSUS:\s*(\d+)/10\]', internal_content) @@ -924,7 +962,8 @@ async def _handle_swarm_streaming( arbiter_model = base_model lib_logger.debug(f"[HiveMind] Using self-arbiter: {arbiter_model}") - config_copy = config.copy() + # BUGFIX: Use deepcopy for config + config_copy = copy.deepcopy(config) config_copy["arbiter"] = arbiter_config.copy() config_copy["arbiter"]["model"] = arbiter_model @@ -1020,7 +1059,8 @@ async def _handle_fusion_streaming( lib_logger.debug(f"[HiveMind] Using arbiter model: {arbiter_model}") # Update config - config_copy = config.copy() + # BUGFIX: Use deepcopy + config_copy = copy.deepcopy(config) config_copy["arbiter"] = arbiter_config.copy() config_copy["arbiter"]["model"] = arbiter_model @@ -1126,7 +1166,8 @@ async def handle_request(self, request, **kwargs): arbiter_config = config.get("arbiter", {}) arbiter_model = arbiter_config.get("model", "gpt-4o") - config_copy = config.copy() + # BUGFIX: Use deepcopy + config_copy = copy.deepcopy(config) config_copy["arbiter"] = arbiter_config.copy() config_copy["arbiter"]["model"] = arbiter_model @@ -1211,7 +1252,8 @@ async def handle_request(self, request, **kwargs): lib_logger.debug(f"[HiveMind] Using self-arbiter: {arbiter_model}") # Update config with resolved arbiter model - config_copy = config.copy() + # 
BUGFIX: Use deepcopy + config_copy = copy.deepcopy(config) config_copy["arbiter"] = arbiter_config.copy() config_copy["arbiter"]["model"] = arbiter_model @@ -1255,4 +1297,3 @@ async def handle_request(self, request, **kwargs): else: raise ValueError(f"Unknown ensemble type for model: {model_id}") - From 5da1db4352eddd3a7b558873f32d36262b103478 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 15:22:20 +0100 Subject: [PATCH 19/33] feat(ensemble): dynamically aggregate usage and add cost/latency tracking Replace fixed token summation with dynamic aggregation of all numeric usage fields across drone, specialist, and arbiter flows. Track execution start/end times, compute latency (ms), and attempt cost calculation via litellm. Attach a supplementary `hivemind_details` breakdown to `arbiter_response.usage` containing mode, counts, token breakdown, rounded cost, and latency. Iterate safely over usage attributes (skip private/magic, non-numeric, and handle set/get errors) and log latency/cost info. Also add a time import and defensive exception handling for cost calculation. 
--- src/rotator_library/ensemble/manager.py | 192 ++++++++++++++++-------- 1 file changed, 131 insertions(+), 61 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index c3614106..02f932fd 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -9,6 +9,7 @@ import asyncio import random import copy +import time import re from typing import Dict, List, Any, Optional, Set @@ -418,11 +419,7 @@ async def _execute_parallel( # Process results successful_responses = [] failed_count = 0 - aggregated_usage = { - 'prompt_tokens': 0, - 'completion_tokens': 0, - 'total_tokens': 0 - } + aggregated_usage = {} for i, result in enumerate(results): drone_index = i + 1 @@ -438,19 +435,27 @@ async def _execute_parallel( # Drone succeeded successful_responses.append(result) - # Aggregate usage + # Aggregate usage - dynamically sum ALL numeric usage fields if hasattr(result, 'usage') and result.usage: usage = result.usage - aggregated_usage['prompt_tokens'] += getattr(usage, 'prompt_tokens', 0) - aggregated_usage['completion_tokens'] += getattr(usage, 'completion_tokens', 0) - aggregated_usage['total_tokens'] += getattr(usage, 'total_tokens', 0) - # Include other usage fields if present - for field in ['cached_tokens', 'reasoning_tokens']: - if hasattr(usage, field): - if field not in aggregated_usage: - aggregated_usage[field] = 0 - aggregated_usage[field] += getattr(usage, field, 0) + # Iterate through all attributes of the usage object + for attr_name in dir(usage): + # Skip private/magic attributes + if attr_name.startswith('_'): + continue + + try: + attr_value = getattr(usage, attr_name) + + # Only aggregate numeric fields (int or float) + if isinstance(attr_value, (int, float)) and not isinstance(attr_value, bool): + if attr_name not in aggregated_usage: + aggregated_usage[attr_name] = 0 + aggregated_usage[attr_name] += attr_value + except (AttributeError, TypeError): + # Skip 
non-accessible or non-numeric attributes + continue lib_logger.debug( f"[HiveMind] Drone {drone_index}/{len(drones)} completed successfully" @@ -717,23 +722,27 @@ async def _call_arbiter( **arbiter_params ) - # Extract usage - arbiter_usage = { - 'prompt_tokens': 0, - 'completion_tokens': 0, - 'total_tokens': 0 - } + # Extract usage - dynamically capture ALL numeric usage fields + arbiter_usage = {} if hasattr(arbiter_response, 'usage') and arbiter_response.usage: usage = arbiter_response.usage - arbiter_usage['prompt_tokens'] = getattr(usage, 'prompt_tokens', 0) - arbiter_usage['completion_tokens'] = getattr(usage, 'completion_tokens', 0) - arbiter_usage['total_tokens'] = getattr(usage, 'total_tokens', 0) - # Include other fields - for field in ['cached_tokens', 'reasoning_tokens']: - if hasattr(usage, field): - arbiter_usage[field] = getattr(usage, field, 0) + # Iterate through all attributes of the usage object + for attr_name in dir(usage): + # Skip private/magic attributes + if attr_name.startswith('_'): + continue + + try: + attr_value = getattr(usage, attr_name) + + # Only capture numeric fields (int or float) + if isinstance(attr_value, (int, float)) and not isinstance(attr_value, bool): + arbiter_usage[attr_name] = attr_value + except (AttributeError, TypeError): + # Skip non-accessible or non-numeric attributes + continue lib_logger.info( f"[HiveMind] Arbiter completed. 
Tokens: {arbiter_usage['total_tokens']}" @@ -1126,6 +1135,9 @@ async def handle_request(self, request, **kwargs): specialists = config.get("specialists", []) is_streaming = kwargs.get("stream", False) + # Phase 6: Track execution start time + start_time = time.time() + lib_logger.info( f"[HiveMind] Processing Fusion request: {resolved_id} " f"({len(specialists)} specialists, streaming: {is_streaming})" @@ -1177,30 +1189,59 @@ async def handle_request(self, request, **kwargs): request ) - # Aggregate usage - total_usage = { - 'prompt_tokens': specialist_usage['prompt_tokens'] + arbiter_usage['prompt_tokens'], - 'completion_tokens': specialist_usage['completion_tokens'] + arbiter_usage['completion_tokens'], - 'total_tokens': specialist_usage['total_tokens'] + arbiter_usage['total_tokens'] + # Aggregate usage - dynamically sum ALL numeric fields from both sources + total_usage = {} + + # Merge usage dictionaries from specialists and arbiter + for usage_dict in [specialist_usage, arbiter_usage]: + for field, value in usage_dict.items(): + if field not in total_usage: + total_usage[field] = 0 + total_usage[field] += value + + # Phase 6: Calculate latency and cost + end_time = time.time() + latency_ms = (end_time - start_time) * 1000 + + # Try to calculate cost using litellm + total_cost = 0.0 + try: + total_cost = litellm.completion_cost(completion_response=arbiter_response) + except Exception as e: + lib_logger.debug(f"[HiveMind] Could not calculate cost: {e}") + + # Add hivemind_details to usage + hivemind_details = { + "mode": "fusion", + "specialist_count": len(specialists), + "specialist_tokens": specialist_usage.get('total_tokens', 0), + "arbiter_tokens": arbiter_usage.get('total_tokens', 0), + "total_cost_usd": round(total_cost, 6), + "latency_ms": round(latency_ms, 2) } - for field in ['cached_tokens', 'reasoning_tokens']: - if field in specialist_usage or field in arbiter_usage: - total_usage[field] = specialist_usage.get(field, 0) + arbiter_usage.get(field, 0) if 
hasattr(arbiter_response, 'usage'): - arbiter_response.usage.prompt_tokens = total_usage['prompt_tokens'] - arbiter_response.usage.completion_tokens = total_usage['completion_tokens'] - arbiter_response.usage.total_tokens = total_usage['total_tokens'] + # IMPORTANT: Standard usage fields contain the TOTAL aggregated usage + # (specialists + arbiter). This ensures consumers can parse usage normally. - for field in ['cached_tokens', 'reasoning_tokens']: - if field in total_usage: - setattr(arbiter_response.usage, field, total_usage[field]) + # Dynamically set ALL usage fields from total_usage + for field, value in total_usage.items(): + try: + setattr(arbiter_response.usage, field, value) + except (AttributeError, TypeError): + # Skip if field cannot be set + lib_logger.debug(f"[HiveMind] Could not set usage field '{field}'") + + # Add hivemind_details as SUPPLEMENTARY breakdown information + # This does NOT replace standard fields, but provides additional context + arbiter_response.usage.hivemind_details = hivemind_details lib_logger.info( f"[HiveMind] Fusion completed successfully. " f"Total usage: {total_usage['total_tokens']} tokens " - f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']})" + f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']}). 
" + f"Latency: {latency_ms:.2f}ms, Cost: ${total_cost:.6f}" ) return arbiter_response @@ -1211,6 +1252,9 @@ async def handle_request(self, request, **kwargs): count = config.get("count", 3) is_streaming = kwargs.get("stream", False) + # Phase 6: Track execution start time + start_time = time.time() + lib_logger.info( f"[HiveMind] Processing Swarm request: {resolved_id} " f"(base: {base_model}, {count} drones, streaming: {is_streaming})" @@ -1264,33 +1308,59 @@ async def handle_request(self, request, **kwargs): request ) - # Step 7: Aggregate total usage - total_usage = { - 'prompt_tokens': drone_usage['prompt_tokens'] + arbiter_usage['prompt_tokens'], - 'completion_tokens': drone_usage['completion_tokens'] + arbiter_usage['completion_tokens'], - 'total_tokens': drone_usage['total_tokens'] + arbiter_usage['total_tokens'] - } + # Step 7: Aggregate total usage - dynamically sum ALL numeric fields from both sources + total_usage = {} - # Include other fields if present - for field in ['cached_tokens', 'reasoning_tokens']: - if field in drone_usage or field in arbiter_usage: - total_usage[field] = drone_usage.get(field, 0) + arbiter_usage.get(field, 0) + # Helper function to merge usage dictionaries + for usage_dict in [drone_usage, arbiter_usage]: + for field, value in usage_dict.items(): + if field not in total_usage: + total_usage[field] = 0 + total_usage[field] += value + + # Phase 6: Calculate latency and cost + end_time = time.time() + latency_ms = (end_time - start_time) * 1000 + + # Try to calculate cost using litellm + total_cost = 0.0 + try: + total_cost = litellm.completion_cost(completion_response=arbiter_response) + except Exception as e: + lib_logger.debug(f"[HiveMind] Could not calculate cost: {e}") + + # Add hivemind_details to usage + hivemind_details = { + "mode": "swarm", + "drone_count": count, + "drone_tokens": drone_usage['total_tokens'], + "arbiter_tokens": arbiter_usage['total_tokens'], + "total_cost_usd": round(total_cost, 6), + "latency_ms": 
round(latency_ms, 2) + } # Step 8: Update arbiter response with aggregated usage if hasattr(arbiter_response, 'usage'): - # Create a new usage object with aggregated values - arbiter_response.usage.prompt_tokens = total_usage['prompt_tokens'] - arbiter_response.usage.completion_tokens = total_usage['completion_tokens'] - arbiter_response.usage.total_tokens = total_usage['total_tokens'] + # IMPORTANT: Standard usage fields contain the TOTAL aggregated usage + # (drones + arbiter). This ensures consumers can parse usage normally. + + # Dynamically set ALL usage fields from total_usage + for field, value in total_usage.items(): + try: + setattr(arbiter_response.usage, field, value) + except (AttributeError, TypeError): + # Skip if field cannot be set + lib_logger.debug(f"[HiveMind] Could not set usage field '{field}'") - for field in ['cached_tokens', 'reasoning_tokens']: - if field in total_usage: - setattr(arbiter_response.usage, field, total_usage[field]) + # Add hivemind_details as SUPPLEMENTARY breakdown information + # This does NOT replace standard fields, but provides additional context + arbiter_response.usage.hivemind_details = hivemind_details lib_logger.info( f"[HiveMind] Swarm completed successfully. " f"Total usage: {total_usage['total_tokens']} tokens " - f"(Drones: {drone_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']})" + f"(Drones: {drone_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']}). " + f"Latency: {latency_ms:.2f}ms, Cost: ${total_cost:.6f}" ) return arbiter_response From 865f7cf0b2cbd19bd0c84e8618ceaa48cf65e551 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 15:29:28 +0100 Subject: [PATCH 20/33] feat(ensemble): add specialist weight descriptions and embed expertise context for arbiter Add support for specialist weight descriptions and include specialist expertise context when building arbiter prompts. 
- Extract `weight_description` from specialist configs and attach it to model params as `_specialist_weight_description`. - Extend `EnsembleManager.build_arbiter_messages` to accept optional `specialist_metadata` and append a "SPECIALIST EXPERTISE" block to the strategy prompt that lists each specialist's role, model, and weight description; log debug info when added. - Update dev-team fusion config to include `weight_description` entries for existing specialists. No breaking changes. --- src/rotator_library/ensemble/manager.py | 30 +++++++++++++++++-- .../ensemble_configs/fusions/dev-team.json | 9 ++++-- 2 files changed, 34 insertions(+), 5 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 02f932fd..a8d4f99b 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -329,7 +329,7 @@ def _prepare_fusion_models( request_params: Original request parameters Returns: - List of specialist model configurations + List of specialist model configurations with metadata """ specialists = config.get("specialists", []) models = [] @@ -342,6 +342,8 @@ def _prepare_fusion_models( specialist_role = specialist.get("role", f"Specialist {specialist_num}") specialist_prompt = specialist.get("system_prompt", "") specialist_weight = specialist.get("weight", 1.0) + # MISSING FEATURE FIX: Extract weight description for arbiter context + specialist_weight_desc = specialist.get("weight_description", "") if not specialist_model: lib_logger.warning( @@ -368,6 +370,7 @@ def _prepare_fusion_models( model_params["_specialist_index"] = specialist_num model_params["_specialist_role"] = specialist_role model_params["_specialist_weight"] = specialist_weight + model_params["_specialist_weight_description"] = specialist_weight_desc model_params["_total_specialists"] = len(specialists) models.append(model_params) @@ -574,18 +577,21 @@ def _build_arbiter_prompt( self, formatted_responses: str, 
config: Dict[str, Any], - original_messages: List[Dict[str, str]] + original_messages: List[Dict[str, str]], + specialist_metadata: Optional[List[Dict[str, Any]]] = None ) -> List[Dict[str, str]]: """ Build the complete prompt for the arbiter model. Loads the strategy template and constructs the message array. Phase 6: Adds recursive mode instructions for autonomous decision-making. + MISSING FEATURE FIX: Adds specialist expertise context with weights for fusion mode. Args: formatted_responses: Formatted drone/specialist responses config: Swarm or fusion configuration original_messages: Original user messages + specialist_metadata: Optional metadata about specialists (for fusion mode) Returns: Complete messages array for arbiter @@ -607,6 +613,26 @@ def _build_arbiter_prompt( # Replace {responses} placeholder strategy_prompt = strategy_template.replace("{responses}", formatted_responses) + # MISSING FEATURE FIX: Add specialist expertise context for fusion mode + if specialist_metadata: + expertise_lines = ["\n\nSPECIALIST EXPERTISE:"] + expertise_lines.append("You are synthesizing responses from specialists with the following expertise:\n") + + for spec in specialist_metadata: + role = spec.get('role', 'Unknown') + model = spec.get('model', 'Unknown') + weight_desc = spec.get('weight_description', '') + + if weight_desc: + expertise_lines.append(f"- {role} ({model}): {weight_desc}") + else: + expertise_lines.append(f"- {role} ({model}): Subject matter expert") + + expertise_lines.append("\nConsider each specialist's domain expertise when synthesizing your response.") + strategy_prompt += "\n".join(expertise_lines) + + lib_logger.debug(f"[HiveMind] Added specialist expertise context for {len(specialist_metadata)} specialists") + # Phase 6: Add recursive mode instructions if enabled recursive_config = config.get("recursive_mode", {}) if recursive_config.get("enabled", False): diff --git a/src/rotator_library/ensemble_configs/fusions/dev-team.json 
b/src/rotator_library/ensemble_configs/fusions/dev-team.json index c8329e1a..df1a3e99 100644 --- a/src/rotator_library/ensemble_configs/fusions/dev-team.json +++ b/src/rotator_library/ensemble_configs/fusions/dev-team.json @@ -6,19 +6,22 @@ "model": "gpt-4o", "role": "Architect", "system_prompt": "You are a Software Architect. Focus on architectural patterns, scalability, and system design.", - "weight": 1.5 + "weight": 1.5, + "weight_description": "Expert in system design and scalability. Trust for architectural decisions and structural integrity." }, { "model": "claude-3-opus", "role": "Security Specialist", "system_prompt": "You are a Security Expert. Focus on security vulnerabilities, edge cases, and potential exploits.", - "weight": 1.2 + "weight": 1.2, + "weight_description": "Expert in security and vulnerability assessment. Trust for identifying security flaws and attack vectors." }, { "model": "gemini-1.5-pro", "role": "Code Reviewer", "system_prompt": "You are a Code Quality Expert. Focus on code quality, performance, and best practices.", - "weight": 1.0 + "weight": 1.0, + "weight_description": "Expert in code quality and performance optimization. Trust for maintainability and efficiency concerns." } ], "arbiter": { From 60243f51a6fcda9835a3fcb93fcfa9b985410f83 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 15:46:27 +0100 Subject: [PATCH 21/33] feat(ensemble): extract specialist metadata for arbiter and return alongside formatted responses Collect specialist metadata (role, model, weight_description) during arbiter formatting and return it together with the formatted text so arbiter prompts can include specialist expertise context in fusion mode. - _format_for_arbiter now accumulates arbiter_metadata for each specialist when specialist_metadata is provided. 
- Returns a tuple (formatted_text, metadata_for_arbiter) where metadata_for_arbiter is None for swarm mode or a list of dicts for fusion mode. - Call sites are updated to unpack the tuple and forward specialist metadata into _build_arbiter_prompt. BREAKING CHANGE: _format_for_arbiter previously returned a plain formatted string; it now returns a tuple (formatted_text, metadata_for_arbiter). Update any external callers to unpack the returned tuple and, if needed, pass the metadata as the specialist_metadata argument to _build_arbiter_prompt. --- src/rotator_library/ensemble/manager.py | 37 ++++++++++++++++++------- 1 file changed, 27 insertions(+), 10 deletions(-) diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index a8d4f99b..fdb7f54d 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -488,13 +488,14 @@ def _format_for_arbiter( responses: List[Any], config: Dict[str, Any], specialist_metadata: Optional[List[Dict[str, Any]]] = None - ) -> str: + ) -> tuple: """ Format drone/specialist responses for arbiter consumption. Creates a structured text format with numbered responses. Phase 4: Implements Blind Switch to strip model names. Phase 5: Adds role labels for fusion specialists. + MISSING FEATURE FIX: Extracts specialist metadata for arbiter context. 
Args: responses: List of successful drone/specialist responses @@ -502,7 +503,8 @@ def _format_for_arbiter( specialist_metadata: Optional list of specialist metadata (for fusion mode) Returns: - Formatted text string for arbiter + Tuple of (formatted_text, metadata_for_arbiter) + metadata_for_arbiter is None for swarm mode, list of dicts for fusion mode """ lib_logger.debug(f"[HiveMind] Formatting {len(responses)} responses for arbiter") @@ -511,6 +513,7 @@ def _format_for_arbiter( blind_mode = arbiter_config.get("blind", True) # Default ON formatted_parts = [] + arbiter_metadata = [] # MISSING FEATURE FIX: Collect metadata for arbiter for i, response in enumerate(responses): response_num = i + 1 @@ -538,13 +541,21 @@ def _format_for_arbiter( if specialist_metadata and i < len(specialist_metadata): specialist = specialist_metadata[i] role = specialist.get("_specialist_role", "Unknown") + model_name = specialist.get("model", "unknown") + weight_desc = specialist.get("_specialist_weight_description", "") + + # MISSING FEATURE FIX: Build metadata for arbiter context + arbiter_metadata.append({ + "role": role, + "model": model_name, + "weight_description": weight_desc + }) if blind_mode: # Blind mode: show role but not model label = f"{role}" else: # Non-blind: show role and model - model_name = specialist.get("model", "unknown") label = f"{role} ({model_name})" lib_logger.debug( @@ -571,7 +582,10 @@ def _format_for_arbiter( f"({len(formatted_text)} characters total, blind_mode={blind_mode})" ) - return formatted_text + # Return metadata only if fusion mode + metadata_for_arbiter = arbiter_metadata if arbiter_metadata else None + + return formatted_text, metadata_for_arbiter def _build_arbiter_prompt( self, @@ -1072,19 +1086,20 @@ async def _handle_fusion_streaming( specialist_models, request ) - # Format responses with role labels - formatted_responses = self._format_for_arbiter( + # Format responses with role labels and extract metadata + formatted_responses, 
specialist_metadata_for_arbiter = self._format_for_arbiter( specialist_responses, config, specialist_metadata=specialist_models ) - # Build arbiter prompt + # Build arbiter prompt with specialist expertise context original_messages = kwargs.get("messages", []) arbiter_messages = self._build_arbiter_prompt( formatted_responses, config, - original_messages + original_messages, + specialist_metadata=specialist_metadata_for_arbiter # MISSING FEATURE FIX: Pass metadata ) # Get arbiter model @@ -1188,7 +1203,8 @@ async def handle_request(self, request, **kwargs): specialist_models, request ) - formatted_responses = self._format_for_arbiter( + # Format responses and extract metadata for arbiter + formatted_responses, specialist_metadata_for_arbiter = self._format_for_arbiter( specialist_responses, config, specialist_metadata=specialist_models @@ -1198,7 +1214,8 @@ async def handle_request(self, request, **kwargs): arbiter_messages = self._build_arbiter_prompt( formatted_responses, config, - original_messages + original_messages, + specialist_metadata=specialist_metadata_for_arbiter # MISSING FEATURE FIX: Pass metadata ) arbiter_config = config.get("arbiter", {}) From 55a94f808da878f029804b435ff84a2b72cc3abe Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 15:57:59 +0100 Subject: [PATCH 22/33] =?UTF-8?q?fix(rotator):=20=F0=9F=90=9B=20include=20?= =?UTF-8?q?HiveMind=20fusion=20models=20in=20available=20models=20listing?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previously get_all_available_models omitted HiveMind fusion models from the returned provider list. When an ensemble_manager is available, the method now fetches fusion IDs via ensemble_manager.config_loader.get_all_fusion_ids(), adds them under the "hivemind_fusion" provider key, and logs the number of fusion models added. 
--- src/rotator_library/client.py | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/src/rotator_library/client.py b/src/rotator_library/client.py index f4af8a48..acd21520 100644 --- a/src/rotator_library/client.py +++ b/src/rotator_library/client.py @@ -1774,7 +1774,9 @@ async def get_available_models(self, provider: str) -> List[str]: async def get_all_available_models( self, grouped: bool = True ) -> Union[Dict[str, List[str]], List[str]]: - """Returns a list of all available models, either grouped by provider or as a flat list.""" + """Returns a list of all available models, either grouped by provider or as a flat list. + + MISSING FEATURE FIX: Now includes HiveMind fusion models.""" lib_logger.info("Getting all available models...") all_providers = list(self.all_credentials.keys()) @@ -1791,6 +1793,13 @@ async def get_all_available_models( else: all_provider_models[provider] = result + # MISSING FEATURE FIX: Add HiveMind fusion models + if self.ensemble_manager: + fusion_ids = self.ensemble_manager.config_loader.get_all_fusion_ids() + if fusion_ids: + all_provider_models["hivemind_fusion"] = fusion_ids + lib_logger.info(f"Added {len(fusion_ids)} HiveMind fusion models") + lib_logger.info("Finished getting all available models.") if grouped: return all_provider_models From 4b0a0bfb0b1e77df5429cbae8961a17df7c572f0 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 16:20:37 +0100 Subject: [PATCH 23/33] =?UTF-8?q?docs(hivemind):=20=F0=9F=93=9A=20add=20Hi?= =?UTF-8?q?veMind=20API=20and=20user=20guide,=20update=20task=20checklist?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add comprehensive HiveMind_API.md covering EnsembleManager, ConfigLoader, response usage (standard + hivemind_details), configuration schemas (swarms/fusions/strategies), error handling, logging, advanced usage, migration, performance, and limitations - Add 
HiveMind_User_Guide.md with quick start, swarm and fusion workflows, configuration examples, arbitration strategies, streaming support, usage & cost tracking, best practices, and troubleshooting - Update docs/HiveMind Task.md to reflect implementation progress: - fusion: `_prepare_fusion_models`, role assignment, partial arbiter context/weights parsing - recursion: single-call autonomous mode, consensus/conflict parsing, `_trigger_round_2()` replaced by Autonomous Decision Protocol - polish: partial failure handling, infinite recursion prevention, latency/token logging, and rate limit mitigation marked implemented --- docs/HiveMind Task.md | 30 +-- docs/HiveMind_API.md | 484 ++++++++++++++++++++++++++++++++++++ docs/HiveMind_User_Guide.md | 389 +++++++++++++++++++++++++++++ 3 files changed, 888 insertions(+), 15 deletions(-) create mode 100644 docs/HiveMind_API.md create mode 100644 docs/HiveMind_User_Guide.md diff --git a/docs/HiveMind Task.md b/docs/HiveMind Task.md index 53ce1df0..0da1790a 100644 --- a/docs/HiveMind Task.md +++ b/docs/HiveMind Task.md @@ -45,22 +45,22 @@ - [ ] Add logging for scores ## Phase 4: Fusion Mode -- [ ] Implement Fusion Features - - [ ] `_prepare_models()` - multi-model setup - - [ ] Role assignment and prompts - - [ ] Role context for Arbiter - - [ ] Weight system (future) +- [/] Implement Fusion Features + - [x] `_prepare_models()` - multi-model setup (implemented as `_prepare_fusion_models`) + - [x] Role assignment and prompts + - [/] Role context for Arbiter (Labels implemented, but explicit expertise context block missing) + - [/] Weight system (Weights parsed but not used in arbiter context) - [ ] Testing - [ ] Test 2-model fusion - [ ] Test role context injection - [ ] Test specialist descriptions ## Phase 5: Recursive/Reflective Mode -- [ ] Implement Recursion - - [ ] Consensus check logic - - [ ] Conflict extraction - - [ ] `_trigger_round_2()` implementation - - [ ] Max rounds enforcement +- [x] Implement Recursion 
(Single-Call Autonomous Mode) + - [x] Consensus check logic (via Prompt & Stream Parsing) + - [x] Conflict extraction (via Stream Parsing) + - [x] `_trigger_round_2()` implementation (Replaced by Autonomous Decision Protocol) + - [x] Max rounds enforcement (N/A for Single Call) - [ ] Testing - [ ] Test low-confidence trigger - [ ] Test Round 2 critique @@ -68,13 +68,13 @@ ## Phase 6: Polish & Edge Cases - [ ] Error Handling - - [ ] Partial failure handling + - [x] Partial failure handling - [ ] Arbiter failure fallback - - [ ] Infinite recursion prevention + - [x] Infinite recursion prevention (N/A) - [ ] Performance - - [ ] Latency logging - - [ ] Token usage tracking - - [ ] Rate limit mitigation + - [x] Latency logging + - [x] Token usage tracking + - [x] Rate limit mitigation (Inherited from RotatingClient) - [ ] Documentation - [ ] User guide - [ ] Example configs diff --git a/docs/HiveMind_API.md b/docs/HiveMind_API.md new file mode 100644 index 00000000..e5c45ad8 --- /dev/null +++ b/docs/HiveMind_API.md @@ -0,0 +1,484 @@ +# HiveMind API Reference + +## EnsembleManager + +Main class for orchestrating HiveMind requests. + +### `__init__(rotating_client, config_dir=None)` + +Initialize the ensemble manager. + +**Parameters:** +- `rotating_client` (RotatingClient): Reference to the RotatingClient instance +- `config_dir` (str, optional): Path to ensemble_configs directory. Defaults to `src/rotator_library/ensemble_configs` + +**Example:** +```python +client = RotatingClient() +# EnsembleManager is automatically initialized +manager = client.ensemble_manager +``` + +### `is_ensemble(model_id: str) -> bool` + +Check if a model ID represents an ensemble request. 
+ +**Parameters:** +- `model_id` (str): Full model ID from user request + +**Returns:** +- `bool`: True if ensemble (swarm or fusion), False otherwise + +**Example:** +```python +manager.is_ensemble("gpt-4o[swarm]") # True +manager.is_ensemble("dev-team") # True +manager.is_ensemble("gpt-4o") # False +``` + +### `get_base_model(swarm_id: str) -> str` + +Extract base model name from swarm ID. + +**Parameters:** +- `swarm_id` (str): Swarm model ID (e.g., "gemini-1.5-flash[swarm]") + +**Returns:** +- `str`: Base model name (e.g., "gemini-1.5-flash") + +**Example:** +```python +base = manager.get_base_model("gpt-4o[swarm]") # "gpt-4o" +``` + +### `get_fusion_ids() -> List[str]` + +Get list of all configured fusion IDs. + +**Returns:** +- `List[str]`: List of fusion identifiers + +**Example:** +```python +fusion_ids = manager.get_fusion_ids() # ["dev-team", "creative-writers"] +``` + +### `handle_request(request, **kwargs) -> Response | AsyncGenerator` + +Main entry point for ensemble execution. + +**Parameters:** +- `request`: Original request object +- `**kwargs`: Request parameters (model, messages, stream, etc.) + +**Returns:** +- `Response`: Complete response (if stream=False) +- `AsyncGenerator`: Streaming response generator (if stream=True) + +**Example:** +```python +# Non-streaming +response = await client.acompletion( + model="gpt-4o[swarm]", + messages=[{"role": "user", "content": "Test"}], + stream=False +) + +# Streaming +async for chunk in client.acompletion( + model="gpt-4o[swarm]", + messages=[{"role": "user", "content": "Test"}], + stream=True +): + print(chunk) +``` + +--- + +## ConfigLoader + +Manages configuration loading for ensemble modes. + +### `load_all() -> None` + +Load all configurations from directory structure. + +**Side Effects:** +- Populates `swarm_default`, `swarm_configs`, `fusion_configs`, `strategies` + +### `get_swarm_config(model: str) -> Dict[str, Any]` + +Get swarm configuration for a specific model. 
+ +**Parameters:** +- `model` (str): Base model name (without [swarm] suffix) + +**Returns:** +- `Dict[str, Any]`: Merged configuration (default + model-specific) + +### `get_fusion_config(fusion_id: str) -> Optional[Dict[str, Any]]` + +Get fusion configuration by ID. + +**Parameters:** +- `fusion_id` (str): Fusion identifier + +**Returns:** +- `Dict[str, Any]` | `None`: Fusion configuration or None if not found + +### `get_strategy(strategy_name: str) -> Optional[str]` + +Get strategy template by name. + +**Parameters:** +- `strategy_name` (str): Strategy identifier + +**Returns:** +- `str` | `None`: Strategy template or None if not found + +### `get_all_fusion_ids() -> List[str]` + +Get list of all fusion IDs. + +**Returns:** +- `List[str]`: List of fusion identifiers + +--- + +## Response Object + +HiveMind responses follow the standard OpenAI response format with additional usage details. + +### `Response.usage` + +Usage statistics for the request. + +**Standard Fields (OpenAI-Compatible):** + +These fields contain the **complete aggregated totals** from all models (drones/specialists + arbiter). They are fully compatible with existing tooling and billing systems. + +- `prompt_tokens` (int): **Total** prompt tokens from all models +- `completion_tokens` (int): **Total** completion tokens from all models +- `total_tokens` (int): **Total** tokens (sum of prompt + completion) +- `cached_tokens` (int, optional): **Total** cached tokens if supported +- `reasoning_tokens` (int, optional): **Total** reasoning tokens if supported + +**HiveMind-Specific Fields (Supplementary):** + +- `hivemind_details` (dict): **Breakdown information** for observability (does NOT replace standard fields) + +**Important**: Always use the standard fields for billing, quotas, and analytics. They contain the correct aggregated totals. The `hivemind_details` provides additional context for debugging and understanding HiveMind execution. 
+ +### `Response.usage.hivemind_details` + +Supplementary breakdown dictionary containing: + +**Common Fields:** +- `mode` (str): "swarm" or "fusion" +- `arbiter_tokens` (int): Tokens used by arbiter +- `total_cost_usd` (float): Estimated total cost in USD +- `latency_ms` (float): Total execution time in milliseconds + +**Swarm-Specific:** +- `drone_count` (int): Number of drones executed +- `drone_tokens` (int): Total tokens from all drones + +**Fusion-Specific:** +- `specialist_count` (int): Number of specialists executed +- `specialist_tokens` (int): Total tokens from all specialists + +**Example:** +```python +response = await client.acompletion(model="gpt-4o[swarm]", ...) + +# Standard fields contain TOTAL aggregated usage +usage = response.usage +print(f"Total tokens: {usage.total_tokens}") # e.g., 650 (drones 450 + arbiter 200) +print(f"Prompt tokens: {usage.prompt_tokens}") # e.g., 400 (all models combined) +print(f"Completion tokens: {usage.completion_tokens}") # e.g., 250 (all models combined) + +# Supplementary breakdown for observability +details = usage.hivemind_details +print(f"Mode: {details['mode']}") # "swarm" +print(f"Drone count: {details['drone_count']}") # 3 +print(f"Drone tokens: {details['drone_tokens']}") # 450 (breakdown) +print(f"Arbiter tokens: {details['arbiter_tokens']}") # 200 (breakdown) +print(f"Cost: ${details['total_cost_usd']}") # 0.00123 +print(f"Latency: {details['latency_ms']}ms") # 1523.45 + +# Note: drone_tokens + arbiter_tokens = total_tokens +# The standard usage fields are what billing systems should use +``` + +--- + +## Configuration Schema + +### Swarm Configuration + +**File Location:** `ensemble_configs/swarms/*.json` + +**Schema:** +```json +{ + "model": "string (optional, only for model-specific configs)", + "suffix": "string (default: '[swarm]')", + "count": "integer (default: 3)", + + "temperature_jitter": { + "enabled": "boolean", + "delta": "float (temperature variance)" + }, + + "arbiter": { + "model": "string 
('self' or model ID)", + "strategy": "string (strategy name)", + "blind": "boolean (default: true)" + }, + + "adversarial_config": { + "enabled": "boolean", + "count": "integer (number of adversarial drones)", + "prompt": "string (system prompt for adversarial drones)" + }, + + "recursive_mode": { + "enabled": "boolean", + "consensus_threshold": "integer (1-10 scale)" + } +} +``` + +### Fusion Configuration + +**File Location:** `ensemble_configs/fusions/*.json` + +**Schema:** +```json +{ + "id": "string (unique fusion identifier)", + "description": "string (optional)", + + "specialists": [ + { + "model": "string (model ID)", + "role": "string (specialist role name)", + "system_prompt": "string (role-specific instructions)", + "weight": "float (importance weight, default: 1.0)", + "weight_description": "string (optional, expertise summary shown to the arbiter)" + } + ], + + "arbiter": { + "model": "string (model ID)", + "strategy": "string (strategy name)", + "blind": "boolean (default: true)" + }, + + "recursive_mode": { + "enabled": "boolean", + "consensus_threshold": "integer (1-10 scale)" + } +} +``` + +### Strategy Template + +**File Location:** `ensemble_configs/strategies/*.txt` + +**Format:** +Plain text file with `{responses}` placeholder. + +**Example:** +``` +You are an expert synthesizer. Analyze the following responses and create a single, superior answer. + +{responses} + +Provide your synthesis as a complete, high-quality response. +``` + +--- + +## Error Handling + +### Common Exceptions + +**`ValueError`**: Invalid model ID or configuration +```python +try: + response = await client.acompletion(model="invalid-fusion", ...) +except ValueError as e: + print(f"Configuration error: {e}") +``` + +**`RuntimeError`**: All drones/specialists failed +```python +try: + response = await client.acompletion(model="gpt-4o[swarm]", ...)
+except RuntimeError as e: + print(f"Execution error: {e}") +``` + +### Partial Failures + +If some drones/specialists fail but at least one succeeds, HiveMind continues with successful responses and logs warnings. + +**Logs:** +``` +[ERROR] [HiveMind] Drone 2/3 failed: Rate limit exceeded +[WARNING] [HiveMind] 1/3 drones failed. Proceeding with 2 successful responses. +``` + +--- + +## Logging + +HiveMind uses the `rotator_library.ensemble` logger. + +**Log Levels:** +- `INFO`: Normal operations (processing, completion) +- `DEBUG`: Detailed execution (temperatures, prompts) +- `WARNING`: Low consensus, partial failures, conflicts +- `ERROR`: Drone failures, critical issues + +**Example Configuration:** +```python +import logging + +# Enable HiveMind debug logging +logging.getLogger("rotator_library.ensemble").setLevel(logging.DEBUG) + +# Example logs: +# [INFO] [HiveMind] Processing Swarm request: gpt-4o[swarm] (base: gpt-4o, 3 drones, streaming: False) +# [DEBUG] [HiveMind] Drone 1: temperature=0.82, adversarial=False +# [DEBUG] [HiveMind] Arbiter prompt built: 2 messages +# [INFO] [HiveMind] Swarm completed successfully. Total usage: 650 tokens. Latency: 1234.56ms, Cost: $0.001200 +``` + +--- + +## Advanced Usage + +### Custom Arbiter Models + +Use different arbiter models for different fusions: + +```json +{ + "id": "research-team", + "specialists": [...], + "arbiter": { + "model": "gpt-4o", // Use GPT-4o specifically + "strategy": "synthesis" + } +} +``` + +### Self-Arbiter + +Use the same model as arbiter (saves one API call): + +```json +{ + "arbiter": { + "model": "self", // Use base model as arbiter + "strategy": "best_of_n" + } +} +``` + +### Multiple Strategies + +Create task-specific strategies: + +**`ensemble_configs/strategies/math_solver.txt`:** +``` +You are a mathematics expert. Review these solutions: + +{responses} + +Identify the correct approach, verify calculations, and provide the final answer with step-by-step explanation. 
+``` + +Usage: +```json +{ + "arbiter": { + "strategy": "math_solver" + } +} +``` + +--- + +## Migration Guide + +### From Single Model to Swarm + +**Before:** +```python +response = await client.acompletion( + model="gpt-4o-mini", + messages=[{"role": "user", "content": "Explain AI"}] +) +``` + +**After:** +```python +response = await client.acompletion( + model="gpt-4o-mini[swarm]", # Add [swarm] suffix + messages=[{"role": "user", "content": "Explain AI"}] +) +``` + +### From Multiple Calls to Fusion + +**Before:** +```python +arch_response = await client.acompletion(model="gpt-4o", ...) +sec_response = await client.acompletion(model="claude-3-opus", ...) +# Manually combine responses +``` + +**After:** +Create fusion config, then: +```python +response = await client.acompletion( + model="dev-team", # All in one call + messages=[...] +) +``` + +--- + +## Performance Metrics + +Typical latencies (3 drones/specialists, non-streaming): + +| Model Type | Drones/Specialists | Avg Latency | +|------------|-------------------|-------------| +| gpt-4o-mini[swarm] | 3 | 1.2-2.0s | +| gpt-4o[swarm] | 3 | 2.0-3.5s | +| dev-team (fusion) | 3 | 2.5-4.0s | + +**Note**: Streaming reduces perceived latency as arbiter output begins immediately after drone/specialist completion. + +--- + +## Limitations + +1. **Cost**: Multiple API calls increase costs proportionally +2. **Rate Limits**: May hit rate limits faster with parallel calls +3. **Latency**: Total time = max(drone time) + arbiter time +4. **Model Availability**: All models must be available simultaneously +5. 
**Token Limits**: Large responses may exceed context windows + +--- + +## Support + +For issues, questions, or feature requests: +- Check logs (`rotator_library.ensemble`) +- Review configuration files +- Verify API keys and model availability +- See [User Guide](./HiveMind_User_Guide.md) for common patterns diff --git a/docs/HiveMind_User_Guide.md b/docs/HiveMind_User_Guide.md new file mode 100644 index 00000000..9668904c --- /dev/null +++ b/docs/HiveMind_User_Guide.md @@ -0,0 +1,389 @@ +# HiveMind User Guide + +## Overview + +**HiveMind** is a powerful ensemble feature that enables parallel model execution with intelligent arbitration. It supports two modes: + +- **Swarm Mode**: Multiple parallel calls to the **same model** (called "Drones") +- **Fusion Mode**: Multiple parallel calls to **different models** (called "Specialists") + +Both modes use an "Arbiter" model to synthesize the responses into a single, high-quality answer. + +--- + +## Quick Start + +### Swarm Mode + +Call the same model multiple times in parallel and synthesize results: + +```python +from rotator_library.client import RotatingClient + +client = RotatingClient() + +# Basic swarm - adds `[swarm]` suffix to any model +response = await client.acompletion( + model="gpt-4o-mini[swarm]", # 3 drones by default + messages=[{"role": "user", "content": "What is quantum computing?"}], + stream=False +) + +print(response.choices[0].message.content) +print(f"Total tokens: {response.usage.total_tokens}") +print(f"Drone count: {response.usage.hivemind_details['drone_count']}") +print(f"Cost: ${response.usage.hivemind_details['total_cost_usd']}") +``` + +### Fusion Mode + +Use multiple specialized models working together: + +```python +# dev-team fusion uses 3 specialist models +response = await client.acompletion( + model="dev-team", + messages=[{"role": "user", "content": "Review this function"}], + stream=False +) + +print(response.choices[0].message.content) +print(f"Specialists: 
{response.usage.hivemind_details['specialist_count']}") +``` + +--- + +## Swarm Mode + +### How It Works + +1. **Preparation**: Creates N copies of your request (N drones) +2. **Execution**: Runs all drones in parallel +3. **Arbitration**: An arbiter model synthesizes all responses +4. **Result**: Returns the arbiter's synthesis + +### Configuration + +Swarm behavior is configured in `src/rotator_library/ensemble_configs/swarms/`: + +**`default.json`** - Global swarm settings: +```json +{ + "suffix": "[swarm]", + "count": 3, + "temperature_jitter": { + "enabled": true, + "delta": 0.2 + }, + "arb iter": { + "model": "self", + "strategy": "synthesis", + "blind": true + }, + "adversarial_config": { + "enabled": false, + "count": 1, + "prompt": "You are a critical reviewer..." + }, + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7 + } +} +``` + +**Model-specific configs** (e.g., `gemini-flash.json`): +```json +{ + "model": "gemini-1.5-flash", + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis" + } +} +``` + +### Advanced Features + +#### Temperature Jitter + +Introduces randomness to increase response diversity: + +```json +"temperature_jitter": { + "enabled": true, + "delta": 0.2 // ±0.2 variance +} +``` + +Each drone gets a slightly different temperature: `base_temp ± delta` + +#### Adversarial Mode + +Adds critical drones to stress-test solutions: + +```json +"adversarial_config": { + "enabled": true, + "count": 1, + "prompt": "You are a Senior Principal Engineer with 15+ years of experience. Your job is to find flaws, edge cases, and potential issues." 
+} +``` + +#### Blind Switch + +Removes model names from arbiter input (enabled by default): + +```json +"arbiter": { + "blind": true // Arbiter sees "Response 1", not "Response 1 (GPT-4o)" +} +``` + +#### Recursive Mode + +Enables autonomous arbiter decision-making for low-consensus scenarios: + +```json +"recursive_mode": { + "enabled": true, + "consensus_threshold": 7 // If consensus < 7/10, arbiter performs internal critique +} +``` + +--- + +## Fusion Mode + +### How It Works + +1. **Preparation**: Assigns role-specific prompts to each specialist +2. **Execution**: Runs all specialists in parallel +3. **Arbitration**: Arbiter synthesizes with role context +4. **Result**: Returns the arbiter's synthesis + +### Configuration + +Fusion models are configured in `src/rotator_library/ensemble_configs/fusions/`: + +**`dev-team.json`** - Example fusion: +```json +{ + "id": "dev-team", + "description": "Software development team with specialized roles", + "specialists": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt": "Focus on architectural patterns, scalability, and system design.", + "weight": 1.5 + }, + { + "model": "claude-3-opus", + "role": "Security Specialist", + "system_prompt": "Focus on security vulnerabilities and potential exploits.", + "weight": 1.0 + }, + { + "model": "gemini-1.5-pro", + "role": "Code Reviewer", + "system_prompt": "Focus on code quality, performance, and best practices.", + "weight": 1.0 + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true + } +} +``` + +### Creating Custom Fusions + +1. Create a new JSON file in `ensemble_configs/fusions/` +2. Define specialists with roles and prompts +3. Choose an arbiter model and strategy +4. 
Use the fusion ID as the model name + +Example: `creative-writers.json`: +```json +{ + "id": "creative-writers", + "description": "Creative writing team", + "specialists": [ + { + "model": "claude-3-opus", + "role": "Storyteller", + "system_prompt": "Focus on narrative, character development, and plot.", + "weight": 1.5 + }, + { + "model": "gpt-4o", + "role": "Editor", + "system_prompt": "Focus on clarity, grammar, and style.", + "weight": 1.0 + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis" + } +} +``` + +Usage: +```python +response = await client.acompletion( + model="creative-writers", + messages=[{"role": "user", "content": "Write a short story about AI"}] +) +``` + +--- + +## Arbitration Strategies + +Strategies are text prompts in `ensemble_configs/strategies/`: + +**`synthesis.txt`** - Combine all responses: +``` +You are an expert synthesizer. Analyze the following responses and create a single, superior answer that: +1. Combines the best elements from each response +2. Resolves any conflicts or contradictions +3. Ensures completeness and accuracy +4. Maintains coherence and clarity + +{responses} +``` + +**`best_of_n.txt`** - Select and refine the best: +``` +Review these responses and identify the strongest one. Then refine and enhance it. + +{responses} +``` + +**`code_review.txt`** - Code-specific evaluation: +``` +You are a senior code reviewer. Analyze these code responses and provide: +1. Best implementation approach +2. Security considerations +3. Performance optimization suggestions +4. Final recommended code + +{responses} +``` + +### Creating Custom Strategies + +Create a `.txt` file in `ensemble_configs/strategies/` with your prompt template. Use `{responses}` as a placeholder for the formatted responses. 
+ +--- + +## Streaming Support + +HiveMind respects the `stream` parameter: + +```python +# Streaming swarm +async for chunk in client.acompletion( + model="gpt-4o[swarm]", + messages=[{"role": "user", "content": "Explain AI"}], + stream=True # Stream arbiter's response +): + if hasattr(chunk.choices[0].delta, 'content') and chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end='', flush=True) +``` + +**Note**: Drones/specialists execute in parallel (not streamed). Only the arbiter's final synthesis is streamed. + +--- + +## Usage & Cost Tracking + +All HiveMind responses include detailed usage information in **standard OpenAI-compatible fields** plus additional HiveMind-specific breakdown: + +```python +response = await client.acompletion( + model="gpt-4o-mini[swarm]", + messages=[{"role": "user", "content": "Test"}] +) + +# ✅ STANDARD usage fields (compatible with all tooling) +# These contain the TOTAL aggregated usage (drones/specialists + arbiter) +print(f"Prompt tokens: {response.usage.prompt_tokens}") # Total from all models +print(f"Completion tokens: {response.usage.completion_tokens}") # Total from all models +print(f"Total tokens: {response.usage.total_tokens}") # Grand total + +# ✅ SUPPLEMENTARY HiveMind details (breakdown for observability) +# These provide additional context but do NOT replace standard fields +details = response.usage.hivemind_details +print(f"Mode: {details['mode']}") # "swarm" or "fusion" +print(f"Drone/Specialist count: {details.get('drone_count') or details.get('specialist_count')}") +print(f"Drone/Specialist tokens: {details.get('drone_tokens') or details.get('specialist_tokens')}") +print(f"Arbiter tokens: {details['arbiter_tokens']}") +print(f"Total cost: ${details['total_cost_usd']}") +print(f"Latency: {details['latency_ms']}ms") +``` + +**Important**: Consumers should use the standard usage fields (`prompt_tokens`, `completion_tokens`, `total_tokens`) for billing and analytics. 
These already include the complete totals. The `hivemind_details` field provides a breakdown for debugging and observability.
+
+---
+
+## Best Practices
+
+### Model Selection
+
+**Swarm Mode**:
+- Use for: Same model, different parameters (temperature jitter)
+- Best for: Brainstorming, diverse perspectives, consensus building
+- Models: Fast models (gpt-4o-mini, gemini-flash) for cost efficiency
+
+**Fusion Mode**:
+- Use for: Different models, specialized expertise
+- Best for: Complex tasks requiring multiple skill sets
+- Models: Mix strengths (GPT for reasoning, Claude for safety, Gemini for code)
+
+### Cost Optimization
+
+1. **Use smaller models for drones**: `gpt-4o-mini[swarm]` instead of `gpt-4o[swarm]`
+2. **Limit drone count**: Default is 3, but 2 is often sufficient
+3. **Use "self" arbiter**: Saves one API call
+4. **Monitor `hivemind_details`**: Track costs per request
+
+### Performance Tips
+
+1. **Parallel execution is fast**: All drones/specialists run simultaneously
+2. **Streaming reduces perceived latency**: Users see output immediately
+3. **Check latency_ms**: Identify slow requests
+
+---
+
+## Troubleshooting
+
+### No ensemble detected
+
+**Problem**: Model isn't recognized as ensemble
+**Solution**: Check spelling, ensure `[swarm]` suffix or fusion ID exists
+
+### All drones failed
+
+**Problem**: All parallel calls failed
+**Solution**: Check API keys, rate limits, model availability
+
+### High costs
+
+**Problem**: HiveMind is expensive
+**Solution**: Reduce drone count, use smaller models, limit to critical requests
+
+### Poor synthesis quality
+
+**Problem**: Arbiter output isn't good
+**Solution**: Use a better arbiter model (gpt-4o, claude-3-opus), try different strategy
+
+---
+
+## API Reference
+
+See [API.md](./API.md) for detailed API documentation.
From 6c9f2787797ad7466c339c8e887633abe1452965 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 17:36:15 +0100 Subject: [PATCH 24/33] feat(ensemble): add preset-based hivemind swarm model discovery and handling - add ConfigLoader.get_all_swarm_model_ids to discover swarm model variants from preset JSONs (generates IDs like {base_model}-{preset_id}[swarm]) - include discovered hivemind_swarm entries in RotatingClient's available models listing and add informative logging - update EnsembleManager to detect `[swarm]` suffix and extract base model by recognizing preset IDs (checks swarms_dir/{preset_id}.json) - update default swarm config to include `id` and `base_models` for preset-based discovery BREAKING CHANGE: swarm suffix customization via `swarm_default.suffix` is no longer honored. Swarm model IDs must use the `[swarm]` suffix and swarm preset configs must include an `id` and `base_models` array. Update existing swarm configs and any code that relied on a custom swarm suffix. 
--- src/rotator_library/client.py | 6 +++ src/rotator_library/ensemble/config_loader.py | 39 +++++++++++++++++ src/rotator_library/ensemble/manager.py | 43 +++++++++++-------- .../ensemble_configs/swarms/default.json | 11 ++++- 4 files changed, 81 insertions(+), 18 deletions(-) diff --git a/src/rotator_library/client.py b/src/rotator_library/client.py index acd21520..7834cce7 100644 --- a/src/rotator_library/client.py +++ b/src/rotator_library/client.py @@ -1799,6 +1799,12 @@ async def get_all_available_models( if fusion_ids: all_provider_models["hivemind_fusion"] = fusion_ids lib_logger.info(f"Added {len(fusion_ids)} HiveMind fusion models") + + # Add HiveMind swarm models + swarm_models = self.ensemble_manager.config_loader.get_all_swarm_model_ids() + if swarm_models: + all_provider_models["hivemind_swarm"] = swarm_models + lib_logger.info(f"Added {len(swarm_models)} HiveMind swarm model variants") lib_logger.info("Finished getting all available models.") if grouped: diff --git a/src/rotator_library/ensemble/config_loader.py b/src/rotator_library/ensemble/config_loader.py index 0c912eda..7d1243ac 100644 --- a/src/rotator_library/ensemble/config_loader.py +++ b/src/rotator_library/ensemble/config_loader.py @@ -208,3 +208,42 @@ def get_strategy(self, strategy_name: str) -> Optional[str]: def get_all_fusion_ids(self) -> List[str]: """Get list of all fusion IDs.""" return list(self.fusion_configs.keys()) + + def get_all_swarm_model_ids(self) -> List[str]: + """ + Get all discoverable swarm model variants. + + Generates model IDs from all swarm configs that define base_models. 
+ Format: {base_model}-{preset_id}[swarm] + + Returns: + List of swarm model IDs + """ + swarm_models = [] + + for config_file in self.swarms_dir.glob("*.json"): + try: + with open(config_file, 'r', encoding='utf-8') as f: + config = json.load(f) + + preset_id = config.get("id") + base_models = config.get("base_models", []) + + if not preset_id: + lib_logger.debug(f"Swarm config {config_file.name} missing 'id', skipping") + continue + + if not base_models: + lib_logger.debug(f"Swarm config {preset_id} has no base_models, not discoverable") + continue + + # Generate model IDs: {base_model}-{preset_id}[swarm] + for base_model in base_models: + model_id = f"{base_model}-{preset_id}[swarm]" + swarm_models.append(model_id) + + except Exception as e: + lib_logger.warning(f"Failed to process swarm config {config_file.name}: {e}") + + lib_logger.info(f"Discovered {len(swarm_models)} swarm model variants") + return swarm_models diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index fdb7f54d..686e2bd9 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -97,38 +97,47 @@ def _is_swarm_request(self, model_id: str) -> bool: """ Check if model ID contains swarm suffix. + Supports new preset-based format: {base_model}-{preset_id}[swarm] + Args: model_id: Model ID to check Returns: True if this is a swarm request """ - # Get default suffix from config - default_suffix = "[swarm]" - if self.config_loader.swarm_default: - default_suffix = self.config_loader.swarm_default.get("suffix", "[swarm]") - - return default_suffix in model_id + return model_id.endswith("[swarm]") def get_base_model(self, swarm_id: str) -> str: """ Extract base model name from swarm ID. 
+ Supports new format: {base_model}-{preset_id}[swarm] + Args: - swarm_id: Swarm model ID (e.g., "gemini-1.5-flash[swarm]") + swarm_id: Swarm model ID (e.g., "gpt-4o-default[swarm]") Returns: - Base model name (e.g., "gemini-1.5-flash") + Base model name (e.g., "gpt-4o") """ - # Get suffix from config - default_suffix = "[swarm]" - if self.config_loader.swarm_default: - default_suffix = self.config_loader.swarm_default.get("suffix", "[swarm]") - - # Remove suffix - if default_suffix in swarm_id: - return swarm_id.replace(default_suffix, "") - + # Remove [swarm] suffix first + if swarm_id.endswith("[swarm]"): + swarm_id = swarm_id[:-7] # Remove "[swarm]" + + # Parse: {base_model}-{preset_id} + # preset_id is the last segment after the last hyphen + if "-" in swarm_id: + # Split and check if last segment is a preset ID + parts = swarm_id.rsplit("-", 1) + potential_preset = parts[1] + + # Check if it's a valid preset ID in our configs + # For now, just check if the directory has that config file + config_file = self.config_loader.swarms_dir / f"{potential_preset}.json" + if config_file.exists(): + # This is a preset ID, so base_model is everything before it + return parts[0] + + # If no preset found or no hyphen, treat entire thing as base_model return swarm_id def resolve_conflicts(self, ensemble_id: str) -> str: diff --git a/src/rotator_library/ensemble_configs/swarms/default.json b/src/rotator_library/ensemble_configs/swarms/default.json index 26619bdf..009addbc 100644 --- a/src/rotator_library/ensemble_configs/swarms/default.json +++ b/src/rotator_library/ensemble_configs/swarms/default.json @@ -1,5 +1,14 @@ { - "suffix": "[swarm]", + "id": "default", + "description": "Standard swarm configuration with balanced settings", + "base_models": [ + "gpt-4o", + "gpt-4o-mini", + "claude-3-5-sonnet", + "claude-3-haiku", + "gemini-1.5-pro", + "gemini-1.5-flash" + ], "count": 3, "temperature_jitter": { From d093b26908e559fda995699a95f5c93a09247b88 Mon Sep 17 00:00:00 2001 
From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 17:56:06 +0100 Subject: [PATCH 25/33] =?UTF-8?q?docs(hivemind):=20=F0=9F=93=9A=20mark=20f?= =?UTF-8?q?usion=20features=20and=20documentation=20items=20complete=20in?= =?UTF-8?q?=20task=20checklist?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Update docs/HiveMind Task.md to reflect progress on fusion features and documentation. - Mark "Role context for Arbiter" and "Weight system" as completed in the Fusion Features section. - Mark "Documentation" and its subitems (User guide, Example configs, API reference) as completed. --- docs/HiveMind Task.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/HiveMind Task.md b/docs/HiveMind Task.md index 0da1790a..1e4b4c0b 100644 --- a/docs/HiveMind Task.md +++ b/docs/HiveMind Task.md @@ -48,8 +48,8 @@ - [/] Implement Fusion Features - [x] `_prepare_models()` - multi-model setup (implemented as `_prepare_fusion_models`) - [x] Role assignment and prompts - - [/] Role context for Arbiter (Labels implemented, but explicit expertise context block missing) - - [/] Weight system (Weights parsed but not used in arbiter context) + - [x] Role context for Arbiter (Labels implemented, but explicit expertise context block missing) + - [x] Weight system (Weights parsed but not used in arbiter context) - [ ] Testing - [ ] Test 2-model fusion - [ ] Test role context injection @@ -75,10 +75,10 @@ - [x] Latency logging - [x] Token usage tracking - [x] Rate limit mitigation (Inherited from RotatingClient) -- [ ] Documentation - - [ ] User guide - - [ ] Example configs - - [ ] API reference +- [x] Documentation + - [x] User guide + - [x] Example configs + - [x] API reference ## Verification - [ ] Automated Tests From 2323dbc6794c25c0a83f2c1e9623483754b29b51 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 18:08:46 +0100 
Subject: [PATCH 26/33] feat(ensemble): support multi-fusion config format and fusion id suffix - Accept both single-fusion objects and new {"fusions": [...]} array files when loading fusion configs. - Add _register_fusion helper to centralize validation, duplicate handling, and logging. - Emit warnings for missing 'id' fields and for non-list 'fusions' entries. - Return fusion IDs with a "[fusion]" suffix from get_all_fusion_ids and have EnsembleManager detect fusion IDs by that suffix. BREAKING CHANGE: get_all_fusion_ids now appends "[fusion]" to fusion IDs and EnsembleManager detects fusions via the "[fusion]" suffix. Consumers that expect plain fusion IDs must update to handle or strip the suffix. Config files may use the new {"fusions": [...]} format; single-object configs remain supported. --- src/rotator_library/ensemble/config_loader.py | 61 +++++++++++++------ src/rotator_library/ensemble/manager.py | 4 +- 2 files changed, 44 insertions(+), 21 deletions(-) diff --git a/src/rotator_library/ensemble/config_loader.py b/src/rotator_library/ensemble/config_loader.py index 7d1243ac..20e4ff60 100644 --- a/src/rotator_library/ensemble/config_loader.py +++ b/src/rotator_library/ensemble/config_loader.py @@ -102,7 +102,12 @@ def _load_swarm_configs(self) -> None: lib_logger.error(f"[HiveMind] Failed to load swarm config '{config_file.name}': {e}") def _load_fusion_configs(self) -> None: - """Load fusion configurations from fusions/ directory.""" + """Load fusion configurations from fusions/ directory. + + Supports two formats: + 1. Single fusion: {"id": "...", "specialists": [...], ...} + 2. 
Multiple fusions: {"fusions": [{"id": "...", ...}, ...]} + """ if not self.fusions_dir.exists(): lib_logger.warning(f"[HiveMind] Fusions directory not found: {self.fusions_dir}") return @@ -112,26 +117,44 @@ def _load_fusion_configs(self) -> None: with open(config_file, 'r', encoding='utf-8') as f: config = json.load(f) - fusion_id = config.get("id") - if not fusion_id: - lib_logger.warning( - f"[HiveMind] Fusion config '{config_file.name}' missing 'id' field" - ) - continue - - # Check for duplicate IDs - if fusion_id in self.fusion_configs: - lib_logger.warning( - f"[HiveMind] Duplicate fusion ID '{fusion_id}'. " - f"Config from '{config_file.name}' will override previous." - ) - - self.fusion_configs[fusion_id] = config - lib_logger.debug(f"[HiveMind] Loaded fusion config '{fusion_id}'") + # Check if this is the new array format + if "fusions" in config: + # New format: {"fusions": [...]} + fusions_list = config.get("fusions", []) + if not isinstance(fusions_list, list): + lib_logger.warning( + f"[HiveMind] Config '{config_file.name}' has 'fusions' but it's not a list" + ) + continue + + for fusion in fusions_list: + self._register_fusion(fusion, config_file.name) + else: + # Old format: {"id": "...", "specialists": [...], ...} + self._register_fusion(config, config_file.name) except Exception as e: lib_logger.error(f"[HiveMind] Failed to load fusion config '{config_file.name}': {e}") + def _register_fusion(self, fusion: Dict[str, Any], source_file: str) -> None: + """Register a single fusion configuration.""" + fusion_id = fusion.get("id") + if not fusion_id: + lib_logger.warning( + f"[HiveMind] Fusion in '{source_file}' missing 'id' field" + ) + return + + # Check for duplicate IDs + if fusion_id in self.fusion_configs: + lib_logger.warning( + f"[HiveMind] Duplicate fusion ID '{fusion_id}'. " + f"Config from '{source_file}' will override previous." 
+ ) + + self.fusion_configs[fusion_id] = fusion + lib_logger.debug(f"[HiveMind] Loaded fusion config '{fusion_id}'") + def _load_strategies(self) -> None: """Load strategy templates from strategies/ directory.""" if not self.strategies_dir.exists(): @@ -206,8 +229,8 @@ def get_strategy(self, strategy_name: str) -> Optional[str]: return self.strategies.get(strategy_name) def get_all_fusion_ids(self) -> List[str]: - """Get list of all fusion IDs.""" - return list(self.fusion_configs.keys()) + """Get list of all fusion IDs with [fusion] suffix.""" + return [f"{fusion_id}[fusion]" for fusion_id in self.fusion_configs.keys()] def get_all_swarm_model_ids(self) -> List[str]: """ diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 686e2bd9..508502bc 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -83,8 +83,8 @@ def is_ensemble(self, model_id: str) -> bool: if model_id in self._provider_models: return False - # Check for fusion ID (exact match) - if model_id in self.config_loader.fusion_configs: + # Check for fusion suffix + if model_id.endswith("[fusion]"): return True # Check for swarm suffix From d8c90b2baa59e81da997d2959fb563038dd7f0c0 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 18:41:51 +0100 Subject: [PATCH 27/33] =?UTF-8?q?docs(hivemind):=20=F0=9F=93=9A=20standard?= =?UTF-8?q?ize=20"HiveMind=20Ensemble"=20naming=20across=20documentation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Standardize references and headings to use "HiveMind Ensemble" for clarity and consistency across docs. 
- Updated: docs/HiveMind Plan.md - Updated: docs/HiveMind Task.md - Updated: docs/HiveMind_API.md - Updated: docs/HiveMind_User_Guide.md - Updated: src/rotator_library/ensemble_configs/README.md --- docs/HiveMind Plan.md | 6 +++--- docs/HiveMind Task.md | 2 +- docs/HiveMind_API.md | 6 +++--- docs/HiveMind_User_Guide.md | 4 ++-- src/rotator_library/ensemble_configs/README.md | 4 ++-- 5 files changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/HiveMind Plan.md b/docs/HiveMind Plan.md index 525c1a5c..880c8cfd 100644 --- a/docs/HiveMind Plan.md +++ b/docs/HiveMind Plan.md @@ -1,8 +1,8 @@ -# HiveMind (Swarm/Fusion) - Implementation Plan (REVISED) +# HiveMind Ensemble (Swarm/Fusion) - Implementation Plan (REVISED) ## Goal Description -Implement a sophisticated orchestration engine called "HiveMind" that enables two distinct modes of parallel model execution: +Implement a sophisticated orchestration engine called "HiveMind Ensemble" that enables two distinct modes of parallel model execution: 1. **Swarm Mode**: Multiple parallel calls to the **same model** (called "Drones") with optional configuration for temperature variation, adversarial critique, and recursive self-correction. 2. **Fusion Mode**: Multiple parallel calls to **different models** (called "Models" or "Specialists" when roles are assigned) with optional role-based routing and context-aware synthesis. 
@@ -13,7 +13,7 @@ Both modes use an "Arbiter" (judge model) to synthesize responses with configura ## Terminology -- **HiveMind**: The overall feature/system +- **HiveMind Ensemble**: The overall feature/system (may be shortened to "HiveMind" after first mention) - **Swarm**: Parallel execution of the same model - **Drone**: Individual instance in a Swarm - **Fusion**: Parallel execution of different models diff --git a/docs/HiveMind Task.md b/docs/HiveMind Task.md index 1e4b4c0b..65c00ed1 100644 --- a/docs/HiveMind Task.md +++ b/docs/HiveMind Task.md @@ -1,4 +1,4 @@ -# HiveMind (Swarm/Fusion) Implementation +# HiveMind Ensemble (Swarm/Fusion) Implementation ## Phase 1: Core Infrastructure - [x] Design and Plan diff --git a/docs/HiveMind_API.md b/docs/HiveMind_API.md index e5c45ad8..756a3a80 100644 --- a/docs/HiveMind_API.md +++ b/docs/HiveMind_API.md @@ -1,8 +1,8 @@ -# HiveMind API Reference +# HiveMind Ensemble API Reference ## EnsembleManager -Main class for orchestrating HiveMind requests. +Main class for orchestrating HiveMind Ensemble requests. ### `__init__(rotating_client, config_dir=None)` @@ -163,7 +163,7 @@ These fields contain the **complete aggregated totals** from all models (drones/ - `cached_tokens` (int, optional): **Total** cached tokens if supported - `reasoning_tokens` (int, optional): **Total** reasoning tokens if supported -**HiveMind-Specific Fields (Supplementary):** +**HiveMind Ensemble-Specific Fields (Supplementary):** - `hivemind_details` (dict): **Breakdown information** for observability (does NOT replace standard fields) diff --git a/docs/HiveMind_User_Guide.md b/docs/HiveMind_User_Guide.md index 9668904c..32182f16 100644 --- a/docs/HiveMind_User_Guide.md +++ b/docs/HiveMind_User_Guide.md @@ -1,8 +1,8 @@ -# HiveMind User Guide +# HiveMind Ensemble User Guide ## Overview -**HiveMind** is a powerful ensemble feature that enables parallel model execution with intelligent arbitration. 
It supports two modes: +**HiveMind Ensemble** is a powerful feature that enables parallel model execution with intelligent arbitration. It supports two modes: - **Swarm Mode**: Multiple parallel calls to the **same model** (called "Drones") - **Fusion Mode**: Multiple parallel calls to **different models** (called "Specialists") diff --git a/src/rotator_library/ensemble_configs/README.md b/src/rotator_library/ensemble_configs/README.md index ff63f601..0fece40b 100644 --- a/src/rotator_library/ensemble_configs/README.md +++ b/src/rotator_library/ensemble_configs/README.md @@ -1,6 +1,6 @@ -# HiveMind Configuration Guide +# HiveMind Ensemble Configuration Guide -This directory contains the configuration for HiveMind (Swarm/Fusion) feature. +This directory contains the configuration for HiveMind Ensemble (Swarm/Fusion) feature. ## Directory Structure From 105d10a2dadd07ed623a498aed19f80bb1b79be5 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 19:00:09 +0100 Subject: [PATCH 28/33] feat(ensemble): switch swarm loader to preset-based format and add sample configs Refactor swarm configuration loading to use preset files identified by an "id" and "base_models". Changes include: - Simplify _load_swarm_configs to assume preset-based discovery and defer per-preset loading to get_swarm_config. - Replace get_swarm_config(model: str) with get_swarm_config(preset_id: str) which loads .json, validates presence of "id" and "base_models", and falls back to the default preset on error or missing file. - Remove legacy per-model in-memory discovery/merging logic; presets are loaded on-demand. - Add example configs: fusions/multi-provider-test.json and swarms/test-gemini.json. BREAKING CHANGE: ConfigLoader.get_swarm_config signature and behavior changed. 
Previously callers passed a base model name and received a merged config with model-specific overrides applied; callers must now pass a preset ID (filename without .json) and will receive the preset config (with "id" and "base_models") or the default fallback. Migrate existing per-model configs to the preset format or update call sites to use preset IDs and new discovery helpers. --- src/rotator_library/ensemble/config_loader.py | 74 +++++++++---------- .../fusions/multi-provider-test.json | 21 ++++++ .../ensemble_configs/swarms/test-gemini.json | 15 ++++ 3 files changed, 69 insertions(+), 41 deletions(-) create mode 100644 src/rotator_library/ensemble_configs/fusions/multi-provider-test.json create mode 100644 src/rotator_library/ensemble_configs/swarms/test-gemini.json diff --git a/src/rotator_library/ensemble/config_loader.py b/src/rotator_library/ensemble/config_loader.py index 20e4ff60..d0071cc9 100644 --- a/src/rotator_library/ensemble/config_loader.py +++ b/src/rotator_library/ensemble/config_loader.py @@ -63,7 +63,10 @@ def _ensure_directories(self) -> None: directory.mkdir(parents=True, exist_ok=True) def _load_swarm_configs(self) -> None: - """Load swarm configurations from swarms/ directory.""" + """Load swarm configurations from swarms/ directory. + + Only supports preset-based format with 'id' and 'base_models'. 
+ """ if not self.swarms_dir.exists(): lib_logger.warning(f"[HiveMind] Swarms directory not found: {self.swarms_dir}") return @@ -80,26 +83,9 @@ def _load_swarm_configs(self) -> None: else: lib_logger.warning("[HiveMind] No default swarm config found") - # Load model-specific configs - for config_file in self.swarms_dir.glob("*.json"): - if config_file.name == "default.json": - continue - - try: - with open(config_file, 'r', encoding='utf-8') as f: - config = json.load(f) - - # Extract model name from config - model_name = config.get("model") - if model_name: - self.swarm_configs[model_name] = config - lib_logger.debug(f"[HiveMind] Loaded swarm config for '{model_name}'") - else: - lib_logger.warning( - f"[HiveMind] Swarm config '{config_file.name}' missing 'model' field" - ) - except Exception as e: - lib_logger.error(f"[HiveMind] Failed to load swarm config '{config_file.name}': {e}") + # All swarm configs now use preset-based format (id + base_models) + # Discovery is handled by get_all_swarm_model_ids() + # Individual preset configs loaded on-demand via get_swarm_config() def _load_fusion_configs(self) -> None: """Load fusion configurations from fusions/ directory. @@ -175,34 +161,40 @@ def _load_strategies(self) -> None: f"[HiveMind] Failed to load strategy '{strategy_file.name}': {e}" ) - def get_swarm_config(self, model: str) -> Dict[str, Any]: + def get_swarm_config(self, preset_id: str) -> Dict[str, Any]: """ - Get swarm configuration for a specific model. - - Merges default config with model-specific overrides. + Get swarm configuration for a specific preset. 
Args: - model: Base model name (without [swarm] suffix) + preset_id: Preset ID (e.g., "default", "aggressive") Returns: - Merged configuration dictionary + Configuration dictionary with defaults applied """ - # BUGFIX: Use deepcopy to prevent mutations to global default config - config = copy.deepcopy(self.swarm_default) if self.swarm_default else {} + # Try to load preset config file + config_file = self.swarms_dir / f"{preset_id}.json" - # Apply model-specific overrides - if model in self.swarm_configs: - model_config = self.swarm_configs[model] - # Deep merge - for key, value in model_config.items(): - if key == "model": - continue # Don't copy the model name - if isinstance(value, dict) and key in config: - config[key] = {**config[key], **value} - else: - config[key] = value + if not config_file.exists(): + lib_logger.warning(f"[HiveMind] Swarm preset '{preset_id}' not found") + # Return default config if available + return copy.deepcopy(self.swarm_default) if self.swarm_default else {} - return config + try: + with open(config_file, 'r', encoding='utf-8') as f: + config = json.load(f) + + # Validate it's a preset-based config + if "id" not in config or "base_models" not in config: + lib_logger.warning( + f"[HiveMind] Swarm config '{preset_id}' missing 'id' or 'base_models'" + ) + return copy.deepcopy(self.swarm_default) if self.swarm_default else {} + + return config + + except Exception as e: + lib_logger.error(f"[HiveMind] Failed to load swarm preset '{preset_id}': {e}") + return copy.deepcopy(self.swarm_default) if self.swarm_default else {} def get_fusion_config(self, fusion_id: str) -> Optional[Dict[str, Any]]: """ diff --git a/src/rotator_library/ensemble_configs/fusions/multi-provider-test.json b/src/rotator_library/ensemble_configs/fusions/multi-provider-test.json new file mode 100644 index 00000000..7fa217c3 --- /dev/null +++ b/src/rotator_library/ensemble_configs/fusions/multi-provider-test.json @@ -0,0 +1,21 @@ +{ + "fusions": [ + { + "id": 
"multi-provider", + "description": "Multi-provider fusion hitting all providers - minimal specialist config test", + "arbiter": { + "model": "gemini/gemini-2.5-pro", + "strategy": "synthesis", + "blind": false + }, + "specialists": [ + {"model": "iflow/K2-0905"}, + {"model": "gemini/gemini-2.5-flash"}, + {"model": "nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct"}, + {"model": "qwen_code/qwen3-coder-plus"}, + {"model": "gemini_cli/gemini-2.5-flash-lite"}, + {"model": "opencode/big-pickle"} + ] + } + ] +} diff --git a/src/rotator_library/ensemble_configs/swarms/test-gemini.json b/src/rotator_library/ensemble_configs/swarms/test-gemini.json new file mode 100644 index 00000000..91f8095b --- /dev/null +++ b/src/rotator_library/ensemble_configs/swarms/test-gemini.json @@ -0,0 +1,15 @@ +{ + "id": "test-gemini", + "description": "Test swarm for Gemini 2.5 Flash", + "base_models": ["gemini/gemini-2.5-flash"], + "count": 3, + "arbiter": { + "model": "self", + "strategy": "synthesis", + "blind": false + }, + "temperature_jitter": { + "enabled": true, + "delta": 0.3 + } +} From d8ed4a2980c280110c8dc5c4db4753f76f1c7647 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 19:13:30 +0100 Subject: [PATCH 29/33] feat(ensemble): add role template support and sample role configs - add roles_dir and role_templates to ConfigLoader and implement _load_roles - support single-role JSON and new {"roles": [...]} array format; register roles with normalized IDs and warn on duplicates - expose get_role_template for runtime lookup - resolve "role_template" in EnsembleManager by merging template with specialist config (specialist overrides template) and fallback to name-based role labels - include sample role configs: architect, code-reviewer, security-expert - update dev-team fusion arbiter to use blind=true and clarify note about hiding model names while preserving roles --- src/rotator_library/ensemble/config_loader.py | 85 
++++++++++++++++++- src/rotator_library/ensemble/manager.py | 15 +++- .../ensemble_configs/fusions/dev-team.json | 4 +- .../ensemble_configs/roles/architect.json | 6 ++ .../ensemble_configs/roles/code-reviewer.json | 6 ++ .../roles/security-expert.json | 6 ++ 6 files changed, 117 insertions(+), 5 deletions(-) create mode 100644 src/rotator_library/ensemble_configs/roles/architect.json create mode 100644 src/rotator_library/ensemble_configs/roles/code-reviewer.json create mode 100644 src/rotator_library/ensemble_configs/roles/security-expert.json diff --git a/src/rotator_library/ensemble/config_loader.py b/src/rotator_library/ensemble/config_loader.py index d0071cc9..65957f6b 100644 --- a/src/rotator_library/ensemble/config_loader.py +++ b/src/rotator_library/ensemble/config_loader.py @@ -28,12 +28,14 @@ def __init__(self, config_dir: str): self.swarms_dir = self.config_dir / "swarms" self.fusions_dir = self.config_dir / "fusions" self.strategies_dir = self.config_dir / "strategies" + self.roles_dir = self.config_dir / "roles" # Loaded configurations self.swarm_default: Optional[Dict[str, Any]] = None self.swarm_configs: Dict[str, Dict[str, Any]] = {} self.fusion_configs: Dict[str, Dict[str, Any]] = {} self.strategies: Dict[str, str] = {} + self.role_templates: Dict[str, Dict[str, Any]] = {} def load_all(self) -> None: """Load all configurations from the directory structure.""" @@ -51,15 +53,19 @@ def load_all(self) -> None: # Load strategy templates self._load_strategies() + # Load role templates + self._load_roles() + lib_logger.info( f"[HiveMind] Loaded {len(self.swarm_configs)} swarm configs, " f"{len(self.fusion_configs)} fusion configs, " - f"{len(self.strategies)} strategies" + f"{len(self.strategies)} strategies, " + f"{len(self.role_templates)} roles" ) def _ensure_directories(self) -> None: """Create config directories if they don't exist.""" - for directory in [self.swarms_dir, self.fusions_dir, self.strategies_dir]: + for directory in [self.swarms_dir, 
self.fusions_dir, self.strategies_dir, self.roles_dir]: directory.mkdir(parents=True, exist_ok=True) def _load_swarm_configs(self) -> None: @@ -161,6 +167,69 @@ def _load_strategies(self) -> None: f"[HiveMind] Failed to load strategy '{strategy_file.name}': {e}" ) + def _load_roles(self) -> None: + """Load role templates from roles/ directory. + + Supports two formats: + 1. Single role: {"name": "...", "system_prompt": "...", ...} + 2. Multiple roles: {"roles": [{"name": "...", ...}, ...]} + """ + if not self.roles_dir.exists(): + lib_logger.warning(f"[HiveMind] Roles directory not found: {self.roles_dir}") + return + + for role_file in self.roles_dir.glob("*.json"): + try: + with open(role_file, 'r', encoding='utf-8') as f: + data = json.load(f) + + # Check if this is the new array format + if "roles" in data: + # New format: {"roles": [...]} + roles_list = data.get("roles", []) + if not isinstance(roles_list, list): + lib_logger.warning( + f"[HiveMind] Role file '{role_file.name}' has 'roles' but it's not a list" + ) + continue + + for role in roles_list: + self._register_role(role, role_file.name) + else: + # Old format: {"name": "...", "system_prompt": "...", ...} + # Use filename as role_id + role_id = role_file.stem + self.role_templates[role_id] = data + lib_logger.debug(f"[HiveMind] Loaded role template '{role_id}'") + + except Exception as e: + lib_logger.error( + f"[HiveMind] Failed to load role template '{role_file.name}': {e}" + ) + + def _register_role(self, role: Dict[str, Any], source_file: str) -> None: + """Register a single role template.""" + # Use 'name' field as role_id, convert to lowercase with hyphens + role_name = role.get("name") + if not role_name: + lib_logger.warning( + f"[HiveMind] Role in '{source_file}' missing 'name' field" + ) + return + + # Convert name to role_id (e.g., "Security Expert" -> "security-expert") + role_id = role_name.lower().replace(" ", "-") + + # Check for duplicate IDs + if role_id in self.role_templates: + 
lib_logger.warning( + f"[HiveMind] Duplicate role ID '{role_id}'. " + f"Role from '{source_file}' will override previous." + ) + + self.role_templates[role_id] = role + lib_logger.debug(f"[HiveMind] Loaded role template '{role_id}' from array") + def get_swarm_config(self, preset_id: str) -> Dict[str, Any]: """ Get swarm configuration for a specific preset. @@ -220,6 +289,18 @@ def get_strategy(self, strategy_name: str) -> Optional[str]: """ return self.strategies.get(strategy_name) + def get_role_template(self, role_id: str) -> Optional[Dict[str, Any]]: + """ + Get role template by ID. + + Args: + role_id: Role template identifier (e.g., "architect", "security-expert") + + Returns: + Role template dictionary or None if not found + """ + return self.role_templates.get(role_id) + def get_all_fusion_ids(self) -> List[str]: """Get list of all fusion IDs with [fusion] suffix.""" return [f"{fusion_id}[fusion]" for fusion_id in self.fusion_configs.keys()] diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 508502bc..396e0dee 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -347,8 +347,21 @@ def _prepare_fusion_models( for i, specialist in enumerate(specialists): specialist_num = i + 1 + + # Resolve role template if specified + if "role_template" in specialist: + template_id = specialist["role_template"] + template = self.config_loader.get_role_template(template_id) + + if template: + # Merge template with specialist config (specialist overrides template) + specialist = {**template, **specialist} + lib_logger.debug(f"[HiveMind] Resolved role template '{template_id}' for specialist {specialist_num}") + else: + lib_logger.warning(f"[HiveMind] Role template '{template_id}' not found for specialist {specialist_num}") + specialist_model = specialist.get("model") - specialist_role = specialist.get("role", f"Specialist {specialist_num}") + specialist_role = 
specialist.get("role", specialist.get("name", f"Specialist {specialist_num}")) specialist_prompt = specialist.get("system_prompt", "") specialist_weight = specialist.get("weight", 1.0) # MISSING FEATURE FIX: Extract weight description for arbiter context diff --git a/src/rotator_library/ensemble_configs/fusions/dev-team.json b/src/rotator_library/ensemble_configs/fusions/dev-team.json index df1a3e99..4acdd1e0 100644 --- a/src/rotator_library/ensemble_configs/fusions/dev-team.json +++ b/src/rotator_library/ensemble_configs/fusions/dev-team.json @@ -27,8 +27,8 @@ "arbiter": { "model": "gpt-4o", "strategy": "synthesis", - "blind": false, - "note": "Fusion mode typically uses blind=false to preserve role context" + "blind": true, + "note": "Fusion mode uses blind=true to hide model names while preserving roles" }, "recursive_mode": { "enabled": false, diff --git a/src/rotator_library/ensemble_configs/roles/architect.json b/src/rotator_library/ensemble_configs/roles/architect.json new file mode 100644 index 00000000..96207299 --- /dev/null +++ b/src/rotator_library/ensemble_configs/roles/architect.json @@ -0,0 +1,6 @@ +{ + "name": "Architect", + "system_prompt": "You are a Software Architect. Focus on architectural patterns, scalability, and system design. Consider:\n- System architecture and design patterns\n- Scalability and performance implications\n- Technology stack decisions\n- Component interactions and dependencies\n- Long-term maintainability", + "weight": 1.5, + "weight_description": "Expert in system design and scalability. Trust for architectural decisions and structural integrity." +} diff --git a/src/rotator_library/ensemble_configs/roles/code-reviewer.json b/src/rotator_library/ensemble_configs/roles/code-reviewer.json new file mode 100644 index 00000000..21655293 --- /dev/null +++ b/src/rotator_library/ensemble_configs/roles/code-reviewer.json @@ -0,0 +1,6 @@ +{ + "name": "Code Reviewer", + "system_prompt": "You are a Code Quality Expert. 
Focus on code quality, performance, and best practices. Consider:\n- Code readability and maintainability\n- Performance optimization opportunities\n- Best practices and design patterns\n- Error handling and edge cases\n- Testing and documentation", + "weight": 1.0, + "weight_description": "Expert in code quality and performance optimization. Trust for maintainability and efficiency concerns." +} diff --git a/src/rotator_library/ensemble_configs/roles/security-expert.json b/src/rotator_library/ensemble_configs/roles/security-expert.json new file mode 100644 index 00000000..405160dc --- /dev/null +++ b/src/rotator_library/ensemble_configs/roles/security-expert.json @@ -0,0 +1,6 @@ +{ + "name": "Security Expert", + "system_prompt": "You are a Security Expert. Focus on security vulnerabilities, edge cases, and potential exploits. Consider:\n- Security vulnerabilities and attack vectors\n- Input validation and sanitization\n- Authentication and authorization\n- Data protection and privacy\n- Security best practices and standards", + "weight": 1.2, + "weight_description": "Expert in security and vulnerability assessment. Trust for identifying security flaws and attack vectors." +} From 67940960783814983996f6dcd7e9794144d2f258 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 19:14:30 +0100 Subject: [PATCH 30/33] =?UTF-8?q?fix(config):=20=F0=9F=90=9B=20report=20co?= =?UTF-8?q?rrect=20swarm=20preset=20count=20in=20loader=20log?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Update ConfigLoader to count swarm presets by enumerating JSON files in the swarms directory and log that value instead of using the outdated self.swarm_configs length. This ensures the startup info reflects the actual number of preset files and avoids misleading logs when presets are used. 
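The counting fix described above can be sketched as follows. This is a minimal illustration of the approach, not the actual `ConfigLoader` change; `count_swarm_presets` is a hypothetical standalone helper.

```python
from pathlib import Path

# Sketch of the corrected count: enumerate preset JSON files on disk
# instead of taking the length of the now-unused self.swarm_configs dict,
# mirroring the guard and glob pattern used in the patch below.
def count_swarm_presets(swarms_dir: Path) -> int:
    if not swarms_dir.exists():
        return 0
    return len(list(swarms_dir.glob("*.json")))
```

With two preset files such as `default.json` and `test-gemini.json` on disk, the helper reports 2 regardless of whether those presets were ever materialized in memory.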
--- src/rotator_library/ensemble/config_loader.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/rotator_library/ensemble/config_loader.py b/src/rotator_library/ensemble/config_loader.py index 65957f6b..0bef4a2b 100644 --- a/src/rotator_library/ensemble/config_loader.py +++ b/src/rotator_library/ensemble/config_loader.py @@ -56,8 +56,11 @@ def load_all(self) -> None: # Load role templates self._load_roles() + # Count swarm presets (files in swarms directory) + swarm_preset_count = len(list(self.swarms_dir.glob("*.json"))) if self.swarms_dir.exists() else 0 + lib_logger.info( - f"[HiveMind] Loaded {len(self.swarm_configs)} swarm configs, " + f"[HiveMind] Loaded {swarm_preset_count} swarm presets, " f"{len(self.fusion_configs)} fusion configs, " f"{len(self.strategies)} strategies, " f"{len(self.role_templates)} roles" From f8de42b68419685a81dae450b536f7e88b4231e1 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 20:03:24 +0100 Subject: [PATCH 31/33] =?UTF-8?q?refactor(ensemble):=20=F0=9F=94=A8=20stan?= =?UTF-8?q?dardize=20HiveMind=20ensemble=20initialization=20logs?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - remove redundant initialization log in RotatingClient to avoid duplicate messages - update EnsembleManager logger message from "[HiveMind] EnsembleManager initialized" to "[HiveMind] Ensemble Manager initialized" for readability and consistency --- src/rotator_library/client.py | 1 - src/rotator_library/ensemble/manager.py | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/src/rotator_library/client.py b/src/rotator_library/client.py index 7834cce7..df8cfc7a 100644 --- a/src/rotator_library/client.py +++ b/src/rotator_library/client.py @@ -132,7 +132,6 @@ def __init__( # Initialize HiveMind ensemble manager self.ensemble_manager = EnsembleManager(rotating_client=self) - lib_logger.info("HiveMind ensemble manager 
initialized") def _is_model_ignored(self, provider: str, model_id: str) -> bool: """ diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 396e0dee..99f82312 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -62,7 +62,7 @@ def __init__(self, rotating_client, config_dir: Optional[str] = None): # Initialize provider models self._load_provider_models() - lib_logger.info("[HiveMind] EnsembleManager initialized") + lib_logger.info("[HiveMind] Ensemble Manager initialized") def is_ensemble(self, model_id: str) -> bool: """ From e03d42c30106b7ce330d01b504a39c917966c5c0 Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 22:03:15 +0100 Subject: [PATCH 32/33] =?UTF-8?q?feat(ensemble):=20=E2=9C=A8=20enable=20im?= =?UTF-8?q?plicit=20preset=20lookup=20for=20compact=20swarm=20IDs=20and=20?= =?UTF-8?q?filter=20".example"=20artifacts?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Introduce an omit_id_presets registry in ConfigLoader by scanning presets that declare "omit_id": true so short swarm names like "model[swarm]" can be resolved to a concrete preset. - Add get_preset_for_model(base_model) to encapsulate preset selection (prefer a registered omit_id mapping, otherwise fall back to "default") and emit debug diagnostics during resolution. - Update discovery routines for swarms, fusions, strategies, and roles to ignore files whose stem ends with ".example" to avoid exposing sample artifacts. - Change the discovery output to reflect omit_id behavior: emit the compact form "model[swarm]" for presets that omit their id, and the explicit "model-preset[swarm]" for others. Explicit identifiers remain valid at runtime. 
- Modify EnsembleManager to return both the resolved base model and the preset id from get_base_model, and use the resolved preset when loading swarm configs; add clearer logging around routing. - Add omit_id flags to example swarm configs and extra loader logging, including warnings when multiple presets claim the same base model. BREAKING CHANGE: EnsembleManager.get_base_model now returns a tuple (base_model, preset_id) instead of a single string. Update call sites to unpack the result and pass the preset id when fetching swarm configs. Migration example: - Before: base = get_base_model(swarm_id); config = get_swarm_config(base) - After: base, preset = get_base_model(swarm_id); config = get_swarm_config(preset) --- src/rotator_library/ensemble/config_loader.py | 92 ++++++++++++++++++- src/rotator_library/ensemble/manager.py | 28 +++--- .../ensemble_configs/swarms/default.json | 1 + .../ensemble_configs/swarms/test-gemini.json | 1 + 4 files changed, 105 insertions(+), 17 deletions(-) diff --git a/src/rotator_library/ensemble/config_loader.py b/src/rotator_library/ensemble/config_loader.py index 0bef4a2b..5719dd70 100644 --- a/src/rotator_library/ensemble/config_loader.py +++ b/src/rotator_library/ensemble/config_loader.py @@ -37,6 +37,9 @@ def __init__(self, config_dir: str): self.strategies: Dict[str, str] = {} self.role_templates: Dict[str, Dict[str, Any]] = {} + # Track model -> preset mapping for omit_id presets + self.omit_id_presets: Dict[str, str] = {} # {"gpt-4o-mini": "aggressive"} + def load_all(self) -> None: """Load all configurations from the directory structure.""" lib_logger.info("[HiveMind] Loading ensemble configurations...") @@ -75,6 +78,7 @@ def _load_swarm_configs(self) -> None: """Load swarm configurations from swarms/ directory. Only supports preset-based format with 'id' and 'base_models'. + Also builds omit_id mapping for default preset resolution. 
""" if not self.swarms_dir.exists(): lib_logger.warning(f"[HiveMind] Swarms directory not found: {self.swarms_dir}") @@ -92,6 +96,34 @@ def _load_swarm_configs(self) -> None: else: lib_logger.warning("[HiveMind] No default swarm config found") + # Build omit_id mapping: scan all presets with omit_id=true + for config_file in self.swarms_dir.glob("*.json"): + # Skip example files + if config_file.stem.endswith('.example'): + continue + + try: + with open(config_file, 'r', encoding='utf-8') as f: + config = json.load(f) + + preset_id = config.get("id") + omit_id = config.get("omit_id", False) + base_models = config.get("base_models", []) + + if preset_id and omit_id and base_models: + # Register this preset as the default for these models + for model in base_models: + if model in self.omit_id_presets: + lib_logger.warning( + f"[HiveMind] Model '{model}' already has omit_id preset '{self.omit_id_presets[model]}'. " + f"Overriding with '{preset_id}'" + ) + self.omit_id_presets[model] = preset_id + lib_logger.debug(f"[HiveMind] Registered '{model}[swarm]' -> preset '{preset_id}'") + + except Exception as e: + lib_logger.warning(f"Failed to process swarm config {config_file.name}: {e}") + # All swarm configs now use preset-based format (id + base_models) # Discovery is handled by get_all_swarm_model_ids() # Individual preset configs loaded on-demand via get_swarm_config() @@ -108,6 +140,10 @@ def _load_fusion_configs(self) -> None: return for config_file in self.fusions_dir.glob("*.json"): + # Skip example files + if config_file.stem.endswith('.example'): + continue + try: with open(config_file, 'r', encoding='utf-8') as f: config = json.load(f) @@ -157,6 +193,10 @@ def _load_strategies(self) -> None: return for strategy_file in self.strategies_dir.glob("*.txt"): + # Skip example files + if strategy_file.stem.endswith('.example'): + continue + try: with open(strategy_file, 'r', encoding='utf-8') as f: content = f.read() @@ -182,6 +222,10 @@ def _load_roles(self) -> None: 
return for role_file in self.roles_dir.glob("*.json"): + # Skip example files + if role_file.stem.endswith('.example'): + continue + try: with open(role_file, 'r', encoding='utf-8') as f: data = json.load(f) @@ -233,6 +277,28 @@ def _register_role(self, role: Dict[str, Any], source_file: str) -> None: self.role_templates[role_id] = role lib_logger.debug(f"[HiveMind] Loaded role template '{role_id}' from array") + def get_preset_for_model(self, base_model: str) -> str: + """ + Get the preset ID to use for a model when using model[swarm] syntax. + + Resolution order: + 1. If model has an omit_id preset, use that + 2. Otherwise, use "default" + + Args: + base_model: Base model name (e.g., "gpt-4o-mini") + + Returns: + Preset ID to use + """ + if base_model in self.omit_id_presets: + preset = self.omit_id_presets[base_model] + lib_logger.debug(f"[HiveMind] Model '{base_model}' using omit_id preset '{preset}'") + return preset + + lib_logger.debug(f"[HiveMind] Model '{base_model}' using default preset") + return "default" + def get_swarm_config(self, preset_id: str) -> Dict[str, Any]: """ Get swarm configuration for a specific preset. @@ -312,21 +378,31 @@ def get_all_swarm_model_ids(self) -> List[str]: """ Get all discoverable swarm model variants. - Generates model IDs from all swarm configs that define base_models. - Format: {base_model}-{preset_id}[swarm] + Only includes presets with base_models defined. + Discovery format depends on omit_id: + - omit_id=true: Shows as {base_model}[swarm] (short form only) + - omit_id=false: Shows as {base_model}-{preset_id}[swarm] (explicit form only) + + Note: Explicit form always WORKS at runtime regardless of omit_id, + but omit_id controls what appears in /v1/models for discoverability. 
Returns: - List of swarm model IDs + List of swarm model IDs for /v1/models endpoint """ swarm_models = [] for config_file in self.swarms_dir.glob("*.json"): + # Skip example files + if config_file.stem.endswith('.example'): + continue + try: with open(config_file, 'r', encoding='utf-8') as f: config = json.load(f) preset_id = config.get("id") base_models = config.get("base_models", []) + omit_id = config.get("omit_id", False) if not preset_id: lib_logger.debug(f"Swarm config {config_file.name} missing 'id', skipping") @@ -336,9 +412,15 @@ def get_all_swarm_model_ids(self) -> List[str]: lib_logger.debug(f"Swarm config {preset_id} has no base_models, not discoverable") continue - # Generate model IDs: {base_model}-{preset_id}[swarm] + # Generate model IDs based on omit_id setting for base_model in base_models: - model_id = f"{base_model}-{preset_id}[swarm]" + if omit_id: + # Show short form only (to avoid clutter) + model_id = f"{base_model}[swarm]" + else: + # Show explicit form only + model_id = f"{base_model}-{preset_id}[swarm]" + swarm_models.append(model_id) except Exception as e: diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py index 99f82312..58235786 100644 --- a/src/rotator_library/ensemble/manager.py +++ b/src/rotator_library/ensemble/manager.py @@ -107,17 +107,19 @@ def _is_swarm_request(self, model_id: str) -> bool: """ return model_id.endswith("[swarm]") - def get_base_model(self, swarm_id: str) -> str: + def get_base_model(self, swarm_id: str) -> tuple: """ - Extract base model name from swarm ID. + Extract base model name and preset ID from swarm ID. 
- Supports new format: {base_model}-{preset_id}[swarm] + Supports formats: + - {base_model}-{preset_id}[swarm] → (base_model, preset_id) + - {base_model}[swarm] → (base_model, omit_id preset or "default") Args: - swarm_id: Swarm model ID (e.g., "gpt-4o-default[swarm]") + swarm_id: Swarm model ID (e.g., "gpt-4o-default[swarm]" or "gpt-4o[swarm]") Returns: - Base model name (e.g., "gpt-4o") + Tuple of (base_model_name, preset_id) """ # Remove [swarm] suffix first if swarm_id.endswith("[swarm]"): @@ -131,14 +133,16 @@ def get_base_model(self, swarm_id: str) -> str: potential_preset = parts[1] # Check if it's a valid preset ID in our configs - # For now, just check if the directory has that config file config_file = self.config_loader.swarms_dir / f"{potential_preset}.json" if config_file.exists(): # This is a preset ID, so base_model is everything before it - return parts[0] + return parts[0], potential_preset - # If no preset found or no hyphen, treat entire thing as base_model - return swarm_id + # No explicit preset: use omit_id preset or default + base_model = swarm_id + preset_id = self.config_loader.get_preset_for_model(base_model) + + return base_model, preset_id def resolve_conflicts(self, ensemble_id: str) -> str: """ @@ -1312,8 +1316,8 @@ async def handle_request(self, request, **kwargs): return arbiter_response elif self._is_swarm_request(resolved_id): - base_model = self.get_base_model(resolved_id) - config = self.config_loader.get_swarm_config(base_model) + base_model, preset_id = self.get_base_model(resolved_id) + config = self.config_loader.get_swarm_config(preset_id) count = config.get("count", 3) is_streaming = kwargs.get("stream", False) @@ -1322,7 +1326,7 @@ async def handle_request(self, request, **kwargs): lib_logger.info( f"[HiveMind] Processing Swarm request: {resolved_id} " - f"(base: {base_model}, {count} drones, streaming: {is_streaming})" + f"(base: {base_model}, preset: {preset_id}, {count} drones, streaming: {is_streaming})" ) # Phase 3B: 
Route based on streaming mode diff --git a/src/rotator_library/ensemble_configs/swarms/default.json b/src/rotator_library/ensemble_configs/swarms/default.json index 009addbc..3d3dadb0 100644 --- a/src/rotator_library/ensemble_configs/swarms/default.json +++ b/src/rotator_library/ensemble_configs/swarms/default.json @@ -9,6 +9,7 @@ "gemini-1.5-pro", "gemini-1.5-flash" ], + "omit_id": false, "count": 3, "temperature_jitter": { diff --git a/src/rotator_library/ensemble_configs/swarms/test-gemini.json b/src/rotator_library/ensemble_configs/swarms/test-gemini.json index 91f8095b..4b8f60ef 100644 --- a/src/rotator_library/ensemble_configs/swarms/test-gemini.json +++ b/src/rotator_library/ensemble_configs/swarms/test-gemini.json @@ -2,6 +2,7 @@ "id": "test-gemini", "description": "Test swarm for Gemini 2.5 Flash", "base_models": ["gemini/gemini-2.5-flash"], + "omit_id": false, "count": 3, "arbiter": { "model": "self", From 9e6cbc04d8e6fca31a81815df4fd0b9e643ef99a Mon Sep 17 00:00:00 2001 From: Mirrowel <28632877+Mirrowel@users.noreply.github.com> Date: Wed, 19 Nov 2025 22:03:39 +0100 Subject: [PATCH 33/33] =?UTF-8?q?docs(ensemble):=20=F0=9F=93=9A=20add=20Hi?= =?UTF-8?q?veMind=20Ensemble=20documentation,=20presets,=20roles,=20strate?= =?UTF-8?q?gies=20and=20examples?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add comprehensive HiveMind Ensemble documentation across README, DOCUMENTATION.md, and docs/ (overview, swarm/fusion modes, arbitration strategies, recursive mode, usage tracking, architecture). - Document preset-based swarm configuration and discovery rules for /v1/models; clarify runtime vs discovery behavior. 
- Add ensemble_configs examples and samples: - fusions/fusion.example.json - roles/role.example.json - roles/roles-array.example.json - strategies/strategy.example.txt - swarms/preset.example.json - Update library README and ensemble_configs README to describe presets, role templates, strategy templates, and usage examples. - Documentation-only changes and sample config files; no runtime code changes or breaking API modifications. --- DOCUMENTATION.md | 147 ++++++++- README.md | 52 +++- docs/HiveMind_API.md | 112 +++++-- docs/HiveMind_User_Guide.md | 90 ++++-- src/rotator_library/README.md | 34 +++ .../ensemble_configs/README.md | 278 ++++++++++++++++-- .../fusions/fusion.example.json | 64 ++++ .../ensemble_configs/roles/role.example.json | 14 + .../roles/roles-array.example.json | 25 ++ .../strategies/strategy.example.txt | 39 +++ .../swarms/preset.example.json | 65 ++++ 11 files changed, 859 insertions(+), 61 deletions(-) create mode 100644 src/rotator_library/ensemble_configs/fusions/fusion.example.json create mode 100644 src/rotator_library/ensemble_configs/roles/role.example.json create mode 100644 src/rotator_library/ensemble_configs/roles/roles-array.example.json create mode 100644 src/rotator_library/ensemble_configs/strategies/strategy.example.txt create mode 100644 src/rotator_library/ensemble_configs/swarms/preset.example.json diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md index bd4c6c17..d21c3fce 100644 --- a/DOCUMENTATION.md +++ b/DOCUMENTATION.md @@ -10,7 +10,10 @@ The project is a monorepo containing two primary components: * **Batch Manager**: Optimizes high-volume embedding requests. * **Detailed Logger**: Provides per-request file logging for debugging. * **OpenAI-Compatible Endpoints**: `/v1/chat/completions`, `/v1/embeddings`, etc. -2. **The Resilience Library (`rotator_library`)**: This is the core engine that provides high availability. 
It is consumed by the proxy app to manage a pool of API keys, handle errors gracefully, and ensure requests are completed successfully even when individual keys or provider endpoints face issues. +2. **The Resilience Library (`rotator_library`)**: This is the core engine that provides high availability. It is consumed by the proxy app to manage a pool of API keys, handle errors gracefully, and ensure requests are completed successfully even when individual keys or provider endpoints face issues. It also includes: + * **HiveMind Ensemble Manager**: Orchestrates parallel model execution (Swarm and Fusion modes) with intelligent arbitration. + * **Key Management**: Advanced concurrency control and intelligent key selection. + * **Error Handling**: Escalating cooldowns and automatic recovery. This architecture cleanly separates the API interface from the resilience logic, making the library a portable and powerful tool for any application needing robust API key management. @@ -315,6 +318,148 @@ The `CooldownManager` handles IP or account-level rate limiting that affects all --- +## 2.10. HiveMind Ensemble (`ensemble/`) + +The **HiveMind Ensemble** system enables parallel model execution with intelligent arbitration, supporting two distinct modes: + +### 2.10.1. Swarm Mode + +**Purpose**: Execute the same model multiple times in parallel to generate diverse responses, then synthesize them into a single high-quality output. 
+ +**Key Features**: +- **Temperature Jitter**: Randomly varies temperature across drones (±delta) to increase response diversity +- **Adversarial Mode**: Dedicates N drones as critical reviewers with adversarial prompts to stress-test solutions +- **Blind Switch**: Optionally hides model names from the arbiter to reduce synthesis bias +- **Self-Arbitration**: Can use the same model as arbiter to save costs + +**Configuration** (`ensemble_configs/swarms/*.json`): +- Folder-based preset system with model-specific overrides +- Default configuration applies to all swarms unless overridden +- Preset-based discovery: `{base_model}-{preset_id}[swarm]` format + +**Example Usage**: +```python +response = await client.acompletion( + model="gpt-4o-mini-default[swarm]", + messages=[{"role": "user", "content": "Explain AI"}] +) +# → 3 parallel calls to gpt-4o-mini with temperature jitter +# → Arbiter synthesizes responses into final answer +``` + +### 2.10.2. Fusion Mode + +**Purpose**: Combine responses from multiple specialized models with role-based routing and weighted synthesis. 
+ +**Key Features**: +- **Role Assignment**: Each specialist model receives a custom system prompt defining its expertise +- **Weight Descriptions**: Guide arbiter on which specialist to trust for specific domains +- **Role Templates**: Reusable role definitions stored in `ensemble_configs/roles/` +- **Blind Mode**: Hides model names while preserving role labels +- **Multi-Provider Support**: Can mix models from different providers in a single fusion + +**Configuration** (`ensemble_configs/fusions/*.json`): +- Each fusion defined in its own JSON file or as an array in a single file +- Specialists can reference role templates via `role_template` field +- Supports `weight_description` for arbiter context + +**Example Configuration**: +```json +{ + "id": "dev-team", + "specialists": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt": "Focus on scalability and system design.", + "weight_description": "Expert in architecture. Trust for design decisions." + }, + { + "model": "claude-3-opus", + "role": "Security", + "role_template": "security-expert" + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true + } +} +``` + +### 2.10.3. Arbitration Strategies + +Strategies define how the arbiter synthesizes responses. Stored as plain text files in `ensemble_configs/strategies/*.txt` with `{responses}` placeholder. + +**Built-in Strategies**: +- **synthesis**: Combine best elements from all responses +- **best_of_n**: Select and refine the strongest response +- **code_review**: Code-specific evaluation criteria + +**Custom Strategies**: Users can add their own `.txt` files with custom synthesis prompts. + +### 2.10.4. Recursive Mode + +**Purpose**: Enable autonomous arbiter decision-making for low-consensus scenarios. 
+ +**Mechanism**: +- Arbiter assesses consensus (1-10 scale) +- If consensus < threshold: arbiter performs internal critique reasoning +- If consensus >= threshold: proceeds directly to synthesis +- All internal reasoning wrapped in `[INTERNAL]` tags (filtered from user output) + +**Markers**: +- `[CONSENSUS: X/10]`: Logged at WARN level if below threshold +- `[CONFLICTS: ...]`: Identified disagreement points +- `[CRITIQUE: ...]`: Internal reasoning about conflicts +- `[FINAL SYNTHESIS:]`: Start of user-facing output + +### 2.10.5. Usage Tracking + +HiveMind responses include standard OpenAI-compatible usage fields **plus** supplementary `hivemind_details`: + +**Standard Fields** (aggregated totals from all models): +- `prompt_tokens`: Total prompt tokens (drones/specialists + arbiter) +- `completion_tokens`: Total completion tokens +- `total_tokens`: Grand total + +**Supplementary Breakdown** (`hivemind_details`): +```json +{ + "mode": "swarm" | "fusion", + "drone_count" | "specialist_count": 3, + "drone_tokens" | "specialist_tokens": 450, + "arbiter_tokens": 200, + "total_cost_usd": 0.00123, + "latency_ms": 1523.45 +} +``` + +**Important**: Consumers should use standard `usage` fields for billing/analytics. The `hivemind_details` provides debugging context. + +### 2.10.6. 
Architecture + +**Components**: +- **EnsembleManager** (`manager.py`): Orchestration engine + - Detects ensemble requests (`is_ensemble()`) + - Prepares drones/specialists (`_prepare_drones()`, `_prepare_fusion_models()`) + - Executes parallel calls (`_execute_parallel()`) + - Builds arbiter prompts (`_build_arbiter_prompt()`) + - Handles streaming (`_call_arbiter_streaming()`) + +- **ConfigLoader** (`config_loader.py`): Configuration management + - Loads swarm presets, fusions, strategies, and role templates + - Supports both single-item and array-based file formats + - Validates and merges configurations + +**Integration**: +- Initialized in `RotatingClient.__init__()` +- Intercepts requests in `acompletion()` before normal routing +- Inherits all retry/resilience logic from RotatingClient + +--- + ## 3. Provider Specific Implementations The library handles provider idiosyncrasies through specialized "Provider" classes in `src/rotator_library/providers/`. diff --git a/README.md b/README.md index 72736a49..1a2fc493 100644 --- a/README.md +++ b/README.md @@ -12,6 +12,11 @@ This project provides a powerful solution for developers building complex applic ## Features - **Universal API Endpoint**: Simplifies development by providing a single, OpenAI-compatible interface for diverse LLM providers. 
+- **HiveMind Ensemble**: Parallel model execution with intelligent arbitration in two modes: + - **Swarm Mode**: Run multiple copies of the same model with temperature jitter, adversarial critique, and consensus-based synthesis + - **Fusion Mode**: Combine responses from different specialized models with role-based routing and weighted synthesis + - **Recursive Refinement**: Autonomous arbiter decision-making for low-consensus scenarios with internal critique reasoning + - **Streaming Support**: Full streaming support with real-time arbiter synthesis - **High Availability**: The underlying library ensures your application remains operational by gracefully handling transient provider errors and API key-specific issues. - **Resilient Performance**: A global timeout on all requests prevents your application from hanging on unresponsive provider APIs. - **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests. By default, it supports concurrent requests to *different* models. With configuration (`MAX_CONCURRENT_REQUESTS_PER_KEY_`), it can also support multiple concurrent requests to the *same* model using the same key. @@ -340,11 +345,56 @@ curl -X POST http://127.0.0.1:8000/v1/chat/completions \ }' ``` +### HiveMind Ensemble - Parallel Model Execution + +HiveMind enables you to run multiple models in parallel with intelligent arbitration. Use the `[swarm]` suffix or pre-configured fusion IDs. 
+ +**Swarm Mode** (same model, multiple executions): +```bash +# Explicit preset format +curl -X POST http://127.0.0.1:8000/v1/chat/completions \ +-H "Content-Type: application/json" \ +-H "Authorization: Bearer a-very-secret-and-unique-key" \ +-d '{ + "model": "gpt-4o-mini-aggressive[swarm]", + "messages": [{"role": "user", "content": "Explain quantum computing"}] +}' + +# Short format (requires omit_id: true in preset) +curl -X POST http://127.0.0.1:8000/v1/chat/completions \ +-H "Content-Type: application/json" \ +-H "Authorization: Bearer a-very-secret-and-unique-key" \ +-d '{ + "model": "gpt-4o-mini[swarm]", + "messages": [{"role": "user", "content": "Explain quantum computing"}] +}' +``` + +**Fusion Mode** (multiple specialist models): +```bash +curl -X POST http://127.0.0.1:8000/v1/chat/completions \ +-H "Content-Type: application/json" \ +-H "Authorization: Bearer a-very-secret-and-unique-key" \ +-d '{ + "model": "dev-team[fusion]", + "messages": [{"role": "user", "content": "Review this API design"}] +}' +``` + +HiveMind automatically: +- Executes models in parallel +- Applies temperature jitter for diversity (Swarm mode) +- Routes to specialized models with role prompts (Fusion mode) +- Synthesizes responses using an arbiter model +- Aggregates usage and cost across all calls + +For detailed configuration and advanced features, see the [HiveMind User Guide](docs/HiveMind_User_Guide.md). + ### Available API Endpoints - `POST /v1/chat/completions`: The main endpoint for making chat requests. - `POST /v1/embeddings`: The endpoint for creating embeddings. -- `GET /v1/models`: Returns a list of all available models from your configured providers. +- `GET /v1/models`: Returns a list of all available models from your configured providers (includes HiveMind fusions and swarms). - `GET /v1/providers`: Returns a list of all configured providers. - `POST /v1/token-count`: Calculates the token count for a given message payload. 
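
The swarm variants that `/v1/models` reports are derived from the preset files. A sketch of the documented discovery rules, assuming presets are loaded as plain dicts keyed by preset ID:

```python
from typing import Dict, List

def discover_swarm_ids(presets: Dict[str, dict]) -> List[str]:
    # Presets without base_models stay invisible; omit_id selects the short
    # {model}[swarm] form instead of the explicit {model}-{preset}[swarm] form.
    ids: List[str] = []
    for preset_id, cfg in presets.items():
        for model in cfg.get("base_models", []):
            if cfg.get("omit_id", False):
                ids.append(f"{model}[swarm]")
            else:
                ids.append(f"{model}-{preset_id}[swarm]")
    return ids

presets = {
    "aggressive": {"omit_id": True, "base_models": ["gpt-4o-mini"]},
    "default": {"omit_id": False, "base_models": ["gpt-4o", "claude-3-haiku"]},
}
print(discover_swarm_ids(presets))
# → ['gpt-4o-mini[swarm]', 'gpt-4o-default[swarm]', 'claude-3-haiku-default[swarm]']
```

Note that this governs discovery only; the explicit `{model}-{preset}[swarm]` form still works at runtime for any combination.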
diff --git a/docs/HiveMind_API.md b/docs/HiveMind_API.md index 756a3a80..0ada7c31 100644 --- a/docs/HiveMind_API.md +++ b/docs/HiveMind_API.md @@ -36,19 +36,25 @@ manager.is_ensemble("dev-team") # True manager.is_ensemble("gpt-4o") # False ``` -### `get_base_model(swarm_id: str) -> str` +### `get_base_model(swarm_id: str) -> tuple` -Extract base model name from swarm ID. +Extract base model name and preset ID from swarm ID. **Parameters:** -- `swarm_id` (str): Swarm model ID (e.g., "gemini-1.5-flash[swarm]") +- `swarm_id` (str): Swarm model ID (e.g., "gpt-4o-aggressive[swarm]", "gpt-4o[swarm]") **Returns:** -- `str`: Base model name (e.g., "gemini-1.5-flash") +- `tuple`: (base_model_name, preset_id) + - For `"gpt-4o-aggressive[swarm]"` returns `("gpt-4o", "aggressive")` + - For `"gpt-4o[swarm]"` returns `("gpt-4o", "default")` or omit_id preset **Example:** ```python -base = manager.get_base_model("gpt-4o[swarm]") # "gpt-4o" +base, preset = manager.get_base_model("gpt-4o-aggressive[swarm]") +# base = "gpt-4o", preset = "aggressive" + +base, preset = manager.get_base_model("gpt-4o[swarm]") +# base = "gpt-4o", preset = "default" or omit_id preset for gpt-4o ``` ### `get_fusion_ids() -> List[str]` @@ -106,15 +112,34 @@ Load all configurations from directory structure. **Side Effects:** - Populates `swarm_default`, `swarm_configs`, `fusion_configs`, `strategies` -### `get_swarm_config(model: str) -> Dict[str, Any]` +### `get_swarm_config(preset_id: str) -> Dict[str, Any]` + +Get swarm configuration for a specific preset. + +**Parameters:** +- `preset_id` (str): Preset ID (e.g., "default", "aggressive") + +**Returns:** +- `Dict[str, Any]`: Preset configuration + +### `get_preset_for_model(base_model: str) -> str` -Get swarm configuration for a specific model. +Get the preset ID to use when calling `model[swarm]` (short form). 
**Parameters:**
-- `model` (str): Base model name (without [swarm] suffix)
+- `base_model` (str): Base model name (e.g., "gpt-4o-mini")

**Returns:**
-- `Dict[str, Any]`: Merged configuration (default + model-specific)
+- `str`: Preset ID (omit_id preset for this model, or "default")
+
+**Example:**
+```python
+# If aggressive.json has omit_id=true and base_models=["gpt-4o-mini"]
+preset = loader.get_preset_for_model("gpt-4o-mini")  # "aggressive"
+
+# For models without omit_id preset
+preset = loader.get_preset_for_model("claude-3-haiku")  # "default"
+```

### `get_fusion_config(fusion_id: str) -> Optional[Dict[str, Any]]`

@@ -138,11 +163,37 @@ Get strategy template by name.

### `get_all_fusion_ids() -> List[str]`

-Get list of all fusion IDs.
+Get list of all fusion IDs with [fusion] suffix.

**Returns:**
- `List[str]`: List of fusion identifiers

+### `get_all_swarm_model_ids() -> List[str]`
+
+Get all discoverable swarm model variants for the /v1/models endpoint.
+
+**Discovery Rules:**
+- Preset WITH `base_models` + `omit_id: true` → `{model}[swarm]`
+- Preset WITH `base_models` + `omit_id: false` → `{model}-{preset}[swarm]`
+- Preset WITHOUT `base_models` → Not included (invisible)
+
+**Returns:**
+- `List[str]`: List of swarm model IDs for discovery
+
+**Example:**
+```python
+# With aggressive.json: {"omit_id": true, "base_models": ["gpt-4o-mini"]}
+# With default.json: {"omit_id": false, "base_models": ["gpt-4o", "claude-3-haiku"]}
+
+swarm_ids = loader.get_all_swarm_model_ids()
+# [
+#   "gpt-4o-mini[swarm]",            # From aggressive (omit_id=true)
+#   "gpt-4o-default[swarm]",         # From default (omit_id=false)
+#   "claude-3-haiku-default[swarm]"  # From default (omit_id=false)
+# ]
+```
+
---

## Response Object

@@ -216,24 +267,32 @@ print(f"Latency: {details['latency_ms']}ms")  # 1523.45

### Swarm Configuration

-**File Location:** `ensemble_configs/swarms/*.json`
+**File Location:** `ensemble_configs/swarms/{preset_id}.json`
+
+**Preset-Based System**: Each swarm preset defines
behavior for multiple models via `base_models`. **Schema:** ```json { - "model": "string (optional, only for model-specific configs)", - "suffix": "string (default: '[swarm]')", - "count": "integer (default: 3)", + "id": "string (REQUIRED, preset identifier, must match filename)", + "description": "string (optional)", + + "base_models": [ + "string (model IDs for /v1/models discovery)" + ], + + "omit_id": "boolean (default: false, controls discovery format)", + "count": "integer (default: 3, number of drones)", "temperature_jitter": { "enabled": "boolean", - "delta": "float (temperature variance)" + "delta": "float (temperature variance, ±delta)" }, "arbiter": { "model": "string ('self' or model ID)", - "strategy": "string (strategy name)", - "blind": "boolean (default: true)" + "strategy": "string (strategy template name)", + "blind": "boolean (default: true, hides model names)" }, "adversarial_config": { @@ -249,6 +308,15 @@ print(f"Latency: {details['latency_ms']}ms") # 1523.45 } ``` +**Key Fields:** +- `id`: Preset identifier, used in `{model}-{id}[swarm]` format +- `base_models`: OPTIONAL. Controls /v1/models discovery only. Does NOT restrict runtime usage. +- `omit_id`: OPTIONAL. 
If `true`, shows as `{model}[swarm]` in /v1/models (hides explicit format to reduce clutter) + +**Discovery vs Runtime:** +- **Discovery**: `base_models` and `omit_id` control what appears in /v1/models +- **Runtime**: Explicit format `{model}-{preset}[swarm]` works with ANY model/preset combo + ### Fusion Configuration **File Location:** `ensemble_configs/fusions/*.json` @@ -261,10 +329,12 @@ print(f"Latency: {details['latency_ms']}ms") # 1523.45 "specialists": [ { - " model": "string (model ID)", - "role": "string (specialist role name)", - "system_prompt": "string (role-specific instructions)", - "weight": "float (importance weight, default: 1.0)" + "model": "string (model ID)", + "role": "string (optional, specialist role name)", + "system_prompt": "string (optional, role-specific instructions)", + "weight": "float (optional, importance weight, default: 1.0)", + "weight_description": "string (optional, expertise description for arbiter)", + "role_template": "string (optional, reference to role template from roles/ directory)" } ], diff --git a/docs/HiveMind_User_Guide.md b/docs/HiveMind_User_Guide.md index 32182f16..3308c34a 100644 --- a/docs/HiveMind_User_Guide.md +++ b/docs/HiveMind_User_Guide.md @@ -22,9 +22,16 @@ from rotator_library.client import RotatingClient client = RotatingClient() -# Basic swarm - adds `[swarm]` suffix to any model +# Short form - uses preset with omit_id=true or default preset response = await client.acompletion( - model="gpt-4o-mini[swarm]", # 3 drones by default + model="gpt-4o-mini[swarm]", + messages=[{"role": "user", "content": "What is quantum computing?"}], + stream=False +) + +# Explicit preset format - works with ANY model + ANY preset +response = await client.acompletion( + model="claude-3-haiku-aggressive[swarm]", # Use 'aggressive' preset messages=[{"role": "user", "content": "What is quantum computing?"}], stream=False ) @@ -62,20 +69,50 @@ print(f"Specialists: {response.usage.hivemind_details['specialist_count']}") 3. 
**Arbitration**: An arbiter model synthesizes all responses 4. **Result**: Returns the arbiter's synthesis +### Preset-Based System + +Swarms use a **preset-based configuration** system. Each preset is a JSON file in `ensemble_configs/swarms/` that defines behavior for multiple models. + +**Model Name Formats**: +- **Short form**: `{model}[swarm]` → uses preset with `omit_id: true` OR `default` preset +- **Explicit form**: `{model}-{preset}[swarm]` → always uses specified preset + +**Examples**: +```python +# Short form +await client.acompletion(model="gpt-4o-mini[swarm]", ...) # Uses omit_id preset or default + +# Explicit form +await client.acompletion(model="gpt-4o-mini-aggressive[swarm]", ...) # Uses aggressive preset +await client.acompletion(model="claude-3-haiku-default[swarm]", ...) # Explicit default +``` + +**Key Features**: +- **`base_models`**: Controls /v1/models discovery (which models appear for this preset) +- **`omit_id`**: Controls discovery format (short vs explicit in /v1/models) +- **Runtime**: Explicit format works with ANY model/preset combo regardless of base_models + ### Configuration -Swarm behavior is configured in `src/rotator_library/ensemble_configs/swarms/`: +Swarm presets in `src/rotator_library/ensemble_configs/swarms/`: -**`default.json`** - Global swarm settings: +**`default.json`** - Global fallback: ```json { - "suffix": "[swarm]", + "id": "default", + "description": "Standard balanced settings", + "base_models": [ + "gpt-4o", "gpt-4o-mini", + "claude-3-5-sonnet", "claude-3-haiku", + "gemini-1.5-pro", "gemini-1.5-flash" + ], + "omit_id": false, "count": 3, "temperature_jitter": { "enabled": true, "delta": 0.2 }, - "arb iter": { + "arbiter": { "model": "self", "strategy": "synthesis", "blind": true @@ -92,13 +129,20 @@ Swarm behavior is configured in `src/rotator_library/ensemble_configs/swarms/`: } ``` -**Model-specific configs** (e.g., `gemini-flash.json`): +**Custom preset** (e.g., `aggressive.json`): ```json { - "model": 
"gemini-1.5-flash",
-  "arbiter": {
-    "model": "gpt-4o",
-    "strategy": "synthesis"
+  "id": "aggressive",
+  "base_models": ["gpt-4o-mini", "gemini-1.5-flash"],
+  "omit_id": true, // Shows as model[swarm] in /v1/models
+  "count": 5,
+  "temperature_jitter": {
+    "enabled": true,
+    "delta": 0.3
+  },
+  "adversarial_config": {
+    "enabled": true,
+    "count": 2
+  }
}
```

@@ -120,37 +164,49 @@ Each drone gets a slightly different temperature: `base_temp ± delta`

#### Adversarial Mode

-Adds critical drones to stress-test solutions:
+Converts the last N drones to critical reviewers:

```json
"adversarial_config": {
  "enabled": true,
  "count": 1,
-  "prompt": "You are a Senior Principal Engineer with 15+ years of experience. Your job is to find flaws, edge cases, and potential issues."
+  "prompt": "You are a Senior Principal Engineer. Find flaws, edge cases, and potential issues."
}
```

#### Blind Switch

-Removes model names from arbiter input (enabled by default):
+Hides model names from arbiter (enabled by default):

```json
"arbiter": {
-  "blind": true // Arbiter sees "Response 1", not "Response 1 (GPT-4o)"
+  "blind": true // Arbiter sees "Response 1" instead of "Response 1 (GPT-4o)"
}
```

#### Recursive Mode

-Enables autonomous arbiter decision-making for low-consensus scenarios:
+Enables autonomous arbiter critique for low-consensus responses:

```json
"recursive_mode": {
  "enabled": true,
-  "consensus_threshold": 7 // If consensus < 7/10, arbiter performs internal critique
+  "consensus_threshold": 7 // If consensus < 7/10, performs internal critique
}
```

+#### Discovery vs Runtime
+
+**Discovery (/v1/models endpoint)**:
+- Preset WITH `base_models` + `omit_id: true` → `{model}[swarm]`
+- Preset WITH `base_models` + `omit_id: false` → `{model}-{preset}[swarm]`
+- Preset WITHOUT `base_models` → Not shown (invisible)
+
+**Runtime (actual API calls)**:
+- Short form `model[swarm]` → Uses omit_id preset OR default
+- Explicit form `model-preset[swarm]` → ALWAYS works with ANY
model/preset combo +- `base_models` has NO runtime restrictions + --- ## Fusion Mode diff --git a/src/rotator_library/README.md b/src/rotator_library/README.md index c0207999..08ef6055 100644 --- a/src/rotator_library/README.md +++ b/src/rotator_library/README.md @@ -4,6 +4,13 @@ A robust, asynchronous, and thread-safe Python library for managing a pool of AP ## Key Features +- **HiveMind Ensemble**: Parallel model execution with intelligent arbitration + - **Swarm Mode**: Execute the same model multiple times with temperature jitter, adversarial prompts, and synthesis + - **Fusion Mode**: Combine responses from different specialized models with role-based routing + - **Recursive Refinement**: Autonomous low-consensus handling with internal critique reasoning + - **Configurable Strategies**: Customizable arbitration strategies for different use cases + - **Role Templates**: Reusable specialist role definitions for consistent fusion configurations + - **Blind Mode**: Option to hide model names from arbiter to reduce bias - **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O. - **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests. By default, it supports concurrent requests to *different* models. With configuration (`MAX_CONCURRENT_REQUESTS_PER_KEY_`), it can also support multiple concurrent requests to the *same* model using the same key. - **Smart Key Management**: Selects the optimal key for each request using a tiered, model-aware locking strategy to distribute load evenly and maximize availability. 
@@ -136,6 +143,33 @@ async def stream_example(): asyncio.run(stream_example()) ``` +**HiveMind Ensemble Example:** + +```python +async def hivemind_example(): + async with RotatingClient(api_keys=api_keys) as client: + # Swarm Mode: Multiple parallel calls to same model + swarm_response = await client.acompletion( + model="gpt-4o-mini-default[swarm]", + messages=[{"role": "user", "content": "Explain quantum computing"}] + ) + print(swarm_response.choices[0].message.content) + print(f"Total tokens: {swarm_response.usage.total_tokens}") + print(f"Drones: {swarm_response.usage.hivemind_details['drone_count']}") + + # Fusion Mode: Multiple specialist models + fusion_response = await client.acompletion( + model="dev-team[fusion]", + messages=[{"role": "user", "content": "Review this API design"}] + ) + print(fusion_response.choices[0].message.content) + print(f"Specialists: {fusion_response.usage.hivemind_details['specialist_count']}") + +asyncio.run(hivemind_example()) +``` + +See the [HiveMind User Guide](../../docs/HiveMind_User_Guide.md) and [API Reference](../../docs/HiveMind_API.md) for detailed configuration options. + #### `async def aembedding(self, **kwargs) -> Any:` A wrapper around `litellm.aembedding` that provides the same key management and retry logic for embedding requests. 
diff --git a/src/rotator_library/ensemble_configs/README.md b/src/rotator_library/ensemble_configs/README.md index 0fece40b..6a90d526 100644 --- a/src/rotator_library/ensemble_configs/README.md +++ b/src/rotator_library/ensemble_configs/README.md @@ -6,51 +6,287 @@ This directory contains the configuration for HiveMind Ensemble (Swarm/Fusion) f ``` ensemble_configs/ -├── swarms/ # Swarm configurations -│ ├── default.json # Default swarm settings (applied to all swarms) -│ └── *.json # Model-specific swarm overrides -├── fusions/ # Fusion configurations -│ └── *.json # Individual fusion definitions -└── strategies/ # Arbitration strategy templates - └── *.txt # Strategy prompt templates +├── swarms/ # Swarm preset configurations +│ ├── default.json # Default global settings (fallback) +│ └── *.json # Preset configurations (e.g., aggressive.json, balanced.json) +├── fusions/ # Fusion configurations (multi-model teams) +│ └── *.json # Individual fusion definitions or arrays of fusions +├── strategies/ # Arbitration strategy templates +│ └── *.txt # Strategy prompt templates with {responses} placeholder +└── roles/ # Reusable role template definitions + └── *.json # Role templates for fusion specialists ``` ## Configuration Files -### Swarm Configuration +### Swarm Configuration (Preset-Based) -**Default**: `swarms/default.json` - Applied to all swarm requests +HiveMind uses a **preset-based system** for swarm configurations. Each preset defines a configuration that can be applied to multiple base models. 
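
Resolving a swarm model ID back into its base model and preset can be sketched as below. This is a hypothetical helper, not the library's parser; the real loader also honors `omit_id` presets for the short form, while here it simply falls back to `default`:

```python
from typing import Set, Tuple

def parse_swarm_id(swarm_id: str, known_presets: Set[str]) -> Tuple[str, str]:
    if not swarm_id.endswith("[swarm]"):
        raise ValueError(f"not a swarm id: {swarm_id!r}")
    stem = swarm_id[: -len("[swarm]")]
    # Longest matching trailing "-{preset}" segment wins, so model names
    # that themselves contain hyphens still resolve correctly.
    for preset in sorted(known_presets, key=len, reverse=True):
        if stem.endswith(f"-{preset}"):
            return stem[: -(len(preset) + 1)], preset
    return stem, "default"  # short form falls back to the default preset

print(parse_swarm_id("gpt-4o-mini-aggressive[swarm]", {"aggressive", "default"}))
# → ('gpt-4o-mini', 'aggressive')
print(parse_swarm_id("gpt-4o-mini[swarm]", {"aggressive", "default"}))
# → ('gpt-4o-mini', 'default')
```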
-**Model-Specific**: `swarms/{model-name}.json` - Overrides for specific models +**Format Options**: +- Explicit: `{base_model}-{preset_id}[swarm]` +- Short (if `omit_id: true`): `{base_model}[swarm]` -Example model-specific config: +**Example**: +- `gpt-4o-mini-aggressive[swarm]` - explicitly uses the `aggressive.json` preset +- `gpt-4o-mini[swarm]` - uses `default.json` preset OR a custom preset with `omit_id: true` +- `gpt-4o-mini-default[swarm]` - always uses `default.json` even if omit_id preset exists + +**Preset File Structure** (`swarms/{preset_id}.json`): ```json { - "model": "gemini-1.5-flash", + "id": "aggressive", + "description": "High diversity swarm with adversarial critique", + "base_models": ["gpt-4o-mini", "gemini-1.5-flash", "claude-3-haiku"], + "count": 5, + "temperature_jitter": { + "enabled": true, + "delta": 0.3 + }, + "adversarial_config": { + "enabled": true, + "count": 2, + "prompt": "You are a critical reviewer. Find flaws and edge cases." + }, "arbiter": { - "model": "gpt-4o", + "model": "self", "strategy": "synthesis", "blind": true + }, + "recursive_mode": { + "enabled": true, + "consensus_threshold": 6 } } ``` -### Fusion Configuration +**Key Fields**: +- `id`: Preset identifier (must match filename) +- `base_models`: List of models this preset applies to (enables discovery) +- `omit_id` (optional): If `true`, this preset becomes the default for its `base_models` when using `{model}[swarm]` syntax +- `count`: Number of drones to spawn +- `temperature_jitter`: Randomize temperature for diversity +- `adversarial_config`: Enable critical analysis drones +- `arbiter`: Synthesis configuration +- `recursive_mode`: Autonomous low-consensus handling + +**Omit ID Feature**: When a preset has `"omit_id": true`, it becomes the default for its specified models: +- `gpt-4o-mini[swarm]` → uses the `omit_id` preset instead of `default.json` +- `gpt-4o-mini-default[swarm]` → always uses `default.json` (explicit fallback) +- 
`gpt-4o-mini-aggressive[swarm]` → always uses `aggressive.json` (explicit) + +**Important**: `omit_id` controls ONLY what appears in `/v1/models` for discoverability, not what works at runtime: +- Explicit format (`model-preset[swarm]`) always works regardless of `omit_id` or `base_models` +- You can use ANY model with ANY preset explicitly (e.g., `claude-3-opus-aggressive[swarm]` works even if Claude isn't in aggressive's base_models) + +**Discovery Rules** (`/v1/models` endpoint): +- Preset WITH `base_models` + `omit_id: true` → Shows as `{model}[swarm]` only (explicit form hidden to avoid clutter) +- Preset WITH `base_models` + `omit_id: false` → Shows as `{model}-{preset}[swarm]` only +- Preset WITHOUT `base_models` → Never shown (invisible preset, but still usable with explicit syntax) -Each fusion is defined in its own file: `fusions/{fusion-id}.json` +**`base_models` Purpose**: +- Controls ONLY which models appear in `/v1/models` for this preset +- Does NOT restrict runtime usage - any model can use any preset with explicit syntax +- If empty/missing, preset is "invisible" but fully functional when explicitly referenced -See `dev-team.json` for a complete example. +### Fusion Configuration (Multi-Model Teams) + +Fusions combine responses from different specialized models. Each fusion can have role-based routing and specialist expertise. + +**Single Fusion Format** (`fusions/{fusion-id}.json`): +```json +{ + "id": "dev-team", + "description": "Software development team with specialized roles", + "specialists": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt": "Focus on scalability and system design.", + "weight": 1.5, + "weight_description": "Expert in architecture. Trust for design decisions." 
+ }, + { + "model": "claude-3-opus", + "role_template": "security-expert" + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "code_review", + "blind": false + }, + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7 + } +} +``` + +**Array Format** (multiple fusions in one file): +```json +{ + "fusions": [ + { + "id": "dev-team", + "specialists": [...] + }, + { + "id": "creative-writers", + "specialists": [...] + } + ] +} +``` + +**Specialist Fields**: +- `model`: Provider/model ID +- `role`: Display name for this specialist +- `system_prompt`: Role-specific instructions sent to the model +- `weight`: Numeric importance (for future use) +- `weight_description`: Expertise description for arbiter context +- `role_template`: Reference to a reusable role template (see Roles section) + +**Arbiter Configuration**: +- `model`: Model ID for synthesis (or "self" to use first specialist) +- `strategy`: Strategy template name (from `strategies/` directory) +- `blind`: If `true`, hides model names from arbiter (preserves roles) + +### Role Templates (Reusable Configurations) + +Role templates allow you to define reusable specialist configurations that can be referenced by multiple fusions. + +**Single Role Format** (`roles/{role-id}.json`): +```json +{ + "name": "Security Expert", + "system_prompt": "You are a cybersecurity expert. Focus on vulnerabilities, edge cases, and threat modeling.", + "weight": 1.2, + "weight_description": "Expert in security and vulnerability assessment. Trust for security concerns." +} +``` + +**Array Format** (multiple roles in one file): +```json +{ + "roles": [ + { + "name": "Architect", + "system_prompt": "Focus on system design and scalability.", + "weight_description": "Expert in architectural patterns." + }, + { + "name": "Security Expert", + "system_prompt": "Focus on vulnerabilities and threats.", + "weight_description": "Expert in security assessment." 
+ } + ] +} +``` + +**Usage in Fusions**: +```json +{ + "specialists": [ + { + "model": "claude-3-opus", + "role_template": "security-expert" + } + ] +} +``` + +**Override Behavior**: Specialist configs can override any field from the referenced template. ### Strategy Templates -Each strategy is a text file in `strategies/{strategy-name}.txt` +Each strategy is a plain text file defining how the arbiter should synthesize responses. + +**File Location**: `strategies/{strategy-name}.txt` -Use `{responses}` placeholder for injecting formatted responses. +**Placeholder**: Use `{responses}` where formatted responses should be injected. + +**Example** (`strategies/synthesis.txt`): +``` +You are an expert synthesizer. Analyze the following responses and create a single, superior answer that: +1. Combines the best elements from each response +2. Resolves any conflicts or contradictions +3. Ensures completeness and accuracy +4. Maintains coherence and clarity + +{responses} + +Provide your synthesis as a complete, high-quality response. +``` ## Adding New Configurations -1. **New Swarm Override**: Drop a JSON file in `swarms/` with model-specific settings -2. **New Fusion**: Drop a JSON file in `fusions/` with fusion definition -3. **New Strategy**: Drop a .txt file in `strategies/` with prompt template +1. **New Swarm Preset**: Create `{preset_id}.json` in `swarms/` with `id` and `base_models` fields +2. **New Fusion**: Create `{fusion_id}.json` in `fusions/` OR add to an existing array file +3. **New Strategy**: Create `{strategy_name}.txt` in `strategies/` +4. **New Role Template**: Create `{role_id}.json` in `roles/` OR add to an existing array file All configs are loaded automatically on startup! + +## Advanced Features + +### Temperature Jitter (Swarm) +Randomizes temperature across drones to increase response diversity: +```json +"temperature_jitter": { + "enabled": true, + "delta": 0.2 +} +``` +Each drone gets `base_temp ± delta` (clamped to [0.0, 2.0]). 
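
The jitter rule can be written down directly; a minimal sketch (illustrative, not the library's exact sampling code):

```python
import random
from typing import Optional

def jitter_temperature(base_temp: float, delta: float,
                       rng: Optional[random.Random] = None) -> float:
    # Sample uniformly in [base_temp - delta, base_temp + delta], then
    # clamp to the valid temperature range [0.0, 2.0].
    r = rng if rng is not None else random
    return max(0.0, min(2.0, base_temp + r.uniform(-delta, delta)))

rng = random.Random(42)
temps = [jitter_temperature(0.7, 0.2, rng) for _ in range(3)]
assert all(0.5 <= t <= 0.9 for t in temps)
```

Each drone would receive one such sampled temperature, so identical prompts still explore slightly different decoding behavior.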
+ +### Adversarial Mode (Swarm) +Dedicates N drones as critical reviewers: +```json +"adversarial_config": { + "enabled": true, + "count": 1, + "prompt": "You are a Senior Principal Engineer. Find flaws and edge cases." +} +``` +Last N drones receive the adversarial prompt. Responses are marked `[ADVERSARIAL]` in arbiter input. + +### Recursive Mode (Swarm & Fusion) +Enables autonomous arbiter decision-making: +```json +"recursive_mode": { + "enabled": true, + "consensus_threshold": 7 +} +``` +If consensus < threshold, arbiter performs internal critique before synthesis. All internal reasoning is logged but hidden from user. + +### Blind Switch +Controls whether model names are shown to arbiter: +```json +"arbiter": { + "blind": true +} +``` +- `true`: "Response 1 (Architect role)" (hides model names) +- `false`: "Response 1 (GPT-4o - Architect)" (shows models) + +Roles are **always preserved** regardless of blind setting. + +## Usage Examples + +**Swarm Request**: +```bash +curl -X POST http://localhost:8000/v1/chat/completions \ +-d '{"model": "gpt-4o-mini-aggressive[swarm]", "messages": [...]}' +``` + +**Fusion Request**: +```bash +curl -X POST http://localhost:8000/v1/chat/completions \ +-d '{"model": "dev-team[fusion]", "messages": [...]}' +``` + +For detailed usage and API reference, see: +- [HiveMind User Guide](../../../docs/HiveMind_User_Guide.md) +- [HiveMind API Reference](../../../docs/HiveMind_API.md) diff --git a/src/rotator_library/ensemble_configs/fusions/fusion.example.json b/src/rotator_library/ensemble_configs/fusions/fusion.example.json new file mode 100644 index 00000000..556b0cf6 --- /dev/null +++ b/src/rotator_library/ensemble_configs/fusions/fusion.example.json @@ -0,0 +1,64 @@ +{ + "id": "dev-team", + "description": "Software development team with specialized roles and expertise", + + "_FIELD_DOCUMENTATION": "=== FUSION CONFIGURATION ===", + "_id": "REQUIRED. Fusion identifier. 
Used in model name as: {id}[fusion]", + "_description": "OPTIONAL. Human-readable description of this fusion's purpose.", + + "_specialists": "REQUIRED. Array of specialist model configurations. Each specialist processes the same query with a specialized role/perspective.", + "specialists": [ + { + "_model": "REQUIRED. Provider/model ID (e.g., 'gpt-4o', 'anthropic/claude-3-5-sonnet', 'gemini/gemini-1.5-pro')", + "model": "gpt-4o", + + "_role": "OPTIONAL. Display name for this specialist. Used in arbiter input as 'Role: {response}'. Default: 'Specialist {index}'", + "role": "Architect", + + "_system_prompt": "OPTIONAL. Role-specific instructions injected as system message. Defines this specialist's perspective/expertise.", + "system_prompt": "You are a Software Architect with deep expertise in system design, scalability, and architectural patterns. Focus on:\n- System design and component architecture\n- Scalability and performance considerations\n- Design patterns and best practices\n- Technology stack decisions\n- Long-term maintainability\n\nProvide architectural guidance and recommendations.", + + "_weight": "OPTIONAL (default: 1.0). Numeric importance for future weighted synthesis. Currently used for metadata only.", + "weight": 1.5, + + "_weight_description": "OPTIONAL. Natural language description of this specialist's expertise. Injected into arbiter context to guide synthesis.", + "weight_description": "Expert in architecture and scalability. Trust for design decisions, system architecture, and performance considerations.", + + "_role_template": "OPTIONAL. Reference to reusable role template from roles/ directory. Template fields are merged (specialist config overrides template). 
Cannot be used together with explicit role/system_prompt.", + "role_template": null + }, + { + "model": "claude-3-5-sonnet", + + "_role_template_usage": "Example of using a role template instead of inline configuration", + "role_template": "security-expert", + + "_note": "When using role_template, you can still override fields like model, weight, etc. The template provides role, system_prompt, weight_description as defaults." + }, + { + "model": "gemini/gemini-1.5-pro", + "role": "Code Reviewer", + "system_prompt": "You are a Senior Code Reviewer focused on code quality, maintainability, and best practices. Analyze:\n- Code clarity and readability\n- Error handling and edge cases\n- Testing strategy and coverage\n- Documentation and comments\n- DRY, SOLID, and other principles\n\nProvide actionable code review feedback.", + "weight": 1.2, + "weight_description": "Expert in code quality and maintainability. Trust for code review, testing, and best practices." + } + ], + + "_arbiter": "REQUIRED. Configuration for the model that synthesizes specialist responses.", + "arbiter": { + "_model": "'self' uses first specialist model. Or specify explicit model. Should be reasoning-capable for complex synthesis.", + "model": "gpt-4o", + + "_strategy": "Strategy template name (from strategies/ directory). Default: 'synthesis'. Try 'code_review' for development tasks.", + "strategy": "synthesis", + + "_blind": "If true, hides model names from arbiter (shows roles only). If false, shows both role and model. Default: false for fusions.", + "blind": false + }, + + "_recursive_mode": "OPTIONAL. Same as swarm recursive mode. 
Enables autonomous critique for low-consensus scenarios.", + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7 + } +} diff --git a/src/rotator_library/ensemble_configs/roles/role.example.json b/src/rotator_library/ensemble_configs/roles/role.example.json new file mode 100644 index 00000000..ae645e84 --- /dev/null +++ b/src/rotator_library/ensemble_configs/roles/role.example.json @@ -0,0 +1,14 @@ +{ + "_FIELD_DOCUMENTATION": "=== ROLE TEMPLATE (Single Format) ===", + "_name": "REQUIRED. Display name for this role. Converted to role_id (lowercase, hyphens). Used as: role_template: 'security-expert'", + "name": "Security Expert", + + "_system_prompt": "OPTIONAL. Default system prompt for this role. Can be overridden by specialist config.", + "system_prompt": "You are a cybersecurity expert with deep knowledge of secure coding practices, threat modeling, and vulnerability assessment. Focus on:\n- Security vulnerabilities and exploits\n- Authentication and authorization flaws\n- Data privacy and protection\n- Input validation and sanitization\n- Cryptography and secure communication\n- OWASP Top 10 and common attack vectors\n\nProvide security-focused analysis and recommendations.", + + "_weight": "OPTIONAL (default: 1.0). Default weight for this role. Can be overridden by specialist config.", + "weight": 1.2, + + "_weight_description": "OPTIONAL. Default expertise description. Can be overridden by specialist config.", + "weight_description": "Expert in security and vulnerability assessment. Trust for security concerns, threat modeling, and secure coding practices." 
+} diff --git a/src/rotator_library/ensemble_configs/roles/roles-array.example.json b/src/rotator_library/ensemble_configs/roles/roles-array.example.json new file mode 100644 index 00000000..45cfaaf7 --- /dev/null +++ b/src/rotator_library/ensemble_configs/roles/roles-array.example.json @@ -0,0 +1,25 @@ +{ + "_FIELD_DOCUMENTATION": "=== ROLE TEMPLATE (Array Format) ===", + "_roles": "Array of role template definitions. Each role can be referenced independently by its converted name.", + "roles": [ + { + "_name": "Converted to role_id (e.g., 'Performance Engineer' → 'performance-engineer')", + "name": "Performance Engineer", + "system_prompt": "You are a performance engineering specialist. Focus on optimization, profiling, and scalability.", + "weight": 1.3, + "weight_description": "Expert in performance optimization and scalability analysis." + }, + { + "name": "UX Designer", + "system_prompt": "You are a UX/UI designer with expertise in user-centered design and accessibility.", + "weight": 1.1, + "weight_description": "Expert in user experience, interface design, and accessibility standards." + }, + { + "name": "DevOps Engineer", + "system_prompt": "You are a DevOps specialist focused on CI/CD, infrastructure, deployment, and monitoring.", + "weight": 1.2, + "weight_description": "Expert in deployment, infrastructure, and operational excellence." + } + ] +} diff --git a/src/rotator_library/ensemble_configs/strategies/strategy.example.txt b/src/rotator_library/ensemble_configs/strategies/strategy.example.txt new file mode 100644 index 00000000..76032a08 --- /dev/null +++ b/src/rotator_library/ensemble_configs/strategies/strategy.example.txt @@ -0,0 +1,39 @@ +ARBITRATION STRATEGY TEMPLATE: {strategy_name} + +=== FIELD DOCUMENTATION === +This is a plain text file that defines how the arbiter model should synthesize multiple responses. 
+
+PLACEHOLDER: {responses}
+- This will be replaced with formatted drone/specialist responses
+- Format: "Response 1:\n<response text>\n\nResponse 2:\n<response text>\n..."
+- For fusion: "Role (Model):\n<response text>\n..." (if blind=false) or "Role:\n<response text>\n..." (if blind=true)
+
+SPECIALIST EXPERTISE (Fusion only):
+- If fusion mode, an additional "SPECIALIST EXPERTISE" section is auto-appended
+- Lists each specialist's role, model, and weight_description
+- Helps arbiter understand domain expertise when synthesizing
+
+RECURSIVE MODE:
+- If enabled, additional "AUTONOMOUS DECISION PROTOCOL" instructions are appended
+- Guides arbiter through consensus assessment and conflict resolution
+- Internal reasoning is wrapped in [INTERNAL] tags and hidden from user
+===
+
+=== EXAMPLE STRATEGY ===
+
+You are an expert synthesizer with deep analytical capabilities.
+
+Your task is to analyze the following responses and create a single, superior answer that combines the best insights from each perspective.
+
+{responses}
+
+Guidelines for synthesis:
+1. **Identify Core Insights**: Extract key points and unique perspectives from each response
+2. **Resolve Conflicts**: If responses disagree, evaluate which perspective is most sound based on evidence and reasoning
+3. **Merge Complementary Ideas**: Combine non-conflicting insights into a cohesive whole
+4. **Fill Gaps**: If all responses miss something important, include it based on your own expertise
+5. **Maintain Accuracy**: Never introduce hallucinations - stay grounded in the provided responses
+6. **Ensure Completeness**: Address all aspects of the original query
+7. **Optimize Clarity**: Present the final answer in clear, well-structured language
+
+Your synthesized response should be more comprehensive and insightful than any individual response while maintaining accuracy and coherence.
diff --git a/src/rotator_library/ensemble_configs/swarms/preset.example.json b/src/rotator_library/ensemble_configs/swarms/preset.example.json new file mode 100644 index 00000000..6f918cd6 --- /dev/null +++ b/src/rotator_library/ensemble_configs/swarms/preset.example.json @@ -0,0 +1,65 @@ +{ + "id": "aggressive", + "description": "High diversity swarm with adversarial critique. Use for complex problems requiring multiple perspectives and critical analysis.", + + "_FIELD_DOCUMENTATION": "=== SWARM PRESET CONFIGURATION ===", + "_id": "REQUIRED. Preset identifier. Must match filename (e.g., 'aggressive' for aggressive.json). Used in model name: {base_model}-{id}[swarm]", + "_description": "OPTIONAL. Human-readable description of this preset's purpose and characteristics.", + + "_base_models": "OPTIONAL. List of models this preset applies to. Controls /v1/models discovery. If omitted, preset is invisible but still usable with explicit syntax.", + "base_models": [ + "gpt-4o-mini", + "gemini-1.5-flash", + "claude-3-haiku" + ], + + "_omit_id": "OPTIONAL (default: false). If true, shows as {model}[swarm] in /v1/models instead of {model}-{id}[swarm]. Becomes the default preset for these models. Explicit format always works regardless of this setting.", + "omit_id": false, + + "_count": "REQUIRED. Number of parallel drone executions (2-10 recommended). More drones = more diversity but higher cost.", + "count": 5, + + "_temperature_jitter": "OPTIONAL. Adds random temperature variation to each drone for increased response diversity.", + "temperature_jitter": { + "_enabled": "Enable/disable jitter", + "enabled": true, + + "_delta": "Maximum temperature deviation (±delta). Each drone gets base_temp ± random(0, delta). Clamped to [0.0, 2.0]", + "delta": 0.3 + }, + + "_adversarial_config": "OPTIONAL. 
Dedicates the last N drones as critical reviewers with a custom prompt.", + "adversarial_config": { + "_enabled": "Enable/disable adversarial drones", + "enabled": true, + + "_count": "Number of drones to convert to adversarial mode (taken from the end of the drone list)", + "count": 2, + + "_prompt": "System prompt injected into adversarial drones. Should instruct them to find flaws, edge cases, and issues.", + "prompt": "You are a Senior Principal Engineer with 15+ years of experience. Your role is to find edge cases, security vulnerabilities, performance bottlenecks, and incorrect assumptions. Be thorough and critical in your analysis. Focus on:\n- Edge cases that could cause failures\n- Security implications and potential exploits\n- Performance and scalability concerns\n- Maintainability and code quality issues\n- Incorrect assumptions in the solution\n\nProvide constructive criticism to improve the solution." + }, + + "_arbiter": "REQUIRED. Configuration for the model that synthesizes all drone responses into a final answer.", + "arbiter": { + "_model": "'self' uses the base model as arbiter. Or specify explicit model (e.g., 'gpt-4o', 'claude-3-5-sonnet'). Should be a reasoning-capable model.", + "model": "self", + + "_strategy": "Name of strategy template file (from strategies/ directory, without .txt extension). Default: 'synthesis'", + "strategy": "synthesis", + + "_blind": "If true, hides model names from arbiter to reduce bias. Still shows drone numbers (Response 1, Response 2, etc.)", + "blind": true + }, + + "_recursive_mode": "OPTIONAL. Enables autonomous arbiter critique when consensus is low. Requires reasoning-capable arbiter.", + "recursive_mode": { + "_enabled": "Enable/disable recursive refinement", + "enabled": true, + + "_consensus_threshold": "Threshold (1-10 scale). 
If arbiter detects consensus < threshold, performs internal critique before synthesis.", + "consensus_threshold": 6, + + "_note": "Arbiter internally evaluates consensus, identifies conflicts, critiques responses, then synthesizes. Internal reasoning is logged but hidden from user output." + } +}
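Taken together, the naming rules in these examples (`{base_model}-{id}[swarm]`, the `omit_id` default form, and `{id}[fusion]`) imply a small parsing step before dispatch. A hypothetical sketch, assuming preset ids are known from the loaded `swarms/` configs (the function name and return shape are illustrative, not the project's actual API):

```python
import re

def parse_hivemind_name(name, known_presets):
    """Split a model name like 'gpt-4o-mini-aggressive[swarm]' or
    'dev-team[fusion]' into its HiveMind components. Returns None for
    plain (non-ensemble) model names. Illustrative sketch only."""
    m = re.fullmatch(r"(?P<body>.+)\[(?P<mode>swarm|fusion)\]", name)
    if m is None:
        return None
    body, mode = m.group("body"), m.group("mode")
    if mode == "fusion":
        return {"mode": "fusion", "id": body}
    # Swarm: look for a trailing "-{preset_id}" suffix; otherwise assume
    # the default (omit_id) preset applies to the bare base model.
    for preset in known_presets:
        if body.endswith("-" + preset):
            return {"mode": "swarm",
                    "base_model": body[: -(len(preset) + 1)],
                    "preset": preset}
    return {"mode": "swarm", "base_model": body, "preset": None}

parsed = parse_hivemind_name("gpt-4o-mini-aggressive[swarm]", {"aggressive"})
```

Because preset ids may themselves contain hyphens, matching longest preset suffix first would be a sensible refinement in a real implementation.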