Merged
6 changes: 6 additions & 0 deletions docs/plan/adr/ADR-001-message-bus-architecture.md
@@ -12,6 +12,12 @@ raw pointers (`sendMessage(msg, BaseAgent* target)`). Code review identified thi
critical architectural debt that would not scale to the 4-layer hierarchy (L0→L1→L2→L3)
and 100+ agent deployments planned for Phases 2-5.

> **Note (ADR-015)**: The 4-layer HMAS hierarchy (L0 `ChiefArchitectAgent`,
> L1 `ComponentLeadAgent`, L2 `ModuleLeadAgent`, L3 `TaskAgent`) has since been
> extracted from ProjectKeystone into **ProjectAgamemnon** (see ADR-015). This ADR
> documents the original MessageBus design decision, which still applies to
> Keystone's transport primitives. References to agent types below are historical.

### Problems with Direct Coupling

1. **Tight Coupling**: Agents require direct pointer references to communicate
@@ -7,7 +7,14 @@

## Context

-ProjectKeystone requires a high-performance scheduler to manage concurrent agent execution across multiple worker threads. Early phases used basic thread pools, but scaling to 100+ agents with coroutine-based execution demanded a more sophisticated approach.
+ProjectKeystone required a high-performance scheduler to manage concurrent agent execution
+across multiple worker threads during early development. Scaling to 100+ agents with
+coroutine-based execution demanded a sophisticated approach.

> **Note (ADR-015)**: The HMAS agent hierarchy (`ChiefArchitectAgent`, `ComponentLeadAgent`,
> `ModuleLeadAgent`, `TaskAgent`) has been extracted into **ProjectAgamemnon**. The
> `WorkStealingScheduler` itself remains in Keystone as a transport concurrency primitive.
> References to agent types and the "4-layer hierarchy" below are historical context.

### Requirements

@@ -7,6 +7,12 @@

## Context

> **Note (ADR-015)**: The concrete agent types discussed in this ADR
> (`TaskAgent`, `ChiefArchitectAgent`, `ModuleLeadAgent`, `ComponentLeadAgent`) have
> been extracted into **ProjectAgamemnon**. The C++20 Concepts defined here continue to
> apply to agent implementations in ProjectAgamemnon; this ADR is the authoritative
> record for the compile-time interface verification design.

Issue #24 identified that the agent interface has no compile-time verification that agents
implement the required methods. Errors are only caught at link time or runtime when methods
are missing or have incorrect signatures.
6 changes: 6 additions & 0 deletions docs/plan/adr/ADR-008-async-agent-unification.md
@@ -6,6 +6,12 @@

## Context and Problem Statement

> **Note (ADR-015)**: The agent hierarchy (`TaskAgent`, `ChiefArchitectAgent`,
> `ModuleLeadAgent`, `ComponentLeadAgent`) described in this ADR has been extracted
> into **ProjectAgamemnon**. This ADR is preserved as the historical record of the
> unification decision; the `BaseAgent` class hierarchy it documents now lives in
> ProjectAgamemnon, not in ProjectKeystone.

The codebase had a dual hierarchy with both synchronous (`BaseAgent`) and asynchronous (`AsyncBaseAgent`) agent classes. This created:
- Code duplication (two versions of every agent type)
- Type system complexity (couldn't have uniform collections)
@@ -6,6 +6,12 @@

## Context and Problem Statement

> **Note (ADR-015)**: The `TaskAgent` and other agent types referenced throughout this
> ADR have been extracted into **ProjectAgamemnon**. This ADR documents a proposed design
> pattern for separating agent domain logic from infrastructure concerns; its
> implementation status applies to ProjectAgamemnon, not to ProjectKeystone's transport
> layer.

Agents currently mix domain logic with infrastructure concerns:
- `processMessage()` implementations contain business logic (bash execution, delegation, synthesis)
- Infrastructure concerns (inbox management, routing, metrics, deadlines) are coupled with domain logic
5 changes: 5 additions & 0 deletions docs/plan/adr/ADR-010-architecture-issue-resolution.md
@@ -1,5 +1,10 @@
# ADR-010: P0 Critical Architecture Issues - RESOLVED ✅

> **Note (ADR-015)**: Agent types referenced in code examples below
> (`ChiefArchitectAgent`, `TaskAgent`, etc.) have been extracted into
> **ProjectAgamemnon**. The transport-layer fixes documented here (MessageBus
> lifetime safety, scheduler thread safety) remain part of ProjectKeystone.

## Overview
This document tracked the P0 (critical) architecture issues identified in the comprehensive code review.

20 changes: 20 additions & 0 deletions include/core/metrics.hpp
@@ -108,6 +108,23 @@ class Metrics {
  };
  PriorityStats getPriorityStats() const;

  /**
   * @brief Set the current number of in-flight task claims.
   *
   * Called by the TaskClaimer (or its C++ bridge) to report how many
   * advance_dag_tracked tasks are currently executing. The value is snapshotted
   * on each call and exposed as a Prometheus gauge.
   *
   * @param count Number of tasks currently in-flight.
   */
  void setInFlightCount(int64_t count);

  /**
   * @brief Get the current in-flight task claim count.
   * @return Last value set by setInFlightCount(), or 0 if never set.
   */
  int64_t getInFlightCount() const;

  /**
   * @brief Record a deadline miss
   * @param msg_id Message identifier that missed deadline
@@ -179,6 +196,9 @@ class Metrics {
  std::atomic<uint64_t> deadline_misses_{0};
  std::atomic<int64_t> total_deadline_miss_ms_{0};

  // In-flight task claim count (reported by TaskClaimer)
  std::atomic<int64_t> in_flight_count_{0};

  // Throughput calculation
  std::chrono::steady_clock::time_point start_time_;
};
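
The header above only declares a snapshot-style setter and getter; how the TaskClaimer bridge decides what value to report is not shown in this diff. As a minimal sketch of one possible caller, the block below uses a stand-in `GaugeMetrics` class that mirrors just the two new methods, plus a hypothetical `InFlightScope` RAII helper. Neither name exists in the PR; both are assumptions made for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for the gauge portion of Metrics shown in the diff (assumed name).
class GaugeMetrics {
 public:
  void setInFlightCount(int64_t count) {
    in_flight_count_.store(count, std::memory_order_relaxed);
  }
  int64_t getInFlightCount() const {
    return in_flight_count_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<int64_t> in_flight_count_{0};
};

// Hypothetical helper, not part of this PR: bumps a caller-owned counter on
// entry and exit of a task, then reports the new value through the setter.
class InFlightScope {
 public:
  InFlightScope(GaugeMetrics& metrics, std::atomic<int64_t>& active)
      : metrics_(metrics), active_(active) {
    // fetch_add returns the previous value, so add 1 to get the new count.
    metrics_.setInFlightCount(active_.fetch_add(1, std::memory_order_relaxed) + 1);
  }
  ~InFlightScope() {
    // fetch_sub returns the previous value, so subtract 1 to get the new count.
    metrics_.setInFlightCount(active_.fetch_sub(1, std::memory_order_relaxed) - 1);
  }
  InFlightScope(const InFlightScope&) = delete;
  InFlightScope& operator=(const InFlightScope&) = delete;

 private:
  GaugeMetrics& metrics_;
  std::atomic<int64_t>& active_;
};
```

Because the setter overwrites rather than increments, two threads that update the counter and then call `setInFlightCount()` can publish their values out of order, leaving the gauge briefly stale until the next report. That is usually acceptable for a monitoring gauge, but it is a property of this sketch, not a documented guarantee of the PR.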
1 change: 1 addition & 0 deletions include/monitoring/prometheus_exporter.hpp
@@ -24,6 +24,7 @@ namespace monitoring {
 * - hmas_worker_utilization_percent - Gauge of worker utilization
 * - hmas_deadline_misses_total - Counter of deadline misses
 * - hmas_deadline_miss_milliseconds - Gauge of average miss time
 * - keystone_task_claimer_in_flight_count - Gauge of active advance_dag_tracked tasks
 */
class PrometheusExporter {
public:
9 changes: 9 additions & 0 deletions src/core/metrics.cpp
@@ -182,6 +182,14 @@ Metrics::PriorityStats Metrics::getPriorityStats() const {
      low_priority_count_.load(std::memory_order_relaxed)};
}

void Metrics::setInFlightCount(int64_t count) {
  in_flight_count_.store(count, std::memory_order_relaxed);
}

int64_t Metrics::getInFlightCount() const {
  return in_flight_count_.load(std::memory_order_relaxed);
}

void Metrics::recordDeadlineMiss(const std::string& /* msg_id */, int64_t late_by_ms) {
  deadline_misses_.fetch_add(1, std::memory_order_relaxed);
  total_deadline_miss_ms_.fetch_add(late_by_ms, std::memory_order_relaxed);
@@ -214,6 +222,7 @@ void Metrics::reset() {
  total_worker_samples_.store(0, std::memory_order_relaxed);
  deadline_misses_.store(0, std::memory_order_relaxed);
  total_deadline_miss_ms_.store(0, std::memory_order_relaxed);
  in_flight_count_.store(0, std::memory_order_relaxed);

  {
    std::lock_guard<std::mutex> lock(timestamps_mutex_);
6 changes: 6 additions & 0 deletions src/monitoring/prometheus_exporter.cpp
@@ -312,6 +312,12 @@ std::string PrometheusExporter::generateMetrics() {
  ss << "# TYPE hmas_uptime_seconds gauge\n";
  ss << "hmas_uptime_seconds " << uptime_seconds << "\n";

  // In-flight task claim count (gauge - TaskClaimer advance_dag_tracked tasks)
  ss << "# HELP keystone_task_claimer_in_flight_count Number of advance_dag_tracked "
        "tasks currently executing in the TaskClaimer\n";
  ss << "# TYPE keystone_task_claimer_in_flight_count gauge\n";
  ss << "keystone_task_claimer_in_flight_count " << metrics.getInFlightCount() << "\n";

  // Health status (gauge - always 1 if responding)
  ss << "# HELP hmas_up HMAS health status (1=up, 0=down)\n";
  ss << "# TYPE hmas_up gauge\n";
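
Each gauge in the hunk above is written as three lines of Prometheus text exposition: a `# HELP` line, a `# TYPE` line, and a sample line. The sketch below factors that pattern into a free function to make the output shape explicit. `renderGauge` is a hypothetical helper introduced here for illustration; the PR itself writes the lines inline in `generateMetrics()`.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not in the PR): renders one unlabelled gauge in the
// Prometheus text exposition format, matching the inline writes in the diff.
std::string renderGauge(const std::string& name, const std::string& help,
                        int64_t value) {
  std::ostringstream ss;
  ss << "# HELP " << name << " " << help << "\n";  // human-readable description
  ss << "# TYPE " << name << " gauge\n";           // metric type declaration
  ss << name << " " << value << "\n";              // the sample itself
  return ss.str();
}
```

Under these assumptions, `renderGauge("keystone_task_claimer_in_flight_count", "...", metrics.getInFlightCount())` would produce the same three lines as the added block in `generateMetrics()`, given the same help text.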