From 6208e4d532248e6ee9d4870ddfb3e39a73cac56b Mon Sep 17 00:00:00 2001
From: prashantbytesyntax
Date: Thu, 4 Jun 2026 20:43:57 +0000
Subject: [PATCH 1/2] docs: refresh README architecture diagram to cover dynamic profiling

The legacy `system_overview.png` only depicts the one-way continuous
data plane (agent -> backend -> S3 -> indexer -> ClickHouse) and
predates the dynamic profiling work. Replace the System Overview
section with a Mermaid diagram that captures both:

- Continuous profiling data plane (existing).
- Dynamic profiling control plane: heartbeat protocol,
  ProfilingRequests / ProfilingCommands / ProfilingExecutions, agent
  two-slot architecture (ContinuousProfilerSlot + AdhocProfilerSlot)
  with CommandManager priority queue, and optional Intel(R) PerfSpect
  HW metrics path.

Also expands the component list with the new control-plane endpoints
and links out to heartbeat_doc/ for full details. Marks the old PNG as
legacy rather than deleting it.

Co-authored-by: Cursor
---
 README.md | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 116 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 3d6e6ec9..9d3ac093 100644
--- a/README.md
+++ b/README.md
@@ -29,17 +29,120 @@ featuring advanced flamegraph analysis tools.
 
 
 ## System Overview
-![system_overview.png](system_overview.png)
 
-The Continuous Profiler is structured around several key microservices,
-each playing a vital role in its functionality:
-- `src/gprofiler/backend` - This is the web application backend. It exposes all APIs to the frontend or API users and is responsible for collecting data from agents.
-- `src/gprofiler/frontend` - The User Interface of Continuous Profiler, facilitating interaction with the backend.
-- `src/gprofiler_indexer` - This service is tasked with collecting raw profiling data from S3 storage and indexing it for ClickHouse, a database management system.
-- `src/gprofiler_flamedb_rest` - Handles communication with ClickHouse for the purpose of constructing flamegraphs.
-- `src/gprofiler_logging` - Dedicated to collecting logs from agents, ensuring a comprehensive logging system.
+The Continuous Profiler Performance Studio supports **two profiling modes** that share the
+same backend services and storage layer:
+
+1. **Continuous profiling (data plane)** — Agents periodically upload profile data which is
+   indexed into ClickHouse and visualized as flame graphs.
+2. **Dynamic profiling (control plane)** — Operators trigger ad-hoc / on-demand profiling
+   sessions from the UI; agents fetch start / stop commands via a heartbeat protocol and can
+   optionally collect Intel® PerfSpect hardware metrics. See
+   [`heartbeat_doc/README_HEARTBEAT.md`](heartbeat_doc/README_HEARTBEAT.md) and
+   [`heartbeat_doc/PERFSPECT_DYNAMIC_PROFILING.md`](heartbeat_doc/PERFSPECT_DYNAMIC_PROFILING.md)
+   for details.
+
+### Architecture diagram
+
+```mermaid
+flowchart TB
+    classDef agent fill:#eef4ff,stroke:#4c6ef5,color:#1a2740
+    classDef backend fill:#ede9fe,stroke:#7c3aed,color:#1a2740
+    classDef ui fill:#fef3c7,stroke:#d97706,color:#1a2740
+    classDef store fill:#f1f5f9,stroke:#475569,color:#1a2740
+    classDef aws fill:#fff7ed,stroke:#ea580c,color:#1a2740
+
+    subgraph AGENT["gProfiler Agent (host)"]
+        direction TB
+        HB["Heartbeat loop<br/>POST /api/metrics/heartbeat"]
+        CMQ["CommandManager<br/>(priority queue:<br/>stop > ad-hoc > continuous)"]
+        CS["ContinuousProfilerSlot"]
+        AS["AdhocProfilerSlot"]
+        PSI["PerfSpect installer<br/>(HW metrics, optional)"]
+        LOG["Agent logs"]
+        HB --> CMQ
+        CMQ --> CS
+        CMQ --> AS
+        CS -. enable_perfspect .-> PSI
+        AS -. enable_perfspect .-> PSI
+    end
+    class AGENT,HB,CMQ,CS,AS,PSI,LOG agent
+
+    subgraph UI["Frontend UI (src/gprofiler/frontend)"]
+        FG["Flame graph & search views"]
+        CTRL["Dynamic profiling console<br/>Start / Stop, PIDs,<br/>PerfSpect HW metrics"]
+    end
+    class UI,FG,CTRL ui
+
+    subgraph BE["Performance Studio Backend"]
+        WEBAPP["webapp / backend (FastAPI)<br/>src/gprofiler/backend<br/>• /api/metrics/profile_request[/bulk]<br/>• /api/metrics/heartbeat<br/>• /api/metrics/command_completion<br/>• PMU + capacity validation<br/>• Slack notifications"]
+        LOGSVC["agents-logs-backend<br/>src/gprofiler_logging"]
+        IDX["gprofiler_indexer"]
+        REST["gprofiler_flamedb_rest"]
+    end
+    class BE,WEBAPP,LOGSVC,IDX,REST backend
+
+    PG[("PostgreSQL<br/>HostHeartbeats<br/>ProfilingRequests<br/>ProfilingCommands<br/>ProfilingExecutions<br/>service metadata")]
+    CH[("ClickHouse<br/>flamedb")]
+    S3[["AWS S3<br/>(profile data + adhoc)"]]
+    SQS[["AWS SQS<br/>(indexer queue)"]]
+    class PG,CH store
+    class S3,SQS aws
+
+    %% --- Continuous data plane ---
+    CS -- "upload profile" --> WEBAPP
+    AS -- "upload adhoc flamegraph" --> WEBAPP
+    WEBAPP -- "store raw" --> S3
+    WEBAPP -- "enqueue index task" --> SQS
+    SQS -- "trigger" --> IDX
+    IDX -- "read raw" --> S3
+    IDX -- "write samples" --> CH
+    REST -- "query" --> CH
+    WEBAPP -- "query flames" --> REST
+
+    %% --- Logs ---
+    LOG -- "POST agent logs" --> LOGSVC
+    LOGSVC --> PG
+
+    %% --- Metadata ---
+    WEBAPP <--> PG
+
+    %% --- Dynamic profiling control plane ---
+    HB <==> |"heartbeat ↑<br/>command ↓"| WEBAPP
+    CTRL -- "create profiling request" --> WEBAPP
+
+    %% --- UI queries ---
+    FG -- "queries" --> WEBAPP
+```
 
-This architecture allows for efficient handling and analysis of profiling data, providing users with an intuitive and powerful tool for performance analysis.
+### Components
+
+The Performance Studio is structured around several key microservices:
+
+- `src/gprofiler/backend` — Web application backend (FastAPI). Exposes APIs to the frontend
+  and API users, ingests continuous profile data from agents, and hosts the **dynamic
+  profiling control plane** (`/api/metrics/heartbeat`, `/api/metrics/profile_request`,
+  `/api/metrics/profile_request/bulk`, `/api/metrics/command_completion`,
+  `/api/metrics/profiling/host_status`).
+- `src/gprofiler/frontend` — Web UI. Renders flame graphs and provides the dynamic
+  profiling console (Start / Stop, target hosts and PIDs, PerfSpect hardware-metrics
+  toggle).
+- `src/gprofiler_indexer` — Reads raw profiling data from S3 (triggered via SQS) and
+  indexes it into ClickHouse.
+- `src/gprofiler_flamedb_rest` — REST layer that the backend uses to query flame graphs
+  out of ClickHouse.
+- `src/gprofiler_logging` (`agents-logs-backend`) — Collects logs from agents.
+
+The **gProfiler agent** itself (see [intel/gprofiler](https://github.com/intel/gprofiler))
+runs on each profiled host and contributes two roles:
+
+- **Data plane** — Continuously uploads profile data to the backend.
+- **Control plane** — When started with `--enable-heartbeat-server`, runs a heartbeat
+  loop that fetches start / stop commands from the backend and dispatches them through a
+  `CommandManager` priority queue into one of two execution slots
+  (`ContinuousProfilerSlot`, `AdhocProfilerSlot`) which can run in parallel for
+  non-overlapping profiler types. Optionally bootstraps Intel® PerfSpect for hardware
+  metrics collection.
 
 ### External Dependencies: AWS Services
 The Continuous Profiler incorporates specific AWS services as essential components.
@@ -52,6 +155,10 @@ These dependencies are:
 
 You are welcome to replace those services with other similar which implement the same API, like Minio for S3 and RabbitMQ for SQS.
 
+> **Note:** The legacy `system_overview.png` diagram only depicted the continuous data
+> plane and predates dynamic profiling. The Mermaid diagram above is the current source of
+> truth.
+
 ## Usage
 
 ### Pre-requisites

From 678cb673236515d9beeff204c135b170e47203dd Mon Sep 17 00:00:00 2001
From: prashantbytesyntax
Date: Thu, 4 Jun 2026 20:50:02 +0000
Subject: [PATCH 2/2] docs: simplify agent block in architecture diagram for blog use

Co-authored-by: Cursor
---
 README.md | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/README.md b/README.md
index 9d3ac093..b13d1e23 100644
--- a/README.md
+++ b/README.md
@@ -52,21 +52,8 @@ flowchart TB
     classDef store fill:#f1f5f9,stroke:#475569,color:#1a2740
     classDef aws fill:#fff7ed,stroke:#ea580c,color:#1a2740
 
-    subgraph AGENT["gProfiler Agent (host)"]
-        direction TB
-        HB["Heartbeat loop<br/>POST /api/metrics/heartbeat"]
-        CMQ["CommandManager<br/>(priority queue:<br/>stop > ad-hoc > continuous)"]
-        CS["ContinuousProfilerSlot"]
-        AS["AdhocProfilerSlot"]
-        PSI["PerfSpect installer<br/>(HW metrics, optional)"]
-        LOG["Agent logs"]
-        HB --> CMQ
-        CMQ --> CS
-        CMQ --> AS
-        CS -. enable_perfspect .-> PSI
-        AS -. enable_perfspect .-> PSI
-    end
-    class AGENT,HB,CMQ,CS,AS,PSI,LOG agent
+    AGENT["gProfiler Agent (host)<br/>• continuous + ad-hoc profiling<br/>• optional Intel® PerfSpect HW metrics"]
+    class AGENT agent
 
     subgraph UI["Frontend UI (src/gprofiler/frontend)"]
         FG["Flame graph & search views"]
@@ -90,8 +77,7 @@ flowchart TB
     class S3,SQS aws
 
     %% --- Continuous data plane ---
-    CS -- "upload profile" --> WEBAPP
-    AS -- "upload adhoc flamegraph" --> WEBAPP
+    AGENT -- "upload profile / adhoc flamegraph" --> WEBAPP
     WEBAPP -- "store raw" --> S3
     WEBAPP -- "enqueue index task" --> SQS
     SQS -- "trigger" --> IDX
@@ -101,14 +87,14 @@ flowchart TB
     WEBAPP -- "query flames" --> REST
 
     %% --- Logs ---
-    LOG -- "POST agent logs" --> LOGSVC
+    AGENT -- "POST agent logs" --> LOGSVC
     LOGSVC --> PG
 
     %% --- Metadata ---
     WEBAPP <--> PG
 
     %% --- Dynamic profiling control plane ---
-    HB <==> |"heartbeat ↑<br/>command ↓"| WEBAPP
+    AGENT <==> |"heartbeat ↑<br/>command ↓"| WEBAPP
     CTRL -- "create profiling request" --> WEBAPP
 
     %% --- UI queries ---