fix: comprehensive dashboard, backend, and schema improvements by MysticRyuujin · Pull Request #15 · NethermindEth/json-bench

MysticRyuujin · 2025-12-11T21:20:13Z

This vibe coded PR fixes a ton of issues I had with the dashboards including but not limited to:

Fixes test names
Incorrect Success Rates
- Fixes issue where Avg Success Rate would read 0% as would Per-Client Metrics
- Fixes issue where JSON-RPC would return error but it would not record it as an error
Incorrect Response Time Distribution
- Mean would report Infinitys
- Std Dev would report NaNs
Baseline Management Fixes
- I could not save or create new Baselines in the Dashboard
  - I did my best to ensure the save and delete feature is working and properly displays all the required UI elements
  - I have no idea what this is actually suppose to do...
  - I think the button "Set as Baseline" at the top of the page is still broken
Fixed Throughput Trends UI
- Request Throughput now looks more correct and trendy

I also tried to fix some metrics in Grafana for the k6 Prometheus dashboard, specifically Iterations, HTTP Latency Stats (p99/p95/avg/med), and HTTP Latency Distribution (min/avg/med/p90/p95/max)

I have tested all of these changes locally:

docker compose down --remove-orphans --volumes
docker compose up -d --build
go run ./runner/main.go -config ~/tmp/config/mix.yaml -clients ~/tmp/config/clients.yaml -historic -storage-config ./config/storage-example.yaml
open outputs/report.html
open http://localhost:8080

However, there are a handful of things that PR touches that I'm not overly confident in...

I had to make changes to the database schema to make Baseline stuff work in the GUI, while I tested everything locally, it's completely unclear to me how any of the baseline stuff was intended to work in the first place, if it ever did? I could use some guidance here.
The outputs/report.html ALWAYS says that connection pooling is 0 and it should be enabled, but I don't think these metrics were being captured correctly, if at all, I don't think k6 reports this? I removed that because I wasn't sure what else to do.
Copilot has quite a few suggestions about fixing the API that are currently "out of scope" but maybe should be addressed? 😅

- Sync init-db-clean.sql schema with init-db.sql - Fix dashboard components (LatencyDistribution, ThroughputChart, BaselineManager) - Improve error handling and edge cases in runner - Update API client metrics and server handling - Fix performance analyzer and data exporter - Address code review feedback on documentation and error handling

Copilot

Pull request overview

This PR implements comprehensive fixes for dashboard metrics, baseline management, and database schema improvements. The changes address critical issues with success rate calculations, response time distributions showing invalid values (Infinity/NaN), and non-functional baseline management features. Additionally, it removes incorrectly tracked connection pooling metrics and updates Grafana dashboards with correct k6 Prometheus metric queries.

Key Changes

Metrics Accuracy: Fixed success rate calculations by tracking JSON-RPC errors via k6_checks_rate metrics, and added proper handling for Infinity/NaN values in latency distributions
Baseline Management: Enhanced baseline functionality with git branch/commit tracking, proper API integration, and improved UI for baseline CRUD operations
Database Schema: Added full_results and git fields to support baseline features, changed ID types from UUID to VARCHAR for better readability, and removed foreign key constraints (with documented risks)

Reviewed changes

Copilot reviewed 26 out of 28 changed files in this pull request and generated 16 comments.

Show a summary per file

File	Description
runner/types/types.go	Added SuccessRate field to ClientMetrics for explicit success tracking
runner/storage/postgres.go	Added full_results JSONB column support and improved JSON marshaling error handling
runner/storage/historic.go	Extract test name and description from config for better run metadata
runner/storage/grafana_schema.go	Added full_results column to benchmark_runs table schema
runner/metrics/results_collection.go	Track JSON-RPC errors using k6_checks_rate metrics to fix success rate calculations
runner/main.go	Updated prometheus endpoint flag description for clarity
runner/generator/ultimate_html_report.go	Removed non-functional connection metrics from HTML reports
runner/generator/k6_generator.go	Added logic to properly construct Prometheus remote write URL
runner/exporter/data_exporter.go	Removed connection metrics from CSV exports
runner/api/test_types.go	Added GitBranch and GitCommit fields to Baseline struct
runner/api/server.go	Implemented global trends endpoint for dashboard metrics
runner/api/client_metrics.go	Added SuccessRate field computation in client metrics
runner/analyzer/performance_analyzer.go	Removed recommendations for unmeasured connection metrics
runner/analysis/baseline.go	Enhanced baseline management with git tracking and fallback metrics handling
metrics/init-db.sql	Updated schema with new columns, changed UUID to VARCHAR IDs, removed foreign keys
metrics/init-db-clean.sql	Applied same schema changes for clean database initialization
metrics/dashboards/k6-dashboard.json	Fixed Grafana queries to use correct k6 metric names (iterations, latency stats)
dashboard/src/utils/data-transformers.ts	Handle full_results as both string and pre-parsed object
dashboard/src/pages/RunDetails.tsx	Integrated proper baseline management hooks with API
dashboard/src/pages/Dashboard.tsx	Improved empty states and added warnings about throughput approximations
dashboard/src/hooks/useDetailedMetrics.ts	Refactored aggregation logic with better error handling
dashboard/src/components/ThroughputChart.tsx	UI improvements for better layout and formatting
dashboard/src/components/LatencyDistribution.tsx	Fixed NaN/Infinity handling in statistics calculations
dashboard/src/components/BaselineManager.tsx	Added extensive handling for API naming inconsistencies (camelCase vs snake_case)
dashboard/src/api/client.ts	Fixed baseline API calls to use snake_case as required by backend
go.sum	Removed unused dependency entries
README.md	Fixed output path in documentation
.gitignore	Removed snapshot directory entries

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cbermudez97 · 2025-12-15T15:17:14Z

@MysticRyuujin PR LGTM in general. I have approved but I see some Copilot comments that you might want to check. If they are okay then let me know to merge. 👍

cbermudez97 · 2025-12-15T15:25:57Z

About the points:

Baseline comparisons where there when I started making changes so not very familiar with them. However, iirc they were not working so any improvements is welcome. The idea is to have like a summary for each of the benchmarks runs so we can compare different versions/clients performance.
Yes, think thats correct and the connection pool metric should be removed.
I would just take a look at DB ones and other security suggestions for the queries. Besides those think the rest are not critical. However, maybe its just better to address them now instead of leaving them there forever. 😄

Dashboard - Light/dark/system theme with persistence - Overview reworked: grouped by test_name with per-test cards, recent-runs heatmap, global client leaderboard, search + chip filter - /tests/:name detail page: per-client latency trend over time, per-method leaderboard, runs table - Per-method axis on the trend chart (selectable RPC method) - Current client builds panel showing web3_clientVersion per client - Removed misleading synthetic Latency Distribution histogram; replaced with per-client p50/p95/p99/max bar chart - Throughput Trends chart clicks navigate to that run - Client filter highlights the leaderboard row without collapsing it - Client Performance Analysis dropdown actually filters the table - "View Details" kebab opens the per-method drawer (was a no-op) - Section header controls no longer collapse the parent Baselines - Set/list/delete already worked; comparison endpoint now has a UI - New BaselineComparisonView with overall + per-client deltas, status badge, risk level, recommendations - Per-client status flips on any significantly worse signal (was washed out by a composite overall_score) - ListBaselines/GetBaseline COALESCE run_id so ON DELETE SET NULL preserves the snapshot - FK reconciled to benchmark_runs at runtime; orphan rows self-heal - API accepts both run_id and runId on POST /api/baselines - Frontend typed Baseline interface, dropped defensive snake/camel helpers Client versions - New runner/clientinfo package: parallel web3_clientVersion calls at run start, "unknown" on failure - benchmark_runs.client_versions JSONB column + v5 migration - Surfaced under each client row, in the per-client trend tooltip (with diamond markers on version transitions), and in a new Current builds panel on Overview + TestDetail Grafana - Removed 3 dashboards bound to the SimpleJSON plugin (Angular, refused by Grafana 13). Their use cases are covered by the React dashboard. Removed the dead datasource + 6 alert rules that pointed at it. Empty "JSON-RPC Bench" folder cleaned up - Fixed testid default on k6-dashboard.json Backend cleanup - handleGetGlobalTrends switch-style SQL, no string interpolation; throughput derived from total_requests/EXTRACT(EPOCH FROM duration) - ListRuns now returns clients/methods/client_versions - handleListRuns accepts both ?test and ?testName - TrendPoint exposes runId so charts can navigate - has_result accounting no longer double-counts transport errors; warning routed through log instead of fmt.Printf - postgres.go drops duplicate error logging in InsertRun marshal paths - baselineManager.Start() is actually called now, so migrations run Tests - New dashboard/tests/e2e/baseline.spec.ts with 17 Playwright tests covering the API contract, baseline lifecycle, comparison view, overview structure, theme toggle, version display, leaderboard filtering, and the per-test detail page

MysticRyuujin · 2026-05-13T19:25:15Z

So I know it's been like 6 months 😆 but I decided to come back for more pain

I fixed the stuff copilot called out, and then I went crazy and added a bunch of changes to the dashboard, complete redesign, it's much nicer, more readable, etc. and I cleaned up a bunch of stuff that was "out of scope"

Claude got a lot better in the last 6 months 🤣

I highly suggest checking out this branch yourself and running it yourself, since it's such a large re-write.

Dashboard theme system + grouped-by-test Overview with leaderboard + heatmap + chip filter
New /tests/:name page with per-client trend (with method axis) + per-method leaderboard + runs table
Replaced misleading synthetic histogram with real per-client percentile bars
Throughput chart now navigates on click; trend points carry runId
Baselines: live comparison view, fixed ON DELETE SET NULL semantics, accept both casings
web3_clientVersion captured at run start, surfaced on Overview, TestDetail, and RunDetails; tooltips on the trend chart mark version transitions
Removed 3 dead Grafana dashboards (Angular-plugin, refused by Grafana 13) + their dead datasource and 6 dead alert rules
All Copilot review feedback from PR fix: comprehensive dashboard, backend, and schema improvements #15 addressed (SQL safety, FK constraints, schema migrations, naming standardization, has_result math, etc.)
17 Playwright tests covering the API contract, baseline lifecycle, comparison view, overview, theme persistence, version display, leaderboard filtering, per-test detail page

Copilot

Pull request overview

Copilot reviewed 60 out of 63 changed files in this pull request and generated 1 comment.

Files not reviewed (1)

dashboard/package-lock.json: Language not supported

YAML is parsed non-strictly so unrecognized keys like 'frequency: "20/s"' are silently dropped from a Call, leaving Weight=0 across the board. When the k6_generator then divides by totalWeight=0, it emits an empty requests.csv and every iteration throws 'No more requests found' — caught silently inside the script. The runner reports a 'successful' run with 3001 iterations but zero actual HTTP requests, and every downstream aggregate (total_requests, avg_latency_ms, p95_latency_ms, methods[]) is zero. Validate that the sum of Call.Weight is > 0 at config-load time and mention the silently-dropped frequency footgun in the error message.

When the runner is invoked from inside the official docker image, the git binary isn't installed and the source isn't bind-mounted, so getGitInfo() returns empty strings — every saved run has empty git_commit and git_branch, which makes branch/commit columns and grouping useless in the dashboard. Honour GIT_COMMIT and GIT_BRANCH env vars in front of the existing 'git rev-parse' shell-out. CI pipelines and docker-compose setups can now pass git info explicitly (e.g. via .env files). Falls back to 'git rev-parse' for source-checkout runs — existing behaviour is unchanged when env vars are unset.

If CollectClientsMetrics returns no usable data (most often because -prometheus-rw is unset, points at an unreachable endpoint, or Prometheus has no remote-write samples for this run's test_name), the runner used to log a single Warn and continue saving a historic row where every per-client and per-method aggregate is zero. The dashboard then shows the run as 'successful' with 0 requests, 0 latency, no methods — masking the real failure and forcing the user to debug why the data looks empty. Promote the existing Warn to Error and, when -historic is set, refuse to save the run if no client reports TotalRequests > 0. The Fatal includes a specific diagnostic pointing at the most likely cause.

On a run-detail page the breadcrumb was previously: Dashboard > Run Details > 20260515-... with neither intermediate level clickable. To get back to the parent test (e.g. hoodi-read-heavy), users had to use the browser back button or retype the URL. Insert the test name as a linked breadcrumb when the run has loaded: Dashboard > hoodi-read-heavy > 20260515-... ^^^^^^^^^^^^^^ The test link goes to /tests/<test_name> via the existing TestDetail route. Omitted while the run is still loading to avoid flicker.

The 'Latency Percentiles by Client' bar chart used a single y-axis for p50, p95, p99 and max. When one client has a tail-latency outlier (e.g. geth's 202 ms max in a run where every typical-case bar is under 10 ms), the linear scale compresses every other bar into the floor — readers can't tell besu's p99 from erigon's p50 because they're both single-pixel strokes. Split into two side-by-side panels with independent y-axes: Typical (p50 / p95 / p99) | Worst-case (max) Each scale auto-fits its own data range, so the typical-case ordering and the worst-case ordering are both readable simultaneously. Click handling is preserved on both panels — clicking any bar still filters the table below to that client. Sized 2:1 because the typical panel carries 3 bars per client while the worst-case panel only carries 1.

The test-detail page (TestDetail.tsx) displays AVG SUCCESS as one of four stat cells and renders a per-method leaderboard ranking clients by p95. When a test's calls reverted or returned JSON-RPC errors (e.g. an eth_estimateGas pointed at a non-payable contract), the runner stored success_rate=0 but k6 still measured the response time of those failed calls and the dashboard happily ranked clients on that meaningless data. Surface this loudly: - red banner above the leaderboard when avg_success < 50% ("timing failed requests"), yellow when 50% <= avg < 95% - color the Success column in the Recent runs table the same way so any 0% / low-success row is immediately visible in the table

PerClientTrendChart drew a line chart even when the series had a single data point per client, which renders as an empty plot area on Chart.js — making the chart look broken right after the first test run of a new test_name. When totalPoints==0 keep the existing 'No data' message; when maxPerSeries<2 show a clearer 'needs at least 2 runs to plot a line' explanation so the user knows the chart isn't broken, just under-fed. Hooks order is preserved (count useMemos run unconditionally before any return) to comply with React's rules of hooks.

GetHistoricSummary was a placeholder returning 'not implemented', which caused every request to GET /api/tests/{testName}/summary to log an ERROR and return 500. The dashboard masks this (TestDetail.tsx computes its own client-side summary from the runs list), but it pollutes the runner's stderr and breaks any external consumer. Implement it directly: list runs matching the filter via the existing db.ListRuns, then pick best/worst by p95 in a single pass with a zero-success-rate guard so a reverting test isn't crowned 'best'. Trends, regressions, and improvements remain nil — those are higher-level analyzer outputs that callers can populate separately.

Copilot AI review requested due to automatic review settings December 11, 2025 21:20

Copilot started reviewing on behalf of MysticRyuujin December 11, 2025 21:20 View session

Copilot AI reviewed Dec 11, 2025

View reviewed changes

cbermudez97 approved these changes Dec 15, 2025

View reviewed changes

MysticRyuujin requested a review from Copilot May 13, 2026 19:27

Copilot started reviewing on behalf of MysticRyuujin May 13, 2026 19:27 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread runner/storage/migration.go Outdated

MysticRyuujin added 9 commits May 13, 2026 14:34

fix: include full_results in schema migration

ccd0949

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: comprehensive dashboard, backend, and schema improvements#15

fix: comprehensive dashboard, backend, and schema improvements#15
MysticRyuujin wants to merge 11 commits into
NethermindEth:developfrom
MysticRyuujin:develop

MysticRyuujin commented Dec 11, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cbermudez97 commented Dec 15, 2025 •

edited

Loading

Uh oh!

cbermudez97 commented Dec 15, 2025

Uh oh!

MysticRyuujin commented May 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

MysticRyuujin commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cbermudez97 commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cbermudez97 commented Dec 15, 2025

Uh oh!

MysticRyuujin commented May 13, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

MysticRyuujin commented Dec 11, 2025 •

edited

Loading

cbermudez97 commented Dec 15, 2025 •

edited

Loading