⚠️Warning⚠️ This issue was written by an AI Agent
I (@samjewell) only supervised it and asked it questions.
Context
I've been analyzing CI performance for the grafana-cube-datasource plugin and found several opportunities to speed things up. Faster CI runs would significantly improve my development workflow, allowing for quicker feedback loops and more efficient iteration.
After analyzing a recent CI run (PR grafana/grafana-cube-datasource#109), I discovered optimization opportunities across the entire CI pipeline. I wanted to share these findings with the team so we can improve CI performance for all plugins using these workflows.
Overall CI Performance Summary
Total CI time: 7.4 minutes (443 seconds)
Key Findings
- Critical path: "Test and build plugin" job takes 209s (47% of total)
  - Backend build: 110s (52.6% of build job)
  - Frontend build: 40s (19.1% of build job)
- Playwright tests: 4 parallel jobs (~3 min each), wall time ~3.6 min
  - Good parallelization (saves ~9 minutes vs sequential)
Build & Test Optimization Opportunities
These optimizations relate to the main CI workflow (.github/workflows/ci.yml):
1. Enable Go Module Caching (HIGH IMPACT - saves ~30-50s)
Problem: Go modules are downloaded every run, even when dependencies haven't changed
Current State: The CI workflow supports go-setup-caching input, but it's not enabled by default
Proposed Solution:
Enable Go module caching by default in the CI workflow, or at minimum document it better so plugin maintainers can easily enable it:
# In plugin-ci-workflows/.github/workflows/ci.yml
# The go-setup-caching input should be enabled by default or better documented
For plugin maintainers: They can enable it by adding to their workflow:
with:
  go-setup-caching: true
Expected Savings: 30-50 seconds per CI run
Location: .github/workflows/ci.yml - Setup step (line ~365-376)
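For context, here is a minimal sketch of where that input would sit in a plugin's own workflow. The job name, the trigger, and the `uses:` path and ref are assumptions about how the plugin calls the shared workflow; only the `go-setup-caching` input comes from the analysis above.

```yaml
# Sketch of a caller workflow in the plugin repository.
# Assumptions: the job name, the trigger, and the reusable-workflow path/ref below.
name: CI
on:
  pull_request:

jobs:
  ci:
    # Assumed reference to the shared workflow; pin to the ref your plugin already uses.
    uses: grafana/plugin-ci-workflows/.github/workflows/ci.yml@main
    with:
      # Input named in this issue; reuses downloaded Go modules between runs.
      go-setup-caching: true
```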
2. Backend Build Optimization (MEDIUM IMPACT - potential savings ~20-40s)
Problem: Backend build takes 110s (52.6% of the build job)
Current State: Go tests and compilation run sequentially
Potential Optimizations:
- Review if all Go tests need to run on every PR
- Consider test parallelization if not already enabled
- Check if there are slow integration tests that could be moved to a separate job
- Better Go module caching (covered above)
Expected Savings: 20-40 seconds (depending on test suite)
Location: .github/workflows/ci.yml - Backend test/build step
4. Frontend Build Optimization (LOW IMPACT - potential savings ~10-20s)
Problem: Frontend build takes 40s (19.1% of build job)
Current State: The workflow should already cache npm dependencies
Potential Optimizations:
- Verify node_modules caching is working properly
- Consider if all frontend tests need to run on every PR
- Optimize webpack/build configuration if possible
Expected Savings: 10-20 seconds
Location: .github/workflows/ci.yml - Frontend test/build step
Playwright E2E Test Optimization Opportunities
Per-job wall time: ~184s (3.1 minutes)
Each Playwright test job currently takes approximately 3 minutes, with the following breakdown:
| Step | Duration | % of Total | Notes |
| --- | --- | --- | --- |
| Start Grafana | 74s | 40.2% | ⚠️ BIGGEST BOTTLENECK - Pulling Docker image |
| Install Playwright Browsers | 31s | 16.8% | Should be cached but might not be working |
| Install npm dependencies | 30s | 16.3% | Reasonable if cache is working |
| Run Playwright tests | 19s | 10.3% | Actual test execution |
| Wait for Grafana to start | 8s | 4.3% | Health check wait |
| Other setup steps | 22s | 12.0% | Checkout, cache checks, etc. |
1. Docker Image Caching (HIGH IMPACT - saves ~50-60s per job)
Problem: Grafana Docker images are being pulled every run (74s - 40% of total time)
The "Start Grafana" step runs:
docker compose ${DOCKER_COMPOSE_FILE:+-f "$DOCKER_COMPOSE_FILE"} up -d
This pulls the Grafana Docker image from the registry every time, even if it hasn't changed.
Proposed Solutions:
- Enable Docker BuildKit cache mounts (recommended when the compose file builds images; BuildKit speeds up builds, not pulls of prebuilt images):
    env:
      DOCKER_BUILDKIT: 1
      COMPOSE_DOCKER_CLI_BUILD: 1
- Pre-pull images with caching: Add a step before "Start Grafana" to pull images with Docker's layer caching. On a fresh runner the pull still downloads every layer unless the image was restored from a cache first:
    - name: Pull Grafana image
      run: |
        docker pull "${GRAFANA_IMAGE}:${GRAFANA_VERSION}" || true
      env:
        GRAFANA_IMAGE: ${{ matrix.GRAFANA_IMAGE.NAME }}
        GRAFANA_VERSION: ${{ matrix.GRAFANA_IMAGE.VERSION }}
- Use Docker registry cache/proxy: Configure a local registry cache to speed up pulls
Expected Savings: 50-60 seconds per job
Location: .github/workflows/playwright.yml - "Start Grafana" step (line ~215)
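One way to persist the image between runs is to store it as a tarball with `actions/cache` and load it before "Start Grafana". The following is a sketch under assumptions, not the workflow's current behaviour: the step names, the step id, the tarball path, and the cache key format are all hypothetical, and the `matrix.GRAFANA_IMAGE` fields are used as in the pre-pull proposal above.

```yaml
# Sketch only: step names, id, tarball path, and key format are assumptions.
- name: Restore Grafana image tarball
  id: grafana-image-cache
  uses: actions/cache@v4
  with:
    path: /tmp/grafana-image.tar
    key: grafana-image-${{ matrix.GRAFANA_IMAGE.NAME }}-${{ matrix.GRAFANA_IMAGE.VERSION }}

# Cache hit: load the image into the local Docker daemon, so compose does not pull it.
- name: Load cached Grafana image
  if: steps.grafana-image-cache.outputs.cache-hit == 'true'
  run: docker load --input /tmp/grafana-image.tar

# Cache miss: pull once, then write the tarball that actions/cache saves at job end.
- name: Pull and save Grafana image
  if: steps.grafana-image-cache.outputs.cache-hit != 'true'
  run: |
    docker pull "${GRAFANA_IMAGE}:${GRAFANA_VERSION}"
    docker save --output /tmp/grafana-image.tar "${GRAFANA_IMAGE}:${GRAFANA_VERSION}"
  env:
    GRAFANA_IMAGE: ${{ matrix.GRAFANA_IMAGE.NAME }}
    GRAFANA_VERSION: ${{ matrix.GRAFANA_IMAGE.VERSION }}
```

Two caveats apply. A key built from a mutable tag (for example `latest`) would keep serving a stale image. Restoring and loading a large tarball is also not guaranteed to beat a registry pull, so the saving should be measured before adopting this.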
2. Playwright Browser Cache (MEDIUM IMPACT - saves ~20-25s per job)
Problem: Playwright browsers are being installed even though cache is configured (31s - 17% of time)
The workflow currently has:
- name: Cache Playwright
  uses: actions/cache@...
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ steps.version.outputs.version }}
- name: Install Playwright Browsers
  run: npx playwright install --with-deps chromium
However, browsers are still being installed every run, suggesting the cache might not be restoring properly.
Proposed Solutions:
- Add conditional installation (this assumes the "Cache Playwright" step is given id: cache, which the snippet above does not show):
    - name: Install Playwright Browsers
      if: steps.cache.outputs.cache-hit != 'true'
      run: npx playwright install --with-deps chromium
- Verify cache key stability: Ensure the Playwright version detection is stable
- Add cache hit logging: Add debug output to verify cache is working:
    - name: Check cache status
      run: echo "Cache hit: ${{ steps.cache.outputs.cache-hit }}"
Expected Savings: 20-25 seconds per job (when cache hits)
Location: .github/workflows/playwright.yml - "Cache Playwright" and "Install Playwright Browsers" steps (lines ~167-175)
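Putting these proposals together, a possible shape for the steps is sketched below. The step id `playwright-cache`, the `runner.os` component in the key, and the `actions/cache` major version are assumptions; `steps.version` is taken to be the existing version-detection step referenced in the current workflow. Because `--with-deps` also installs OS packages that are not stored under `~/.cache/ms-playwright`, the sketch still installs those packages on a cache hit.

```yaml
# Sketch only: the step id, key format, and action version are assumptions.
- name: Cache Playwright
  id: playwright-cache
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ runner.os }}-${{ steps.version.outputs.version }}

# Cache miss: download browser binaries and install the OS packages they need.
- name: Install Playwright browsers and OS dependencies
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: npx playwright install --with-deps chromium

# Cache hit: browser binaries were restored, but OS packages are not part of the cache.
- name: Install Playwright OS dependencies only
  if: steps.playwright-cache.outputs.cache-hit == 'true'
  run: npx playwright install-deps chromium

# Debug output to confirm whether the cache was restored.
- name: Check cache status
  run: echo "Cache hit: ${{ steps.playwright-cache.outputs.cache-hit }}"
```

How much of the 31s is browser download versus OS package installation is not broken out in the analysis, so the realised saving may be smaller than the estimate.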
3. npm Dependencies in Playwright (LOW IMPACT - saves ~10-15s per job)
Problem: npm install takes 30s
Proposed Solutions:
- Verify npm cache is working: Ensure actions/setup-node cache is properly configured
- Use npm ci --prefer-offline: If cache exists, prefer offline mode:
    - name: Install npm dependencies
      run: npm ci --prefer-offline || npm ci
- Ensure setup-node cache is enabled: Verify the Node.js setup step has caching enabled
Expected Savings: 10-15 seconds per job
Location: .github/workflows/playwright.yml - npm install step
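As a reference point for that verification, a Node.js setup step with caching enabled could look like the sketch below. The `.nvmrc` file and the action version are assumptions about the plugin repository; the install command is the one proposed above.

```yaml
# Sketch only: assumes the repository has an .nvmrc and a package-lock.json.
- name: Setup Node.js
  uses: actions/setup-node@v4
  with:
    node-version-file: .nvmrc
    # Caches npm's download cache, keyed on the lockfile.
    cache: npm

- name: Install npm dependencies
  # Prefer packages already in the restored cache, fall back to a normal install.
  run: npm ci --prefer-offline || npm ci
```

Note that `actions/setup-node` caches npm's download cache rather than `node_modules`, so `npm ci` still runs on every job; the saving comes from avoiding network fetches.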
Total Potential Impact
Build & Test Optimizations:
- Go module caching: ~30-50s saved
- npm caching for lockfile check: ~20-30s saved (plugin-specific)
- Backend build optimization: ~20-40s saved (potential)
- Frontend build optimization: ~10-20s saved (potential)
Total build/test savings: ~80-140 seconds per CI run
Playwright E2E Optimizations:
- With Docker caching: ~50-60s saved per job
- With Playwright cache fix: ~20-25s saved per job
- With npm optimization: ~10-15s saved per job
Total Playwright savings: ~80-100 seconds per job (43-54% reduction)
For a typical plugin with 4 parallel Playwright jobs: This would reduce wall time from ~184s to ~84-104s per job
Combined Impact
Total potential savings: ~160-240 seconds per CI run (2.7-4 minutes)
This would reduce total CI time from 7.4 minutes to approximately 3.4-4.7 minutes - a significant improvement for developer productivity!
Recommendations Priority
High Priority:
- Enable Go module caching by default - Easy win, saves 30-50s
- Fix Docker image caching in Playwright - Biggest single win (saves 50-60s per job)
Medium Priority:
- Verify/fix Playwright browser cache - Saves 20-25s per job
- Document npm caching for lockfile checks - Helps plugin maintainers save 20-30s
Low Priority:
- Optimize npm dependency installation in Playwright - Saves 10-15s per job
- Backend/Frontend build optimizations - Requires more investigation
Additional Context
This analysis was done by examining the CI run for grafana-cube-datasource PR #109, which had a total CI time of 7.4 minutes. These optimizations would benefit all plugins using these workflows.
I'm happy to help implement these changes or provide more details if needed!