Add runtime-aware fitness scoring to metrics collector #47

Merged
Pan14ek merged 5 commits into main from experiment/runtime-aware-scoring on Jun 4, 2026

Pan14ek (Owner) commented Jun 4, 2026

Purpose

Implement experiment/runtime-aware-scoring as an additive, versioned runtime score in the fitness metrics collector while preserving the existing structural fitnessScore.

Type

  • 🧪 Experiment (fitness function research)
  • ✨ Feature (new business logic)
  • 🐛 Bug fix
  • ♻️ Refactor / neutral change
  • 📝 Documentation only

Changes

  • Added runtimeFitnessScore, runtimeFitnessScoreVersion, and runtimeFitness metadata to the collector output.
  • Kept top-level fitnessScore as the historical structural CI score; runtime evidence is not mixed into it.
  • Added local runtime scoring from RUNTIME_METRICS_JSON and RUNTIME_BASELINE_METRICS_JSON.
  • Added SLO threshold parsing/classification from slo-thresholds-v1.json.
  • Added optional local metrics output through METRICS_OUTPUT_JSON.
  • Extracted runtime scoring logic into a dedicated module for maintainability.
  • Updated backend docs and marked experiment/runtime-aware-scoring as implemented.
  • Updated Notion knowledge base entries for the runtime-aware scoring contract.
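
As a rough illustration of the additive contract described above, the output document might be shaped like this. This is a minimal sketch: only the field names `fitnessScore`, `runtimeFitnessScore`, `runtimeFitnessScoreVersion`, `runtimeFitness`, and `eligibleForStableComparison` come from this PR; the `CollectorOutput` type and the `buildOutput` helper are hypothetical, and the real document carries more fields.

```typescript
// Hypothetical types: the real collector document has more fields than shown here.
interface RuntimeFitnessMetadata {
  eligibleForStableComparison: boolean;
}

interface CollectorOutput {
  fitnessScore: number;                 // historical structural CI score
  runtimeFitnessScore: number | null;   // null when runtime summaries are not comparable
  runtimeFitnessScoreVersion: "runtime-aware-v1";
  runtimeFitness: RuntimeFitnessMetadata;
}

// Hypothetical helper: assembles the document without blending the two scores.
function buildOutput(
  structuralScore: number,
  runtimeScore: number | null,
  eligible: boolean,
): CollectorOutput {
  return {
    fitnessScore: structuralScore, // passed through unchanged, never mixed with runtime evidence
    runtimeFitnessScore: runtimeScore,
    runtimeFitnessScoreVersion: "runtime-aware-v1",
    runtimeFitness: { eligibleForStableComparison: eligible },
  };
}

// Example: structural score present, runtime evidence not comparable.
const exampleOutput = buildOutput(0.82, null, false);
```

Keeping the runtime score in its own versioned field means existing consumers of `fitnessScore` see no change in meaning.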

Files changed

| File | Change |
| --- | --- |
| fitness-metrics-collector/scripts/collectMetrics.ts | Wires runtime inputs, baseline lookup, SLO classification, local output, and final document shape |
| fitness-metrics-collector/scripts/runtimeScoring.ts | Adds runtime-aware v1 scoring and SLO classification helpers |
| fitness-metrics-collector/scripts/collectMetrics.test.ts | Adds focused tests for runtime scoring, SLO classification, missing metrics, zero baselines, and local output |
| docs/runtime-aware-scoring.md | Documents the scoring contract, normalization rules, config, and eligibility model |
| docs/README.md | Links the new runtime-aware scoring documentation |
| docs/experiment-tasks.md | Marks experiment/runtime-aware-scoring as implemented |

Expected result

Runtime score is emitted separately from the structural CI score. Existing structural metrics should remain neutral unless CI reports change due to unrelated noise.

| Metric | Baseline (main) | Branch | Direction |
| --- | --- | --- | --- |
| checkstyle_violations | | | neutral |
| spotbugs_total | | | neutral |
| line_coverage | | | neutral |
| critical_violations | | | neutral |
| code_smells | | | neutral |
| duplicated_lines_density | | | neutral |
| F score | | | neutral |
| SonarQube QG | | | unknown until CI |

Additional runtime output:

| Runtime field | Expected |
| --- | --- |
| runtimeFitnessScore | number when current and baseline runtime summaries are comparable, otherwise null |
| runtimeFitnessScoreVersion | runtime-aware-v1 |
| runtimeFitness.eligibleForStableComparison | false when hard SLO evidence is invalid or unknown |
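
The "number when comparable, otherwise null" rule can be sketched as follows. This is an illustration only, not the formula in runtimeScoring.ts: the `RuntimeSummary` type, the `p95LatencyMs` field, the ratio-based normalization, and the zero-baseline rule are all assumptions made for the example.

```typescript
// Hypothetical summary with a single lower-is-better metric.
type RuntimeSummary = { p95LatencyMs?: number };

// Returns a score in [0, 1], or null when the two summaries cannot be compared.
function scoreAgainstBaseline(
  current?: RuntimeSummary,
  baseline?: RuntimeSummary,
): number | null {
  const cur = current?.p95LatencyMs;
  const base = baseline?.p95LatencyMs;

  // Missing or invalid evidence on either side: not comparable.
  if (cur === undefined || base === undefined) return null;
  if (!Number.isFinite(cur) || !Number.isFinite(base) || cur < 0 || base < 0) return null;

  // Assumed zero-baseline rule: avoid dividing by zero and decide explicitly.
  if (base === 0) return cur === 0 ? 1 : 0;

  // Assumed normalization: 1 when no slower than baseline, shrinking as latency grows.
  return Math.min(1, base / cur);
}
```

Returning `null` instead of `0` keeps "no evidence" distinguishable from "worst possible score".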

Verification

  • npm test passes locally from fitness-metrics-collector/
  • npm run build passes locally from fitness-metrics-collector/
  • git diff --check passes
  • Runtime-only local collector smoke test writes METRICS_OUTPUT_JSON
  • ./gradlew check passes locally
  • ./gradlew jacocoTestReport passes locally
  • No intended changes to backend product business logic
  • Branch name matches experiment naming convention

Review note

The runtime-aware scoring changes are intended to be the PR scope. Before opening or merging, confirm the branch does not include unrelated web/mobile scaffold, IDE, or .DS_Store files.

Pan14ek (Owner, Author) commented Jun 4, 2026

Code Review: Add runtime-aware fitness scoring to metrics collector

❌ Blockers — must resolve before merge

1. Out-of-scope files inflate the diff by ~27,000 lines

The PR body itself flagged this risk, but the branch was not cleaned. The diff contains content completely unrelated to runtime scoring:

| Unrelated content | Notes |
| --- | --- |
| lined-web/ (entire scaffold) | ~27,000 of 29,702 additions |
| .idea/ IDE config files | 6+ files |
| .DS_Store (root, backend/, fitness-metrics-analyzer/) | Binary noise |
| backend/lined/.beads/ interaction logs | Should not be committed |

Do not merge until these are removed. Cherry-pick the collector commits onto a clean branch from main, or rebase and drop the unrelated paths interactively.

2. .DS_Store must be added to the root .gitignore

It appears three times in the diff, which means it is not currently excluded. Add it to .gitignore so it cannot be committed again.


🟠 Should-fix

3. Re-export indirection should be removed (collectMetrics.ts)

// collectMetrics.ts — this re-export exists only to satisfy the test file
export { classifyRuntimeMetrics, computeRuntimeFitness, parseSloThresholds } from "./runtimeScoring";

The test file imports from collectMetrics rather than runtimeScoring directly. The re-export was added to match that import, which is backwards. Move the test imports to runtimeScoring directly and remove the re-export from collectMetrics.ts. collectMetrics is a script entry point, not a library surface.

4. Hardcoded personal machine path in documentation (runtime-aware-scoring.md)

# This is a personal machine path — will not work for any other contributor
cd /Users/oleksii_makieiev/Documents/startups/Lined/fitness-metrics-collector

Replace with a repo-relative path:

cd fitness-metrics-collector

🟡 Nice-to-have

5. Document the evidence-source/readiness unknown invariant inline

In classifyThreshold (runtimeScoring.ts), rules that use evidence_source instead of metric always resolve to missing: true because threshold.metric is undefined. This means hasUnknownHardConstraint is always true when a readiness rule is present, so eligibleForStableComparison is permanently false until a future schema adds a summarized readiness field.

This is intentional per the docs, but it is non-obvious from the code alone. A short inline comment on the field === undefined path would prevent future contributors from treating it as a bug.
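
A sketch of what such a comment could look like, using simplified stand-ins for the real code. The `SloThreshold` and `Classification` types, the `hard` and `max` fields, and the `isEligibleForStableComparison` helper are hypothetical; only `classifyThreshold`, `metric`, `evidence_source`, `missing`, and `hasUnknownHardConstraint` are names taken from this review.

```typescript
// Hypothetical, simplified threshold shape.
interface SloThreshold {
  name: string;
  hard: boolean;
  metric?: string;          // key into the summarized runtime metrics
  evidence_source?: string; // used by readiness-style rules instead of `metric`
  max: number;
}

interface Classification {
  name: string;
  missing: boolean;
  passed: boolean | null;
}

function classifyThreshold(
  threshold: SloThreshold,
  metrics: Record<string, number | undefined>,
): Classification {
  const field = threshold.metric;
  if (field === undefined) {
    // Intentional, not a bug: rules keyed by evidence_source have no summarized
    // metric field yet, so they always classify as missing. A hard rule of this
    // kind keeps eligibleForStableComparison false until the schema adds one.
    return { name: threshold.name, missing: true, passed: null };
  }
  const value = metrics[field];
  if (value === undefined) return { name: threshold.name, missing: true, passed: null };
  return { name: threshold.name, missing: false, passed: value <= threshold.max };
}

// Hypothetical helper showing how a missing hard rule blocks stable comparison.
function isEligibleForStableComparison(
  thresholds: SloThreshold[],
  metrics: Record<string, number | undefined>,
): boolean {
  const results = thresholds.map((t) => ({ t, c: classifyThreshold(t, metrics) }));
  const hasUnknownHardConstraint = results.some((r) => r.t.hard && r.c.missing);
  const hasFailedHardConstraint = results.some((r) => r.t.hard && r.c.passed === false);
  return !hasUnknownHardConstraint && !hasFailedHardConstraint;
}
```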

6. Guard hasStructuralMetrics before structural scoring on the non-RUNTIME_ONLY path

Making checkstyle_violations, spotbugs_total, and spotbugs_total_classes optional is correct for RUNTIME_ONLY=true. However, on the normal path, if CI reports are missing these fields, the ?? 0 coercions produce a zero fitness score that is indistinguishable from a genuinely zero score. Consider asserting hasStructuralMetrics(metrics) before the structural fitness computation when config.runtimeOnly === false.
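
One possible shape for that guard is sketched below. The `StructuralMetrics` type, the `assertStructuralMetrics` helper, and the error message are hypothetical, and the signature of `hasStructuralMetrics` is assumed rather than taken from the code.

```typescript
// Hypothetical subset of the collector's metrics; the three fields are the ones
// this PR made optional.
interface StructuralMetrics {
  checkstyle_violations?: number;
  spotbugs_total?: number;
  spotbugs_total_classes?: number;
}

// Assumed signature: true only when every structural field is actually present.
function hasStructuralMetrics(m: StructuralMetrics): m is Required<StructuralMetrics> {
  return (
    m.checkstyle_violations !== undefined &&
    m.spotbugs_total !== undefined &&
    m.spotbugs_total_classes !== undefined
  );
}

// Fail fast on the normal path instead of letting `?? 0` produce a
// zero score that looks like a genuine result.
function assertStructuralMetrics(m: StructuralMetrics, runtimeOnly: boolean): void {
  if (!runtimeOnly && !hasStructuralMetrics(m)) {
    throw new Error("Structural metrics are missing from CI reports; refusing to score them as 0");
  }
}
```

Called before the structural fitness computation, this leaves the RUNTIME_ONLY=true path untouched while making missing CI reports a loud failure.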

7. Missing test coverage for two throw paths

  • writeMetricsOutput with an unwritable or invalid path — the function guards blank paths but would throw on a bad path with no test coverage.
  • collectRuntimeOnlyMetrics when RUNTIME_METRICS_JSON is not set — the function throws "RUNTIME_ONLY=true requires RUNTIME_METRICS_JSON" but this throw is not tested.
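
The two missing cases could be exercised roughly as follows. The functions here are self-contained stand-ins with assumed signatures, not the real collector code: the injected `Writer`, the `env` parameter, and the `captureError` helper are hypothetical. Only the function names and the error message "RUNTIME_ONLY=true requires RUNTIME_METRICS_JSON" come from this review.

```typescript
// Stand-in for the collector function; the real signature is not shown in this PR.
function collectRuntimeOnlyMetrics(env: Record<string, string | undefined>): string {
  const path = env["RUNTIME_METRICS_JSON"];
  if (!path) throw new Error("RUNTIME_ONLY=true requires RUNTIME_METRICS_JSON");
  return path;
}

// Stand-in with an injected writer so a bad path can be simulated without touching disk.
type Writer = (path: string, data: string) => void;

function writeMetricsOutput(path: string | undefined, doc: unknown, write: Writer): boolean {
  if (!path || path.trim() === "") return false; // blank paths are skipped
  write(path, JSON.stringify(doc));              // a bad path surfaces as a throw here
  return true;
}

// Small helper: runs fn and returns the thrown Error, or null if nothing was thrown.
function captureError(fn: () => unknown): Error | null {
  try {
    fn();
    return null;
  } catch (e) {
    return e instanceof Error ? e : new Error(String(e));
  }
}

// Simulates an unwritable or invalid output path.
const failingWriter: Writer = (path) => {
  throw new Error(`cannot write to '${path}'`);
};

const missingInputError = captureError(() => collectRuntimeOnlyMetrics({}));
const badPathError = captureError(() => writeMetricsOutput("/nope/out.json", {}, failingWriter));
```

In the real test file, the same checks would be expressed with the existing test runner's throw assertions and counted in the relevant `t.plan(n)`.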

Overall

The runtime scoring logic itself (runtimeScoring.ts) is solid — the design is clean, the math is correct, the zero-baseline edge cases are handled explicitly, and the test fixtures are well-organized. The t.plan(n) discipline and fixture constants are good practice. The blockers are branch hygiene issues, not implementation problems.

sonarqubecloud Bot commented Jun 4, 2026

Pan14ek merged commit 5eeb242 into main on Jun 4, 2026
2 of 3 checks passed