Pan14ek · Jun 4, 2026 · Jun 3, 2026
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+.DS_Store
diff --git a/backend/lined/docs/README.md b/backend/lined/docs/README.md
@@ -17,6 +17,7 @@ changes.
 | HPA Resource Scenarios | Local kind variants for backend resource requests/limits, fixed replicas, and CPU-based HPA behavior.       | `hpa-resource-scenarios.md`    | Use before comparing deployment/runtime trade-offs under k6 workload traffic.               | Resource pressure, fixed replica comparison, HPA prerequisites, scenario cleanup.        |
 | Runtime Scenario Summaries | Scenario-runner seam for producing sanitized runtime-summary artifacts from one scenario and workload. | `runtime-scenario-summaries.md` | Use before generating collector-ready runtime evidence for local scenario comparisons. | Scenario runner CLI, k6 summary export, Kubernetes state summaries, provenance manifest. |
 | Runtime Fitness Extension | Runtime-aware fitness metric contract and optional collector input shape.                                  | `runtime-fitness-extension.md` | Use before adding runtime-aware scoring or attaching runtime summaries to metrics documents. | Structural/runtime score separation, runtime metric schema, normalization, compatibility. |
+| Runtime-Aware Scoring | Versioned scalar runtime score that compares current runtime summaries against baseline evidence. | `runtime-aware-scoring.md` | Use before running or interpreting runtime-aware fitness scoring. | Local baseline input, score fields, SLO classification, missing metrics, metrics-store fallback. |
 | Prometheus Telemetry Pipeline | Local Prometheus deployment and scrape workflow for kind backend metrics.                              | `prometheus-telemetry-pipeline.md` | Use before collecting persistent-enough Prometheus samples from local scenario runs.       | Prometheus pod discovery, Actuator scrape verification, PromQL checks, telemetry cleanup. |
 | SLO Constraint Thresholds | Initial runtime SLO and constraint thresholds for classifying local experiment variants. | `slo-constraint-thresholds.md` | Use before interpreting runtime-summary evidence or adding runtime-aware scoring. | Latency, error-rate, availability, restart, readiness, and resource-pressure constraints. |
 | LLM Support Service | Plan for a separate advisory service that proposes candidate fitness rules and trade-off explanations. | `llm-support-service.md` | Use before designing or implementing LLM-assisted rule synthesis for the experiment. | Service boundary, serverless/manual triggers, input/output contracts, review workflow. |

diff --git a/backend/lined/docs/experiment-tasks.md b/backend/lined/docs/experiment-tasks.md
@@ -18,7 +18,7 @@ scientific experiment work.
 | `experiment/scenario-fixture-discipline`    | Task | Runtime evidence       | Yes                   | Scenario fixture discipline               | Define explicit workload/context profiles and repeatable input setup for Lined experiment scenario runs. | Deployment/runtime comparisons use stable fixtures instead of manual setup.         |
 | `experiment/slo-constraint-thresholds`      | Task | Runtime evidence       | Yes                   | SLO and constraint thresholds             | Define initial latency, error-rate, availability, restart, readiness, and resource-efficiency thresholds for classifying valid experiment variants. | Runtime evidence can be evaluated against explicit constraints instead of ad hoc interpretation. |
 | `experiment/fitness-runtime-extension`      | Task | Runtime scoring        | Yes                   | Runtime fitness extension                 | Extend experiment documentation and/or collector design to include telemetry metrics. | Fixed CI fitness can be compared with runtime-aware adaptive fitness.               |
-| `experiment/runtime-aware-scoring`          | Task | Runtime scoring        | No                    | Runtime-aware scoring                     | Add a versioned runtime fitness score that uses summarized runtime metrics while preserving the existing structural `fitnessScore`. | Runtime-aware scalar fitness can be computed without changing historical CI fitness semantics. |
+| `experiment/runtime-aware-scoring`          | Task | Runtime scoring        | Yes                   | Runtime-aware scoring                     | Add a versioned runtime fitness score that uses summarized runtime metrics while preserving the existing structural `fitnessScore`. | Runtime-aware scalar fitness can be computed without changing historical CI fitness semantics. |
 | `experiment/adaptive-weighted-fitness`      | Task | Runtime scoring        | No                    | Adaptive weighted fitness                 | Implement context-sensitive weighting over structural and runtime signals for workload, SLO, or resource-pressure contexts. | Fixed structural fitness can be compared with an adaptive scalar fitness baseline.  |
 | `experiment/pareto-optimization-baseline`   | Task | Runtime scoring        | No                    | Pareto optimization baseline              | Add a small NSGA-II or equivalent Pareto-based optimizer over the current deployment scenario set and runtime objectives. | Deployment variants can be compared as multi-objective trade-offs rather than a single weighted score. |
 | `experiment/decision-usefulness-reporting`  | Task | Runtime scoring        | No                    | Decision-usefulness reporting             | Extend experiment reporting to compare whether Pareto/NSGA-II gives more actionable trade-off alternatives than fixed-weight scalar scoring. | Results explain decision usefulness in addition to numeric objective values.        |

diff --git a/backend/lined/docs/runtime-aware-scoring.md b/backend/lined/docs/runtime-aware-scoring.md
@@ -0,0 +1,173 @@
+# Runtime-Aware Scoring
+
+This guide describes the runtime-aware scoring contract for
+`experiment/runtime-aware-scoring`.
+
+Runtime-aware scoring is additive. It keeps the existing top-level
+`fitnessScore` as the structural CI score and adds a separate versioned score
+from summarized runtime evidence.
+
+## Scope
+
+This task provides:
+
+- a versioned scalar runtime score named `runtimeFitnessScore`;
+- local scoring from explicit current and baseline runtime summary files;
+- optional persisted baseline lookup through the collector metrics-store seam;
+- SLO constraint classification from `slo-thresholds-v1.json`;
+- optional local output when Cosmos DB or another metrics database is not
+  configured.
+
+This task does not add adaptive weighting, Pareto optimization, new backend
+API behavior, production SLOs, dashboarding, or live telemetry scraping inside
+the collector.
+
+## Collector Inputs
+
+The collector accepts current runtime evidence through the existing input:
+
+```text
+RUNTIME_METRICS_JSON=/absolute/path/to/runtime-summary.json
+```
+
+For local/offline scoring, pass an explicit baseline summary:
+
+```text
+RUNTIME_BASELINE_METRICS_JSON=/absolute/path/to/baseline-runtime-summary.json
+```
+
+When a metrics store is configured and no explicit baseline file is provided,
+the collector can look for the latest persisted `main` runtime summary matching
+the configured baseline scenario and current workload:
+
+```text
+RUNTIME_BASELINE_SCENARIO=fixed-medium
+```
+
+The default threshold artifact is:
+
+```text
+SLO_THRESHOLDS_JSON=../backend/lined/load-tests/runtime-scenarios/slo-thresholds-v1.json
+```
+
+When no database is configured, write the final document locally:
+
+```text
+METRICS_OUTPUT_JSON=/absolute/path/to/metrics-document.json
+```
+
+For a runtime-only local smoke check without structural CI reports or
+SonarCloud access, use:
+
+```text
+RUNTIME_ONLY=true
+```
+
+The default collector path still reads Checkstyle, SpotBugs, JaCoCo, and
+SonarCloud evidence. `RUNTIME_ONLY=true` is only for local runtime scoring
+experiments where the structural CI score is not being recomputed.
+
+## Output Contract
+
+The stored or local metrics document preserves the structural score:
+
+```json
+{
+  "fitnessScore": 0.1234,
+  "runtimeFitnessScore": 0.2185,
+  "runtimeFitnessScoreVersion": "runtime-aware-v1",
+  "runtimeFitness": {
+    "current": {
+      "scenario": "replicas-2",
+      "workload": "baseline",
+      "source": "local-kind"
+    },
+    "baseline": {
+      "scenario": "fixed-medium",
+      "workload": "baseline",
+      "source": "local-kind"
+    },
+    "eligibleForStableComparison": false
+  }
+}
+```
+
+`fitnessScore` remains the CI-only structural score. Runtime evidence is
+attached under `metrics.runtime_metrics`; runtime score metadata is attached
+under `runtimeFitness`.
+
+When `RUNTIME_ONLY=true`, the output document may contain
+`fitnessScore: null` because no structural CI evidence was collected. That
+does not redefine the field; it records that the runtime-only smoke path did
+not compute the structural score.
+
+## Runtime-Aware v1 Formula
+
+The score compares current runtime summary metrics against a baseline runtime
+summary. Each metric is normalized to `[-1, 1]` before weighting.
+
+Lower-is-better metrics use:
+
+```text
+(baseline - current) / baseline
+```
+
+Higher-is-better metrics use:
+
+```text
+(current - baseline) / baseline
+```
+
+If baseline and current are both zero, the normalized delta is `0`. If only the
+baseline is zero, the delta is `1` when a higher-is-better metric rose and `-1`
+when a lower-is-better metric rose. Missing metrics are omitted from the score
+and the weights of the remaining metrics are re-normalized.
+
+| Metric | Direction | Weight |
+|--------|-----------|--------|
+| `latency_p95_ms` | lower is better | `0.20` |
+| `latency_p99_ms` | lower is better | `0.15` |
+| `error_rate` | lower is better | `0.20` |
+| `throughput_rps` | higher is better | `0.15` |
+| `availability` | higher is better | `0.15` |
+| `restart_count` | lower is better | `0.10` |
+| `cpu_utilization` | lower is better | `0.025` |
+| `memory_utilization` | lower is better | `0.025` |
+
+`hpa_current_replicas` and `hpa_desired_replicas` remain contextual evidence
+and are not scored directly in v1.
+
+## SLO Classification
+
+The collector classifies current runtime evidence against
+`slo-thresholds-v1.json` and records per-constraint results:
+
+- `valid` when evidence exists and satisfies the constraint;
+- `warning` when evidence exists and crosses a warning threshold;
+- `invalid` when evidence exists and violates a hard constraint;
+- `unknown` when required evidence is missing.
+
+`runtimeFitness.eligibleForStableComparison` is `false` when any hard
+constraint is `invalid` or `unknown`. The numeric runtime score may still be
+emitted when comparable current and baseline metrics exist, but eligibility
+keeps incomplete or unstable runs out of article-ready comparisons.
+
+Readiness remains external evidence. It is classified as `unknown` unless a
+future runtime summary contract adds a summarized readiness source.
+
+## Local Example
+
+```bash
+cd fitness-metrics-collector
+npm run build
+RUNTIME_ONLY=true \
+RUNTIME_METRICS_JSON=/absolute/path/current/runtime-summary.json \
+RUNTIME_BASELINE_METRICS_JSON=/absolute/path/baseline/runtime-summary.json \
+METRICS_OUTPUT_JSON=/absolute/path/output/metrics-document.json \
+npm run metrics
+```
+
+If `COSMOS_DB_CONNECTION_STRING` is absent, the collector writes the local
+output document when `METRICS_OUTPUT_JSON` is set and skips database
+persistence. Omit `RUNTIME_ONLY=true` when structural reports and `SONAR_TOKEN`
+are available and the run should also compute the structural `fitnessScore`.
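
The v1 formula and eligibility rule documented in `runtime-aware-scoring.md` can be sketched as follows. This is a minimal Python illustration and not the collector's code: the function names are hypothetical, the clamp to `[-1, 1]` is an assumption about how the documented range is enforced, and a metric is treated as missing when it is absent from either summary, which is also an assumption.

```python
from typing import Iterable, Mapping, Optional

# Weights and directions copied from the v1 table.
# The boolean is True when lower values are better.
WEIGHTS = {
    "latency_p95_ms": (0.20, True),
    "latency_p99_ms": (0.15, True),
    "error_rate": (0.20, True),
    "throughput_rps": (0.15, False),
    "availability": (0.15, False),
    "restart_count": (0.10, True),
    "cpu_utilization": (0.025, True),
    "memory_utilization": (0.025, True),
}


def normalized_delta(baseline: float, current: float, lower_is_better: bool) -> float:
    """Return the per-metric delta; positive means better than baseline."""
    if baseline == 0:
        if current == 0:
            return 0.0
        # Zero baseline: only the direction of movement is known.
        improved = (current < 0) if lower_is_better else (current > 0)
        return 1.0 if improved else -1.0
    if lower_is_better:
        raw = (baseline - current) / baseline
    else:
        raw = (current - baseline) / baseline
    # Assumption: clamp so one extreme metric cannot leave [-1, 1].
    return max(-1.0, min(1.0, raw))


def runtime_fitness_score(
    current: Mapping[str, float], baseline: Mapping[str, float]
) -> Optional[float]:
    """Weighted mean of deltas over the metrics present in both summaries."""
    weighted_sum = 0.0
    active_weight = 0.0
    for name, (weight, lower_is_better) in WEIGHTS.items():
        if name not in current or name not in baseline:
            continue  # missing metric: omit it and re-normalize below
        delta = normalized_delta(baseline[name], current[name], lower_is_better)
        weighted_sum += weight * delta
        active_weight += weight
    if active_weight == 0:
        return None  # nothing comparable, so no score
    return weighted_sum / active_weight


def eligible_for_stable_comparison(hard_constraint_results: Iterable[str]) -> bool:
    """False when any hard constraint is classified invalid or unknown."""
    return not any(r in ("invalid", "unknown") for r in hard_constraint_results)
```

With only `latency_p95_ms` (100 ms baseline, 80 ms current) and an unchanged zero `error_rate` available, the active weights are `0.20` and `0.20`, the deltas are `0.2` and `0`, and the re-normalized score is `0.04 / 0.40 = 0.1`.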