# S-CORE Quality Maintenance Proposal

**Author:** Mahale Komal
**Date:** May 6, 2026
**For:** Ahmed & Jan

---

## Summary

This paper describes how S-CORE can define, track, and enforce key quality KPIs.

Examples of these KPIs are:
- 100% line coverage
- No clang-tidy issues
- No CodeQL issues
- No compiler warnings

The key question is not only which targets we want to achieve, but also how we want to steer and sustain them as a team.

I suggest a **hybrid quality plan** for S-CORE.

This means:
- Run **fast checks on every PR** so developers get quick feedback.
- Run **full quality checks at night** so we still catch deep issues.
- Show results in a **shared quality dashboard** so everyone can track quality.

**Proposed Model:**
- **PR checks (fast):** Build, unit tests, formatting, and basic lint.

> **Reviewer comment (Contributor):** What about having the quality jobs (CodeQL, clang-tidy, coverage) on every PR, but triggered manually only once after the review process is done? Then we would not need to run the quality jobs after each commit, while still being sure that no PR introduces new findings.

- **Nightly checks (full):** CodeQL, full clang-tidy, sanitizers, coverage, and static analysis.

> **Reviewer comment (Contributor):** If the nightly run produces findings, when should we fix them? Immediately, or on some other defined timeline?

- **Reporting:** Export quality numbers to a shared dashboard.
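
To make the split concrete, here is a minimal sketch of a single entry point that a PR job and a nightly job could share. The script name and the `make` targets are placeholder assumptions for illustration, not S-CORE's actual commands:

```python
#!/usr/bin/env python3
"""Minimal sketch of the PR/nightly split (hypothetical entry point).

All commands are placeholders; real S-CORE jobs would substitute their
own build, test, and analysis invocations."""
import subprocess
import sys

FAST_CHECKS = [
    ["make", "build"],            # placeholder: PR build
    ["make", "test"],             # placeholder: unit tests
    ["make", "format-check"],     # placeholder: formatting and basic lint
]

FULL_CHECKS = FAST_CHECKS + [
    ["make", "clang-tidy-full"],  # placeholder: full clang-tidy sweep
    ["make", "sanitizers"],       # placeholder: ASAN/TSAN/LSAN suites
    ["make", "coverage"],         # placeholder: coverage export
]


def run(checks: list) -> None:
    """Run each check in order; stop and fail on the first error."""
    for cmd in checks:
        print("running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            sys.exit(1)  # non-zero exit fails the CI job


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else "pr"
    run(FULL_CHECKS if mode == "nightly" else FAST_CHECKS)
```

A PR job would call `python run_quality_checks.py pr`, and the nightly schedule would call `python run_quality_checks.py nightly` (the script name is hypothetical).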

---

## Problem Statement

The team needs a clear way to manage software quality.

Today, we can define quality targets, but that is not enough. We also need a practical way to:
- See the current status of these KPIs
- Take action when quality goes down
- Stop important regressions from being merged

Ahmed also raised an important point: some quality rules may need to be enforced directly in CI rather than only surfaced in reports.

For this reason, this proposal is written as a decision paper: it compares the available options and recommends one approach for the team to discuss.

---

## Options Considered

### Option 1: Dashboard-Based Monitoring

In this model, the team tracks quality KPIs through a shared dashboard. The dashboard provides visibility into metrics such as coverage, static analysis findings, security findings, and compiler warnings.

**Pros**
- Clear visibility of quality status
- Supports trend monitoring over time
- Encourages transparency and team ownership
- Useful for reporting and management review

**Cons**
- Relies on manual team reaction
- Does not prevent low-quality code from being merged
- Loses effectiveness if the dashboard is not reviewed regularly

### Option 2: CI Enforcement

In this model, quality gates are enforced directly in the CI pipeline. If code does not meet the agreed thresholds, the build or merge fails; a minimal gate sketch follows the pros and cons below.

> **Reviewer comment (Contributor):** Did you check how much time it costs to run CodeQL, clang-tidy, and coverage? What is the average time for each job?


**Pros**
- Enforces quality standards automatically
- Prevents regressions from being merged
- Makes expectations clear and measurable
- Reduces dependence on manual follow-up

**Cons**
- Can slow down development if thresholds are too strict from the beginning
- May create friction if the current codebase is not ready for full enforcement
- Requires careful rollout and baseline definition
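
As a hedged illustration of such a gate, the script below fails the job when line coverage drops below an agreed threshold. The report path, JSON shape, and threshold value are assumptions for this sketch, not S-CORE's actual setup:

```python
#!/usr/bin/env python3
"""Hypothetical coverage gate for CI (illustrative only).

Assumes the coverage job writes a JSON report like {"line_coverage": 83.4};
S-CORE's actual report format and threshold would replace these."""
import json
import sys

THRESHOLD = 80.0  # example minimum line coverage, in percent


def main(report_path: str) -> int:
    with open(report_path) as f:
        report = json.load(f)
    coverage = float(report["line_coverage"])  # assumed field name
    print(f"line coverage: {coverage:.1f}% (threshold: {THRESHOLD:.1f}%)")
    if coverage < THRESHOLD:
        print("coverage gate FAILED", file=sys.stderr)
        return 1  # non-zero exit fails the CI step and blocks the merge
    print("coverage gate passed")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "coverage.json"))
```

The same pattern extends to warning counts or CodeQL findings: parse the job's report, compare against the agreed threshold, and exit non-zero to fail the pipeline.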

### Option 3: Combined Approach

In this model, dashboard reporting and CI enforcement are used together. The dashboard provides visibility and trend monitoring, while CI enforces the most critical quality gates.

**Pros**
- Balances visibility with enforcement
- Supports both team awareness and process control
- Helps track improvement over time while preventing regressions
- More robust than using only one approach

**Cons**
- Requires more setup effort
- Needs agreement on which KPIs are monitored only and which are enforced in CI

---

## Pros of the Proposed Hybrid Model

> **Reviewer comment (Contributor):** Under which option do these pros and cons fall?

### 1. Faster PR Feedback
- PR checks can finish in about 5 to 10 minutes.
- Developers get results quickly.
- This helps teams work faster.

### 2. Better Quality Coverage
- Nightly CodeQL can find security problems.
- Nightly full clang-tidy can find more code warnings.
- Nightly sanitizers can find memory leaks and race conditions.
- Coverage trend can be tracked in the dashboard.

### 3. Clear Visibility for Teams
- A shared dashboard shows quality trends.
- Everyone can see the same data.
- Teams can spot problems early.

### 4. Proven Approach
- Many teams use this model: fast PR checks + full nightly checks.
- SPP already follows a similar dashboard-based approach.

---

## Cons of the Proposed Hybrid Model

### 1. Some Issues Are Found Later
- A PR may pass fast checks but fail nightly checks.
- Some issues may be found the next day.
- We need a clear rule for how fast these failures must be fixed.

### 2. Dashboard Setup Needs Effort
- We need to connect CI results to a dashboard solution.
- Dashboard design and setup will take time.
- We may reuse ideas from SPP.

### 3. Merge Risk Can Increase
- If the full checks run only at night, a bug can be merged before the nightly run catches it.
- We need a quick fix process for nightly failures.

### 4. Team Ownership Is Needed
- We must define who handles nightly failures.
- We must define expected fix time.

### 5. More CI Config to Maintain
- PR and nightly jobs are different.
- We need good documentation so nothing is missed.

---

## Implementation Roadmap

### Phase 1: Start Nightly Quality Jobs
- [ ] Create a nightly workflow for CodeQL and full clang-tidy.
- [ ] Create a nightly workflow for sanitizers (TSAN, ASAN, LSAN); a log-scan sketch follows this list.
- [ ] Export coverage result from nightly job.
- [ ] Define fix-time rule for nightly failures.
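
To make the sanitizer item concrete: a small helper could scan the nightly sanitizer logs and fail the job whenever a finding appears. The `sanitizer-logs/` directory is an assumption about where the nightly job collects its output; the marker strings are the standard prefixes ASAN, LSAN, and TSAN print in their reports:

```python
#!/usr/bin/env python3
"""Hypothetical nightly helper that scans sanitizer logs for findings."""
import pathlib
import sys

MARKERS = (
    "ERROR: AddressSanitizer",   # ASAN findings
    "ERROR: LeakSanitizer",      # LSAN findings
    "WARNING: ThreadSanitizer",  # TSAN findings (e.g. data races)
)


def main(log_dir: str = "sanitizer-logs") -> int:
    findings = 0
    for log in sorted(pathlib.Path(log_dir).glob("*.log")):
        for line in log.read_text(errors="replace").splitlines():
            if any(marker in line for marker in MARKERS):
                findings += 1
                print(f"{log.name}: {line.strip()}")
    if findings:
        print(f"{findings} sanitizer finding(s)", file=sys.stderr)
        return 1  # fail the nightly job so the fix-time rule applies
    print("no sanitizer findings")
    return 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```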

### Phase 2: Add Shared Dashboard
- [ ] Decide which numbers to show: coverage, warnings, test pass rate, defects.
- [ ] Choose the dashboard tool. Options can include Grafana, a static HTML dashboard, SonarQube Community Edition, or Power BI if a company license is available.
- [ ] Push nightly data to the selected dashboard (a minimal export sketch follows this list).
- [ ] Share dashboard with S-CORE teams.
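
To sketch the "push nightly data" step: the nightly job could end by writing a single JSON file that any of the dashboard options below can consume (a static HTML page, a Grafana data source, and so on). The field names and hard-coded example values are illustrative assumptions; a real pipeline would parse them from the nightly job's artifacts:

```python
#!/usr/bin/env python3
"""Hypothetical KPI export step for the nightly job (illustrative only)."""
import datetime
import json
import pathlib


def export_kpis(out_path: str = "dashboard/kpis.json") -> None:
    kpis = {
        "date": datetime.date.today().isoformat(),
        # Example values; a real pipeline would read these from
        # the nightly job's reports instead of hard-coding them.
        "line_coverage_percent": 83.4,
        "clang_tidy_findings": 12,
        "codeql_findings": 0,
        "compiler_warnings": 0,
        "test_pass_rate_percent": 99.1,
    }
    out = pathlib.Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(kpis, indent=2))
    print(f"wrote {out}")


if __name__ == "__main__":
    export_kpis()
```

Appending one such file per night gives the dashboard its trend data with no extra infrastructure.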

---

## Questions

1. Which checks must run on every PR and block merge?
2. Which checks can run only at night?
3. What is the maximum allowed PR CI time?
4. Is full CodeQL needed on PR, or is nightly enough?
5. Is full clang-tidy needed on PR, or nightly only?
6. Which quality numbers should we show in the dashboard?
7. Who will handle nightly failures, and how fast should they be fixed?
8. Can we create an S-CORE dashboard like SPP?

---

## Dashboard Tool Options

If Power BI is not preferred, we can use one of these options:

### Option A: Grafana
- Open source and widely used
- Good for trend charts and team dashboards
- Needs a data source and some setup effort

### Option B: Static HTML Dashboard
- Low-cost and simple
- CI can generate and publish a dashboard after nightly jobs
- Good for basic KPI reporting

### Option C: SonarQube Community Edition
- Free and self-hosted
- Good for code quality and coverage reporting
- May not cover all KPI types without extra integration

### Option D: Power BI
- Good reporting and visualization
- Desktop is free, but cloud sharing usually needs a paid license
- Useful only if company licensing is already available

---

## Recommendation

Use the hybrid model, which corresponds to **Option 3: Combined Approach**.

Why:
- Fast PR checks keep developer feedback quick.
- Nightly full checks keep code quality high.
- Cost is lower than running everything on every PR.
- A shared dashboard gives clear quality visibility.

This recommendation does not assume that reporting alone is sufficient. Instead, it combines:
- **Visibility** through dashboard reporting
- **Control** through CI enforcement for the most critical checks

For example:
- CI can enforce no compiler warnings (for example via `-Werror`), no critical CodeQL issues, and no new clang-tidy violations (a baseline-diff sketch follows this list).
- The dashboard can track broader trends such as overall coverage progress and quality status over time.
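
A hedged sketch of the "no new clang-tidy violations" gate: compare today's findings against a committed baseline and fail only on new entries. The one-finding-per-line file format is an assumption; note that raw line matching is fragile when line numbers shift between commits, so a real gate would normalize entries first:

```python
#!/usr/bin/env python3
"""Hypothetical 'no new clang-tidy violations' gate (illustrative only).

Assumes findings are stored one per line, e.g. the
'file:line:col: warning: ... [check-name]' lines from clang-tidy output."""
import sys


def load(path: str) -> set:
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def main(baseline_path: str, current_path: str) -> int:
    new_findings = load(current_path) - load(baseline_path)
    for finding in sorted(new_findings):
        print("NEW:", finding)
    if new_findings:
        print(f"{len(new_findings)} new clang-tidy finding(s)", file=sys.stderr)
        return 1  # block the merge; pre-existing findings remain tracked
    print("no new clang-tidy findings")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```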

This approach is practical and can be started step by step.

---

**Next Steps:**

1. Agree on the final list of KPIs.
2. Decide which KPIs must block merges in CI and which KPIs are tracked through reporting.
3. Run a short pilot: nightly CodeQL + coverage.
4. Review the results with the team.
5. Select the dashboard tool and start rollout.

---

## One-Page Version

### Goal
Keep S-CORE quality high, but keep PR feedback fast and CI cost under control.

### Simple Plan
- Run fast checks on every PR (build, tests, formatting).
- Run full quality checks at night (CodeQL, full clang-tidy, sanitizers, coverage).
- Show quality trends in a shared dashboard.

### Why This Plan
- Developers get quick PR feedback.
- Deep quality issues are still checked every day.
- CI cost is lower than running all checks on every PR.
- Quality status is visible for all teams.

### Risks
- Some issues are found at night, not immediately on the PR.
- We need clear ownership for nightly failures.
- Dashboard setup needs some initial effort.

### What Is Already in S-CORE (Observed)
- PR build and test checks are present.
- Formatting checks are present.
- Coverage and sanitizer workflows exist but are not PR-blocking checks.

### What Is Missing (Observed)
- No dedicated CodeQL GitHub Actions workflow.
- No nightly scheduled quality workflow.
- No KPI export/dashboard flow to a shared reporting tool.


### Recommendation
Start with the hybrid model now, run a short pilot, and then finalize targets with Jan based on CI time, cost, and team capacity.