Create comprehensive CI/CD test framework for E2E validation and component testing #26

@trsdn

Description

Problem Statement

Currently, there's no systematic way to test the CI/CD pipeline itself. We discovered multiple issues (Issues #17-#22) only when running a test PR. We need a comprehensive testing framework that can:

  1. Test CI/CD workflows end-to-end (E2E)
  2. Detect false positives (passing when should fail)
  3. Detect false negatives (failing when should pass)
  4. Test individual CI/CD components in isolation

Current Issues Found Without Proper Testing

Proposed Test Framework

1. E2E Test Suite

Create test scenarios that validate entire CI/CD pipelines:

```yaml
# .github/workflows/test-ci-e2e.yml
name: CI/CD E2E Tests

on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly
  workflow_dispatch:

jobs:
  test-success-scenario:
    runs-on: ubuntu-latest
    steps:
      - name: Create clean PR
        run: |
          # Create PR with perfect code
          # Should pass all checks

      - name: Verify all checks pass
        run: |
          # Assert all workflows succeed

  test-failure-scenarios:
    runs-on: ubuntu-latest
    steps:
      - name: Test formatting failure
        run: |
          # Create PR with bad formatting
          # Should fail formatting check ONLY

      - name: Test coverage failure
        run: |
          # Create PR that reduces coverage
          # Should fail coverage check ONLY
```
2. False Positive Detection

Test that CI catches actual problems:

```python
# tests/ci_validation/test_false_positives.py

class TestFalsePositives:
    """Ensure CI catches real issues (no false positives)."""

    def test_detects_formatting_issues(self):
        """CI should fail when code is unformatted."""
        # Create unformatted code
        # Run formatting check
        # Assert it fails

    def test_detects_coverage_drop(self):
        """CI should fail when coverage drops below threshold."""
        # Remove tests to drop coverage
        # Run coverage check
        # Assert it fails at exactly 80% threshold

    def test_detects_security_issues(self):
        """CI should catch security vulnerabilities."""
        # Introduce known vulnerable dependency
        # Run security scan
        # Assert it fails
```
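The boundary in `test_detects_coverage_drop` ("fails at exactly 80%") is worth pinning down as a pure function so the threshold test has no CI dependency. A minimal sketch (the function name and threshold are assumptions, not the project's actual gate):

```python
# tests/ci_validation/coverage_gate.py -- illustrative sketch
THRESHOLD = 80.0  # assumed project threshold


def coverage_passes(percent: float, threshold: float = THRESHOLD) -> bool:
    """Gate logic under test: coverage at or above the threshold passes."""
    return percent >= threshold


# Boundary behaviour the false-positive tests should pin down:
assert coverage_passes(80.0)       # exactly at threshold passes
assert not coverage_passes(79.99)  # just below the threshold fails
```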

3. False Negative Detection

Test that CI doesn't fail valid code:

```python
# tests/ci_validation/test_false_negatives.py

class TestFalseNegatives:
    """Ensure CI doesn't fail good code (no false negatives)."""

    def test_accepts_formatted_code(self):
        """CI should pass properly formatted code."""
        # Create perfectly formatted code
        # Run all quality checks
        # Assert all pass

    def test_handles_edge_cases(self):
        """CI should handle edge cases gracefully."""
        # Test with:
        # - Empty PRs
        # - Documentation-only changes
        # - Large files
        # - Unicode content
```
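For the documentation-only edge case, the classification itself can be tested in isolation. A small sketch of such a classifier (the helper name and the set of documentation suffixes are assumptions for illustration):

```python
# tests/ci_validation/edge_cases.py -- illustrative sketch
def is_docs_only(changed_files: list) -> bool:
    """True when the PR is non-empty and every changed file is documentation."""
    doc_suffixes = (".md", ".rst", ".txt")
    return bool(changed_files) and all(
        f.endswith(doc_suffixes) for f in changed_files
    )
```

Note the empty-PR case deliberately returns `False`, so an empty PR is handled as its own scenario rather than silently skipping checks.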

4. Component Testing

Test individual workflow components:

```yaml
# .github/workflows/test-ci-components.yml
name: Test CI Components

on:
  workflow_dispatch:

jobs:
  test-version-bump:
    runs-on: ubuntu-latest
    steps:
      - name: Test version detection
        run: |
          # Test with different commit types
          # Verify correct version bump detected

      - name: Test branch handling
        run: |
          # Test with existing branches
          # Verify proper cleanup/handling

  test-pr-summary:
    runs-on: ubuntu-latest
    steps:
      - name: Test status parsing
        run: |
          # Test with various check states
          # Verify proper JSON handling

      - name: Test output formatting
        run: |
          # Test markdown generation
          # Verify no syntax errors
```
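The version-detection component is a natural candidate for a pure, unit-testable function. A minimal sketch, assuming the repository follows Conventional Commits (the function name and exact rules are assumptions, not the existing workflow's implementation):

```python
# scripts/detect_bump.py -- illustrative sketch, assuming Conventional Commits
from typing import List, Optional


def detect_bump(commit_messages: List[str]) -> Optional[str]:
    """Return 'major', 'minor', 'patch', or None for a set of commits."""
    bump = None
    for msg in commit_messages:
        header = msg.splitlines()[0]
        # A "!" in the type/scope prefix or a BREAKING CHANGE footer is major.
        if "BREAKING CHANGE" in msg or "!" in header.split(":", 1)[0]:
            return "major"
        if header.startswith("feat"):
            bump = "minor"
        elif header.startswith("fix") and bump is None:
            bump = "patch"
    return bump
```

Because this is a plain function, the component test can feed it commit lists for each type and assert on the result without creating branches or PRs.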

5. Test Data Generation

Create test fixtures for various scenarios:

```python
# tests/ci_validation/fixtures.py

TEST_SCENARIOS = {
    "perfect_pr": {
        "files": ["valid.py"],
        "formatting": "correct",
        "coverage": 85,
        "expected": "all_pass",
    },
    "formatting_issue": {
        "files": ["unformatted.py"],
        "formatting": "incorrect",
        "coverage": 85,
        "expected": "format_fail_only",
    },
    "coverage_drop": {
        "files": ["valid.py"],
        "formatting": "correct",
        "coverage": 75,
        "expected": "coverage_fail_only",
    },
}
```
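Fixtures like these pair naturally with a function that derives the expected CI outcome from a scenario's properties, so the framework can cross-check its own fixture data. A sketch of that idea (the threshold value and function name are assumptions):

```python
# tests/ci_validation/expectations.py -- illustrative sketch
COVERAGE_THRESHOLD = 80  # assumed project threshold


def expected_outcome(scenario: dict) -> str:
    """Derive the expected CI result for a fixture scenario."""
    if scenario["formatting"] != "correct":
        return "format_fail_only"
    if scenario["coverage"] < COVERAGE_THRESHOLD:
        return "coverage_fail_only"
    return "all_pass"
```

A sanity test can then iterate `TEST_SCENARIOS` and assert `expected_outcome(s) == s["expected"]` for every entry, catching fixture typos before they produce confusing E2E failures.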

6. CI/CD Health Dashboard

Create monitoring for CI/CD health:

```python
# scripts/ci_health_check.py

def check_workflow_health():
    """Generate CI/CD health report."""
    checks = {
        "syntax_valid": validate_all_workflows(),
        "dependencies_current": check_action_versions(),
        "secrets_configured": verify_required_secrets(),
        "branch_rules_active": check_branch_protection(),
        "recent_failures": analyze_failure_patterns(),
    }
    return generate_health_report(checks)
```
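The `generate_health_report` helper referenced above could be as simple as rendering the check dict to markdown. A minimal sketch of one possible shape (the output format is an assumption, not a specification):

```python
# scripts/ci_health_report.py -- illustrative sketch
def generate_health_report(checks: dict) -> str:
    """Render a dict of boolean check results as a short markdown report."""
    lines = ["# CI/CD Health Report", ""]
    for name, ok in checks.items():
        status = "PASS" if ok else "FAIL"
        lines.append("- " + status + ": " + name.replace("_", " "))
    healthy = all(checks.values())
    lines += ["", "Overall: " + ("healthy" if healthy else "needs attention")]
    return "\n".join(lines)
```

The report string can be posted as a job summary or committed to a dashboard page, and the overall flag can drive alerting.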

Implementation Plan

Phase 1: Test Infrastructure

  1. Create test framework structure
  2. Set up test data generators
  3. Create mock PR creation utilities

Phase 2: Core Tests

  1. Implement E2E success path tests
  2. Add basic failure scenario tests
  3. Create component unit tests

Phase 3: Advanced Testing

  1. Add false positive detection
  2. Add false negative detection
  3. Create edge case tests

Phase 4: Automation

  1. Schedule regular test runs
  2. Create health dashboard
  3. Add alerting for failures

Success Criteria

  • Can automatically test all CI workflows
  • Detects when CI incorrectly passes bad code
  • Detects when CI incorrectly fails good code
  • Can test individual workflow components
  • Provides clear reporting of CI health
  • Catches issues before they affect developers

Benefits

  1. Early Detection: Find CI issues before they block PRs
  2. Confidence: Know that CI checks are working correctly
  3. Documentation: Tests serve as CI behavior documentation
  4. Regression Prevention: Catch when changes break CI
  5. Debugging: Easier to identify CI issues

Test Matrix

| Scenario | Expected Result | Test Type |
| --- | --- | --- |
| Perfect code | All pass | False negative |
| Bad formatting | Format fail only | False positive |
| Low coverage | Coverage fail only | False positive |
| Syntax error | Lint fail only | False positive |
| Security issue | Security fail only | False positive |
| Docs-only change | All pass (or skip) | False negative |
| CI file change | Special workflow | Component |
| Version bump | Correct version | Component |

Monitoring Metrics

  • Workflow success rate
  • False positive rate
  • False negative rate
  • Average execution time
  • Flaky test frequency

Priority

High - Critical for maintaining CI/CD reliability and preventing issues like those found in #17-#22.

Related Issues

Acceptance Criteria

  • E2E tests cover all major workflows
  • False positive tests for each check type
  • False negative tests for edge cases
  • Component tests for complex workflows
  • Automated running on schedule
  • Clear documentation on adding new tests
  • Dashboard or report for CI health

Notes

This is a significant undertaking but will dramatically improve CI/CD reliability and developer experience. Should be implemented incrementally, starting with the most critical workflows.
