
Git-Aware A/B Comparison Runner #2

@JerilynFranz

Description

Future Feature Idea: Git-Aware A/B Comparison Runner

Summary:

Create a new, high-level benchmark runner that is aware of the version control system (Git). This runner would be capable of checking out two different commits, running the same benchmark against both versions within a single session, and performing a paired-sample statistical analysis to provide a highly robust and reliable comparison. This would transform the framework from a baseline-comparison tool into a true A/B performance testing system.

Problem Solved:

Traditional baseline comparison (like pytest-benchmark) suffers from the "noisy environment" problem. A performance change can be masked by variations in system load between runs, leading to flaky tests and low confidence in results. This new model would eliminate environmental noise as a confounding variable by running both versions of the code in an interleaved fashion under the exact same system conditions.

Proposed Workflow:

The user would invoke the benchmark from the command line with references to two Git commits:

# Compare the current working directory against the 'main' branch
simplebench --compare-with main

# Compare two specific commits
simplebench --compare b1c3a4d --with a0f9e8b

The output would be a direct, high-confidence statement about the performance difference:

Benchmark: process_data

  • Commit b1c3a4d is 15.2% slower (±1.8%) than commit a0f9e8b.
  • The difference is statistically significant (p < 0.001).
  • Result: REGRESSION DETECTED

Technical Implementation Plan:

  1. CLI Enhancement (cli.py):

    • Add a new command group or arguments like --compare and --with to the main simplebench CLI entry point.
  2. Git Interaction Layer:

    • Create a new module responsible for interacting with the Git repository. This could use a library like GitPython or the subprocess module.
    • It needs to handle:
      • Identifying the current branch/commit.
      • Checking out specific commits to a temporary directory to avoid disturbing the user's working tree.
      • Installing dependencies for each checked-out version (e.g., by running pip install -e . in the temporary directory).
  3. Dynamic Code Loading:

    • The runner will need to dynamically import the benchmark functions from the two different checked-out versions of the code. Python's importlib will be essential here.
  4. New "Comparison Runner" (runners.py):

    • Create a new ComparisonRunner class.
    • This runner will orchestrate the process:
      • Set up the two temporary environments for commit A and commit B.
      • Load function_A and function_B.
      • In a loop, run the functions in an interleaved or randomized order (A, B, A, B, ... or B, A, A, B, ...).
      • Collect the raw timing/measurement data for both versions into two separate Results objects.
  5. Paired Statistical Analysis (stats/):

    • Enhance the stats module to include paired-sample statistical tests (e.g., a paired t-test).
    • This test will operate on the two lists of results from the ComparisonRunner to determine if the difference between them is statistically significant.
  6. New Reporter Mode (reporters/):

    • The existing Reporter system will need a new mode or a new dedicated ComparisonReporter to format and display the results of the paired analysis, including the percentage change, confidence intervals, and the p-value.
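Step 1's flags could be wired up with `argparse`. A minimal sketch, assuming the option names from the usage examples above (note that `--with` needs an explicit `dest`, since `with` is a Python keyword and cannot be an attribute name):

```python
import argparse

# Hypothetical sketch of the proposed CLI flags; the option names and
# defaults are assumptions, not the actual simplebench interface.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="simplebench")
    # Candidate version to benchmark; defaults to the working tree if omitted.
    parser.add_argument("--compare", metavar="REF",
                        help="Git ref of the candidate version")
    # --compare-with and --with are aliases; dest must be set explicitly
    # because 'with' is a reserved word in Python.
    parser.add_argument("--compare-with", "--with", dest="baseline",
                        metavar="REF",
                        help="Git ref of the baseline version to compare against")
    return parser
```

Both invocation styles from the workflow section then parse to the same namespace, e.g. `build_parser().parse_args(["--compare-with", "main"])`.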
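Step 2's Git interaction layer could be sketched with the `subprocess` module and `git worktree`, which checks a commit out into a separate directory without disturbing the user's working tree. Function names here are illustrative, and error handling is reduced to `check=True`:

```python
import subprocess
import tempfile
from pathlib import Path

def current_commit(repo: str = ".") -> str:
    """Return the SHA of HEAD in the given repository."""
    out = subprocess.run(["git", "rev-parse", "HEAD"], cwd=repo,
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def checkout_to_tempdir(ref: str, repo: str = ".") -> Path:
    """Materialize `ref` as a detached worktree under a fresh temporary
    directory, leaving the user's working tree untouched."""
    parent = Path(tempfile.mkdtemp(prefix="simplebench-"))
    workdir = parent / ref.replace("/", "-")
    subprocess.run(["git", "worktree", "add", "--detach", str(workdir), ref],
                   cwd=repo, check=True)
    # Each version's dependencies could then be installed with, e.g.:
    # subprocess.run(["pip", "install", "-e", str(workdir)], check=True)
    return workdir
```

GitPython would offer the same operations behind an object API; `subprocess` keeps the dependency footprint at zero.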
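Step 3's dynamic loading could use `importlib.util` to import the same benchmark module from two checked-out trees under distinct aliases, so both versions coexist in one process. The module path and alias scheme are assumptions for illustration:

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType

def load_benchmark_module(tree: Path, relpath: str, alias: str) -> ModuleType:
    """Import `relpath` (e.g. 'benchmarks/bench_data.py') from a specific
    checked-out tree, registered under a unique alias so that the two
    versions of the module do not collide in sys.modules."""
    spec = importlib.util.spec_from_file_location(alias, tree / relpath)
    module = importlib.util.module_from_spec(spec)
    sys.modules[alias] = module   # register before exec, as import machinery does
    spec.loader.exec_module(module)
    return module
```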
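Step 4's measurement loop could look like the following sketch. The real `ComparisonRunner` would collect into `Results` objects; plain lists stand in here. Randomizing the A/B order within each round is what makes slow drift in system load hit both versions equally:

```python
import random
import time
from typing import Callable, List, Tuple

def run_interleaved(func_a: Callable[[], object],
                    func_b: Callable[[], object],
                    rounds: int = 50,
                    seed: int = 0) -> Tuple[List[float], List[float]]:
    """Time both functions `rounds` times each, shuffling the A/B order
    per round so environmental noise is shared between the two versions."""
    rng = random.Random(seed)
    times_a: List[float] = []
    times_b: List[float] = []
    for _ in range(rounds):
        pair = [(func_a, times_a), (func_b, times_b)]
        rng.shuffle(pair)                  # A,B or B,A, chosen each round
        for func, sink in pair:
            start = time.perf_counter()
            func()
            sink.append(time.perf_counter() - start)
    return times_a, times_b
```

Because each round yields one measurement per version taken back-to-back, the two lists are naturally paired, which is exactly what the statistical step needs.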
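Step 5's paired analysis could be sketched with the standard library alone. This version computes the paired t statistic on per-round differences and approximates the two-sided p-value with a normal distribution, which is reasonable at the sample sizes typical of benchmark runs (a real implementation might use `scipy.stats.ttest_rel` for an exact t distribution):

```python
import math
from statistics import NormalDist, mean, stdev

def paired_comparison(times_a, times_b):
    """Paired test on per-round timing differences.
    Returns (relative change of A vs. B, two-sided p-value)."""
    assert len(times_a) == len(times_b), "paired test needs equal-length samples"
    diffs = [a - b for a, b in zip(times_a, times_b)]
    n = len(diffs)
    d_mean = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)        # standard error of the mean difference
    t_stat = d_mean / se if se else float("inf")
    # Normal approximation to the t distribution (good for large n).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    rel_change = d_mean / mean(times_b)     # e.g. +0.152 means A is 15.2% slower
    return rel_change, p_value
```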
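Step 6's `ComparisonReporter` might render output like the sample shown earlier; a sketch, with the function name, thresholds, and wording all assumptions:

```python
def format_comparison(name: str, commit_a: str, commit_b: str,
                      rel_change: float, ci: float, p_value: float,
                      alpha: float = 0.05) -> str:
    """Format a paired-comparison result in the style of the sample output."""
    direction = "slower" if rel_change > 0 else "faster"
    p_text = "p < 0.001" if p_value < 0.001 else f"p = {p_value:.3f}"
    lines = [
        f"Benchmark: {name}",
        "",
        f"  • Commit {commit_a} is {abs(rel_change) * 100:.1f}% {direction} "
        f"(±{ci * 100:.1f}%) than commit {commit_b}.",
    ]
    if p_value < alpha:
        lines.append(f"  • The difference is statistically significant ({p_text}).")
        lines.append("  • Result: REGRESSION DETECTED" if rel_change > 0
                     else "  • Result: IMPROVEMENT DETECTED")
    else:
        lines.append(f"  • The difference is not statistically significant ({p_text}).")
        lines.append("  • Result: NO SIGNIFICANT CHANGE")
    return "\n".join(lines)
```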

Metadata

Labels: enhancement (New feature or request), roadmap (Long term direction)