spotify · elbersb · Jan 13, 2026 · Dec 23, 2025
diff --git a/.flake8 b/.flake8
diff --git a/.github/workflows/confidence.yml b/.github/workflows/confidence.yml
@@ -13,18 +13,22 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ['3.9', '3.10', '3.11']
+        python-version: ['3.9', '3.10', '3.11', '3.12']
 
     steps:
-    - uses: actions/checkout@v1
+    - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ matrix.python-version }}
+    - name: Install uv
+      uses: astral-sh/setup-uv@v5
+      with:
+        enable-cache: true
+        cache-dependency-glob: "**/pyproject.toml"
     - name: Install dependencies
       run: |
-        python -m pip install --upgrade pip
-        if [ -f requirements_dev.txt ]; then pip install -r requirements_dev.txt; fi
-        python -m pip install tox tox-gh-actions
+        uv pip install --system -e ".[dev]"
+        uv pip install --system tox tox-gh-actions
     - name: Test with tox
       run: tox
diff --git a/.github/workflows/python-publish.yml b/.github/workflows/python-publish.yml
@@ -18,11 +18,11 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v4
     - name: Set up Python
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
       with:
-        python-version: '3.9'
+        python-version: '3.11'
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip

diff --git a/.gitignore b/.gitignore
@@ -90,4 +90,8 @@ ENV/
 
 .DS_store
 
-.idea/
+.idea/
+
+# uv
+uv.lock
+.venv/
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,157 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Spotify Confidence is a Python library for A/B test analysis. It provides convenience wrappers around statsmodels' functions for computing p-values and confidence intervals. The library supports both frequentist (Z-test, Student's T-test, Chi-squared) and Bayesian (BetaBinomial) statistical methods, with features for variance reduction, sequential testing, and sample size calculations.
+
+## Development Commands
+
+### Setup
+```bash
+# Install with development dependencies (including tox-uv)
+uv pip install -e ".[dev]"
+```
+
+### Testing
+```bash
+# Run all tests with coverage
+uv run pytest
+
+# Run tests without coverage reports
+uv run pytest --no-cov
+
+# Run specific test file
+uv run pytest tests/frequentist/test_z_test.py
+
+# Run specific test
+uv run pytest tests/frequentist/test_z_test.py::test_name
+
+# Run all tests across Python versions
+uv run tox
+```
+
+### Code Quality
+```bash
+# Format code with black (line length: 119)
+uv run black spotify_confidence tests
+
+# Check formatting without making changes
+uv run black --check --diff spotify_confidence tests
+
+# Lint with flake8 (max line length: 120)
+uv run flake8 spotify_confidence tests
+
+# Run all quality checks (as done in CI)
+uv run black --check --diff spotify_confidence tests && uv run flake8 spotify_confidence tests && uv run pytest
+```
+
+### Build
+```bash
+# Build distribution packages
+uv run python -m build
+```
+
+## Architecture
+
+### Core Design Pattern
+
+The library follows an object-oriented design with separation of concerns:
+
+1. **Statistical Test Classes**: High-level APIs (`ZTest`, `StudentsTTest`, `ChiSquared`, `BetaBinomial`, `ZTestLinreg`)
+2. **Experiment Class**: Base class containing shared analysis methods for frequentist tests
+3. **Computer Classes**: Perform the actual statistical computations
+4. **Grapher Classes**: Generate visualizations using Chartify
+
+All main test classes inherit from abstract base classes in `spotify_confidence/analysis/abstract_base_classes/`:
+- `ConfidenceABC`: Base for all statistical test classes
+- `ConfidenceComputerABC`: Base for computation logic
+- `ConfidenceGrapherABC`: Base for visualization logic
+
+### Module Structure
+
+```
+spotify_confidence/
+├── analysis/
+│   ├── abstract_base_classes/    # ABC definitions for the framework
+│   ├── frequentist/               # Frequentist statistical methods
+│   │   ├── confidence_computers/  # Statistical computation logic
+│   │   ├── experiment.py          # Base class for frequentist tests
+│   │   ├── z_test.py              # Z-test implementation
+│   │   ├── t_test.py              # Student's T-test implementation
+│   │   ├── chi_squared.py         # Chi-squared test
+│   │   ├── z_test_linreg.py       # Z-test with linear regression variance reduction
+│   │   ├── sequential_bound_solver.py  # Group sequential testing
+│   │   ├── multiple_comparison.py # Multiple testing correction
+│   │   └── sample_size_calculator.py
+│   ├── bayesian/                  # Bayesian methods
+│   │   └── bayesian_models.py     # BetaBinomial implementation
+│   ├── constants.py               # Shared constants
+│   └── confidence_utils.py        # Shared utility functions
+├── samplesize/                    # Sample size calculations
+├── examples.py                    # Example data generators
+├── chartgrid.py                   # Chart grid utilities
+└── options.py                     # Global configuration
+```
+
+### Key Classes and Their Relationships
+
+- **Experiment** (in `frequentist/experiment.py`): The core base class for frequentist tests. Provides methods like:
+  - `summary()`: Overall metric summaries
+  - `difference()`: Pairwise comparisons
+  - `multiple_difference()`: Multiple comparisons with correction
+  - `difference_plot()`, `summary_plot()`, etc.: Visualization methods
+  - `sample_size()`: Required sample size calculations
+  - `statistical_power()`: Power analysis
+
+- **ZTest, StudentsTTest, ChiSquared**: Thin wrappers that initialize `Experiment` with the appropriate computer and method
+
+- **Computer Classes** (in `frequentist/confidence_computers/`): Handle the statistical calculations
+  - `ZTestComputer`, `TTestComputer`, `ChiSquaredComputer`: Specific computation implementations
+  - All inherit from `ConfidenceComputerABC`
+
+- **ChartifyGrapher**: Implements visualization using the Chartify library
+
+### Data Model
+
+The library works with DataFrames containing sufficient statistics:
+- `numerator_column`: Sum or count (e.g., sum of conversions)
+- `denominator_column`: Total observations (e.g., total users)
+- `numerator_sum_squares_column`: Sum of squares (optional, for variance calculations)
+- `categorical_group_columns`: Treatment/control groups and other dimensions
+- `ordinal_group_column`: Time-based grouping for sequential analysis
+
+### Important Conventions
+
+1. **Method Column**: Tests add a `METHOD_COLUMN_NAME` to data indicating the test type (e.g., "z-test", "t-test")
+
+2. **Multiple Comparison Correction**: Supported methods defined in `constants.py`:
+   - Standard: bonferroni, holm, hommel, sidak, FDR methods
+   - SPOT-1 variants: Custom Spotify methods for specific use cases
+
+3. **Non-Inferiority Margins (NIMs)**: Can be specified as absolute values or relative percentages
+
+4. **Sequential Testing**: The `sequential_bound_solver.py` module implements group sequential designs with spending functions
+
+5. **Variance Reduction**: `ZTestLinreg` uses pre-exposure data to fit a linear model and reduce variance (CUPED method)
+
+## Testing Guidelines
+
+- Tests are organized to mirror the source structure under `tests/`
+- Use pytest fixtures for common test data
+- Tests check both DataFrame outputs and chart generation
+- Coverage target is configured in `pyproject.toml`
+
+## Python Version Support
+
+Supports Python 3.9, 3.10, 3.11, and 3.12. The `tox.ini` includes a `py39-min` environment that tests with minimum dependency versions.
+
+The project uses `tox-uv` to leverage uv's fast package installation and environment management in tox, significantly speeding up multi-environment testing. The GitHub Actions CI workflow also uses uv for faster dependency installation.
+
+## Code Style
+
+- Black formatting with 119 character line length
+- Flake8 linting with max line length 120
+- Ignored flake8 rules: E203, E231, W503
+- Excluded from linting: `.venv`, `.tox`, `dist`, `build`, `scratch.py`, `confidence_dev`
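The **Data Model** and the `difference()` / `multiple_difference()` descriptions in CLAUDE.md say that groups are summarized by sufficient statistics rather than raw rows. The sketch below shows why that is enough for a z-test: the mean and variance of each group can be recovered from `numerator`, `denominator` and `numerator_sum_squares`. It is a minimal illustration under stated assumptions, not the library's `ZTestComputer`: the helper names `mean_and_variance` and `z_test_difference` are hypothetical, variances are unpooled, and Bonferroni is used as the example multiple-comparison correction.

```python
from math import sqrt
from statistics import NormalDist


def mean_and_variance(numerator, denominator, numerator_sum_squares):
    """Mean and population variance of a group from its sufficient statistics."""
    mean = numerator / denominator
    variance = numerator_sum_squares / denominator - mean ** 2
    return mean, variance


def z_test_difference(group_1, group_2, alpha=0.05, n_comparisons=1):
    """Two-sided z-test for the difference in means, group_2 minus group_1.

    Hypothetical helper, not part of spotify_confidence. Each group is a
    (numerator, denominator, numerator_sum_squares) tuple. Variances are not
    pooled. A Bonferroni correction is applied by testing at
    alpha / n_comparisons and multiplying the p-value by n_comparisons.
    """
    mean_1, var_1 = mean_and_variance(*group_1)
    mean_2, var_2 = mean_and_variance(*group_2)
    difference = mean_2 - mean_1
    # Standard error of the difference between two independent means.
    std_err = sqrt(var_1 / group_1[1] + var_2 / group_2[1])
    z = difference / std_err
    normal = NormalDist()
    # Two-sided p-value, Bonferroni-adjusted and capped at 1.
    p_value = min(1.0, 2 * (1 - normal.cdf(abs(z))) * n_comparisons)
    # Critical value for a (1 - alpha / n_comparisons) two-sided interval.
    z_crit = normal.inv_cdf(1 - alpha / (2 * n_comparisons))
    ci = (difference - z_crit * std_err, difference + z_crit * std_err)
    return difference, ci, p_value


# Binary metric: 40/100 conversions in control, 50/100 in treatment.
# For 0/1 outcomes the sum of squares equals the numerator.
control = (40, 100, 40)
treatment = (50, 100, 50)
difference, ci, p_value = z_test_difference(control, treatment)
print(f"diff={difference:.3f}, ci=({ci[0]:.3f}, {ci[1]:.3f}), p={p_value:.3f}")
```

In this example the 95% interval spans zero, so the 10 percentage point lift is not significant at the 5% level. Passing `n_comparisons=3` widens the interval and triples the p-value, which is the trade-off any multiple-comparison correction makes.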
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
@@ -57,41 +57,55 @@ Get Started!
 
 Ready to contribute? Here's how to set up `confidence` for local development.
 
+**Prerequisites:**
+
+* `uv <https://docs.astral.sh/uv/>`_ - Fast Python package installer (recommended)
+* Python 3.9 or later
+
 1. Fork the `confidence` repo on GitHub.
 2. Clone your fork locally::
 
-    $ git clone https://github.com/spotify/confidence
+    $ git clone git@github.com:your_username/confidence.git
+    $ cd confidence
+
+3. Set up your development environment using uv::
+
+    $ uv venv
+    $ uv pip install -e ".[dev]"
 
-3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::
+   This creates a virtual environment and installs the package in editable mode with all development dependencies.
 
-    $ mkvirtualenv confidence_dev
-    $ cd confidence/
-    $ tox
+4. Verify your setup by running the tests::
 
-   The tox command will install the dev requirements in requirements_dev.txt and run all tests.
+    $ uv run pytest
 
-4. Create a branch for local development::
+   This should run all tests and show they pass.
+
+5. Create a branch for local development::
 
     $ git checkout -b name-of-your-bugfix-or-feature
 
    Now you can make your changes locally.
 
-5. When you're done making changes, format using `make black`, check that your changes pass flake8 and the tests, including testing other Python versions with tox::
+6. When you're done making changes, check that your changes pass all quality checks::
+
+    $ uv run black spotify_confidence tests --line-length 119  # Format code
+    $ uv run flake8 spotify_confidence tests                   # Lint code
+    $ uv run pytest                                            # Run tests
+
+   To test across all supported Python versions (3.9, 3.10, 3.11, 3.12)::
 
-    $ make black
-    $ flake8 confidence tests
-    $ python setup.py test or py.test
-    $ tox
+    $ uv run tox -p auto
 
-   To get flake8 and tox, just pip install them into your virtualenv.
+   Note: tox requires all Python versions to be installed on your system.
 
-6. Commit your changes and push your branch to GitHub::
+7. Commit your changes and push your branch to GitHub::
 
     $ git add .
     $ git commit -m "Your detailed description of your changes."
     $ git push origin name-of-your-bugfix-or-feature
 
-7. Submit a pull request through the GitHub website.
+8. Submit a pull request through the GitHub website.
 
 Pull Request Guidelines
 -----------------------
@@ -101,23 +115,36 @@ Before you submit a pull request, check that it meets these guidelines:
 1. The pull request should include tests.
 2. If the pull request adds functionality, the docs should be updated. Put
    your new functionality into a function with a docstring, and add the
-   feature to the list in README.rst.
-3. The pull request should work for Python 3.6 and 3.7. Check
-   and make sure that the tests pass for all supported Python versions.
+   feature to the list in README.md.
+3. The pull request should work for Python 3.9, 3.10, 3.11, and 3.12. The CI
+   pipeline will automatically test all supported Python versions.
 
 Tips
 ----
 
 To run a subset of tests::
 
-$ py.test tests.test_confidence
+    $ uv run pytest tests/frequentist/test_ttest.py
+
+To run a specific test::
+
+    $ uv run pytest tests/frequentist/test_ttest.py::TestCategorical::test_summary
+
+To run tests with verbose output::
+
+    $ uv run pytest -v
+
+To see test coverage::
+
+    $ uv run pytest --cov=spotify_confidence --cov-report=html
+    $ open htmlcov/index.html
 
 
 Release Process
 -----------------------
 
 While commits and pull requests are welcome from any contributor, we try to
-simplify the distribution process for everyone by managing the release 
+simplify the distribution process for everyone by managing the release
 process with specific contributors serving in the role of Release Managers.
 
 Release Managers are responsible for:
@@ -142,7 +169,7 @@ PATCH version when you make backwards-compatible bug fixes.
 
 Release Strategy
 ~~~~~~~~~~~~~~~~
-Each new release will be made on its own branch, with the branch Master 
+Each new release will be made on its own branch, with the branch Master
 representing the most recent, furthest release. Releases are published to PyPi
 automatically once a new release branch is merged to Master. Additionally,
 new releases are also tracked manually on `github

diff --git a/MANIFEST.in b/MANIFEST.in
diff --git a/Makefile b/Makefile
@@ -47,14 +47,17 @@ clean-test: ## remove test and coverage artifacts
 	rm -f .coverage
 	rm -fr htmlcov/
 
+format: ## format code with black
+	black spotify_confidence tests --line-length 119
+
 lint: ## check style with flake8
-	flake8 confidence tests
+	flake8 spotify_confidence tests
 
 test: ## run tests quickly with the default Python
 	python3 -m pytest
 
 coverage: ## check code coverage quickly with the default Python
-	coverage run --source confidence -m pytest
+	coverage run --source spotify_confidence -m pytest
 	coverage report -m
 	coverage html
 	$(BROWSER) htmlcov/index.html
@@ -86,10 +89,8 @@ install: clean ## install the package to the active Python's site-packages
 	pip install -e .
 
 install-test: clean
-	pip3 install --index-url https://test.pypi.org/simple/ confidence-spotify
+	pip3 install --index-url https://test.pypi.org/simple/ spotify-confidence
 
 install-prod: clean
-	pip3 install confidence-spotify
+	pip3 install spotify-confidence
 
-black:
-	black spotify_confidence tests --line-length 119