3 changes: 0 additions & 3 deletions .flake8

This file was deleted.

16 changes: 10 additions & 6 deletions .github/workflows/confidence.yml
@@ -13,18 +13,22 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.9', '3.10', '3.11']
python-version: ['3.9', '3.10', '3.11', '3.12']

steps:
- uses: actions/checkout@v1
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install uv
uses: astral-sh/setup-uv@v5
with:
enable-cache: true
cache-dependency-glob: "**/pyproject.toml"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
if [ -f requirements_dev.txt ]; then pip install -r requirements_dev.txt; fi
python -m pip install tox tox-gh-actions
uv pip install --system -e ".[dev]"
uv pip install --system tox tox-gh-actions
- name: Test with tox
run: tox
6 changes: 3 additions & 3 deletions .github/workflows/python-publish.yml
@@ -18,11 +18,11 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: '3.9'
python-version: '3.11'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
6 changes: 5 additions & 1 deletion .gitignore
@@ -90,4 +90,8 @@ ENV/

.DS_store

.idea/
.idea/

# uv
uv.lock
.venv/
157 changes: 157 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,157 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Spotify Confidence is a Python library for A/B test analysis. It provides convenience wrappers around statsmodels functions for computing p-values and confidence intervals. The library supports both frequentist (Z-test, Student's T-test, Chi-squared) and Bayesian (BetaBinomial) statistical methods, with features for variance reduction, sequential testing, and sample size calculations.

## Development Commands

### Setup
```bash
# Install with development dependencies (including tox-uv)
uv pip install -e ".[dev]"
```

### Testing
```bash
# Run all tests with coverage
uv run pytest

# Run tests without coverage reports
uv run pytest --no-cov

# Run specific test file
uv run pytest tests/frequentist/test_z_test.py

# Run specific test
uv run pytest tests/frequentist/test_z_test.py::test_name

# Run all tests across Python versions
uv run tox
```

### Code Quality
```bash
# Format code with black (line length: 119)
uv run black spotify_confidence tests

# Check formatting without making changes
uv run black --check --diff spotify_confidence tests

# Lint with flake8 (max line length: 120)
uv run flake8 spotify_confidence tests

# Run all quality checks (as done in CI)
uv run black --check --diff spotify_confidence tests && uv run flake8 spotify_confidence tests && uv run pytest
```

### Build
```bash
# Build distribution packages
uv run python -m build
```

## Architecture

### Core Design Pattern

The library follows an object-oriented design with separation of concerns:

1. **Statistical Test Classes**: High-level APIs (`ZTest`, `StudentsTTest`, `ChiSquared`, `BetaBinomial`, `ZTestLinreg`)
2. **Experiment Class**: Base class containing shared analysis methods for frequentist tests
3. **Computer Classes**: Perform the actual statistical computations
4. **Grapher Classes**: Generate visualizations using Chartify

All main test classes inherit from abstract base classes in `spotify_confidence/analysis/abstract_base_classes/`:
- `ConfidenceABC`: Base for all statistical test classes
- `ConfidenceComputerABC`: Base for computation logic
- `ConfidenceGrapherABC`: Base for visualization logic

### Module Structure

```
spotify_confidence/
├── analysis/
│   ├── abstract_base_classes/     # ABC definitions for the framework
│   ├── frequentist/               # Frequentist statistical methods
│   │   ├── confidence_computers/  # Statistical computation logic
│   │   ├── experiment.py          # Base class for frequentist tests
│   │   ├── z_test.py              # Z-test implementation
│   │   ├── t_test.py              # Student's T-test implementation
│   │   ├── chi_squared.py         # Chi-squared test
│   │   ├── z_test_linreg.py       # Z-test with linear regression variance reduction
│   │   ├── sequential_bound_solver.py  # Group sequential testing
│   │   ├── multiple_comparison.py      # Multiple testing correction
│   │   └── sample_size_calculator.py
│   ├── bayesian/                  # Bayesian methods
│   │   └── bayesian_models.py     # BetaBinomial implementation
│   ├── constants.py               # Shared constants
│   └── confidence_utils.py        # Shared utility functions
├── samplesize/                    # Sample size calculations
├── examples.py                    # Example data generators
├── chartgrid.py                   # Chart grid utilities
└── options.py                     # Global configuration
```

### Key Classes and Their Relationships

- **Experiment** (in `frequentist/experiment.py`): The core base class for frequentist tests. Provides methods like:
  - `summary()`: Overall metric summaries
  - `difference()`: Pairwise comparisons
  - `multiple_difference()`: Multiple comparisons with correction
  - `difference_plot()`, `summary_plot()`, etc.: Visualization methods
  - `sample_size()`: Required sample size calculations
  - `statistical_power()`: Power analysis

- **ZTest, StudentsTTest, ChiSquared**: Thin wrappers that initialize `Experiment` with the appropriate computer and method

- **Computer Classes** (in `frequentist/confidence_computers/`): Handle the statistical calculations
  - `ZTestComputer`, `TTestComputer`, `ChiSquaredComputer`: Specific computation implementations
  - All inherit from `ConfidenceComputerABC`

- **ChartifyGrapher**: Implements visualization using the Chartify library

### Data Model

The library works with DataFrames containing sufficient statistics:
- `numerator_column`: Sum or count (e.g., sum of conversions)
- `denominator_column`: Total observations (e.g., total users)
- `numerator_sum_squares_column`: Sum of squares (optional, for variance calculations)
- `categorical_group_columns`: Treatment/control groups and other dimensions
- `ordinal_group_column`: Time-based grouping for sequential analysis
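
To illustrate why these sufficient statistics are enough, here is a minimal standard-library sketch of a two-sample Z-test computed from a sum, a count, and a sum of squares per group. It is not the library's implementation: the function names and dictionary keys are hypothetical and only mirror the column roles listed above.

```python
import math
from statistics import NormalDist


def mean_and_variance(numerator, denominator, numerator_sum_squares):
    """Recover the sample mean and sample variance from sufficient statistics."""
    mean = numerator / denominator
    # Sample variance: (sum of squares - n * mean^2) / (n - 1)
    variance = (numerator_sum_squares - denominator * mean**2) / (denominator - 1)
    return mean, variance


def z_test_difference(control, treatment, alpha=0.05):
    """Two-sided Z-test for the difference in means (treatment minus control)."""
    mean_c, var_c = mean_and_variance(**control)
    mean_t, var_t = mean_and_variance(**treatment)
    diff = mean_t - mean_c
    std_err = math.sqrt(var_c / control["denominator"] + var_t / treatment["denominator"])
    z = diff / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * std_err
    return {
        "difference": diff,
        "p_value": p_value,
        "ci_lower": diff - half_width,
        "ci_upper": diff + half_width,
    }


# Hypothetical conversion data: for a binary metric the sum of squares equals the sum.
control = {"numerator": 500, "denominator": 10_000, "numerator_sum_squares": 500}
treatment = {"numerator": 560, "denominator": 10_000, "numerator_sum_squares": 560}
result = z_test_difference(control, treatment)
```

Because only three numbers per group are needed, the raw per-user rows never have to be loaded, which is presumably why the library asks for pre-aggregated DataFrames.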

### Important Conventions

1. **Method Column**: Tests add a `METHOD_COLUMN_NAME` to data indicating the test type (e.g., "z-test", "t-test")

2. **Multiple Comparison Correction**: Supported methods defined in `constants.py`:
   - Standard: bonferroni, holm, hommel, sidak, FDR methods
   - SPOT-1 variants: Custom Spotify methods for specific use cases

3. **Non-Inferiority Margins (NIMs)**: Can be specified as absolute values or relative percentages

4. **Sequential Testing**: The `sequential_bound_solver.py` module implements group sequential designs with spending functions

5. **Variance Reduction**: `ZTestLinreg` uses pre-exposure data to fit a linear model and reduce variance (CUPED method)
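
The variance-reduction idea in item 5 can be sketched with the standard library alone. This is a simplified single-covariate CUPED adjustment on made-up numbers, assuming one pre-exposure value per unit; `cuped_adjust` is a hypothetical helper, and whether `ZTestLinreg` performs the adjustment exactly this way is not confirmed by this document.

```python
from statistics import mean, variance


def cuped_adjust(metric, pre_metric):
    """Adjust each metric value using a linear fit on its pre-exposure covariate."""
    pre_mean = mean(pre_metric)
    metric_mean = mean(metric)
    n = len(metric)
    # Sample covariance between the pre-exposure covariate and the metric.
    cov = sum((x - pre_mean) * (y - metric_mean) for x, y in zip(pre_metric, metric)) / (n - 1)
    # Regression slope of the metric on the covariate.
    theta = cov / variance(pre_metric)
    # Subtract the part of each value explained by the centered covariate.
    return [y - theta * (x - pre_mean) for x, y in zip(pre_metric, metric)]


# Made-up example data: in-experiment values and the same units' pre-exposure values.
pre_metric = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
metric = [11.0, 13.5, 9.5, 16.0, 11.5, 15.5]
adjusted = cuped_adjust(metric, pre_metric)
```

The adjusted values keep the same mean as the original metric but have lower variance whenever the covariate is correlated with it, which tightens the resulting confidence intervals.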

## Testing Guidelines

- Tests are organized to mirror the source structure under `tests/`
- Use pytest fixtures for common test data
- Tests check both DataFrame outputs and chart generation
- Coverage target is configured in `pyproject.toml`

## Python Version Support

Supports Python 3.9, 3.10, 3.11, and 3.12. The `tox.ini` includes a `py39-min` environment that tests with minimum dependency versions.

The project uses `tox-uv` to leverage uv's fast package installation and environment management in tox, significantly speeding up multi-environment testing. The GitHub Actions CI workflow also uses uv for faster dependency installation.

## Code Style

- Black formatting with 119 character line length
- Flake8 linting with max line length 120
- Ignored flake8 rules: E203, E231, W503
- Excluded from linting: `.venv`, `.tox`, `dist`, `build`, `scratch.py`, `confidence_dev`
69 changes: 48 additions & 21 deletions CONTRIBUTING.rst
@@ -57,41 +57,55 @@ Get Started!

Ready to contribute? Here's how to set up `confidence` for local development.

**Prerequisites:**

* `uv <https://docs.astral.sh/uv/>`_ - Fast Python package installer (recommended)
* Python 3.9 or later

1. Fork the `confidence` repo on GitHub.
2. Clone your fork locally::

$ git clone https://github.com/spotify/confidence
$ git clone git@github.com:your_username/confidence.git
$ cd confidence

3. Set up your development environment using uv::

$ uv venv
$ uv pip install -e ".[dev]"

3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::
This creates a virtual environment and installs the package in editable mode with all development dependencies.

$ mkvirtualenv confidence_dev
$ cd confidence/
$ tox
4. Verify your setup by running the tests::

The tox command will install the dev requirements in requirements_dev.txt and run all tests.
$ uv run pytest

4. Create a branch for local development::
This should run all tests and show they pass.

5. Create a branch for local development::

$ git checkout -b name-of-your-bugfix-or-feature

Now you can make your changes locally.

5. When you're done making changes, format using `make black`, check that your changes pass flake8 and the tests, including testing other Python versions with tox::
6. When you're done making changes, check that your changes pass all quality checks::

$ uv run black spotify_confidence tests --line-length 119 # Format code
$ uv run flake8 spotify_confidence tests # Lint code
$ uv run pytest # Run tests

To test across all supported Python versions (3.9, 3.10, 3.11, 3.12)::

$ make black
$ flake8 confidence tests
$ python setup.py test or py.test
$ tox
$ uv run tox -p auto

To get flake8 and tox, just pip install them into your virtualenv.
Note: tox requires all Python versions to be installed on your system.

6. Commit your changes and push your branch to GitHub::
7. Commit your changes and push your branch to GitHub::

$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature

7. Submit a pull request through the GitHub website.
8. Submit a pull request through the GitHub website.

Pull Request Guidelines
-----------------------
@@ -101,23 +115,36 @@ Before you submit a pull request, check that it meets these guidelines:
1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated. Put
your new functionality into a function with a docstring, and add the
feature to the list in README.rst.
3. The pull request should work for Python 3.6 and 3.7. Check
and make sure that the tests pass for all supported Python versions.
feature to the list in README.md.
3. The pull request should work for Python 3.9, 3.10, 3.11, and 3.12. The CI
pipeline will automatically test all supported Python versions.

Tips
----

To run a subset of tests::

$ py.test tests.test_confidence
$ uv run pytest tests/frequentist/test_ttest.py

To run a specific test::

$ uv run pytest tests/frequentist/test_ttest.py::TestCategorical::test_summary

To run tests with verbose output::

$ uv run pytest -v

To see test coverage::

$ uv run pytest --cov=spotify_confidence --cov-report=html
$ open htmlcov/index.html


Release Process
-----------------------

While commits and pull requests are welcome from any contributor, we try to
simplify the distribution process for everyone by managing the release
simplify the distribution process for everyone by managing the release
process with specific contributors serving in the role of Release Managers.

Release Managers are responsible for:
@@ -142,7 +169,7 @@ PATCH version when you make backwards-compatible bug fixes.

Release Strategy
~~~~~~~~~~~~~~~~
Each new release will be made on its own branch, with the branch Master
Each new release will be made on its own branch, with the branch Master
representing the most recent, furthest release. Releases are published to PyPI
automatically once a new release branch is merged to Master. Additionally,
new releases are also tracked manually on `github
10 changes: 0 additions & 10 deletions MANIFEST.in

This file was deleted.

13 changes: 7 additions & 6 deletions Makefile
@@ -47,14 +47,17 @@ clean-test: ## remove test and coverage artifacts
rm -f .coverage
rm -fr htmlcov/

format: ## format code with black
black spotify_confidence tests --line-length 119

lint: ## check style with flake8
flake8 confidence tests
flake8 spotify_confidence tests

test: ## run tests quickly with the default Python
python3 -m pytest

coverage: ## check code coverage quickly with the default Python
coverage run --source confidence -m pytest
coverage run --source spotify_confidence -m pytest
coverage report -m
coverage html
$(BROWSER) htmlcov/index.html
@@ -86,10 +89,8 @@ install: clean ## install the package to the active Python's site-packages
pip install -e .

install-test: clean
pip3 install --index-url https://test.pypi.org/simple/ confidence-spotify
pip3 install --index-url https://test.pypi.org/simple/ spotify-confidence

install-prod: clean
pip3 install confidence-spotify
pip3 install spotify-confidence

black:
black spotify_confidence tests --line-length 119