A Python library for Multi-armed Bandit problems supporting both binary outcomes (LogisticBandit) and continuous rewards (LinearBandit).
- LogisticBandit: For binary outcomes (clicks, conversions, etc.) using logistic regression with Thompson Sampling. Implements Full-Rank Thompson Sampling (Full-TS) and Odds Ratio Thompson Sampling (ORTS), as described in the paper cited below.
- LinearBandit: For continuous outcomes (revenue, latency, ratings, etc.) using Gaussian Thompson Sampling with Bayesian updating.
ORTS is generally preferred over Full-TS or Beta-Bernoulli Thompson Sampling for binary outcomes because it is more robust to time-varying effects, making it ideal for real-world A/B testing scenarios where user behavior changes over time.
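For example, under the documented update/win_prop/decay API, a non-stationary simulation might look like the sketch below (the drift model, arm names, and rates are illustrative, not from the library's docs):

```python
import numpy as np
from logisticbandit import LogisticBandit

rng = np.random.default_rng(0)
bandit = LogisticBandit()

# Ten update rounds in which the baseline conversion rate drifts upward,
# while arm_b keeps a roughly constant relative (odds-ratio) advantage.
for t in range(10):
    base = 0.01 + 0.002 * t  # time-varying baseline conversion rate
    rates = {"arm_a": base, "arm_b": base * 1.1}
    obs = {arm: [10000, int(rng.binomial(10000, p))] for arm, p in rates.items()}
    bandit.update(obs, decay=0.1)  # discount stale information each round

print(bandit.win_prop())  # arm_b should receive most of the probability mass
```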
- ✅ Multiple Bandit Types:
  - LogisticBandit for binary outcomes (clicks, conversions)
    - ORTS (Odds Ratio Thompson Sampling) - robust to time-varying effects
    - Full-TS (Full Rank Thompson Sampling) - traditional approach
  - LinearBandit for continuous rewards (revenue, latency, ratings)
    - Gaussian Thompson Sampling with conjugate Bayesian updates
  - Beta-Bernoulli Thompson Sampling - simple baseline for binary outcomes
- ✅ Type-Safe: Full type hints for better IDE support and code safety
- ✅ Input Validation: Comprehensive validation with clear error messages
- ✅ Well-Documented: Detailed docstrings in NumPy style
- ✅ Configurable: Flexible parameters for exploration/exploitation trade-offs
- ✅ Numerically Stable: Robust handling of edge cases and numerical issues
- ✅ Well-Tested: Comprehensive test suite with 100+ unit tests and 88%+ coverage
# Clone the repository
git clone https://github.com/sulgik/orts.git
cd orts

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"

Use LogisticBandit when your outcomes are binary (success/failure, click/no-click, convert/not-convert).
from logisticbandit import LogisticBandit

# Create a LogisticBandit instance (ORTS by default)
bandit = LogisticBandit()

# First observation: arm_1 had 300 successes out of 30000 trials
obs = {"arm_1": [30000, 300], "arm_2": [30000, 290]}
bandit.update(obs)

# Get winning probabilities for each arm
probabilities = bandit.win_prop()
print(probabilities)
# Output: {'arm_1': 0.543, 'arm_2': 0.457}

# Get the list of tracked arms
print(bandit.get_models())
# Output: ['arm_1', 'arm_2']

# Add a new arm (arm_3) to the existing model
obs = {"arm_1": [20000, 200], "arm_2": [20000, 180], "arm_3": [20000, 210]}
bandit.update(obs)
probabilities = bandit.win_prop()
print(probabilities)
# Output: {'arm_1': 0.312, 'arm_2': 0.287, 'arm_3': 0.401}

# Use Full-TS instead of ORTS
obs = {"arm_1": [30000, 310], "arm_3": [30000, 300]}
bandit.update(obs, odds_ratios_only=False)
probabilities = bandit.win_prop()

# Adjust exploration/exploitation trade-off
probabilities = bandit.win_prop(aggressive=2.0)  # More exploitation
probabilities = bandit.win_prop(aggressive=0.5)  # More exploration

# Use decay to discount old observations (useful for non-stationary environments)
bandit.update(obs, decay=0.1)  # 10% discount on prior information

# Adjust sampling precision
probabilities = bandit.win_prop(draw=50000)  # More samples = more accurate

from ts import TSPar

# Traditional Beta-Bernoulli approach
ts_bandit = TSPar()
obs = {"arm_1": [1000, 50], "arm_2": [1000, 45]}
ts_bandit.update(obs)
probabilities = ts_bandit.win_prop()

Use LinearBandit when your outcomes are continuous values (revenue, response time, ratings, etc.).
from linearbandit import LinearBandit

# Create a LinearBandit instance
bandit = LinearBandit(obs_noise=1.0)

# Observations are continuous reward values
obs = {
    "variant_a": [2.3, 2.7, 2.4, 2.6, 2.5],  # Multiple observations
    "variant_b": [3.1, 3.3, 3.0],            # Different number is fine
    "variant_c": 2.8,                        # Single value also works
}
bandit.update(obs)

# Get winning probabilities
probabilities = bandit.win_prop()
print(probabilities)
# Output: {'variant_b': 0.65, 'variant_c': 0.25, 'variant_a': 0.10}

# Get detailed statistics
stats = bandit.get_statistics()
for action, stat in stats.items():
    print(f"{action}: mean={stat['mu']:.3f}, uncertainty={stat['sigma']:.3f}")

# Adjust exploration/exploitation
probabilities = bandit.win_prop(aggressive=2.0)  # More exploitation

# Use decay for non-stationary environments
bandit.update(obs, decay=0.2)  # Discount old information
# Custom noise parameters
bandit = LinearBandit(obs_noise=0.5, default_sigma=2.0)

LogisticBandit: For binary outcomes using ORTS and Full-TS algorithms.
- update(obs, odds_ratios_only=True, decay=0.0): Update model with new observations
  - obs: Dictionary mapping arm names to [total_count, success_count]
  - odds_ratios_only: If True, use ORTS; if False, use Full-TS
  - decay: Discount factor for prior information (0.0 to 1.0)
- win_prop(action_list=None, draw=100000, aggressive=1.0): Calculate winning probabilities
  - action_list: List of arms to consider (None = all arms)
  - draw: Number of Monte Carlo samples
  - aggressive: Exploration/exploitation parameter
- get_models(): Get list of currently tracked arms
- get_par(action_list): Get transformed parameters for specific arms
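A short sketch exercising the methods listed above (the counts are illustrative, and the exact structure returned by get_par is not documented here, so it is simply printed):

```python
from logisticbandit import LogisticBandit

bandit = LogisticBandit()
bandit.update({"arm_1": [1000, 30], "arm_2": [1000, 25]})

# Winning probabilities restricted to a subset of arms, with fewer samples
probs = bandit.win_prop(action_list=["arm_1", "arm_2"], draw=50000)
print(probs)

# Transformed parameters for specific arms
print(bandit.get_par(["arm_1", "arm_2"]))
```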
LinearBandit: For continuous rewards using Gaussian Thompson Sampling.
- update(obs, decay=0.0): Update model with new observations
  - obs: Dictionary mapping action names to rewards (list of floats or single float)
  - decay: Discount factor for prior information (0.0 to 1.0)
- win_prop(action_list=None, draw=100000, aggressive=1.0): Calculate winning probabilities
  - action_list: List of actions to consider (None = all actions)
  - draw: Number of Monte Carlo samples
  - aggressive: Exploration/exploitation parameter
- get_models(): Get list of currently tracked actions
- get_statistics(): Get detailed statistics (mu, sigma, count, mean_reward) for all actions

Constructor parameters:
- mu: Prior mean for each action (dict)
- sigma: Prior standard deviation for each action (dict)
- default_sigma: Default prior uncertainty for new actions (default: 1.0)
- obs_noise: Assumed observation noise standard deviation (default: 1.0)
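For example, assuming these parameters are accepted as keyword arguments, per-action priors can be supplied at construction (the names and values below are illustrative):

```python
from linearbandit import LinearBandit

# Informative priors for known actions; actions not listed in mu/sigma
# fall back to the default prior uncertainty (default_sigma).
bandit = LinearBandit(
    mu={"variant_a": 2.5, "variant_b": 3.0},
    sigma={"variant_a": 0.5, "variant_b": 0.5},
    default_sigma=2.0,  # prior std for new/unlisted actions
    obs_noise=1.0,      # assumed observation noise std
)
bandit.update({"variant_a": [2.4, 2.6], "variant_c": 2.9})
print(bandit.get_statistics())
```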
TSPar: Beta-Bernoulli Thompson Sampling baseline for binary outcomes.
- update(obs): Update Beta distributions with observations
- win_prop(draw=10000): Calculate winning probabilities
- get_models(): Get list of tracked arms
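A minimal usage sketch of the methods above (arm names and counts are illustrative):

```python
from ts import TSPar

ts_bandit = TSPar()
ts_bandit.update({"arm_1": [1000, 52], "arm_2": [1000, 47]})

# draw=10000 is the documented default for TSPar (see the performance notes below)
print(ts_bandit.win_prop(draw=10000))
```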
Start with the comprehensive tutorial:
- TUTORIAL.md - Complete guide to Multi-Armed Bandits and Thompson Sampling
  - Why use bandits? (Exploration-Exploitation tradeoff)
  - When to use LogisticBandit vs LinearBandit
  - Real-world examples with detailed explanations
  - Common mistakes and how to avoid them
- examples/tutorial_step_by_step.py - Interactive step-by-step tutorial
  - Run it to learn concepts interactively
  - Covers basic usage, parameters, and best practices
# Start with the interactive tutorial
python examples/tutorial_step_by_step.py

# Read the comprehensive guide
cat TUTORIAL.md

Practical examples are available in the examples/ directory:

- examples/basic_usage.py - LogisticBandit basics
- examples/comparison.py - Compare ORTS, Full-TS, and Beta-Bernoulli
- examples/ab_testing.py - Realistic A/B testing simulation
- examples/linear_bandit.py - LinearBandit for continuous rewards

Run any example:

python examples/linear_bandit.py
python examples/comparison.py

This implementation is based on the paper:
- Paper: Odds-Ratio Thompson Sampling to Control for Time-Varying Effect (arXiv:2003.01905)
- Key Idea: ORTS uses odds ratios to model arm differences, which provides robustness to time-varying effects compared to traditional approaches
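To make the key idea concrete (standard definitions, not quoted from the paper): for two arms with success probabilities p₁ and p₂,

```latex
\[
  \mathrm{OR}_{1,2} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)},
  \qquad
  \log \mathrm{OR}_{1,2} = \operatorname{logit}(p_1) - \operatorname{logit}(p_2).
\]
% If a common time-varying effect shifts both arms' log-odds equally,
% logit(p_i(t)) = alpha(t) + beta_i, then log OR_{1,2} = beta_1 - beta_2
% is constant in t, so the odds ratio is unaffected by the shared drift.
```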
Uses Bayesian inference with Gaussian conjugate priors:
- Prior: Each action's mean reward follows N(μ₀, σ₀²)
- Likelihood: Observations are N(μ, σ²), where σ is the observation noise
- Posterior: Also Gaussian (conjugate prior property)
- Thompson Sampling: Sample from posterior and select action with highest sample
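As a standalone illustration, here is the standard known-variance conjugate update in code (a textbook sketch, not necessarily the library's internals):

```python
import numpy as np

def gaussian_posterior(mu0, sigma0, rewards, obs_noise):
    """Known-variance Gaussian conjugate update.

    Prior:      mean ~ N(mu0, sigma0^2)
    Likelihood: each reward ~ N(mean, obs_noise^2)
    Returns the posterior mean and standard deviation.
    """
    x = np.asarray(rewards, dtype=float)
    precision = 1.0 / sigma0**2 + len(x) / obs_noise**2  # posterior precision
    mu_n = (mu0 / sigma0**2 + x.sum() / obs_noise**2) / precision
    return mu_n, precision**-0.5

mu_n, sigma_n = gaussian_posterior(0.0, 1.0, [2.3, 2.7, 2.5], obs_noise=1.0)
print(f"posterior: N({mu_n:.3f}, {sigma_n:.3f}^2)")
```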
Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change.
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Check types (recommended)
mypy logisticbandit.py ts.py utils.py

Run the comprehensive test suite:
# Run all tests
pytest
# Run with coverage
pytest --cov=. --cov-report=html
# Run specific test file
pytest tests/test_linearbandit.py
pytest tests/test_logisticbandit.py

Current test coverage: 88%+ with 100+ unit tests
- Default Monte Carlo samples: 100,000 for LogisticBandit, 10,000 for TSPar
- Reduce the draw parameter for faster computation (at the cost of accuracy)
- For production use with many arms, consider adjusting draw based on your latency requirements
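For instance, a quick way to gauge the draw/latency trade-off on your own machine (the observation counts are illustrative):

```python
import time
from logisticbandit import LogisticBandit

bandit = LogisticBandit()
bandit.update({"arm_1": [10000, 120], "arm_2": [10000, 110]})

# Rough latency comparison across Monte Carlo sample sizes
for draw in (1_000, 10_000, 100_000):
    start = time.perf_counter()
    probs = bandit.win_prop(draw=draw)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"draw={draw:>7}: {elapsed_ms:7.1f} ms  {probs}")
```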
- ✅ NEW: Added LinearBandit for continuous rewards (Gaussian Thompson Sampling)
- ✅ NEW: Comprehensive test suite with 100+ tests (88%+ coverage)
- ✅ NEW: Multiple examples in the examples/ directory
- ✅ NEW: CI/CD with GitHub Actions
- ✅ Type checking with mypy
- ✅ Code coverage analysis
- ✅ Fixed critical exit() bug that caused program termination
- ✅ Added comprehensive docstrings
- ✅ Added type hints for all public methods
- ✅ Added input validation with clear error messages
- ✅ Made hardcoded values configurable
- ✅ Added requirements.txt and setup.py for easy installation
- ✅ Improved numerical stability in covariance matrix handling
- Sulgi Kim - GitHub
If you use this code in your research, please cite the original paper:
@article{kim2020odds,
title={Odds-Ratio Thompson Sampling to Control for Time-Varying Effect},
author={Kim, Sulgi and Kim, Kyungmin},
journal={arXiv preprint arXiv:2003.01905},
year={2020}
}