All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog (additionally tagging whether a change affects the [Code, Docs, Rules, Leaderboard]), and this project adheres to Semantic Versioning.
Versioning Policy:
AlgoPerf uses a unified versioning scheme: codebase, rules, and leaderboard all share the same Major.Minor version. All results produced under the same Major.Minor version are comparable. Patch versions can be incremented independently for each component to reflect smaller, non-breaking changes to allow some flexibility:
- Leaderboard: New submissions or minor fixes to the leaderboard could increment its Patch version (e.g., 0.6.0 -> 0.6.1) as shown in the leaderboard repo.
- Codebase: API improvements, bug fixes, or small non-breaking changes in the benchmark code could increment its Patch version as reflected in the algoperf package version.
- Documentation/Rules: Clarifications, typo fixes, or minor updates to the rules/documentation could increment its Patch version as shown in the documentation file.
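The comparability rule above can be sketched in a few lines of Python. This is an illustrative helper only, not part of the algoperf package; the function names `parse_version` and `results_comparable` are hypothetical, and version strings are assumed to have the plain `Major.Minor.Patch` form.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'Major.Minor.Patch' string (optionally prefixed with 'v') into integers."""
    major, minor, patch = (int(part) for part in version.strip().lstrip("v").split("."))
    return major, minor, patch


def results_comparable(version_a: str, version_b: str) -> bool:
    """Results are comparable when Major and Minor match; Patch is ignored."""
    return parse_version(version_a)[:2] == parse_version(version_b)[:2]


print(results_comparable("0.6.0", "0.6.1"))  # True: only the Patch version differs
print(results_comparable("0.5.0", "0.6.0"))  # False: the Minor version differs
```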
Improved and streamlined version of the benchmark which includes important bug fixes, API improvements and benchmark protocol changes following the lessons learned from the first competition.
- [Code, Rules] Updated API to allow for prepare_for_eval function (PR/Issue).
- [Docs] Document default dropout values for each workload (PR/Issue).
- [Docs] Unified versioning policy section (PR).
- [Code] Add the ability to change dropout values during training (PR/Issue).
- [Code, Docs] Rename package to algoperf (PR).
- [Code, Docs] Switch to ruff for linting and formatting (PR).
- [Code, Rules] Pass train_state to update_params function (PR/Issue).
- [Code, Rules] Reduced number of studies from 5 to 3 (PR). See also Section 5.1 in our results paper.
- [Code, Rules] Remove held-out workloads from the benchmark (PR). See also Section 5.1 in our results paper.
- [Code] Remove sacrebleu dependency (PR).
- [Code] Switch to pyproject.toml for package management (PR).
- [Code] Update Python version to 3.11 and dependencies accordingly (PR/Issue).
- [Rules] Modify the runtime budgets and step hints for each workload (PR/Issue). See also Section 5.1 in our results paper.
- [Code] Automatically determine the package version via the latest GitHub tag (PR).
- [Code, Docs] Move all algorithms into a dedicated algorithms directory (PR).
- [Code] Migrate from pmap to jit in JAX for better performance and scalability (PR).
- [Code] Batch norm bug (PR/PR/Issue).
- [Code] Fix bug of potentially giving a free evaluation to a submission that goes out of max_runtime (PR/Issue).
- [Code] Fix that models in the self-tuning ruleset will always be initialized with default dropout (PR/PR/Issue).
The version of the benchmark used for the first competition.
Summary:
- Finalized variant workload targets.
- Fix in random_utils helper function.
- For conformer PyTorch Dropout layers set inplace=True.
- Clear CUDA cache at beginning of each trial for PyTorch.
What's changed:
- update speech variants target setting points by @priyakasimbeg in #727
- set num_workers for librispeech back to 4 by @priyakasimbeg in #736
- [fix] random_utils.py to _signed_to_unsigned by @tfaod in #739
- Fix path in helper config for running experiments in bulk. by @priyakasimbeg in #740
- Finalize variants targets by @priyakasimbeg in #738
- Aiming to Fix Conformer OOM by @pomonam in #710
- Lint fixes by @priyakasimbeg in #742
- Add warning for PyTorch data loader num_workers flag. by @priyakasimbeg in #726
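For readers unfamiliar with the kind of conversion the _signed_to_unsigned entry refers to, the following is a minimal sketch of mapping signed integer seeds to their unsigned 32-bit equivalents. It is not the code from random_utils.py; the name `signed_to_unsigned`, the `SeedType` alias, and the choice of 32 bits are assumptions made for illustration.

```python
from typing import List, Union

# Hypothetical alias for this sketch: a seed is one integer or a list of integers.
SeedType = Union[int, List[int]]


def signed_to_unsigned(seed: SeedType, bits: int = 32) -> SeedType:
    """Map signed integers onto the unsigned range [0, 2**bits)."""
    modulus = 1 << bits
    if isinstance(seed, int):
        # Python's % returns a non-negative result for a positive modulus.
        return seed % modulus
    return [s % modulus for s in seed]


print(signed_to_unsigned(-1))       # 4294967295
print(signed_to_unsigned([5, -2]))  # [5, 4294967294]
```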
Upgrade CUDA version to CUDA 12.1:
- Upgrade CUDA version in Dockerfiles that will be used for scoring.
- Update Jax and PyTorch package version tags to use local CUDA installation.
Add flag for completely disabling checkpointing.
- Note that we will run with checkpointing off at scoring time.
Update Deepspeech and Conformer variant target setting configurations.
- Note that variant targets are not final.
Fixed bug in scoring code to take best trial in a study for external-tuning ruleset.
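As a rough illustration of what "best trial in a study" can mean, the sketch below picks the fastest trial that reached its target within each study. It is not the scoring code itself; the function name, the data layout, and the use of time-to-target as the selection criterion are assumptions for this example, and the timing values are made up.

```python
import math
from typing import Dict, List, Optional


def best_trial_time(trial_times: List[Optional[float]]) -> float:
    """Return the smallest time-to-target in a study; inf if no trial reached the target."""
    reached = [t for t in trial_times if t is not None]
    return min(reached) if reached else math.inf


# Hypothetical data: time-to-target in seconds per trial, None if the target was missed.
studies: Dict[str, List[Optional[float]]] = {
    "study_0": [1200.0, None, 950.0],
    "study_1": [None, None, None],
}

best_per_study = {name: best_trial_time(times) for name, times in studies.items()}
print(best_per_study)  # {'study_0': 950.0, 'study_1': inf}
```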
Added instructions for submission.
Changed default number of workers for PyTorch data loaders to 0. Running with >0 may lead to incorrect eval results; see mlcommons#732.
Workload variant additions and fixes:
- Add Deepspeech workload variant
- Fix bugs in Imagenet ResNet, WMT and Criteo1tb variants
Add prize qualification logs for external tuning ruleset. Note: FastMRI trials with dropout are not yet added due to mlcommons#664.
Add missing functionality to Docker startup script for self_tuning ruleset. Add self_tuning ruleset option to script that runs all workloads for scoring.
Datasetup fixes.
Fix tests that check training differences in PyTorch and JAX on GPU.
Bug fixes to FastMRI metric calculation and targets.
Added workload variants and targets for ogbg, fastmri, librispeech_conformer, imagenet_resnet, imagenet_vit, criteo1tb to be used as held-out workloads.
First release of the AlgoPerf: Training algorithms benchmarking code.