This is an incomplete pull request to gauge interest in merging before finalizing.
I have added the asymmetric loss (ASL) function, first introduced in https://arxiv.org/abs/2009.14119v4. This extends binary cross-entropy and focal loss to multilabel classification with sparse label sets (e.g. image tagging, where there may be thousands of classes but fewer than 10 positive ground-truth labels). Typical focusing reduces the impact of close predictions on the total loss, under the assumption that close predictions are easy and therefore less important to training than currently-bad predictions.
When labels are sparse enough, we want to emphasize the positive labels while reducing the contribution of the many small residuals that accumulate from easy negative labels. ASL handles this with asymmetric focusing, so that positive and negative labels can have different focusing parameters, and with a hard probability threshold, so that sufficiently easy predictions for negative labels are ignored entirely.
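For reviewers unfamiliar with the paper, the idea can be sketched roughly as follows. This is not the code in the PR, just an illustrative standalone version; the function name, signature, and default parameters (`gamma_pos`, `gamma_neg`, `clip`) are my own choices for the sketch, loosely following the paper's notation:

```python
import jax
import jax.numpy as jnp


def asymmetric_loss(logits, labels, gamma_pos=1.0, gamma_neg=4.0,
                    clip=0.05, eps=1e-8):
    """Sketch of asymmetric loss (ASL) for sparse multilabel classification.

    Positive and negative labels get separate focusing exponents
    (gamma_pos, gamma_neg), and negatives are additionally shifted by a
    hard probability margin `clip` so that very easy negatives
    (p < clip) contribute exactly zero loss.
    """
    p = jax.nn.sigmoid(logits)
    # Probability shifting for negatives: p_m = max(p - m, 0).
    p_neg = jnp.clip(p - clip, 0.0, 1.0)
    # Positive term: focal weighting (1 - p)^gamma_pos, as in focal loss.
    loss_pos = labels * (1.0 - p) ** gamma_pos * jnp.log(p + eps)
    # Negative term: focal weighting on the shifted probability, so the
    # accumulated residual from many easy negatives is suppressed.
    loss_neg = (1.0 - labels) * p_neg ** gamma_neg * jnp.log(1.0 - p_neg + eps)
    return -jnp.sum(loss_pos + loss_neg)
```

With `gamma_pos=0` and `clip=0` this reduces to a one-sided focal loss; with both gammas at 0 it is plain sigmoid binary cross-entropy, which is why it drops into the same places those losses are used.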
I already wrote this for my own use; if there is interest in adding it to optax, I will finish integrating it.