feat(aggregation): Add IMTL-L #725

Merged
ValerianRey merged 7 commits into SimplexLab:main from ppraneth:scalarization-5
Jun 10, 2026

Conversation

@ppraneth ppraneth commented Jun 9, 2026

Contributor

Adds IMTL, the loss-balancing variant (IMTL-L) of Impartial Multi-Task Learning, from "Towards Impartial Multi-Task Learning" (ICLR 2021). It's a stateful, trainable Scalarizer.

IMTL

Each value $L_i$ (typically a per-task loss) is assigned a learnable scale $s_i$, and the values are combined as:

$$\sum_i \left( e^{s_i} L_i - s_i \right)$$

This is the loss-balance objective (eq. 6 in the paper, with the default $a=e, b=1$), and it matches the loss-balancing part of the LibMTL implementation (loss_scale.exp() * losses - loss_scale).

The factor $e^{s_i}$ rescales each loss so the scaled losses stay at a comparable magnitude across tasks, and the $-s_i$ term is a regularizer that prevents the trivial solution $s_i \to -\infty$. The $s_i$ are stored as an nn.Parameter, so the scalarizer's parameters must be passed to the optimizer to be learned jointly with the model.
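As a short worked check of the role of the $-s_i$ term, consider a single task and treat its loss $L_i > 0$ as a fixed constant (a simplifying assumption; in training, $L_i$ changes together with the model). The per-task objective and its derivatives with respect to $s_i$ are then:

$$f(s_i) = e^{s_i} L_i - s_i, \qquad f'(s_i) = e^{s_i} L_i - 1, \qquad f''(s_i) = e^{s_i} L_i > 0$$

Setting $f'(s_i) = 0$ gives a unique minimizer, at which the scaled loss equals one regardless of the magnitude of $L_i$:

$$s_i^\star = -\ln L_i, \qquad e^{s_i^\star} L_i = 1$$

Without the $-s_i$ term, the objective $e^{s_i} L_i$ is strictly increasing in $s_i$, so it would be minimized only in the limit $s_i \to -\infty$.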

Design notes:

  • shape is given at construction (IMTL(3) or IMTL((2, 3))), since the parameter has to exist before the optimizer is built. The shape is validated against the input at call time, like Constant and UW.
  • Scales are initialized to 0, so at the start of training the scalarization reduces to the plain sum of the values ($e^0 = 1$).
  • Implements reset() (from Stateful), which zeros the scales.
  • No positivity precondition: IMTL-L is designed for positive losses but the forward is well-defined for any input, so it isn't enforced.
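The design notes above can be illustrated with a minimal pure-Python sketch. It is not the torchjd implementation: `ImtlSketch` is a hypothetical name, it handles only a flat list of values, and it stores the scales in a plain list rather than an nn.Parameter, so nothing here is trainable. It only mirrors the described behavior: zero initialization, shape validation at call time, reset(), and no positivity check.

```python
import math


class ImtlSketch:
    """Hypothetical, stdlib-only illustration of the IMTL-L formula (1-D only)."""

    def __init__(self, n):
        self.n = n
        # Scales start at 0, so the initial output is the plain sum of the values.
        self.log_scale = [0.0] * n

    def reset(self):
        # Zero the scales again.
        self.log_scale = [0.0] * self.n

    def __call__(self, values):
        # Validate the input length against the length given at construction.
        if len(values) != self.n:
            raise ValueError(f"expected {self.n} values, got {len(values)}")
        # sum_i (exp(s_i) * L_i - s_i); negative values are not rejected.
        return sum(math.exp(s) * v - s for s, v in zip(self.log_scale, values))


imtl = ImtlSketch(3)
print(imtl([1.0, 2.0, 3.0]))   # 6.0, the plain sum at initialization
print(imtl([-1.0, 0.5, 2.0]))  # 1.5, negative inputs are accepted

imtl.log_scale = [0.5, -1.0, 0.25]  # pretend the scales were trained
imtl.reset()
print(imtl.log_scale)          # [0.0, 0.0, 0.0]
```

In the actual class, the scales would be registered as a parameter and passed to the optimizer, as described above.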

Relationship to UW (almost equivalent)

IMTL-L is almost equivalent to UW: it equals UW up to a constant factor of two and the sign of the learned parameter, namely

$$\mathrm{IMTL}(s) = 2 \cdot \mathrm{UW}(-s)$$

(the paper notes this in Appendix C.4, where UW's regression form is written as $\tfrac{1}{2}(e^s L - s)$). The two derive from different principles (UW from Gaussian/Laplace likelihoods, IMTL-L without any distribution assumption) but share the same per-task weighting and the same optima. IMTL is kept as its own discoverable class with its own direct formula; the docstring states the UW relationship, and a test locks it in. The complementary gradient-balancing variant (IMTL-G) is already available as the IMTLG aggregator.
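The identity can be checked numerically with a small stdlib-only sketch. The `uw` function below is an assumption: it uses the form $\tfrac{1}{2}(e^{-t} L + t)$ that the identity $\mathrm{IMTL}(s) = 2 \cdot \mathrm{UW}(-s)$ implies, and it may not match the parameterization of the library's UW class. The function names and the sample numbers are illustrative only.

```python
import math


def imtl(scales, losses):
    # IMTL-L: sum_i (exp(s_i) * L_i - s_i)
    return sum(math.exp(s) * l - s for s, l in zip(scales, losses))


def uw(log_vars, losses):
    # Assumed UW form: sum_i 0.5 * (exp(-t_i) * L_i + t_i)
    return sum(0.5 * (math.exp(-t) * l + t) for t, l in zip(log_vars, losses))


losses = [0.3, 1.7, 4.2]      # arbitrary example values
scales = [0.5, -1.0, 0.25]    # arbitrary example scales

lhs = imtl(scales, losses)
rhs = 2 * uw([-s for s in scales], losses)
assert math.isclose(lhs, rhs)  # IMTL(s) == 2 * UW(-s)
```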

Tests

tests/unit/scalarization/test_imtl.py covers the value at init (reduces to sum(values)), int-vs-tuple shape equivalence, scalar output and gradient flow over all input shapes (0-dim, vector, matrix, higher-dim), gradient flow to log_scale, shape validation, reset(), that negative inputs are allowed, trainability via an optimizer step, the representations, and that IMTL(s) == 2 * UW(-s).

Signed-off-by: ppraneth <pranethparuchuri@gmail.com>
@ppraneth ppraneth requested a review from a team as a code owner June 9, 2026 03:01
ppraneth added 2 commits June 9, 2026 08:31
Signed-off-by: ppraneth <pranethparuchuri@gmail.com>
@PierreQuinton

Contributor

If they are the same, shouldn't we merge them and write a note? I'm pretty sure the factor two will just double the effective LR of s but nothing else.

I think we could in principle name it after whichever of the two was published first (does either cite the other?). I'm not sure that adding duplicated methods is a good idea: it adds noise to the library, and it also costs compute for people benchmarking all methods.

Not sure what we should do.

@ppraneth

ppraneth commented Jun 9, 2026

Contributor Author

@PierreQuinton I added docstrings saying that, but kept the code separate, since the two are different methods (I mean, from different papers), right?

@ValerianRey ValerianRey added the "cc: feat" and "package: aggregation" labels Jun 9, 2026
@github-actions github-actions Bot changed the title from "feat(scalarization)!:add IMTL-L" to "feat(aggregation): Feat(scalarization)!:add IMTL-L" Jun 9, 2026
@ValerianRey

Member

> If they are the same, shouldn't we merge them and write a note? I'm pretty sure the factor two will just double the effective LR of s but nothing else.
>
> I think we could in principle name it after whichever of the two was published first (does either cite the other?). I'm not sure that adding duplicated methods is a good idea: it adds noise to the library, and it also costs compute for people benchmarking all methods.
>
> Not sure what we should do.

The reason why I wanted to have two separate classes is so that it's easy for people of the field to find the method they want to benchmark against. If they're implementing the IMTL paper, they know they need IMTL-G + IMTL-L. They will never know that they can replace IMTL-L by UW.

Also, these methods are not exactly the same, even if the difference is extremely minimal. So I guess it's ok to include this. It's not like this will happen very often I think. It's more of a reviewer's mistake to let them claim IMTL-L as novel.

@ppraneth ppraneth changed the title from "feat(aggregation): Feat(scalarization)!:add IMTL-L" to "feat(aggregation): add IMTL-L" Jun 9, 2026
@github-actions github-actions Bot changed the title from "feat(aggregation): add IMTL-L" to "feat(aggregation): Add IMTL-L" Jun 9, 2026
Comment thread docs/source/docs/scalarization/imtl.rst Outdated
Comment thread src/torchjd/scalarization/_imtl.py Outdated
Comment thread CHANGELOG.md Outdated
@ValerianRey ValerianRey mentioned this pull request Jun 9, 2026
ValerianRey and others added 4 commits June 9, 2026 12:12
Co-authored-by: Valérian Rey <31951177+ValerianRey@users.noreply.github.com>
Co-authored-by: Valérian Rey <31951177+ValerianRey@users.noreply.github.com>
Signed-off-by: ppraneth <pranethparuchuri@gmail.com>
@ppraneth ppraneth requested a review from ValerianRey June 9, 2026 13:36
@ppraneth

ppraneth commented Jun 9, 2026

Contributor Author

@ValerianRey I have made the changes

@ValerianRey ValerianRey left a comment

Member

LGTM. @PierreQuinton are you ok with merging this? See my comment for an answer to your concern.

@PierreQuinton

Contributor

Yes, the solution to my concern is an improved onboarding, which is independent from this. Thanks a lot @ppraneth !

@ppraneth

Contributor Author

@PierreQuinton How about we work on the docs once I am done with the whole scalarization package?
We can make the docs more readable (ideally not too technical, but just enough to get a simple user onboarded quickly).

@ValerianRey

Member

> @PierreQuinton How about we work on the docs once I am done with the whole scalarization package? We can make the docs more readable (ideally not too technical, but just enough to get a simple user onboarded quickly).

I agree with that. I think our README is outdated and we're missing a simple getting-started tutorial. Also, we need to put much more emphasis on scalarization once the package becomes more complete.

Instead of spending a lot of time explaining what Jacobian descent is, I would rather say that we can either combine the losses into a scalar loss and do gradient descent, or compute every gradient and combine them into a single gradient, which is Jacobian descent. Then explain a bit about the pros and cons.
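The two routes described above can be contrasted in a small stdlib-only sketch. Everything in it is illustrative: the two toy losses, the hand-written derivatives (standing in for autograd), and the `mean_aggregator` function are assumptions chosen for brevity, not library code.

```python
def jacobian(theta):
    # Toy losses: L1 = (x - 1)^2 + y^2 and L2 = (x + 1)^2 + 4 * y^2.
    # Each row is the gradient of one loss with respect to (x, y).
    x, y = theta
    return [
        [2.0 * (x - 1.0), 2.0 * y],
        [2.0 * (x + 1.0), 8.0 * y],
    ]


def scalarize_then_grad(theta):
    # Route 1: combine the losses into one scalar (here their sum) and
    # differentiate it. The gradient of a sum is the sum of the Jacobian rows.
    return [sum(col) for col in zip(*jacobian(theta))]


def mean_aggregator(jac):
    # Route 2 needs a rule that maps the whole Jacobian to one update
    # direction. Averaging the rows is the simplest possible choice.
    return [sum(col) / len(jac) for col in zip(*jac)]


def jacobian_descent_direction(theta, aggregator):
    # Route 2: keep every per-loss gradient and let the aggregator combine them.
    return aggregator(jacobian(theta))


theta = (0.5, 1.0)
print(scalarize_then_grad(theta))                          # [2.0, 10.0]
print(jacobian_descent_direction(theta, mean_aggregator))  # [1.0, 5.0]
```

With a mean aggregator the two routes differ only by a constant factor. The second route becomes distinct when the aggregator looks at the individual rows, for example to handle conflicting gradients, which a scalarizer never sees.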

@ValerianRey ValerianRey merged commit d759aed into SimplexLab:main Jun 10, 2026
17 checks passed

Labels

cc: feat, package: aggregation

3 participants