feat(scalarization): Add readme by ppraneth · Pull Request #749 · SimplexLab/TorchJD

ppraneth · 2026-06-21T04:17:17Z

Adds a simple README

PierreQuinton

So think that if this is a good idea to add, then it should be dev oriented only. Here it looks like documentation, which it should not be. The way I view this would be to explain how to create scalarizers, what to be careful about (when are they stateful, avoid non-atomic things, etc...), what are not scalarizers? etc...

What do you think of this?

ppraneth · 2026-06-21T07:41:06Z

Yea I will make it dev oriented

PierreQuinton · 2026-06-21T08:32:15Z

+
+- **Any shape in, scalar out:** it reduces over *all* elements of `values` (0-dim, vector, matrix,
+  higher-dim) into a 0-dim scalar.
+- **`values`, not `losses`:** a scalarizer is generic and not tied to losses.


I think we may want to speak more about the abstraction level rather than saying that. Not sure how though.

rewrote it around the abstraction

PierreQuinton · 2026-06-21T08:35:43Z

+Anything that needs the model, its parameters, or the per-task gradients belongs in the
+[aggregation](../aggregation) package as a `Weighting` / `Aggregator`, which operates on the Jacobian
+or its Gramian. If you reach for gradient norms or the network inside `forward`, you are writing an
+aggregator, not a scalarizer.


Well the aggregation package should not contain anything else than Scalarizers, it contains aggregators (that could be stateful).

Fixed. It now says the gradient-level counterpart in the aggregation package is an Aggregator (which, like a scalarizer, can be stateful) that operates on the Jacobian or its Gramian.

PierreQuinton · 2026-06-21T08:36:37Z

+- **Subclass `Stateful`** (`from torchjd._mixins import Stateful`) and implement `reset()` to restore
+  the initial state.


Do we have Randomness? If so do we have the Stochastic mixin that we also use in aggregation? Maybe we can mention it here.

We do have a random baseline, but there is no Stochastic mixin. It just calls torch.randn directly

My bad, I thought that In the aggregation package, we consider random aggregators as stateful (the seed is the state), I think it would be beneficial to do that in scalarization. @ValerianRey Didn't we go in that direction? Did we abandon that idea?

Yes we abandoned this idea because it was quite tricky to make it work on cuda:

See:

refactor(aggregation): Make Stateful an iff and add Stochastic #656

stochastic state #657

The biggest problems were that something can be stochastic without directly owning a generator (it can own another stochastic object that does own a generator), and generators need a device to be created, so they can't be simply created ahead of time.

Co-authored-by: Pierre Quinton <pierre.quinton@gmail.com>

ValerianRey

Really nice but IMO we should try to include some parts within public docstrings, some within comments, and some within a skill. So I don't think we actually need a README file here at all. It's a bit different from what other repos do, but it will really enable agentic development to a much stronger level, while avoiding duplication of information (which is a nightmare to maintain).

E.g.

# Scalarization

This package implements the `Scalarizer`s: objects that reduce a tensor of values (typically a
vector of losses) into a single scalar optimizable with a standard `loss.backward()`.

=> init.py public docstring

## The abstraction

=> Split between public docstring of Scalarizer.forward, comments, etc

## Adding one
## State
## Things to be careful about

=> skill

And somewhere in CONTRIBUTING.md we should tell people that they're supposed to read skills before implementing something, and to use those skills if they use agents to develop.

ValerianRey · 2026-06-21T12:54:10Z

+## The abstraction
+
+A scalarizer captures a single decision: **how to collapse a vector of values into one scalar to
+minimize**. It operates purely on those values: it has no notion of the losses, tasks, or model they


Well, the values are 99% of the time losses so it's a bit confusing to say that it has no notion of the losses. Also I think we should say that in most cases, values are losses, but the scalarization package has been designed to be able to scalarize any tensor of values.

ValerianRey · 2026-06-21T12:55:47Z

+Concretely, it subclasses `Scalarizer` (in [`_scalarizer_base.py`](_scalarizer_base.py)) and
+implements one method:
+
+```python
+def forward(self, values: Tensor, /) -> Tensor:
+    ...
+```
+
+- **Any shape in, scalar out:** it reduces over *all* elements of `values` (scalar, vector, matrix,
+  higher-dim) into a single scalar.
+- **Pure and differentiable:** the output depends only on `values` and the configured parameters, so
+  that `scalarizer(values).backward()` produces the gradient.


This part seems like something that should be explained in the code itself (if not already), either in a public docstring for things that are intended to be user-facing, or in a comment for things that are intended to be contributor-facing.

ValerianRey · 2026-06-21T12:57:02Z

+## Adding one
+
+A new scalarizer is a class plus the files that register it. Mirror an existing scalarizer of the
+same kind:
+
+- `_<name>.py`: the class.
+- `__init__.py`: the import and an `__all__` entry.
+- `docs/source/docs/scalarization/<name>.rst`: the docs page, added to the `index.rst` toctree.
+- `tests/unit/scalarization/test_<name>.py`: the tests.
+- `CHANGELOG.md`: an entry under `[Unreleased]`.


Honestly I feel like this should be explained in a skill rather than here. But I also think skills should be addressed to humans and agents alike.

ValerianRey · 2026-06-21T13:01:46Z

+- **Subclass `Stateful`** (`from torchjd._mixins import Stateful`) and implement `reset()` to restore
+  the initial state.


Yes we abandoned this idea because it was quite tricky to make it work on cuda:

See:

refactor(aggregation): Make Stateful an iff and add Stochastic #656

stochastic state #657

The biggest problems were that something can be stochastic without directly owning a generator (it can own another stochastic object that does own a generator), and generators need a device to be created, so they can't be simply created ahead of time.

ValerianRey · 2026-06-21T13:09:13Z

+- **Determinism and side effects:** the output should depend only on `values`, the configured
+  parameters, and (if the method is intentionally random) the global RNG. Any state change must be
+  deliberate, explicit, and undone by `reset()`.


I don't necessarily agree with that. We want to be able to give extra information through setters. E.g. a scalarizer that takes the gramian every once in a while:

my_agg = MyAgg() for i in ...: losses = ... if i % 1000 == 0: gramian = engine.compute_gramian(losses) my_agg.set_gramian(gramian) loss = my_agg(losses) loss.backward() ...

Note that such a scalarizer could use internally for example an UPGradWeighting, return weights that would be used for the next 1000 steps, etc.

This would be equivalent to:

W = UPGradWeighting() for i in ...: losses = ... if i % 1000 == 0: gramian = engine.compute_gramian(losses) weights = W(gramian) losses.backward(weights) ...

So my point is that really a scalarizer could use anything to make its decision, let's not be restrictive here.

add readme

bbe4957

ppraneth requested review from PierreQuinton and ValerianRey as code owners June 21, 2026 04:17

ppraneth added cc: feat Conventional commit type for new features. package: scalarization labels Jun 21, 2026

github-actions Bot changed the title ~~add readme~~ feat(scalarization): Add readme Jun 21, 2026

PierreQuinton reviewed Jun 21, 2026

View reviewed changes

add readme v2

0d342cb

ppraneth requested a review from PierreQuinton June 21, 2026 07:58

PierreQuinton reviewed Jun 21, 2026

View reviewed changes

ppraneth and others added 4 commits June 21, 2026 14:31

Update src/torchjd/scalarization/README.md

ada66ad

Co-authored-by: Pierre Quinton <pierre.quinton@gmail.com>

add readme v3

5990d23

add readme v3

42f6bcf

minor fix

0156c1f

ValerianRey reviewed Jun 21, 2026

View reviewed changes

ppraneth closed this Jun 21, 2026

ppraneth deleted the scalarization-readme branch June 21, 2026 15:16

		- Subclass `Stateful` (`from torchjd._mixins import Stateful`) and implement `reset()` to restore
		the initial state.

Conversation

ppraneth commented Jun 21, 2026

Uh oh!

PierreQuinton left a comment

Choose a reason for hiding this comment

Uh oh!

ppraneth commented Jun 21, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ValerianRey left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ValerianRey left a comment •

edited

Loading