feat(scalarization): Add readme #749

Closed
ppraneth wants to merge 6 commits into SimplexLab:main from ppraneth:scalarization-readme

Conversation

@ppraneth

Contributor

Adds a simple README

@ppraneth added the cc: feat and package: scalarization labels Jun 21, 2026
@github-actions bot changed the title from "add readme" to "feat(scalarization): Add readme" Jun 21, 2026

@PierreQuinton PierreQuinton left a comment

Contributor

So I think that if this is a good idea to add, then it should be dev-oriented only. Here it looks like documentation, which it should not be. The way I view this would be to explain how to create scalarizers, what to be careful about (when they are stateful, avoiding non-atomic things, etc.), what is not a scalarizer, and so on.

What do you think of this?

@ppraneth

Contributor Author

Yea I will make it dev oriented

@ppraneth ppraneth requested a review from PierreQuinton June 21, 2026 07:58
Comment thread src/torchjd/scalarization/README.md Outdated
Comment thread src/torchjd/scalarization/README.md Outdated

- **Any shape in, scalar out:** it reduces over *all* elements of `values` (0-dim, vector, matrix,
higher-dim) into a 0-dim scalar.
- **`values`, not `losses`:** a scalarizer is generic and not tied to losses.

Contributor

I think we may want to speak more about the abstraction level rather than saying that. Not sure how though.

Contributor Author

rewrote it around the abstraction

Comment thread src/torchjd/scalarization/README.md Outdated
Comment on lines +53 to +56
Anything that needs the model, its parameters, or the per-task gradients belongs in the
[aggregation](../aggregation) package as a `Weighting` / `Aggregator`, which operates on the Jacobian
or its Gramian. If you reach for gradient norms or the network inside `forward`, you are writing an
aggregator, not a scalarizer.

Contributor

Well the aggregation package should not contain anything else than Scalarizers, it contains aggregators (that could be stateful).

Contributor Author

Fixed. It now says the gradient-level counterpart in the aggregation package is an Aggregator (which, like a scalarizer, can be stateful) that operates on the Jacobian or its Gramian.

Comment on lines +41 to +42
- **Subclass `Stateful`** (`from torchjd._mixins import Stateful`) and implement `reset()` to restore
the initial state.
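
As a rough illustration of the `reset()` contract quoted above, here is a torch-free sketch. The `Stateful` class below is a local stand-in that only assumes a `reset()` method (it is not the real `torchjd._mixins.Stateful`), and `RunningScaleSum` is a hypothetical scalarizer operating on plain lists of floats rather than tensors.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence


class Stateful(ABC):
    """Local stand-in for the mixin named above; only `reset()` is assumed."""

    @abstractmethod
    def reset(self) -> None:
        """Restore the object to its initial state."""


class RunningScaleSum(Stateful):
    """Hypothetical scalarizer: sums values divided by their running max magnitude."""

    def __init__(self) -> None:
        # Initial state: no scale has been observed yet.
        self._scales: Optional[List[float]] = None

    def reset(self) -> None:
        # Forget every observed scale, exactly as after construction.
        self._scales = None

    def __call__(self, values: Sequence[float]) -> float:
        if self._scales is None:
            self._scales = [0.0] * len(values)
        # Deliberate, explicit state change: update the running max per index.
        self._scales = [max(s, abs(v)) for s, v in zip(self._scales, values)]
        # A zero scale means every value seen so far was zero; it contributes 0.
        return sum(v / s if s > 0.0 else 0.0 for v, s in zip(values, self._scales))
```

Calling the object twice with different inputs and then calling `reset()` should make the next call behave like the very first one. With tensors, such running statistics would typically be kept detached from the autograd graph.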

Contributor

Do we have Randomness? If so do we have the Stochastic mixin that we also use in aggregation? Maybe we can mention it here.

Contributor Author

We do have a random baseline, but there is no Stochastic mixin. It just calls `torch.randn` directly.

Contributor

My bad. I thought that in the aggregation package we consider random aggregators as stateful (the seed is the state). I think it would be beneficial to do that in scalarization. @ValerianRey Didn't we go in that direction? Did we abandon that idea?

Member

Yes we abandoned this idea because it was quite tricky to make it work on cuda:

See:

The biggest problems were that something can be stochastic without directly owning a generator (it can own another stochastic object that does own a generator), and generators need a device to be created, so they can't be simply created ahead of time.

ppraneth and others added 4 commits June 21, 2026 14:31

@ValerianRey ValerianRey left a comment

Member

Really nice but IMO we should try to include some parts within public docstrings, some within comments, and some within a skill. So I don't think we actually need a README file here at all. It's a bit different from what other repos do, but it will really enable agentic development to a much stronger level, while avoiding duplication of information (which is a nightmare to maintain).

E.g.

# Scalarization

This package implements the `Scalarizer`s: objects that reduce a tensor of values (typically a
vector of losses) into a single scalar optimizable with a standard `loss.backward()`.

=> `__init__.py` public docstring

## The abstraction

=> Split between public docstring of Scalarizer.forward, comments, etc

## Adding one
## State
## Things to be careful about

=> skill

And somewhere in CONTRIBUTING.md we should tell people that they're supposed to read skills before implementing something, and to use those skills if they use agents to develop.

## The abstraction

A scalarizer captures a single decision: **how to collapse a vector of values into one scalar to
minimize**. It operates purely on those values: it has no notion of the losses, tasks, or model they

Member

Well, the values are 99% of the time losses so it's a bit confusing to say that it has no notion of the losses. Also I think we should say that in most cases, values are losses, but the scalarization package has been designed to be able to scalarize any tensor of values.

Comment on lines +17 to +28
Concretely, it subclasses `Scalarizer` (in [`_scalarizer_base.py`](_scalarizer_base.py)) and
implements one method:

```python
def forward(self, values: Tensor, /) -> Tensor:
    ...
```

- **Any shape in, scalar out:** it reduces over *all* elements of `values` (scalar, vector, matrix,
higher-dim) into a single scalar.
- **Pure and differentiable:** the output depends only on `values` and the configured parameters, so
that `scalarizer(values).backward()` produces the gradient.
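
To make the "any shape in, scalar out" contract concrete, here is a torch-free sketch in which nested Python lists stand in for tensors. `MeanScalarizer`, `flatten`, and the `Nested` alias are hypothetical names introduced only for illustration. The sketch does not show differentiability; with real tensors the body would typically be a single reduction such as `values.mean()`.

```python
from typing import List, Union

# Stand-in for a tensor of any shape: a number or arbitrarily nested lists.
Nested = Union[float, list]


def flatten(values: Nested) -> List[float]:
    """Return every element of a scalar, vector, matrix, or deeper nesting."""
    if isinstance(values, (int, float)):
        return [float(values)]
    flat: List[float] = []
    for item in values:
        flat.extend(flatten(item))
    return flat


class MeanScalarizer:
    """Hypothetical scalarizer: averages all elements into a single number."""

    def forward(self, values: Nested, /) -> float:
        flat = flatten(values)
        # Pure: the result depends only on `values` (this class has no parameters).
        return sum(flat) / len(flat)

    def __call__(self, values: Nested, /) -> float:
        return self.forward(values)
```

The same object accepts a bare number, a flat list, or a list of lists, and always returns one float. An empty input raises `ZeroDivisionError` in this sketch.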

Member

This part seems like something that should be explained in the code itself (if not already), either in a public docstring for things that are intended to be user-facing, or in a comment for things that are intended to be contributor-facing.

Comment on lines +30 to +39
## Adding one

A new scalarizer is a class plus the files that register it. Mirror an existing scalarizer of the
same kind:

- `_<name>.py`: the class.
- `__init__.py`: the import and an `__all__` entry.
- `docs/source/docs/scalarization/<name>.rst`: the docs page, added to the `index.rst` toctree.
- `tests/unit/scalarization/test_<name>.py`: the tests.
- `CHANGELOG.md`: an entry under `[Unreleased]`.

Member

Honestly I feel like this should be explained in a skill rather than here. But I also think skills should be addressed to humans and agents alike.

Comment on lines +60 to +62
- **Determinism and side effects:** the output should depend only on `values`, the configured
parameters, and (if the method is intentionally random) the global RNG. Any state change must be
deliberate, explicit, and undone by `reset()`.

Member

I don't necessarily agree with that. We want to be able to give extra information through setters. E.g. a scalarizer that takes the gramian every once in a while:

my_agg = MyAgg()

for i in ...:
  losses = ...
  if i % 1000 == 0:
    gramian = engine.compute_gramian(losses)
    my_agg.set_gramian(gramian)
  loss = my_agg(losses)
  loss.backward()
  ...

Note that such a scalarizer could, for example, internally use an UPGradWeighting whose returned weights would then be used for the next 1000 steps, etc.

This would be equivalent to:

W = UPGradWeighting()

for i in ...:
  losses = ...
  if i % 1000 == 0:
    gramian = engine.compute_gramian(losses)
    weights = W(gramian)
  losses.backward(weights)
  ...

So my point is that really a scalarizer could use anything to make its decision, let's not be restrictive here.
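
A simplified, torch-free sketch of the setter pattern described in this comment follows. It is an assumption-laden stand-in: `WeightedSumScalarizer` and `set_weights` are hypothetical names, the weights are set directly instead of being derived from a Gramian, and losses are plain floats. Neither `engine.compute_gramian` nor `UPGradWeighting` is used here.

```python
from typing import List, Optional, Sequence


class WeightedSumScalarizer:
    """Hypothetical scalarizer whose weights are refreshed from outside via a setter."""

    def __init__(self) -> None:
        self._weights: Optional[List[float]] = None

    def set_weights(self, weights: Sequence[float]) -> None:
        # Extra information supplied by the training loop every so often.
        self._weights = list(weights)

    def reset(self) -> None:
        # Back to the initial state: no weights, i.e. a plain sum.
        self._weights = None

    def __call__(self, losses: Sequence[float]) -> float:
        weights = self._weights if self._weights is not None else [1.0] * len(losses)
        if len(weights) != len(losses):
            raise ValueError("weights and losses must have the same length")
        return sum(w * l for w, l in zip(weights, losses))


scalarizer = WeightedSumScalarizer()
history = []
for step in range(6):
    losses = [1.0, 2.0]
    if step % 3 == 0:
        # Stand-in for recomputing weights from a Gramian every N steps.
        scalarizer.set_weights([1.0 + step, 0.5])
    history.append(scalarizer(losses))
```

Because the scalar is a weighted sum with weights that do not depend on the current losses, its gradient with respect to each loss is just the corresponding weight, which is why the two loops in the comment are described as equivalent.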

@ppraneth ppraneth closed this Jun 21, 2026
@ppraneth ppraneth deleted the scalarization-readme branch June 21, 2026 15:16
Labels: cc: feat, package: scalarization

3 participants