
Commit 2b13464

ValerianRey and claude authored
style: Fix typos in docs and source files (#674)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 9c5d789 commit 2b13464

14 files changed

Lines changed: 21 additions & 21 deletions


CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -91,7 +91,7 @@ changelog does not include internal changes that do not affect the user.
 Suggested change: `mtl_backward(losses=losses, features=features)` =>
 `mtl_backward(losses, features=features)`. The `features` parameter remains usable as positional
 or keyword. All other parameters are now keyword-only.
-- `Aggregator.__call__`: The `matrix` parameter is now positonal-only. Suggested change:
+- `Aggregator.__call__`: The `matrix` parameter is now positional-only. Suggested change:
 `aggregator(matrix=matrix)` => `aggregator(matrix)`.
 - `Weighting.__call__`: The `stat` parameter is now positional-only. Suggested change:
 `weighting(stat=gramian)` => `weighting(gramian)`.
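For readers unfamiliar with the `/` marker, the sketch below shows what "positional-only" means in Python. `MeanAggregator` is a hypothetical, library-free toy (plain lists instead of tensors) that only mimics a `__call__(self, matrix, /)` signature; it is not TorchJD's implementation.

```python
class MeanAggregator:
    """Hypothetical toy aggregator: averages the rows of a matrix given as a list of rows."""

    def __call__(self, matrix: list[list[float]], /) -> list[float]:
        # The "/" makes `matrix` positional-only, so aggregator(matrix=...) raises TypeError.
        n_rows = len(matrix)
        return [sum(column) / n_rows for column in zip(*matrix)]


aggregator = MeanAggregator()
matrix = [[1.0, 2.0], [3.0, 4.0]]

print(aggregator(matrix))  # [2.0, 3.0]

try:
    aggregator(matrix=matrix)  # passing the argument by keyword is rejected
except TypeError as error:
    print(f"TypeError: {error}")
```

The same mechanism explains the suggested migration: only the call style changes, not the values passed.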
@@ -177,7 +177,7 @@ changelog does not include internal changes that do not affect the user.
 
 - Made some aggregators (`CAGrad`, `ConFIG`, `DualProj`, `GradDrop`, `IMTLG`, `NashMTL`, `PCGrad`
 and `UPGrad`) raise a `NonDifferentiableError` whenever one tries to differentiate through them.
-Before this change, trying to differentiate through them leaded to wrong gradients or unclear
+Before this change, trying to differentiate through them led to wrong gradients or unclear
 errors.
 
 ### Added
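One way to make an operation refuse differentiation in PyTorch is a custom `torch.autograd.Function` whose `backward` raises. The sketch below assumes that approach; `NonDifferentiableError` is defined locally as a stand-in and the row-mean "aggregator" is hypothetical, so this is not necessarily how TorchJD implements the change.

```python
import torch


class NonDifferentiableError(RuntimeError):
    """Local stand-in for the library's error type (hypothetical)."""


class _RowMean(torch.autograd.Function):
    @staticmethod
    def forward(ctx, matrix: torch.Tensor) -> torch.Tensor:
        # The forward pass behaves normally: average the rows.
        return matrix.mean(dim=0)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
        # Fail loudly instead of silently returning a wrong gradient.
        raise NonDifferentiableError("This aggregator is not differentiable.")


def row_mean_aggregator(matrix: torch.Tensor) -> torch.Tensor:
    return _RowMean.apply(matrix)


matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
aggregated = row_mean_aggregator(matrix)  # the forward pass works as usual

try:
    aggregated.sum().backward()  # differentiating through the aggregator raises
except RuntimeError as error:
    print(type(error).__name__, error)
```

Raising in `backward` turns a silent correctness problem into an explicit error at the point where a gradient is requested.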

README.md

Lines changed: 1 addition & 1 deletion
@@ -294,7 +294,7 @@ TorchJD provides many existing aggregators from the literature, listed in the fo
 
 ## Release Methodology
 
-We try to make a release whenever have something worth sharing to users (bug fix, minor or large
+We try to make a release whenever we have something worth sharing to users (bug fix, minor or large
 feature, etc.). TorchJD follows [semantic versioning](https://semver.org/). Since the library is
 still in beta (`0.x.y`), we sometimes make interface changes in minor versions. We prioritize the
 long-term quality of the library, which occasionally means introducing breaking changes. Whenever a

docs/source/examples/basic_usage.rst

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ Perform the Jacobian descent backward pass:
 
 The first function will populate the ``.jac`` field of each model parameter with the corresponding
 Jacobian, and the second one will aggregate these Jacobians and store the result in the ``.grad``
-field of the parameters. It also deletes the ``.jac`` fields save some memory.
+field of the parameters. It also deletes the ``.jac`` fields to save some memory.
 
 Update each parameter based on its ``.grad`` field, using the ``optimizer``:
 
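The two-step flow described in this hunk (fill `.jac`, then aggregate into `.grad` and drop `.jac`) can be modelled without PyTorch. In this toy sketch, `ToyParam`, `toy_backward`, `toy_jac_to_grad` and `row_mean` are hypothetical names, Jacobians are plain lists of rows (one row per objective), and aggregation is a simple row mean. It assumes, as in the "together" grouping, that the Jacobians of all parameters are concatenated column-wise and aggregated as one matrix; it models the data flow only, not TorchJD's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Matrix = list[list[float]]  # one row per objective, one column per parameter entry


@dataclass
class ToyParam:
    """Hypothetical stand-in for a model parameter with flat values."""

    value: list[float]
    jac: Optional[Matrix] = None
    grad: Optional[list[float]] = None


def toy_backward(params: list[ToyParam], jacobians: list[Matrix]) -> None:
    # Step 1: store each parameter's Jacobian in its `.jac` field.
    for param, jacobian in zip(params, jacobians):
        param.jac = jacobian


def row_mean(matrix: Matrix) -> list[float]:
    return [sum(column) / len(matrix) for column in zip(*matrix)]


def toy_jac_to_grad(params: list[ToyParam], aggregate: Callable[[Matrix], list[float]]) -> None:
    # Step 2: concatenate all `.jac` fields column-wise, aggregate the combined matrix
    # into one vector, then split that vector back into the `.grad` fields.
    n_rows = len(params[0].jac)
    combined = [[x for p in params for x in p.jac[i]] for i in range(n_rows)]
    aggregated = aggregate(combined)
    start = 0
    for p in params:
        size = len(p.value)
        p.grad = aggregated[start:start + size]
        start += size
        p.jac = None  # drop the Jacobian to save memory


w = ToyParam(value=[0.5, -0.5])
b = ToyParam(value=[0.1])
toy_backward([w, b], jacobians=[[[1.0, 2.0], [3.0, 4.0]], [[1.0], [-1.0]]])
toy_jac_to_grad([w, b], aggregate=row_mean)
print(w.grad, b.grad, w.jac)  # [2.0, 3.0] [0.0] None
```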
docs/source/examples/grouping.rst

Lines changed: 5 additions & 5 deletions
@@ -6,21 +6,21 @@ The aggregation can be made independently on groups of parameters, at different
 the parameters:
 
 1. **Together** (baseline): one group covering all parameters. Corresponds to the `whole_model`
-stategy in the paper.
+strategy in the paper.
 
 2. **Per network**: one group per top-level sub-network (e.g. encoder and decoder separately).
-Corresponds to the `enc_dec` stategy in the paper.
+Corresponds to the `enc_dec` strategy in the paper.
 
-3. **Per layer**: one group per leaf module of the network. Corresponds to the `all_layer` stategy
+3. **Per layer**: one group per leaf module of the network. Corresponds to the `all_layer` strategy
 in the paper.
 
 4. **Per tensor**: one group per individual parameter tensor. Corresponds to the `all_matrix`
-stategy in the paper.
+strategy in the paper.
 
 In TorchJD, grouping is achieved by calling :func:`~torchjd.autojac.jac_to_grad` once per group
 after :func:`~torchjd.autojac.backward` or :func:`~torchjd.autojac.mtl_backward`, with a dedicated
 aggregator instance per group. For :class:`~torchjd.aggregation.Stateful` aggregators, each instance
-should independently maintains its own state (e.g. the EMA :math:`\hat{\phi}` state in
+should independently maintain its own state (e.g. the EMA :math:`\hat{\phi}` state in
 :class:`~torchjd.aggregation.GradVac`, matching the per-block targets from the original paper).
 
 .. note::
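The four strategies above differ only in how parameters are partitioned before one `jac_to_grad` call per group. The sketch below assumes PyTorch-style dotted parameter names (e.g. `encoder.0.weight`) and approximates each partition from names alone: "leaf module" is taken to be the name minus its last component, which holds for simple `Sequential`-style models but is an assumption. `group_parameters` and `EMAMeanAggregator` are hypothetical; the latter is a toy stateful aggregator (an EMA of the row mean, not GradVac's update) used to show that one instance per group keeps states separate.

```python
from collections import defaultdict
from typing import Optional


def group_parameters(names: list[str], strategy: str) -> dict[str, list[str]]:
    """Hypothetical helper: partition dotted parameter names into groups."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        if strategy == "whole_model":
            key = "all"  # one group covering all parameters
        elif strategy == "enc_dec":
            key = name.split(".")[0]  # top-level sub-network, e.g. "encoder"
        elif strategy == "all_layer":
            key = name.rsplit(".", 1)[0]  # module owning the tensor, e.g. "encoder.0"
        elif strategy == "all_matrix":
            key = name  # every parameter tensor on its own
        else:
            raise ValueError(f"Unknown strategy: {strategy}")
        groups[key].append(name)
    return dict(groups)


class EMAMeanAggregator:
    """Hypothetical stateful aggregator: EMA of the row mean, one state per instance."""

    def __init__(self, beta: float = 0.9) -> None:
        self.beta = beta
        self.state: Optional[list[float]] = None

    def __call__(self, matrix: list[list[float]], /) -> list[float]:
        mean = [sum(column) / len(matrix) for column in zip(*matrix)]
        if self.state is None:
            self.state = mean
        else:
            self.state = [self.beta * s + (1 - self.beta) * m for s, m in zip(self.state, mean)]
        return self.state


names = ["encoder.0.weight", "encoder.0.bias", "encoder.2.weight", "decoder.0.weight"]
groups = group_parameters(names, "enc_dec")
# One dedicated aggregator instance per group, so no state is shared between groups.
aggregators = {key: EMAMeanAggregator() for key in groups}
print(groups)
```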

src/torchjd/_linalg/_matrix.py

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@
 
 from torch import Tensor
 
-# Note: we're using classes and inherittance instead of NewType because it's possible to have
-# multiple inherittance but there is no type intersection. However, these classes should never be
+# Note: we're using classes and inheritance instead of NewType because it's possible to have
+# multiple inheritance but there is no type intersection. However, these classes should never be
 # instantiated: they're only used for static type checking.
 
 
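The comment in this hunk motivates marker classes over `typing.NewType`: a `NewType` takes a single base type and cannot itself be subclassed, so it cannot express "both A and B", whereas a class can inherit from two marker classes. The sketch below uses a local `Tensor` stand-in (so it runs without PyTorch) and hypothetical marker names; values are tagged with `typing.cast`, which has no runtime effect, so the marker classes are never instantiated.

```python
from typing import cast


class Tensor:
    """Local stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, data: list[list[float]]) -> None:
        self.data = data


# Marker classes: used only in annotations, never instantiated.
class Matrix(Tensor):
    pass


class SymmetricMatrix(Matrix):
    pass


class NonNegativeMatrix(Matrix):
    pass


# Multiple inheritance expresses the "intersection" of both properties.
class SymmetricNonNegativeMatrix(SymmetricMatrix, NonNegativeMatrix):
    pass


def trace_of_symmetric(m: SymmetricMatrix) -> float:
    return sum(m.data[i][i] for i in range(len(m.data)))


raw = Tensor([[2.0, 1.0], [1.0, 3.0]])
# cast only informs the static type checker; `raw` is still a plain Tensor at runtime.
tagged = cast(SymmetricNonNegativeMatrix, raw)
print(trace_of_symmetric(tagged))  # 5.0
```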
src/torchjd/aggregation/_graddrop.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ class GradDrop(Aggregator):
 Optimizing Deep Multitask Models with Gradient Sign Dropout
 <https://arxiv.org/pdf/2010.06808.pdf>`_.
 
-:param f: The function to apply to the Gradient Positive Sign Purity. It should be monotically
+:param f: The function to apply to the Gradient Positive Sign Purity. It should be monotonically
 increasing. Defaults to identity.
 :param leak: The tensor of leak values, determining how much each row is allowed to leak
 through. Defaults to None, which means no leak.
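For reference, the GradDrop paper defines the Gradient Positive Sign Purity of each coordinate as P = 0.5 * (1 + sum_i g_i / sum_i |g_i|), which lies in [0, 1]. The sketch below computes it column-wise for a matrix whose rows are per-objective gradients and then applies a monotonically increasing `f`. The function name `sign_purity` and the choice of returning 0.5 for an all-zero column are assumptions, not TorchJD's code.

```python
from typing import Callable


def sign_purity(matrix: list[list[float]], f: Callable[[float], float] = lambda p: p) -> list[float]:
    """Apply f to the Gradient Positive Sign Purity of each column (one row per objective)."""
    result = []
    for column in zip(*matrix):
        total = sum(column)
        total_abs = sum(abs(value) for value in column)
        # Assumption: an all-zero column is treated as perfectly mixed (P = 0.5).
        purity = 0.5 if total_abs == 0 else 0.5 * (1 + total / total_abs)
        result.append(f(purity))
    return result


gradients = [[1.0, -1.0, 0.0], [1.0, 1.0, 0.0]]
print(sign_purity(gradients))                     # [1.0, 0.5, 0.5]
print(sign_purity(gradients, f=lambda p: p ** 2))  # [1.0, 0.25, 0.25]
```

Because `f` is only required to be monotonically increasing, it reshapes the purity values without changing their order across coordinates.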

src/torchjd/aggregation/_imtl_g.py

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ class IMTLG(GramianWeightedAggregator):
 :class:`~torchjd.aggregation.GramianWeightedAggregator` generalizing the method described in
 `Towards Impartial Multi-task Learning <https://discovery.ucl.ac.uk/id/eprint/10120667/>`_.
 This generalization, defined formally in `Jacobian Descent For Multi-Objective Optimization
-<https://arxiv.org/pdf/2406.16232>`_, supports matrices with some linearly dependant rows.
+<https://arxiv.org/pdf/2406.16232>`_, supports matrices with some linearly dependent rows.
 """
 
 gramian_weighting: IMTLGWeighting
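To see why linearly dependent rows need special care: if one row of J is a multiple of another, the Gramian J Jᵀ is singular, so a formula that needs to invert it has no unique answer for such matrices. The small check below illustrates this with plain Python and 2×2 Gramians; the helper names are hypothetical, and it does not reproduce the IMTL-G weighting itself.

```python
def gramian(matrix: list[list[float]]) -> list[list[float]]:
    """Return J @ J.T, the matrix of pairwise dot products between rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in matrix] for row_i in matrix]


def det2(m: list[list[float]]) -> float:
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


independent = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
dependent = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0]]  # second row = 2 * first row

print(det2(gramian(independent)))  # 3.0: invertible
print(det2(gramian(dependent)))    # 0.0: singular
```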

src/torchjd/aggregation/_pcgrad.py

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ def forward(self, gramian: PSDMatrix, /) -> Tensor:
 
 class PCGrad(GramianWeightedAggregator):
 """
-:class:`~torchjd.aggregation.GramianWeightedAggregator` as defined in algorithm 1 of
+:class:`~torchjd.aggregation.GramianWeightedAggregator` as defined in Algorithm 1 of
 `Gradient Surgery for Multi-Task Learning <https://arxiv.org/pdf/2001.06782.pdf>`_.
 """
 
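Algorithm 1 of the Gradient Surgery paper projects each task gradient onto the normal plane of every other task gradient it conflicts with (negative dot product), visiting the other tasks in random order, and then sums the results. The sketch below follows that description on raw per-task gradients. TorchJD's `PCGrad` is a `GramianWeightedAggregator` whose weighting works from the Gramian, so this illustrates the algorithm rather than the library's code; `pcgrad` and its `seed` parameter are hypothetical.

```python
import random


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def pcgrad(gradients: list[list[float]], seed: int = 0) -> list[float]:
    """Sum of per-task gradients after projecting away pairwise conflicts."""
    rng = random.Random(seed)
    projected = []
    for i, g_i in enumerate(gradients):
        g = list(g_i)
        others = [j for j in range(len(gradients)) if j != i]
        rng.shuffle(others)  # visit the other tasks in random order
        for j in others:
            g_j = gradients[j]  # project against the original, unmodified g_j
            d = dot(g, g_j)
            if d < 0:  # conflict: remove the component of g along g_j
                scale = d / dot(g_j, g_j)
                g = [a - scale * b for a, b in zip(g, g_j)]
        projected.append(g)
    return [sum(column) for column in zip(*projected)]


print(pcgrad([[1.0, 0.0], [-1.0, 1.0]]))  # [0.5, 1.5]: the two gradients conflict
print(pcgrad([[1.0, 0.0], [0.0, 1.0]]))   # [1.0, 1.0]: no conflict, plain sum
```

A zero gradient never triggers a projection (its dot product is 0, not negative), so the division is always by a positive squared norm.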
src/torchjd/aggregation/_random.py

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ def forward(self, matrix: Tensor, /) -> Tensor:
 class Random(WeightedAggregator):
 """
 :class:`~torchjd.aggregation.WeightedAggregator` that computes a random combination of
-the rows of the provided matrices, as defined in algorithm 2 of `Reasonable Effectiveness of
+the rows of the provided matrices, as defined in Algorithm 2 of `Reasonable Effectiveness of
 Random Weighting: A Litmus Test for Multi-Task Learning
 <https://arxiv.org/pdf/2111.10603.pdf>`_.
 """
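Random weighting draws fresh weights at every call and uses them to combine the rows. The sketch below assumes one common choice, a softmax of standard-normal samples, so the weights are positive and sum to 1; check the cited Algorithm 2 and TorchJD's source for the exact distribution used. `random_combination` and its `rng` argument are hypothetical.

```python
import math
import random


def random_combination(matrix: list[list[float]], rng: random.Random) -> list[float]:
    """Combine the rows of `matrix` with softmax-normalized Gaussian weights."""
    logits = [rng.gauss(0.0, 1.0) for _ in matrix]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * value for w, value in zip(weights, column)) for column in zip(*matrix)]


rng = random.Random(0)
matrix = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(random_combination(matrix, rng))  # a different convex combination of the rows on each call
```

Since the weights form a convex combination, each output coordinate always lies between the smallest and largest entry of the corresponding column.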

src/torchjd/autogram/_jacobian_computer.py

Lines changed: 1 addition & 1 deletion
@@ -120,7 +120,7 @@ def functional_model_call(rg_params: dict[str, Parameter]) -> tuple[Tensor, ...]
 
 vjp_func = torch.func.vjp(functional_model_call, self.rg_params)[1]
 
-# vjp_func is a function that computes the vjp w.r.t. to the primals (tuple). Here the
+# vjp_func is a function that computes the vjp w.r.t. the primals (tuple). Here the
 # functional has a single primal which is dict(module.named_parameters()). We therefore take
 # the 0'th element to obtain the dict of gradients w.r.t. the module's named_parameters.
 gradients = vjp_func(grad_outputs_j_)[0]
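The comment's point, that `vjp_func` returns one result per primal and that the single primal here is a parameter dict, can be checked on a small module. The sketch below assumes PyTorch 2.x with `torch.func`, a `torch.nn.Linear` model, and an all-ones cotangent, all chosen only for illustration.

```python
import torch
from torch import nn
from torch.func import functional_call, vjp

torch.manual_seed(0)
model = nn.Linear(3, 2)
x = torch.randn(4, 3)
params = dict(model.named_parameters())  # the single primal: a dict of tensors


def functional_model_call(p: dict[str, torch.Tensor]) -> torch.Tensor:
    # Run the module with the given parameters instead of its registered ones.
    return functional_call(model, p, (x,))


output, vjp_func = vjp(functional_model_call, params)
cotangent = torch.ones_like(output)  # plays the role of the grad_outputs

# vjp_func returns a tuple with one entry per primal; [0] is the dict of gradients.
gradients = vjp_func(cotangent)[0]
print({name: tuple(g.shape) for name, g in gradients.items()})
```

With an all-ones cotangent, the bias gradient is the batch size in every entry and each row of the weight gradient is the column sum of `x`, which makes the result easy to verify by hand.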
