
SalesforceAIResearch/PLATE

PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning


Paper: https://arxiv.org/abs/2602.03846

PLATE (Plasticity-Tunable Efficient Adapters) is a parameter-efficient fine-tuning method that reduces catastrophic forgetting in continual learning by combining neuron selection with an orthogonal input basis computation. The method requires neither pre-training (old-task) data nor features.

Installation

cd PLATE
pip install .

Requirements

See requirements.txt for full dependencies. Minimum requirements:

  • Python >= 3.8
  • PyTorch >= 2.0.0
  • peft >= 0.15.0
  • transformers >= 4.40.0

Quick Start

from transformers import AutoModelForCausalLM
from plate import PLATEConfig, get_plate_model

# Load model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B")

# Configure PLATE
config = PLATEConfig(
    r=64,             # Number of trainable neurons
    col_tau=0.9,      # Input orthogonality threshold
    plate_alpha=1.0,  # Scaling factor
    max_rank=512,     # Maximum input basis dimension
    plate_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
)

# Apply PLATE adapter
model = get_plate_model(model, config)

model.print_trainable_parameters()
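
After get_plate_model, only the PLATE adapter parameters require gradients, so a standard fine-tuning loop applies. Below is a minimal sketch, assuming a train_dataloader (not part of this repo) that yields tokenized batches with input_ids, attention_mask, and labels:

import torch

# Optimize only the trainable (adapter) parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

model.train()
for batch in train_dataloader:      # assumed: dicts of input_ids / attention_mask / labels
    outputs = model(**batch)        # causal-LM loss is computed from the labels
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()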

PLATE adapters

PLATE adapts a frozen pretrained linear layer W using a structured PEFT-style update:

$$ W' = W + \rho \Delta W,\qquad \Delta W = B A Q^\top $$

Only A is trained; B and Q are computed once from the frozen pretrained weights and then kept fixed.

What are B, A, Q?

  • A (trainable): A ∈ R^{r×k}
    The only learned parameters per layer (initialized to zeros). Trainable parameter count per layer is r * k.

  • B (frozen output selector): B ∈ R^{d_out×r}
    Selects which output neurons are allowed to change. PLATE chooses the r most redundant output rows of W, then uses the corresponding identity columns as a selector.

  • Q (frozen input basis): Q ∈ R^{d_in×k}
    Orthonormal basis for a low-energy input subspace computed from the frozen rows of W (the rows not selected by B). It constrains updates to directions that weakly excite old-task data/features, reducing drift on old behavior; a shape-level sketch follows below.
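
For intuition, here is a minimal shape-level sketch of how the three factors combine. This is not the package implementation: the neuron indices and input basis below are random placeholders, whereas PLATE derives them from the frozen weights.

import torch

d_out, d_in, r, k, rho = 2048, 2048, 64, 256, 1.0

W = torch.randn(d_out, d_in)                   # frozen pretrained weight
idx = torch.randperm(d_out)[:r]                # placeholder for the r selected output neurons
B = torch.eye(d_out)[:, idx]                   # (d_out, r) identity columns: output selector
Q, _ = torch.linalg.qr(torch.randn(d_in, k))   # (d_in, k) placeholder orthonormal input basis
A = torch.zeros(r, k, requires_grad=True)      # (r, k) the only trainable parameters

x = torch.randn(4, d_in)

# Adapted layer: W' x = W x + rho * B A Q^T x
h = x @ W.T + rho * ((x @ Q) @ A.T) @ B.T      # (4, d_out)

# Trainable parameters per layer: r * k instead of d_out * d_in.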


Hyperparameters

Key hyperparameter

  • r: number of trainable output neurons per layer
    • Higher r ⇒ more capacity for the new task, but typically more forgetting risk.

Other hyperparameters (dimension / constraint controls)

  • col_tau: input energy threshold controlling the size of the low-energy subspace (i.e., k)
    • Higher col_tau ⇒ smaller k (stricter constraint).
  • max_rank: cap on k (maximum input basis dimension).
  • rho: scaling factor in W' = W + ρ * (B A Q^T).

These hyperparameters mainly control the input/output dimensions of the trainable adapter parameters (through r and k).
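
As a rough illustration of how col_tau and max_rank could interact, the sketch below implements one reading of the energy-threshold idea (the exact rule lives in the package and may differ): take the SVD of the rows of W that B does not select, keep the trailing right-singular directions whose combined energy stays within a 1 - col_tau budget, and cap their number at max_rank.

import torch

def low_energy_basis(W_frozen_rows, col_tau=0.9, max_rank=512):
    # W_frozen_rows: rows of W not selected by B, shape (d_out - r, d_in).
    _, S, Vh = torch.linalg.svd(W_frozen_rows, full_matrices=False)
    energy = S**2 / (S**2).sum()
    # Count the trailing (lowest-energy) directions that fit in the 1 - col_tau budget.
    cum_from_tail = torch.cumsum(energy.flip(0), dim=0)
    k = int((cum_from_tail <= 1.0 - col_tau).sum().item())
    k = max(1, min(k, max_rank))                # higher col_tau -> smaller k; capped by max_rank
    Q = Vh.flip(0)[:k].T                        # (d_in, k) orthonormal low-energy input basis
    return Q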

Example

See examples/run_example.py for a complete training example.

Results

PLATE learning vs forgetting

Qwen2.5-3B, PLATE sweep across (r, τ): each column fixes the PLATE output rank r ∈ {32, 64, 128, 256} and sweeps τ ∈ {0.70, 0.80, 0.90, 0.98} (green, solid) against LoRA baselines of varying rank (blue, dashed). The top row reports WikiText-2 perplexity (forgetting) and the bottom row Middle English perplexity (new-task learning), both over training steps.
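
The configurations behind this sweep can be instantiated with one PLATEConfig per (r, τ) pair; here τ is assumed to map to col_tau, and the training/evaluation code is omitted:

from plate import PLATEConfig

sweep = {
    (r, tau): PLATEConfig(
        r=r,
        col_tau=tau,
        plate_alpha=1.0,
        max_rank=512,
        plate_dropout=0.0,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    for r in (32, 64, 128, 256)
    for tau in (0.70, 0.80, 0.90, 0.98)
}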

Extended PLATE Parameter Sweep

Local-geometry view of forgetting

Local-geometry view of forgetting - PLATE sweep across (r, τ): We sweep PLATE's two knobs on a two-moons continual-learning toy. Blue points denote the old-task dataset $P_0$ and yellow points the new-task dataset $P_1$; decision boundaries are shown when trained on $P_0$ (blue curve) and after training on $P_1$ (yellow curve). The background heatmap visualizes how training on $P_1$ changes the model's local input-output linearization, $\Delta(x) \coloneqq \lVert J_x(\theta_1, x) - J_x(\theta_0, x) \rVert_F$. Retention is compromised when the heatmap turns yellow around the blue points (large drift on $\mathrm{supp}(P_0)$), while effective learning requires yellow regions around the yellow points (large drift concentrated near $\mathrm{supp}(P_1)$). This motivates our goal: parameter-efficient continual updates that localize drift away from the (often unavailable) old distribution while remaining expressive on the new task. Increasing r expands the plasticity budget and improves task 2 performance but can increase task 1 drift/forgetting, while increasing τ tends to concentrate updates onto more redundant degrees of freedom and reduces drift/forgetting. Overall, PLATE provides an explicit mechanism to target a desired point on the retention-adaptation trade-off.


Citation

If you use PLATE in your work, please cite:

@misc{cosentino2026plateplasticitytunableefficientadapters,
      title={PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning}, 
      author={Romain Cosentino},
      year={2026},
      eprint={2602.03846},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.03846}, 
}
