Conversation
Ndles
left a comment
I left some comments. I didn't go over the templates themselves, only the logic that takes care of the CLI. I also didn't check the tests, as they are not final yet.
    _MODEL_CLASS_NAMES: dict[str, str] = {
        "transformer": "Transformer",
        "upt": "UPT",
        "ab_upt": "ABUPT",
ABUPT is the model from the tutorial, not the one from noether.modeling.models; there it's AnchorBranchedUPT. I think we somehow need to decide which one to use when, as it may be confusing 🤔
Good point. For the scaffolding I would keep the ABUPT wrapper around AnchorBranchedUPT, because it gives users a starting point to add their own logic which I think is helpful.
Alright, does it make sense to create a small README.md inside of the project to highlight these details, or maybe make it part of the documentation of the tool itself?
I think we definitely need documentation for noether-init and will add it there. I think that should be sufficient, but I'm open to anything.
src/noether/scaffold/file_copier.py
I would call it file_manager.py, create a class in there with static methods, and extend it as needed. For example, if we expose extra flags in the CLI to do a cleanup, FileManager would take care of it.
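A minimal sketch of what such a `FileManager` could look like. The class name comes from the comment above; the method names, signatures, and cleanup patterns are purely illustrative assumptions, not existing Noether code:

```python
import shutil
from pathlib import Path


class FileManager:
    """Hypothetical scaffold file helper with static methods, per the suggestion above."""

    @staticmethod
    def copy_tree(src: Path, dst: Path) -> None:
        # Copy a template directory into the target project directory.
        shutil.copytree(src, dst, dirs_exist_ok=True)

    @staticmethod
    def cleanup(project_dir: Path, patterns: tuple[str, ...] = ("*.tmp",)) -> int:
        # Remove leftover files matching the glob patterns; return how many were removed.
        removed = 0
        for pattern in patterns:
            for path in project_dir.rglob(pattern):
                path.unlink()
                removed += 1
        return removed
```

An extra CLI flag such as a cleanup option could then simply delegate to `FileManager.cleanup(project_dir)`.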
065674e to 30a90f3
I have updated the PR with:
MauritsBleeker
left a comment
A few general comments:
- How do we make this CLI scale to different model versions/datasets? I think we need to structure the files based on the problem.
- Where are the pipeline(s) and trainers? Do we need those?
- Why don't we import the dataset stats from the dataset zoo? I feel there is too much duplication going on.
> You can use `noether-init` to automatically scaffold a complete project with your choice of model, dataset, and configuration. See the [scaffolding tutorial](https://noether-docs.emmi.ai/tutorials/scaffolding_a_new_project.html) for details.
This folder contains skeleton/boilerplate code for a minimal working `Noether` training pipeline, including all required components.
What about pipelines, trainers, etc.? Those are also quite dataset/model-specific. Do we add those later?
- ``project_name`` (positional) — project name, must be a valid Python identifier (no hyphens)
- ``--model, -m`` — model architecture (``transformer``, ``upt``, ``ab_upt``, ``transolver``)
- ``--dataset, -d`` — dataset (``shapenet_car``, ``drivaernet``, ``drivaerml``, ``ahmedml``, ``emmi_wing``)
- ``--dataset-path`` — path to dataset on disk

**Optional arguments:**

- ``--optimizer, -o`` — optimizer, default: ``adamw`` (also: ``lion``)
- ``--tracker, -t`` — experiment tracker, default: ``disabled`` (also: ``wandb``, ``trackio``, ``tensorboard``)
These examples are all very specific to what we have right now. The number of models, etc. will grow in the relatively near future. Maybe we should not hardcode everything here, but provide a reference to the dataset/model zoo. Maybe just one or two examples.
Okay, good point. I have updated this section to only give one example for each parameter. Then I added a link to the new "Scaffolding a New Project" page, where we list all available parameters.
Scaffolding a New Project
=========================
We need to rethink the role of the trainer/pipeline within noether-init.
What exactly do you mean by this?
As I understand it, you init a project with a dataset and a model. But you still need a pipeline/trainer to run anything, right? Do we implement that manually?
The pipeline, trainer, etc. you need for a specific dataset are specified in the YAML files for each dataset inside scaffold/references. So the user selects a dataset, and the reference YAML maps to the correct pipeline, trainer, etc.
src/noether/scaffold/template_files/configs/callbacks/training_callbacks_caeml.yaml
    # Copyright © 2025 Emmi AI GmbH. All rights reserved.
    import torch
We need to rethink the folder structure.
This is not the template for the AB-UPT. Especially with the BaseModel implementation, this is tailored to aerodynamic CFD (for cars). I think we should reflect this in the file structure.
This is also related to my comment below and to how we can handle future different versions of ab_upt, etc.
When we do have different AB-UPT implementations in the future, we can reflect this in the folder structure here. To keep things simple, for now I would suggest keeping it like this until we actually need to change it.
However, if you think it makes more sense to adapt the structure immediately, we can also do that.
I also get your point. I would do it directly, such that we enforce this pattern a bit, since we know this will come eventually anyway. However, I also understand if we keep it simple for now.
Maurits and I discussed this and agreed that we would do the renaming and grouping of different (ABUPT) models for different tasks/domains in the future, when we have other ABUPT (or UPT, etc.) implementations.
**Required arguments:**

- ``project_name`` (positional) — project name, must be a valid Python identifier (no hyphens)
- ``--model, -m`` — model architecture (``transformer``, ``upt``, ``ab_upt``, ``transolver``)
This works for now. But soon we will have a different AB-UPT for a different task. How will we do the naming then? This AB-UPT implementation won't work as a universal model class.
For each dataset we have reference files in scaffold/references. These define which pipeline, trainer, callback templates, etc. are copied for a specific dataset.
If in the future we have other implementations of AB-UPT or UPT, etc. for different tasks/datasets, we can add a mapping in these reference files such that the correct UPT or AB-UPT implementation is automatically copied from the templates.
Does this make sense?
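A rough illustration of the mapping described in this reply. The dictionary below mirrors what a per-dataset reference file in scaffold/references might express, but the keys, paths, and schema are assumptions for illustration only:

```python
# Hypothetical in-memory version of a per-dataset reference file from
# scaffold/references; keys and paths are illustrative, not the real schema.
REFERENCES: dict[str, dict[str, str]] = {
    "shapenet_car": {
        "pipeline": "pipeline/shapenet_car_pipeline.yaml",
        "trainer": "trainer/shapenet_car_trainer.yaml",
        # A future mapping for a task-specific model variant could live here:
        "ab_upt": "models/ab_upt_aero.py",
    },
}


def resolve_templates(dataset: str, model: str) -> list[str]:
    """Resolve which template files to copy for a dataset, including an
    optional model-variant override from the dataset's reference entry."""
    ref = REFERENCES[dataset]
    templates = [ref["pipeline"], ref["trainer"]]
    if model in ref:
        # The dataset's reference file pins a specific implementation of
        # this model family for its task.
        templates.append(ref[model])
    return templates
```

With this shape, adding a new AB-UPT variant for a new task only requires a new entry in that dataset's reference file, not a CLI change.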
src/noether/scaffold/template_files/pipeline/collators/__init__.py
    TRANSFORMER = "transformer"
    UPT = "upt"
    AB_UPT = "ab_upt"
    TRANSOLVER = "transolver"
As mentioned earlier, this won't work in the relatively near future, when new versions of models are added that solve a different task.
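One way to avoid the hardcoding this comment objects to is to derive the CLI enum from a registry. This is purely a sketch: `MODEL_REGISTRY` is an assumed structure, not an existing Noether API, and in practice it could be populated from the model zoo:

```python
from enum import Enum

# Assumed registry of available models; in practice this could come from the
# model zoo instead of being hardcoded in the scaffold CLI.
MODEL_REGISTRY: dict[str, str] = {
    "transformer": "Transformer",
    "upt": "UPT",
    "ab_upt": "ABUPT",
    "transolver": "Transolver",
}

# Build the CLI choice enum from the registry, so newly registered models show
# up automatically instead of requiring a parallel hardcoded Enum.
ModelChoice = Enum("ModelChoice", {name.upper(): name for name in MODEL_REGISTRY})
```

Task-specific variants (e.g. a future AB-UPT for a different task) would then be new registry keys rather than new enum members maintained by hand.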
    model: Annotated[ModelChoice, typer.Option("--model", "-m", help="Model architecture")] = ...,  # type: ignore[assignment]
    dataset: Annotated[DatasetChoice, typer.Option("--dataset", "-d", help="Dataset")] = ...,  # type: ignore[assignment]
    dataset_path: Annotated[str, typer.Option("--dataset-path", help="Path to dataset")] = ...,  # type: ignore[assignment]
    optimizer: Annotated[OptimizerChoice, typer.Option("--optimizer", "-o", help="Optimizer")] = OptimizerChoice.ADAMW,
    tracker: Annotated[
        TrackerChoice, typer.Option("--tracker", "-t", help="Experiment tracker")
    ] = TrackerChoice.DISABLED,
    hardware: Annotated[HardwareChoice, typer.Option("--hardware", help="Hardware target")] = HardwareChoice.GPU,
    project_dir: Annotated[Path, typer.Option("--project-dir", "-l", help="Where to create project dir")] = Path("."),
    wandb_entity: Annotated[
        str | None, typer.Option("--wandb-entity", help="W&B entity, e.g. 'my-team' (defaults to your W&B username)")
    ] = None,
Why are the trainer/pipeline not part of the options?
As far as I understand, the pipelines and trainers depend on the dataset that is being used. So in this regard they are fixed and do not need to be selected by the user.
What is new:
noether-initcli command to create a new project with the specified model and dataset. After running this command, the train command is printed, which the user can run to start training immediately.This does:
template_filesdirectory substituting the project_name for imports__init__.pyfiles to import chosen model, any_model_config.py, etc.)kind: __PROJECT__.model.Transformer)Example how to run:
noether-init my_project -m ab_upt -d shapenet_car --dataset-path /Users/davidhauser/emmi/shapenetFiles that contain the logic inside the
src/noether/scaffoldfolder:cli.pyentry point for cli commandconfig.pycontains logic for mapping user inputs to a config object/referencesfolder contains YAMLs that define which Noether YAMLs should be copied depending on user cli inputsfile_copier.pyhandles logic for actually copying/generating all templates and substitutingchoices.pyenums for possible user inputsgenerator.pyorchestrates project generationCustom dataset or custom model is not yet supported with this, can be added to create boilerplate files, but will also add some more template files etc.