25 changes: 24 additions & 1 deletion README.md
@@ -123,7 +123,30 @@ You might be in a situation when your venv won't be configured as intended anymo
---
# Quickstart

> [!IMPORTANT]
> Before training, you need a prepared dataset. To get started with the ShapeNet-Car dataset,
> follow the download and preprocessing steps in the
> [ShapeNet-Car dataset README](./src/noether/data/datasets/cfd/shapenet_car/README.MD).

## Scaffold a New Project

Use `noether-init` to generate a complete training project:

```console
uv run noether-init my_project --model upt --dataset shapenet_car --dataset-path /path/to/shapenet_car
```

Then train with:

```console
uv run noether-train --config-dir my_project/configs --config-name train +experiment=upt
```

See the [scaffolding tutorial](https://noether-docs.emmi.ai/tutorials/scaffolding_a_new_project.html) for all options and the generated project structure.

## Run the Tutorial Example

You can also run a training job immediately using the [tutorial](./tutorial/README.MD) configuration. For local development (Mac/CPU), use:

```console
uv run noether-train --hp tutorial/configs/train_shapenet.yaml \
```
3 changes: 3 additions & 0 deletions boilerplate_project/README.MD
@@ -1,5 +1,8 @@
## `Noether` Starter Kit Project
----

> You can use `noether-init` to automatically scaffold a complete project with your choice of model, dataset, and configuration. See the [scaffolding tutorial](https://noether-docs.emmi.ai/tutorials/scaffolding_a_new_project.html) for details.

This folder contains skeleton/boilerplate code for a minimal working `Noether` training pipeline, including all required components.
> Collaborator: What about pipelines, trainers, etc.? Those are also quite dataset/model specific. Do we add those later?


1. A dataset that loads (and generates) dummy data.
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -183,6 +183,7 @@
    "**/*.ipynb",
    "**/*.md",
    "**/.venv/**",
    "**/scaffold/template_files/**",
]


29 changes: 29 additions & 0 deletions docs/source/guides/working_with_cli.rst
@@ -65,3 +65,32 @@ Verify your setup by running the ``estimate`` command, which fetches metadata an
noether-data aws estimate noaa-goes16 ABI-L1b-RadC/2023/001/00/

If you see no errors — congratulations, your setup works!

Scaffolding a New Project
-------------------------

The ``noether-init`` command generates a complete Noether training project with all required modules and configurations.

.. code-block:: bash

   uv run noether-init my_project \
      --model upt \
      --dataset shapenet_car \
      --dataset-path /path/to/shapenet_car

**Required arguments:**

- ``project_name`` (positional) — project name, e.g. ``my_project``
- ``--model, -m`` — model architecture, e.g. ``ab_upt``
- ``--dataset, -d`` — dataset, e.g. ``shapenet_car``
- ``--dataset-path`` — path to dataset on disk

**Optional arguments:**

- ``--optimizer, -o`` — optimizer, e.g. ``adamw`` (default)
- ``--tracker, -t`` — experiment tracker, e.g. ``wandb``
- ``--hardware`` — hardware target, e.g. ``gpu`` (default)
- ``--project-dir, -l`` — parent directory for the project folder
- ``--wandb-entity`` — W&B entity name (only used with ``--tracker wandb``)

For all available options, see :doc:`/tutorials/scaffolding_a_new_project`.
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -37,8 +37,9 @@ Welcome to the Noether Framework documentation. Here you will find available API
   tutorials/training_first_model_with_code
   tutorials/full_code_tutorial
   tutorials/how_to_initialize

   Walkthrough <https://github.com/Emmi-AI/noether/blob/main/tutorial/README.MD>
   tutorials/scaffolding_a_new_project


.. toctree::
1 change: 1 addition & 0 deletions docs/source/tutorials/index.rst
@@ -8,3 +8,4 @@ Step-by-step instructions to get you up and running with Noether.
* :doc:`training_first_model_with_configs`: Learn how to train models by simply editing configuration files.
* :doc:`training_first_model_with_code`: Understand how to use Noether as a library to build custom training scripts.
* `Walkthrough <https://github.com/Emmi-AI/noether/blob/main/tutorial/README.MD>`_: A hands-on guide through the repository's tutorial examples.
* :doc:`scaffolding_a_new_project`: Use ``noether-init`` to generate a complete training project from scratch.
109 changes: 109 additions & 0 deletions docs/source/tutorials/scaffolding_a_new_project.rst
@@ -0,0 +1,109 @@
Scaffolding a New Project
=========================

> Review thread on lines +1 to +3
>
> Collaborator: We need to rethink the role of the trainer/pipeline within `noether-init`.
>
> Contributor Author: What exactly do you mean?
>
> Collaborator: As I understand it, you init a project with a dataset and a model. But you still need a pipeline/trainer to run anything, right? Do we implement that manually?
>
> Contributor Author: The pipeline, trainer, etc. that a specific dataset needs are specified in the per-dataset YAML files inside `scaffold/references`. The user selects a dataset, and its reference YAML maps it to the correct pipeline, trainer, etc.

The ``noether-init`` command generates a complete, ready-to-train Noether project for
models and datasets supported out of the box by the framework. It creates all required Python modules, Hydra configuration
files, schemas, data pipelines, trainers, and callbacks, giving you a working starting point that you
can adapt to your own use case.

Prerequisites
-------------

Before scaffolding, download and preprocess the dataset you want to use. Each dataset has its own
fetching and preprocessing instructions — see the
`Dataset Zoo README <https://github.com/Emmi-AI/noether/blob/main/src/noether/data/datasets/README.md>`_
for an overview and links to dataset-specific guides.

Example Usage
-------------

.. code-block:: bash

   uv run noether-init my_project \
      --model upt \
      --dataset shapenet_car \
      --dataset-path /path/to/shapenet_car

This creates a ``my_project/`` directory in the current working directory with a UPT model and the ``shapenet_car`` dataset.
After completion, ``noether-init`` prints a summary of the configuration and the corresponding
``noether-train`` command to start training.

Arguments
---------

.. list-table::
   :header-rows: 1
   :widths: 25 50 25

   * - Option
     - Values
     - Default
   * - ``project_name`` *(required)*
     - Positional argument. Must be a valid Python identifier (no hyphens).
     -
   * - ``--model, -m`` *(required)*
     - ``transformer``, ``upt``, ``ab_upt``, ``transolver``
     -
   * - ``--dataset, -d`` *(required)*
     - ``shapenet_car``, ``drivaernet``, ``drivaerml``, ``ahmedml``, ``emmi_wing``
     -
   * - ``--dataset-path`` *(required)*
     - Path to the dataset on disk
     -
   * - ``--optimizer, -o``
     - ``adamw``, ``lion``
     - ``adamw``
   * - ``--tracker, -t``
     - ``wandb``, ``trackio``, ``tensorboard``, ``disabled``
     - ``disabled``
   * - ``--hardware``
     - ``gpu``, ``mps``, ``cpu``
     - ``gpu``
   * - ``--project-dir, -l``
     - Parent directory for the project folder
     - current directory
   * - ``--wandb-entity``
     - W&B entity name (only with ``--tracker wandb``)
     - your W&B username

Generated Project Structure
---------------------------

The generated project contains:

.. code-block:: text

   my_project/
   ├── configs/
   │   ├── callbacks/              # Training callback configs
   │   ├── data_specs/             # Data specification configs
   │   ├── dataset_normalizers/
   │   ├── dataset_statistics/
   │   ├── datasets/               # Dataset configs
   │   ├── experiment/             # Experiment configs (one per model)
   │   ├── model/                  # Model architecture config
   │   ├── optimizer/              # Optimizer config
   │   ├── pipeline/               # Data pipeline config
   │   ├── tracker/                # Experiment tracker config
   │   ├── trainer/                # Trainer config
   │   └── train.yaml              # Main training config
   ├── model/                      # Model implementation
   ├── schemas/                    # Configuration dataclasses
   ├── pipeline/                   # Data processing (collators, sample processors)
   ├── trainers/                   # Training loop implementation
   └── callbacks/                  # Training callbacks

All Python files are wired up with correct imports for your chosen model, and all Hydra configs reference
your dataset path, optimizer, and tracker selections.
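A quick way to confirm that scaffolding produced the layout above is to check for the expected entries. This is a minimal sketch, not something `noether-init` provides: the entry list is copied from the tree above, and the helper name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Hypothetical checklist taken from the generated-structure tree above;
# not an API provided by noether-init.
EXPECTED_ENTRIES = [
    "configs/train.yaml",
    "configs/experiment",
    "model",
    "schemas",
    "pipeline",
    "trainers",
    "callbacks",
]


def missing_entries(project_dir: Path) -> list[str]:
    """Return every expected entry that does not exist under project_dir."""
    return [entry for entry in EXPECTED_ENTRIES if not (project_dir / entry).exists()]
```

Calling `missing_entries(Path("my_project"))` right after scaffolding should return an empty list; anything it returns points at a part of the project that was not generated.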

Running Training
----------------

After scaffolding, start training with:

.. code-block:: bash

   uv run noether-train \
      --config-dir my_project/configs \
      --config-name train \
      +experiment=upt
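When launching training from a script, it can help to build the same command as an argument list. The sketch below assumes the experiment name equals the model value (as in the `+experiment={config.model.value}` hint printed by `_print_summary` in `cli.py`); the helper `train_command` is hypothetical.

```python
import shlex
from pathlib import Path


def train_command(project_dir: Path, model: str) -> list[str]:
    """Build the noether-train argv suggested after scaffolding (hypothetical helper).

    One experiment config is generated per model, so the experiment name is
    assumed to be the model value, e.g. "upt".
    """
    return [
        "uv", "run", "noether-train",
        "--config-dir", str(project_dir / "configs"),
        "--config-name", "train",
        f"+experiment={model}",
    ]


cmd = train_command(Path("my_project"), "upt")
print(shlex.join(cmd))
# On POSIX: uv run noether-train --config-dir my_project/configs --config-name train +experiment=upt
```

Passing `cmd` to `subprocess.run(cmd, check=True)` starts the run. Using an argument list rather than a shell string avoids quoting problems when the project path contains spaces.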
8 changes: 8 additions & 0 deletions pyproject.toml
@@ -43,12 +43,16 @@ Docs = "https://noether-docs.emmi.ai/"
[tool.setuptools_scm]
write_to = "src/noether/_version.py"

[tool.setuptools.package-data]
"noether.scaffold" = ["references/*.yaml", "template_files/**/*"]

[project.scripts]
noether-train = "noether.training.cli.main_train:main"
noether-train-submit-job = "noether.training.cli.submit_job:main"
noether-eval = "noether.inference.cli.main_inference:main"
noether-data = "noether.io.cli.cli:app"
noether-dataset-stats = "noether.data.tools.calculate_statistics:main"
noether-init = "noether.scaffold.cli:app"
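The `module:attribute` value tells the installer which object the generated `noether-init` executable calls; here it is the Typer `app` object, which is callable. As a small sketch, the standard library's `importlib.metadata.EntryPoint` (Python 3.9+ for `.module`/`.attr`) can split such a value; this only parses the string and does not import `noether`.

```python
from importlib.metadata import EntryPoint

# Same name/value pair as the console script declared in [project.scripts].
ep = EntryPoint(name="noether-init", value="noether.scaffold.cli:app", group="console_scripts")

print(ep.module)  # noether.scaffold.cli
print(ep.attr)    # app
```

In an environment where the package is installed, `ep.load()` would import `noether.scaffold.cli` and return `app`, which is what the console-script wrapper ends up calling.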

# --- Centralized Development & Tooling Dependencies ---
# These are dependencies for developing the *entire* workspace.
@@ -130,6 +134,10 @@ module = [
# "rtree.*"
]

[[tool.mypy.overrides]]
module = ["noether.scaffold.template_files.*"]
ignore_errors = true

[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["src"]
1 change: 1 addition & 0 deletions src/noether/scaffold/__init__.py
@@ -0,0 +1 @@
# Copyright © 2025 Emmi AI GmbH. All rights reserved.
61 changes: 61 additions & 0 deletions src/noether/scaffold/choices.py
@@ -0,0 +1,61 @@
# Copyright © 2025 Emmi AI GmbH. All rights reserved.

from __future__ import annotations

from enum import StrEnum

_MODEL_CLASS_NAMES: dict[str, str] = {
    "transformer": "Transformer",
    "upt": "UPT",
    "ab_upt": "ABUPT",

> Member: `ABUPT` is the model from the tutorial, not from `noether.modeling.models`, where it is called `AnchorBranchedUPT`. Somehow we need to decide which one to use when, as this may be confusing 🤔
>
> Contributor Author: Good point. For the scaffolding I would keep the `ABUPT` wrapper around `AnchorBranchedUPT`, because it gives users a starting point for adding their own logic, which I think is helpful.
>
> Member: Alright. Does it make sense to create a small README.md inside the project to highlight these details, or to make it part of the documentation of the tool itself?
>
> Contributor Author (@kinggongzilla, Mar 11, 2026): I think we definitely need documentation for `noether-init`, and I will add it there. I think that should be sufficient, but I'm open to anything.

    "transolver": "Transolver",
}


class ModelChoice(StrEnum):
    TRANSFORMER = "transformer"
    UPT = "upt"
    AB_UPT = "ab_upt"
    TRANSOLVER = "transolver"

> Review thread on lines +16 to +20
>
> Collaborator: As mentioned earlier, this won't work in the relatively near future, when new versions of models are added that solve a different task.

    @property
    def class_name(self) -> str:
        return _MODEL_CLASS_NAMES[self.value]

    @property
    def module_name(self) -> str:
        return self.value

    @property
    def schema_module(self) -> str:
        return f"{self.value}_config"

    @property
    def config_class_name(self) -> str:
        return f"{self.class_name}Config"


class DatasetChoice(StrEnum):
    SHAPENET_CAR = "shapenet_car"
    DRIVAERNET = "drivaernet"
    DRIVAERML = "drivaerml"
    AHMEDML = "ahmedml"
    EMMI_WING = "emmi_wing"


class OptimizerChoice(StrEnum):
    ADAMW = "adamw"
    LION = "lion"


class TrackerChoice(StrEnum):
    WANDB = "wandb"
    TRACKIO = "trackio"
    TENSORBOARD = "tensorboard"
    DISABLED = "disabled"


class HardwareChoice(StrEnum):
    GPU = "gpu"
    MPS = "mps"
    CPU = "cpu"
96 changes: 96 additions & 0 deletions src/noether/scaffold/cli.py
@@ -0,0 +1,96 @@
# Copyright © 2025 Emmi AI GmbH. All rights reserved.

from pathlib import Path
from typing import Annotated

import typer

from .choices import DatasetChoice, HardwareChoice, ModelChoice, OptimizerChoice, TrackerChoice
from .config import ScaffoldConfig, resolve_config
from .generator import generate_project

app = typer.Typer(
    name="noether-init",
    help="Scaffold a new Noether training project.",
    add_completion=False,
)


@app.command()
def main(
    project_name: Annotated[
        str,
        typer.Argument(
            help="Project name (valid Python identifier), e.g. 'my_project' or 'MyProject1'. No hyphens allowed."
        ),
    ],
    model: Annotated[ModelChoice, typer.Option("--model", "-m", help="Model architecture")] = ...,  # type: ignore[assignment]
    dataset: Annotated[DatasetChoice, typer.Option("--dataset", "-d", help="Dataset")] = ...,  # type: ignore[assignment]
    dataset_path: Annotated[str, typer.Option("--dataset-path", help="Path to dataset")] = ...,  # type: ignore[assignment]
    optimizer: Annotated[OptimizerChoice, typer.Option("--optimizer", "-o", help="Optimizer")] = OptimizerChoice.ADAMW,
    tracker: Annotated[
        TrackerChoice, typer.Option("--tracker", "-t", help="Experiment tracker")
    ] = TrackerChoice.DISABLED,
    hardware: Annotated[HardwareChoice, typer.Option("--hardware", help="Hardware target")] = HardwareChoice.GPU,
    project_dir: Annotated[
        Path, typer.Option("--project-dir", "-l", help="Parent directory in which to create the project folder")
    ] = Path("."),
    wandb_entity: Annotated[
        str | None, typer.Option("--wandb-entity", help="W&B entity, e.g. 'my-team' (defaults to your W&B username)")
    ] = None,
> Review thread on lines +27 to +38
>
> Collaborator: Why are the trainer/pipeline not part of the options?
>
> Contributor Author: As far as I understand, the pipelines and trainers depend on the dataset being used. In that regard they are fixed and do not need to be selected by the user.

) -> None:
    """Scaffold a new Noether training project."""
    # Validate project name
    if not project_name.isidentifier():
        typer.echo(f"Error: '{project_name}' is not a valid Python identifier.", err=True)
        raise typer.Exit(1)

    # Resolve to absolute path
    project_dir = (project_dir / project_name).resolve()

    # Check if project dir already exists
    if project_dir.exists():
        typer.echo(f"Error: Directory already exists: {project_dir}", err=True)
        raise typer.Exit(1)

    # Build config
    config = resolve_config(
        project_name=project_name,
        model=model,
        dataset=dataset,
        dataset_path=dataset_path,
        optimizer=optimizer,
        tracker=tracker,
        hardware=hardware,
        project_dir=project_dir,
        wandb_entity=wandb_entity,
    )

    # Generate
    typer.echo(f"Creating project '{project_name}' at {project_dir}")
    generate_project(config)

    # Print summary
    _print_summary(config)


def _print_summary(config: ScaffoldConfig) -> None:
    typer.echo(
        "\nProject created successfully!\n"
        "Configuration:\n"
        f" Project: {config.project_name}\n"
        f" Model: {config.model.value}\n"
        f" Dataset: {config.dataset.value}\n"
        f" Optimizer: {config.optimizer.value}\n"
        f" Tracker: {config.tracker.value}\n"
        f" Hardware: {config.hardware.value}\n"
        f" Path: {config.project_dir}\n"
    )
    # Suggest run command
    typer.echo(
        "To train, run:\n"
        f" uv run noether-train --config-dir {config.project_dir}/configs --config-name train +experiment={config.model.value}\n\n"
        "Experiment configs for all models are in configs/experiment/."
    )


if __name__ == "__main__":
    app()
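The two checks at the top of `main()` (identifier validation and the "directory already exists" guard) can be exercised without invoking Typer. The sketch below mirrors them in a plain function; the name `resolve_target_dir` is hypothetical, and it raises exceptions instead of calling `typer.Exit` so the behavior is easy to test.

```python
from pathlib import Path


def resolve_target_dir(project_name: str, project_dir: Path = Path(".")) -> Path:
    """Mirror the pre-generation checks in main() (hypothetical helper)."""
    # Same rule as the CLI: the name must be a valid Python identifier.
    if not project_name.isidentifier():
        raise ValueError(f"'{project_name}' is not a valid Python identifier.")
    # The project folder lives inside project_dir and is resolved to an absolute path.
    target = (project_dir / project_name).resolve()
    # Refuse to overwrite anything that already exists at the target path.
    if target.exists():
        raise FileExistsError(f"Directory already exists: {target}")
    return target
```

Like the CLI, this accepts Python keywords such as `class`, because `str.isidentifier()` returns `True` for them; an extra `keyword.iskeyword()` check would reject names that cannot be imported as a package.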