If you haven't already, you should check out [Getting Started With Highlighter SDK](../getting-started-with-highlighter-sdk/).



## Create simple Agent

1. If you have not run `hl new .` to create a scaffold then do that first
- This should have created a directory from the `title_slug` field of prompts, `cd` to that.
- `pip install -e .`
2. `hl generate agent .`. This will:
1. `hl generate agent .`. This will:
- create an `agents/` dir with an agent definition and a data source Capability
for the data type specified in the prompts
- create a `src/<title_slug>/<capability_name>.py` with a dummy implementation
- add a `## Run Agent` section to your `README.md`
2. Run the command in the `## Run Agent` section of the `README.md`

## Create a new Capability and add it to the Agent
- Download some weights from Huggingface, https://huggingface.co/SpotLab/YOLOv8Detection/blob/3005c6751fb19cdeb6b10c066185908faf66a097/yolov8n.onnx
- Make a `weights` directory and move them to `weights/yolov8n.onnx`
hl agent run agents/YOU_AGENT_DEF.json -f VIDEO_PATH
```

## Training a model using the Highlighter SDK

### Concepts

* **Highlighter Training Run**: A record in the Highlighter tool used to configure a training run.
* **Evaluation**: A page in the Highlighter tool used to record and track the performance of trained models.
* **Artefact**: A file containing model weights and the accompanying information needed to run the model for inference. Artefacts let trained models and their results be saved and tracked in Highlighter, so others can access and reuse them without retraining.
* **Highlighter SDK**: A Python library, including a CLI, for interacting with Highlighter and performing common Highlighter-related tasks, like training models.


To train a model using the SDK you must first set up the training run in your Highlighter account. Then you can start the training run on your development machine using the Highlighter SDK.

## Create New Training Run In Highlighter
1. Navigate to the Training tab in Highlighter
<br />
{{ resize_image(path="docs/user-manual/resources/training_tab_image.png", width=200, height=1, op="fit_width") }}
<br />

2. Click **Train a new model** or select an existing training run (we're going to assume you're training a new model)
3. Fill in the **Name** field with a meaningful name for your training run
4. Select the Capability that the training run is for.
5. Select the **Evaluation** that will store the training run's evaluation metrics.
6. Select **Datasets**

A training run must have at least one dataset for training and another for testing. There can be no overlap between the training and test annotations.

<br />
{{ resize_image(path="docs/user-manual/resources/dataset_form.png", width=400, height=1, op="fit_width") }}
<br />


7. **Model Template** can be ignored; leave as default.
8. **Config Override** can be ignored; leave as default.
9. Click **Save Training Run**.
10. Note the Training Run ID from the URL. For example, in https://highlighter-marketing-demo.highlighter.ai/training_runs/123 the training run ID is 123.
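The no-overlap requirement between training and test annotations can be sanity-checked locally once the datasets are downloaded. Below is a minimal sketch using pandas with made-up toy tables; the `entity_id` column name is an assumption for illustration, not necessarily the exact schema the SDK uses:

```python
import pandas as pd

# Toy annotation tables; real ones would come from the downloaded
# datasets' annotation DataFrames (column name assumed here).
train_adf = pd.DataFrame({"entity_id": ["e1", "e2", "e3"]})
test_adf = pd.DataFrame({"entity_id": ["e4", "e5"]})

# Any entity appearing in both sets violates the requirement
overlap = set(train_adf.entity_id) & set(test_adf.entity_id)
assert not overlap, f"Training and test annotations overlap: {overlap}"
print("No overlap between training and test annotations")
```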


## Start Model Training Using The Highlighter SDK

1. `hl generate training-run TRAINING_RUN_ID {yolo-det|yolo-seg|yolo-cls} ML_TRAINING_DIR`

   This generates a local training directory for the specified Highlighter training run. It downloads the configuration files, datasets, and model weights (for detection, segmentation, or classification) into the given directory, allowing you to reproduce, resume, or evaluate the training locally.

   ```bash
   hl generate training-run 123 yolo-seg .
   ```

   Here `123` is the Training Run ID, `yolo-seg` is the type of model to train, and `.` (dot) tells the SDK to create the directory (`./123`) holding the training files in the current working directory.

2. **OPTIONAL**: Edit the `cfg.yaml` using your favourite text editor, e.g. `vim 123/cfg.yaml`.
3. **OPTIONAL**: Override the default `Trainer` class. You may need to override the default `Trainer` class if you need to:
   - Customize the Highlighter Dataset pre-processing to filter out annotations before saving the dataset
   - Customize the cropping that occurs prior to saving a classification dataset
   - Train the model on an attribute that is not an object class

   To create your own, add a `trainer.py` to your training run's directory. Below is a commented example of a custom `Trainer` class that does all three of the above.

```python
# Import the implementation of the Trainer class you wish to override
from typing import List, Optional
from uuid import UUID

from highlighter.datasets import Dataset  # adjust import paths to your SDK version
from highlighter.datasets.cropping import CropArgs
from highlighter.trainers.yolov11 import YoloV11Trainer


# Inherit from the base implementation
class Trainer(YoloV11Trainer):

    crop_args: CropArgs = CropArgs(
        # If True, expect to crop non-orthogonal rectangles.
        # Once extracted, the crops will be transformed into an orthogonal
        # rectangle of the same size. Use warped_wh to transform the
        # rotated rectangles to a different shape.
        crop_rotated_rect=False,

        # Optional[Tuple[int, int]]
        warped_wh=None,

        # Optional[float]: proportionally scale the rectangle before cropping
        scale=1.0,

        # Optional[int]: pad the rectangle before cropping
        pad=0,
    )

    # Only use attributes with this specific attribute_id as the
    # categories for your dataset. CUSTOM_ATTRIBUTE_UUID is a placeholder
    # for your own attribute id.
    category_attribute_id: UUID = UUID(CUSTOM_ATTRIBUTE_UUID)

    # Optionally define a list of attribute values to use for your dataset.
    # If None, the output attributes defined in the
    # TrainingConfigType.input_output_schema are used. If categories is set,
    # the YoloWriter will respect the order they are listed in.
    categories: Optional[List[str]] = None

    def filter_dataset(self, dataset: Dataset) -> Dataset:
        """Optionally add some code to filter the Highlighter Datasets as required.

        The YoloWriter will only use entities with both a pixel_location attribute
        and a `category_attribute_id` attribute when converting to the Yolo dataset
        format. It will use the unique values of the object_class attribute as the
        detection categories.

        For example, if you want to train a detector that finds Apples and Bananas,
        and your taxonomy looks like this:

          - object_class: Apple
          - object_class: Orange
          - object_class: Banana

        Then you may do something like this:

            adf = combined_ds.annotations_df
            ddf = combined_ds.data_files_df

            orange_entity_ids = adf[(adf.attribute_id == OBJECT_CLASS_ATTRIBUTE_UUID) &
                                    (adf.value == "Orange")].entity_id.unique()

            # Filter out the offending entities
            adf = adf[~adf.entity_id.isin(orange_entity_ids)]

            # Clean up data files that are no longer needed
            ddf = ddf[ddf.data_file_id.isin(adf.data_file_id)]

            combined_ds.annotations_df = adf
            combined_ds.data_files_df = ddf
        """
        return dataset
```
4. Start training: `hl train start TRAINING_RUN_DIR`, e.g. `hl train start 123/`.
5. **OPTIONAL**: Run evaluation: `hl train evaluate TRAINING_RUN_DIR CHECKPOINT CONFIG`. Use the `--create` flag to upload the evaluation metrics to the Highlighter Evaluation linked to the Training Run.
6. **OPTIONAL**: Export the trained model and upload it as a Highlighter Training Run **Artefact**, e.g. `hl training-run artefact create -i 123 -a 123/runs/weights/artefact.yaml`.
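The pandas sketch in the `filter_dataset` docstring above can be exercised stand-alone to see the filtering behaviour. This toy version uses made-up entity and data-file ids, and a plain string in place of the real object_class attribute UUID:

```python
import pandas as pd

OBJECT_CLASS = "object_class"  # placeholder for the real attribute UUID

# Toy annotations: one entity per fruit, each tied to one data file
adf = pd.DataFrame({
    "entity_id":    ["e1", "e2", "e3"],
    "attribute_id": [OBJECT_CLASS] * 3,
    "value":        ["Apple", "Orange", "Banana"],
    "data_file_id": ["f1", "f2", "f3"],
})
ddf = pd.DataFrame({"data_file_id": ["f1", "f2", "f3"]})

# Entities labelled Orange, which we want to exclude from training
orange_entity_ids = adf[(adf.attribute_id == OBJECT_CLASS) &
                        (adf.value == "Orange")].entity_id.unique()

# Drop the offending entities, then drop data files no longer referenced
adf = adf[~adf.entity_id.isin(orange_entity_ids)]
ddf = ddf[ddf.data_file_id.isin(adf.data_file_id)]

print(sorted(adf.value))         # ['Apple', 'Banana']
print(sorted(ddf.data_file_id))  # ['f1', 'f3']
```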