If you haven't already, you should check out [Getting Started With Highlighter SDK](../getting-started-with-highlighter-sdk/).



## Create simple Agent

1. If you have not run `hl new .` to create a scaffold then do that first
- This should have created a directory from the `title_slug` field of prompts, `cd` to that.
- `pip install -e .`
2. `hl generate agent .`. This will:
1. `hl generate agent .`. This will:
- create an `agents/` dir with an agent definition and a data source Capability
for the data type specified in the prompts
- create a `src/<title_slug>/<capability_name>.py` with a dummy implementation
- add a `## Run Agent` section to your `README.md`
2. Run the command in the `## Run Agent` section of the `README.md`

## Create a new Capability and add it to the Agent
- Download some weights from Huggingface, https://huggingface.co/SpotLab/YOLOv8Detection/blob/3005c6751fb19cdeb6b10c066185908faf66a097/yolov8n.onnx
- Make a `weights` directory and move them to `weights/yolov8n.onnx`
hl agent run agents/YOU_AGENT_DEF.json -f VIDEO_PATH
```

## Training a model using the Highlighter SDK

### Concepts

* **Highlighter Training Run**: A record in the Highlighter tool used to configure a training run.
* **Evaluation**: A page in the Highlighter tool used to record and track the performance of trained models.
* **Artefact**: A file containing model weights and the accompanying information needed to run the model for inference. Artefacts let trained models and their results be saved and tracked in Highlighter, so others can access and reuse them without retraining.
* **Highlighter SDK**: A Python library, including a CLI, for interacting with Highlighter and performing common Highlighter-related tasks, like training models.


To train a model using the SDK you must first set up the training run in your Highlighter account. Then you can start the training run on your development machine using the Highlighter SDK.

## Create New Training Run In Highlighter
1. Navigate to the Training tab in Highlighter
<br />
{{ resize_image(path="docs/user-manual/resources/training_tab_image.png", width=200, height=1, op="fit_width") }}
<br />

2. Click **Train a new model** or select an existing training run (we're going to assume you're training a new model)
3. Fill in the **Name** field with a meaningful name for your training run
4. Select the Capability that the training run is for.
5. Select the **Evaluation** that will store the training run's evaluation metrics.
6. Select **Datasets**

A training run must have at least one dataset for training and another for testing. There can be no overlap between the training and test annotations.

<br />
{{ resize_image(path="docs/user-manual/resources/dataset_form.png", width=400, height=1, op="fit_width") }}
<br />


7. **Model Template** can be ignored; leave as default.
8. **Config Override** can be ignored; leave as default.
9. Click **Save Training Run**.
10. Note the Training Run ID from the URL. For example, in https://highlighter-marketing-demo.highlighter.ai/training_runs/123 the training run ID is 123.
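The no-overlap requirement between training and test annotations can be sanity-checked locally once the datasets are downloaded. Below is a minimal sketch using pandas with made-up toy tables; the `entity_id` column name is an assumption for illustration, not necessarily the exact schema the SDK uses:

```python
import pandas as pd

# Toy annotation tables; real ones would come from the downloaded
# datasets' annotation DataFrames (column name assumed here).
train_adf = pd.DataFrame({"entity_id": ["e1", "e2", "e3"]})
test_adf = pd.DataFrame({"entity_id": ["e4", "e5"]})

# Any entity appearing in both sets violates the requirement
overlap = set(train_adf.entity_id) & set(test_adf.entity_id)
assert not overlap, f"Training and test annotations overlap: {overlap}"
print("No overlap between training and test annotations")
```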


## Start Model Training Using The Highlighter SDK

1. `hl generate training-run TRAINING_RUN_ID {yolo-det|yolo-seg|yolo-cls} ML_TRAINING_DIR`

   This generates a local training directory for the specified Highlighter training run. It downloads the configuration files, datasets, and model weights (for detection, segmentation, or classification) into the given directory, allowing you to reproduce, resume, or evaluate the training locally.

   ```bash
   hl generate training-run 123 yolo-seg .
   ```

   Here `123` is the Training Run ID, `yolo-seg` is the type of model to train, and `.` (dot) tells the SDK to create the directory (`./123`) holding the training files in the current working directory.

2. **OPTIONAL**: Edit the `cfg.yaml` using your favourite text editor, e.g. `vim 123/cfg.yaml`.
3. **OPTIONAL**: Override the default `Trainer` class. You may need to override the default `Trainer` class if you need to:
   - Customize the Highlighter Dataset pre-processing to filter out annotations before saving the dataset
   - Customize the cropping that occurs prior to saving a classification dataset
   - Train the model on an attribute that is not an object class

   To create your own, add a `trainer.py` to your training run's directory. Below is a commented example of a custom `Trainer` class that does all three of the above.

```python
# Import the implementation of the Trainer class you wish to override
from typing import List, Optional
from uuid import UUID

from highlighter.datasets import Dataset  # adjust import paths to your SDK version
from highlighter.datasets.cropping import CropArgs
from highlighter.trainers.yolov11 import YoloV11Trainer


# Inherit from the base implementation
class Trainer(YoloV11Trainer):

    crop_args: CropArgs = CropArgs(
        # If True, expect to crop non-orthogonal rectangles.
        # Once extracted, the crops will be transformed into an orthogonal
        # rectangle of the same size. Use warped_wh to transform the
        # rotated rectangles to a different shape.
        crop_rotated_rect=False,

        # Optional[Tuple[int, int]]
        warped_wh=None,

        # Optional[float]: proportionally scale the rectangle before cropping
        scale=1.0,

        # Optional[int]: pad the rectangle before cropping
        pad=0,
    )

    # Only use attributes with this specific attribute_id as the
    # categories for your dataset. CUSTOM_ATTRIBUTE_UUID is a placeholder
    # for your own attribute id.
    category_attribute_id: UUID = UUID(CUSTOM_ATTRIBUTE_UUID)

    # Optionally define a list of attribute values to use for your dataset.
    # If None, the output attributes defined in the
    # TrainingConfigType.input_output_schema are used. If categories is set,
    # the YoloWriter will respect the order they are listed in.
    categories: Optional[List[str]] = None

    def filter_dataset(self, dataset: Dataset) -> Dataset:
        """Optionally add some code to filter the Highlighter Datasets as required.

        The YoloWriter will only use entities with both a pixel_location attribute
        and a `category_attribute_id` attribute when converting to the Yolo dataset
        format. It will use the unique values of the object_class attribute as the
        detection categories.

        For example, if you want to train a detector that finds Apples and Bananas,
        and your taxonomy looks like this:

          - object_class: Apple
          - object_class: Orange
          - object_class: Banana

        Then you may do something like this:

            adf = combined_ds.annotations_df
            ddf = combined_ds.data_files_df

            orange_entity_ids = adf[(adf.attribute_id == OBJECT_CLASS_ATTRIBUTE_UUID) &
                                    (adf.value == "Orange")].entity_id.unique()

            # Filter out the offending entities
            adf = adf[~adf.entity_id.isin(orange_entity_ids)]

            # Clean up data files that are no longer needed
            ddf = ddf[ddf.data_file_id.isin(adf.data_file_id)]

            combined_ds.annotations_df = adf
            combined_ds.data_files_df = ddf
        """
        return dataset
```
4. Start training: `hl train start TRAINING_RUN_DIR`, e.g. `hl train start 123/`.
5. **OPTIONAL**: Run evaluation: `hl train evaluate TRAINING_RUN_DIR CHECKPOINT CONFIG`. Use the `--create` flag to upload the evaluation metrics to the Highlighter Evaluation linked to the Training Run.
6. **OPTIONAL**: Export the trained model and upload it as a Highlighter Training Run **Artefact**, e.g. `hl training-run artefact create -i 123 -a 123/runs/weights/artefact.yaml`.
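The pandas sketch in the `filter_dataset` docstring above can be exercised stand-alone to see the filtering behaviour. This toy version uses made-up entity and data-file ids, and a plain string in place of the real object_class attribute UUID:

```python
import pandas as pd

OBJECT_CLASS = "object_class"  # placeholder for the real attribute UUID

# Toy annotations: one entity per fruit, each tied to one data file
adf = pd.DataFrame({
    "entity_id":    ["e1", "e2", "e3"],
    "attribute_id": [OBJECT_CLASS] * 3,
    "value":        ["Apple", "Orange", "Banana"],
    "data_file_id": ["f1", "f2", "f3"],
})
ddf = pd.DataFrame({"data_file_id": ["f1", "f2", "f3"]})

# Entities labelled Orange, which we want to exclude from training
orange_entity_ids = adf[(adf.attribute_id == OBJECT_CLASS) &
                        (adf.value == "Orange")].entity_id.unique()

# Drop the offending entities, then drop data files no longer referenced
adf = adf[~adf.entity_id.isin(orange_entity_ids)]
ddf = ddf[ddf.data_file_id.isin(adf.data_file_id)]

print(sorted(adf.value))         # ['Apple', 'Banana']
print(sorted(ddf.data_file_id))  # ['f1', 'f3']
```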