diff --git a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/SKILL.md b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/SKILL.md index b907063734..0aaf6ac106 100644 --- a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/SKILL.md +++ b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/SKILL.md @@ -111,9 +111,9 @@ Training never runs inside the `nemo` CLI process. After `submit`, the platform' ## Gotchas - Resolve the CLI per **Pre-flight — CLI resolution** before any `nemo …` command; run from the **nemo-platform** git root, not a plugin subfolder. -- Set `NEMO_BASE_URL` (or `NMP_BASE_URL`) only when the user gives a platform URL; default `http://127.0.0.1:8080` (same as `http://localhost:8080`). Track whether the user **overrode** the base URL — see **Platform unreachable** below. +- Set `NMP_BASE_URL` only when the user gives a platform URL; default `http://127.0.0.1:8080` (same as `http://localhost:8080`). The `nemo` CLI reads this env var (see SDK `NMP_BASE_URL`). Track whether the user **overrode** the base URL — see **Platform unreachable** below. - **Platform unreachable** — if any platform API call fails with a connection error (`Connection error`, timeout, refused): - - **User gave a custom URL** (e.g. `10.0.0.51:8080`) or you exported a non-default `NEMO_BASE_URL` / `NMP_BASE_URL`: stop and tell the user the platform is not reachable at that address. Do **not** offer to start local services. + - **User gave a custom URL** or you exported a non-default `NMP_BASE_URL`: stop and tell the user the platform is not reachable at that address. Do **not** offer to start local services. - **Default URL only** (no user override): **ask** whether to start the platform locally. If they agree, from the **nemo-platform** git root run in the **background**: ```bash @@ -139,8 +139,10 @@ Training never runs inside the `nemo` CLI process. 
After `submit`, the platform' - **Do not use local `docker info`** to pick automodel vs unsloth. Run `nemo jobs list-execution-profiles -f json` against the user's platform (login first only if auth is enabled — see **Authentication**; see `references/troubleshooting.md`). Default output is a table — **`-f json` is required** for scripting; parse **stdout only** (do not pipe `2>&1` into `json.load`). - **Do not merge stderr into stdout when parsing JSON** — `submit`, `explain`, and `-f json` commands write **JSON on stdout**; harmless warnings like `Configuration file not found, using defaults` go to **stderr**. Piping with **`2>&1`** before `json.load` raises `JSONDecodeError` even when submit **succeeded** — a common cause of **duplicate jobs** when the agent re-submits after a parse error. Parse stdout only; redirect stderr if needed (`2>/dev/null`). See `references/troubleshooting.md` § **Parsing CLI JSON**. - For submit/image/plugin errors (both backends), read `references/troubleshooting.md`. Unsloth needs the `nmp-unsloth-training` container image on the **platform host's** Docker daemon (see `docker/unsloth/README.md`). -- **Missing training image on a remote platform** — if the user gave a non-localhost `NEMO_BASE_URL` / `NMP_BASE_URL` (e.g. `10.0.0.51:8080`) and the job errors with `Failed to pull image`, `manifest unknown`, or missing `nmp-unsloth-training` / automodel training image: **do not** run `docker build`, `docker pull`, or `docker buildx bake` on the agent machine. Report with **Report to user** (use **Output adapter fileset (planned):** on error), then append on-target build steps from `references/troubleshooting.md` § **Missing training images**. 
+- **Missing training image on a remote platform** — if the user gave a non-localhost `NMP_BASE_URL` and the job errors with `Failed to pull image`, `manifest unknown`, or missing `nmp-unsloth-training` / automodel training image: **do not** run `docker build`, `docker pull`, or `docker buildx bake` on the agent machine. Report with **Report to user** (use **Output adapter fileset (planned):** on error), then append on-target build steps from `references/troubleshooting.md` § **Missing training images**. - **Gated HuggingFace models** (Llama, Gemma, …) — confirm `hf-token` + fileset `token_secret` before submit; download fails with `Failed to access upstream storage` / 502 when missing. See **HuggingFace token (gated models)** and `references/troubleshooting.md` § **Gated HuggingFace models**. +- **Post-training eval format** — use the same CHAT `messages` JSONL as training. **Do not** flatten rows to `prompt`/`expected` for the evaluator. Send `messages[:-1]` at inference (exclude final assistant label); score against `messages[-1].content`. See `references/post-training-eval.md` and `references/eval_helpers.py`. +- **LoRA adapters load automatically for eval** — when a LoRA job completes (`save_method: lora`), the adapter is registered on the base model entity and hot-reloaded on any **READY** deployment with `lora_enabled: true`. **Do not** create or update deployments before LoRA eval. **Full SFT** (`finetuning_type: all_weights`) and **merged checkpoints** (`merged_16bit` / `merged_4bit`) register a new **model** entity at `output.name` — **deploy that entity for inference** before chat or eval; full weights are not hot-reloaded onto the base deployment. For LoRA eval, route through the **provider** gateway (`/provider//-/v1` with `model: default--`); the model-entity path (`/model//-/v1`) always hits the base model. See `references/post-training-eval.md` § **Request routing (base vs LoRA)**. 
## Workflow @@ -148,7 +150,7 @@ Common steps then **branch by plugin pick**: ```text - [ ] Resolve CLI (Pre-flight — CLI resolution); cd nemo-platform -- [ ] export NEMO_BASE_URL (if user provided endpoint); note whether base URL is user-overridden +- [ ] export NMP_BASE_URL (if user provided endpoint); note whether base URL is user-overridden - [ ] nemo auth status — skip login if auth disabled; if auth enabled and unsigned JWT allowed, `nemo auth login --unsigned-token --email <…>`; if OIDC, `nemo auth login` - [ ] nemo jobs list-execution-profiles -f json — apply Plugin pick rules above (retry login on 401/403) - [ ] On connection error: default URL → ask to start platform (see Platform unreachable); custom URL → report unreachable and stop @@ -162,12 +164,14 @@ Common steps then **branch by plugin pick**: - [ ] nemo customization automodel submit /tmp/job.json --workspace default - [ ] Poll until top-level terminal (`poll_customization_job.sh`; default 15s interval, or 30–60s manual polls) - [ ] Report using output template below +- [ ] Optional: compare base vs adapter on validation — `references/eval_helpers.py …` (LoRA only; CHAT format; adapters hot-reload automatically; see `references/post-training-eval.md`) # unsloth branch (submit → Docker GPU job) - [ ] Write /tmp/job.json using the UnslothJobInput shape (see Fast path — unsloth) - [ ] nemo customization unsloth submit /tmp/job.json --workspace default [--profile ] - [ ] Poll until top-level terminal (`poll_customization_job.sh unsloth-`; default 15s interval) - [ ] Report using output template below +- [ ] Optional: compare base vs adapter on validation — `references/eval_helpers.py …` (LoRA only; CHAT format; adapters hot-reload automatically; see `references/post-training-eval.md`) ``` ## Fast path — automodel @@ -177,7 +181,7 @@ Substitute ``, ``, ``, ``, ` **Setup** ```bash -export NEMO_BASE_URL=http://127.0.0.1:8080 # user override only +export NMP_BASE_URL=http://127.0.0.1:8080 # user override 
only cd /path/to/nemo-platform nemo auth status # skip login if auth disabled; if enabled + unsigned JWT allowed → login --unsigned-token --email admin@example.com nemo jobs list-execution-profiles -f json # platform GPU profiles → automodel; set training.execution_profile if needed @@ -399,7 +403,7 @@ Pick the path by whether the **base model fits in ~48 GB on one GPU** (LoRA or f | 4B–8B | 1 | 2 | `5e-6` | | >8B | 1 | 1 | lower LR or use TP / shorter seq | -Output type is **model** (full checkpoint), not adapter. Expect much longer runs than LoRA at the same batch. +Output type is **model** (full checkpoint), not adapter. Expect much longer runs than LoRA at the same batch. **Inference:** deploy `default/` as a new model entity — full SFT does not hot-reload onto the base model's LoRA deployment. ### `max_seq_length` scaling @@ -458,7 +462,7 @@ There is no `parallelism` block, no TP / PP / DP, no GBS divisibility math. Mult `load_in_4bit: true` (default) keeps base weights in 4-bit, which is what makes the "smaller per-device batch on bigger models" rule milder than vanilla HF. If you raise `per_device_train_batch_size` and hit OOM (exit 137) or training crashes (exit 1), halve `per_device_train_batch_size` first and double `gradient_accumulation_steps` to keep the effective batch the same. -**Save method.** Default `output.save_method: "lora"` (adapter only — small, fast, deploy-friendly). Use `"merged_16bit"` if the user wants a full-weight checkpoint to deploy without an adapter loader; `"merged_4bit"` only when storage is tight (lossy). Merged methods require `training.finetuning_type: "lora"`. +**Save method.** Default `output.save_method: "lora"` (adapter only — small, fast, hot-reloads on LoRA-enabled deployments). Use `"merged_16bit"` if the user wants a full-weight checkpoint to deploy as a standalone model entity; `"merged_4bit"` only when storage is tight (lossy). Merged methods require `training.finetuning_type: "lora"`. 
Merged and full SFT outputs must be **deployed for inference** — they do not hot-reload onto the base adapter deployment. **Tuning loop (unsloth):** @@ -513,7 +517,7 @@ After polling reaches a **terminal** status (`completed`, `error`, or `cancelled | Status | Notes | |--------|-------| -| `completed` | Brief success summary (e.g. adapter registered on model entity). When `metrics.train_loss` has ≥2 entries, add a loss-drop sentence: *Loss dropped from \ at step 1 to \ at step \; validation loss was \.* | +| `completed` | Brief success summary. LoRA (`save_method: lora`): adapter registered on base model entity. Full SFT / merged checkpoint: new model entity at `output.name`. When `metrics.train_loss` has ≥2 entries, add a loss-drop sentence: *Loss dropped from \ at step 1 to \ at step \; validation loss was \.* Append **Using the adapter** (LoRA) or **Using the fine-tuned model** (full SFT / merged) with discovered provider name and concrete gateway URLs (see below). | | `error` | Quote `error_details.message` or the failing step; note setup that succeeded before the failure (auth, dataset upload, submit). | | `cancelled` | Cancellation reason if available. | @@ -580,21 +584,113 @@ After polling reaches a **terminal** status (`completed`, `error`, or `cancelled | Output save method | lora | ``` -**Using the adapter (`completed` only)** — after **Training configuration**, run `nemo models get --workspace default` (parse stdout only) to confirm the adapter is listed under `adapters`. 
Append this section: +**Using the output (`completed` only)** — after **Training configuration**, branch on output type: + +| Output | When | Report section | +|--------|------|----------------| +| LoRA adapter | `save_method: lora` (default) | **Using the adapter** — below | +| Full model | `finetuning_type: all_weights`, or `save_method: merged_16bit` / `merged_4bit` | **Using the fine-tuned model** — below | + +### Using the adapter (LoRA / `save_method: lora`) + +Run these discovery commands (parse stdout only; do not pipe `2>&1` into JSON parsers): + +1. `nemo models get --workspace default` — confirm `` appears under `adapters` with `enabled: true`. +2. `nemo inference providers list --workspace default -f json` — pick a **READY** provider whose `served_models` includes `default/` (base entity). Record its `name` as `` (often matches the deployment name). + +On a deployment with `lora_enabled: true`, the adapter is **hot-reloaded automatically** — no new deployment, deployment update, or provider reconfiguration before inference or post-training eval. Append this section with **concrete URLs and provider name** from discovery: ```markdown ### Using the adapter -The adapter `` is attached to `default/`. List adapters with: +The adapter `` is registered on `default/`. Weights are hot-reloaded on LoRA-enabled deployments serving the **base** entity — no new deployment or provider update after training. + +#### Request routing (base vs LoRA) + +| Target | Gateway path | OpenAI base URL | Request `"model"` field | +|--------|--------------|-----------------|-------------------------| +| **Base** weights | model-entity | `$NMP_BASE_URL/apis/inference-gateway/v2/workspaces/default/model//-/v1` | `default/` | +| **LoRA adapter** | **provider** | `$NMP_BASE_URL/apis/inference-gateway/v2/workspaces/default/provider//-/v1` | `default--` | + +**Common mistake:** posting to the model-entity URL with `"model": "default--"` still runs the **base** model. 
Base-vs-adapter eval will look identical until LoRA requests use the **provider** URL above. See `references/post-training-eval.md` § **Request routing (base vs LoRA)**. + +#### Chat inference (CHAT-trained models) + +Match training context at inference — send **`messages[:-1]`** (all turns except the final assistant label). Single-turn rows are just the user message; multi-turn rows keep prior user/assistant history. + +| Setting | Value | Why | +|---------|-------|-----| +| `messages` | All turns except the final assistant label from the JSONL row | Same decode path as SFT | +| `max_tokens` | `64` for short assistant labels | Training targets are brief (e.g. MCQA choice text) | +| `temperature` | `0` | Reproducible eval / regression checks | +| `chat_template_kwargs.enable_thinking` | `false` for Qwen3 short-answer SFT | Thinking mode needs extra tokens and changes output shape vs training | + +#### Example — LoRA adapter via provider + +\`\`\`bash +export NMP_BASE_URL= # omit when using default localhost +nemo inference gateway provider post v1/chat/completions --workspace default \\ + --body '{ + "model": "default--", + "messages": [], + "max_tokens": 64, + "temperature": 0, + "chat_template_kwargs": {"enable_thinking": false} + }' +\`\`\` + +#### Example — base model via model-entity (comparison) + +\`\`\`bash +export NMP_BASE_URL= +nemo inference gateway model post v1/chat/completions --workspace default \\ + --body '{ + "model": "default/", + "messages": [], + "max_tokens": 64, + "temperature": 0, + "chat_template_kwargs": {"enable_thinking": false} + }' +\`\`\` + +#### Post-training eval (optional) + +Validation loss from training is **not** accuracy. 
To compare base vs adapter on the validation split with correct routing: \`\`\`bash -export NEMO_BASE_URL= # omit line when using default localhost cd /path/to/nemo-platform -nemo models get --workspace default +uv run python plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/eval_helpers.py \\ + --model-entity \\ + --adapter \\ + --provider \\ + --dataset-fileset \\ + --split validation.jsonl \`\`\` + +Uses CHAT `messages` rows unchanged from the training fileset (`messages[:-1]` at inference). Repeat `--adapter` for multi-adapter compare. `--provider` is optional when a READY provider is auto-discovered. Set `NMP_BASE_URL` (or pass `--base-url`) when the platform is not localhost. LoRA only — full SFT / merged outputs need a deployed model entity (see **Using the fine-tuned model**). +``` + +### Using the fine-tuned model (full SFT / merged checkpoint) + +When `finetuning_type: all_weights` or `save_method` is `merged_16bit` / `merged_4bit`, the job registers a **model** entity at `output.name` with full fine-tuned weights. **Deploy that entity before inference or eval** — full checkpoints are not hot-reloaded onto the base model's LoRA deployment. + +1. `nemo models get --workspace default` — confirm the fine-tuned model entity exists. +2. Create or update an inference deployment / provider that serves `default/` (same workflow as deploying any model entity). +3. Append this section with the **READY** provider or deployment name and concrete gateway URL. + +```markdown +### Using the fine-tuned model + +Fine-tuned weights are on model entity `default/`. Unlike LoRA adapters, full checkpoints **require a new inference deployment** (or provider update) before chat or eval. 
+ +| Target | Gateway path | OpenAI base URL | Request `"model"` field | +|--------|--------------|-----------------|-------------------------| +| Fine-tuned model | model-entity | `$NMP_BASE_URL/apis/inference-gateway/v2/workspaces/default/model//-/v1` | `default/` | + +Use the same chat settings as LoRA inference (`messages[:-1]`, `max_tokens`, `temperature`, `enable_thinking` as appropriate). Post-training eval: run generation eval against this model-entity URL (not `eval_helpers.py --adapter`, which is LoRA-specific). ``` -Use the user's platform URL in `NEMO_BASE_URL` when they overrode it; omit the export line for default `http://127.0.0.1:8080`. The JSON `adapters` array shows `name`, `fileset`, `finetuning_type`, and `lora_config` for each registered adapter. +Use the user's platform URL in `NMP_BASE_URL` when they overrode it; omit the export line for default `http://127.0.0.1:8080`. Substitute ``, concrete URLs, and entity names with values from discovery — do not leave generic placeholders in the user-facing report. For **LoRA**, do **not** tell the user to update the deployment before calling the adapter — registration on the base model entity is sufficient. For **full SFT / merged**, tell the user they must deploy `` before inference. **Save report to `/tmp`** — unless the user opts out, write the full Markdown report (header, **Training configuration**, **Using the adapter** when `completed`, and **Resources created** when a slug or new filesets were used) to `/tmp/fine-tune-result-.md`. Use the random slug from the run when one was assigned; otherwise use the job id suffix (e.g. `a925b07ff678`). @@ -602,7 +698,7 @@ Use the user's platform URL in `NEMO_BASE_URL` when they overrode it; omit the e | Error type | Append | |------------|--------| -| Missing training image + user-overridden `NEMO_BASE_URL` / `NMP_BASE_URL` | `references/troubleshooting.md` § **Missing training images** — on-target build steps, env vars, re-submit commands. 
**Do not** `docker build` locally for a remote platform. | +| Missing training image + user-overridden `NMP_BASE_URL` | `references/troubleshooting.md` § **Missing training images** — on-target build steps, env vars, re-submit commands. **Do not** `docker build` locally for a remote platform. | | Download fails / `Failed to access upstream storage` / 502 on gated HF model | `references/troubleshooting.md` § **Gated HuggingFace models** — create/update `hf-token`, add `token_secret` to fileset, confirm HF license, re-submit. | | W&B not syncing / no `[launcher]` secret lines / `WandbCallback requires wandb` / wandb 401 | `references/troubleshooting.md` § **W&B / integrations not working** (jobs-launcher build, secret update, unsloth image). Setup: `references/integrations-setup.md`. | @@ -626,5 +722,6 @@ For other terminal errors, keep the same header template; put remediation detail | W&B / MLflow field reference | `references/hyperparameters.md` § **Integrations (automodel + unsloth)** | | W&B secret + MLflow local server + jobs-launcher | `references/integrations-setup.md` | | Gated HF model auth (`hf-token`, fileset `token_secret`) | `references/troubleshooting.md` § **Gated HuggingFace models** | +| Post-training eval (base vs LoRA, CHAT format parity) | `references/post-training-eval.md`, `references/eval_helpers.py` | Related: `plugins/nemo-automodel/README.md`, `plugins/nemo-unsloth/README.md`, `plugins/nemo-customizer/docs/CUSTOMIZATION.md`, skills **`nemo-files`**, **`nemo-status`**, **`nemo-secrets`**. 
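Review note on the SKILL.md hunks above: the "parse stdout only" gotcha can be sketched in Python as follows. This is a minimal illustration, not part of the patch. The helper name `run_json_command` and the stand-in command are hypothetical; a real call would pass the actual `nemo … -f json` argv instead of the fake CLI used here.

```python
import json
import subprocess
import sys


def run_json_command(argv: list[str]) -> dict:
    """Run a command and parse JSON from stdout only (hypothetical helper).

    stderr is captured separately, so warnings never reach json.loads.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    if proc.returncode != 0:
        raise RuntimeError(f"command failed ({proc.returncode}): {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Stand-in for a CLI call (assumption): JSON on stdout, a harmless warning on stderr.
fake_cli = [
    sys.executable,
    "-c",
    "import sys, json; "
    "print('Configuration file not found, using defaults', file=sys.stderr); "
    "print(json.dumps({'name': 'job-123', 'status': 'created'}))",
]
result = run_json_command(fake_cli)
```

Keeping the two streams apart means a stderr warning cannot turn a successful submit into a parse error, which is the duplicate-job failure mode the gotcha describes.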
diff --git a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/dataset-formats.md b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/dataset-formats.md index b03b43e12c..d1d026656f 100644 --- a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/dataset-formats.md +++ b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/dataset-formats.md @@ -54,3 +54,15 @@ Optional fields on the unsloth `dataset` block: - The automodel SFT format `{"prompt": "...", "completion": "..."}` is **not** directly consumable by unsloth — unsloth has no built-in `prompt`/`completion` concatenation. Convert to either messages or pre-rendered text before upload. EMBEDDING and CUSTOM (automodel-only schemas) are not supported by unsloth today. + +## Post-training evaluation + +Eval rows must use the **same CHAT `messages` shape** as training. Do not flatten to `prompt`/`expected` for the evaluator. + +| Training JSONL | Eval dataset | Eval `prompt_template` | Metric reference | +|----------------|--------------|------------------------|------------------| +| `messages` (single- or multi-turn) | Same fileset split (`validation.jsonl`) | `messages[:-1]` — exclude final assistant label — see `post-training-eval.md` | `{{ item.messages[-1].content }}` | + +LoRA inference and eval use the **provider** gateway on the **base** entity (`/provider//-/v1`, `model: default--`). Base model uses the model-entity path. Full SFT / merged checkpoints use the **output** model entity's model-entity URL — deploy first. See `post-training-eval.md` and the **Using the adapter** / **Using the fine-tuned model** sections in `SKILL.md`. + +Shared helpers and compare CLI: `references/eval_helpers.py`. Full workflow: `references/post-training-eval.md`. 
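Review note on the dataset-formats.md hunk above: the train/eval parity rule (send `messages[:-1]`, score against `messages[-1].content`) can be sketched as below. This is an illustrative sketch only; `split_chat_row` is a hypothetical name and the sample row is made up for the example.

```python
import json
from typing import Any


def split_chat_row(row: dict[str, Any]) -> tuple[list[dict[str, Any]], str]:
    """Split one CHAT row into (inference messages, reference label).

    Hypothetical helper: everything except the final assistant turn is sent
    at inference; the final assistant content is the label to score against.
    """
    messages = row.get("messages")
    if not isinstance(messages, list) or len(messages) < 2:
        raise ValueError("expected 'messages' with a prompt turn and a final assistant label")
    if messages[-1].get("role") != "assistant":
        raise ValueError("final turn must be the assistant label")
    return messages[:-1], messages[-1]["content"]


# Made-up single-turn row in the same shape as the training JSONL.
line = '{"messages": [{"role": "user", "content": "2 + 2 = ?"}, {"role": "assistant", "content": "4"}]}'
prompt_messages, reference = split_chat_row(json.loads(line))
```

Because the row is never flattened to `prompt`/`expected`, multi-turn history survives unchanged and the evaluator sees the same context the model was trained on.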
diff --git a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/eval_helpers.py b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/eval_helpers.py new file mode 100644 index 0000000000..1c866d4fcd --- /dev/null +++ b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/eval_helpers.py @@ -0,0 +1,741 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +"""Post-training evaluation helpers — keep eval dataset shape aligned with CHAT training JSONL. + +**LoRA** (``output.save_method: lora``): adapters registered on the base model entity +are hot-reloaded on deployments with ``lora_enabled: true`` — no deployment update or +new inference deployment before eval. + +**Full SFT** (``finetuning_type: all_weights``) or **merged LoRA checkpoints** +(``save_method: merged_16bit`` / ``merged_4bit``): the job registers a new **model** +entity at ``output.name``. Deploy that entity for inference before chat or eval — full +weights are not hot-reloaded onto the base model's deployment. + +Run from the nemo-platform git root (reads ``$NMP_BASE_URL`` when ``--base-url`` is omitted):: + + export NMP_BASE_URL=http://127.0.0.1:8080 + uv run python plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/eval_helpers.py \\ + --model-entity --adapter --adapter \\ + --provider --dataset-fileset --split validation.jsonl + +Import in agent scripts (add references/ to sys.path or run via uv from repo root). 
+""" + +from __future__ import annotations + +import argparse +import json +import os +import re +import urllib.error +import urllib.request +from dataclasses import dataclass +from pathlib import Path +from typing import Any, Sequence + +# --- Train/eval format contract (CHAT JSONL) -------------------------------- + +CHAT_ROW_KEYS = frozenset({"messages"}) + +# Inference: all turns except the final assistant label (single- or multi-turn). +CHAT_USER_PROMPT_TEMPLATE: dict[str, Any] = { + "messages": "{{ item.messages[:-1] }}", +} + +# Metric reference: final assistant turn (the label to predict). +CHAT_REFERENCE_TEMPLATE = "{{ item.messages[-1].content }}" + +# Back-compat alias for single-turn MCQA docs/snippets. +CHAT_SINGLE_TURN_USER_PROMPT_TEMPLATE = { + "messages": [{"role": "user", "content": "{{ item.messages[0].content }}"}], +} + +PLATFORM_HTTP_TIMEOUT_SEC = 60 + + +def _assert_message_turn(turn: Any, *, label: str, index: int | str) -> dict[str, Any]: + """Validate one messages[] element is a dict before reading role/content.""" + if not isinstance(turn, dict): + raise ValueError(f"{label}: messages[{index}] must be an object with role/content, got {type(turn).__name__}") + return turn + + +def assert_chat_row(row: dict[str, Any], *, index: int | None = None) -> None: + """Validate one dataset row matches automodel/unsloth CHAT training shape.""" + label = f"row {index}" if index is not None else "row" + if "messages" not in row: + raise ValueError( + f"{label}: expected CHAT format with 'messages' array; got keys {sorted(row)}. " + "Do not flatten to prompt/expected — use references/post-training-eval.md." 
+ ) + messages = row["messages"] + if not isinstance(messages, list) or len(messages) < 2: + raise ValueError(f"{label}: messages must be a list with at least one prompt turn + final assistant label") + first = _assert_message_turn(messages[0], label=label, index=0) + if first.get("role") != "user": + raise ValueError(f"{label}: expected messages[0]=user") + last = _assert_message_turn(messages[-1], label=label, index=-1) + if last.get("role") != "assistant": + raise ValueError(f"{label}: expected final messages[-1]=assistant (the label to score)") + + +def reference_content(row: dict[str, Any]) -> str: + """Return the assistant label for a CHAT row (final turn).""" + assert_chat_row(row) + return row["messages"][-1]["content"] + + +def load_chat_jsonl(path: Path | str) -> list[dict[str, Any]]: + """Load JSONL rows; validate CHAT shape; return rows unchanged.""" + rows: list[dict[str, Any]] = [] + with Path(path).open(encoding="utf-8") as handle: + for index, line in enumerate(handle, start=1): + if not line.strip(): + continue + row = json.loads(line) + assert_chat_row(row, index=index) + rows.append(row) + return rows + + +def load_chat_jsonl_from_platform( + *, + base_url: str, + workspace: str, + fileset: str, + remote_path: str, +) -> list[dict[str, Any]]: + """Download a JSONL split from a platform fileset and validate CHAT rows.""" + url = f"{base_url.rstrip('/')}/apis/files/v2/workspaces/{workspace}/filesets/{fileset}/-/{remote_path.lstrip('/')}" + with urllib.request.urlopen(url, timeout=PLATFORM_HTTP_TIMEOUT_SEC) as response: + content = response.read().decode("utf-8") + rows: list[dict[str, Any]] = [] + for index, line in enumerate(content.splitlines(), start=1): + if not line.strip(): + continue + row = json.loads(line) + assert_chat_row(row, index=index) + rows.append(row) + return rows + + +def chat_metrics(): + """Build default metrics for CHAT SFT eval (exact match + ROUGE + BLEU).""" + from nemo_evaluator_sdk import BLEUMetric, ROUGEMetric + from 
nemo_evaluator_sdk.metrics.exact_match import ExactMatchMetric + + ref = CHAT_REFERENCE_TEMPLATE + return [ + ExactMatchMetric(reference=ref), + ROUGEMetric(reference=ref), + BLEUMetric(references=[ref]), + ] + + +def normalize_mcqa_answer(text: str) -> str: + """Normalize MCQA model output for comparison with bare choice-text references.""" + text = text.strip() + bold = re.search(r"\*\*(?:[A-E]\.\s*)?([^*]+)\*\*", text) + if bold: + text = bold.group(1) + text = re.sub(r"^[A-E]\.\s*", "", text) + text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text) + return text.strip().lower() + + +def served_model_name(*, workspace: str, entity_or_adapter: str, finetuning: str = "base") -> str: + """Return the ``model`` field for base entity or LoRA adapter requests.""" + if finetuning == "base": + return f"{workspace}/{entity_or_adapter}" + if finetuning == "lora": + return f"{workspace}--{entity_or_adapter}" + raise ValueError("finetuning must be 'base' or 'lora'") + + +def adapter_composite_entity_name(*, model_entity: str, workspace: str, adapter_name: str) -> str: + """LoRA composite model-entity path segment (for reference / OpenAI-route body only). + + The model-entity proxy path ``model/{composite}/-/v1`` requires a dedicated + VirtualModel per composite and typically 404s on stock deployments. Prefer + :func:`provider_gateway_url` for adapter eval. 
+ """ + return f"{model_entity}&adapters/{workspace}/{adapter_name}" + + +def model_entity_gateway_url(*, base_url: str, workspace: str, model_entity: str) -> str: + """OpenAI-compatible inference-gateway URL for a registered base model entity.""" + return f"{base_url.rstrip('/')}/apis/inference-gateway/v2/workspaces/{workspace}/model/{model_entity}/-/v1" + + +def provider_gateway_url(*, base_url: str, workspace: str, provider_name: str) -> str: + """OpenAI-compatible inference-gateway URL for a model provider (LoRA eval route).""" + return f"{base_url.rstrip('/')}/apis/inference-gateway/v2/workspaces/{workspace}/provider/{provider_name}/-/v1" + + +def gateway_path_from_url(url: str) -> str: + """Return ``model-entity`` or ``provider`` from a gateway base URL (``unknown`` if neither matches).""" + if "/provider/" in url: + return "provider" + if "/model/" in url: + return "model-entity" + return "unknown" + + +def _platform_get_json(url: str) -> dict[str, Any]: + with urllib.request.urlopen(url, timeout=PLATFORM_HTTP_TIMEOUT_SEC) as response: + return json.loads(response.read().decode("utf-8")) + + +def find_ready_provider_for_model_entity( + *, + base_url: str, + workspace: str, + model_entity: str, +) -> str | None: + """Return a READY provider name that serves ``workspace/model_entity`` (base or LoRA).""" + url = f"{base_url.rstrip('/')}/apis/models/v2/workspaces/{workspace}/providers?page_size=100&filter.status=READY" + payload = _platform_get_json(url) + base_entity_id = f"{workspace}/{model_entity}" + matches: list[str] = [] + for provider in payload.get("data", []): + if provider.get("status") != "READY": + continue + for served in provider.get("served_models") or []: + entity_id = served.get("model_entity_id") or "" + if entity_id == base_entity_id or entity_id.startswith(f"{base_entity_id}&adapters/"): + matches.append(provider["name"]) + break + if not matches: + return None + # Deterministic pick: alphabetically first matching provider name. 
+ return sorted(set(matches))[0] + + +@dataclass +class JobAdapterInfo: + job_name: str + adapter_name: str + epochs: int | None + backend: str + model_entity: str + dataset_ref: str + status: str + created_at: str | None = None + + def to_dict(self) -> dict[str, Any]: + return { + "job_name": self.job_name, + "adapter_name": self.adapter_name, + "epochs": self.epochs, + "backend": self.backend, + "model_entity": self.model_entity, + "dataset_ref": self.dataset_ref, + "status": self.status, + "created_at": self.created_at, + } + + +def adapter_from_completed_job( + *, + base_url: str, + workspace: str, + job_name: str, +) -> JobAdapterInfo: + """Resolve adapter output name and training epochs from a platform job.""" + url = f"{base_url.rstrip('/')}/apis/jobs/v2/workspaces/{workspace}/jobs/{job_name}" + try: + job = _platform_get_json(url) + except urllib.error.HTTPError as exc: + raise ValueError(f"Job not found: {workspace}/{job_name}") from exc + spec = job.get("spec") or {} + output_name = (spec.get("output") or {}).get("name") or spec.get("name") + if not output_name: + raise ValueError(f"Job {job_name} has no output adapter name in spec") + model = spec.get("model") + model_entity = model.get("name", "") if isinstance(model, dict) else (model or "") + if model_entity.startswith(f"{workspace}/"): + model_entity = model_entity.split("/", 1)[1] + dataset = spec.get("dataset") or {} + dataset_ref = dataset.get("path") or dataset.get("training") or "" + backend = job_name.split("-", 1)[0] if "-" in job_name else "unknown" + return JobAdapterInfo( + job_name=job_name, + adapter_name=output_name, + epochs=(spec.get("schedule") or {}).get("epochs"), + backend=backend, + model_entity=model_entity, + dataset_ref=dataset_ref, + status=job.get("status", "unknown"), + created_at=job.get("created_at"), + ) + + +def list_completed_job_adapters( + *, + base_url: str, + workspace: str, + model_entity: str, + dataset_fileset: str | None = None, + page_size: int = 500, +) -> 
list[JobAdapterInfo]: + """List completed customization jobs and their output adapter names.""" + url = ( + f"{base_url.rstrip('/')}/apis/jobs/v2/workspaces/{workspace}/jobs?page_size={page_size}&filter.status=completed" + ) + payload = _platform_get_json(url) + dataset_ref = f"{workspace}/{dataset_fileset}" if dataset_fileset else None + model_ref = f"{workspace}/{model_entity}" + results: list[JobAdapterInfo] = [] + for job in payload.get("data", []): + if job.get("status") != "completed": + continue + spec = job.get("spec") or {} + out = (spec.get("output") or {}).get("name") or spec.get("name") + if not out: + continue + model = spec.get("model") + job_model = model.get("name", "") if isinstance(model, dict) else (model or "") + ds = spec.get("dataset") or {} + job_ds = ds.get("path") or ds.get("training") or "" + if model_ref not in str(job_model): + continue + if dataset_ref and dataset_ref not in str(job_ds): + continue + backend = job["name"].split("-", 1)[0] if "-" in job["name"] else "unknown" + results.append( + JobAdapterInfo( + job_name=job["name"], + adapter_name=out, + epochs=(spec.get("schedule") or {}).get("epochs"), + backend=backend, + model_entity=model_entity, + dataset_ref=job_ds, + status=job.get("status", "completed"), + created_at=job.get("created_at"), + ) + ) + results.sort(key=lambda item: item.created_at or "", reverse=True) + return results + + +def build_online_eval_config( + *, + max_tokens: int = 64, + temperature: float = 0, + parallelism: int = 8, + enable_thinking: bool = False, + limit_samples: int | None = None, +): + """RunConfigOnlineModel defaults aligned with Qwen3 CHAT SFT eval.""" + from nemo_evaluator_sdk.values import InferenceParams, RunConfigOnlineModel + + extra_body = {"chat_template_kwargs": {"enable_thinking": enable_thinking}} if not enable_thinking else None + inference_kwargs: dict[str, Any] = {"max_tokens": max_tokens, "temperature": temperature} + if extra_body: + inference_kwargs["extra_body"] = extra_body + 
return RunConfigOnlineModel( + parallelism=parallelism, + limit_samples=limit_samples, + inference=InferenceParams(**inference_kwargs), + ) + + +def build_platform_model_target( + *, + base_url: str, + workspace: str, + model_entity: str, + adapter_name: str | None = None, + provider_name: str | None = None, +): + """SDK Model target for base entity or LoRA adapter on the platform gateway. + + Base weights use the **model-entity** proxy + (``/model/{entity}/-/v1``). LoRA adapters must use the **provider** proxy + (``/provider/{name}/-/v1``) with ``model: {workspace}--{adapter}`` — the + model-entity path always routes to the base VirtualModel and ignores adapter + names in the request body. + """ + from nemo_evaluator_sdk.enums import ModelFormat + from nemo_evaluator_sdk.values.models import Model + + resolved_provider = provider_name or find_ready_provider_for_model_entity( + base_url=base_url, + workspace=workspace, + model_entity=model_entity, + ) + if not resolved_provider: + raise ValueError( + f"No READY inference provider serves {workspace}/{model_entity}. " + "Deploy the base model (with lora_enabled: true for LoRA eval) or pass --provider ." 
+ ) + + if adapter_name: + return Model( + url=provider_gateway_url( + base_url=base_url, + workspace=workspace, + provider_name=resolved_provider, + ), + name=served_model_name(workspace=workspace, entity_or_adapter=adapter_name, finetuning="lora"), + format=ModelFormat.NVIDIA_NIM, + ) + + return Model( + url=model_entity_gateway_url(base_url=base_url, workspace=workspace, model_entity=model_entity), + name=served_model_name(workspace=workspace, entity_or_adapter=model_entity, finetuning="base"), + format=ModelFormat.NVIDIA_NIM, + ) + + +@dataclass +class EvalSummary: + target: str + model_name: str + gateway_url: str + gateway_path: str + num_samples: int + raw_exact_match: float + normalized_accuracy: float + aggregate_metrics: dict[str, dict[str, float | None]] + + def to_dict(self) -> dict[str, Any]: + return { + "target": self.target, + "model_name": self.model_name, + "gateway_url": self.gateway_url, + "gateway_path": self.gateway_path, + "num_samples": self.num_samples, + "raw_exact_match": self.raw_exact_match, + "normalized_accuracy": self.normalized_accuracy, + "metrics": self.aggregate_metrics, + } + + +def summarize_chat_eval_result(*, target: str, model_name: str, gateway_url: str, result) -> EvalSummary: + """Summarize Evaluator benchmark result for CHAT rows.""" + em_rows = result.per_metric["exact-match"].row_scores + num_samples = len(em_rows) + raw_correct = sum( + 1 for rs in em_rows if rs.sample.get("output_text", "").strip() == reference_content(rs.item).strip() + ) + norm_correct = sum( + 1 + for rs in em_rows + if normalize_mcqa_answer(rs.sample.get("output_text", "")) == normalize_mcqa_answer(reference_content(rs.item)) + ) + aggregate_metrics: dict[str, dict[str, float | None]] = {} + for metric_name, metric_result in result.per_metric.items(): + aggregate_metrics[metric_name] = { + score.name.split(".")[-1]: round(score.mean, 4) if score.mean is not None else None + for score in metric_result.aggregate_scores.scores + } + return 
EvalSummary( + target=target, + model_name=model_name, + gateway_url=gateway_url, + gateway_path=gateway_path_from_url(gateway_url), + num_samples=num_samples, + raw_exact_match=round(raw_correct / num_samples, 4) if num_samples else 0.0, + normalized_accuracy=round(norm_correct / num_samples, 4) if num_samples else 0.0, + aggregate_metrics=aggregate_metrics, + ) + + +def run_chat_online_eval( + *, + rows: Sequence[dict[str, Any]], + target, + config, + metrics=None, + prompt_template: dict[str, Any] | None = None, +): + """Run online eval on CHAT rows using shared templates.""" + from nemo_evaluator_sdk import Evaluator + + for index, row in enumerate(rows): + assert_chat_row(row, index=index) + if metrics is None: + metrics = chat_metrics() + return Evaluator().run_sync( + metrics=metrics, + dataset=list(rows), + target=target, + prompt_template=prompt_template or CHAT_USER_PROMPT_TEMPLATE, + config=config, + ) + + +def _eval_target( + *, + base_url: str, + workspace: str, + model_entity: str, + adapter_name: str | None, + provider_name: str | None, + rows: Sequence[dict[str, Any]], + config, + target_label: str, +) -> EvalSummary: + target = build_platform_model_target( + base_url=base_url, + workspace=workspace, + model_entity=model_entity, + adapter_name=adapter_name, + provider_name=provider_name, + ) + result = run_chat_online_eval(rows=rows, target=target, config=config) + return summarize_chat_eval_result( + target=target_label, + model_name=target.name, + gateway_url=target.url, + result=result, + ) + + +def compare_adapters( + *, + base_url: str, + workspace: str, + model_entity: str, + adapter_names: Sequence[str], + rows: Sequence[dict[str, Any]], + include_base: bool = True, + provider_name: str | None = None, + max_tokens: int = 64, + enable_thinking: bool = False, + parallelism: int = 8, + limit_samples: int | None = None, +) -> list[EvalSummary]: + """Compare base (optional) and one or more LoRA adapters on the same CHAT rows.""" + config = 
build_online_eval_config( + max_tokens=max_tokens, + enable_thinking=enable_thinking, + parallelism=parallelism, + limit_samples=limit_samples, + ) + summaries: list[EvalSummary] = [] + if include_base: + summaries.append( + _eval_target( + base_url=base_url, + workspace=workspace, + model_entity=model_entity, + adapter_name=None, + provider_name=provider_name, + rows=rows, + config=config, + target_label="base", + ) + ) + for adapter_name in adapter_names: + summaries.append( + _eval_target( + base_url=base_url, + workspace=workspace, + model_entity=model_entity, + adapter_name=adapter_name, + provider_name=provider_name, + rows=rows, + config=config, + target_label=adapter_name, + ) + ) + return summaries + + +def compare_base_vs_adapter( + *, + base_url: str, + workspace: str, + model_entity: str, + adapter_name: str, + rows: Sequence[dict[str, Any]], + provider_name: str | None = None, + max_tokens: int = 64, + enable_thinking: bool = False, + parallelism: int = 8, + limit_samples: int | None = None, +) -> list[EvalSummary]: + """Compare base model vs one LoRA adapter on the same CHAT validation rows.""" + summaries = compare_adapters( + base_url=base_url, + workspace=workspace, + model_entity=model_entity, + adapter_names=[adapter_name], + rows=rows, + include_base=True, + provider_name=provider_name, + max_tokens=max_tokens, + enable_thinking=enable_thinking, + parallelism=parallelism, + limit_samples=limit_samples, + ) + if len(summaries) == 2: + summaries[1].target = "lora" + return summaries + + +def lift_vs_base(summaries: Sequence[EvalSummary]) -> dict[str, float]: + """Normalized accuracy delta vs the base summary (if present).""" + base = next((summary for summary in summaries if summary.target == "base"), None) + if base is None: + return {} + return { + summary.target: round(summary.normalized_accuracy - base.normalized_accuracy, 4) + for summary in summaries + if summary.target != "base" + } + + +def routing_sanity_warnings( + summaries: 
Sequence[EvalSummary], + *, + routing_tolerance_pp: float = 0.015, +) -> list[str]: + """Return human-readable warnings when LoRA routing or scores look suspicious.""" + warnings: list[str] = [] + base = next((summary for summary in summaries if summary.target == "base"), None) + for summary in summaries: + if summary.target == "base": + if summary.gateway_path != "model-entity": + warnings.append( + f"base eval used {summary.gateway_path} route; expected model-entity ({summary.gateway_url})" + ) + continue + if summary.gateway_path != "provider": + warnings.append( + f"{summary.target}: LoRA eval used {summary.gateway_path} route " + f"({summary.gateway_url}); expected provider gateway — scores may match base" + ) + if base and abs(summary.normalized_accuracy - base.normalized_accuracy) <= routing_tolerance_pp: + warnings.append( + f"{summary.target}: normalized accuracy {summary.normalized_accuracy:.1%} is within " + f"{routing_tolerance_pp:.1%} of base ({base.normalized_accuracy:.1%}) — verify provider routing" + ) + return warnings + + +def build_eval_payload( + *, + summaries: Sequence[EvalSummary], + base_url: str, + workspace: str, + model_entity: str, + adapter_names: Sequence[str], + provider_name: str | None, +) -> dict[str, Any]: + """Assemble CLI/programmatic JSON output with routing metadata and warnings.""" + routing: dict[str, Any] = {} + if any(summary.target == "base" for summary in summaries): + routing["base"] = { + "gateway_path": "model-entity", + "url": model_entity_gateway_url(base_url=base_url, workspace=workspace, model_entity=model_entity), + "model_field": served_model_name(workspace=workspace, entity_or_adapter=model_entity, finetuning="base"), + } + for adapter_name in adapter_names: + target = build_platform_model_target( + base_url=base_url, + workspace=workspace, + model_entity=model_entity, + adapter_name=adapter_name, + provider_name=provider_name, + ) + routing[adapter_name] = { + "gateway_path": "provider", + "url": target.url, + 
"model_field": target.name, + } + warnings = routing_sanity_warnings(summaries) + payload: dict[str, Any] = { + "dataset_format": "chat (messages)", + "prompt_template": CHAT_USER_PROMPT_TEMPLATE, + "reference_template": CHAT_REFERENCE_TEMPLATE, + "routing": routing, + "results": [summary.to_dict() for summary in summaries], + "lift_vs_base": lift_vs_base(summaries), + "primary_metric": "normalized_accuracy", + } + if warnings: + payload["warnings"] = warnings + return payload + + +def default_base_url() -> str: + """Platform URL from env or localhost default.""" + return os.environ.get("NMP_BASE_URL") or "http://127.0.0.1:8080" + + +def _parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser(description="Compare base vs LoRA on CHAT validation JSONL") + parser.add_argument( + "--base-url", + default=default_base_url(), + help="Platform URL (default: $NMP_BASE_URL or http://127.0.0.1:8080)", + ) + parser.add_argument("--workspace", default="default") + parser.add_argument("--model-entity", required=True) + parser.add_argument( + "--adapter", + action="append", + required=True, + help="Adapter name(s) registered on the model entity (repeat for multi-adapter compare)", + ) + parser.add_argument( + "--provider", + default=None, + help="Inference provider name for LoRA requests (auto-discovered when omitted)", + ) + parser.add_argument("--dataset-fileset", required=True) + parser.add_argument("--split", default="validation.jsonl") + parser.add_argument("--max-tokens", type=int, default=64) + parser.add_argument("--enable-thinking", action="store_true") + parser.add_argument("--limit-samples", type=int, default=None) + parser.add_argument("--output", type=Path, default=None) + parser.add_argument( + "--no-base", + action="store_true", + help="Skip base-model eval (adapter-only comparison)", + ) + return parser.parse_args() + + +def main() -> int: + args = _parse_args() + rows = load_chat_jsonl_from_platform( + base_url=args.base_url, + 
workspace=args.workspace, + fileset=args.dataset_fileset, + remote_path=args.split, + ) + summaries = compare_adapters( + base_url=args.base_url, + workspace=args.workspace, + model_entity=args.model_entity, + adapter_names=args.adapter, + rows=rows, + include_base=not args.no_base, + provider_name=args.provider, + max_tokens=args.max_tokens, + enable_thinking=args.enable_thinking, + limit_samples=args.limit_samples, + ) + payload = build_eval_payload( + summaries=summaries, + base_url=args.base_url, + workspace=args.workspace, + model_entity=args.model_entity, + adapter_names=args.adapter, + provider_name=args.provider, + ) + text = json.dumps(payload, indent=2) + print(text) + if args.output: + args.output.parent.mkdir(parents=True, exist_ok=True) + args.output.write_text(text, encoding="utf-8") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/hyperparameters.md b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/hyperparameters.md index 2b6e5e0495..277b0be196 100644 --- a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/hyperparameters.md +++ b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/hyperparameters.md @@ -566,7 +566,7 @@ See **Integrations (automodel + unsloth)** above. |-------|---------|-------| | `name` | auto-derived from `--` | The output model entity / fileset name. | | `description` | `null` | Free-form description carried onto the entity and fileset. | -| `save_method` | `"lora"` | `"lora"` (adapter — small, deploy via NIM/vLLM with adapter loader), `"merged_16bit"` (merged checkpoint, deploy without adapter), `"merged_4bit"` (lossy, storage-tight). `merged_*` requires `training.finetuning_type: "lora"`. 
| +| `save_method` | `"lora"` | `"lora"` (adapter — hot-reloads on base LoRA deployment; no new inference deploy), `"merged_16bit"` (merged checkpoint — **deploy** `output.name` as model entity), `"merged_4bit"` (lossy, storage-tight; deploy like merged). `merged_*` requires `training.finetuning_type: "lora"`. | After `to_spec`, the canonical `OutputResponse` also carries `type` (`"adapter"` for `save_method: "lora"`, `"model"` otherwise) and `fileset` (defaults to `name`); both are derived — submitter doesn't set them. @@ -598,12 +598,12 @@ Drop `rank` before lowering batch when OOM. Higher `alpha/rank` ratios amplify a ### Save-method picker -| User wants | `save_method` | -|------------|---------------| -| Smallest artefact, deploy via adapter loader (default NIM / vLLM) | `lora` | -| Full-weight checkpoint to deploy without an adapter | `merged_16bit` | -| Disk-tight merged checkpoint (lossy) | `merged_4bit` | -| Full SFT (no LoRA) | `lora` is invalid here; output is always a full model — leave `save_method` at default and ignore the merged options | +| User wants | `save_method` | Inference after training | +|------------|---------------|--------------------------| +| Smallest artefact; hot-reload on base LoRA deployment | `lora` | No new deploy — adapter loads on existing `lora_enabled` deployment | +| Full-weight checkpoint as standalone model | `merged_16bit` | **Deploy** `output.name` as new model entity | +| Disk-tight merged checkpoint (lossy) | `merged_4bit` | **Deploy** `output.name` as new model entity | +| Full SFT (no LoRA) | `lora` is invalid; output is always a full model | **Deploy** `output.name` as new model entity | `merged_*` require `training.finetuning_type: "lora"`. The schema validator surfaces a clear error if violated. 
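
The save-method picker above can be read as a small decision function. The sketch below is illustrative only: `plan_after_training` and `PostTrainingPlan` are hypothetical names, and treating any `finetuning_type` other than `"lora"` as full SFT is an assumption made for this example, not something the customizer schema or CLI is known to expose.

```python
from dataclasses import dataclass

# save_method values taken from the picker table above.
MERGED_METHODS = frozenset({"merged_16bit", "merged_4bit"})


@dataclass(frozen=True)
class PostTrainingPlan:
    """Hypothetical summary of what a finished job produces."""

    output_type: str  # "adapter" or "model", mirroring the derived `type` field
    needs_deploy: bool  # True when `output.name` must be deployed as a model entity


def plan_after_training(save_method: str, finetuning_type: str) -> PostTrainingPlan:
    """Map (save_method, finetuning_type) to the follow-up inference step."""
    if save_method in MERGED_METHODS:
        # The table states that merged_* requires LoRA finetuning.
        if finetuning_type != "lora":
            raise ValueError("merged_* requires training.finetuning_type: 'lora'")
        return PostTrainingPlan(output_type="model", needs_deploy=True)
    if save_method != "lora":
        raise ValueError(f"unknown save_method: {save_method!r}")
    if finetuning_type == "lora":
        # The adapter hot-reloads on an existing lora_enabled deployment.
        return PostTrainingPlan(output_type="adapter", needs_deploy=False)
    # Assumption: non-LoRA finetuning is full SFT, whose output is always a full model.
    return PostTrainingPlan(output_type="model", needs_deploy=True)
```

Under these assumptions, `plan_after_training("lora", "lora")` describes an adapter that needs no new deployment, while both merged methods and full SFT describe a model that must be deployed under `output.name`.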
diff --git a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/integrations-setup.md b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/integrations-setup.md index 9927d26c17..bbad950c37 100644 --- a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/integrations-setup.md +++ b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/integrations-setup.md @@ -96,7 +96,7 @@ Job JSON references the secret by name: Store the API key in the **platform** secret store. A local `wandb login` cache on your laptop is **not** used by training containers. ```bash -export NEMO_BASE_URL=http://:8080 # omit when using default localhost +export NMP_BASE_URL=http://:8080 # omit when using default localhost cd /path/to/nemo-platform # Create (first time) diff --git a/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/post-training-eval.md b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/post-training-eval.md new file mode 100644 index 0000000000..4fb787ad82 --- /dev/null +++ b/plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/post-training-eval.md @@ -0,0 +1,228 @@ +# Post-training evaluation (train/eval format parity) + +Use after a customization job reaches **`completed`** when the user wants to compare **base vs LoRA** on the validation split. + +## Format contract + +Training and evaluation must use the **same CHAT JSONL row shape**: + +```json +{ + "messages": [ + {"role": "user", "content": ""}, + {"role": "assistant", "content": "