Commit 3051402

Add checker report visualization and integrations
1 parent 38c2996 commit 3051402

3 files changed

Lines changed: 1153 additions & 4 deletions

docs/check.md

Lines changed: 63 additions & 1 deletion
@@ -37,6 +37,68 @@ traincheck-check -f <trace_folder> -i <path_to_invariant_file>
- `-f <trace_folder>`: Path to the folder containing traces collected by `traincheck-collect`.
- `-i <path_to_invariant_file>`: Path to the JSON file containing inferred invariants.

## Report Visualization Options

Both checkers (`traincheck-check` and `traincheck-onlinecheck`) write a standalone HTML report by default and can optionally log summary metrics to external monitoring tools: Weights & Biases (W&B) and MLflow.

### Standalone HTML Report (default)

- Output: `<output_dir>/report.html`
- Contents: summary counts, a per-relation breakdown, and the top violations.
- Disable with `--no-html-report`.

**Offline example**

```bash
traincheck-check -f <trace_folder> -i <path_to_invariant_file>
```

**Online example**

```bash
traincheck-onlinecheck -f <trace_folder> -i <path_to_invariant_file>
```
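
Toggling the report from outside the command can be handy when the same wrapper runs both locally and in CI. The following is a minimal sketch, not part of TrainCheck: `TRACE_DIR`, `INV_FILE`, and `HTML_REPORT` are hypothetical environment variables read only by this script, and the only TrainCheck flag it relies on is `--no-html-report`. When `traincheck-check` is not on `PATH`, it prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: run the offline checker, optionally without the HTML report.
# TRACE_DIR, INV_FILE and HTML_REPORT are hypothetical variables for this script only.
set -eu

TRACE_DIR="${TRACE_DIR:-./traces}"          # placeholder trace folder
INV_FILE="${INV_FILE:-./invariants.json}"   # placeholder invariant file
HTML_REPORT="${HTML_REPORT:-1}"             # set to 0 to skip report.html

# Build the argument list in "$@" so paths containing spaces stay intact.
set -- traincheck-check -f "$TRACE_DIR" -i "$INV_FILE"
if [ "$HTML_REPORT" = "0" ]; then
  set -- "$@" --no-html-report
fi

if command -v traincheck-check >/dev/null 2>&1; then
  "$@"
else
  # Dry run: show the command that would have been executed.
  echo "traincheck-check not found; would run: $*"
fi
```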

### W&B Integration

Enable with `--report-wandb`. Optional flags: `--wandb-project`, `--wandb-entity`, `--wandb-run-name`, `--wandb-group`, and `--wandb-tags`.

**Offline example**

```bash
traincheck-check -f <trace_folder> -i <path_to_invariant_file> \
  --report-wandb --wandb-project <project>
```

**Online example**

```bash
traincheck-onlinecheck -f <trace_folder> -i <path_to_invariant_file> \
  --report-wandb --wandb-project <project>
```
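
When several runs share a W&B project, the optional identifier flags make them easier to tell apart. A minimal sketch, assuming you have already authenticated (for example with `wandb login`); the project, entity, and group values are hypothetical placeholders, and `--wandb-tags` is omitted because this page does not specify its value format:

```bash
#!/usr/bin/env bash
# Sketch: offline check that logs to W&B with explicit run identifiers.
# Every value below is a hypothetical placeholder; replace with your own.
set -eu

TRACE_DIR="./traces"                       # folder from traincheck-collect
INV_FILE="./invariants.json"               # inferred invariants
PROJECT="traincheck-demo"                  # W&B project
ENTITY="my-team"                           # W&B user or team
GROUP="nightly"                            # W&B run group
RUN_NAME="check-$(date +%Y%m%d-%H%M%S)"    # timestamped run name

set -- traincheck-check -f "$TRACE_DIR" -i "$INV_FILE" \
  --report-wandb \
  --wandb-project "$PROJECT" \
  --wandb-entity "$ENTITY" \
  --wandb-run-name "$RUN_NAME" \
  --wandb-group "$GROUP"

if command -v traincheck-check >/dev/null 2>&1; then
  "$@"
else
  # Dry run: show the command that would have been executed.
  echo "traincheck-check not found; would run: $*"
fi
```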

### MLflow Integration

Enable with `--report-mlflow`. Optional flags: `--mlflow-experiment` and `--mlflow-run-name`.

**Offline example**

```bash
traincheck-check -f <trace_folder> -i <path_to_invariant_file> \
  --report-mlflow --mlflow-experiment <experiment>
```

**Online example**

```bash
traincheck-onlinecheck -f <trace_folder> -i <path_to_invariant_file> \
  --report-mlflow --mlflow-experiment <experiment>
```
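
The MLflow flags follow the same pattern. This sketch names each online run with a timestamp via `--mlflow-run-name`; the experiment name and paths are hypothetical placeholders, and where the run is recorded (local store or tracking server) depends on your own MLflow tracking configuration, which this page does not cover:

```bash
#!/usr/bin/env bash
# Sketch: online check that logs summary metrics to MLflow under a named run.
# Experiment name and paths are hypothetical placeholders.
set -eu

TRACE_DIR="./traces"                        # placeholder trace folder
INV_FILE="./invariants.json"                # placeholder invariant file
EXPERIMENT="traincheck-demo"                # MLflow experiment name
RUN_NAME="online-$(date +%Y%m%d-%H%M%S)"    # timestamped run name

set -- traincheck-onlinecheck -f "$TRACE_DIR" -i "$INV_FILE" \
  --report-mlflow \
  --mlflow-experiment "$EXPERIMENT" \
  --mlflow-run-name "$RUN_NAME"

if command -v traincheck-onlinecheck >/dev/null 2>&1; then
  "$@"
else
  # Dry run: show the command that would have been executed.
  echo "traincheck-onlinecheck not found; would run: $*"
fi
```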

### Online Report Refresh

The online checker regenerates the report whenever the detected violations change, and also on a periodic timer. Set the timer interval with `--report-interval-seconds` (default: 10 seconds).

```bash
traincheck-onlinecheck -f <trace_folder> -i <path_to_invariant_file> \
  --report-interval-seconds 30
```
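
For long training jobs, a slower refresh may be enough. As a minimal sketch, the wrapper below reads the interval from a hypothetical `REPORT_INTERVAL` environment variable and rejects anything other than a positive whole number before passing it to `--report-interval-seconds`. Whether TrainCheck accepts fractional seconds is not stated here, so the whole-number rule is this script's own assumption:

```bash
#!/usr/bin/env bash
# Sketch: online check with a validated report refresh interval.
# REPORT_INTERVAL is a hypothetical variable read only by this script.
set -eu

TRACE_DIR="./traces"             # placeholder trace folder
INV_FILE="./invariants.json"     # placeholder invariant file
REPORT_INTERVAL="${REPORT_INTERVAL:-30}"

# Accept only digits, then require the value to be greater than zero.
case "$REPORT_INTERVAL" in
  ''|*[!0-9]*) valid=no ;;
  *) if [ "$REPORT_INTERVAL" -gt 0 ]; then valid=yes; else valid=no; fi ;;
esac
if [ "$valid" = no ]; then
  echo "REPORT_INTERVAL must be a positive whole number of seconds" >&2
  exit 2
fi

set -- traincheck-onlinecheck -f "$TRACE_DIR" -i "$INV_FILE" \
  --report-interval-seconds "$REPORT_INTERVAL"

if command -v traincheck-onlinecheck >/dev/null 2>&1; then
  "$@"
else
  # Dry run: show the command that would have been executed.
  echo "traincheck-onlinecheck not found; would run: $*"
fi
```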

**Note:** W&B and MLflow logging are optional. If the corresponding Python package (`wandb` or `mlflow`) is not installed, TrainCheck skips that integration and emits a warning.
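
Because a missing package only triggers a warning, a misconfigured environment can finish a check without logging anything. One way to catch this earlier, sketched under the assumption that TrainCheck runs under the interpreter named by the hypothetical `PYTHON` variable (default `python3`), is to add each integration's flags only after its package imports successfully; the project and experiment names are placeholders:

```bash
#!/usr/bin/env bash
# Sketch: enable W&B / MLflow reporting only if the matching Python package
# can be imported by the interpreter TrainCheck runs under.
set -eu

PYTHON="${PYTHON:-python3}"     # hypothetical: interpreter of the TrainCheck environment
TRACE_DIR="./traces"            # placeholder trace folder
INV_FILE="./invariants.json"    # placeholder invariant file

has_module() {
  # Succeeds if "import <module>" works; all output is silenced.
  "$PYTHON" -c "import $1" >/dev/null 2>&1
}

set -- traincheck-check -f "$TRACE_DIR" -i "$INV_FILE"

if has_module wandb; then
  set -- "$@" --report-wandb --wandb-project traincheck-demo
else
  echo "wandb not importable; skipping --report-wandb" >&2
fi

if has_module mlflow; then
  set -- "$@" --report-mlflow --mlflow-experiment traincheck-demo
else
  echo "mlflow not importable; skipping --report-mlflow" >&2
fi

if command -v traincheck-check >/dev/null 2>&1; then
  "$@"
else
  # Dry run: show the command that would have been executed.
  echo "traincheck-check not found; would run: $*"
fi
```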

## Interpreting the Results

After running either checking mode, TrainCheck will output a summary of detected invariant violations. Each violation entry typically includes:
@@ -45,4 +107,4 @@ After running either checking mode, TrainCheck will output a summary of detected
- **Invariant description**: Details the specific invariant that was violated.
- **Violation details**: Provides context, such as the step or epoch where the violation occurred.

Review these results to pinpoint silent errors or unexpected behaviors in your ML training pipeline. For more information on result formats and how to diagnose issues, see [5. Detection & Diagnosis](./5-min-tutorial.md#5-detection--diagnosis) in the **5-Minute Tutorial**.
