Measure per-step cost of pi07_paligemma outlier check on 8×A100 (profile_step.py)

## Background

PR #357 added `warn_outlier_threshold` (default `10.0`) to `PI07PaligemmaLowLevelConfig`, which makes the training `forward` run `_warn_state_action_outliers` every step. Even on the common no-outlier path, the check still performs two small `max` reductions plus one `bool(torch.cat(...).any())`, and the `bool(...)` call forces a 1-byte device→host (D2H) sync per step.

During review we estimated the per-step cost on 8×A100 **analytically**: typically <0.1% of step wall-clock (sub-ms/step), with a pathological ceiling of ~0.25%. The reasoning was that the sync sits at the start of `forward`, before the heavy backbone, and that `update_policy` already forces a per-step D2H via `gather_for_metrics(...).item()` (`src/opentau/scripts/train.py:123-125`) on top of the DDP gradient all-reduce. **This was never measured on real hardware.**

## Task

Add a `profile_step.py`-based micro-benchmark that measures the actual per-step wall-clock delta of the outlier check, default-on vs disabled, for `pi07_paligemma_low_level`:

- Run `src/opentau/scripts/profile_step.py` on a GPU box with `--policy.warn_outlier_threshold=10.0` (default-on) and `--policy.warn_outlier_threshold=0` (disabled, early-returns before any sync).
- Compare the per-step forward/total wall-clock breakdown between the two.
- Confirm the overhead is within run-to-run noise (or quantify it if not), and confirm the `threshold <= 0` path is truly zero-overhead (no D2H sync).
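
Once both runs have produced per-step timings, the comparison could be summarised along these lines. This is only a minimal sketch: it assumes the per-step total times (in ms) have been collected into two plain lists, and the helper name `summarize_overhead`, its `warmup` parameter, and the input format are all hypothetical, since the output format of `profile_step.py` is not described here. The sample numbers are made up for illustration and are not measurements.

```python
from statistics import median, stdev


def summarize_overhead(enabled_ms, disabled_ms, warmup=0):
    """Compare per-step wall-clock times (ms) from the two runs.

    enabled_ms:  steps timed with warn_outlier_threshold=10.0
    disabled_ms: steps timed with warn_outlier_threshold=0
    warmup:      leading steps to discard from each run (hypothetical knob)
    """
    on = list(enabled_ms)[warmup:]
    off = list(disabled_ms)[warmup:]
    if len(on) < 2 or len(off) < 2:
        raise ValueError("need at least two timed steps per run after warmup")

    base = median(off)
    delta = median(on) - base
    # Step-to-step spread of the baseline run, used as a rough noise floor.
    noise = stdev(off)
    return {
        "baseline_ms": base,
        "delta_ms": delta,
        "overhead_pct": 100.0 * delta / base,
        "noise_ms": noise,
        "within_noise": abs(delta) <= noise,
    }


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    enabled = [1000.5, 1001.0, 1000.8, 1001.2]
    disabled = [1000.0, 1000.6, 1000.2, 1000.4]
    print(summarize_overhead(enabled, disabled))
```

Medians are used so that a few slow steps (dataloader stalls, checkpointing) do not dominate the delta; comparing against the baseline's standard deviation is a crude noise test, and a proper measurement would likely want several repeated runs per setting.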

## Acceptance

- A measured before/after number posted here (and ideally a short note in the config docstring or PR thread).
- If the measured cost is material (say >0.5% of step time), follow up with mitigation (e.g. step-sampling the check, or flipping the default to opt-in).
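
The materiality bar and the step-sampling mitigation reduce to simple arithmetic, sketched below. The function names and the `every_n` parameter are hypothetical and not part of the existing config; the sketch also assumes that a skipped step costs nothing, which ignores the (negligible) cost of the step-counter test itself.

```python
MATERIAL_FRACTION = 0.005  # the ">0.5% of step time" bar from the acceptance criteria


def overhead_fraction(check_ms, step_ms):
    """Fraction of one step's wall-clock spent in the outlier check."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    return check_ms / step_ms


def amortized_fraction(check_ms, step_ms, every_n=1):
    """Average per-step overhead if the check runs once every `every_n` steps.

    Assumes skipped steps pay no cost at all (hypothetical mitigation).
    """
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    return overhead_fraction(check_ms, step_ms) / every_n


def needs_mitigation(check_ms, step_ms, every_n=1):
    """True if the (amortized) overhead exceeds the 0.5% bar."""
    return amortized_fraction(check_ms, step_ms, every_n) > MATERIAL_FRACTION
```

For example, a hypothetical 1 ms check on a 100 ms step is 1% of step time and would exceed the bar, while running the same check every 10th step would average 0.1% and fall under it.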

## References

- Implementation: `_warn_state_action_outliers` in `src/opentau/policies/pi07_paligemma/low_level/modeling_pi07_low_level.py`
- Config field: `warn_outlier_threshold` in `src/opentau/policies/pi07_paligemma/low_level/configuration_pi07_low_level.py`
- Profiler: `src/opentau/scripts/profile_step.py`
- Origin: PR #357 review, item 1
