Summary
The benchmark skill currently requires --spec, but KernelBench baseline files already contain enough context to derive most of the runtime config. This issue proposes making --spec optional for xe-forge-skill benchmark. Current behavior is preserved when a spec is provided, and a baseline-driven fallback path is added for when it is not.
Current Behavior
In __init__.py, the benchmark CLI marks --spec as required.
In benchmark.py, benchmark execution always loads spec-derived values:
- input shapes
- flop
- dtype
- input dtypes
- init args
Problem
This makes benchmark usage more rigid than needed. For many workflows, the baseline kernel/model file already provides enough metadata to run correctness/perf comparison without requiring a YAML spec.
Users have also asked for more control over input generation (see #38).
Supporting the widely adopted KernelBench format directly would also serve these users well.
Proposed Behavior
- Make --spec optional for the benchmark CLI.
- Keep the existing spec-driven path unchanged when --spec is provided.
- When --spec is omitted, resolve the benchmark config from baseline-derived metadata.
- In spec-less mode, use the baseline-derived dtype, with an optional explicit override via --dtype.
Scope
In scope
- CLI argument requirement/help updates for benchmark.
- Refactor benchmark config resolution into:
  - a spec-backed path
  - a baseline-backed fallback path
- Add tests for both paths.
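A minimal argparse sketch of the CLI change, assuming the benchmark command is an argparse subparser. The subcommand wiring, help strings and the --dtype flag (proposed in this issue) are illustrative and are not the current contents of __init__.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical layout: the real parser lives in __init__.py and has other subcommands.
    parser = argparse.ArgumentParser(prog="xe-forge-skill")
    commands = parser.add_subparsers(dest="command", required=True)

    bench = commands.add_parser("benchmark", help="compare a baseline against an optimized kernel")
    bench.add_argument("baseline", help="baseline kernel/model file (KernelBench format supported)")
    bench.add_argument("optimized", help="optimized kernel file to benchmark against the baseline")
    # No required=True: omitting --spec selects the baseline-backed fallback path.
    bench.add_argument(
        "--spec",
        default=None,
        help="optional YAML spec; if omitted, input shapes, dtypes and init args "
        "are derived from the baseline file",
    )
    # Proposed opt-in override for spec-less mode (flag name assumed from this issue).
    bench.add_argument(
        "--dtype",
        default=None,
        help="override the dtype inferred from the baseline in spec-less mode",
    )
    return parser
```

With this shape, args.spec is None exactly when the user omitted --spec, which is the single condition the benchmark command needs to choose between the two resolution paths.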
Out of scope
- Broad executor redesign.
- Changes to unrelated optimize/pipeline flows.
Implementation Notes
- Update the benchmark parser in __init__.py:
  - remove required=True from --spec
  - update the help text to document the fallback behavior
- Refactor benchmark.py:
  - isolate config resolution from execution
  - return resolved input_shapes, flop, dtype, input_dtypes, init_args regardless of source
- Prefer reusing existing analysis utilities over adding new parsing logic (candidate: kernel_analyzer.py).
- Keep the executor.py interface unchanged; it should receive resolved values as it does today.
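A minimal sketch of the baseline-backed path, assuming the baseline follows the KernelBench convention of module-level get_inputs() and get_init_inputs() functions alongside the Model class. ResolvedConfig and resolve_from_baseline are hypothetical names; dtypes are normalized to strings here, and flop is left as None because a KernelBench file does not state a FLOP count. Whether kernel_analyzer.py already covers part of this is not checked.

```python
import importlib.util
from dataclasses import dataclass
from pathlib import Path
from typing import Any, List, Optional, Tuple


@dataclass
class ResolvedConfig:
    # The same five values the spec path produces today.
    input_shapes: List[Optional[Tuple[int, ...]]]
    flop: Optional[float]
    dtype: Optional[str]
    input_dtypes: List[Optional[str]]
    init_args: List[Any]


def _load_module(path: Path):
    # Import the baseline file as a throwaway module (this executes its top level).
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load baseline module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def resolve_from_baseline(baseline: Path, dtype_override: Optional[str] = None) -> ResolvedConfig:
    module = _load_module(Path(baseline))
    for hook in ("get_inputs", "get_init_inputs"):
        if not callable(getattr(module, hook, None)):
            raise ValueError(f"{baseline} does not define {hook}(); pass --spec instead")

    inputs = list(module.get_inputs())
    # Tensor-like inputs expose .shape and .dtype; scalar inputs get None.
    input_shapes = [tuple(x.shape) if hasattr(x, "shape") else None for x in inputs]
    input_dtypes = [str(x.dtype) if hasattr(x, "dtype") else None for x in inputs]

    # Use a dtype only if every tensor input agrees; an explicit override always wins.
    known = {d for d in input_dtypes if d is not None}
    inferred = next(iter(known)) if len(known) == 1 else None

    return ResolvedConfig(
        input_shapes=input_shapes,
        flop=None,  # a KernelBench file does not state a FLOP count
        dtype=dtype_override or inferred,
        input_dtypes=input_dtypes,
        init_args=list(module.get_init_inputs()),
    )
```

Keeping resolution in one function that returns a plain record is what lets executor.py stay unchanged: the spec-backed path would fill the same ResolvedConfig fields from YAML instead of from the baseline module.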
Acceptance Criteria
- xe-forge-skill benchmark <baseline> <optimized> runs without --spec for KernelBench baseline files.
- xe-forge-skill benchmark ... --spec ... behavior remains unchanged.
- Spec-less execution resolves dtype from the baseline file or from --dtype.
- New/updated tests verify:
  - no regression in spec mode
  - no crash or error when the spec is omitted (fallback mode)
- CLI help reflects that --spec is optional for benchmark.
Test Plan
- Add benchmark-focused unit tests under tests (or the nearest existing skill test location) for:
  - the spec-provided path
  - the spec-omitted fallback path
- Run the targeted test subset for benchmark and any touched resolution helpers.
Risks / Open Questions
- Some kernels may not expose enough baseline metadata to infer all fields.
- If dtype cannot be inferred reliably in edge cases, add a minimal opt-in --dtype override flag rather than making --spec mandatory again.
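A minimal sketch of the dtype policy this fallback implies, assuming dtypes are compared as strings: an explicit --dtype wins, a dtype shared by all tensor inputs is used otherwise, and a missing or mixed dtype fails with a message pointing at --dtype instead of guessing. resolve_dtype is a hypothetical helper name.

```python
from typing import Optional, Sequence


def resolve_dtype(override: Optional[str], input_dtypes: Sequence[Optional[str]]) -> str:
    """Pick the benchmark dtype in spec-less mode (hypothetical helper)."""
    if override:
        return override  # explicit --dtype always wins
    # Ignore non-tensor inputs, which carry no dtype.
    known = {d for d in input_dtypes if d is not None}
    if len(known) == 1:
        return next(iter(known))
    if not known:
        raise ValueError("could not infer dtype: baseline inputs carry no dtype; pass --dtype")
    raise ValueError(f"ambiguous dtype, baseline inputs mix {sorted(known)}; pass --dtype")
```

Failing loudly on the ambiguous cases keeps the override opt-in: users only need --dtype for the edge cases this risk describes, and --spec never becomes mandatory again.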