Make xe-forge-skill benchmark support optional --spec with baseline-derived config fallback #46

@mzweilin

Summary

The benchmark skill currently requires --spec, but the KernelBench baseline files already contain enough context to derive most runtime config. This issue proposes making --spec optional for xe-forge-skill benchmark, preserving current behavior when spec is provided, and adding a baseline-driven fallback path when it is not.

Current Behavior

In __init__.py, the benchmark CLI marks --spec as required.
In benchmark.py, benchmark execution always loads spec-derived values:

  • input shapes
  • flop
  • dtype
  • input dtypes
  • init args

Problem

This makes benchmark usage more rigid than needed. For many workflows, the baseline kernel/model file already provides enough metadata to run correctness/perf comparison without requiring a YAML spec.

Users have also asked for more control over input generation (see #38).
The widely adopted KernelBench format should cover that need as well.

Proposed Behavior

  1. Make --spec optional for benchmark CLI.
  2. Keep existing spec-driven path unchanged when --spec is provided.
  3. When --spec is omitted, resolve benchmark config from baseline-derived metadata.
  4. In spec-less mode, use the baseline-derived dtype, which an explicit --dtype flag can override.
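
A minimal sketch of the CLI side of this proposal, assuming the benchmark command is built with argparse (the issue only mentions `required=True`, so the actual parser library and layout are assumptions). `build_benchmark_parser` is a hypothetical name, and the help strings are illustrative only.

```python
import argparse


def build_benchmark_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the benchmark subcommand parser."""
    parser = argparse.ArgumentParser(prog="xe-forge-skill benchmark")
    parser.add_argument("baseline", help="Baseline kernel/model file.")
    parser.add_argument("optimized", help="Optimized kernel/model file.")
    # No required=True: omitting --spec selects the baseline-derived fallback.
    parser.add_argument(
        "--spec",
        default=None,
        help="Optional YAML spec. If omitted, config is derived from the baseline.",
    )
    # Explicit override for the dtype inferred from the baseline.
    parser.add_argument(
        "--dtype",
        default=None,
        help="Override the dtype inferred from the baseline (spec-less mode).",
    )
    return parser


if __name__ == "__main__":
    args = build_benchmark_parser().parse_args()
    mode = "spec" if args.spec is not None else "baseline fallback"
    print(f"config source: {mode}")
```

With this shape, `args.spec is None` is the only signal the rest of the code needs to choose between the two paths.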

Scope

In scope

  • CLI argument requirement/help updates for benchmark.
  • Refactor benchmark config resolution into:
    • spec-backed path
    • baseline-backed fallback path
  • Add tests for both paths.

Out of scope

  • Broad executor redesign.
  • Changes to unrelated optimize/pipeline flows.

Implementation Notes

  • Update benchmark parser in __init__.py:
    • remove required=True from --spec
    • update help text to document fallback behavior
  • Refactor benchmark.py:
    • isolate config resolution from execution
    • return resolved input_shapes, flop, dtype, input_dtypes, init_args regardless of source
  • Prefer reusing existing analysis/utilities before adding parsing logic (candidate: kernel_analyzer.py)
  • Keep executor.py interface unchanged; it should receive resolved values as today.
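
The refactor described in these notes could look roughly like the sketch below. It is written under several assumptions: the baseline follows the KernelBench convention of exposing `get_inputs()` and `get_init_inputs()` at module level, the existing spec loader can be passed in as a callable, and flop is left unset in fallback mode because nothing in this sketch can derive it. `BenchmarkConfig`, `config_from_baseline`, and `resolve_config` are hypothetical names, not existing project APIs, and the sketch does not use kernel_analyzer.py.

```python
import importlib.util
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class BenchmarkConfig:
    """Resolved values handed to the executor, whatever their source."""
    input_shapes: List[Tuple[int, ...]]
    input_dtypes: List[str]
    dtype: Optional[str]
    flop: Optional[float] = None
    init_args: List[Any] = field(default_factory=list)


def _load_module(path: str):
    """Import a Python file by path without modifying sys.path."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import baseline file: {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def config_from_baseline(path: str, dtype: Optional[str] = None) -> BenchmarkConfig:
    """Fallback path: derive config from KernelBench-style module hooks."""
    module = _load_module(path)
    if not hasattr(module, "get_inputs"):
        raise ValueError(f"{path} defines no get_inputs(); pass --spec instead")
    # Only tensor-like inputs (anything with .shape and .dtype) contribute.
    tensors = [x for x in module.get_inputs()
               if hasattr(x, "shape") and hasattr(x, "dtype")]
    input_dtypes = [str(t.dtype) for t in tensors]
    get_init = getattr(module, "get_init_inputs", None)
    return BenchmarkConfig(
        input_shapes=[tuple(t.shape) for t in tensors],
        input_dtypes=input_dtypes,
        # Explicit override wins; otherwise take the first input's dtype.
        dtype=dtype or (input_dtypes[0] if input_dtypes else None),
        flop=None,  # assumption: not derivable from the hooks alone
        init_args=list(get_init()) if get_init else [],
    )


def resolve_config(
    baseline: str,
    spec: Optional[str] = None,
    dtype: Optional[str] = None,
    spec_loader: Optional[Callable[[str], BenchmarkConfig]] = None,
) -> BenchmarkConfig:
    """Single entry point: spec-backed when --spec is given, else fallback."""
    if spec is not None:
        if spec_loader is None:
            raise ValueError("a spec was given but no spec loader was supplied")
        return spec_loader(spec)  # existing behavior stays behind this call
    return config_from_baseline(baseline, dtype)
```

Because both paths return the same `BenchmarkConfig`, the executor keeps receiving the same resolved values and does not need to know which source produced them.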

Acceptance Criteria

  1. xe-forge-skill benchmark <baseline> <optimized> runs without --spec for KernelBench baseline files.
  2. xe-forge-skill benchmark ... --spec ... behavior remains unchanged.
  3. Spec-less execution resolves dtype from the baseline file or from --dtype.
  4. New/updated tests verify:
    • no regression in spec mode
    • no crash/error for omitted spec in fallback mode
  5. CLI help reflects that --spec is optional for benchmark.

Test Plan

  • Add benchmark-focused unit tests under tests (or nearest existing skill test location) for:
    • spec provided path
    • spec omitted fallback path
  • Run targeted test subset for benchmark + any touched resolution helpers.

Risks / Open Questions

  • Some kernels may not expose enough baseline metadata to infer all fields.
  • If dtype cannot be inferred reliably in edge cases, rely on the minimal opt-in --dtype override rather than reintroducing a mandatory spec.
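
One possible policy for the dtype edge cases above, sketched as a standalone helper. The precedence (override first, then a unanimous inferred dtype, otherwise a clear error pointing at --dtype) is an assumption, not something the issue specifies; `resolve_dtype` and `DtypeResolutionError` are hypothetical names.

```python
from typing import Optional, Sequence


class DtypeResolutionError(ValueError):
    """Raised when spec-less mode cannot settle on a single dtype."""


def resolve_dtype(override: Optional[str], inferred: Sequence[str]) -> str:
    """Pick the benchmark dtype in spec-less mode."""
    # 1. An explicit --dtype always wins.
    if override:
        return override
    # 2. Otherwise accept the inferred dtype only if all inputs agree.
    unique = sorted(set(inferred))
    if len(unique) == 1:
        return unique[0]
    # 3. Ambiguous or empty: fail loudly and point at the override flag.
    reason = "no tensor inputs found" if not unique else f"mixed input dtypes {unique}"
    raise DtypeResolutionError(
        f"cannot infer dtype from baseline ({reason}); pass --dtype"
    )
```

Failing with an actionable message keeps the fallback path from silently benchmarking with the wrong precision while still avoiding a mandatory spec.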
