
Add XeGPU matrix multiplication benchmark #15

@tkarna

Description

This issue outlines the required steps to add an XeGPU matrix multiplication benchmark.

  1. mlir-gen should be a Python module.
    • Move it into the Python source tree as a module, e.g., python/lighthouse/payload_generator/payload_generator.py
    • Add command line interface, e.g., python/lighthouse/payload_generator/cli.py
    • Map it to an executable script (e.g., mlir-gen) in pyproject.toml:
      [project.scripts]
      mlir-gen = "lighthouse.payload_generator.cli:main"
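As a sketch of step 1, the CLI module could be a thin argparse wrapper whose entry function is what the [project.scripts] table points at. All names and flags here (build_parser, main, --sizes) are illustrative guesses, not a settled interface:

```python
import argparse


def build_parser():
    # Parser for the hypothetical mlir-gen console script; flags are guesses.
    parser = argparse.ArgumentParser(
        prog="mlir-gen",
        description="Generate payload MLIR for benchmarking.",
    )
    parser.add_argument(
        "--sizes", type=int, nargs=3, default=[4096, 4096, 4096],
        metavar=("M", "N", "K"), help="Matmul problem sizes.",
    )
    return parser


def main(argv=None):
    # Entry point referenced from [project.scripts] in pyproject.toml.
    args = build_parser().parse_args(argv)
    # Payload generation itself would live in payload_generator.py.
    print(f"sizes={','.join(map(str, args.sizes))}")
```

Keeping parsing (cli.py) separate from generation (payload_generator.py) lets other tools import the generator as a library without touching argparse.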
  2. Generic infrastructure to define and execute workloads
    • A Workload object to hold the payload IR and fixed metadata (e.g. problem size).
    • Provides methods to get the payload IR, the schedule IR, and the input arguments for calling the payload, plus (optionally) correctness verification, etc.
    • A generic execution_engine wrapper that can execute a Workload and can, e.g., run it in a timer loop for benchmarking.
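The Workload/execution-engine split in step 2 might look roughly like the following. The class and method names are guesses at the eventual API, and the compile/run callables stand in for the real MLIR execution engine:

```python
import time
from abc import ABC, abstractmethod


class Workload(ABC):
    """Holds the payload IR and fixed metadata such as problem size."""

    def __init__(self, **metadata):
        self.metadata = metadata

    @abstractmethod
    def payload_ir(self) -> str: ...

    @abstractmethod
    def schedule_ir(self) -> str: ...

    @abstractmethod
    def input_args(self): ...

    def verify(self, result) -> bool:
        """Optional correctness check; accepts everything by default."""
        return True


class ExecutionEngine:
    """Compiles and runs a Workload, optionally in a timing loop."""

    def __init__(self, compile_fn, run_fn):
        self._compile = compile_fn   # payload IR -> executable
        self._run = run_fn           # (executable, args) -> result

    def benchmark(self, workload: Workload, iterations: int = 10) -> float:
        """Return the mean wall-clock time per iteration, in seconds."""
        exe = self._compile(workload.payload_ir())
        args = workload.input_args()
        start = time.perf_counter()
        for _ in range(iterations):
            self._run(exe, args)
        return (time.perf_counter() - start) / iterations
```

Injecting compile_fn/run_fn keeps the engine generic: the XeGPU path, a CPU reference path, or a mock for tests can all plug into the same timing loop.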
  3. XeGPU matmul Workload and benchmarking tools
    • Currently supports only matmul + elementwise post ops.
    • Location: python/lighthouse/benchmark/xegpu/matmul.py (?)
    • Generates payload, defines lowering schedule, compiles, executes, measures performance.
    • Uses existing functionality: workload, payload generator, execution tools, etc.
    • Exposed as another CLI command, e.g., benchmark_xegpu_matmul
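For step 3, the matmul workload would generate a parameterized payload. As an illustration only (the real IR comes from the payload generator and will differ), a linalg.matmul template plus the GFLOPS arithmetic behind the sample output further below could look like:

```python
def matmul_payload_ir(m, n, k, dt_in="f16", dt_acc="f32"):
    # Illustrative payload template: a plain linalg.matmul on tensors,
    # parameterized by problem size and input/accumulator element types.
    return f"""\
func.func @matmul(%A: tensor<{m}x{k}x{dt_in}>, %B: tensor<{k}x{n}x{dt_in}>,
                  %C: tensor<{m}x{n}x{dt_acc}>) -> tensor<{m}x{n}x{dt_acc}> {{
  %0 = linalg.matmul
         ins(%A, %B : tensor<{m}x{k}x{dt_in}>, tensor<{k}x{n}x{dt_in}>)
         outs(%C : tensor<{m}x{n}x{dt_acc}>) -> tensor<{m}x{n}x{dt_acc}>
  return %0 : tensor<{m}x{n}x{dt_acc}>
}}
"""


def matmul_gflops(m, n, k, time_ms):
    # An (m, n, k) matmul performs 2*m*n*k floating-point operations.
    return 2.0 * m * n * k / (time_ms * 1e-3) / 1e9
```

For example, 4096x4096x4096 at 1.76 ms works out to roughly 78,000 GFLOPS, matching the sample output of the CLI tool.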
  4. Installation mechanism for XeGPU support:
    • Compile LLVM with LevelZero runtime and necessary flags.
    • Hook up LLVM Python bindings with Lighthouse.
      • Simply set PYTHONPATH="$LLVM_INSTALL_PATH/python_packages/mlir_core/"
    • Provide an easy-to-use install mechanism.
      • Build instructions in README(?).
      • Or a generic build script, invoked on install(?).
        • mlir-python-bindings Python package is not installed in this case(?).
      • When executing kernels, raise a descriptive error if Xe GPU device/drivers are missing.
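For the error-reporting bullet in step 4, a hypothetical guard could fail early with a descriptive message when the MLIR Python bindings are not importable; a real device probe would additionally query the LevelZero runtime. The function name, injectable importer, and message are all illustrative:

```python
def require_mlir_bindings(importer=__import__):
    """Raise a descriptive error if the LLVM/MLIR Python bindings are missing.

    `importer` is injectable for testing; it defaults to the builtin import.
    """
    try:
        importer("mlir.ir")
    except ImportError as err:
        raise RuntimeError(
            "MLIR Python bindings not found. Build LLVM with the LevelZero "
            "runtime and point PYTHONPATH at "
            "$LLVM_INSTALL_PATH/python_packages/mlir_core/."
        ) from err
```

Calling this at CLI startup turns a bare ImportError deep in the stack into an actionable install hint for the user.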

Examples of benchmark command-line tool usage:

$ benchmark_xegpu_matmul
sizes=4096,4096,4096 dt=f16,f32 wg-tile=256,256 sg-tile=32,32 ... time(ms): 1.76 GFLOPS: 78072
$ benchmark_xegpu_matmul --sizes 4096 2048 1024 --wg-tile-size 256 256 ...
...
$ benchmark_xegpu_matmul --dump-kernel {initial,tiled,vectorized,bufferized,xegpu-wg,...}
<prints payload IR at this stage of lowering>
$ benchmark_xegpu_matmul --dump-kernel xegpu-wg --dump-schedule
<also dumps the transform schedule used to lower to this level>
