
Conversation

@kurapov-peter
Contributor

Putting up this dirty draft for early feedback/questions. I'm putting together some tests to run an e2e Llama 3.1 going through linalg on tensors. The goal is to generate some nice linalg that would be optimization-friendly. At the moment there are just functional blocks and pieces that are only smoke-tested. These include naive implementations of rotary embeddings, feed-forward, RMS norm, and a bunch of other small snippets that are useful for implementing the model. These are already enough to put an attention block together. It'd be nice to test it against the original implementation, but that'd require fairscale as a dependency. For now I only added pytest and kept the pipeline as simple as possible. I also reused the example with the schedule, so it is now part of every test.
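For reference, the semantics of the RMS norm block mentioned above can be sketched in NumPy (this is an illustrative reference implementation of Llama-style RMSNorm, not the PR's actual helper; the function name and `eps` default are assumptions):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Llama-style RMSNorm over the last axis: x / rms(x) * weight."""
    # Root mean square of each row, with eps for numerical stability.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[1.0, 2.0, 3.0, 4.0]])
w = np.ones(4)
y = rms_norm(x, w)
```

With unit weights, the normalized rows have a mean square of (approximately) one, which is a cheap sanity check for a smoke test.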

@rengolin
Member

Should this be in examples?

@kurapov-peter
Contributor Author

The e2e should be, yup, but this is mostly tests and getters.

@kurapov-peter
Contributor Author

I moved the whole thing to examples and added attention to the list of tests.

Member

@rengolin rengolin left a comment


Thanks!

Contributor

@rolfmorel rolfmorel left a comment


Nice! Have left some comments inline.

[xq_scores_map, keys_scores_map, scores_map],
[parallel, parallel, parallel, parallel, reduction],
)
def compute_scores(q_val, k_val, score_val):
Contributor


Could be written as a linalg.contract, right?
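The iterator list above (four parallel dimensions plus one reduction) is exactly the shape of a contraction, which is why `linalg.contract` fits. A NumPy sketch of the same computation (shapes and the `bhid` layout are illustrative assumptions, not the PR's actual layout):

```python
import numpy as np

# Illustrative shapes: batch, heads, sequence length, head dim.
B, H, S, D = 2, 3, 4, 8
q = np.random.default_rng(0).standard_normal((B, H, S, D))
k = np.random.default_rng(1).standard_normal((B, H, S, D))

# Four parallel dims (b, h, i, j) and one reduction dim (d):
# the same iterator pattern as the generic, i.e. a contraction.
scores = np.einsum("bhid,bhjd->bhij", q, k)

# Explicit loop nest mirroring the linalg.generic body.
ref = np.zeros((B, H, S, S))
for b in range(B):
    for h in range(H):
        for i in range(S):
            for j in range(S):
                ref[b, h, i, j] = np.dot(q[b, h, i], k[b, h, j])
```

The einsum and the loop nest agree elementwise, which is the property that lets the generic be raised to a named contraction op.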

Member


We can move generics to contract and elementwise later. TPP-MLIR has linalg generalization because some passes don't work with the new ops.

Comment on lines 1200 to 1205
module = generate_module(ctx, ir_type)
bufferize_module(ctx, module)
schedule = create_schedule(ctx)
apply_schedule(module, schedule)
pm = create_pass_pipeline(ctx)
pm.run(module.operation)
Contributor


Suggested change
module = generate_module(ctx, ir_type)
bufferize_module(ctx, module)
schedule = create_schedule(ctx)
apply_schedule(module, schedule)
pm = create_pass_pipeline(ctx)
pm.run(module.operation)
module = generate_module(ctx, ir_type)
schedule = create_schedule(ctx)
apply_schedule(module, schedule)

Just move the passes from inside bufferize_module(ctx, module) and create_pass_pipeline(ctx) into the start and end of the schedule, i.e. with transform.apply_registered_pass.

I know this antipattern originates in an example script we merged, but we should not let this proliferate. It clearly is already confusing people.
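To make the suggestion concrete, a rough sketch of a schedule with the passes folded in might look like the following (the pass names and surrounding structure here are illustrative, not the PR's actual pipeline):

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op) {
    // Bufferization runs as the first step of the schedule instead of a
    // separate PassManager invocation (bufferize_module).
    %m = transform.apply_registered_pass "one-shot-bufferize" to %arg0
        : (!transform.any_op) -> !transform.any_op
    // ... the existing schedule transforms go here ...
    // Trailing lowering passes replace create_pass_pipeline.
    %m2 = transform.apply_registered_pass "convert-linalg-to-loops" to %m
        : (!transform.any_op) -> !transform.any_op
    transform.yield
  }
}
```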

Comment on lines 115 to 144
return schedule


def apply_schedule(kernel: ir.Module, schedule: ir.Module) -> None:
interpreter.apply_named_sequence(
payload_root=kernel,
transform_root=schedule.body.operations[0],
transform_module=schedule,
)
Contributor

@rolfmorel rolfmorel Nov 25, 2025


Suggested change
return schedule
def apply_schedule(kernel: ir.Module, schedule: ir.Module) -> None:
interpreter.apply_named_sequence(
payload_root=kernel,
transform_root=schedule.body.operations[0],
transform_module=schedule,
)
return named_seq

If we do this, you can simply do:

schedule = create_schedule()
schedule.apply(module)

If you need access to the Module around the named_sequence, just ask for its .parent.

@@ -1,2 +1,2 @@
import ctypes
import torch
Contributor


I know this PR didn't introduce it, though looking at it now, I feel we should think about compartmentalizing code that depends on heavy dependencies a bit more. That is, not have it in the same module as code that doesn't have the dependency, e.g. get_packed_arg.

Comment on lines 75 to 85
def create_schedule(ctx: ir.Context) -> ir.Module:
"""
Create an MLIR module containing transformation schedule.
The schedule provides partial lowering to scalar operations.
Args:
ctx: MLIR context.
"""
with ctx, ir.Location.unknown(context=ctx):
# Create transform module.
schedule = ir.Module.create()
Contributor


Suggested change
def create_schedule(ctx: ir.Context) -> ir.Module:
"""
Create an MLIR module containing transformation schedule.
The schedule provides partial lowering to scalar operations.
Args:
ctx: MLIR context.
"""
with ctx, ir.Location.unknown(context=ctx):
# Create transform module.
schedule = ir.Module.create()
def create_schedule() -> ir.Module:
schedule = ir.Module.create()

And just de-indent the rest of the function.

Comment on lines 133 to 135
def bufferize_module(ctx: ir.Context, kernel: ir.Module) -> None:
with ctx:
pm = PassManager("builtin.module")
Contributor


Suggested change
def bufferize_module(ctx: ir.Context, kernel: ir.Module) -> None:
with ctx:
pm = PassManager("builtin.module")
def bufferize_module(kernel: ir.Module) -> None:
pm = PassManager("builtin.module")

Comment on lines 1160 to 1164
def to_ir_type(type_str, ctx):
if type_str == "f32":
return ir.F32Type.get(context=ctx)
elif type_str == "f64":
return ir.F64Type.get(context=ctx)
Contributor


Suggested change
def to_ir_type(type_str, ctx):
if type_str == "f32":
return ir.F32Type.get(context=ctx)
elif type_str == "f64":
return ir.F64Type.get(context=ctx)
def to_ir_type(type_str):
if type_str == "f32":
return ir.F32Type.get()
elif type_str == "f64":
return ir.F64Type.get()

In effect, these .get() methods do an ir.Context.current lookup under the hood when you don't pass a context explicitly (just like the Op builders).

Member

@rengolin rengolin left a comment


I think there's a lot of smaller comments that we can leave for post-merge review. This is an example, and a complicated one at that, and we don't want to over-engineer something that will soon move to a better program.

@rengolin rengolin merged commit bd87f3f into llvm:main Nov 27, 2025
3 checks passed