
Simplify backward pass by returning local gradients per op #115

Open
Lyt060814 wants to merge 1 commit into karpathy:master from Lyt060814:simplify-backward-local-grads

Conversation

@Lyt060814

Summary

As suggested by @karpathy in this tweet (a follow-up to the microgpt announcement):

You just return local gradients for each op and get backward() to do the multiply (chaining) with global gradient from loss. So each op just expresses the bare fundamentals of what it needs to: the forward computation and the backward gradients for it.

This PR implements that simplification (a sketch of the resulting Value class follows the list):

  • Each op now stores its local gradients in a _local_grads tuple instead of defining a _backward closure
  • backward() uniformly applies the chain rule by multiplying local gradients with the upstream gradient
  • _children is kept as a tuple instead of a set() so its ordering stays aligned with _local_grads
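
Concretely, the resulting Value class might look something like the sketch below. This is a minimal illustration of the scheme, not the PR's actual diff: the constructor signature, the choice of tanh as the example op, and the build() helper are assumptions made here for brevity.

```python
import math

class Value:
    """Minimal sketch: each op returns its local gradients; backward() chains them."""

    def __init__(self, data, _children=(), _local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = _children        # tuple; order matches _local_grads
        self._local_grads = _local_grads  # d(out)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        # d tanh(x)/dx = 1 - tanh(x)^2
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # topologically sort the graph that ends at this node
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._children:
                    build(child)
                topo.append(v)
        build(self)

        # uniform chain rule: child.grad += local_grad * upstream_grad
        self.grad = 1.0
        for node in reversed(topo):
            for child, local in zip(node._children, node._local_grads):
                child.grad += local * node.grad


# Example: gradients of c = tanh(a*b + a) with respect to a and b
a, b = Value(2.0), Value(-3.0)
c = (a * b + a).tanh()
c.backward()
print(a.grad, b.grad)  # (1 - tanh(ab+a)^2) * (b + 1) and (1 - tanh(ab+a)^2) * a
```

The last bullet is what makes this work: because _children is an ordered tuple, zip(node._children, node._local_grads) pairs each child with its own partial derivative, which a set() would not guarantee.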

Test plan

  • pytest test/test_engine.py — forward and backward results match PyTorch (both test_sanity_check and test_more_ops)
  • demo.ipynb — MLP training on make_moons converges to 100% accuracy, loss curve unchanged
  • trace_graph.ipynb — computation graph visualization works correctly
