
Simplify backward pass by returning local gradients per op #115

Open
Lyt060814 wants to merge 1 commit into karpathy:master from Lyt060814:simplify-backward-local-grads

Conversation

@Lyt060814

Summary

As suggested by @karpathy in this tweet (a follow-up to the microgpt announcement):

You just return local gradients for each op and get backward() to do the multiply (chaining) with global gradient from loss. So each op just expresses the bare fundamentals of what it needs to: the forward computation and the backward gradients for it.

This PR implements that simplification (a sketch of the resulting Value class follows the list):

  • Each op now stores its local gradients in a _local_grads tuple instead of defining a _backward closure
  • backward() uniformly applies the chain rule by multiplying local gradients with the upstream gradient
  • _children is kept as a tuple instead of a set() so its ordering stays aligned with _local_grads
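
Concretely, the resulting Value class might look something like the sketch below. This is a minimal illustration of the scheme, not the PR's actual diff: the constructor signature, the choice of tanh as the example op, and the build() helper are assumptions made here for brevity.

```python
import math

class Value:
    """Minimal sketch: each op returns its local gradients; backward() chains them."""

    def __init__(self, data, _children=(), _local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = _children        # tuple; order matches _local_grads
        self._local_grads = _local_grads  # d(out)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        # d tanh(x)/dx = 1 - tanh(x)^2
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # topologically sort the graph that ends at this node
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._children:
                    build(child)
                topo.append(v)
        build(self)

        # uniform chain rule: child.grad += local_grad * upstream_grad
        self.grad = 1.0
        for node in reversed(topo):
            for child, local in zip(node._children, node._local_grads):
                child.grad += local * node.grad


# Example: gradients of c = tanh(a*b + a) with respect to a and b
a, b = Value(2.0), Value(-3.0)
c = (a * b + a).tanh()
c.backward()
print(a.grad, b.grad)  # (1 - tanh(ab+a)^2) * (b + 1) and (1 - tanh(ab+a)^2) * a
```

The last bullet is what makes this work: because _children is an ordered tuple, zip(node._children, node._local_grads) pairs each child with its own partial derivative, which a set() would not guarantee.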

Test plan

  • pytest test/test_engine.py — forward and backward results match PyTorch (both test_sanity_check and test_more_ops)
  • demo.ipynb — MLP training on make_moons converges to 100% accuracy, loss curve unchanged
  • trace_graph.ipynb — computation graph visualization works correctly
