Streamline neural network exercise strategies #268
base: main
Conversation
Removed three strategies with negative improvements and kept only the two best-performing approaches:

- Strategy 1: Deeper network + LR schedule + L2 regularization (a rough sketch follows below)
- Strategy 2: Baseline + Armijo line search (best validation MSE)

Added technical explanation of Armijo backtracking line search and credited Matyas Farkas for contributing the winning strategy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
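As a quick illustration of what Strategy 1 combines, here is a minimal JAX sketch of a deeper MLP trained with an exponentially decaying learning rate and an L2 penalty. The layer sizes, decay constants, and penalty weight below are illustrative assumptions, not the lecture's actual values:

```python
import jax
import jax.numpy as jnp

def init_params(key, sizes=(1, 32, 32, 32, 1)):
    # One (W, b) pair per layer; sizes are assumed, not taken from the lecture.
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) * jnp.sqrt(2.0 / m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def forward(params, x):
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def loss(params, x, y, lam=1e-4):
    mse = jnp.mean((forward(params, x) - y) ** 2)
    l2 = sum(jnp.sum(W ** 2) for W, _ in params)   # L2 regularization term
    return mse + lam * l2

def lr_schedule(step, lr0=0.1, decay=0.999):
    return lr0 * decay ** step                      # simple exponential decay

@jax.jit
def sgd_step(params, x, y, lr):
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

A training loop would then call `sgd_step(params, x, y, lr_schedule(step))` at each step.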
✅ Deploy Preview for incomparable-parfait-2417f8 ready!
@HumphreyYang I am just working on a few things at the moment, but I hope to take a look at this later this afternoon and get this merged.
Pull request overview
This PR streamlines the neural network exercise solution section by removing underperforming optimization strategies and highlighting the two best-performing approaches: a deeper network with learning rate scheduling and L2 regularization, and a baseline network with Armijo backtracking line search.
Key Changes:
- Removed three optimization strategies with negative performance improvements (-0.000829, -0.000764, -0.000090)
- Retained and renumbered the two best-performing strategies (improvements: +0.000024 and +0.000058)
- Added complete Armijo backtracking line search implementation with adaptive step size selection
lectures/jax_nn.md
Outdated
4. Regularization vs architecture: Comparing strategies 3 and 4 shows whether regularization is more effective with deeper architectures or simpler ones.

This strategy and its code was contributed by [Matyas Farkas](https://www.matyasfarkas.eu/).
Copilot AI, Dec 8, 2025:
Subject-verb agreement error: "This strategy and its code" is a plural subject requiring the plural verb "were" instead of "was".
Suggested change:
- This strategy and its code was contributed by [Matyas Farkas](https://www.matyasfarkas.eu/).
+ This strategy and its code were contributed by [Matyas Farkas](https://www.matyasfarkas.eu/).
Many thanks @jstac and @mmcky — the exercise looks great! It reminds me of the convex optimization classes I took. I read through the lecture carefully and noted a few small questions and suggestions that might help:
Based on the output, to align with the JAX version we could change the corresponding part of the Keras code. Also, the Keras model does not use the dimensions defined in the configuration; it looks like the input mapping is 1 -> 6, so we have one extra layer compared with the JAX version (a quick shape check is sketched after this comment).
Please let me know if I’ve misunderstood anything — I can push a commit to update the lecture if these suggestions seem useful!
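To make the dimension comparison above concrete, here is a small, purely illustrative JAX-style check of the parameter shapes implied by a layer-size list; the sizes below are assumptions, not the lecture's configuration. If the Keras `Sequential` model lists one more `Dense` layer than this produces, the two networks are not equivalent:

```python
import jax
import jax.numpy as jnp

# Hypothetical layer sizes: input 1 -> 6 -> 6 -> output 1.
layer_sizes = [1, 6, 6, 1]

key = jax.random.PRNGKey(0)
keys = jax.random.split(key, len(layer_sizes) - 1)
params = [(jax.random.normal(k, (m, n)), jnp.zeros(n))
          for k, m, n in zip(keys, layer_sizes[:-1], layer_sizes[1:])]

# Three weight matrices -> three layers; an extra Dense(...) in the Keras
# model would add a fourth, which is the mismatch flagged above.
for i, (W, b) in enumerate(params):
    print(f"layer {i}: W {W.shape}, b {b.shape}")
```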
@HumphreyYang I just merged #266 so I will check off the unused import comment.
Many thanks @mmcky, I realized that the
Thanks @HumphreyYang, I wonder where the preview build is getting
Hi @mmcky, it is coming from
Many thanks @mmcky, the
Great review @HumphreyYang, many thanks! Good catch regarding the Keras layers! We need to cut one, as you say. Please check the runtime and MSE after the change, on a GPU build. The discussion of relative times and the table before the exercises might need to change. As for the layer numbering convention, I follow this:
PS Feel free to enter the competition to reduce validation MSE!
Many thanks @jstac for your confirmation! I’ll leave the layer count out. Is there anything else I should keep constant, and do I have your permission to make changes for the other items on the list in this PR? I can open a new branch after this is merged as well. I’ll join the race after this busy month of paper revisions!
Thanks @HumphreyYang. All your comments are spot on and you are free to make those changes.
Sure. Please don't feel the need to invest too much time in this --- I know you are busy.
Hi guys, I just pushed a commit to fix the listed items and the code length. I’ll update the discussions on MSE once I’m back from dinner and have checked the preview run!
Hi @jstac, I checked the MSE in the preview and the results are consistent with the previous discussions. I made another commit to address some minor points:
since the layer output is controlled by

Please let me know if you spot any more improvements!
Great work @HumphreyYang, many thanks! This all looks great. @mmcky It would be helpful to have the nice lecture-specific deployment messages here as well (if it's not a big job, since we'll dismantle this series before too long).
@mmcky Please review, merge and make live when you are ready.
Summary
This PR streamlines the exercise section of the JAX neural network lecture by removing underperforming strategies and highlighting the most effective approaches.
Changes
Removed 3 strategies with negative improvements:
Kept 2 best-performing strategies:
Added Armijo line search implementation: Complete implementation of gradient descent with backtracking line search for adaptive step size selection
Added technical explanation: Detailed explanation of how Armijo backtracking works and why it performs well (the sufficient-decrease condition is restated below)
Attribution: Credited Matyas Farkas for contributing the winning Armijo line search strategy
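For reference, the sufficient-decrease test that a standard Armijo backtracking search applies at each gradient-descent step (textbook form, not a quotation from the lecture) is

$$
f\bigl(\theta - \eta \nabla f(\theta)\bigr) \;\le\; f(\theta) - c\,\eta\,\lVert \nabla f(\theta)\rVert^2,
\qquad 0 < c < 1,
$$

where the step size $\eta$ is shrunk by a factor $\beta \in (0, 1)$ until the inequality holds.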
Results
The Armijo line search strategy achieved the best validation MSE (0.040810) with competitive runtime (0.41s), demonstrating that adaptive step size selection provides meaningful improvements for neural network training.
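A minimal sketch of how such an update can be written in JAX is below; the helper name, the generic `loss(params, x, y)` signature, and the constants `eta0`, `beta`, and `c` are illustrative assumptions rather than the lecture's exact implementation:

```python
import jax
import jax.numpy as jnp

def armijo_step(loss, params, x, y, eta0=1.0, beta=0.5, c=1e-4, max_backtracks=20):
    """One gradient-descent step with Armijo backtracking line search."""
    val, grads = jax.value_and_grad(loss)(params, x, y)
    # Squared gradient norm, summed over all leaves of the parameter pytree.
    gnorm2 = sum(jnp.sum(g ** 2) for g in jax.tree_util.tree_leaves(grads))

    eta = eta0
    for _ in range(max_backtracks):
        trial = jax.tree_util.tree_map(lambda p, g: p - eta * g, params, grads)
        # Accept once the sufficient-decrease (Armijo) condition is met.
        if loss(trial, x, y) <= val - c * eta * gnorm2:
            return trial, eta
        eta = beta * eta              # otherwise shrink the step and retry
    return trial, eta                 # fall back to the smallest trial step
```

In a training loop this simply replaces the fixed-learning-rate update, e.g. `params, eta = armijo_step(loss, params, x_train, y_train)`.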
🤖 Generated with Claude Code