|
9 | 9 | "\n", |
10 | 10 | "A more efficient approach is to use information about the slope of the potential energy surface to guide our search. \n", |
11 | 11 | "\n", |
12 | | - "The **gradient descent method** (also called the **steepest descent method**) works by analogy to releasing a ball on a hill and letting it roll to the bottom. At any point on our potential energy surface, the **gradient** tells us which direction is “uphill” and how steep the surface is at this point. With this information, we can step in the opposite direction (i.e., downhill), then recalculate the gradient at our new position, and repeat until we reach a point where the gradient is $\\sim0$.\n", |
| 12 | + "The **gradient descent method** (also called the **steepest descent method**) works by analogy to releasing a ball on a hill and letting it roll to the bottom. At any point on our potential energy surface, the **gradient** tells us which direction is \"uphill\" and how steep the surface is at this point. With this information, we can step in the opposite direction (i.e., downhill), then recalculate the gradient at our new position, and repeat until we reach a point where the gradient is $\\sim0$.\n", |
13 | 13 | "\n", |
14 | 14 | "The simplest implementation of this method is to move a fixed distance every step.\n", |
15 | | - "#### Algorithm\n", |
| 15 | + "\n", |
| 16 | + "#### Algorithm (Fixed Step Size)\n", |
16 | 17 | "\n", |
17 | 18 | "1. Start at an initial guess position $r_0$\n", |
18 | | - "2. Calculate the gradient $U^\\prime = \\frac{dU}{dr}$ at this point\n", |
19 | | - "3. Move in the direction opposite to the gradient (i.e., downhill), by some amount $\\Delta r$.\n", |
20 | | - "4. Repeat until the gradient is close to zero.\n", |
| 19 | + "2. Calculate the gradient $U^\\prime(r) = \\frac{\\mathrm{d}U}{\\mathrm{d}r}$ at this point\n", |
| 20 | + "3. Move in the direction opposite to the gradient (i.e., downhill):\n", |
| 21 | + " $$r_{i+1} = r_i - \\Delta r \\times \\mathrm{sign}(U^\\prime(r_i))$$\n", |
| 22 | + " where $\\Delta r$ is a fixed step size and $\\mathrm{sign}(U^\\prime(r_i))$ is $+1$ for a positive gradient and $-1$ for a negative gradient, so each step points downhill\n", |
| 23 | + "4. Repeat until the gradient is close to zero\n", |
| 24 | + "```{note}\n", |
| 25 | + "In molecular simulations, we often express the gradient of the potential energy in terms of the **force**. The force is defined as the negative gradient of the potential energy:\n", |
| 26 | + "\n", |
| 27 | + "$$F(r) = -\\frac{\\mathrm{d}U}{\\mathrm{d}r} = -U^\\prime(r)$$\n", |
| 28 | + "\n", |
| 29 | + "This relationship means that forces naturally point \"downhill\" towards lower energy, whilst the gradient points \"uphill\" towards higher energy. When particles move in the direction of the force, they move towards lower potential energy.\n", |
| 30 | + "```" |
| 31 | + ] |
| 32 | + }, |
| 33 | + { |
| 34 | + "cell_type": "markdown", |
| 35 | + "id": "cd899cfa-e2eb-4a33-b0c2-20860f6635cf", |
| 36 | + "metadata": {}, |
| 37 | + "source": [ |
| 38 | + "#### Exercise: Fixed Step Size Gradient Descent\n", |
21 | 39 | "\n", |
22 | | - "#### Exercise:\n", |
23 | 40 | "Write a function to calculate the first derivative of a harmonic potential:\n", |
24 | 41 | "\n", |
25 | | - "$$U^\\prime = 2k(r - r_0)$$\n", |
| 42 | + "$$U^\\prime(r) = k(r - r_0)$$\n", |
| 43 | + "\n", |
| 44 | + "Using this function, write code to perform a gradient descent search to find the minimum of your harmonic potential energy surface. Use $r=1.0$ Å as your starting position.\n", |
| 45 | + "\n", |
| 46 | + "Your code should:\n", |
| 47 | + "- Use a `for` loop with a **maximum of 50 iterations** so that the search always terminates, even if it never converges\n", |
| 48 | + "- Store the position and gradient at each iteration in lists (for plotting later)\n", |
| 49 | + "- Update the position at each step using: $r_{i+1} = r_i - \\Delta r \\times \\mathrm{sign}(U^\\prime(r_i))$\n", |
| 50 | + "- Print the iteration number, position, and gradient at each step\n", |
| 51 | + "- Stop early (using `break`) when $|U^\\prime(r)| < 0.001$\n", |
| 52 | + "- Report whether the algorithm converged or hit the maximum iteration limit\n", |
26 | 53 | "\n", |
27 | | - "Using this function, write code to perform a gradient descent search, to find the minimum of your harmonic potential energy surface. Use $r=1.0$ Å as your starting position, and a step size $\\Delta r=0.1$ Å.\n", |
| 54 | + "**Part 1:** Start with $\\Delta r = 0.01$ Å. Does the algorithm converge? How many iterations does it require?\n", |
28 | 55 | "\n", |
29 | | - "_Note_: You should not need to run for more than 10 steps.\n", |
| 56 | + "**Part 2:** Now try $\\Delta r = 0.1$ Å to see if a larger step size converges faster. What behaviour do you observe? Does it still converge?\n", |
30 | 57 | "\n", |
31 | | - "What happens using this algorithm when you get close to the minimum? What happens if you decrease the step size to $\\Delta r=0.01$ Å?" |
| 58 | + "**Questions to consider:**\n", |
| 59 | + "- Why does the larger step size cause oscillation?\n", |
| 60 | + "- The minimum is at $r = 0.74$ Å and we start at $r = 1.0$ Å. Can you explain why $\\Delta r = 0.01$ Å converges but $\\Delta r = 0.1$ Å does not, based on this distance?\n", |
| 61 | + "- What would you predict for $\\Delta r = 0.05$ Å?"
32 | 62 | ] |
33 | 63 | }, |
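The fixed step size exercise above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the notebook's reference solution: the force constant `K = 36.0` eV Å⁻² is an assumed value (this section does not state $k$), the names `harmonic_gradient` and `fixed_step_descent` are hypothetical, and the potential is assumed to be $U(r) = \frac{1}{2}k(r - r_0)^2$ so that $U^\prime(r) = k(r - r_0)$.

```python
import math

K = 36.0    # ASSUMED force constant / eV Å^-2 (not stated in this section)
R_0 = 0.74  # equilibrium bond length / Å (taken from the exercise text)


def harmonic_gradient(r, k=K, r_0=R_0):
    """Return U'(r) = k (r - r_0), assuming U(r) = 0.5 k (r - r_0)^2."""
    return k * (r - r_0)


def fixed_step_descent(r_start, delta_r, max_iter=50, tol=1e-3, verbose=False):
    """Fixed step size descent.

    Returns (positions, gradients, converged), where the two lists hold
    the position and gradient evaluated at every iteration.
    """
    r = r_start
    positions, gradients = [], []
    converged = False
    for i in range(max_iter):
        grad = harmonic_gradient(r)
        positions.append(r)
        gradients.append(grad)
        if verbose:
            print(f"{i:3d}  r = {r:.4f} Å  U' = {grad:+.4f}")
        if abs(grad) < tol:
            converged = True
            break
        # Move a fixed distance against the sign of the gradient (downhill).
        r = r - delta_r * math.copysign(1.0, grad)
    return positions, gradients, converged


for delta_r in (0.01, 0.1):
    positions, gradients, converged = fixed_step_descent(1.0, delta_r)
    status = "converged" if converged else "hit the iteration limit"
    print(f"delta_r = {delta_r}: {status} after {len(positions)} gradient evaluations")
```

Because every step has the same length, the search can only land on the minimum if the starting distance from it (here 0.26 Å) is close to a whole number of steps; otherwise it steps over the minimum and back again.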
34 | 64 | { |
|
37 | 67 | "metadata": {}, |
38 | 68 | "source": [ |
39 | 69 | "#### Rescaling the Step Size\n", |
40 | | - "Using a fixed step-size makes our method very sensitive to the choice of step-size. Large step sizes will overshoot the minimum and then oscillate back and forth. Small step sizes will get closer to the minimum, but at the cost of needing to perform many more calculations.\n", |
41 | 70 | "\n", |
42 | | - "A common approach designed to address this problem is to rescale the step size for each iteration, based on how far we think we are from the minimum. A simple model is that we can expect the gradient to be steep if we are a long way from the minimum, but shallow if we are already close to the minimum, so we make our step-size proportional to the (negative) local gradient.\n", |
| 71 | + "Using a fixed step size makes our method very sensitive to the choice of step size. As you have seen, large step sizes overshoot the minimum and oscillate back and forth, whilst small step sizes converge more reliably but require many more iterations.\n", |
| 72 | + "\n", |
| 73 | + "A common approach designed to address this problem is to rescale the step size for each iteration based on how far we think we are from the minimum. A simple model assumes that the gradient will be steep if we are far from the minimum, but shallow if we are already close. Therefore, we make our step size proportional to the local gradient magnitude.\n", |
| 74 | + "\n", |
| 75 | + "Instead of moving a fixed distance $\\Delta r$, we take a step proportional to the gradient:\n", |
43 | 76 | "\n", |
44 | 77 | "$$\\Delta r_i = -\\alpha U^\\prime(r_i)$$\n", |
45 | 78 | "\n", |
46 | | - "or \n", |
| 79 | + "or equivalently, proportional to the force:\n", |
47 | 80 | "\n", |
48 | 81 | "$$\\Delta r_i = \\alpha F(r_i)$$\n", |
49 | 82 | "\n", |
50 | | - "where $F(r_i)$ is the **force** (i.e., the negative gradient of the energy) at $r_i$.\n", |
| 83 | + "where:\n", |
| 84 | + "- $\\alpha$ is a constant called the **learning rate** or **step size parameter**\n", |
| 85 | + "- $F(r_i) = -U^\\prime(r_i)$ is the force at position $r_i$\n", |
51 | 86 | "\n", |
52 | | - "#### Exercise:\n", |
53 | | - "1. Write a new version of your steepest descent code that rescales the step to be proportional to the local force, with $\\alpha = 0.01$. You should write this as a loop that iteratively updates the current position, i.e.\n", |
| 87 | + "The update equation then becomes:\n", |
54 | 88 | "\n", |
55 | | - "$$r_{i+1} = r_i + \\alpha F(r_i).$$\n", |
| 89 | + "$$r_{i+1} = r_i - \\alpha U^\\prime(r_i)$$\n", |
56 | 90 | "\n", |
57 | | - "By combining a suitable `if` statement with `break`, have your code stop and report the predicted equilibrium bond-length when $U < \\left|0.001\\right|$.\n" |
| 91 | + "or equivalently:\n", |
| 92 | + "\n", |
| 93 | + "$$r_{i+1} = r_i + \\alpha F(r_i)$$\n", |
| 94 | + "\n", |
| 95 | + "With this adaptive approach:\n", |
| 96 | + "- When far from the minimum, $|U^\\prime|$ is large → we take large steps\n", |
| 97 | + "- When close to the minimum, $|U^\\prime|$ is small → we take small steps\n", |
| 98 | + "\n", |
| 99 | + "The parameter $\\alpha$ controls the overall \"aggressiveness\" of the optimisation. Too large and we overshoot; too small and we converge slowly." |
58 | 100 | ] |
59 | 101 | }, |
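For the harmonic potential, the effect of $\alpha$ can be worked out explicitly. The derivation below is a sketch that assumes $U^\prime(r) = k(r - r_0)$, with $r_0$ the equilibrium bond length, and writes $r_\mathrm{start}$ (a label introduced here for illustration) for the position at iteration $i = 0$. Substituting the gradient into the update equation gives

$$r_{i+1} - r_0 = r_i - r_0 - \alpha k (r_i - r_0) = (1 - \alpha k)(r_i - r_0)$$

so that after $i$ iterations

$$r_i - r_0 = (1 - \alpha k)^i \, (r_\mathrm{start} - r_0)$$

The distance from the minimum therefore shrinks only if $|1 - \alpha k| < 1$, i.e. $0 < \alpha < 2/k$. For $0 < \alpha < 1/k$ the position approaches the minimum from one side; for $1/k < \alpha < 2/k$ it overshoots on every step but the overshoot decays; for $\alpha > 2/k$ each step lands further from the minimum than the last. This result is specific to a harmonic potential; for other potentials it holds only approximately, close to the minimum.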
60 | 102 | { |
61 | 103 | "cell_type": "markdown", |
62 | | - "id": "182260eb-a629-414d-9b1a-a4587ed2326e", |
| 104 | + "id": "e84df851-bdbd-48d5-99a1-da0df6ea6034", |
63 | 105 | "metadata": {}, |
64 | 106 | "source": [ |
65 | | - "2. How does changing $\\alpha$ affect your rate of convergence? Experiment with larger and smaller values of $\\alpha$.\n", |
| 107 | + "#### Exercise: Adaptive Step Size Gradient Descent\n", |
| 108 | + "\n", |
| 109 | + "1. Write a new version of your gradient descent code that rescales the step to be proportional to the local force. You should write this as a loop that iteratively updates the current position:\n", |
| 110 | + "\n", |
| 111 | + "$$r_{i+1} = r_i + \\alpha F(r_i)$$\n", |
| 112 | + "\n", |
| 113 | + "where $F(r_i) = -U^\\prime(r_i) = -k(r_i - r_0)$.\n", |
| 114 | + "\n", |
| 115 | + "Your code should:\n", |
| 116 | + "- Start from $r = 1.0$ Å\n", |
| 117 | + "- Use a `for` loop with a **maximum of 50 iterations**\n", |
| 118 | + "- Update the position at each iteration\n", |
| 119 | + "- Print the iteration number, position, gradient, and step size taken at each iteration\n", |
| 120 | + "- Stop early (using `break`) when $|U^\\prime(r)| < 0.001$\n", |
| 121 | + "- Report the final predicted equilibrium bond length and number of iterations required\n", |
| 122 | + "\n", |
| 123 | + "**Part 1:** Start with $\\alpha = 0.01$. Does it converge? How many iterations does it take? Compare this to the fixed step size results.\n", |
| 124 | + "\n", |
| 125 | + "**Part 2:** Try $\\alpha = 0.001$. What happens to the convergence rate?\n", |
| 126 | + "\n", |
| 127 | + "**Part 3:** Try $\\alpha = 0.1$. Does the algorithm remain stable?\n", |
66 | 128 | "\n", |
67 | | - "_Note_: You might want to set a maximum number of iterations, e.g., 30." |
| 129 | + "**Questions to consider:**\n", |
| 130 | + "- How does the step size change as you approach the minimum? Compare the step sizes printed at successive iterations.\n", |
| 131 | + "- Why does the adaptive method not oscillate in the way the fixed step size method did with $\\Delta r = 0.1$ Å?\n", |
| 132 | + "- Compare $\\alpha = 0.01$ (adaptive) with $\\Delta r = 0.01$ Å (fixed): which converges faster and why?\n", |
| 133 | + "- What happens when $\\alpha$ is too large? Can you explain this behaviour?\n", |
| 134 | + "- Based on your experiments, what seems to be a good choice of $\\alpha$ for this problem?" |
68 | 135 | ] |
69 | 136 | }, |
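One possible shape for the adaptive step size loop is sketched below. As with any example here, it rests on assumptions: `K = 36.0` eV Å⁻² is an assumed force constant (this section does not state $k$), the names `harmonic_gradient` and `adaptive_descent` are hypothetical, and the potential is assumed to be $U(r) = \frac{1}{2}k(r - r_0)^2$.

```python
K = 36.0    # ASSUMED force constant / eV Å^-2 (not stated in this section)
R_0 = 0.74  # equilibrium bond length / Å (taken from the exercise text)


def harmonic_gradient(r, k=K, r_0=R_0):
    """Return U'(r) = k (r - r_0), assuming U(r) = 0.5 k (r - r_0)^2."""
    return k * (r - r_0)


def adaptive_descent(r_start, alpha, max_iter=50, tol=1e-3, verbose=False):
    """Gradient descent with a step proportional to the local force.

    Returns (r_final, n_steps, converged), where n_steps counts the
    position updates that were actually taken.
    """
    r = r_start
    n_steps = 0
    converged = False
    for i in range(max_iter):
        grad = harmonic_gradient(r)
        if abs(grad) < tol:
            converged = True
            break
        step = -alpha * grad  # equivalent to alpha * F(r), since F = -U'
        if verbose:
            print(f"{i:3d}  r = {r:.5f} Å  U' = {grad:+.5f}  step = {step:+.5f} Å")
        r = r + step
        n_steps += 1
    return r, n_steps, converged


for alpha in (0.01, 0.001, 0.1):
    r_final, n_steps, converged = adaptive_descent(1.0, alpha)
    status = "converged" if converged else "not converged"
    print(f"alpha = {alpha}: {status}, r = {r_final:.5g} Å after {n_steps} steps")
```

With the assumed $k$, the three values of $\alpha$ illustrate three regimes: steady convergence, convergence too slow to finish within 50 iterations, and steps so large that the position moves away from the minimum. A different $k$ would shift the boundaries between these regimes.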
70 | 137 | { |
71 | 138 | "cell_type": "code", |
72 | 139 | "execution_count": null, |
73 | | - "id": "5c131eda-e2e5-47c4-bfe2-92f9331b2df6", |
| 140 | + "id": "2100979f-a7ce-4818-b2fa-43bf4848966b", |
74 | 141 | "metadata": {}, |
75 | 142 | "outputs": [], |
76 | 143 | "source": [] |
|
92 | 159 | "name": "python", |
93 | 160 | "nbconvert_exporter": "python", |
94 | 161 | "pygments_lexer": "ipython3", |
95 | | - "version": "3.11.10" |
| 162 | + "version": "3.12.3" |
96 | 163 | } |
97 | 164 | }, |
98 | 165 | "nbformat": 4, |
|