Cubic example using gradient descent first
A cubic function can have two turning points. One is a local maximum and the other is a local minimum.
Gradient descent moves toward a minimum, so it will not give the maximum point.
Step 1 choose a cubic function
Let the function be
$$ J(x) = x^{3} - 3x $$
Step 2 write the gradient descent update rule
We use the update rule
$$ x_{\text{new}} = x_{\text{old}} - \alpha \frac{dJ}{dx} $$
For this example, take \(\alpha = 0.10\) and start at \(x_{0} = 2\).
Step 3 use the derivative only for the update
To run the update, we need the slope expression:
$$ \frac{dJ}{dx} = 3x^{2} - 3 $$
Step 4 calculate 10 iterations
Substitute into the update:
$$ x_{n+1} = x_{n} - 0.10(3x_{n}^{2} - 3) $$
| n |
xn |
3xn2 − 3 |
xn+1 = xn − 0.10(3xn2 − 3) |
| 0 | 2.000000 | 9.000000 | 1.100000 |
| 1 | 1.100000 | 0.630000 | 1.037000 |
| 2 | 1.037000 | 0.225107 | 1.014489 |
| 3 | 1.014489 | 0.087604 | 1.005729 |
| 4 | 1.005729 | 0.034482 | 1.002281 |
| 5 | 1.002281 | 0.013707 | 1.000910 |
| 6 | 1.000910 | 0.005464 | 1.000364 |
| 7 | 1.000364 | 0.002184 | 1.000145 |
| 8 | 1.000145 | 0.000874 | 1.000058 |
| 9 | 1.000058 | 0.000350 | 1.000023 |
| 10 | 1.000023 | 0.000140 | 0.999... (keeps moving to 1) |
What happened
Starting from \(x_{0} = 2\), the values moved toward \(x = 1\).
That point is the minimum for this cubic.
Now confirm using calculus (turning points)
Now we do the usual calculus method to see all turning points.
$$ \frac{dJ}{dx} = 3x^{2} - 3 $$
$$ 3x^{2} - 3 = 0 $$
$$ x^{2} = 1 $$
$$ x = -1,\ 1 $$
So the cubic has two stationary points: one at \(x = -1\) and one at \(x = 1\).
Gradient descent gave us only the \(x = 1\) point because it is the minimum side from our starting position.
Important note
Gradient descent gives only the minimum value.
That is why we did not get the other critical value \(x = -1\) from the iterations.
If you want the maximum
If you want the maximum of \(J(x)\), convert it into a minimum problem by defining
$$ g(x) = -J(x) $$
Minimizing \(g(x)\) is the same as maximizing \(J(x)\).