Intro:
Gradient descent is a first-order optimisation algorithm used for finding a local minimum of a real-valued function.
Usually the function is multivariate, $f: \mathbb{R}^n \to \mathbb{R}$, where $n$ is the number of input variables.
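To make the procedure concrete, here is a minimal sketch of gradient descent in Python. The function name, the fixed step-size, and the stopping rule are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def gradient_descent(grad_f, x0, step_size=0.1, tol=1e-8, max_iters=1000):
    """Minimal gradient descent: repeatedly step against the gradient.

    grad_f    -- callable returning the gradient of f at a point
    x0        -- starting point (array-like)
    step_size -- fixed step-size, assumed small enough for convergence
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:   # gradient ~ 0: (local) minimum reached
            break
        x = x - step_size * g         # step in the direction of steepest descent
    return x

# For f(x, y) = x^2 + y^2 the gradient is (2x, 2y) and the minimum is the origin:
print(gradient_descent(lambda v: 2 * v, [1.0, 2.0]))  # ~[0., 0.]
```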
Examples of Multivariate Functions
- **Two Variables:** $f(x, y) = x^2 + y^2$. This function takes two input variables, $x$ and $y$, and produces a single output which is the sum of the squares of the inputs.
- **Three Variables:** $f(x, y, z) = xy + yz + zx$. This function takes three input variables, $x$, $y$, and $z$, and produces a single output which is the sum of the products of the pairs of inputs (both functions appear in the sketch below).
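Both examples can be written directly in Python; the variable names $x$, $y$, $z$ are the conventional choices assumed here:

```python
def f_two(x, y):
    # Sum of the squares of the inputs: f(x, y) = x^2 + y^2
    return x**2 + y**2

def f_three(x, y, z):
    # Sum of the products of the pairs of inputs: f(x, y, z) = xy + yz + zx
    return x*y + y*z + z*x

print(f_two(1.0, 2.0))         # 1 + 4 = 5.0
print(f_three(1.0, 2.0, 3.0))  # 2 + 6 + 3 = 11.0
```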
The gradient at a point points in the direction of steepest ascent.[^1]
Gradient descent relies on the fact that the negative gradient, $-(\nabla f(x_0))^T$, points in the direction of steepest descent, so moving a small step along it decreases the value of $f$.
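Written out as an update rule (with $\gamma$ denoting the step-size, as in the example below):

$$
x_{k+1} = x_k - \gamma\,(\nabla f(x_k))^T, \qquad \gamma > 0,
$$

so that, for sufficiently small $\gamma$, $f(x_{k+1}) \leq f(x_k)$.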
Detailed Explanation
- $\nabla f$: This symbol represents the gradient of the function $f$. The gradient is a vector of the partial derivatives of $f$ with respect to its input variables. If $f$ is a function of $n$ variables, $\nabla f$ is a vector with $n$ components.
- $\nabla f(x_0)$: This means that the gradient is evaluated at the specific point $x_0$. The point $x_0$ is in the domain of $f$.
- $(\nabla f(x_0))^T$: The symbol $^T$ denotes the transpose of the gradient vector. If the gradient is originally a column vector, its transpose will be a row vector, and vice versa.
- $-(\nabla f(x_0))^T$: The minus sign indicates that we are considering the negative of the transposed gradient vector evaluated at $x_0$ (mirrored numerically in the sketch below).
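A short numerical sketch that mirrors these symbols; the central-difference helper below is an illustrative assumption, not a library routine:

```python
import numpy as np

def numerical_gradient(f, x0, h=1e-6):
    """Approximate the gradient of f at x0 by central differences.

    Returns one partial derivative per input variable, matching the
    definition of the gradient as a vector of partial derivatives.
    """
    x0 = np.asarray(x0, dtype=float)
    grad = np.zeros_like(x0)
    for i in range(x0.size):
        e = np.zeros_like(x0)
        e[i] = h
        grad[i] = (f(x0 + e) - f(x0 - e)) / (2 * h)  # partial w.r.t. the i-th variable
    return grad

f = lambda v: v[0]**2 + v[1]**2        # f(x, y) = x^2 + y^2
g = numerical_gradient(f, [1.0, 2.0])  # gradient evaluated at x0 = (1, 2)
print(g, -g)                           # ~[2., 4.] and its negative ~[-2., -4.]
```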
Example
Consider a function $f(x, y) = x^2 + y^2$.

- **Gradient Calculation:** $\nabla f = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right] = [2x, 2y]$.
- **Evaluating at $x_0 = (1, 2)$:** $\nabla f(1, 2) = [2, 4]$.
- **Transposing the Gradient:** Transposition changes only the orientation of the vector, not its entries: often the gradient is considered a column vector of partial derivatives, transposed to a row vector, so $(\nabla f(1, 2))^T = [2, 4]$.
- **Negative Transposed Gradient:** $-(\nabla f(1, 2))^T = [-2, -4]$.

This negative gradient is used to update the current point $x_0$: if, for a small step-size $\gamma > 0$, we set $x_1 = x_0 - \gamma\,(\nabla f(x_0))^T$, then $f(x_1) \leq f(x_0)$.
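The arithmetic of the example can be checked in a few lines of Python; the step-size $\gamma = 0.1$ is an arbitrary illustrative choice:

```python
import numpy as np

def grad_f(v):
    # Gradient of f(x, y) = x^2 + y^2 is [2x, 2y]
    return np.array([2 * v[0], 2 * v[1]])

x0 = np.array([1.0, 2.0])
g = grad_f(x0)               # [2., 4.]
gamma = 0.1                  # small step-size (illustrative choice)
x1 = x0 - gamma * g          # one update: [0.8, 1.6]

print(x1, np.sum(x1**2) < np.sum(x0**2))  # f decreased: 3.2 < 5.0
```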
[^1]: Mathematics for Machine Learning, Section 5.1, https://yung-web.github.io/home/courses/mathml.html