Gradient Descent for Multiple Linear Regression
Explanation and implementation of gradient descent for multiple linear regression using vectorization.
Introduction
You've learned about gradient descent, multiple linear regression, and vectorization. Let's put it all together to implement gradient descent for multiple linear regression with vectorization. This will be exciting.
Let's quickly review what multiple linear regression looks like. Using our previous notation, we can write it more succinctly with vector notation. We have parameters $w_1$ through $w_n$ as well as $b$. But instead of thinking of $w_1$ through $w_n$ as separate numbers, let's collect all of them into a vector $W$, so $W$ is a vector of length $n$. We will think of the parameters as a vector $W$ and a scalar $b$. Previously, we defined multiple linear regression as:

$$f_{w_1, \dots, w_n, b}(x) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$$

Now, using vector notation:

$$f_{W, b}(x) = W \cdot x + b$$

The dot here refers to the dot product.
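To make this concrete, here is a minimal sketch of the vectorized prediction in NumPy; the variable names and the example values are purely illustrative assumptions, not taken from any particular library or dataset:

```python
import numpy as np

# Illustrative parameter vector (length n = 3) and bias
w = np.array([0.5, -1.2, 3.0])
b = 4.0

# One training example with n = 3 features
x = np.array([10.0, 2.0, 1.5])

# Vectorized prediction: f_{W,b}(x) = W . x + b
f_wb = np.dot(w, x) + b
print(f_wb)  # 0.5*10 - 1.2*2 + 3.0*1.5 + 4.0 = 11.1
```

The single `np.dot` call replaces an explicit loop over the $n$ features, which is the point of vectorization.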
Cost Function and Gradient Descent
Our cost function can be defined as:

$$J(w_1, \dots, w_n, b)$$

But instead of thinking of $J$ as a function of multiple parameters $w_1$ through $w_n$ and $b$, we write it more compactly as:

$$J(W, b)$$

Where $W$ is the vector of parameters and $b$ is a scalar.
Here's what gradient descent looks like. We repeatedly update each parameter $w_j$ as:

$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(W, b)$$

And for $b$:

$$b = b - \alpha \frac{\partial}{\partial b} J(W, b)$$

The partial derivatives represent the gradient of the cost function with respect to $w_j$ and $b$, and $\alpha$ is the learning rate.
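As a rough sketch of what those partial derivatives look like when computed with vectorized NumPy, here is one way to write a gradient routine; the name `compute_gradient` and the array shapes are illustrative assumptions:

```python
import numpy as np

def compute_gradient(X, y, w, b):
    """Partial derivatives of J(W, b) with respect to each w_j and b (sketch).

    X: (m, n) matrix of training examples
    y: (m,) vector of targets
    w: (n,) parameter vector, b: scalar bias
    """
    m = X.shape[0]
    errors = (X @ w + b) - y         # f_{W,b}(x^(i)) - y^(i) for every example i
    dj_dw = (X.T @ errors) / m       # vector of dJ/dw_j, one entry per feature j
    dj_db = np.sum(errors) / m       # dJ/db
    return dj_dw, dj_db
```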
Implementing Gradient Descent
With multiple features, gradient descent looks slightly different. Here's what we had with one feature:

$$w = w - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

$$b = b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$$

We had an update rule for $w$ and a separate update rule for $b$.

Now, with $n$ features, we have:

$$w_j = w_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{W,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

For each $j = 1, 2, \dots, n$.

Similarly, we update $b$ as:

$$b = b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{W,b}(x^{(i)}) - y^{(i)} \right)$$

This is how gradient descent works for multiple features.
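Putting it together, here is a minimal, self-contained sketch of the full loop. The function name `gradient_descent`, the parameters `alpha` and `num_iters`, and the tiny dataset are all illustrative assumptions; the gradient is computed inline exactly as in the formulas above:

```python
import numpy as np

def gradient_descent(X, y, w, b, alpha, num_iters):
    """Run gradient descent for multiple linear regression (sketch).

    X: (m, n) training matrix, y: (m,) targets,
    w: (n,) initial parameters, b: initial bias,
    alpha: learning rate, num_iters: number of update steps.
    """
    m = X.shape[0]
    for _ in range(num_iters):
        errors = (X @ w + b) - y          # prediction error for every example
        dj_dw = (X.T @ errors) / m        # dJ/dw_j for j = 1, ..., n
        dj_db = np.sum(errors) / m        # dJ/db
        # Simultaneously update all w_j and b
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Illustrative usage with a tiny made-up dataset
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])
y = np.array([5.0, 4.0, 7.0])
w, b = gradient_descent(X, y, np.zeros(2), 0.0, alpha=0.05, num_iters=1000)
print(w, b)
```

Note that `X.T @ errors` updates every $w_j$ at once, so no explicit loop over the features is needed.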
Normal Equation
An alternative to gradient descent is the normal equation, which solves for $W$ and $b$ directly using linear algebra. The normal equation is:

$$W = (X^T X)^{-1} X^T y$$

where $X$ is the matrix of training examples (with a column of ones appended so that $b$ is folded into $W$) and $y$ is the vector of targets.
This method does not require iterations but can be computationally expensive when the number of features is large.
Although it does not generalize to other learning algorithms the way gradient descent does, some machine learning libraries might use this method in the backend for linear regression.
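For comparison, here is a sketch of solving the normal equation in NumPy; the column-of-ones trick folds $b$ into the parameter vector, and the dataset and variable names are illustrative assumptions:

```python
import numpy as np

# Tiny made-up dataset: m = 3 examples, n = 2 features
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])
y = np.array([5.0, 4.0, 7.0])

# Append a column of ones so the bias b becomes one more parameter
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Normal equation: theta = (X^T X)^(-1) X^T y, solved as a linear system
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
w, b = theta[:-1], theta[-1]
print(w, b)
```

Using `np.linalg.solve` on the linear system is generally preferred over forming the matrix inverse explicitly, but both implement the same closed-form solution.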