Running Gradient Descent
This document describes the process of running gradient descent for linear regression, illustrating how the algorithm updates the model parameters and approaches the global minimum.
Introduction
Let's see what happens when you run gradient descent for linear regression and watch the algorithm in action.
Here's a plot of the model and data on the upper left, a contour plot of the cost function on the upper right, and, at the bottom, a surface plot of the same cost function.
Key Notes
Often w and b will both be initialized to 0, but for this demonstration, let's initialize w = -0.1 and b = 900. So this corresponds to f(x) = -0.1x + 900.
Now, if we take one step using gradient descent, we go from this point on the cost function out here to this point, just down and to the right. Notice that the straight line fit has also changed a bit.
Let's take another step. The cost has now moved to this third point, and again the function f(x) has also changed a bit. As you take more of these steps, the cost decreases at each update, and the parameters w and b follow this trajectory.
And if you look on the left, you get this corresponding straight line fit that fits the data better and better until we've reached the global minimum. The global minimum corresponds to this straight line fit, which is a relatively good fit to the data.
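To make the steps above concrete, here is a minimal sketch of the loop in Python. It assumes the usual squared-error cost for linear regression, and the two-point training set, the learning rate alpha, and the function names are all illustrative choices rather than values taken from the demonstration. Only the starting point w = -0.1, b = 900 comes from the text.

```python
def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = 1/(2m) * sum((w*x_i + b - y_i)^2) (assumed form)."""
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)


def compute_gradient(x, y, w, b):
    """Partial derivatives of the cost with respect to w and b, over all m examples."""
    m = len(x)
    dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
    return dj_dw, dj_db


def gradient_descent(x, y, w, b, alpha, num_iters):
    """Run num_iters updates and record (w, b, cost) after each one."""
    history = [(w, b, compute_cost(x, y, w, b))]
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        # Update both parameters simultaneously from the same gradient.
        w, b = w - alpha * dj_dw, b - alpha * dj_db
        history.append((w, b, compute_cost(x, y, w, b)))
    return w, b, history


# Hypothetical toy data: size in 1000s of square feet, price in $1000s.
x_train = [1.0, 2.0]
y_train = [300.0, 500.0]

# Start from the initialization used in the demonstration: w = -0.1, b = 900.
w_final, b_final, history = gradient_descent(
    x_train, y_train, w=-0.1, b=900.0, alpha=0.1, num_iters=1000
)
costs = [cost for _, _, cost in history]
print(f"initial cost: {costs[0]:.1f}, final cost: {costs[-1]:.4f}")
print(f"w = {w_final:.2f}, b = {b_final:.2f}")
```

The `history` list plays the role of the trajectory on the contour plot: each entry is one point (w, b) together with its cost, and with a small enough learning rate the cost should shrink from one entry to the next.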
Special Topic
Now, you can use this f(x) model to predict the price of your client's house or anyone else's house. For instance, if your friend's house size is 1250 square feet, you can now read off the value and predict that maybe they could get, I don't know, $250,000 for the house.
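Reading a value off the line is just evaluating f(x) = wx + b. The sketch below shows that arithmetic. The fitted parameters are placeholders, not the values from the demonstration: they were picked only so the result lands on the rough $250,000 figure mentioned above. The units (size in 1000s of square feet, price in $1000s) are also an assumption.

```python
def predict(x, w, b):
    """Linear model f(x) = w * x + b."""
    return w * x + b


# Placeholder parameters (hypothetical, NOT the fitted values from the plot).
w_fit, b_fit = 200.0, 0.0

# 1250 square feet, expressed in 1000s of square feet (assumed units).
size = 1.25
price_k = predict(size, w_fit, b_fit)  # price in $1000s

print(f"${price_k * 1000:,.0f}")  # prints $250,000
```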
Batch Gradient Descent
To be more precise, this gradient descent process is called batch gradient descent. The term batch gradient descent refers to the fact that on every step of gradient descent, we're looking at all of the training examples, instead of just a subset of the training data.
When computing the derivatives in gradient descent, the sum runs from i = 1 to m, so batch gradient descent looks at the entire batch of training examples at each update. I know that batch gradient descent may not be the most intuitive name, but this is what people in the machine learning community call it.
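Written out, the sums in question look like this. This assumes the squared-error cost J(w, b) = 1/(2m) Σ (f(x^(i)) - y^(i))^2, which is not restated in this part, so treat the exact form and the superscript notation for the i-th training example as assumptions:

```latex
\begin{aligned}
\frac{\partial J}{\partial w} &= \frac{1}{m} \sum_{i=1}^{m} \left( f(x^{(i)}) - y^{(i)} \right) x^{(i)} \\
\frac{\partial J}{\partial b} &= \frac{1}{m} \sum_{i=1}^{m} \left( f(x^{(i)}) - y^{(i)} \right)
\end{aligned}
```

Both sums range over all m training examples, which is exactly what makes each update a "batch" update.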
If you've heard of The Batch, the newsletter published by DeepLearning.AI, it was also named for this concept in machine learning.
Working
It turns out that there are other versions of gradient descent that do not look at the entire training set, but instead look at smaller subsets of the training data at each update step. But we'll use batch gradient descent for linear regression.
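For contrast only, here is a rough sketch of what looking at a smaller subset could mean, in the style often called mini-batch gradient descent. It is not something this part uses: the function name, the random sampling scheme, and the toy data are all assumptions, and it again assumes the squared-error cost.

```python
import random


def gradient_on_subset(x, y, w, b, batch_size, rng):
    """Estimate the cost gradient from a random subset of the training examples."""
    # Pick batch_size distinct example indices at random.
    idx = rng.sample(range(len(x)), batch_size)
    dj_dw = sum((w * x[i] + b - y[i]) * x[i] for i in idx) / batch_size
    dj_db = sum((w * x[i] + b - y[i]) for i in idx) / batch_size
    return dj_dw, dj_db


# Hypothetical toy data: size in 1000s of square feet, price in $1000s.
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [300.0, 500.0, 700.0, 900.0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# A subset of 2 examples gives a cheaper but noisier gradient estimate.
print(gradient_on_subset(x_demo, y_demo, 0.0, 0.0, batch_size=2, rng=rng))

# With batch_size equal to m, this reduces to the batch gradient.
full_gradient = gradient_on_subset(
    x_demo, y_demo, 0.0, 0.0, batch_size=len(x_demo), rng=rng
)
print(full_gradient)
```

Setting `batch_size` to the full number of examples recovers batch gradient descent, the version used here for linear regression.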
So that's it for linear regression. Congratulations on getting through your first machine learning model. I hope you go and celebrate or, I don't know, maybe take a nap in your hammock.
In the notebook that follows this part, you'll see a review of the gradient descent algorithm as well as how to implement it in code. You'll also see a plot that shows how the cost decreases as you continue training for more iterations. And you'll also see a contour plot, showing how the cost gets closer to the global minimum as gradient descent finds better and better values for the parameters w and b.
So remember that to do the notebook, you just need to read and run this code. You will not need to write any code yourself and I hope you take a few moments to do that.
Also, become familiar with the gradient descent code because this will help you to implement this and similar algorithms in the future yourself.
Conclusion
You now know how to implement linear regression with one variable, and that brings us to the close of this section. In the next section, you'll learn to make linear regression much more powerful. Instead of one feature like the size of a house, you will learn how to get it to work with lots of features. You'll also learn how to get it to fit nonlinear curves. These improvements will make the algorithm much more useful and valuable. Lastly, you'll also go over some practical tips that will really help in getting linear regression to work on practical applications.