Welcome back. In this part, you'll learn to make linear regression much faster and much more powerful.
Let's start by looking at the version of linear regression that uses not just one feature, but multiple features.
In the original version of linear regression, you had a single feature x, the size of the house, and you were able to predict y, the price of the house. The model was:
$$f_{w,b}(x) = wx + b$$
But now, what if you had not only the size of the house as a feature, but also the number of bedrooms, the number of floors, and the age of the home in years? This would give you much more information to predict the price.
To handle these, we'll introduce some new notation. We'll use the variables X_1, X_2, X_3 and X_4 to represent the four features.
More generally, we'll write X_j for the j-th feature, where j runs from 1 to 4 since there are 4 features. We'll use lowercase n to denote the total number of features, so in this example n = 4.
As before, we'll use X^{(i)} to denote the i-th training example. It is now a list of 4 numbers: a vector containing all the features of the i-th training example. (The superscript (i) is an index into the training set, not an exponent.) For example, X^{(2)} is the feature vector of the second training example: [1416, 3, 2, 40]. To refer to a specific feature j of the i-th example, we write X^{(i)}_j, so:
$$X^{(2)}_3 = 2$$
This refers to the number of floors in the second training example.
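In code, a training set like this is commonly stored as a list of rows, one row per example. Python lists count from 0 while the notation above counts from 1, so X^{(i)}_j corresponds to `X_train[i - 1][j - 1]`. The sketch below assumes that layout; only the second row comes from the text, the first row is made up for illustration, and the helper name `feature` is hypothetical.

```python
# Each row is one training example: [size (sq ft), bedrooms, floors, age (years)].
X_train = [
    [2000, 4, 1, 30],   # hypothetical placeholder values, for illustration only
    [1416, 3, 2, 40],   # X^(2), the second training example from the text
]

n = len(X_train[0])  # number of features per example


def feature(X, i, j):
    """Hypothetical helper: return X^(i)_j using the 1-based indices of the math notation."""
    return X[i - 1][j - 1]


print(n)                        # 4
print(X_train[1])               # the whole vector X^(2): [1416, 3, 2, 40]
print(feature(X_train, 2, 3))   # X^(2)_3, the number of floors: 2
```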
Now that we have multiple features, the model can be defined as follows:
$$f_{w,b}(X) = w_1 X_1 + w_2 X_2 + w_3 X_3 + w_4 X_4 + b$$
For example, a housing price prediction model might estimate the price as:
$$f_{w,b}(X) = 0.1 X_1 + 4 X_2 + 10 X_3 - 2 X_4 + 80$$
If the model predicts prices in thousands of dollars, this means the base price is 80,000 dollars, and the price increases by 100 dollars for each additional square foot, by 4,000 dollars for each bedroom and by 10,000 dollars for each floor, and decreases by 2,000 dollars for every year of the home's age.
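As a worked check, here is the example model applied to the second training example, X^{(2)} = [1416, 3, 2, 40]. The weights and bias are the ones given above; the function name `predict_price` is only for illustration. The four terms are 0.1 × 1416 = 141.6, 4 × 3 = 12, 10 × 2 = 20 and -2 × 40 = -80, and adding the bias of 80 gives 173.6, i.e. a predicted price of roughly 173,600 dollars.

```python
# Weights and bias of the example model; predictions are in thousands of dollars.
w = [0.1, 4, 10, -2]   # per square foot, per bedroom, per floor, per year of age
b = 80


def predict_price(x):
    """Example model: 0.1*x1 + 4*x2 + 10*x3 - 2*x4 + 80 (thousands of dollars)."""
    return w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + w[3] * x[3] + b


x2 = [1416, 3, 2, 40]        # second training example
price = predict_price(x2)    # 141.6 + 12 + 20 - 80 + 80
print(round(price, 1))       # 173.6
```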
In general, if you have n features, the model becomes:
$$f_{w,b}(X) = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n + b$$
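For an arbitrary number of features n, the prediction is a sum of n weighted features plus the bias b. A minimal sketch with an explicit loop over j, written to mirror the 1-based notation (Python indexes from 0, so entry j - 1 holds feature j); the name `predict` is illustrative.

```python
def predict(x, w, b):
    """Compute f_{w,b}(x) = w_1*x_1 + ... + w_n*x_n + b with an explicit loop."""
    n = len(w)  # number of features
    if len(x) != n:
        raise ValueError("x and w must have the same number of features")
    f = b
    for j in range(1, n + 1):      # j = 1, ..., n as in the math notation
        f += w[j - 1] * x[j - 1]   # add the j-th term w_j * x_j
    return f


# Works for any n; here n = 2 with made-up values: 3*1 + 5*2 + 1 = 14
print(predict([1, 2], [3, 5], 1))  # 14
```

With n = 4 and the example weights, `predict([1416, 3, 2, 40], [0.1, 4, 10, -2], 80)` reproduces the 173.6 computed from the example model.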
To make this more compact, we define W as the vector of weights [w_1, w_2, ..., w_n] and X as the vector of features [X_1, X_2, ..., X_n]; the bias b remains a single number. This lets us rewrite the model using a dot product:
$$f_{w,b}(X) = W \cdot X + b$$
The dot product W · X is calculated as:
$$W \cdot X = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n$$
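The dot product is straightforward to write directly: multiply matching entries of the two vectors and add the products. The sketch below (function names are illustrative) checks that W · X + b gives the same prediction as writing out every term. In practice NumPy's `np.dot` computes the same quantity; plain Python is used here only to keep the example dependency-free.

```python
def dot(w, x):
    """Dot product w . x = w_1*x_1 + w_2*x_2 + ... + w_n*x_n."""
    if len(w) != len(x):
        raise ValueError("vectors must have the same length")
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def f(x, w, b):
    """Multiple linear regression in vector form: f_{w,b}(x) = w . x + b."""
    return dot(w, x) + b


w = [0.1, 4, 10, -2]
b = 80
x = [1416, 3, 2, 40]

written_out = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + w[3] * x[3] + b
print(abs(f(x, w, b) - written_out) < 1e-9)   # True
```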
This model, which uses multiple input features, is called multiple linear regression, and vector notation lets us write it compactly.
In this section, we explored the concept of multiple linear regression, where multiple features are used to predict the target variable. We introduced vector notation and showed how it simplifies the representation of the model. Next, we will see how vectorization can speed up the implementation of such algorithms.