
Multiple Features

A detailed explanation of linear regression with multiple features.

Introduction

Welcome back. In this part, you'll learn to make linear regression much faster and much more powerful.

Let's start by looking at the version of linear regression that uses not just one feature, but multiple features.

In the original version of linear regression, you had a single feature x, the size of the house, and you were able to predict y, the price of the house. The model was:

f(w,b)(x) = w * x + b
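
In code, this single-feature model is just a one-line function. A minimal sketch in Python (the weight, bias, and house size below are illustrative values, not from the text):

```python
def predict_single(x, w, b):
    """Single-feature linear regression: f(w,b)(x) = w * x + b."""
    return w * x + b

# Illustrative parameters: price (in thousands) per square foot, plus a base price
price = predict_single(1200, 0.1, 80)  # 0.1 * 1200 + 80 = 200.0
```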

But now, what if you had not only the size of the house as a feature, but also the number of bedrooms, the number of floors, and the age of the home in years? This would give you much more information to predict the price.

Multiple Features in Linear Regression

To introduce a bit of new notation, we will use the variables X_1, X_2, X_3, and X_4 to represent the four features. For simplicity, let's introduce a bit more notation:

We'll write X_j to represent the j-th feature, where j goes from 1 to 4, since we have 4 features. We will use lowercase n to denote the total number of features, so in this example, n = 4.

Notation for Features, Training Examples, and Parameters

As before, we'll use X^i to denote the i-th training example, which is now a vector containing all four features of that example. For example, X^2 is the feature vector of the second training example: [1416, 3, 2, 40]. To refer to a specific feature of a specific example, we write X^i_j, so:

X^2_3 = 2

This refers to the number of floors in the second training example.
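These indexing conventions map naturally onto a 2-D NumPy array, where each row is one training example. A small sketch (the second row uses the example values from the text; the first row is made up for illustration):

```python
import numpy as np

# Each row is one training example: [size, bedrooms, floors, age]
X = np.array([
    [2104, 5, 1, 45],   # illustrative first example
    [1416, 3, 2, 40],   # second training example from the text
])

# Python arrays are 0-indexed, so X^2 is row 1 and X^2_3 is column index 2
x2 = X[1]         # feature vector of the second example
floors = X[1, 2]  # X^2_3: number of floors in the second example
```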

Now that we have multiple features, the model can be defined as follows:

f(w,b)(X) = w_1 * X_1 + w_2 * X_2 + w_3 * X_3 + w_4 * X_4 + b

For example, a housing price prediction model might estimate the price as:

0.1 * X_1 + 4 * X_2 + 10 * X_3 - 2 * X_4 + 80

If the price is predicted in thousands of dollars, this means the base price starts at $80,000, then increases by $100 for each additional square foot, $4,000 per bedroom, and $10,000 per floor, and decreases by $2,000 for every year of age.
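Plugging the features of a specific house into this illustrative model shows how the prediction is assembled term by term. A sketch using the example weights above and the feature values [1416, 3, 2, 40] from earlier in the text:

```python
# Example weights, with prices in thousands of dollars
w = [0.1, 4, 10, -2]   # per sq ft, per bedroom, per floor, per year of age
b = 80                 # base price

x = [1416, 3, 2, 40]   # size, bedrooms, floors, age

price = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
# 141.6 + 12 + 20 - 80 + 80, i.e. roughly $173,600
```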

In general, if you have n features, the model becomes:

f(w,b)(X) = w_1 * X_1 + w_2 * X_2 + ... + w_n * X_n + b
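
A direct translation of this general formula into code is a loop over the n features (a minimal sketch; `x` and `w` are plain Python lists of length n):

```python
def predict(x, w, b):
    """f(w,b)(X) = w_1 * X_1 + ... + w_n * X_n + b, accumulated one term at a time."""
    f = b
    for j in range(len(x)):
        f += w[j] * x[j]
    return f

# e.g. with two features: 3*1 + 4*2 + 5 = 16
```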

Generalization to Vector Form

To make this more compact, we define W as a vector of the weights [w_1, w_2, ..., w_n] and X as a vector of the features [X_1, X_2, ..., X_n]. This allows us to rewrite the model as a dot product:

f(w,b)(X) = W · X + b

The dot product W · X is calculated as:

w_1 * X_1 + w_2 * X_2 + ... + w_n * X_n

Thus, the model can be written compactly using vectors. This model is known as multiple linear regression because it uses multiple input features.
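The same computation can be written as a one-liner with NumPy's dot product, which is equivalent to summing the terms explicitly (a sketch; the weights and features reuse the earlier example values):

```python
import numpy as np

def predict_dot(x, w, b):
    """f(w,b)(X) = W · X + b via NumPy's dot product."""
    return np.dot(w, x) + b

w = np.array([0.1, 4, 10, -2])
x = np.array([1416, 3, 2, 40])
b = 80
price = predict_dot(x, w, b)  # ≈ 173.6
```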

Conclusion

In this section, we explored the concept of multiple linear regression, where multiple features are used to predict the target variable. We introduced vector notation and showed how it simplifies the representation of the model. Next, we will see how vectorization can speed up the implementation of such algorithms.
