Cost Function with Regularization
In this part, we will build on the intuition from the previous section and develop a modified cost function for learning algorithms that apply regularization.
Introduction
In the last part, we saw that regularization tries to make the parameter values $W_1$ through $W_n$ small to reduce overfitting. Here, we'll build on that intuition and develop a modified cost function you can use to apply regularization effectively.
Recap: The Quadratic Fit
Let's jump in and recall the example from the previous part: fitting a quadratic function to the data provides a good fit, while fitting a high-order polynomial (for instance, a fourth-order one with parameters $W_1$ through $W_4$) leads to overfitting. But suppose we had a way to make the parameters $W_3$ and $W_4$ small, say close to 0. The high-order terms would then barely contribute, and the fitted curve would end up close to the quadratic.
Modifying the Cost Function
Let's modify the cost function by adding penalty terms for those parameters, say large multiples of their squares:

$$\min_{W, b} \; \frac{1}{2m} \sum_{i=1}^{m} \left( f_{W,b}(x^{(i)}) - y^{(i)} \right)^2 + 1000 \, W_3^2 + 1000 \, W_4^2$$
So instead of minimizing the original objective alone, you are penalizing the model whenever $W_3$ and $W_4$ are large. With this new cost function, the minimum is reached when $W_3$ and $W_4$ are both close to 0.
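Here is a minimal NumPy sketch of this idea (the function name, the feature layout, and the penalty weight of 1000 are illustrative assumptions, not code from the course):

```python
import numpy as np

def cost_with_penalty(X, y, w, b, penalty=1000.0):
    """Squared-error cost plus large penalties on w[2] and w[3]
    (i.e., W_3 and W_4), which pushes them toward zero when minimized.

    X: (m, n) feature matrix (e.g., columns x, x^2, x^3, x^4)
    y: (m,) targets; w: (n,) weights; b: scalar bias.
    """
    m = X.shape[0]
    error = X @ w + b - y                    # f(x^(i)) - y^(i) for every example
    fit_term = np.sum(error ** 2) / (2 * m)  # the usual squared-error cost
    return fit_term + penalty * w[2] ** 2 + penalty * w[3] ** 2
```

Minimizing this cost makes the fit term and the two penalty terms compete, and because the penalty weight is large, the cheapest way to lower the total is to shrink $W_3$ and $W_4$.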
Generalizing the Regularization Term
In practice, with many features we often don't know in advance which parameters matter, so we penalize all of them. The cost function becomes

$$J(W, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{W,b}(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} W_j^2$$

where $\lambda$ is the regularization parameter, determining the importance of the penalty relative to the fit term. By penalizing every $W_j$, you reduce overfitting by trimming away unnecessary complexity.
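As a sketch, the same cost in NumPy (the function name and array shapes are assumptions for illustration):

```python
import numpy as np

def regularized_cost(X, y, w, b, lambda_):
    """J(W, b): squared-error term plus (lambda / 2m) * sum of W_j^2.

    X: (m, n) feature matrix; y: (m,) targets;
    w: (n,) weights; b: scalar bias; lambda_: regularization strength.
    """
    m = X.shape[0]
    fit_term = np.sum((X @ w + b - y) ** 2) / (2 * m)
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)  # penalizes every W_j, not b
    return fit_term + reg_term
```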
Balancing Regularization
Let's penalize all the parameters $W_1$ to $W_{100}$, and optionally $b$ as well; in practice, including $b$ makes little difference, and it is commonly left out. A common practice is to scale the regularization term by dividing $\lambda$ by $2m$, where $m$ is the training set size. Since the squared-error term is divided by $2m$ in the same way, both terms stay on a comparable scale, which makes it more likely that the same $\lambda$ keeps working even if the training set grows larger.
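A small numeric check makes this concrete (synthetic data and values chosen purely for illustration): the raw sum of squared errors grows roughly linearly with $m$, so an unscaled penalty would be drowned out on a larger training set, while dividing both terms by $2m$ keeps them on per-example scales.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
w, b, lambda_ = w_true + 0.1, 0.3, 1.0  # slightly-off weights, fixed lambda

for m in (50, 5000):
    X = rng.normal(size=(m, 3))
    y = X @ w_true + b + rng.normal(scale=0.5, size=m)
    raw_error = np.sum((X @ w + b - y) ** 2)   # grows roughly linearly with m
    fit_term = raw_error / (2 * m)             # stays on a stable scale
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)
    print(f"m={m}: raw error={raw_error:.1f}, fit term={fit_term:.3f}, "
          f"penalty term={reg_term:.5f}")
```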
Choosing the Right Lambda Value
Choosing $\lambda$ is crucial. If $\lambda = 0$, the penalty vanishes and the model overfits, just as before. If $\lambda$ is very large (say $10^{10}$), minimizing the cost forces every $W_j$ close to 0, leaving roughly $f(x) \approx b$, a horizontal line that underfits. The goal is to find a middle ground that balances the two terms.
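Both failure modes show up directly in the two terms of the cost. In this standalone sketch (hypothetical values throughout), $\lambda = 0$ removes the penalty entirely, while an enormous $\lambda$ makes the penalty dwarf the fit term, so minimizing $J$ would drive every $W_j$ toward 0:

```python
import numpy as np

def cost_terms(X, y, w, b, lambda_):
    """Return the fit term and the regularization term of J separately."""
    m = X.shape[0]
    fit_term = np.sum((X @ w + b - y) ** 2) / (2 * m)
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)
    return fit_term, reg_term

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
w, b = np.array([10.0, -3.0, 4.0, 2.5]), 1.0
y = X @ w + b + rng.normal(scale=0.2, size=50)

for lam in (0.0, 1.0, 1e10):
    fit, reg = cost_terms(X, y, w, b, lam)
    print(f"lambda={lam:g}: fit term={fit:.3f}, penalty term={reg:.3g}")
# lambda=0:    no penalty at all, so nothing discourages large W_j (overfitting risk)
# lambda=1e10: the penalty dwarfs the fit term, so minimizing J forces W_j toward 0 (underfits)
```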
Conclusion
Regularization helps by striking a balance between fitting the training data and keeping the parameters small to avoid overfitting. In the next section, we will explore how to apply regularization to linear and logistic regression.