Regularized Logistic Regression

Introduction

In this part, we will see how to implement regularized logistic regression. Similar to regularized linear regression, we will modify logistic regression to prevent overfitting. Overfitting happens when the model is too complex and captures noise in the data, which leads to poor generalization to unseen examples.

Logistic Regression and Overfitting

We saw earlier that logistic regression can be prone to overfitting, especially when using high-order polynomial features.

In particular, fitting logistic regression with many features can lead to a complex decision boundary, which risks overfitting the training set. A simpler decision boundary that generalizes better is preferable.

Regularized Cost Function

We modify the cost function for logistic regression by adding a regularization term:

$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(f^{(i)}) + (1 - y^{(i)}) \log(1 - f^{(i)}) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$

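Here, m is the number of training examples, n is the number of features, f^(i) is the model’s prediction for example i, and λ is the regularization parameter. The added term penalizes large weights, so the larger λ is, the more strongly the weights are pushed toward zero. The bias b is not part of the penalty.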

Minimizing this regularized cost function discourages the parameters w_j from becoming too large, which tends to produce a simpler decision boundary that generalizes better to new examples.
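As a rough illustration, the regularized cost above can be sketched in NumPy. The function names `sigmoid` and `regularized_cost`, the argument name `lambda_`, the small clipping constant, and the toy data are assumptions made for this example and are not part of the original material.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(X, y, w, b, lambda_=1.0):
    """Cross-entropy cost plus (lambda / 2m) * sum_j w_j^2.

    X: (m, n) feature matrix, y: (m,) labels in {0, 1},
    w: (n,) weights, b: scalar bias (not regularized).
    """
    m = X.shape[0]
    f = sigmoid(X @ w + b)
    # Assumption for numerical safety: keep f away from exactly 0 or 1
    # so that log() never receives 0.
    f = np.clip(f, 1e-12, 1 - 1e-12)
    cross_entropy = -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))
    penalty = (lambda_ / (2 * m)) * np.sum(w ** 2)
    return cross_entropy + penalty

# Toy data, made up for illustration only.
X_toy = np.array([[0.5, 1.5], [1.0, 1.0], [3.0, 0.5], [2.0, 2.0]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])
w_toy = np.array([1.0, -0.5])

print(regularized_cost(X_toy, y_toy, w_toy, b=-1.0, lambda_=0.0))
print(regularized_cost(X_toy, y_toy, w_toy, b=-1.0, lambda_=1.0))
```

For the same w and b, the two printed values differ only by the penalty term, which grows with both λ and the squared size of the weights.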

Gradient Descent for Regularized Logistic Regression

Just like in regularized linear regression, we can update the weights and bias using gradient descent:

$$w_j := w_j - \alpha \left( \frac{1}{m} \sum_{i=1}^{m} (f^{(i)} - y^{(i)}) x_j^{(i)} + \frac{\lambda}{m} w_j \right)$$

$$b := b - \alpha \left( \frac{1}{m} \sum_{i=1}^{m} (f^{(i)} - y^{(i)}) \right)$$

Note that the update rule for b remains unchanged, as we do not regularize b.
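The two update rules can be sketched as a single vectorized gradient step. This is a minimal sketch under assumptions: the names `regularized_gradient` and `gradient_step`, the default values of `alpha` and `lambda_`, and the tiny dataset are illustrative choices, not taken from the original material.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def regularized_gradient(X, y, w, b, lambda_=1.0):
    """Gradients of the regularized cost with respect to w and b."""
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y                  # f^(i) - y^(i), shape (m,)
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w   # includes the (lambda/m) * w_j term
    dj_db = np.mean(err)                          # no regularization term for b
    return dj_dw, dj_db

def gradient_step(X, y, w, b, alpha=0.1, lambda_=1.0):
    """One simultaneous update of all w_j and of b."""
    dj_dw, dj_db = regularized_gradient(X, y, w, b, lambda_)
    return w - alpha * dj_dw, b - alpha * dj_db

# Tiny made-up example: one step starting from zero weights.
X_toy = np.array([[1.0, 2.0], [3.0, 4.0]])
y_toy = np.array([0.0, 1.0])
w_new, b_new = gradient_step(X_toy, y_toy, np.zeros(2), 0.0)
print(w_new, b_new)
```

Both gradients are computed from the current w and b before either is changed, so the parameters are updated simultaneously.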

Implementing Regularized Logistic Regression

To implement regularized logistic regression, apply the gradient descent update rules above. They have the same form as the updates for regularized linear regression; the only difference is that the prediction f^(i) is now the sigmoid of z^(i) = w · x^(i) + b, rather than z^(i) itself:

$$f^{(i)} = \frac{1}{1 + e^{-z^{(i)}}}$$
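Putting the pieces together, a complete training loop might look like the sketch below. The function name `fit_regularized_logistic`, the zero initialization, the hyperparameter defaults, and the six-point toy dataset are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_regularized_logistic(X, y, alpha=0.1, lambda_=1.0, num_iters=1000):
    """Batch gradient descent for regularized logistic regression.

    Returns the learned weights w (shape (n,)) and bias b.
    """
    m, n = X.shape
    w = np.zeros(n)   # assumption: start from zero weights
    b = 0.0
    for _ in range(num_iters):
        err = sigmoid(X @ w + b) - y                 # f^(i) - y^(i)
        dj_dw = (X.T @ err) / m + (lambda_ / m) * w  # regularized gradient for w
        dj_db = np.mean(err)                         # b is not regularized
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Made-up, linearly separable toy data.
X_train = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
                    [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y_train = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w_plain, b_plain = fit_regularized_logistic(X_train, y_train, lambda_=0.0)
w_reg, b_reg = fit_regularized_logistic(X_train, y_train, lambda_=1.0)

# Compare the size of the learned weights with and without regularization.
print(np.linalg.norm(w_plain), np.linalg.norm(w_reg))
```

On data like this, the weight vector learned with λ > 0 should come out smaller in norm than the unregularized one, which is the effect regularization is meant to have.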

Intuition for Regularization

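Rearranging the update rule for w_j shows what the regularization term does: on every iteration of gradient descent, each weight is first multiplied by a factor slightly less than 1, and then the usual (unregularized) gradient step is applied: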

$$w_j := w_j \cdot \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \cdot \left( \text{gradient term} \right)$$

This helps logistic regression generalize better to new examples.
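A quick numerical check can confirm that the shrink form is just a rearrangement of the original update. The helper names and the values chosen for `alpha`, `lambda_`, and `m` are hypothetical, and `grad_unreg` stands in for the unregularized gradient term.

```python
import numpy as np

def update_standard(w, grad_unreg, alpha, lambda_, m):
    """w - alpha * (unregularized gradient + (lambda / m) * w)."""
    return w - alpha * (grad_unreg + (lambda_ / m) * w)

def update_shrink_form(w, grad_unreg, alpha, lambda_, m):
    """w * (1 - alpha * lambda / m) - alpha * unregularized gradient."""
    return w * (1 - alpha * lambda_ / m) - alpha * grad_unreg

# Arbitrary illustrative values.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
grad_unreg = rng.normal(size=5)
alpha, lambda_, m = 0.01, 1.0, 50

a = update_standard(w, grad_unreg, alpha, lambda_, m)
b = update_shrink_form(w, grad_unreg, alpha, lambda_, m)

print(np.allclose(a, b))            # the two forms agree up to rounding
print(1 - alpha * lambda_ / m)      # per-iteration shrink factor
```

With these values the shrink factor is roughly 0.9998, so each step pulls the weights only slightly toward zero, but the effect accumulates over many iterations.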

Final Thoughts

By now, you’ve learned how to implement regularized logistic regression to reduce overfitting, even with a large number of features. The upcoming labs are a good place to practice applying it.

Congratulations on reaching the end of this section! There’s much more to learn, and in the next part, we’ll explore Neural Networks and their fascinating applications in Deep Learning.
