In the last part, we explored the loss function and the cost function for logistic regression. In this part, you'll learn a slightly simpler way to express these functions, which will make the implementation smoother when we get to gradient descent for fitting the parameters of a logistic regression model.
Let’s begin by revisiting the loss function from the previous section.
Since we're dealing with a binary classification problem, where y can only take the values 0 or 1, we can rewrite the loss function in a simpler form. This simpler version still accurately reflects the relationship between the prediction and the target label.
Here’s how the simplified loss function looks:
Given a prediction f(x) and a target label y, the loss can be defined as:
Loss = −[ y ∗ log(f(x)) + (1 − y) ∗ log(1 − f(x)) ]
This expression, while concise, is equivalent to the more complex formulation we previously used. Let’s break down why this works by considering the possible values of y.
Consider first the case y = 1. The second term vanishes because 1 − y = 0, and the loss reduces to:
Loss = −log(f(x))
This matches the original formula for the y = 1 case. Now consider y = 0. Here, the first term becomes 0, and the coefficient on the second term is 1 − y = 1.
The loss function becomes:
Loss = −log(1 − f(x))
This is again consistent with the original, more complex formula. As a result, no matter whether y = 0 or y = 1, this simpler loss function yields the correct result.
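To make this concrete, here is a minimal sketch of how the simplified loss for a single example could be computed with NumPy. The helper names sigmoid and compute_loss, and the sample values, are illustrative assumptions rather than code from this course's notebooks.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def compute_loss(f_x, y):
    # Simplified loss: -[ y * log(f(x)) + (1 - y) * log(1 - f(x)) ]
    return -(y * np.log(f_x) + (1 - y) * np.log(1 - f_x))

f_x = sigmoid(0.8)             # example prediction, roughly 0.69
print(compute_loss(f_x, 1))    # y = 1: reduces to -log(f(x))
print(compute_loss(f_x, 0))    # y = 0: reduces to -log(1 - f(x))
```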
We can now apply this simplified loss function to define the cost function for logistic regression. As a reminder, the cost function J is the average loss across the training set of m examples:
J(w, b) = (1/m) ∗ Σ Loss(f(x^(i)), y^(i)), where the sum runs over i = 1, …, m
Substituting the simplified loss, you can bring the negative signs out of the sum and simplify it as follows:
J(w, b) = −(1/m) ∗ Σ [ y^(i) ∗ log(f(x^(i))) + (1 − y^(i)) ∗ log(1 − f(x^(i))) ]
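As a rough sketch of what an implementation might look like (the names compute_cost and sigmoid, and the array shapes, are assumptions for illustration), the cost can be computed by averaging the per-example loss over all m training examples:

```python
import numpy as np

def sigmoid(z):
    # Logistic function, turning the linear model's output into a probability
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, w, b):
    # X: (m, n) feature matrix, y: (m,) labels in {0, 1}
    # w: (n,) weight vector, b: scalar bias
    m = X.shape[0]
    f_x = sigmoid(X @ w + b)   # predictions f(x^(i)) for all m examples
    # Negative sign pulled outside the sum, as in the simplified formula above
    return -(1.0 / m) * np.sum(y * np.log(f_x) + (1 - y) * np.log(1 - f_x))
```

Evaluating compute_cost with two different choices of (w, b) on the same data is one way to see that better-fitting parameters give a lower cost, which is what the notebook's plot illustrates.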
This is the cost function used to train logistic regression models. It's widely adopted because it can be derived from the maximum likelihood estimation principle in statistics, and it has the nice property of being convex.
You don't need to dive into the details of maximum likelihood estimation for now. Just know that convexity means the cost has a single global minimum, so gradient descent can be applied efficiently and reliably.
The upcoming notebook will show how this cost function is implemented in code. I recommend you take a look because you will be implementing it later during the practical lab. The notebook also demonstrates how different parameter choices affect the cost.
The plot shows how the better-fitting blue decision boundary has a lower cost than the magenta decision boundary. With this simplified cost function, we are now ready to apply gradient descent to logistic regression.