In this lab, you will use a single perceptron neural network model to solve a simple classification problem.
Packages
Let's first import all the packages that you will need during this lab.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
# A function to create a dataset.
from sklearn.datasets import make_blobs

# Output of plotting commands is displayed inline within the Jupyter notebook.
%matplotlib inline

# Set a seed so that the results are consistent.
np.random.seed(3)
Classification is the problem of identifying which of a set of categories an observation belongs to. When there are only two categories, it is called a binary classification problem. Let's look at a simple example.
Imagine that you have a set of sentences which you want to classify as "happy" or "angry", and you have identified that the sentences contain only two words: aack and beep. For each sentence (data point in the given dataset) you count the occurrences of those two words (x1 and x2) and compare the counts with each other. If there are more "beep" (x2>x1), the sentence should be classified as "angry"; if not (x2<=x1), it is a "happy" sentence. This means that some straight line will separate the two classes, as illustrated in the short sketch after the list of sentences below.
Let's take a very simple set of 4 sentences:
"Beep!"
"Aack?"
"Beep aack..."
"!?"
Here both x1 and x2 will be either 0 or 1. You can plot those points in a plane and see that the points (observations) belong to two classes, "angry" (red) and "happy" (blue), and that a straight line can be used as a decision boundary to separate them. An example of such a line is plotted below.
fig, ax = plt.subplots()
xmin, xmax = -0.2, 1.4
x_line = np.arange(xmin, xmax, 0.1)

# Data points (observations) from two classes.
ax.scatter(0, 0, color="b")
ax.scatter(0, 1, color="r")
ax.scatter(1, 0, color="b")
ax.scatter(1, 1, color="b")

ax.set_xlim([xmin, xmax])
ax.set_ylim([-0.1, 1.1])
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')

# One of the lines which can be used as a decision boundary to separate the two classes.
ax.plot(x_line, x_line + 0.5, color="black")
plt.plot()
This particular line was chosen by common sense, simply by looking at the plot of the observations. Such a classification problem is called a problem with two linearly separable classes.
The line x1−x2+0.5=0 (or x2=x1+0.5) can be used as a separating line for the problem: all of the points (x1, x2) above this line, where x1−x2+0.5<0 (or x2>x1+0.5), are considered to belong to the red class, and all of the points below it, where x1−x2+0.5>0 (or x2<x1+0.5), to the blue class. So the problem can be rephrased: in the expression w1x1+w2x2+b=0, find values of the parameters w1, w2 and the bias b such that the line can serve as a decision boundary.
In this simple example you could solve the problem of finding the decision boundary just by looking at the plot: w1=1, w2=−1, b=0.5. But what if the problem is more complicated? You can use a simple neural network model to do that! Let's implement it for this example and then try it on a more complicated problem.
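As a quick sanity check (this snippet is not part of the lab, just an illustration), you can evaluate w1x1+w2x2+b for the four observations and verify that the sign of the result separates the two classes:

# Illustrative check of the hand-picked decision boundary parameters.
w1, w2, b = 1, -1, 0.5
points = [(0, 0), (0, 1), (1, 0), (1, 1)]   # (x1, x2) observations

for x1, x2 in points:
    value = w1 * x1 + w2 * x2 + b
    # Negative value -> above the line (red), positive -> below the line (blue).
    print(f"(x1={x1}, x2={x2}): w1*x1 + w2*x2 + b = {value:+.1f} -> {'red' if value < 0 else 'blue'}")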
You have already constructed and trained a neural network model with one perceptron. Here a similar model can be used, but with an activation function added; a single perceptron then basically works as a threshold function.
The neural network components are shown in the following scheme:
Similarly to the previous lab, the input layer contains two nodes x1 and x2. The weight vector $W = \begin{bmatrix} w_1 & w_2 \end{bmatrix}$ and the bias ($b$) are the parameters to be updated during the model training. The first step of forward propagation is the same as in the previous lab. For every training example $x^{(i)} = \begin{bmatrix} x_1^{(i)} \\ x_2^{(i)} \end{bmatrix}$:
$$z^{(i)} = w_1x_1^{(i)} + w_2x_2^{(i)} + b = Wx^{(i)} + b.\tag{1}$$
But now a real number z(i) cannot simply be taken as the output, because you need to perform classification. This could be done with a discrete approach: compare the result with zero, and classify the example as 0 (blue) if it is below zero and 1 (red) if it is above zero. Then define the cost function as the percentage of incorrectly identified classes and perform backward propagation.
This extra step in the forward propagation is actually an application of an activation function. It would be possible to implement the discrete approach described above (with a unit step function) for this problem, but it turns out that there is a continuous approach which works better and is commonly used in more complicated neural networks. So you will implement it here: a single perceptron with a sigmoid activation function.
The sigmoid activation function is defined as
$$a = \sigma\left(z\right) = \frac{1}{1+e^{-z}}.\tag{2}$$
Then a threshold value of 0.5 can be used for predictions: 1 (red) if a>0.5 and 0 (blue) otherwise. Putting it all together, mathematically the single perceptron neural network with sigmoid activation function can be expressed as:
$$\begin{align}
z^{(i)} &= Wx^{(i)} + b,\\
a^{(i)} &= \sigma\left(z^{(i)}\right).
\tag{3}\end{align}$$
If you have m training examples organised in the columns of a (2×m) matrix X, you can apply the activation function element-wise, so the model can be written as:
$$\begin{align}
Z &= WX + b,\\
A &= \sigma\left(Z\right),
\tag{4}\end{align}$$
where b is broadcast to a vector of size (1×m).
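A tiny NumPy sketch (illustrative only; the array values and names below are made up, not part of the lab) showing how b of shape (1, 1) is broadcast across the m columns of Z:

# Illustrative broadcasting example, not part of the lab code.
W_demo = np.array([[1.0, -1.0]])          # shape (1, 2)
X_demo = np.array([[0, 0, 1, 1],
                   [0, 1, 0, 1]])         # shape (2, 4), m = 4 examples in the columns
b_demo = np.array([[0.5]])                # shape (1, 1)

Z_demo = np.matmul(W_demo, X_demo) + b_demo   # b is broadcast to shape (1, 4)
print(Z_demo)        # [[ 0.5 -0.5  1.5  0.5]]
print(Z_demo.shape)  # (1, 4)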
When dealing with classification problems, the most commonly used cost function is the log loss, which is described by the following equation:
$$\mathcal{L}\left(W, b\right) = \frac{1}{m}\sum_{i=1}^{m} \left(-y^{(i)}\log\left(a^{(i)}\right) - \left(1 - y^{(i)}\right)\log\left(1 - a^{(i)}\right)\right),\tag{5}$$
where y(i) ∈ {0, 1} are the original labels and a(i) are the corresponding continuous outputs of the model.
Note that the obtained expressions (7) are exactly the same as in section 3.2 of the previous lab, where the multiple linear regression model was discussed. Thus, they can be rewritten in matrix form:
$$\begin{align}
\frac{\partial \mathcal{L}}{\partial W} &= \frac{1}{m}\left(A - Y\right)X^T,\\
\frac{\partial \mathcal{L}}{\partial b} &= \frac{1}{m}\left(A - Y\right)\mathbf{1},
\tag{8}\end{align}$$
where $\mathbf{1}$ is a vector of ones of size (m×1). The gradient descent updates of the parameters then take the standard form:
$$\begin{align}
W &= W - \alpha\frac{\partial \mathcal{L}}{\partial W},\\
b &= b - \alpha\frac{\partial \mathcal{L}}{\partial b},
\tag{9}\end{align}$$
where α is the learning rate.
Let's get the dataset you will work on. The following code will create m=30 data points (x1, x2), where x1, x2 ∈ {0, 1}, and save them in a NumPy array X of shape (2×m) (in the columns of the array). The labels (0: blue, 1: red) will be calculated so that y=1 if x1=0 and x2=1; in all other cases y=0. The labels will be saved in an array Y of shape (1×m).
m = 30

X = np.random.randint(0, 2, (2, m))
Y = np.logical_and(X[0] == 0, X[1] == 1).astype(int).reshape((1, m))

print('Training dataset X containing (x1, x2) coordinates in the columns:')
print(X)
print('Training dataset Y containing labels of two classes (0: blue, 1: red)')
print(Y)

print('The shape of X is: ' + str(X.shape))
print('The shape of Y is: ' + str(Y.shape))
print('I have m = %d training examples!' % (X.shape[1]))
Implementation of the described neural network will be very similar to the previous lab. The only differences will be in the functions forward_propagation and compute_cost!
def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)

    Returns:
    n_x -- the size of the input layer
    n_y -- the size of the output layer
    """
    n_x = X.shape[0]
    n_y = Y.shape[0]

    return (n_x, n_y)

(n_x, n_y) = layer_sizes(X, Y)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the output layer is: n_y = " + str(n_y))
Implement the function initialize_parameters(), initializing the weights array of shape (n_y × n_x) = (1 × 2) with random values and the bias vector of shape (n_y × 1) = (1 × 1) with zeros.
def initialize_parameters(n_x, n_y):
    """
    Arguments:
    n_x -- the size of the input layer
    n_y -- the size of the output layer

    Returns:
    params -- python dictionary containing your parameters:
                    W -- weight matrix of shape (n_y, n_x)
                    b -- bias value set as a vector of shape (n_y, 1)
    """
    W = np.random.randn(n_y, n_x) * 0.01
    b = np.zeros((n_y, 1))

    parameters = {"W": W,
                  "b": b}

    return parameters

parameters = initialize_parameters(n_x, n_y)
print("W = " + str(parameters["W"]))
print("b = " + str(parameters["b"]))
Implement forward_propagation() following equation (4) in section 2.1:
$$\begin{align}
Z &= WX + b,\\
A &= \sigma\left(Z\right).
\end{align}$$
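forward_propagation() calls a sigmoid() helper. If it was not already defined earlier in the lab (an assumption here), a minimal NumPy implementation is:

# Minimal sigmoid helper, assuming it is not already defined earlier in the lab.
def sigmoid(z):
    # Element-wise sigmoid: 1 / (1 + e^(-z)).
    return 1 / (1 + np.exp(-z))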
def forward_propagation(X, parameters):
    """
    Arguments:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)

    Returns:
    A -- The output
    """
    W = parameters["W"]
    b = parameters["b"]

    # Forward propagation to calculate Z.
    Z = np.matmul(W, X) + b
    A = sigmoid(Z)

    return A

A = forward_propagation(X, parameters)
print("Output vector A:", A)
Your weights were just initialized with some random values, so the model has not been trained yet.
Define a cost function (5) which will be used to train the model:
def compute_cost(A, Y):
    """
    Computes the log loss cost function

    Arguments:
    A -- The output of the neural network of shape (n_y, number of examples)
    Y -- "true" labels vector of shape (n_y, number of examples)

    Returns:
    cost -- log loss
    """
    # Number of examples.
    m = Y.shape[1]

    # Compute the cost function.
    logprobs = - np.multiply(np.log(A), Y) - np.multiply(np.log(1 - A), 1 - Y)
    cost = 1/m * np.sum(logprobs)

    return cost

print("cost = " + str(compute_cost(A, Y)))
Calculate partial derivatives as shown in (8):
$$\begin{align}
\frac{\partial \mathcal{L}}{\partial W} &= \frac{1}{m}\left(A - Y\right)X^T,\\
\frac{\partial \mathcal{L}}{\partial b} &= \frac{1}{m}\left(A - Y\right)\mathbf{1}.
\end{align}$$
def backward_propagation(A, X, Y):
    """
    Implements the backward propagation, calculating gradients

    Arguments:
    A -- the output of the neural network of shape (n_y, number of examples)
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (n_y, number of examples)

    Returns:
    grads -- python dictionary containing gradients with respect to different parameters
    """
    m = X.shape[1]

    # Backward propagation: calculate partial derivatives denoted as dW, db for simplicity.
    dZ = A - Y
    dW = 1/m * np.dot(dZ, X.T)
    db = 1/m * np.sum(dZ, axis=1, keepdims=True)

    grads = {"dW": dW,
             "db": db}

    return grads

grads = backward_propagation(A, X, Y)

print("dW = " + str(grads["dW"]))
print("db = " + str(grads["db"]))
Update parameters as shown in (9):
$$\begin{align}
W &= W - \alpha\frac{\partial \mathcal{L}}{\partial W},\\
b &= b - \alpha\frac{\partial \mathcal{L}}{\partial b}.
\end{align}$$
def update_parameters(parameters, grads, learning_rate=1.2):
    """
    Updates parameters using the gradient descent update rule

    Arguments:
    parameters -- python dictionary containing parameters
    grads -- python dictionary containing gradients
    learning_rate -- learning rate parameter for gradient descent

    Returns:
    parameters -- python dictionary containing updated parameters
    """
    # Retrieve each parameter from the dictionary "parameters".
    W = parameters["W"]
    b = parameters["b"]

    # Retrieve each gradient from the dictionary "grads".
    dW = grads["dW"]
    db = grads["db"]

    # Update rule for each parameter.
    W = W - learning_rate * dW
    b = b - learning_rate * db

    parameters = {"W": W,
                  "b": b}

    return parameters

parameters_updated = update_parameters(parameters, grads)

print("W updated = " + str(parameters_updated["W"]))
print("b updated = " + str(parameters_updated["b"]))
def nn_model(X, Y, num_iterations=10, learning_rate=1.2, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (n_x, number of examples)
    Y -- labels of shape (n_y, number of examples)
    num_iterations -- number of iterations in the loop
    learning_rate -- learning rate parameter for gradient descent
    print_cost -- if True, print the cost every iteration

    Returns:
    parameters -- parameters learnt by the model. They can then be used to make predictions.
    """
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[1]

    parameters = initialize_parameters(n_x, n_y)

    # Loop
    for i in range(0, num_iterations):

        # Forward propagation. Inputs: "X, parameters". Outputs: "A".
        A = forward_propagation(X, parameters)

        # Cost function. Inputs: "A, Y". Outputs: "cost".
        cost = compute_cost(A, Y)

        # Backpropagation. Inputs: "A, X, Y". Outputs: "grads".
        grads = backward_propagation(A, X, Y)

        # Gradient descent parameter update. Inputs: "parameters, grads, learning_rate". Outputs: "parameters".
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the cost every iteration.
        if print_cost:
            print("Cost after iteration %i: %f" % (i, cost))

    return parameters
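To train the model on the dataset, you can now call nn_model; the choice of 50 iterations below is an assumption made for illustration (enough to watch the cost level off), not a value taken from the lab:

# Train the model; 50 iterations is an assumed, illustrative choice.
parameters = nn_model(X, Y, num_iterations=50, learning_rate=1.2, print_cost=True)
print("W = " + str(parameters["W"]))
print("b = " + str(parameters["b"]))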
You can see that after about 40 iterations the cost function still decreases, but only slightly. This is a sign that it might be reasonable to stop training there. The final model parameters can be used to find the boundary line and to make predictions. Let's visualize the boundary line.
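Here is a minimal sketch of such a visualization, assuming the trained parameters returned by nn_model above (the plotting details are illustrative and not the lab's original plotting code):

# Illustrative visualization of the learned decision boundary w1*x1 + w2*x2 + b = 0.
W_trained = parameters["W"]
b_trained = parameters["b"]
w1, w2 = W_trained[0, 0], W_trained[0, 1]

fig, ax = plt.subplots()
# Plot the training points, coloured by their labels (0: blue, 1: red).
ax.scatter(X[0, :], X[1, :], c=Y[0, :], cmap=colors.ListedColormap(['blue', 'red']))
# Rearranging w1*x1 + w2*x2 + b = 0 gives x2 = -(w1*x1 + b) / w2.
x1_line = np.linspace(-0.2, 1.2, 100)
x2_line = -(w1 * x1_line + b_trained[0, 0]) / w2
ax.plot(x1_line, x2_line, color="black")
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
plt.show()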
def predict(X, parameters):
    """
    Using the learned parameters, predicts a class for each example in X

    Arguments:
    parameters -- python dictionary containing your parameters
    X -- input data of size (n_x, m)

    Returns:
    predictions -- vector of predictions of our model (blue: False / red: True)
    """
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    A = forward_propagation(X, parameters)
    predictions = A > 0.5

    return predictions

X_pred = np.array([[1, 1, 0, 0],
                   [0, 1, 0, 1]])
Y_pred = predict(X_pred, parameters)

print(f"Coordinates (in the columns):\n{X_pred}")
print(f"Predictions:\n{Y_pred}")