Let's take what we've learned and put it together into an algorithm to let your neural network make inferences or predictions. This will be an algorithm called forward propagation. Let's take a look.
As a motivating example, I'm going to use handwritten digit recognition. For simplicity, we are just going to distinguish between the handwritten digits zero and one. So it's a binary classification problem where we input an image and classify it: is this the digit zero or the digit one? You'll also get to play with this yourself later in the notebook.
For the example on the slide, I'll use an eight by eight image. This image of a '1' is an eight by eight grid, or matrix, of 64 pixel intensity values, where 255 denotes a bright white pixel and zero denotes a black pixel.
Different numbers represent different shades of gray between black and white.
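To make the idea of 64 input features concrete, here is a minimal sketch of turning an eight by eight grid into a flat list of 64 numbers. The image below is a made-up vertical stroke, not real data, and the rescaling to the range 0 to 1 is an assumption on my part rather than something the lecture prescribes.

```python
# Hypothetical 8x8 grayscale image: 0 = black, 255 = bright white.
# A crude vertical stroke stands in for a handwritten '1'.
image = [[0] * 8 for _ in range(8)]
for row in range(1, 7):
    image[row][4] = 255

# Flatten row by row into a single list of 64 input features.
x = [pixel for row in image for pixel in row]

# Optional rescaling to [0, 1] (an assumption, not from the lecture).
x_scaled = [pixel / 255 for pixel in x]

print(len(x))  # 64
```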
Given these 64 input features, we use a neural network with two hidden layers. The first hidden layer has 25 neurons or units. The second hidden layer has 15 neurons or units. Finally, the output layer (or output unit) estimates the probability that the image is a '1' rather than a '0'.
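One way to get a feel for this architecture is to count its parameters. The sketch below assumes every unit is fully connected to all outputs of the previous layer and has one bias; the helper name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Return (number of weights, number of biases) for each layer."""
    counts = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # Each of the n_out units has n_in weights and one bias.
        counts.append((n_in * n_out, n_out))
    return counts

# 64 input features, 25 units, 15 units, 1 output unit.
layer_sizes = [64, 25, 15, 1]
counts = count_parameters(layer_sizes)

for layer, (n_weights, n_biases) in enumerate(counts, start=1):
    print(f"layer {layer}: {n_weights} weights, {n_biases} biases")

total = sum(w + b for w, b in counts)
print("total:", total)  # total: 2031
```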
Let's step through the sequence of computations your neural network will need to make to go from input (X) (the 64 numbers) to the predicted probability (a3).
The first computation is from (X) to (a1), which is performed by the first hidden layer.
(a1 = g(W^1 X + b^1))
Notice that (a1) has 25 numbers because the first hidden layer has 25 units, each producing one activation. Thus, the parameters run from (W_1^1) through (W_25^1), as well as (b_1^1) through (b_25^1).
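The computation of one layer can be sketched as a loop over its units. This is only an illustration: it assumes the activation function g is the sigmoid function, and the function names `sigmoid` and `dense` are my own. It uses plain Python lists; in practice you would more likely use a numerical library.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dense(a_in, W, b, g=sigmoid):
    """Compute one layer's activations.

    W holds one weight vector per unit; b holds one bias per unit.
    """
    a_out = []
    for w_j, b_j in zip(W, b):
        # Unit j: dot product of its weights with the input, plus its bias.
        z_j = sum(w * a for w, a in zip(w_j, a_in)) + b_j
        a_out.append(g(z_j))
    return a_out

# Placeholder parameters (all zeros) just to show the shapes:
# 25 units, each with 64 weights and one bias.
x = [0.0] * 64
W1 = [[0.0] * 64 for _ in range(25)]
b1 = [0.0] * 25

a1 = dense(x, W1, b1)
print(len(a1))  # 25
```

With all-zero parameters every unit outputs g(0) = 0.5, which is not useful in itself but makes it easy to check that the layer returns one number per unit.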
By convention, we could also write (a0) for the input feature values (X), so the first hidden layer computes (a1) from (a0).
Next, we compute (a2) by applying the activation function to (W^2 a1 + b^2). This layer has 15 units, so the parameters range from (W_1^2) to (W_15^2), and similarly for (b).
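The final step computes (a3) in the same way from (a2), using the output layer's single unit. So the sequence of computations takes the input (X) and computes (a1), then (a2), and finally (a3), which is the output of the neural network. We can also denote this output as (f(X)).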
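Putting the layers together, the whole computation can be sketched as a loop that feeds each layer's output into the next. The parameters here are random placeholders rather than trained values, the activation is assumed to be the sigmoid function, and the names `dense`, `init_layer`, and `forward` are hypothetical. Thresholding the output at 0.5 to get a label is also an assumption, added only to show how a probability could be turned into a zero-or-one prediction.

```python
import math
import random

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dense(a_in, W, b):
    # One layer: unit j outputs g(w_j . a_in + b_j).
    return [sigmoid(sum(w * a for w, a in zip(w_j, a_in)) + b_j)
            for w_j, b_j in zip(W, b)]

def init_layer(n_in, n_out, rng):
    # Random placeholder parameters; a trained network would supply real ones.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def forward(x, params):
    a = x                  # a0 is the input
    for W, b in params:    # layer 1, then layer 2, then layer 3
        a = dense(a, W, b)
    return a[0]            # a3: the single output unit

rng = random.Random(0)
sizes = [64, 25, 15, 1]
params = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(sizes, sizes[1:])]

x = [rng.random() for _ in range(64)]  # stand-in for scaled pixel intensities
a3 = forward(x, params)
y_hat = 1 if a3 >= 0.5 else 0          # thresholding is an assumption
print(a3, y_hat)
```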
Since these computations proceed from left to right, we call this algorithm forward propagation, as we propagate activations in a forward direction. In contrast, backward propagation, which you'll learn about next week, is used for learning.
A neural network with more hidden units in earlier layers and fewer hidden units as you approach the output layer is a typical choice. You'll see more examples in the notebook.