The fundamental building block of most modern neural networks is a layer of neurons. In this part, you'll learn how to construct a layer of neurons, and once you have that down, you'll be able to take those building blocks and put them together to form a large neural network. Let's take a look at how a layer of neurons works.
Here's the example from demand prediction, where we had four input features that were sent to this layer of three neurons in the hidden layer, which then sends its output to the output layer with just one neuron.
This hidden layer takes four numbers as input, and these four numbers are inputs to each of the three neurons. Each of these three neurons is just implementing a little logistic regression function.
Take this first neuron. It has two parameters, w and b. To denote that this is the first hidden unit, I'm going to subscript them as w_1 and b_1. What this neuron does is output some activation value a_1 = g(w_1 · x + b_1), where g(z) is the logistic function, 1 / (1 + e^(-z)). Maybe this results in an activation value a_1 of 0.3, meaning there's a 30% chance of affordability based on the input features.
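To make this concrete, here's a minimal sketch of that single-neuron computation in NumPy. All the feature and parameter values are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

x = np.array([0.8, 0.2, 0.5, 0.1])     # four input features (illustrative values)
w_1 = np.array([0.5, -1.2, 0.3, 0.7])  # weights of the first hidden unit (made up)
b_1 = -1.2                             # bias of the first hidden unit (made up)

a_1 = sigmoid(np.dot(w_1, x) + b_1)    # activation a_1 = g(w_1 · x + b_1)
print(a_1)                             # about 0.3 with these particular numbers
```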
Now let's look at the second neuron, which has parameters w_2 and b_2. It computes a_2 = g(w_2 · x + b_2) and might output 0.7, suggesting a 70% chance of awareness of this t-shirt.
The third neuron carries out a similar computation with its own parameters w_3 and b_3. In this example, the three neurons output 0.3, 0.7, and 0.2, and this vector of three numbers becomes the vector of activation values a that is passed to the final output layer of this neural network.
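If it helps to see all three neurons at once, here's a sketch where each row of the matrix W holds one neuron's weights. Again, every number here is made up, chosen only so the activations land near the 0.3, 0.7, 0.2 of the example:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Rows of W are the weight vectors w_1, w_2, w_3 of the three hidden units (made up).
W = np.array([[ 0.5, -1.2,  0.3,  0.7],
              [ 1.0,  0.4, -0.6,  0.9],
              [-0.8,  0.3,  0.2, -0.5]])
b = np.array([-1.2, 0.3, -1.0])        # biases b_1, b_2, b_3 (made up)

x = np.array([0.8, 0.2, 0.5, 0.1])     # four input features (illustrative)
a = sigmoid(W @ x + b)                 # vector of three activation values
print(a)                               # roughly [0.3, 0.7, 0.2] with these numbers
```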
When you build neural networks with multiple layers, it's useful to give the layers different numbers. By convention, this layer is called layer 1, and the next one is layer 2.
The input layer is sometimes called layer 0. Today, there are neural networks with dozens or even hundreds of layers. We'll introduce superscript notation to distinguish between them.
For example, the output of layer 1 will be denoted as a^[1]. Similarly, parameters of the first neuron in layer 1 will be denoted w_1^[1], b_1^[1], and so on for other neurons.
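Putting this notation together: in general, the activation of neuron j in layer l is a_j^[l] = g(w_j^[l] · a^[l-1] + b_j^[l]), where a^[l-1] is the output of the previous layer and the input vector x can be thought of as a^[0].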
Now, let's zoom into the computation of layer 2, which is the output layer. The input to layer 2 is the output of layer 1, so a^[1] is the vector [0.3, 0.7, 0.2].
Since the output layer has only one neuron, it computes a single value, a^[2] = g(w_1^[2] · a^[1] + b_1^[2]), applying the logistic function once again to the inputs from layer 1.
The output of layer 2 is a scalar, say 0.84. If we want a binary prediction (1 or 0), we can apply a threshold at 0.5: if the output is greater than 0.5, we predict y_hat = 1; otherwise, we predict y_hat = 0.
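Continuing the sketch, layer 2 applies one more logistic unit to a^[1] and then thresholds the result. The w and b values are again made up, picked so the output happens to land near the 0.84 in the example:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

a1 = np.array([0.3, 0.7, 0.2])         # a^[1], the output of layer 1
w = np.array([1.0, 2.0, 1.5])          # w_1^[2] (made up)
b = -0.34                              # b_1^[2] (made up)

a2 = sigmoid(np.dot(w, a1) + b)        # scalar output a^[2], about 0.84 here
y_hat = 1 if a2 > 0.5 else 0           # threshold at 0.5 -> binary prediction
print(a2, y_hat)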
This is how a neural network layer works. Each layer applies logistic regression units to its input, computes a vector of activation values, and passes that on to the next layer until the final output layer makes a prediction.
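To tie it all together, here's one way to write the full forward pass as a chain of layer computations. The dense helper is just a hypothetical name for "one layer of logistic units", not an API from this lecture, and the parameters are the same illustrative values as above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense(a_in, W, b):
    # One layer: each row of W plus the matching entry of b is one logistic unit.
    return sigmoid(W @ a_in + b)

def forward(x, W1, b1, W2, b2):
    # Two-layer network: x -> a^[1] (3 units) -> a^[2] (1 unit).
    a1 = dense(x, W1, b1)
    a2 = dense(a1, W2, b2)
    return a2

# Illustrative parameters reused from the sketches above.
W1 = np.array([[ 0.5, -1.2,  0.3,  0.7],
               [ 1.0,  0.4, -0.6,  0.9],
               [-0.8,  0.3,  0.2, -0.5]])
b1 = np.array([-1.2, 0.3, -1.0])
W2 = np.array([[1.0, 2.0, 1.5]])
b2 = np.array([-0.34])

x = np.array([0.8, 0.2, 0.5, 0.1])
a2 = forward(x, W1, b1, W2, b2)
y_hat = 1 if a2[0] > 0.5 else 0
```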