In this video, we will review deep neural networks with multiple hidden layers and show how to implement them in PyTorch. Consider the following diagram of a neural network with D input dimensions, three neurons in the hidden layer, and an arbitrary number of neurons in the output layer. In the last section, we showed that adding a hidden layer to a neural network lets us generate a decision function that separates non-linearly separable data. If we add more neurons to the hidden layer, we can obtain a more complex decision function, but this can lead to over-fitting. Adding more hidden layers often improves the performance of our model, and can do so with less risk of over-fitting than simply adding neurons to a single hidden layer. In this example, we add a second hidden layer. A network with more than one hidden layer is called a deep neural network. In a deep neural network, the hidden layers can have different numbers of neurons. For example, the first hidden layer has 3 neurons and the second hidden layer has 5 neurons. Let's see how to build a deep neural network in PyTorch. This is our neural network model with two hidden layers and one output layer. As before, D_in is the number of input features; it is also the input dimension of each neuron in the first hidden layer. This is the constructor for the first hidden layer. The parameter H1 is the number of neurons in that layer. This is the constructor for the second hidden layer. The input dimension of each neuron in this layer is H1, the number of neurons in the previous hidden layer. The number of neurons in the second hidden layer is given by H2. Finally, we have our output layer. The input dimension of each output neuron is H2, the number of neurons in the previous hidden layer. The number of neurons in the output layer equals the number of classes and is given by D_out. Let's see how the forward function works.
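A minimal PyTorch sketch of the two-hidden-layer model just described, including the sigmoid forward pass. The class name Net and the attribute names linear1, linear2 and linear3 are assumptions chosen for illustration, not necessarily the names used in the video; the sizes 784, 50, 50 and 10 match the MNIST setup discussed later.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Deep network with two hidden layers and sigmoid activations.

    Class and attribute names here are illustrative assumptions.
    """

    def __init__(self, D_in, H1, H2, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H1)   # first hidden layer: D_in inputs -> H1 neurons
        self.linear2 = nn.Linear(H1, H2)     # second hidden layer: H1 inputs -> H2 neurons
        self.linear3 = nn.Linear(H2, D_out)  # output layer: H2 inputs -> D_out classes

    def forward(self, x):
        x = torch.sigmoid(self.linear1(x))   # first activation
        x = torch.sigmoid(self.linear2(x))   # second activation, fed to the output layer
        x = self.linear3(x)                  # raw class scores, no activation applied
        return x

model = Net(D_in=784, H1=50, H2=50, D_out=10)
out = model(torch.randn(5, 784))  # a batch of 5 random 784-dimensional inputs
print(out.shape)  # torch.Size([5, 10])
```

The output layer returns unnormalized scores because a cross-entropy loss in PyTorch applies the softmax internally.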
The forward function takes the input x, applies the first linear layer followed by the sigmoid function, and assigns the result to x. This is the first activation. We repeat the process for the second hidden layer; its activation is the input to the output layer. Finally, we apply the output linear layer to this activation, giving the output of the network. We can construct a deep neural network with the tanh activation function instead: we only need to replace the sigmoid function with tanh in the forward function of the previous model. Similarly, we can construct a deep neural network with the ReLU activation function by replacing sigmoid with relu in the forward function. Consider the following network. The first hidden layer has 3 neurons. As the input has 3 dimensions, each neuron has 3 input dimensions. The second hidden layer has 4 neurons. Its input is the output of the first hidden layer, which has 3 dimensions, so each neuron has 3 inputs. The output layer has 3 neurons, one for each class. Its input is the output of the second hidden layer, so each output neuron has 4 input dimensions. The model's parameters() method is a useful way to check the shape of each layer in the network. We could also use the nn.Sequential module to construct our deep neural network; since we now have two hidden layers instead of one, we just add an extra linear layer and activation. Let's train our deep neural network model on the MNIST dataset. Here, we create training and validation dataset objects, training and validation loaders, and a cross-entropy loss function. In the training function, we store the loss for each iteration and the validation accuracy for each epoch. We will use two hidden layers. The input dimension is 784, the number of pixels in each 28 by 28 MNIST image. The two hidden layers have 50 neurons each. The output layer has 10 neurons, one for each digit from 0 to 9.
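The nn.Sequential version of the 3, 3, 4, 3 network above can be sketched as follows, with the activation passed in so the same builder produces sigmoid, tanh or ReLU models. The helper name make_deep_net and its activation argument are hypothetical, introduced only for this illustration. Printing each parameter's shape confirms the neuron and input counts worked out in the text.

```python
import torch
import torch.nn as nn

def make_deep_net(d_in, h1, h2, d_out, activation=nn.Sigmoid):
    """Hypothetical helper: a two-hidden-layer network built with nn.Sequential.

    `activation` is a module class (e.g. nn.Sigmoid, nn.Tanh, nn.ReLU);
    a fresh instance is created after each hidden layer.
    """
    return nn.Sequential(
        nn.Linear(d_in, h1),   # first hidden layer
        activation(),
        nn.Linear(h1, h2),     # second hidden layer
        activation(),
        nn.Linear(h2, d_out),  # output layer
    )

# The 3 -> 3 -> 4 -> 3 network from the example above.
model = make_deep_net(3, 3, 4, 3)
for name, p in model.named_parameters():
    # nn.Linear stores its weight as (out_features, in_features)
    print(name, tuple(p.shape))

# The MNIST-sized architecture, but with ReLU instead of sigmoid.
relu_model = make_deep_net(784, 50, 50, 10, activation=nn.ReLU)
```

The printed weight shapes are (3, 3), (4, 3) and (3, 4): each row corresponds to one neuron and each column to one of its input dimensions. The activation modules at positions 1 and 3 have no parameters, so only indices 0, 2 and 4 appear in the output.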
We will use the stochastic gradient descent (SGD) optimizer to train the model. Here, we only use the sigmoid activation function, but in the lab we will also use models with the tanh and ReLU activation functions. In the lab, we will train deep neural network models for digit recognition on the MNIST dataset using the sigmoid, tanh, and ReLU activation functions. We will see that the ReLU and tanh activation functions perform better than the sigmoid activation function, both in terms of the loss and in terms of validation accuracy. We can continue adding more hidden layers to build deeper networks. Later on, we will see how to add an arbitrary number of layers and train these networks.
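The training procedure described in this section (cross-entropy loss, SGD, the loss stored every iteration, validation accuracy computed every epoch) can be sketched as below. This is a minimal sketch under stated assumptions: the function name train, the history dictionary, the learning rate of 0.01 and the batch sizes are illustrative choices, and random tensors shaped like MNIST stand in for the real data so the example runs offline. With torchvision installed, torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=torchvision.transforms.ToTensor()) would supply the actual images.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, criterion, train_loader, validation_loader, optimizer, epochs=10):
    """Record the training loss every iteration and validation accuracy every epoch."""
    history = {"training_loss": [], "validation_accuracy": []}
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            z = model(x.view(-1, 28 * 28))  # flatten 28x28 images into 784-dim vectors
            loss = criterion(z, y)
            loss.backward()
            optimizer.step()
            history["training_loss"].append(loss.item())

        # Validation accuracy (in percent) once per epoch, without tracking gradients.
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in validation_loader:
                z = model(x.view(-1, 28 * 28))
                _, yhat = torch.max(z, 1)  # predicted class = index of largest score
                correct += (yhat == y).sum().item()
                total += y.size(0)
        history["validation_accuracy"].append(100 * correct / total)
    return history

torch.manual_seed(0)
# Assumption: random MNIST-shaped stand-in data (1x28x28 images, labels 0-9).
train_data = TensorDataset(torch.rand(256, 1, 28, 28), torch.randint(0, 10, (256,)))
val_data = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
validation_loader = DataLoader(val_data, batch_size=64)

# 784 inputs, two hidden layers of 50 sigmoid neurons, 10 outputs.
model = nn.Sequential(
    nn.Linear(784, 50), nn.Sigmoid(),
    nn.Linear(50, 50), nn.Sigmoid(),
    nn.Linear(50, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # learning rate is an assumption
history = train(model, criterion, train_loader, validation_loader, optimizer, epochs=2)
```

On random labels the accuracy stays near chance; swapping nn.Sigmoid for nn.Tanh or nn.ReLU and using the real MNIST loaders is how the activation comparison from the lab could be reproduced.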