0:00

In this video, I want to start telling you about how we represent neural networks.

In other words, how we represent our hypothesis or

how we represent our model when using neural networks.

Neural networks were developed as simulating neurons or

networks of neurons in the brain.

So, to explain the hypothesis representation

let's start by looking at what a single neuron in the brain looks like.

Your brain and

mine is jam packed full of neurons like these and neurons are cells in the brain.

And two things to draw attention to are that first.

The neuron has a cell body, like so, and moreover,

the neuron has a number of input wires, and these are called the dendrites.

You think of them as input wires, and these receive inputs from other locations.

And a neuron also has an output wire called an Axon, and

this output wire is what it uses to send signals to other neurons,

so to send messages to other neurons.

So, at a simplistic level what a neuron is, is a computational unit that

gets a number of inputs through it input wires and does some computation and

then it says outputs via its axon to other nodes or to other neurons in the brain.

Here's a illustration of a group of neurons.

The way that neurons communicate with each other is with little pulses of

electricity, they are also called spikes but that just means pulses of electricity.

So here is one neuron and what it does is if it wants a send a message what it

does is sends a little pulse of electricity.

Varis axon to some different neuron and here, this axon that is this open wire,

connects to the dendrites of this second neuron over here,

which then accepts this incoming message that some computation.

And they, in turn, decide to send out this message on this axon to other neurons,

and this is the process by which all human thought happens.

It's these Neurons doing computations and

passing messages to other neurons as a result of what other inputs they've got.

And, by the way, this is how our senses and our muscles work as well.

If you want to move one of your muscles the way that where else in your neuron may

send this electricity to your muscle and that causes your muscles to contract and

your eyes, some senses like your eye must send a message to your brain while it

does it senses hosts electricity entity to a neuron in your brain like so.

In a neuro network, or rather, in an artificial neuron network that we've

implemented on the computer, we're going to use a very simple model of

what a neuron does we're going to model a neuron as just a logistic unit.

So, when I draw a yellow circle like that, you should think of that as a playing

a role analysis, who's maybe the body of a neuron, and

we then feed the neuron a few inputs who's various dendrites or input wiles.

3:14

And the neuron does some computation.

And output some value on this output wire, or

in the biological neuron, this is an axon.

And whenever I draw a diagram like this,

what this means is that this represents a computation of h of

x equals one over one plus e to the negative theta transpose x,

where as usual, x and theta are our parameter vectors, like so.

4:31

Finally, one last bit of terminology when we talk about neural networks,

sometimes we'll say that this is a neuron or

an artificial neuron with a Sigmoid or logistic activation function.

So this activation function in the neural network terminology.

This is just another term for that function for

that non-linearity g(z) = 1 over 1+e to the -z.

And whereas so far I've been calling theta the parameters of the model,

I'll mostly continue to use that terminology.

Here, it's a copy to the parameters, but in neural networks, in the neural network

literature sometimes you might hear people talk about weights of a model and

weights just means exactly the same thing as parameters of a model.

But I'll mostly continue to use the terminology parameters in these videos,

but sometimes, you might hear others use the weights terminology.

5:34

What a neural network is, is just a group of this different neurons strong together.

Completely, here we have input units x1, x2, x3 and once again,

sometimes you can draw this extra note x0 and Sometimes not, just flow that in here.

And here we have three neurons which have written 81, 82, 83.

I'll talk about those indices later.

And once again we can if we want add in just a0 and

add the mixture bias unit there.

There's always a value of 1.

And then finally we have this third node and the final layer, and

there's this third node that outputs the value that the hypothesis h(x) computes.

To introduce a bit more terminology, in a neural network,

the first layer, this is also called the input layer because this is where we

Input our features, x1, x2, x3.

The final layer is also called the output layer because that layer has a neuron,

this one over here, that outputs the final value computed by a hypothesis.

And then, layer 2 in between, this is called the hidden layer.

The term hidden layer isn't a great terminology, but this ideation is that,

you know, you supervised early,

where you get to see the inputs and get to see the correct outputs, where

there's a hidden layer of values you don't get to observe in the training setup.

It's not x, and it's not y, and so we call those hidden.

And they try to see neural nets with more than one hidden layer but

in this example, we have one input layer, Layer 1, one hidden layer, Layer 2,

and one output layer, Layer 3.

But basically, anything that isn't an input layer and

isn't an output layer is called a hidden layer.

7:29

So I want to be really clear about what this neural network is doing.

Let's step through the computational steps that are and

body represented by this diagram.

To explain these specific computations represented by a neural network,

here's a little bit more notation.

I'm going to use a superscript j subscript i to denote the activation

of neuron i or of unit i in layer j.

So completely this gave superscript to sub group one,

that's the activation of the first unit in layer two, in our hidden layer.

And by activation I just mean the value that's computed by and

as output by a specific.

In addition, new network is parametrize by these matrixes, theta

super script j Where theta j is going to be a matrix of weights controlling

the function mapping form one layer, maybe the first layer to the second layer,

or from the second layer to the third layer.

8:34

This first hidden unit here has it's value computed as follows,

there's a is a21 is equal to the sigma function of the sigma activation function,

also called the logistics activation function,

apply to this sort of linear combination of these inputs.

And then this second hidden unit has this activation

value computer as sigmoid of this.

And similarly for this third hidden unit is computed by that formula.

So here we have 3 theta 1 which is matrix

of parameters governing our mapping

from our three different units, our hidden units.

Theta 1 is going to be a 3.

9:35

Theta 1 is going to be a 3x4-dimensional matrix.

And more generally, if a network has SJU units in there j and

sj + 1 units and sj + 1 then the matrix theta j

which governs the function mapping from there sj + 1.

That will have to mention sj +1 by sj + 1 I'll just be clear

about this notation right.

This is Subscript j + 1 and that's s subscript j, and

then this whole thing, plus 1, this whole thing (sj + 1), okay?

So that's s subscript j + 1 by,

10:21

So that's s subscript j + 1 by sj + 1 where

this plus one is not part of the subscript.

Okay, so we talked about what the three hidden units do to compute their values.

Finally, there's a loss of this final and

after that we have one more unit which computer h of x and

that's equal can also be written as a(3)1 and that's equal to this.

And you notice that I've written this with a superscript two here,

because theta of superscript two is the matrix of parameters, or

the matrix of weights that controls the function that maps from the hidden units,

that is the layer two units to the one layer three unit, that is the output unit.

To summarize, what we've done is shown how a picture like this over here defines

an artificial neural network which defines a function h

that maps with x's input values to hopefully to some space that provisions y.

And these hypothesis are parameterized by parameters

denoting with a capital theta so that, as we vary theta,

we get different hypothesis and we get different functions.

Mapping say from x to y.