[MUSIC]

So, I've just promised you a lot of cool stuff that you can

do with unsupervised learning.

Now, let's cover how you do this, because otherwise it would be a cheat.

Now, as I've mentioned, there are many methods at play here.

But let's start with the one that is the simplest to understand and,

in a sense, the most general one: autoencoders.

Autoencoders are models that encode the data into

a hidden representation and then decode it back.

Now, this seems like a weird problem unless you want to compress the data, but

trust me, they hold a lot of surprises.

Now, again, autoencoders consist of two parts, as the name suggests:

an encoder and a decoder.

If your data is denoted by x, then you can encode x, maybe images,

cat images, into a hidden representation encoder of x.

So that you can then decode it backwards with the decoder into the original

representation.

The mathematical objective here is, again, weird.

You want to compress an image and then decompress it back,

so that the decompressed image is as close to lossless as possible.

That is, it resembles the initial image in the sense of

minimizing, for example, the pixel-wise MSE,

the mean squared error, to be accurate.
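As a small sketch of that objective, here is the pixel-wise mean squared error between a toy "image" and a hypothetical reconstruction (the numbers are made up purely for illustration):

```python
import numpy as np

# A toy flattened "image" and a hypothetical lossy reconstruction of it.
x = np.array([0.0, 0.5, 1.0, 0.25])
x_rec = np.array([0.1, 0.4, 0.9, 0.3])

# Pixel-wise mean squared error: average of squared per-pixel differences.
mse = np.mean((x - x_rec) ** 2)
```

The autoencoder's training objective is to make this number as small as possible over the whole dataset.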

Now this is immediately useful when you want to compress the data.

But this representation that you learn is also very useful if you want to

apply classification or regression methods on top of it.

For example, you could take raw image pixels, and you probably know that in

most cases gradient boosting, for example, is useless when applied to raw pixels.

But instead, you can feed it not the raw pixels, but

the hidden representation that you found with autoencoders.
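As a rough illustration of this idea, here is a hedged sketch using scikit-learn, with PCA standing in as a simple linear encoder (the dataset, the number of components, and the classifier settings are arbitrary choices for the example, not a recipe from the lecture):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Digit images as raw pixels: 8x8 images flattened to 64 features.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A linear "encoder" (PCA used here as a stand-in for a learned encoder)
# compresses the 64 raw pixels into a 16-dimensional hidden representation.
enc = PCA(n_components=16, random_state=0).fit(X_tr)
H_tr, H_te = enc.transform(X_tr), enc.transform(X_te)

# Gradient boosting trained on the hidden representation instead of pixels.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(H_tr, y_tr)
acc = clf.score(H_te, y_te)
```

The point is only the plumbing: the downstream model consumes the compressed code, not the raw pixels.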

Well, this is all nice and good, but, in fact, you've already learned some kind of

autoencoder if you've studied even the basic topics of machine learning.

Because you probably already know such things as principal component analysis,

or singular value decomposition, or maybe non-negative matrix factorization.

In fact, those are all familiar to you if you use scikit-learn or caret.

But the general idea behind all those methods is that they take a large matrix,

usually the object-feature matrix of your dataset, where your pixels go

(each row holding a particular image),

and try to represent this matrix as a product of two or more matrices.

For example, one of them would be a matrix that maps your data, your full row,

into some hidden representation.

And a second matrix that maps your hidden representation back into

the original pixel-wise representation.

You try to learn these two or more matrices, depending on your method,

to minimize some kind of reconstruction error.

For singular value decomposition, one way to do so

is to minimize the mean squared error between your original matrix and

the product of the two factor matrices.
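A minimal NumPy sketch of that idea: take an object-feature matrix, keep the top singular vectors, and reconstruct the matrix as a product of an "encoder" matrix and a "decoder" matrix (the matrix sizes and the rank k here are arbitrary, and the data is random noise just to show the mechanics):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # object-feature matrix (hypothetical data)

# Full SVD, then keep only the top-k right singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 5
W_enc = Vt[:k].T                 # maps a data row to a k-dim hidden code
W_dec = Vt[:k]                   # maps the hidden code back to features

H = X @ W_enc                    # hidden representation: 100 x k
X_rec = H @ W_dec                # reconstruction as a product of matrices

mse = np.mean((X - X_rec) ** 2)  # reconstruction error being minimized
```

By the Eckart–Young theorem, this truncated-SVD reconstruction is exactly the rank-k matrix with the smallest such error.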

Now look at this matrix decomposition thing differently.

One way to rewrite it is as a process that first takes your data and

kind of compresses it.

Here's the encoder part, which compresses it linearly into a hidden representation.

And the second part then becomes the decoder,

that takes your hidden representation and

converts it backwards into pixels or whatever the form the data was in.

The goal is to minimize the mean squared error between what was fed into the network and

what emerged from it.

Now, one natural way to extend this, as we usually do with neural networks, is to

decide that linear compression and linear decompression are somehow insufficient for

us, and make them nonlinear instead.

Now, in your encoder, instead of having a single linear transformation,

you can stick in a few dense layers, or maybe other layers that you've learned about,

maybe with some dropout or whatever fancy tricks you remember.

And then your autoencoder becomes nonlinear.
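To make this concrete, here is a tiny hedged sketch of a nonlinear autoencoder written directly in NumPy: a tanh encoder and a linear decoder, trained with plain gradient descent on the reconstruction MSE (the data, layer sizes, learning rate, and step count are all arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))            # toy dataset: 64 objects, 8 features

d_in, d_hid = 8, 3                      # compress 8 features into 3
W_enc = rng.normal(scale=0.1, size=(d_in, d_hid))
W_dec = rng.normal(scale=0.1, size=(d_hid, d_in))

def forward(X):
    H = np.tanh(X @ W_enc)              # nonlinear encoder
    return H, H @ W_dec                 # linear decoder (reconstruction)

_, X_rec = forward(X)
loss_before = np.mean((X - X_rec) ** 2)

lr = 0.5
for _ in range(2000):                   # plain gradient descent on the MSE
    H, X_rec = forward(X)
    G = 2.0 * (X_rec - X) / X.size      # dLoss / dX_rec
    G_h = (G @ W_dec.T) * (1.0 - H**2)  # backprop through tanh
    W_dec -= lr * H.T @ G
    W_enc -= lr * X.T @ G_h

_, X_rec = forward(X)
loss_after = np.mean((X - X_rec) ** 2)
```

In practice you would of course use a framework with automatic differentiation; the point here is only that a nonlinear encoder plus a decoder trained on reconstruction error is the whole recipe.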

And as we probably know, or at least believe since the last two weeks,

nonlinear representations can be more powerful, in the sense that they can learn

more abstract features.

And now the question is: imagine your data format is not

just an arbitrary set of features, but an image.

So there are three channels, RGB, with, say, a 100 by 100 pixel grid.

Is there maybe some particular architecture that you can use to compress

the data and decompress it thereafter?

So that your features have some nice properties that are desirable for

images, like being able to shift the same feature a bit to the right and

still have this feature recognized.

Yes, right, one way to deal with it is to use the convolutional layers, or

convolutional architecture in general.

So, on the slide we have this super small architecture

with one convolution and one pooling layer.

But you could, of course, use a lot of stacked convolutions and poolings, or

maybe some residual layers or inception modules, whatever you prefer for

a particular problem.

The general idea is that anything that maps your

input into a hidden representation, together with anything that maps the hidden

representation back into the original one, qualifies as an autoencoder.

Provided it's differentiable, of course.

So since it's that easy, you can even do without dense layers at all.

So you can take, say, a convolutional encoder and

then go straight to a convolutional decoder.

This way your hidden representation keeps a small, image-like format.
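As a sketch of such a fully convolutional autoencoder, here is one possible architecture in PyTorch. The channel counts, the use of max pooling for downsampling, and transposed convolutions for upsampling are just one arbitrary choice among many, not the specific architecture from the slide:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """No dense layers at all: the hidden code stays image-shaped."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # 100x100 -> 50x50
            nn.Conv2d(16, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # 50x50 -> 25x25
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, kernel_size=2, stride=2),   # 25 -> 50
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=2, stride=2),   # 50 -> 100
        )

    def forward(self, x):
        code = self.encoder(x)   # small image-like hidden representation
        return self.decoder(code)

model = ConvAutoencoder()
x = torch.randn(1, 3, 100, 100)  # one RGB 100x100 image
out = model(x)                   # reconstruction, same shape as the input
```

The hidden code here is an 8-channel 25-by-25 "image", so convolutional structure (and its shift-friendliness) is preserved all the way through.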

[MUSIC]