And, if you remember from the previous videos,

it wasn't much of a difference, but 400 neurons in two layers

was the best-performing model,

so, that's what we're using here.

Okay, and I'll just give it a name,

I'm calling it DLB for no particularly good reason. Let's try building that.

And that's going to take a while,

so I'll come back and show you the results in a moment.
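While that builds, here's a runnable sketch of what a random discrete grid search is doing behind the scenes. The hyperparameter lists below are assumptions for illustration, not the exact values from the video, and plain Python stands in for the H2O calls:

```python
import itertools
import random

# Hypothetical hyperparameter lists, loosely matching the ones discussed
# in the video (the exact values are assumptions, not from the source).
hyper_params = {
    "hidden_dropout_ratios": [0.0, 0.2, 0.4, 0.6],
    "input_dropout_ratio": [0.0, 0.1, 0.2, 0.3],
    "l1": [0.0, 1e-6, 1e-5],
    "l2": [0.0, 1e-6, 1e-5],
}

# A random discrete search samples combinations from the full
# Cartesian product instead of trying every one of them.
all_combos = list(itertools.product(*hyper_params.values()))
rng = random.Random(42)               # fixed seed -> reproducible sample
sampled = rng.sample(all_combos, 12)  # build 12 models, as in the video

print(len(all_combos))   # 4 * 4 * 3 * 3 = 144 possible combinations
print(len(sampled))      # 12
```

With these assumed lists, twelve models cover well under a tenth of the 144 possible combinations, which is why narrowing the grid and re-running makes sense.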

Okay, 17 and a half minutes later,

welcome back to my sad little world of watching

a progress bar inch slowly across the screen.

Let's see how it did. Took a while to build those 12 models.

What we get when we look at the grid output,

let's move this over a bit actually, is,

we get to see what values of each parameter gave us what log loss.

So, just skimming through,

we can see the zeros,

for hidden dropout ratio, over in the middle.

We have a point four near the bottom and point fours near the top; not very helpful.

Input dropout ratio, on the other hand, is very interesting.

Our best four models all use zero,

our worst three models,

all use the highest value, point three.

It's really pointing to the idea that,

we want to use all of our 300 input columns.

What about the regularization parameters?

Same value top and bottom.

The higher value, one times ten to the minus five,

which was the highest value we used for l2,

is coming down in the bottom half.

So, maybe a smaller value for l2,

but no real ideas for l1.

What we do notice is that quite similar sets of parameters, say, this one,

where they differ only by l1,

otherwise identical, give us a large difference between point 58 and point 35.

This one has the same l1,

l2 as this one,

so we've got point 36 to point 551.

What I'm sensing is that there's quite a bit of noise in

this log loss. But, let's push ahead.

So, I'm going to drop point six from the hidden dropout ratios,

drop point two and point three from the input dropout ratios.

I'm going to make eight more models.

It's important I keep the same grid ID, which I've gone with as DLB.

And as the comment says,

I've changed the seed,

so that if the RandomDiscrete search strategy

does give us exactly the same set of parameters again,

we'll be able to compare and judge just how large that random noise element is.

But yes, all I've done on the second grid,

is remove some of the choices for hyper parameters,

I haven't added anything new.
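As a sketch of that second run: the reduced lists below are a guess at the narrowed grid (the exact values are assumptions), and the point of changing the seed is that a different seed draws a different sample, while any repeated combination lets you compare it directly across runs:

```python
import itertools
import random

# Assumed narrowed grid: 0.6 dropped from hidden dropout,
# 0.2 and 0.3 dropped from input dropout (values are illustrative).
reduced = {
    "hidden_dropout_ratios": [0.0, 0.2, 0.4],
    "input_dropout_ratio": [0.0, 0.1],
    "l1": [0.0, 1e-6, 1e-5],
    "l2": [0.0, 1e-6, 1e-5],
}
combos = list(itertools.product(*reduced.values()))

# Same sampling procedure, different seeds: each seed picks its own
# subset of eight combinations from the reduced space.
def pick(seed):
    return set(random.Random(seed).sample(combos, 8))

first, second = pick(1), pick(2)
# If the two runs happen to share a combination, the difference in its
# two log loss scores is a direct measurement of the training noise.
print(len(first & second))
```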

So, let's give that one a run.

I'm estimating, I'll see you in about 12 minutes.

In case you ever see this happen,

the grid is telling me it's 100 percent,

but it's still building the last model.

So, if you see 100 percent and it seems to have hung,

just be patient for a couple of minutes.

Here's the results of a previous one of these grids,

and the best one came out with a zero hidden dropout ratio.

We've already established that point two on the dropout ratios seems to be good,

with a smidgen of l1 and a smidgen of l2.

And there she goes.

So, yeah, I can't do that in my head,

but it took quite a while, again.

So, we made 12 models initially and then we've made the eight models,

trying to narrow in.

You can see, of the additional eight, these were the best three.

We can tell that by the seed or we can look at the model number.

Because the model ID is the grid ID,

underscore, "model", underscore, and then a sequential number.
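That naming convention can be sketched in a couple of lines (plain Python, with "DLB" as the grid ID; where the sequential numbering starts is an assumption):

```python
# Sketch of the model-ID convention described above:
# grid ID, then "_model_", then a sequential number.
def model_id(grid_id: str, n: int) -> str:
    return f"{grid_id}_model_{n}"

# Numbering from zero is an assumption for illustration.
ids = [model_id("DLB", n) for n in range(20)]
print(ids[12])   # DLB_model_12
```

Because the number keeps counting up across runs of the same grid, the ID alone tells you whether a model came from the first batch or a later one.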

So, comparing this to the previous run,

we have actually got the same l1,

l2 values at the top of the grid.

Anyway, this is our best model,

if you want to extract it,

let's move this back over, okay.

We're going to run this command to extract it.

So, our grid contains the model IDs as a list,

it's already sorted, so I grab the first one.

And then I pass that to h2o.getModel.

And then I can save that model,

and it's been saved with that filename.
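The real extraction uses H2O's own calls (h2o.getModel, then saving the model), which need a live H2O cluster; the selection logic itself can be sketched in plain Python with invented logloss numbers:

```python
# Toy stand-in for grid results: (model_id, validation log loss) pairs.
# The log loss values here are made up for illustration.
results = [
    ("DLB_model_3", 0.58),
    ("DLB_model_14", 0.36),
    ("DLB_model_7", 0.35),
]

# The grid hands back model IDs already sorted by the metric; the same
# effect here is a sort by log loss, then taking the first entry.
results.sort(key=lambda pair: pair[1])
best_id, best_logloss = results[0]
print(best_id)   # DLB_model_7
```

In the real API the sorting has already happened inside the grid object, which is why grabbing the first model ID in the list is enough.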

We can also evaluate it, I mean,

on the valid data set and then on the test data set.

16.6 percent error on test,

16.5 percent on validation.

Validation and test errors are roughly the same, which is a good sign.