Welcome to Workshop 9.

This is the first workshop of the course on solving algorithms.

Okay. Workshop 9 is about preparing Medicinal herbs.

We saw in chapter four how Shannon was able to resolve the curse issue in the village.

And, then no more new kids there would become ill again.

But, then, there are already a whole bunch of boys and girls. They are already sick.

So, Shannon will have to prepare some herbs to heal these children.

There are different kinds of herbs,

including Ginseng, Poria, Angelica, and so on.

And, then there are different steps for preparing these herbs, as well.

Steps including cutting, soaking,

brushing, bleaching, and so on.

There are different steps for preparing

different herbs and the ordering could be important.

For example, for preparing Ginseng,

it could be cut only after it is bleached.

And then, also, the different steps,

they might have to utilize certain tools.

We would have a pottery knife.

We will have iron knife,

and a gold knife as well.

But, then some of these steps,

they require the same tools.

And, in that case those tasks which are sharing the same tool, they cannot overlap.

So, what's the objective?

What is the goal of Shannon?

Of course, Shannon wants to get it all over and done with as quickly as possible, right?

Certainly.

We want to get all the tasks finished,

which is an often usual objective we have when we are doing scheduling kind of problems.

So, should we look at the MiniZinc model?

Yes, let's have a look at the MiniZinc model.

So, if we open the project file attached to this workshop,

you should pop up a window like this.

We can look through our MiniZinc model.

So, here's the data, of course.

We've got int as the number of tasks, and,

for each task, a duration.

We've got n, which is the number of precedences.

And, there's only two arrays here,

the pre and the post array.

And, we basically requiring that the pre-task finishes before the post-task starts.

So, that's the standard way of specifying precedences.

Exactly. We've seen that in plenty of other examples in this course.

Then, we've got a number of disjunctions, right?

So, we've got o disjunctions,

and each of these distinctions is a set of tasks,

and none of those tasks can overlap.

So, we have an array there and each array element is a set.

That's right.

A set of task, and all the elements of this set,

they cannot overlap with each other.

Exactly.

Right.

All right. Then the principal decisions are these usual,

and the scheduling problem is,

when do we start this task, right?

So, that's the start time.

We've also got another version of those nx decisions here, the end time.

So, this is the start time plus the duration.

Obviously, these are tightly tied.

If I fix all the start times,

I'm obviously going to fix the the end times,

because the durations are known.

But, we can also fix the end times,

and then we'll fix all the start times.

So, that means we only have to decide on one set of this decision variables.

Exactly.

All right. So, let's look at our model of the problem.

So, here's our precedent's constraint, that's pretty straightforward.

We're just saying the end time of

the pre-task is less than or equal to the start of the post-task.

That clearly means that the first task finishes before the second task begins.

Then we're going to do something interesting.

We're going to introduce some new decision variables, b.

This is between pairs of tasks.

This is going to keep track of the relative ordering of pairs of tasks.

But, now, we only care about pairs of task which are in disjunction.

We only care about the relative ordering.

So, which one's before which one,

in the case if they can't be at the same time.

Okay.

All right. So, the first part of this here is

basically setting all of the b's that we don't care about to zero,

because we're never going to worry about them.

So, we don't care about a b If we have two task, t1 and t2.

If t1's [inaudible] to t2,

we're not care about it because we'll always think of the path,

the task per IJ in the order I less than J.

All right? So, not the other around, not JI.

Okay. So, the reason why we need to have this constraint is

because we are defining a big two-dimensional array of Bs?

Yeah.

And, for every payoff task.

But then, if the two tasks,

they are not within a distinctive relationship, then we don't care.

So, we would like to set those Bs, are to be zero.

Yes.

And also, for every pair,

we have two variables.

So, we also want to get rid of half of them.

So, this gets rid of a half we don't care about.

And, this other test here is,

if there's no disjunction where these two tasks appear,

so they can overlap with each other in any way,

then we're just going to set those Boolean to zero because, we don't care about.

Okay.

All right. So, the real purpose of these Booleans is for this constraint here.

So, we're looking through every one of our disjunctive sets.

We're taking any two tasks from

that disjunctive set where t1 is less than t2 because we always look in that order.

And, we say this Boolean, t1,

t2 is true, means that task 1 finishes before task 2 begins.

And, since there in disjunction,

if that's not the case,

then it has to be the other way around.

If it's false, it means the task two finishes before task one begins.

Okay. So in some sense,

this Boolean variable is used to indicate

whether a task should finish before the other one.

Exactly. The task that we care about,

which are the one in disjunction. All right.

Finally, we're looking at the makespan,

which is just the maximum n time.

Then, we're going to minimize that makespan, all right.

I can see that you have already prepare a whole bunch of this.

Okay, so we've prepared, because we're going to have to,

well, we've got three choices for variables to put here.

All right. We can take start times,

or end times, or the Boolean variables.

Wow.

And, we have four choices to put for val selection.

Let's say, four choices to put for value selection.

So, input order is?

Input order is just we try them in the order they appear in their own.

And, first fail is to select a variable with the smallest domain.

At the current time.

Yes, at the current time.

And, smallest, remember, looks for the variable which

has the smallest current value left over in its current domain.

And, largest is just the opposite.

Exactly.

Indomain min is setting to its minimum value, its maximum value.

The median is in the middle.

It's not likely to be very useful for scheduling.

And, random, which is seems like a totally bad idea,

but it's not often not as bad as you think.

All right.

So, let's just start off with a very straightforward input order.

We're just going to fix the variables in order, indomain min.

Almost a default kind of search strategy,

if you don't think about anything at all.

And, let's start with our simplest example here and run our model.

So, notice one thing,

we've got the statistics coming out here.

We've gone into our configuration,

we've actually turned on statistics for solving.

Also, because later on we'll run some examples which take a while,

we've added the solver flags and we've got a time limit of

6,000 milliseconds or six seconds.

Just so we don't spend a lot of time looking at the screen.

It really depends on how much time you have for your experiments.

Exactly. All right.

So, we can see that we found a solution in 12 values.

Fairly quickly.

Let's try, so indomain min is sort of a obvious thing because we're trying to minimize.

What if we do indomain max.

Yes.

Let's have a try and see what happens.

So, then if you sort flash parts,

but we found a lot more solutions on our way to the optimal.

And, if you look at the phase,

let's say, 39 versus the previous one was 12.

A lot larger.

Exactly. So, really, indomain min was much better than that, right?

What about if we try

indomain medians and we're going to set it to a middle value. All right.

So, if you'd see we got a few answers,

48 failures, which is worst in fact.

So, really, sitting in the middle of

a range in a scheduling problem is pretty bad, right?

Because you're not going to learn very much from that.

Let's try random as well.

Let's try random. Of course,

with random we could be super lucky.

Also 23, it's actually not too bad.

It was better than median and better than max,

which was the opposite,

and not as good as min.

Min is still the best.

Yeah. Well, because we're trying to minimize makes.

Exactly.

Exactly. What happens here if we go back to min,

and let's try the end time variables.

All right, so, you see,

the same number of failures in this case.

We basically have, if we try switch between them, basically going to get.

It's the same in this case.

Basically, almost the same, the same time, almost everything the same.

Actually, exactly the same.

Interesting.

Can you try larger examples?

Yes, we should probably try a larger example.

You can't learn everything from just looking at one example, that's for sure.

So, let's look at the second example.

So here we don't start time.

Right.

Seven failures, it went straight to the optimal.

Let's try end time.

Good, come back.

Exactly the same.

Well, it's because we're doing input order.

Right.

When you're doing input order,

the start time and the end time are basically the same thing.

Yes.

So, that's not surprising at all.

So, let's try something a bit cleverer. Let's say.

I mean, how would a human being do when he or she is trying to schedule such task?

Yes.

Well, perhaps we could try smallest,

because we would like to select the variable with

the smallest remaining value in a domain of the variable.

All right. So, if we do the start time,

we pick the smallest start time,

we set it to its minimum value.

This is a very standard scheduling example,

where we just try to make things happen as early as possible,

starting from the beginning of time.

All right. So, we find a solution in six failures.

Of course, we're doing very small examples here.

If we do the end time now,

we should get a difference, right?

Because the smallest end time is not necessarily the same as the smallest start time.

All right. So, we do see difference, right?

32 failures versus 15.

Let's go to a slightly bigger example.

So, here's example three, that's using end time.

But, only five failures is this.

So, it did very well. Maybe once I've looked,

what happens when we do start time?

So, six five. There you can see,

actually, the time. The end time was better.

All right. Of course,

end time is finally what we're trying to optimize.

So, in that sense, it's not a bad idea to minimize on end time as well.

Right.

All right.

We have not tried the B variables, the Booleans.

Ok. Right.

We should definitely try the B variables.

All right. Because it's,

It's a two dimensional array.

So, let's just make it one dimensional to fit in the in search.

And, now, smallest doesn't really make sense for Boolean variables, right?

The smallest value will be zero for the Boolean variables.

There's just zero.

Unless it's a fixed.

There aren't really fixed zero,

fix to one, in which case we're not going to do anything.

First fail doesn't make sense either.

First fail doesn't make sense, either.

So, basically, input order is about all we can do.

Now, we'll see in the next week

some more interesting variable selection policies

which we will be able to differentiate between Booleans.

All right. But let's run this from the Booleans. All right.

Nine failures. So, how does that compare?

Not as good as the end time which is six failures, or the five failures.

Right.

Yes. Let's have a look at another example.

All right. So, there we're now on example four.

We had four failures.

Now, actually, I want to keep track.

So, basically, that will be the in search one.

Let's do the other.

Our current based search strategy is smallest, all right.

And then, indomain min.

Complete.

So, we basically want to compare these two.

So, if we compare that one, you can see.

Wow.

Okay. So, now.

Time out?

Yes.

Time out. Operating system error.

[LAUGHTER] And, you can see - Okay, it's found the solution at 31.

What happened when we used the Booleans?

Well, we found the solution at 31 in point seven seconds.

Marvellous.

Right. So, this one actually found the best solution, but it couldn't prove it.

And, it spent a lot of time modifying.

And, it's trying to prove optimality,

but it wasn't able to, right?

Let's have a look at a different example.

Example five. Off we're going to go.

Time out again.

Time out again.

Okay. And, we didn't prove optimality. There's no double line there.

But, if we can flip back to the Boolean search that we're doing. What happens?

Wow.

Create optimality.

Within a second.

Yes, within a second.

Interestingly saying, even though this Boolean search really now it's nothing, right?

Because basically we're just picking the input orders and sending them to false.

We know nothing about what's going on.

We're using no clever dynamic things about what's happening in the server.

And, there's not according to any of the scheduling knowledge that we have.

No. Exactly. It's still winning.

Okay. But let's try one of the bigger examples.

Okay. So, let's try number 11.

Okay. So, you can see we find plenty of solutions.

Number 11. We're going to hit time out presumably.

Stuck it to ten forty eight.

We lost out statistics.

Let's try the other one.

That was on the Boolean,

so we've got to make span at ten forty eight in six seconds.

What if we run our scheduling example?

You can immediately notice, right,

before we even hit the time out,

that we've got way better best solutions, right.

Our best solution we found at six forty two,

which is way, way better.

Two times better.

Than the one that we got out of the Booleans.

And, this is what we're seeing - there's a kind of contrast here.

There's two things that we can be aiming for.

If we're aiming to get the best solution quickly,

then actually this standard scheduling approach,

smallest variable, sort of pack the mean,

you see very good solutions quickly.

If you're aiming to prove optimality,

actually the search space for optimality is much,

much smaller if we search on the Booleans.

All right. So, it depends on the size of your problem.

And, of course and actually,

we can actually build a much more complicated search series.

But, you have to do it directly in the solver.

We can't do it through MiniZinc at present, where we actually combine these two things.

We'd want to be searching on the B variables,

but searching in order to find the small values as quickly as possible.

And, that's how modern scheduling approaches are actually done.

That's surprising. But, I suppose our learners,

they should try all the experiments.

So, using three types of variables to search on,

four kinds of variables selection,

and then four kinds of values selections altogether for the eight different experiments.

Yes. All right.

You need to do some scripting if you really want to do this fully.

And, indeed you may find you can do better.

That's definitely the case if you put the end time there.

That can be sometimes very competitive,

the start time. All right.

But, remember we were doing this on this model.

We haven't improved model, right?

Because, now, we remember about global constraints.

This non overlap constraint is exactly a disjunctive constraint.

Why didn't we model the model this way?

There's be more natural way once we know about the disjunctive.

Exactly.

Right, so, we look through all of that disjunctive sets.

We just have all the start time for

all the tasks in that disjunctive set and all the durations.

And, then we just called disjunctive.

But, we do not have to Boolean variables anymore.

We do not have the Boolean variables anymore.

So, is this better?

Well, let's run on the same data poll.

So, remember our - So, we've got the search space here is the standard search strategy,

right, with S smalls in domain min.

So, it's the same when we were doing before.

Right, we're actually not going to get as far,

on this particular example.

So, we got the 682.

This is before we actually got to 642, I think.

So, for that particular example didn't work.

Now, when we hoped the disjunctive actually understands more about the problem, right?

If we go back to an earlier one,

and run that, so we can see that's number five.

We've got seven failures, point 115 seconds.

If we run prepare,

and just check we're doing the right search.

Okay, so, you can see our number five was really a bad example for our search.

It got trapped. We got stuck.

Found 39, which is probably the optimal,

but it couldn't prove it, versus the disjunctive.

So, you can see the power of the global in action.

So, in general, the global constraint should be able to

help solve the problem a lot more efficiently.

Exactly.

Although sometimes we can be lucky if we just look at one example.

Sure.

We may not get a better first solution or the better solution in a particular time.

All right.

So, if we look back over the kind of questions that we meant to answer,

a kind of parted answers will say is,

okay, what's the best thing for finding a first solution?

Well, basically, you can go in any order,

you'll find the first directly.

Particularly when you have the disjunctive constraint in.

What's the most complete?

So, obviously, searching on the Boolean variables gave us the most complete search.

Because it can prove optimality, in most cases.

Yes. It's basically the search space becomes smaller.

Because, actually, all we need to know to solve

this problem is the relative order of these disjunctive tasks.

Once we know that, then the rest is very simple to work out.

So, search space have many fails to find

the optimal solution again goes to the Boolean variables, right?

Because it just makes the search space so much smaller.

But, when we want to find the first solution,

how good the first solution is, the Booleans are terrible.

They have no idea about the objective, right?

Whereas, our smallest very quickly gets a good solution.

Often the best solution was first.

Because the boolean variable is only deciding on how they not overlap, right?

Exactly. It doesn't know anything about the objective at all.

Right.

And, if we look at robustness like,

how good a solution we got in a fix time,

so we were doing six seconds.

Then you can see again that was

this standard scheduling approach which was much better than the Boolean approach.

Smallest and in domain min.

Yes.

All right. So, that brings us the end of this workshop.