So I'm starting with another copy of the last slide from the end of the previous

lecture, the hill climbing lecture. And what we said in that slide was that

the big idea, the two big ideas, of this new method for placement with random hill

climbing were that there was this temperature control parameter that we

started at a very hot value. And we cooled very slowly over the course

of many, many, many swaps, random swaps, that we're going to try to drive the

placement process forward. And that this temperature parameter was a

hill climbing control parameter and when the temperature was hot, we were more

tolerant of changes in the gate locations that

made the placement wire length worse. And when the temperature control

parameter was cold, we were not so tolerant.

And what tolerant meant was that we actually used some random numbers in some

interesting ways to calculate some probabilities to decide whether we should

or should not keep a gate swap that made things worse.

This is an instance of a very famous general optimization method that has a

very famous name. This is an algorithm called Simulated

Annealing, and so in this lecture, we are going to talk about

Simulated Annealing in the context of placement, but we're also going to give

you a little background about why this thing has this name.

So let's just talk about a little context here.

So here's a physical analogy for you. Suppose you want to make a perfect

crystal. What does perfect mean for a crystal?

It means that all the atoms are lined up in their crystal lattice sites.

And that's actually the lowest energy state.

That's what the atoms really want to do. So I've got a little example over here of

a crystal that has lousy order. It is a disordered crystal.

And I've got basically a little 8 by 8 grid of pink circles that

are trying to look like atoms. And then I've got four little red atoms

that are in sort of strange locations. They're clearly not where they want to

be. This disordered state has a higher energy.

How do you take this physical system and actually make it a really good high

quality crystal, a low energy crystal, a crystal where all the atoms are on their

lattice sites? The answer is you get the material really

really hot. So the atoms have the energy to move

around, even to bad places, which is to say, places not on their crystal lattice

sites. And then you cool it very, very slowly so

the atoms settle into good locations. And the idea is that when there's a lot

of energy, the atoms can move around even to bad places and when it's not hot, the

atoms have to sort of stay where they are.

And by getting it really hot and cooling it really slowly, you actually encourage

it to go into this, from this highly disordered state to an ordered state.

This process has a name, it's called annealing.

So, you anneal it, you get it really hot and you cool it really slowly and

hopefully you get a crystal with, well it's probably not going to be perfect and

it's probably not going to be at the minimum energy, but it's going to be really good.

It's going to have very few dislocations in the crystal lattice.

So that's physically how you do that. What if you actually wanted to write a

computer program to simulate that thing? What kind of physics do you need to know

to make that actually work? Well, you would have to have some kind of

a data structure that would represent the crystal lattice.

And you know, that could just be a bunch of atoms with x,y or x,y,z

coordinates. I'm just going to draw it in two

dimensions here. And you would have, just like our placer

had, an inner loop where you would randomly

grab two gates and swap their locations and evaluate the change in a number that

mattered to you, and that number was the wirelength.

For the physical system, you grab an atom and you move it someplace, that's what

I'm showing here. You grab the blue atom and you move it

someplace, and the thing you calculate is the energy, and in particular you

calculate the change in energy, because that's what the crystal cares about.

And you also need to know how hot it is, because whether or not this is a good

thing to do, whether or not this is a likely thing to do, depends on how hot it

is. And so let's ask a question.

To model a real physical crystal, where we grab an atom, and we move it to a new

location and we calculate the change in energy, delta e.

What is the probability that that actually should happen in my computer

program? And the answer is 1.

If delta e goes down, if delta e is smaller, if this change in the atomic

location makes the energy better. You should keep it.

That's a good thing to do. That's a physically reasonable

thing to do. What if, to model a real physical

crystal, what if the energy goes up? What if you grab the atom and you move it

to a worse place in terms of the overall crystal lattice organization?

What if delta e is bigger than zero? Now, in your computer program, what's the

probability that you ought to, you know, make that happen?

And interestingly, the answer is e to the minus delta e over kT.

That's the physics of the real universe of taking that atom and putting it

someplace where the energy got locally, microscopically at the atomic level, a

little worse. Delta e is the change in the energy.

T is the temperature, and k, amazingly enough, is the Boltzmann constant.

In real physics the units all have to line up.

If you took a physics class in your life, you probably got yelled at by

some professor, lecturer, instructor or teaching assistant, because your units

didn't add up. You know, you can't take newtons and add

meters. It's just not okay, and similarly here

the thing that's in the exponent has to be without units, and so the units of the

thing on the top and the thing on the bottom all have to cancel.

And so if the thing on the top is energy, and the thing on the bottom is

temperature, there has to be something that links the way we measure temperature

to the way we measure energy, for this to give the real probability

that the real atom makes this real physical move. And amazingly enough,

that's a famous number. That's a famous physical constant.

That's the Boltzmann constant and that's Ludwig Boltzmann over there on the right,

a picture from Wikipedia. The Boltzmann constant, in real physics,

converts the units of temperature to the units of energy.
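As an aside, here is a small Python sketch, not from the lecture, of that physical acceptance rule P = e to the minus delta e over kT; the function name and the room-temperature example at the bottom are my own illustration:

```python
import math

# Boltzmann constant in SI units (this exact value since the 2019 SI
# redefinition); it converts kelvin into joules so that the exponent
# -delta_e / (k * t) comes out dimensionless.
K_BOLTZMANN = 1.380649e-23  # joules per kelvin

def physical_accept_probability(delta_e_joules, t_kelvin):
    """Probability that a move costing delta_e of energy is taken."""
    if delta_e_joules <= 0:
        return 1.0  # energy went down: always a physically good move
    return math.exp(-delta_e_joules / (K_BOLTZMANN * t_kelvin))

# A move that costs exactly kT of energy is accepted with
# probability e^-1, at any temperature.
p = physical_accept_probability(K_BOLTZMANN * 300.0, 300.0)
print(p)
```

Notice that without the k in the denominator, the exponent would have units of joules per kelvin, which is exactly the dimensional bookkeeping the lecture is describing.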

So what, you know, Ludwig was doing back there in the 19th century, was pioneering

some of the initial mathematics of statistical thermodynamics to be able to

sort of write these equations about what happens with what probabilities with what

energies. This is a long time ago.

What's amazing is that this e to the minus energy over temperature thing is sort

of the foundational model of how the probabilities work.

Now, I caution you that you don't need a k when you're doing a placement.

Because a placement isn't real physics. Nobody cares what the units of wire

length and temperature are. So as far as you're concerned, pretend

the temperature is measured in meters. Right?

It doesn't matter. It's all artificial.

It just has to be a probability. It has to be a number between 0 and 1.

We tune the parameters so that everything works.
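So, as a sketch, the placer's version of the acceptance rule looks something like this in Python; the function name is mine, and the only real requirement is that the result is a number between 0 and 1:

```python
import math

def accept_probability(delta_l, t):
    """Probability of keeping a swap that changes the wirelength by
    delta_l at temperature t.  No Boltzmann constant: the units are
    artificial, we just need a number between 0 and 1."""
    if delta_l <= 0:
        return 1.0  # the swap improved the wirelength: always keep it
    return math.exp(-delta_l / t)

print(accept_probability(-5.0, 10.0))  # improvement: probability 1
print(accept_probability(20.0, 10.0))  # uphill: e^-2, about 0.135
```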

But if you're doing real physics, with real atoms and real crystals and you want

real probabilities with real energies, the units all have to work and that's

where Ludwig comes in. This new method is called simulated

annealing. It's a very famous general optimization

method. You can optimize lots of different things

with it. It is, however, widely used in VLSI CAD.

It was invented at IBM in the early 1980s by Kirkpatrick, Gelatt, and Vecchi, from

this very famous paper in Science magazine,

which is a very interesting paper to read; I recommend it, if you can get

yourself a copy. And in fact some of the earliest examples

that our friends at IBM, and in fact I know Scott pretty well, used were actually

electronic design examples, chip design examples.

They were actually designing parts of IBM computers back there in the early 1980s,

and getting fabulously good results. And it's this idea of annealing something:

getting it really hot, allowing both downhill changes that make

things better and uphill changes that make things

worse, with a probability controlled by the temperature,

but applying this in the form of software, of a computer program, to

things that are not physical systems. Things like gates moving around on the

surface of chips. So, here's some pseudo-code for a

simulated annealing placer and it looks kind of complicated.

And what I want to say is that we're going to read this sort of from the

inside out to convince you that it's actually something very close to what you

already know. So, ignore the stuff with color behind

it. Ignore the yellow stuff and ignore the

kind of pink brown stuff, and let's just start with the simple white parts here,

and so up at the top it says to start with a random initial placement.

Okay, that's just like the greedy random iterative improvement stuff that you

know. And then it's got this part here that

says for s equals one to m times the number of gates, m is just how many swaps

we're going to do per gate, so pick a number,

say 1,000. I'm going to do 1,000 times the number of

gates swaps, and that's going to be my placement.

And then look at the code inside the for loop, swap 2 random gates gi and gj.

Calculate delta l, the change in the wire length. If delta l is less than 0, then

accept this swap; else, ignoring the stuff with the brown

background for now, undo this swap. Okay. Well, great, that's just the greedy

placer. What happens now when we add the,

simulated annealing stuff, is that, the first thing we do, is we add this inner

loop stuff that says, if the wire length went up, if delta l is greater than zero

then we generate a uniform random number r.

That's this thing here, and we calculate that probability p which is e to the

minus delta l over t. And we ask if r is less than p.

And if that's true then we accept this uphill swap.

It's a new worse placement, but our hill climbing control parameter T says it's

okay to do. Alright, so that's the probabilistic swap

acceptance, and then the yellow stuff, is this new outer loop, this annealing

temperature control loop. And that says that you start with the

temperature being hot. And then you have this, basically

a Boolean, right, just a flag, that says frozen is false, and it says while

not frozen, do, we do a whole bunch of gate swaps at this temperature.

And then at the end it says, if the overall wire length is still decreasing

over the last few temperatures, then okay we're still annealing, what do you do?

You make the temperature a little cooler because as the annealing runs, you have

to make the temperature control go down so you don't keep taking all the big

uphill moves. So this is an arbitrary number.

T = 0.9T. We cool it.

And if it's not the case, then, if it is the case that the total sum of the wire

length has stopped changing. So let's say it just, you know, it hasn't

changed more than, you know, 1/100th of a percent over the last five temperatures.

Okay, we're done, right? I mean, we're just not going to find

any more improvement in the wire length. Then we set frozen to true, we drop out

of the loop and we return the final placement as the best solution.
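Here is a minimal, self-contained Python sketch of that pseudo-code. It is not the lecture's actual placer: the netlist format (each net is a list of gate indices), the square-grid model, and the exact freezing test are simplifying assumptions of mine, and a real annealer would compute the wirelength change incrementally from the affected nets instead of recomputing the whole thing on every swap:

```python
import math
import random

def hpwl(placement, nets):
    """Total half-perimeter wirelength: for each net, the half-perimeter
    of the bounding box around its gates' (x, y) locations."""
    total = 0.0
    for net in nets:
        xs = [placement[g][0] for g in net]
        ys = [placement[g][1] for g in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal_placement(nets, n_gates, grid, t_hot=800.0, m=100, seed=0):
    rng = random.Random(seed)
    # Start with a random initial placement: gate -> (x, y) slot.
    slots = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(slots)
    placement = {g: slots[g] for g in range(n_gates)}

    t = t_hot
    history = []  # wirelength recorded at the end of each temperature
    frozen = False
    while not frozen:
        for _ in range(m * n_gates):
            gi, gj = rng.sample(range(n_gates), 2)
            # Swap two random gates and measure the wirelength change.
            before = hpwl(placement, nets)
            placement[gi], placement[gj] = placement[gj], placement[gi]
            delta_l = hpwl(placement, nets) - before
            if delta_l < 0:
                continue  # downhill: always accept
            if rng.random() < math.exp(-delta_l / t):
                continue  # uphill: accept with probability e^(-delta_l/t)
            # Otherwise undo the swap.
            placement[gi], placement[gj] = placement[gj], placement[gi]
        history.append(hpwl(placement, nets))
        # Still improving over the last few temperatures? Cool; else freeze.
        if len(history) < 5 or history[-5] - history[-1] > 1e-4 * history[-5]:
            t = 0.9 * t
        else:
            frozen = True
    return placement, history[-1]

# Hypothetical tiny example: 4 gates on a 2x2 grid, three two-pin nets.
nets = [[0, 1], [1, 2], [2, 3]]
placement, wl = anneal_placement(nets, n_gates=4, grid=2,
                                 t_hot=10.0, m=10, seed=1)
print(wl)
```

With m swaps per gate at each temperature and a 0.9 cooling factor, this follows the structure of the pseudo-code: greedy acceptance of downhill swaps, probabilistic acceptance of uphill ones, and an outer loop that cools until the wirelength freezes.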

So, it's not that much different, it's just the temperature outer loop, the

yellow stuff in this diagram, and the brown-pink stuff.

The, you know, e to the minus delta l over t,

let-me-go-uphill-with-a-certain-probability stuff that's really new here.

This probabilistic acceptance of swap stuff is really famous.

So this little snippet of code: if uniform random less than exp of minus delta l

over t, then accept the swap, else undo the swap.

This little piece of code is sort of a version of a very famous idea.

So this is the idea that randomly accepting a perturbation of a system with

a specific probability will correctly simulate the physics of that real system.

This has a name. This is called the Metropolis Criterion.

And it's named after this guy, Nicholas Metropolis, Nick Metropolis, a very

famous guy. What was Nick doing when he was sort of

doing this? Nick was working with some very famous

people like John von Neumann and Stanislaw Ulam, who's a very famous mathematical

physicist. Way back at the dawn of atomic physics in

the 1950's, when people had computers the size of football fields built out of

thousands of vacuum tubes. And they were trying to write some of

the first computer programs, to simulate how the physics of

things, you know, with atoms doing interesting things, worked.

And these are the guys who invented the first Monte Carlo simulations, if you've

ever heard that word. And this sort of inner loop thing that is

generating particular probability distributions, checking that the

perturbation you make either does or doesn't fit within some probabilistic

acceptance criterion. That's really famous and so when you see

lots of computer algorithms that are based on random sorts

of acceptance methods for how things make progress,

you'll often see the name Metropolis show up.

Metropolis criterion, Metropolis algorithms.

Lots of interesting places we use this idea.

This is really famous. How does this Metropolis criterion make

our overall algorithm behave? And the answer is, in some very unusual

ways. In some ways that you probably haven't

experienced before. So it really is random.

You might accept the swap. You might not accept the swap.

It depends on the random number you generate.

And the way to maybe explain that is, suppose the temperature is 10 and delta l

is 20, plus 20. So e to the minus delta l over t

is e to the minus 2. It's about 0.14.

So this is about a 14% probability you accept the swap.

You know, what does that mean? So here's a, let me give you a concrete

example of what that means. Suppose at this temperature you're trying a

really large number of swaps in the placement.

So, I mean, suppose you're trying a million swaps at this temperature.

And suppose it turns out that 5,000 of those different swaps all happen to

evaluate that they change the wire length by delta l is plus 20.

Then, of those swaps that you try that all have a wire length

degradation of 20, approximately 14% of them, 0.14 x 5,000

= 700 of them, will be accepted. A random 700 of those 5,000, which all

change the placement in exactly the same way.

They all make the placement 20 units of wire length longer.

About 700 of them will get accepted and about 4,300 of them won't.

And if you change the random number seed so that you get a different stream of

random numbers and you run your placer again.

You will get a different set of placement swaps proposed, and let's assume that you

again try about 5,000 of them that have delta l as 20.

You will still expect about 700 of them to be accepted, but it'll be a completely

different set of swaps with a completely different set of results.
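You can check that arithmetic with a few lines of Python (again a sketch, not the lecture's code). Note that e to the minus 2 is more precisely 0.135, so the expected count is closer to 680; the lecture's 0.14 and 700 are round numbers:

```python
import math
import random

def count_accepted(n_proposals, delta_l, t, seed):
    """Count how many of n_proposals uphill swaps, each costing delta_l
    of wirelength at temperature t, pass the probabilistic test."""
    rng = random.Random(seed)
    p = math.exp(-delta_l / t)  # e^-2, about 0.135, for delta_l=20, t=10
    return sum(1 for _ in range(n_proposals) if rng.random() < p)

# Two different seeds: a similar *count* is accepted each time, but it
# is a different random subset of the 5,000 proposed swaps.
print(count_accepted(5000, 20.0, 10.0, seed=1))
print(count_accepted(5000, 20.0, 10.0, seed=2))
```

Different seeds give a similar count but a different random subset of accepted swaps, which is exactly the behavior the lecture is describing.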

It's a very different universe when you have probabilistic algorithms like this,

and the thing that's amazing is how well this stuff works.

So, how well does it work? And the answer is, it works great.

There is nothing else to say other than it works great.

So this is the same 2,500 gate benchmark, this is an annealer that looks very much

like the annealer pseudo-code that I showed you a couple slides ago.

The hot temperature is 800, and the m number for how many moves per gate, how many

swaps per gate, is 100. So we try 2,500 times 100,

which is 250,000 swaps per temperature.

And then we lower the temperature. This is the curve of the progress.

So the first thing I'll just say is we got about 30% less half-perimeter

wirelength total than the random iterative improvement algorithm, and

that's a gigantic number. That's a very big deal.

So let me tell you how you read an annealing curve, because this is a pretty

standard looking annealing curve. The x axis is temperature.

Okay? And the thing that's interesting is that,

it's a log scale. So what you see is the temperature is

0.1, 1, 10, 100, 1000, going from left to right.

And the other thing to remember, right, is that you read the curve from right to

left. Because the temperature starts hot.