but this is also a canonical way in which heaps are used. And in this application it

will be natural to call a heap by a synonymous name, a priority queue. So what

I want you to think about for this example is that you've been tasked with writing

software that performs a simulation of the physical world. So you might pretend, for

example, that you're helping write a basketball video game. Now why

would a heap come up in a simulation context? Well, the objects in this

application are going to be event records. So an event might be, for example,

that the ball will reach the hoop at a particular time and that would be because

a player shot it a couple of seconds ago. If, for example, the ball hits the rim,

that could trigger another event to be scheduled for the near future which is

that a couple players are going to vie for the rebound. That event in turn could

trigger the scheduling of another event, which is that one of these players commits an

over the back foul on the other one and knocks them to the ground. That in turn

could trigger another event, which is that the player who got knocked to the ground gets

up and argues about the foul call, and so on. So when you have event records like this,

there's a very natural key, which is just the timestamp, the time at which this

event in the future is scheduled to occur. Now clearly a problem which has to get

solved over and over and over again in this kind of simulation is you have to

figure out what's the next event that's going to occur. You have to know what

other events to schedule; you have to update the screen and so on. So that's a

minimum computation: you need the event with the smallest timestamp. So a very silly thing you could do is just maintain an unordered

list of all of the events that have ever been scheduled and do a linear pass

through them and compute the minimum. But you're gonna be computing minimums over

and over and over again, so again that light bulb should go on. And you could say

maybe a heap is just what I need for this problem. And indeed it is. So, if you're

storing these event records in a heap, with the keys being the timestamps, then

extract min hands you on a silver platter, in logarithmic time, exactly which

event is going to occur next. So let's move on to a less obvious

application of heaps, which is a problem I'm going to call median maintenance. The

way this is gonna work is that you and I are gonna play a little game. So on my

side, what I'm going to do is I'm going to pass you index cards, one at a time, where

there's a number written on each index card. Your responsibility is to tell me at

each time step the median of the numbers I've passed you so far. So, after

I've given you the first eleven numbers, you should tell me as quickly as possible

the sixth smallest; after I've given you thirteen numbers, you should tell me the

seventh smallest; and so on. Now, we know how to compute the median in linear

time, but the last thing I want is for you to be doing a linear-time computation

every single time step. What if I only give you one new number? Do you really

have to do a linear-time computation just to recompute the median? So to make sure

that you don't run a linear-time selection algorithm every

time I give you one new number, I'm going to put a budget on the amount of time that

you can use on each time step to tell me the median. And it's going to be

logarithmic in the number of numbers I've passed you so far. So I encourage you to

pause the video at this point and spend some time thinking about how you would

solve this problem. Alright, so hopefully you've thought about this problem a

little bit. So let me give you a hint: what if you use two heaps? Do you see a

good way to solve this problem then? Alright, so let me show you a solution to

this problem that makes use of two heaps. The first heap we'll call H low. This

heap supports extract max. Remember, we discussed that with a heap you can pick

whether it supports extract min or extract max. You don't get both, but you can get

either one, it doesn't matter. And then we'll have another heap H high which

supports extract min. And the key idea is to maintain the invariant that the

smallest half of the numbers that you've seen so far are all in the low heap. And

the largest half of the numbers that you've seen so far are all in the high

heap. So, for example, after you've seen the first ten elements, the smallest five

of them should reside in H low, and the biggest five of them should reside in H

high. After you've seen twenty elements then the bottom ten, the smallest ten,

should reside in H low, and the largest ten should reside in H high. If

you've seen an odd number, either one can be bigger, it doesn't matter. So if you

have 21, you have the smallest ten in one heap and the biggest eleven in the other,

or vice versa; it's not important. Now, given this key idea of splitting the

elements in half across the two heaps, you need two realizations, which

I'll leave for you to check. So first of all, you have to prove you can actually

maintain this invariant with only O of log I work in step I. Second of all, you have

to realize this invariant allows you to solve the desired problem. So let me just

quickly talk through both of these points, and then you can think about it in more

detail, on your own time. So let's start with the first one. How can we maintain

this invariant, using only log I work in time step I? This is a little tricky.
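As a quick implementation aside, separate from the lecture: library heaps, such as Python's heapq, often provide only extract min, but an extract-max heap like H low can be simulated by negating keys on the way in and on the way out. A minimal sketch:

```python
import heapq

# heapq only provides a min-heap, so store negated keys: the smallest
# stored value is then the negation of the largest original key.
h_low = []
for x in [3, 9, 5, 1]:
    heapq.heappush(h_low, -x)    # insert

maximum = -heapq.heappop(h_low)  # extract max; returns 9 here
```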

So let's suppose we've already processed the first twenty numbers, and the smallest

ten of them we've worked hard to put only in H low. And the biggest ten of

them we've worked hard to put only in H high. Now, here's a preliminary

observation. What do we know about the maximum element in H low?

Well, these are the ten smallest overall, and the maximum is then the biggest of the

ten smallest. So that would be the tenth order statistic, the tenth smallest

overall. Now what about the high heap? What's its minimum value? Well, those

are the biggest ten values. So the minimum of the ten biggest values would be the

eleventh order statistic. Okay, so the maximum of H low is the tenth order

statistic, and the minimum of H high is the eleventh order statistic; they're right next

to each other. These are in fact the two medians right now. So when the new element

comes in, the twenty-first element, we need to know which heap to insert

it into. Well, if it's smaller than the tenth order statistic, then it's

in the bottom half of the elements and needs

to go in the low heap. If it's bigger than the eleventh order statistic, that is, if it's

bigger than the minimum value of the high heap then that's where it belongs, in the

high heap. If it's wedged in between the tenth and eleventh order statistics, it

doesn't matter; we can put it in either one. It is the new median anyway. Now,

we're not done yet with this first point, because there's a problem with potential

imbalance. So imagine that the twenty-first element comes up and it's

less than the maximum of the low heap, so we stick it in the low heap and now that

has a population of eleven. And now imagine the twenty-second number comes up

and that again is less than the maximum element in the low heap, so again we have

to insert it in the low heap. Now we have twelve elements in the low heap, but we

only have ten in the high heap. So we don't have a 50/50 split of the

numbers. But we can easily re-balance: we just extract the max from the low heap and

we insert it into the high heap. And boom. Now they both have eleven, and the low

heap has the smallest eleven, and the high heap has the biggest eleven. So

that's how you maintain the invariant that you have this 50/50 split between the

two heaps of the small half and the large half. You check where the new element lies with

respect to the max of the low heap and the min of the high heap. You put it in the

appropriate place. And whenever you need to do some re-balancing, you do some

re-balancing. Now, this uses only a constant number of heap operations when a

new number shows up. So that's log I work. So now given this discussion, it's easy to

see the second point: given that this invariant is true at each time step, how

do we compute the median? Well, it's going to be either the maximum of the low heap

and/or the minimum of the high heap depending on whether I is even or odd. If

it's even, both of those are medians. If I is odd, then it's just whichever heap has

one more element than the other one. So the final application we'll talk about in

detail in a different video, one concerned with the running time of

Dijkstra's shortest path algorithm. But I do wanna mention it here as well just to

reiterate the point of how careful use of data structures can speed up algorithms.
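In that spirit, the two-heap median maintenance scheme described above can be sketched in a few lines of Python, with heapq's min-heap doubling as the extract-max heap via negated keys. The function name running_medians is my own, not from the lecture:

```python
import heapq

def running_medians(stream):
    """Yield the median after each new number, using two heaps.

    h_low holds the smaller half of the numbers seen so far, as a
    max-heap simulated with negated keys; h_high holds the larger
    half as a plain min-heap. Each step uses O(1) heap operations,
    i.e. O(log i) work at time step i.
    """
    h_low, h_high = [], []
    for x in stream:
        # Insert into the correct half by comparing with max(h_low).
        if h_low and x <= -h_low[0]:
            heapq.heappush(h_low, -x)
        else:
            heapq.heappush(h_high, x)
        # Re-balance so the two halves never differ by more than one.
        if len(h_low) > len(h_high) + 1:
            heapq.heappush(h_high, -heapq.heappop(h_low))
        elif len(h_high) > len(h_low) + 1:
            heapq.heappush(h_low, -heapq.heappop(h_high))
        # The median sits at the root of whichever heap is larger;
        # on ties (an even count) report the lower middle element.
        yield -h_low[0] if len(h_low) >= len(h_high) else h_high[0]
```

For example, running_medians([5, 2, 8, 1, 9]) yields 5, 2, 5, 2, 5. On an even count this reports the lower of the two middle elements; as discussed above, either of the two middle elements counts as a median there.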