0:02

Welcome back. Today, we're going to look at Priority Queues which is a variant of

sorting that generalizes the idea to provide more flexible data structure that

we can use for all sorts of applications. To get started, we'll look at the API and

some elementary implementations. So at a week or so ago, we looked at collections

in Java and the idea of elementary data structures where we insert and delete

items. And then, the data structures differ on the basis of which item we choose

to delete. We looked to the push down stack where we removed the item that was

most recently added, And the queue where we remove the item that was least recently

added. And then, we talked about randomized queue or bag where we might

remove a random or an arbitrary item. Today, what we're going to talk about is

called the priority queue and for that, we insert items, but when it's time to

remove, we consider the items to have a total order and we want to remove the

largest or the smallest item. So this little example if we insert P, Q, and E

then when we do remove max, we want to get the Q out and for later, we insert X, A,

and M and then we removed max. The largest one in there is X. We'll insert P,

L, and E and then, after a while, we remove max P. So, that's our basic setup.

That's our definition of what a priority queue is. So, the API will look very

similar to our stack or queue API with a difference that we want to have generic

items that are comparable. So the Java language for that is in the class header.

We say that our generic type Key extends Comparable of Key. And that just means

that our keys must be Comparable and we must have a compareTo() method that will

compare it to another key. Otherwise we have a constructor and actually for some

applications, it's convenient to have a constructor that takes an array of keys as

argument. Then, an insert() that puts something in like push in a stack or

enqueue in a queue. Then delete the maximum. I referred to delete the minimum

just to avoid confusion, we have the implementation separate implementation

usually MinPQ, where we delete the minimum. Then test isEmpty() and we also

sometimes have extra method that just gives us the value of the largest key and

also size which is useful sometimes in collections. You might also want it to be

iterable but we'll skip that for now. There are lots and lots of applications of

priority queues. Even though it emerged as a data structure relatively late in the

game now that we see that there are many algorithms that are much easier to

implement when we think about the priority key abstraction. We have data that we are

processing we can't process it all at once. We have to save it some of way. And then,

what the priority queue tells us is - let's organize it in some ways that we are

always taking the best one to process next. And it turns out to be very close to

a generic algorithmic design technique that we will be looking at in many, many

different applications. Today, we're going to talk about event-driven simulation

which is an interesting idea that is based on priority queues but it's also used in

numerical computation and we'll see in algorithms for data compression and graph

searching that it's useful. And in many other applications in Computer Science and

in scientific computing. It generalizes the stack and the queue and gives us a data

structure that we can use to effectively design algorithm of all sorts. So here's

just a particular client that will help explain the idea. So our, our challenge is

let's say this is on the web we have billions of transactions, you know, and they

are streaming through our data warehouse or processor in some way. And just a very,

very huge number of transactions. In fact, we couldn't possibly hope to even store

them all. There's trillions of them there coming through as fast as possible. But

we're interested in the biggest ones and so maybe it's the biggest amount of

dollars, or the biggest cost, or whatever it might happen to be. And so we can pick

some numbers that we can store. I would like to, to store the, the thousand

biggest ones. So, you can imagine a credit card company looking for fraud - it's going

to care about keeping track of the largest transactions. So, maybe we can store

millions or thousands of them. So that's our parameter M - that's the number we can

afford to store but the total number of items we couldn't possibly afford to store

them. So and this is just some test data where we've got all, all these

transactions and so we are going to be able to take in data like this and again

an unbounded stream of data. But let's say, we want to keep track of the top five

[cough] values using the third column as the value. So we're going to look at a

client program called TopM that takes the command-line argument, how many and

this case, it's going to say five and then it's going to take from standard input the

transactions and it will print out the top five. So, that's a canonical example of a,

a priority queue client that we need to design a program that can do that. So,

with the priority queue abstraction that's not too difficult to do. So we are going to

use a min-oriented priority queue so that's going to keep, it'll [cough] it'll

be one where we can delete the minimum and, and it'll be generic so we'll have a

transaction type that holds this information including natural ordering

where it's ordered by dollars that last column. So, we'll build a new priority

queue, min priority queue or we'll have the capability to delete the minimum. And

then we read from standard input. We read the line, build the transaction from the

information on that line. And that will fill in the fields and then, we put that

transaction on the priority queue. Now, if the priority queue has more than M items

because we inserted that one, then we want to delete the smallest one there and that

way, we're keeping track of the largest M. Whenever we get a new one, then we

throw away the smallest one that's there. So, even with this huge stream of items

coming through, we're only keeping track of the M largest items and that's a fine

canonical client for priority queue. Now how we are going implement or solve this

problem or you can think of lots of ways to go ahead and solve this problem of

finding the largest M items in the stream of N items. So, for example, you could

sort them. And then just look at the M at the end but by sending up the problem, I

already kind of ruled that one out because we don't have the space to sort them all,

to store them all. So sorting is out of the question. We'll look at couple of

elementary priority queue implementations that are straightforward. You know, keep

the items like we would in the stack and then when we need to find the smallest or

the largest look at, look at them all. But that's going to be too slow. M is large

and N is huge, and M<i>N is going to be too slow. What would what we we'll look at is</i>

very simple and practical implementation using a data structure called the binary

heap that gets the job done in time proportional to N log M and only M space.

And that's pretty close to the best that we could do in theory and is very

important and useful, practical implementation and data structure.

Alright. So here's just an overview of two elementary implementations for priority

queues using the example operations that I gave before. So you can imagine keeping

the item, say, in a linked list or in a doubling array and just keeping just an

order just as we would in the, in the stack just keeping in the way that they

come in. And we'll put a new item at the end of the array and remove it from the

end of the array. Or you could do it in a linked list, and then when it's time to find the,

remove the maximum, you have to scan through everything to find the maximum.

So, so, that's a one way you could implement this with, with a linked list or

with the resizing array. Or you might to say well let's try to keep things in

order. And then that might involve some work with the, it's like insertion sort,

you find a place to put the new item and then put it in the right place. And again,

you could do this with a linked list or with the resizing array but then, with array,

you'd have to move all the larger ones over one position to fit the new item in. When

we insert E and that's supposed to keep them in order, we have to move over L, M,

P, and P to get the E and then so forth. But the advantage of that might be that

removing the maximum is easy. We just take away the one at the end. To remove the

Q - we know it's at the end to remove the max. At this point, that's X - it's

right at the end, and P is right at the end. So you can imagine the implementations of

priority queues using these two basic strategies. Not much code involved. This

is a an ordered array implementation of priority queues and it's quite straight

forward. We didn't put in the this is the cheat version where we require the client

to provide a capacity but we could easily change this to a resizing array. So

insert() just puts it at the end, and since its unordered delete maximum has to go

through the entire array to try to find the maximum when it refines it and the

changes that we're the one at the end and then removes it the same way that we do

within the stack. It'll use less and exchange just like we would in sorting

methods. So that's a fine implementation if the priority queue was going to be tiny

all the time. So if you, without even implementing it, you can understand this

table that if we use an unordered array implementation we can get insertion done

in constant time but we have to look at everything to delete the

maximum or even find the maximum. If we keep it in order, we can find the maximum

or delete it at constant time but it takes us linear time to insert. And since as

with stack and queue operations, these insert and deletes might be intermixed in

arbitrary ways and there might be huge numbers of them either one of these is

very attractive because they're going to take N times the number of operations.

Whereas what we can hope for and what we actually will achieve is to get log N time

for all operations, time proportion to log N for all operations. With the clever data

structure and interesting implementation we can actually achieve that goal. That's

the basic API and some elementary implementations for priority queues.