0:04

Now, we're going to finish up by looking at some extra operations that are

Â supported for string keys when we use Tries.

Â So, we're going to look at a couple of character based operations that can, that

Â are quite useful and, and can be supported by the string symbol table API with tries.

Â One important one is a prefix match. So that's given a prefix, like say sh,

Â What we want is the method to return all the keys.

Â Let's start with that prefix. In this case, it'll be shells, shore and

Â she. Another one is so-called wild card match.

Â So that's when we don't specify one of the characters, or multiple characters.

Â So we put a dot to say we'll take any key that matches the other characters and we

Â don't care what that one is. So he. would match she and the.

Â And another one is so-called longest prefix.

Â So now we've got a long key and we want to find the best match that's in our symbol

Â table that matches that key. The key in the symbol table that has the

Â longest match with our key. So going, in this case, the ones that's

Â the longest prefix of shells sort is shells.

Â Later on we're going to see applications of some of these and the important idea

Â for now is that these are easy to articulate.

Â And not only that, they're easy to implement with tries.

Â Nts tr, and Turner research tree. So, we're going to take our standard API

Â and add these four methods. So, while there's keys that returns all

Â keys. And, and, as is our usual practice.

Â When we have a lot of things to return, we're going to return'em as a, as an

Â iterable. So the clients can just, iterate through

Â the what's returned. And that's usually what clients want to

Â do. And then we have keys with prefix.

Â Give a string s, and the iterable will return all the keys in the symbol table

Â that have s as a prefix. If the client wants to find the associated

Â values, they can do gets to get the values.

Â And then we have keys that match so that's where we allowed dot to be a wild card, in

Â S, and we want to return all the keys that, match, S, taking dot to be a wild

Â card. And then longest prefix of S is the key in

Â there that's the longest prefix of S. We're gonna see later on.

Â This particular one plays, an important, role in a, in a more complicated

Â algorithm. We could also go ahead and add all the

Â ordered symbol table methods that we considered when we looked at binary search

Â trees like floor, and, and rank. And, we're not going to take time to do

Â that right now. It's very straightforward.

Â 3:30

Look at for these implementations is ordered iteration.

Â And this is trie-traversal, so we are going to do an in-order traversal of the

Â trie, that is, visit the left, now visit the middle, visit the right.

Â In that traversal, we can maintain the sequence of characters on the path from

Â the root to the current node just by adding an, adding a character when we go

Â down and removing a character when we go up.

Â And when we encounter values, Then we can put out this, the characters

Â that we have, and that's the way to pull out all the keys in the trie.

Â So, for example, in this case we go down and we have b, and then we hit the y's.

Â So we can output,. Say onto a queue we can output by.

Â And then we go back, we have s, e and a. And, we find a value, so we go ahead and

Â put out the a. Now we drop the a and do the l, and that's

Â the l, l, l, l,. and then we get to that s, we have a value we can put ourselves.

Â And then we can drop these, get back to the s, and get the sh.

Â She, finite value, and put, put the she in the queue.

Â Lls and put shells on the queue. Drop all those values down to the h, and

Â get us sho, shor, shore, Find the value and drop shore onto the

Â queue. And then finally, t, th, the and, find a

Â value and put the, down the queue. So it's easy to, very easy to keep track

Â of the sequence of characters on the path from the rout to every node and every node

Â check for value and programmer key. That's a ordered iteration and

Â implementations of these extra operations are just based on multiplying ordered

Â iteration. So this is a implementation of keys with,

Â which is essentially of the operation that I was tracing on the last slide.

Â So if client wants keys, we make a queue. Then we call a recursive routine that

Â collects all the keys in the try, starting at that node.

Â And, given a null string, which is, the, sequence of characters on the path to the

Â given node. When that recursive routine, returns, then

Â we just return the queue. And the recursive routine does encode what

Â I said in words, most of the time. This is for a try.

Â You can do the same thing for a Turner research tree with just a little more

Â code. So for every character we just move down

Â the trie for that character add the character to the prefix and also pass the

Â queue along. If we get a null value, we put what we

Â have on the queue when we get null when we return. So a very simple implementation of

Â ordered iteration of the recursive. So that's the implementation of the keys

Â method. So prefix matches an other things like

Â that going to work. Are going to work, just by modifying that,

Â and you're familiar with, prefix matches, When you type, nowadays, in your browser,

Â a search, function, you're getting, a prefix match of all the things that you

Â typed or other people have typed, that, matches those strings.

Â And you find that also nowadays when searching in your address books.

Â It's a quite, common function nowadays. So let's look at how that looks, in an

Â R-way trie. So what about finding all the keys in a

Â symbol table that start with a given, prefix.

Â Well, we just search for that prefix. And then, then just do a collect at the

Â keys in that sub tri. So that's, very straightforward.

Â So you get the node. That is the one you get to by starting

Â with that prefix and then you just call the recursive collect and boom you're

Â done. Extremely simple, implementation of keys

Â with a prefix in an R-way trie.. What about longest prefix?

Â And here's one that, is, is, very, actually very heavily used, in the

Â internet. If your query strings are IP addresses on

Â the internet and you have some given destination IP address, the router has a

Â lot of IP addresses in a routing table. An it wants to choose the one that will

Â get you as far as close to the destination as you can.

Â So it's going to choose the longest prefix.

Â So if you want to get to 128 by 112, you can think of these as hops on the way to

Â where you want to get, where eleven is the final destination.

Â And essentially this table says, it knows how to get this far and so that's what it

Â wants is the longest prefix. The longest prefix matches this one. Well,

Â it doesn't know how to get any further than 112 and this other one only to 128.

Â I, and this operation gets performed extremely often on the internet nowadays.

Â Just a quick note, It's not the same as the floor function

Â and it actually is a kind of a string operation.

Â These things actually, usually on the internet, they're not represented as

Â strings. They're represented as binary numbers.

Â But in machine or assembly language implementation tries are even easier.

Â You can just take a bunch of bits and use them as index into a table to move down

Â the try. And actually tries are pretty old because

Â so easy to implement data structures in that way in the past.

Â You'd be surprised at how much of our computational infrastructure is built by

Â Program or consists of programs that are written in machine or assembly language

Â that can make use, efficient, really efficient use of low level representations

Â like this. But it's also useful in higher-level

Â languages. So what does it look like in an R-way

Â trie.. Well, all we're going to do is just do a

Â search and then we're going to keep track of the longest key that we encountered.

Â So, we have a path. And on that path there is the most

Â recently seen value that's the longest key that we found If we end that at no link

Â that's fine. It's the, if there is a value on that node

Â that's kind of a no link that's the value we're going to return.

Â So that's a very straightforward implementation.

Â Just keeping track of the longest key encountered on the search for our key.

Â That's implementation of longest prefix of and the usual set up of, of making

Â recursive calls. And this code is quite straightforward.

Â And application of this one and so-called T9 texting.

Â And now you know, when we do a course like this, we try to keep up with modern

Â technology. And modern technology is moving almost as

Â fast as we can. There are lots of young people who really

Â don't know about texting with keypads anymore.

Â But there's a certain range of, you know, five or ten years where people got

Â extremely adept at doing so called multi-tap input, where the only keys on

Â the phone were the nine keys to dial numbers.

Â And then you had three letters associated with each number,

Â Like on old dial telephones. And to enter a letter you had to like to

Â enter an H you had to tap twice, cuz that's the second letter on the four key

Â you had to tap four twice or something like that.

Â So, The so-called T9 text input would use the

Â kinds of algorithms that we're talking about to make it so that maybe you didn't

Â have to do multi-tap. So rather than type two 4's, you would do

Â a tries type search to figure out which word that you typed.

Â And that now looked, It'll look good to us as a potential

Â application for a while. Glad we didn't spend too much time on it.

Â If you study this keyboard, maybe you'll see, see why.

Â Well you think about how to implement it. But once we get into it, we realize, Kevin

Â realized there's no S in this sample keypad, what happened to the S?

Â So what are we going to implement because even if there's no S there, so what

Â exactly are they doing and well maybe this is a bit of a fantasy that this is the

Â response that, maybe these people lived in a world with no S's.

Â Mmm. Well anyway, we've moved on from tries

Â from, from that type of, way of entering text.

Â But I still want to mention a few more ideas because the basis behind tries and

Â Turner research trees is still out there and is still really an important part of

Â our infrastructure. And there's some really great algorithms

Â that we just don't have time to cover. Now one of them's an old algorithm called

Â Patricia. And, This one, is a really interesting and

Â intricate algorithm. Particularly when implemented for binary

Â tries, where you just do a bit at a time. And people, again, implemented this kind

Â of algorithm in machine language. And got extremely efficient performance.

Â If we cast it in, in, our way tries that we've talked about it's really the best

Â way to think about it is, a way to remove one way branching.

Â It seems wasteful, to have, all these nodes that just have one branch.

Â And so one of the main ideas behind Patricia there's others that, don't

Â really, show up, in the level, high level representation we're using.

Â But one of the main ideas behind Patricia was rather than associate a character with

Â each node, Associated sequence of characters with

Â each node, so you just don't have any one way branching.

Â Implementing this is maybe one step beyond this course but maybe it's within what we

Â could do in this course. Where just not going to take the time to

Â do so. And you'll find implementations that in

Â practice avoid the one-way branching that are used in many, many applications,

Â performance critical applications for searching nowadays, I already mentioned IP

Â routing tables. There's probably, you know, no piece of

Â code that's executed more often than that one.

Â That's based on a trie type algorithm. And we have these other applications

Â listed as well. It's got some other names too.

Â Another thing, is, so called, suffix tree. So, that's, building a tree from a suffix

Â table. So, we talked about having, for suffix

Â sorting, we talk about applications. You can also build a search structure from

Â suffixes of a string. That admits, all kinds of, fast string

Â processing application. And again, usually, eliminate on way

Â branching in suffix trees, and also amazing, amazingly you can get'em

Â constructed in linear time. And there's all kinds of interesting

Â applications of suffix trees probably the most important now a days are in

Â computation biology databases. Again extensions of the kinds of

Â algorithms that we've talked about today. So I think the bottom line for considering

Â string search trees is that it's a real success story in algorithm design and

Â analysis. It's a number of clever algorithms that

Â really have made a difference in the kinds of operations that we can perform in the

Â amount of data that we can handle in modern applications.

Â We, we started with red-black BST's which is a pretty good solution for a general

Â symbol table. And also hash tables, which are also

Â widely used. But with tries, and Turner research tries

Â we have a performance guarantee where we'll only have to really access log-in

Â characters. And when you think about that, that's,

Â even when N is huge, that's going to be a pretty small number.

Â Say we're looking among billions, billions of things log in, even in you only have

Â two wave branching would be 30. And when we have 256 way branching, it's

Â way, way smaller. And that's just the number of, so it's the

Â number of characters accessed. And the, really the bottom line is, if

Â you, you can set things up nowadays even on the internet when there's huge, huge

Â amounts of data out there you can set things up so that you can only need to

Â look at maybe 100 bits to get at anything. We think about 100 bits, that's specifies

Â two to the 100th of possibilities. And two to the 100th is a huge number.

Â There are not two to the 100th pieces of information, even on the internet, even

Â will ever exist on the internet in the. Into this galaxy.

Â But with just 100 bits, which really isn't too much, we can search for anything

Â efficiently. And that's an amazing success story for

Â algorithm design and analysis. So that completes our look at tries and

Â Turner research tries.

Â