0:03

We're going to finish off by looking at the idea of.

Counting directly with generating functions.

This is going to be first step to ease us into really important role that

generating functions play, in analytic combinatorics.

so it's a, so it's a really an alternate view of what generating functions can do

for us, and it is very combinatorial. I will be much more formal, thorough and

extensive coverage of this, but still it is good to look at the same problem then,

we just looked at from this point of view in this lecture.

So what we're going to do is define a class of combinatorial objects that have

an associated size function. And the generating function's going to be

a sum over all members of the class. We'll still get to the same point of

getting a, an equation that's, a generating function must satisfy.

And from that point on, the analysis will be the same.

but getting that equation is much more natural and fundamental with this kind of

correspondence. It seems kind of abstract.

But, so, let's look at a specific example.

So lets say T is a set of all binary trees.

then we're going to define a size function.

that looks like absolute value. if you have any binary tree, it's going

to be the number of internal nodes in that tree.

[COUGH] Now, what we're interested in is a counting sequence and that's what we're

interested in before. It's the number of binary trees from the

set of all binary trees that have size function equal to n.

So, the, that's the number of binary trees with n internal nodes.

That what we were counting before. This is just a, a formal definition, tree

by tree. Now, what's that good for?

Well. If we define the generating function to

be the sum over all possible trees, of Z to the size of the tree, then that,

treats each term individually [COUGH].

Each tree contributes individually to a term in the generating function, but it's

exactly the same as if the trees of size N were collected together and counted by

T sub N. So that's a fundamental idea on, of this

equation we, some over all possible trees.

but since it's Z to the size of the tree, all the ones of the same size are going

to contribute to the coefficient of that size, say N, and give us T sub N.

so it's the same generating function looked at a different way.

And so and the reason that's helpful is that [COUGH] we can look at the way that

we define the combinatorial structures. to give us the equation that we want for

the generating function. and so, this is how it works for binary

trees so definition of a binary tree. with [COUGH] is that it's a root node

with a binary tree on the left and a binary tree on the right.

and say on the left there's T sub L internal nodes,

on the right there's T sub R internal nodes.

if we take that sum of all possible trees and it's got all possible left nodes.

All possible right nodes then but we had [COUGH] Z to the size of the whole tree

that's the size of the left, the size of the right, +1.

So, the decomposition that we use to define what we mean by, what is a binary

tree? immediately gives this equation on the

generating function. there's, it starts out with a one.

That's, for the empty tree that doesn't compose.

So the double sum is for non empty trees. so

that's a first key step to understand that the,

And the decomposition, the recursive decomposition, the way that we define

binary trees, immediately translates to an equation on the generated function.

so now that equation, those T sub N and T sub R are just afformal variables and

they're independent, so, we can immediately split that sum up into Z

times sum overall Teesabellsi to the T sub L over all T sub R the T sub R and

then get the answer that T of Z equals, one plus Z, T of Z squared.

Just the same as we found before by worrying about all the counting, but this

is much more direct. Really the equation says that, a binary

tree is empty or it's a root connected to two binary trees.

And actually we're going to see a very formal way to really go right from the

[COUGH] description of what it is right down to that generating function

equation. so here's another way to look at it, just

to make sure so the generating function is really got a term for each tree.

So those are all the possible trees and it just keeps going.

So each tree of size 3 is represented. It's a sum over all trees, Z to that tree

size. now when we're worrying about the counts,

we're just collecting all the terms for the same exponent, we're just doing the

algebra. so

now but if you multiply TTfz, of Z * T of Z, it's like taking two trees and

composing them. Well that's what we mean by binary tree,

take two trees and compose them. .

so T1+zT(z^2), of Z = 1 + ZT^2, If we write the things out symbolically as

trees then you can see immediately that this tree here comes from one of those,

and one of those and so forth. it's in fact, some mathematicians prefer

to work with, the symbols in, in some cases in, in commonotorics, people, go

very far working with completely symbolic representations.

It's a good way to think about it, it's, there's a term in that generating

function corresponding to each tree, when we're trying to find out how many there

were of each size, we're just doing the algebra of, collecting by size.

Now, that's, important, but there's, another idea that is related to this,

that, I want to introduce, and that's all about cost.

So a lot of times, it's not just about the size, it's about some other property

of the, of the structure. and so so, we'll use, we'll do two

examples. One's a very easy example.

How many one bits are there in a random bit stream?

Well everybody knows it should be about half.

but still that'll be a warm up. We'll get the right answer for that.

A more complicated thing is that's [COUGH] if we're using a binary

tree in a computer representation, it's important to know how many of the nodes

have both leaves external. you can save space in that way in a lot

of situations. so if we have a random binary tree, how

many leaves are there? then maybe that's not so easy.

in fact with generating functions we can have a unified approach to studying

perameters. and again that's one of the key benefits

of the way, analytic combinatorics way of looking at things.

and I want to do these examples to show the advantages of using generating

functions to help us do calculations and analysis like this.

so again it's the same kind of idea. We're going to have a class it's, the

same idea. We're going to have a class of

combinatorial objects. our model is going to be that all objects

of each size are equally likely. and that's realistic in a lot of

situations. so lets look at how the calculations look

when we're trying to find the value of a parameter.

so let's say the all, set of all objects in the class is p and then we have a size

function which, for every object in the class, we have a defined size.

then we have the counting function so that's the number of objects that have a

size n. so that's what we've been talking about

up to this point. but now let's say we also have a cost

associated with each object. And then, in that case we're going to be

interested in the number of objects that have a given size and a given cost.

And, were, want to know that, so we can do the calculation of the average value.

So that is the expected cost of an object of size n, is going to be the probability

that the object cost of an object of size n is k.

which is the number that F costs K divided by the total number.

then we're assuming their all equally likely.

times k so that's the definition of the expected cost.

and so that's all a familiar calculation if we know all these quantities.

But what. This point of view buys for us is, well

let's notice that we can factor out the p sub n, and we can just count up the total

cost. of all objects of size n.

That's called the cumulative, the accumulated cost.

so every object's got a cost associated with it.

and then we take all the objects of size n.

Add up all their costs. And divide that by the number of objects

of size n that's the expected cost. It's a trivial calculation but still it's

an, an important distinction. so the, the idea is we, if we can compute

the accumulated costs, then we can get the expected costs just by dividing the

accumulated cost by the number of objects.

now what's that have to do with generating functions?

Well again [COUGH]. we start out with the, same situation.

let's take a look at the generating functions.

So we have already talked about the counting generating function.

If we sum over all objects in the class Z to their size, then just algebra collects

their terms to get the [COUGH] coefficient of Z to the N as the number

of objects of size N. but we can have a similar.

Generating function to compute the cumulated cost.

If we look at the function c of z which is the sum for every object in the class.

Its cost times Z to the size. Then, those things are going to

collect by size. And so the coefficient of z^n in that is

going to be nothing other than for all values of k, the sum of k times p and k.

It just collects the objects of cross k by size.

And it will get them all. So the coefficient of z^n in that

function c(z) is exactly the accumulated cost.

So that gives us the average cost. We just extract coefficients from those

two generating functions. the bottom line is.

If we want to compute the expected value of a cost, we have two GF counting

problems to solve. and their similar.

we know how to solve we've already done examples where we can solve GF counting

problems. And now we can get average values of

parameters by solving GF counting problems.

so again, it's a, it's a. We, at trivial calculation, to say, we're

going to compute the average by computing the total and divide by the number.

But still, it's fundamental because now everything is counting.

And we're going to have very powerful mechanisms for counting.

So let's see how it works for the two examples that I mentioned.

How many one bits in a random bit stream? Okay so these are sort of orbit strings.

the number of bits is our size function so that's going to be the size function

for any bit string then in with the cost functional B will call ones B that's the

number of one bits in a given bit string. number of bit strings of size n that's

besoben. and that's 2n.

[COUGH], and then CeceVan, we can use GS to get that, but, no let's no, no hit,

and we're happy with that. And then, the'cumulated cost function

Cece Van, is the total number of one bits in all bit strings of size N.

So counting g f, so that's b of z, sum over all bitstrings z to its size and

that's equal to two to the n, z to the n. One over two, one over 1-2z.

so that's, b of z. And actually, we can, get that formally

just from the definitions. But, I'm sure you believe that one.

what about the cumulative cost GF? So, [COUGH].

So that's that's the function. so now, what we want to do is similar to

what we did for the Catalan. Is to use a, a recursive description of

what a bit string is, to get us a, a formula for this function.

And well, what's a bit string? It's either a zero or a one, followed by

another bit string. So [COUGH], for if we're going to have

for all bits strings the number of 1's. That's going to be e-, equal to.

for all bit strings you can put a zero or a one in the front and that, that'll give

you all possible bit strings. In this whole set of bit strings there's

one, one bit. plus there's two.

How ever many one bits there are in the bit string b prime.

And that's summed over all b, b prime. And we had, length of b.

But it's the length of b prime, plus one. So again, that recursive description

immediately gives that formula for the accumulated cost, GF.

And so now, just doing the math. that the first term gives us a ZV of Z,

and the second term gives us a 2Z C of Z. So that's an equation for the accumulated

cost function and we've got the what B of Z is.

And so we can just solve that equation. It's

[COUGH] Zb of Z is one over one minus 2Z. And then C of Z, bring 2Z Cof C over to

the left hand side. And so we divide by another factor of 1 -

2Z. and so that's a proof that C of Z is

equal to Z over 1 - 2C squared. So now we have explicit expressions for

both the enumerating GF and the accumulated cost GF.

And all we need to do is extract coefficients from those two functions in

order to get the average cost. 2zz/(1-2*z^2) / 1 - 2z^2 is, that's a

standard generating function that we've seen several times before.

And so the bottom line is, the coefficient of z^n and the accumulated

cost is n*2^n. N minus one, coefficient is z to the n,

in [COUGH] and the numeration is to the N,

and that gives us the result that the expected cost is N over two.

Again that's a warm up we kind of knew its n over two.

now lets do a problem that you might have a lot of difficulty solving some other

way and that's these in binary trees. So again a leaf in a binary tree is an

internal node whose children of both are external.

so [COUGH] for example, in the trees of size two, each one of them has one, leaf.

So the if we wanted to go down to do all the

counts, So, t of n's the number of binary trees

with n nodes. TNK is the number of n known binary trees

with k leaves. So, T2-1 =two.

There's two of them that have one leaf. and CN is the average number of leaves in

a random n node binary tree. So that's the ratio of those two.

So C21. = 1 so for five there's four of them that

have one leaf, so I'm sorry for three, there's four of them

that have one leaf, so T31 equals four. There's one, the balanced one, has two

leaves, so T31 equals one. And so,

[COUGH] if you want to compute the average number of leaves in a binary tree

with three nodes, it's four plus two times one divided by five, or 1.2.

And similarly for fourteen, we find that eight of them have one leaf and six of

them will have two leaves and that gives a solution.

So what's the average number of leaves in a, in, in a random binary tree?

If we treat them all likely how do we get that counting sequence?

and again, that's actually a practical problem that plagued programmers when

binary trees first came into use because these things were a wasteful of space and

people want to know how much space they could save.

O.k. So let's [COUGH] use cumulative counting

to solve that problem. So.

set of all binary trees internal nodes is a size function leaves is our cost

function number of leaves in the tree. the number of binary trees is size n of

catalyn numbers as c analyses that we did and now what we want to do is develop a

generating function for the cumulative cause which is the total number of leaves

in all binary trees of size n. so the counting GF, we already did that

one. so that's just summarizing that.

and the accumulated cost GF, that's for the [COUGH] at c of z sum over all trees,

number of leaves times z to the size. [COUGH], and then, the average number of

leaves in a random endo binary tree is just the ratio of the coefficients of

those. We already know the coefficient of the

number of trees, that's the catalyn number.

So, we're looking for coefficient of Z to be in in C of C.

So, that's the next thing we're going to do is try to find that cumulative cost

function. so [COUGH] that's the, that's the

function. Now again we're going to use the same.

kind of decomposition that we did when we're enumerating trees.

And, and actually usually, that's what happens.

The, same formal description, that gave us the number of trees, it's going to

give us, the cost. And that's why this methods so powerful.

we, we, do the work to figure out the composition, of the way to describe it.

and then we get to apply it twice. Once to find out the total number, and

the other to find out the total cost. and so.

[COUGH] that the composition immediately leads to this equation for the cumulated

generating function. So there's the tree that is just a leaf.

so that's accounts for the Z term. and then for all the other trees there's

a left and a right. So that's T sub L nodes on the left and T

sub L nodes on the right. and however many leaves they are on the

left [COUGH] they're. In the tree, is the total number of

leaves on the left, plus the total number of leaves on the right.

The leaf can't, if it's got more than one node in it, the leaf can't cross between

the two trees. so, this equation here holds exactly.

Again, the plus one for the root but that same decomposition gives us this same

equation. And again, T sub L and T sub R are

independent so that immediately [COUGH]. allows us to break these break these sums

up into independent sums. So we have leaves of t sub l, z to the t

c l, t sub l. times the sum on T sub R there's, two

terms that are similar, one where we, for the T sub L and one for the, T sub R and

then, and then we have the double sum, so we distribute over those, so that's just

elementary distribution, and then those things, we have expressions for every one

of them, leafs of T sub L Z, to the T sub L that's

Z of Z. And sum t sub r t z to the t sub r,

that's t of z and we have two of those. So that gives us [COUGH] a simple

equation for. Z of Z, the in-cumulative generated

function. All that's left is to extract

co-coefficient's from that generating function.

So this is the summary of the deriration so far.

we're looking for that CGF. we did that decomposition, and we have an

equation for it. we know the number of tree's is the

catalan numbers. and so our accumulated generating

function is c of z, z plus 2z, t of z c of z, and we solve for z of c.

We get z over one minus 2c t of z. catalan numbers multiplied by 2z then

subtract one, you get z over square root of one minus 4z.

So that's an explicit expression for the cumulated cost function.

C of Z equals Z / 1 - 4Z. And we can extract coefficients from that

the same way that we did for the Catalan numbers using the binomial theorem.

the end result is, 2N minus two choose N minus one.

And very much the same calculations. Just with a slightly different result.

that's accumulated cost and then the final thing is to divide those two

and if you divide those two almost everything cancels.

except for an N plus one times NN on the top.

2N times 2N minus one on the bottom. so that's three N's and two N's and

that's an asymptotic 2N over four. So about a quarter of the.

[COUGH] Nodes in a random binary tree are leaves.

[COUGH], and that's an example of the use of

Accumulated generating functions to discover the average cost of a, of a

quantity. And we'll be seeing lots and lots of

derivations like this and actually many of'em will be much simpler because we

have coherent ways to deal with explaining the decomposition in terms of

the generating function. and we'll talk about that when we

introduce analytic combinatorics. So that's counting with generating

functions. And so now I want to finish by

Giving you a few exercises that you might do before the next lecture.

so the first one is to just practice solving an recurrence, and this is an

example that shows that the initial conditions really matter.

So solve recurrence with one set of initial conditions, and then solve the

same recurrence with just that one initial condition changed.

and you can see the impact of just that one change.

on the recurrence and also get some practice on sovereign recurrences.

another thing that You might do is practice expanding some unknown

generating functions. And there's many exercises like this in

the book. these are just a couple that you might

try. So what about one over square root of

1-Z? natural log of one over one - z.

There's a bunch of ways to do that there's a hint or one way that might

work. that'll give you some practice in how we

extract coefficients from generating functions.

And you can try some other exercises there as well.

So, what I, think would be useful, for people to do to, make sure they

understood the material in this lecture. one thing is to, if you've got access to

a symbolic math, math system, [COUGH] or if you're used to using one.

do things like, check the initial values on that

Equation that we got for accumulated costs for leaves in binary trees.

Similar to the check that I did just that the regular Catalan recurrence holds.

And if you don't have a symbolic math system, look around to see if you can do

that in some way. Some are freely available.

I think again the best way to learn this material is after you've listened to the

lecture and got some idea of the overview of the material is to read the text

carefully.'Cause the text really does tell the story in, in some detail.

and then it's certainly worth while to check your understanding by really try to

write up full solutions to those exercises.

maybe using TEC. Or, or maybe using HTML plus

Jack and really seeing that you can create the math that is the solution to

those exercises. That's an introduction to generating

functions.