0:04

In this segment we're going to start our discussion of tail recursion.

This is a new topic relevant to reasoning about how efficient

your recursive functions are in ML and in other functional languages.

At this point we've written a lot of recursive functions and so hopefully you're

convinced that writing a recursive function is no more difficult than writing a loop.

In fact, I've never shown you a loop.

I could even pretend that I don't even know what a loop is.

I would even argue that recursion while never

harder than a loop is often easier than a loop.

Whenever you're processing a tree,

like when you're evaluating an arithmetic expression you really want recursion.

Examples like appending two lists together became much easier when we just thought

recursively and basically these recursive functions

avoid any need for mutation even of local variables.

If you have something like a four loop you always end up incrementing

that index I and we're really trying to stay away from mutable variables.

But what we haven't talked about yet is whether recursion is

efficient or in what situations does it lead to fast code?

A lot of times people think recursion is inefficient.

That's not necessarily the case and even when it is the case it often doesn't matter.

But what we're going to do is see the importance of

this idea called tail recursion and then in

subsequent segments see how to use

common idioms in order to get tail recursion using things like an accumulator.

So just to be clear, I'm not showing you any new language features here.

Just new programming idioms and new ways to reason about

the efficiency of code that we've already been writing.

To understand recursion and tail recursion I have to tell you a little bit about

how function calls are implemented and all you have to

understand is the high level idea of a call stack.

When a program runs there is a call stack

of all the functions that you have called and that aren't done yet.

When you call a function F,

what that does is it pushes onto the stack some instance,

some thing that is going to stay on that stack

until the call to F finishes and when the call to F finishes,

we will pop it from the stack.

So, at any given time the stack contains all the calls that have

started and not finished and there can be lots of

them because one function calls another one.

What's in these stack things which I'll call stack frames?

Will they store information like what are the local variables

bound to and they store information like what work is

left for this function to do after

any function that it called is done with its evaluation.

Now notice that if we have a recursive function where

F calls F which calls F which calls F. We're

going to have multiple stack frames on our stack that are all for the same piece of code,

the function F. That makes perfect sense and it's what recursion is really about.

So let's see an example.

Suppose I have a factorial function here, so it just computes.

If you pass it N it returns N times N minus one times N minus two down to one.

It only works correctly for non-negative numbers.

Suppose I call fact with threes.

I'll eventually get the answer six back,

three times two times one.

My first call is here on this single element stack you see I'm calling fact with three.

Then when I call fact with three it ends up calling fact with two three minus one and

so I'm going to represent that by adding to

my stack that there is now a call of fact two.

What fact three is remembering to do in its stack frame is when it gets

its result back it's going to multiply it by three and then that will be its answer.

So, three times whatever I get back from the call to fact two.

Similarly fact two is going to call fact one,

so now I have a bigger stack where fact two is waiting to get the results of

the call and multiply it by two and even fact one ends up calling fact zero.

At this point I have a stack with four elements on

it but then when I call fact with zero,

it evaluates the f expression and returns one.

There's no additional call put on the stack.

It ends up saying, "All right,

I'm going to return one" and when you return you pop that off your call stack.

So now we have a smaller stack and now fact one has it's recursive result it needs.

It multiplies one by one,

gets one and now we pop it off,

get two by one.

We now pop off the recursive call of fact with argument two,

with fact of three and then fact of three would eventually return six.

Alright. This is how call stacks work.

This is exactly what I would expect to happen when we evaluate fact.

Now here is a second version of factorial that is more complicated.

I'm going to show you in a minute that it's actually more

efficient but we can't see that yet.

Let's first understand how this version of factorial works.

What it does is all it does is it calls a helper function locally

defined although that's not important that I've called aux for auxiliary,

for helper function and this helper function takes in

a number but also acc which is short for accumulator.

What it does is it says,

if N equals zero then just return whatever is an accumulator,

otherwise call aux with N minus one and go ahead and multiply the accumulator by N.

And I'm going to show you that this will actually compute factorial correctly.

Instead of our simple recursive function,

what we're doing now is we're multiplying into this second argument accumulator as we go.

So when N is zero we just end up returning accumulator.

So this is still recursive, aux is recursive.

It's more complicated but when aux calls aux,

the result of the recursive call is the result for the caller.

There is no extra work to do.

When I got back here...

In the first version of factorial

after the recursive call the caller had more work to do.

It had to multiply by N. Here there is no more work for the caller.

The result of the recursive call is the result.

And that's going to be the key difference but before I show that let me

show you how the call stacks would look given what I've shown you so far.

So I start with fact three.

Fact three is going to call aux with three and

one because that's the initial value for the accumulator if you look back at the code.

Then aux of three,

one will end up calling aux with two and three.

So three minus one is two and three times the current accumulator one is three.

Now notice that these callers on my stack are just waiting

to get their results back and immediately return them.

So then aux two three will call aux one six,

aux one six will call aux zero six.

At this point I have a bigger stack than I actually had with

my previous version but now aux zero six will return six.

Aux one six will return that six,

aux two three will return that six,

aux three one will return that six and fact three will turn that six.

This will just continue until I'm done with my program. All right.

So now we get to the key idea of

an important optimization that you should understand functional languages do.

It is simply unnecessary to keep around a stack frame if

all it is going to do is take the result from the callee and immediately return it.

So such a situation is called a tail call

for reasons I've never fully understood but that is what everyone calls them.

And the compiler, the implementation of the language

recognizes these function calls and treats them differently.

What it does is it removes the caller's stack frame before it makes

the call so that the callee just reuses the same stack space that the caller was using.

And along with a couple other optimizations I'm not going to show you,

this makes recursive functions that use tail calls

absolutely as efficient as loops in other languages.

It is reasonable to assume it is guaranteed by another language that you

can assume this kind of efficiency for tail calls.

So in fact what I showed you before for

this more complicated version of factorial is not what happens.

We do start with our stack factor of

three but right here where I see this call for fact three,

this is a tail call, there's nothing more for fact to

do when it calls aux of N and one than return it.

We will reuse the stack space.

We will replace the stack frame for fact three with the stack frame for aux three one.

Now we're evaluating the aux function with three and

one and when we get down here to the recursive call,

there is no more work to do afterwards.

This is also a tail call.

We will reuse this stack space for the recursive call of aux two and three.

The same thing will happen with the next recursive call and with

the next recursive call and so we never build up a stack and when aux zero,

six returns six, we immediately have our answer.

So that is why this more complicated version of factorial is in fact more

efficient and if you are planning to use factorial on

very large numbers where you cared about this sort of efficiency,

this would be a better way to write it.

On the other hand if you were just computing factorial of three,

its more complicated and I would probably prefer the simplest solution.