From Recursion to Iteration through CPS

PPL 2023

We observed in 4.2 that with our \(L5\) interpreter because it is executed in a JavaScript interpreter which does NOT implement Tail Call Optimization, we run out of stack space even when we execute a tail-recursive function in \(L5\).

In this section, we introduce a general method which models recursion explicitly - so that we can properly implement tail-recursion as iteration even when our meta-language does not support tail call optimization. In other words, our implementer explains how to turn tail recursion into iteration.

We first illustrate the general approach on a concrete example (a simple tail recursive function written in TypeScript, which we turn into an equivalent iterative program in JavaScript through gradual semantic transformations).
We then implement the same method on the \(L5\) interpreter and obtain an iterative version of the interpreter. We finally demonstrate that when we execute the iterative interpreter on a tail-recursive \(L5\) program, this yields an iterative execution process which does not consume control memory.
The end-result is an interpreter which executes tail-recursive programs as iteration even though the meta-language does not implement tail recursion optimization.

From Recursion to Tail Recursion using the CPS Transformation

Consider the following recursive function in TypeScript:

// ============================================================
// regular recursive function: sum all numbers from 0 to n.
export const sum = (n: number): number =>
    (n === 0) ? 0 :
    n + sum(n - 1);

This function is recursive (because the recursive call does not appear in tail position). We know that if we execute it on a large value for n, it will create a stack overflow error.

We could instead rewrite this function as a tail recursive function:

export const sumIter = (n: number, acc: number): number =>
    (n === 0) ? acc :
    sumIter(n - 1, n + acc);

This version is tail-recursive. On an interpreter that would perform Tail Call Optimization (TCO), this function would be executed in an iterative manner. But JavaScript does not perform TCO, and therefore, this function will also cause a stack overflow when it is executed on a large value of n.

We could of course write the same function directly in an iterative manner:

export const sumLoop = (n: number): number => {
    let sum = 0;
    for (let i = 0; i <= n; i++)
        sum = sum + i;
    return sum;
}

What we want to obtain instead is a systematic transformation process which will turn any recursive function into a semantically equivalent function which can be executed iteratively – that is, without consuming control memory. We demonstrate this transformation as a series of smaller transformations - starting with the CPS transformation, which is the key semantic transformation involved in this overall compilation of the recursive program into an equivalent iterative program.

Note that when we perform the CPS transformation, we do not transform sum into sumIter.
sumIter implements a different algorithm than sum. This is the result of the CPS transformation applied on the TypeScript code of sum:

// ============================================================
// CPS Procedural transformation of the recursive algorithm
type Cont = (x: number) => number;

export const sumCPS = (n: number, cont: Cont): number =>
    (n === 0) ? cont(0) :
    sumCPS(n - 1, (sn1) => cont(n + sn1));

// Driver function: same signature as the original function, passes a default cont.
export const sumCPS1 = (n: number): number =>
    sumCPS(n, (x) => x);

The resulting program is tail recursive (which is always the case for a CPS program).
But because JavaScript does not apply TCO, when we invoke this program, we still obtain a stack overflow:

sumCPS1(10000)

--> RangeError: Maximum call stack size exceeded

From Procedural Continuations to Concrete Continuations

The reason we still suffer from control memory consumption is that even for tail calls, the JavaScript interpreter internally allocates a stack frame. That is, in the tail call:

sumCPS(5, cont) --> sumCPS(4, (sn1) => cont(5 + sn1))

we still allocate a call frame on the stack. In fact, in JavaScript, we always allocate a call frame whenever we invoke a function - regardless of the position of the function application. If we want to avoid allocating call stacks, we must avoid calling functions. How can we achieve this for the cases of continuations?

We will achieve this objective in two steps:

Up to this point, we described continuations as closures which receive as parameter the result of the current function and pass this parameter to the continuation of the code. These closures encapsulate any local variable that does not depend on the result. When a procedure is transformed in CPS-form, the structure of the resulting function is:

In this context, when we invoke a CPS function, we construct its continuation as the last parameter of the CPS function. That is, each time we invoke a CPS function with a continuation, we construct a new continuation instance with the specific parameters this continuation receives.

Within the body of the continuations, we invoke the continuation passed as a parameter with different parameters.

We, accordingly, can distinguish two contexts in which continuations participate:

For example, in the sumCPS example:

type Cont = (x: number) => number;

export const sumCPS = (n: number, cont: Cont): number =>
    (n === 0) ? cont(0) :                   // The cont parameter is invoked
    sumCPS(n - 1, (sn1) => cont(n + sn1));  // A new continuation is constructed with n and cont in its memory
                                            // In the body of this continuation, the cont continuation is invoked.
                                            
// Driver function: same signature as the original function, passes a default cont.
export const sumCPS1 = (n: number): number =>
    sumCPS(n, (x) => x);  // The default identity continuation is constructed (it has no memory)

Concrete Continuations vs. Procedural Continuations

In the first step of the transformation from CPS to Iteration, we transform procedural continuations into concrete data structures. This transformation is systematic:

With this transformation, we obtain the following code for the sumCPS example:

// =============================================================
// CPS Concrete transformation of the CPS procedural implementation
// Continuations are implemented as data structures instead of procedures.

type CCont = IdCont | Cont1;

export interface Cont1 {tag: "Cont1"; n: number; cont: CCont};
export const makeCont1 = (n: number, cont: CCont): Cont1 => ({tag: "Cont1", n: n, cont: cont});
export const isCont1 = (x: any): x is Cont1 => x.tag === "Cont1";

export interface IdCont {tag: "IdCont"};
export const makeIdCont = (): IdCont => ({tag: "IdCont"});
export const isIdCont = (x: any): x is IdCont => x.tag === "IdCont";

export const applyCont = (cont: CCont, val: number): number =>
    isIdCont(cont) ? val :
    isCont1(cont) ? applyCont(cont.cont, cont.n + val) :
    -1;

export const sumCPSC = (n: number, cont: CCont): number =>
    (n === 0) ? applyCont(cont, 0) :
    sumCPSC(n - 1, makeCont1(n, cont));

export const sumCPS2 = (n: number): number =>
    sumCPSC(n, makeIdCont());

Registerization of Concrete Continuations

This implementation of the CPS program with concrete continuations is still not iterative: we invoke JavaScript functions each time we construct a continuation and each time we invoke one. These are places where we will still consume stack memory.

We use functions in the sumCPS2 program for 3 reasons:

Observe, however, that because we are operating on code in CPS form, the function calls are all in tail position. This means that we do not need to remember where to return to, we can simply go to the next call and never return.

We observe that the continuation constructors all have the same structure – they simply initialize the fields of a record with the parameters. Hence, we are ensured that invoking a continuation constructor will consume a single stack frame and will not chain to further function calls.

In contrast, applyCont often yields recursive chains of function calls.

In this step, we remove the need to use a stack frame to pass parameters to the functions involved in the CPS program. Instead, we define a finite set of registers - that is, a set of variables of the appropriate types which are defined over the scope of the whole program (all the functions involved in the CPS transformation). There is one register for each parameter of all the continuation constructors and the applyCont function.

Instead of invoking a function by passing parameters - we initialize the corresponding registers and then call a function of zero parameters. In our example, this transformation yields the following program:

// =============================================================
// CPS Registerization transformation of the CPS concrete implementation

// Define in the scope of the registers all the relevant functions:
// - the computation functions in CPS form
// - the invoke continuation dispatcher method
const sumREG1VM = (nREG: number, contREG: CCont, valREG: number): number => {
    const sumCPSCReg = (): number => {
        // console.log(`sumCPCReg n=${nREG} cont=${JSON.stringify(contREG)}`)
        if (nREG === 0) {
            valREG = 0;
            return applyContReg();
        } else {
            contREG = makeCont1(nREG, contREG);
            nREG = nREG - 1;
            return sumCPSCReg();
        }
    }

    const applyContReg = (): number => {
        // console.log(`applyContReg val=${valREG} cont=${JSON.stringify(contREG)}`)
        if (isIdCont(contREG)) {
            return valREG;
        } else if (isCont1(contREG)) {
            valREG = contREG.n + valREG;
            contREG = contREG.cont;
            return applyContReg();
        } else {
            console.error(`Bad continuation ${contREG}`);
            return -1;
        }
    }
    // entry point
    return sumCPSCReg();
}

// Bootstrap the virtual machine
export const sumREG1 = (n: number): number =>
    sumREG1VM(n, makeIdCont(), undefined);

Why Registers can be Safely Overridden

The set of registers we defined replace activation frames in a stack. Consider why in the current state of our transformation, it is safe to use a single flat set of registers instead of stacked activation frames.

The reason is that in CPS form, when we enter a user-defined function, we read the parameters (in our case from the registers), execute a set of primitive computation steps, and then move on to the tail call. In particular:

Realize that the registerization transformation is safe only on code in CPS form.

Observe the code in the sumREG1 program. We still invoke unbounded functions (that is, functions which can invoke more user defined functions in an unbounded chain of calls):

These functions have zero-argument by construction, but their invocation still consumes stack frames. In the last step of the transformation, called pipelining, we now transform these calls into an explicit loop.

From Registers to Iterative Execution

In the register version of the program, we still invoke zero-parameter functions, all in tail positions. We want to avoid calling functions altogether to avoid consuming activation frames on the stack.

The solution we introduce is to convert tail calls with no parameters into the equivalent of a “computed GOTO” instruction in a very simple “abstract machine”. An abstract machine has a set of pre-defined instructions and a set of registers.
In our case, these instructions are the different types of continuations we can compute in the process of the function execution. We add a single instruction to indicate that we have reached the end of the computation and that the Virtual Machine can shutdown and return the value of a specific register.

To encode the name of the instruction of the next command to be executed, we introduce an additional register - traditionally called the Program Counter (or PC). The type of this register is an enumeration of the possible continuation types. We call this type the InstructionSet of the virtual machine.

Finally, we define a VM() procedure which implements the logic of the virtual machine:

All procedures in the implementation are defined in the scope of the registers. To invoke a new procedure, we end each call by setting pcREG to the name of the procedure we want to execute next. Setting pcREG in tail position is equivalent to invoking a procedure in tail position.

This transformation of calls of procedures in tail position to an explicit loop and dispatch over the PC register yields the following version of the program:

// =============================================================
// Iterative CPS transformation of the Registerized CPS version.

// The set of instructions known by the "Sum Virtual Machine"
type InstructionSet = 'applyContReg2' | 'sumCPSCReg2' | 'halt';

// Introduce the new register pcREG2 which encodes the name of the "next instruction" to be executed.
const sumREG2VM = (nREG2: number, contREG2: CCont, valREG2: number, pcREG2: InstructionSet): number => {
    const sumCPSCReg2 = (): void => {
        // console.log(`sumCPCReg n=${nREG2} cont=${JSON.stringify(contREG2)}`)
        if (nREG2 === 0) {
            valREG2 = 0;
            pcREG2 = 'applyContReg2';
        } else {
            contREG2 = makeCont1(nREG2, contREG2);
            nREG2 = nREG2 - 1;
            pcREG2 = 'sumCPSCReg2';
        }
    }

    const applyContReg2 = (): void => {
        // console.log(`applyContReg val=${valREG2} cont=${JSON.stringify(contREG2)}`)
        if (isIdCont(contREG2)) {
            pcREG2 = 'halt';
            // return valREG;
        } else if (isCont1(contREG2)) {
            valREG2 = contREG2.n + valREG2;
            contREG2 = contREG2.cont;
            pcREG2 = 'applyContReg2';
        } else {
            pcREG2 = 'halt';
        }
    }

    const VM = (): void => {
        // Explicitly iterate over each of the possible steps of the computation
        // The next instruction to be executed is computed and stored in the pcREG2 register.
        while (pcREG2 !== 'halt') {
            if (pcREG2 === 'sumCPSCReg2') sumCPSCReg2();
            else if (pcREG2 === 'applyContReg2') applyContReg2();
            else {
                console.error(`Bad instruction ${pcREG2}`);
                pcREG2 = 'halt';
            }
        }
    }

    // entry point
    VM();
    return valREG2;
}

export const sumREG2 = (n: number): number =>
    sumREG2VM(n, makeIdCont(), undefined, 'sumCPSCReg2');

The Registerized VM is Iterative

The resulting program is an iterative implementation of the original program. Because the original program was tail-recursive, the resulting program is also iterative.

We confirm this through empirical tests:

// =============================================================
// TESTS
sum(10000);     // Stack overflow
sumCPS1(10000); // Stack overflow
sumCPS2(10000); // Stack overflow
sumREG1(10000); // Stack overflow
sumREG2(10000); // 100 010 000 / 2 = 50 005 000

Transforming the \(L5\) Interpreter into an Iterative Interpreter

This was quite a lot of work to turn a single tail recursive procedure into an iterative program. The benefit of this method is that, by relying on the properties of the CPS transformation, we can transform any program into a corresponding iterative program.

In particular, we can transform the code of the \(L5\) interpreter into an iterative interpreter which does not consume JavaScript stack. Naturally, this program will still consume “some memory” when it executes a recursive program. But this memory will be managed explicitly, in the form of a concrete continuation data structure. It will not rely implicitly on the stack of the metalanguage (JavaScript).

As a consequence, if we execute an \(L5\) tail-recursive program, this iterative interpreter will consume bounded control memory - as expected of a Scheme interpreter.

We illustrate this transformation in multiple steps:

To validate the transformation, we confirm that the evaluation of the following \(L7\) program completes without stack overflow:

assert.deepEqual(evalParse(`
(L5 (define sumCPS
      (lambda (n cont)
        (if (= n 0)
            (cont 0)
            (sumCPS (- n 1) (lambda (sn1) (cont (+ n sn1)))))))
    (sumCPS 10000 (lambda (x) x)))
`), 50005000);

Summary