What property of cons allows elimination of tail recursion modulo cons?
Problem
I'm familiar with the idea of basic tail recursion elimination, where functions that return the direct result of a call to themselves can be rewritten as iterative loops.
foo(...):
    # ...
    return foo(...)
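For concreteness, a small sketch of that rewrite in Python (sum_to and sum_to_iter are just illustrative names):

def sum_to(n, acc=0):
    # Directly tail recursive: the result of the recursive call is returned as-is.
    if n == 0:
        return acc
    return sum_to(n - 1, acc + n)

def sum_to_iter(n, acc=0):
    # The same computation with the self-call replaced by updating the arguments in a loop.
    while n != 0:
        n, acc = n - 1, acc + n
    return acc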
I also understand that, as a special case, the function can still be rewritten if the recursive call is wrapped in a call to cons.

foo(...):
    # ...
    return (..., foo(...))

What property of cons allows this? What functions other than cons can wrap a recursive tail call without destroying our ability to rewrite it iteratively?

GCC (but not Clang) is able to optimize this example of "tail recursion modulo multiplication," but it's unclear what mechanism allows it to discover this or how it makes its transformations.
pow(x, n):
    if n == 0: return 1
    else if n == 1: return x
    else: return x * pow(x, n-1)

Solution
While GCC likely uses ad-hoc rules, you can derive them in the following way. I'll use pow to illustrate, since your foo is so vaguely defined. Also, foo might best be understood as an instance of last-call optimization with respect to single-assignment variables, as the language Oz has and as discussed in Concepts, Techniques, and Models of Computer Programming. The benefit of using single-assignment variables is that it allows remaining within a declarative programming paradigm. Essentially, you can have each field of the struct foo returns represented by single-assignment variables that are then passed to foo as additional arguments. foo then becomes a tail-recursive, void-returning function. No particular cleverness is needed for this.
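As a rough sketch of that idea in Python, using an ordinary mutable field where Oz would use a single-assignment variable (the names ConsCell, copy_list, and copy_into are just illustrative): the caller allocates the cell and hands the recursive call the field it must eventually fill, so the recursive call becomes a genuine tail call.

class ConsCell:
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail  # stands in for the single-assignment "hole" to be filled later

def copy_list(xs):
    # Wrapper: allocate a dummy cell whose tail field is the initial destination.
    dummy = ConsCell(None)
    copy_into(xs, dummy)
    return dummy.tail

def copy_into(xs, dest):
    # Tail-recursive and "void-returning": rather than returning Cons(x, copy(rest)),
    # it writes the freshly built cell into dest and recurses with the new cell as
    # the next destination, so the recursive call is the last thing it does.
    if not xs:
        dest.tail = None
        return
    cell = ConsCell(xs[0])
    dest.tail = cell
    copy_into(xs[1:], cell)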
Returning to pow, first transform it into continuation-passing style. pow becomes:

pow(x, n):
    return pow2(x, n, x => x)
pow2(x, n, k):
    if n == 0: return k(1)
    else if n == 1: return k(x)
    else: return pow2(x, n-1, y => k(x*y))

All calls are tail calls now. However, the control stack has been moved into the captured environments in the closures representing the continuations.
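Here is one possible Python rendering of the CPS version (the x => x arrows are anonymous functions, written as lambdas; pow_cps is just an illustrative name):

def pow_cps(x, n):
    return pow2(x, n, lambda r: r)  # identity continuation

def pow2(x, n, k):
    if n == 0:
        return k(1)
    elif n == 1:
        return k(x)
    else:
        # The pending multiplication by x is captured in the new continuation's closure.
        return pow2(x, n - 1, lambda y: k(x * y))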
Next, defunctionalize the continuations. Since there is only one recursive call, the resulting data structure representing the defunctionalized continuations is a list. We get:
pow(x, n):
    return pow2(x, n, Nil)
pow2(x, n, k):
    if n == 0: return applyPow(k, 1)
    else if n == 1: return applyPow(k, x)
    else: return pow2(x, n-1, Cons(x, k))
applyPow(k, acc):
    match k with:
        case Nil: return acc
        case Cons(x, k):
            return applyPow(k, x*acc)
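In the same spirit, a Python rendering of the defunctionalized version, modelling Nil and Cons with None and pairs (the names pow_defun, pow2_d, and apply_pow are illustrative): the reified continuation is just the list of x values whose multiplications are still pending.

def pow_defun(x, n):
    return pow2_d(x, n, None)  # None plays the role of Nil

def pow2_d(x, n, k):
    if n == 0:
        return apply_pow(k, 1)
    elif n == 1:
        return apply_pow(k, x)
    else:
        return pow2_d(x, n - 1, (x, k))  # the pair (x, k) plays the role of Cons(x, k)

def apply_pow(k, acc):
    # Walk the reified continuation, performing the deferred multiplications.
    while k is not None:
        x, k = k
        acc = x * acc
    return acc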
What applyPow(k, acc) does is take a list, i.e. a free monoid, like k = Cons(x, Cons(x, Cons(x, Nil))), and make it into x*(x*(x*acc)). But since * is associative and generally forms a monoid with unit 1, we can reassociate this into ((x*x)*x)*acc and, for simplicity, tack a 1 onto the start, producing (((1*x)*x)*x)*acc. The key thing is that we can actually partially compute the result even before we have acc. That means instead of passing k around as a list, which is essentially some incomplete "syntax" that we'll "interpret" at the end, we can "interpret" it as we go. The upshot is that we can replace Nil with the unit of the monoid, 1 in this case, and Cons with the operation of the monoid, *, and now k represents the "running product". applyPow(k, acc) then becomes just k*acc, which we can inline back into pow2 and simplify, producing:

pow(x, n):
    return pow2(x, n, 1)
pow2(x, n, k):
    if n == 0: return k
    else if n == 1: return k*x
    else: return pow2(x, n-1, k*x)

A tail-recursive, accumulator-passing style version of the original pow.
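To tie this back to the question, the final accumulator-passing form can be sketched in Python together with the loop it flattens into, which is roughly the shape a compiler performing this optimization might produce (pow_acc and pow_loop are illustrative names):

def pow_acc(x, n, k=1):
    # Accumulator-passing version of pow2 above: k is the running product.
    if n == 0:
        return k
    elif n == 1:
        return k * x
    else:
        return pow_acc(x, n - 1, k * x)

def pow_loop(x, n):
    # The recursion above is a genuine tail call, so it flattens into a loop.
    k = 1
    while n > 1:
        k, n = k * x, n - 1
    return k * x if n == 1 else k

for n in range(8):
    assert pow_acc(3, n) == pow_loop(3, n) == 3 ** n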
Of course, I'm not saying GCC does all this reasoning at compile-time. I don't know what logic GCC uses. My point is simply that, having done this reasoning once, it's relatively easy to recognize the pattern and immediately translate the original source code into this final form. However, the CPS transform and the defunctionalization transform are completely general and mechanical. From there, fusion, deforestation, or supercompilation techniques could be used to attempt to eliminate the reified continuations. The speculative transformations could be thrown away if it is not possible to eliminate all the allocation of the reified continuations. I suspect, though, that that would be too expensive to do all the time, in full generality, hence the more ad-hoc approaches.
If you want to get ridiculous, you can check out the paper Recycling Continuations, which also uses CPS and representations of continuations as data, but does something similar to, but different from, tail-recursion-modulo-cons. It describes how you might produce pointer-reversing algorithms by transformation.
This pattern of CPS transforming and defunctionalizing is a quite powerful tool for understanding, and is used to good effect in a series of papers I list here.
Context
StackExchange Computer Science Q#83360, answer score: 15