Y Combinator

A combinator is a lambda expression with no free variables. The Y combinator allows for effective recursion when using explicit recursion in function definitions is not possible.

1. Normal-order form

\[\begin{equation}Y = \lambda f . ((\lambda x. f (x x)) (\lambda x. f (x x)))\label{y}\end{equation}\] where \(f\) is a function that takes one argument

2. Scheme Factorial Example

We will look of an example of a factorial function in scheme (see ELisp) that implements recursion using the Y combinator from Mike Vanier. Then, we will discuss it below as a general template for using the Y combinator to recurse

(define almost-factorial
  (lambda (f)
      (lambda (n)
        (if (= n 0)
            1
            (* n (f (- n 1)))))))

(define Y 
  (lambda (f)
    (lambda (x) (f (x x)))
    (lambda (x) (f (x x)))))

(define factorial (Y almost-factorial))

(factorial 5)

3. Discussion

First, I will give the functions names that reflect how I intuitively understand them.

(define almost-factorial
  (lambda (create-recursive-call)
      (lambda (n)
        (if (= n 0)
            1
            (* n (create-recursive-call (- n 1)))))))

(define make-recursive 
  (lambda (f)
    (lambda (x) (f (x x)))
    (lambda (x) (f (x x)))))

(define factorial (make-recursive almost-factorial))

(factorial 5)

The high-level plan:

Take an almost-recursive function \(f\) and make it recursive: everywhere recursion needs to happen, we substitute create-recursive-call which calls \(f\) again. \(f\) takes the body of create-recursive-call as an argument.

Slightly more detailed plan:

almost-factorial, or more generally, a function \(f\) that takes one argument…
and this argument will be the body of create-recursive-call
\(f\) is almost recursive, but everywhere a call to \(f\) might be made in the body of \(f\), we instead say create-recursive-call
Note that \(f\) is itself a combinator
Given create-recursive-call as an argument, almost-factorial returns a function that takes an integer as \(n\) input. This is an example of currying. This integer argument will act as our counter. Not all recursive functions require a counter, but in this example, it allows us to terminate the recursion when \(n=0\).

\(Y f\) will be a recursive version of \(f\). Let's see how it works. We'll evaluate (make-recursive almost-factorial) n and talk about how \((Y f) n\) would be evaluated in general.

First, what happens when \((Y f)\) is evaluated? We drop in the body of \(Y\) in place: \[Y f = (\lambda f (\lambda x . f (x x)) (\lambda x . f (x x))) f\]

Substituting \(f\) into the body: \[(\lambda x . f (x x)) (\lambda x . f (x x))\]

The next step of evaluation is applying the first lambda to the second lambda. \[f ((\lambda x . f (x x)) (\lambda x . f (x x)))\]

Now \(f\) can be called with its one argument. In our specific example, we this looks like:

(almost-factorial ((lambda (x) (f (x x)))(lambda (x) (f (x x)))))

In the body of almost-factorial the symbol create-recursive-call will be bound to ((lambda (x) (f (x x)))(lambda (x) (f (x x)))). In general, in the body of \(f\), \(((\lambda x . f (x x)) (\lambda x . f (x x)))\) will be bound to some variable, call it \(r\). Now, recall that because of currying, almost-factorial should be able to take a second argument now, namely n. Similarly, \(f\) can now take \(n\) as an argument:

\[f ((\lambda x . f (x x)) (\lambda x . f (x x))) n\]

Side question: Why didn't we keep evaluating the body of create-recursive-call? How did we avoid that infinite series of substitution steps? It's because that body was bound to the name create-recursive-call in the body of almost-factorial. Since we're doing lazy evaluation, this expression will not need to be evaluated until later on in the body when it is called. By then, n will be decremented and we will be making progress in our recursion. Similarly, we can think of the evaluation of \(r\) in the body of \(f\) being postponed.

Now, let's get to evaluating the body of almost-factorial, or \(f\). Inside, we decrement \(n\). Now, we come to the portion of the program where we need to make a recursive call. We call on our passed in create-recursive-call or \(r\). Recall that \(r\) looks like this: \((\lambda x. f (x x)) (\lambda x . f (x x))\) and recall that we call it in this context:

(* n (create-recursive-call (- n 1)))

Evaluation for our specific example proceeds as follows:

(* n ((lambda (x) (f (x x)))(lambda (x) (f (x x)))) (- n 1))

which evaluates to:

(*n (f ((lambda (x) (f (x x)))(lambda (x) (f (x x)))) (- n 1)))

Similarly, when we evaluate \(r\): \[f ((\lambda x. f (x x)) (\lambda x . f (x x)))\]

We're back where we started! I read the above as (* n (almost-factorial (create-recursive-call) (- n 1))). Recall that f was bound to the body of almost-factorial. Again recall that we don't evaluate the lambda expressions further, because they have been bound to a variable in the subsequent call to almost-factorial, or \(f\).

Now \(f\) is ready to receive another numerical argument. We pass it \(n-1\). Eventually, the counter will reach 0, and the base case will be reached. Because we are lazy-evaluating, we will not need to evaluate any further create-recursive-call expressions in the base case.

Side questoion: Why can we call \(f\) here, inside the body of \(f\)? Isn't that explicit recursion? Because \(f\) is a combinator, we just plop in the body of \(f\) for the symbol \(f\). Remember that \(Y f\) created a combinator that knows about \(f\).

4. As fixed point

Y returns the fixed point of a function. That is, \(f (Y f) = Y f\). We can verify this with our definition of \(Y\) \(\eqref{y}\) \[\begin{align*} Y f &= (\lambda x. f (x x)) (\lambda x. f (x x)) \\ &= f ((\lambda x. f (x x)) (\lambda x. f (x x)))\\ &= f (Y f) \end{align*}\]

4.1. Connection with factorial example

Recall that the fixed point of \(f\) is itself a function that takes an integer \(n\) as input. Why do we care about the fixed point in general? It's the only function which will take any value of \(n\), no matter how big. When we make our function to factorial, the argument that we pass into \(f\) can be expanded as deep as we want, which is good, because we will need an argument that can accomodate any size \(n\).

4.2. In what sense is factorial a fixed point?

From wikipedia and 6.820 notes.

Consider \[ H = \lambda f.\; \lambda n.\; Cond\; (Zero?\; n)\; 1\; (Mul\; n\; (f\; (Sub\; n\; 1))) \]

Then, what characterizes the fixed point of \(H\)? Let's look at \(\text{fix}\; H\): \[\begin{align*} \text{fix}\; H\; n &= H\; (\text{fix}\; H)\; n\\ &= \lambda\; n.\; Cond\; (Zero?\; n\;)\; 1\; (Mul\; n\; ((\text{fix}\; H) (Sub\; n\; 1))) \end{align*}\] But we see now that \(\text{fix}\; H\) fulfills exactly what we want from a factorial function.

5. Lazy evaluation form vs strict evaluation form

When lazy evaluation is possible, the Y combinator in scheme looks like:

(define Y 
  (lambda (f)
    ((lambda (x) (f (x x)))
     (lambda (x) (f (x x))))))

This is called normal-order form

When only strict evaluation is allowed, the combinator looks like:

(define Y
  (lambda (f)
    ((lambda (x) (f (lambda (y) (x x) y)))
     (lambda (x) (f (lambda (y) (x x) y))))))

This is called applicative-order form.

What is the difference between these two? When the normal-order version is called: (Y f), the strict evaluator will try to evaluate

((lambda (x) (f (x x)))
 (lambda (x) (f (x x))))

which will result in (lambda (x) (f (x x))) being substituted for x in the first lambda expression:

(f (lambda (x) (f (x x)))
   (lambda (x) (f (x x))))

This will prompt another immediate evaluation of the lambda expression and so on. For a strict evaluator, these substitutions will continue forever, without the body of f ever being run.

In contrast, when the applicative-order version is called, the strict evaluator will try to evaluate

((lambda (x) (f (lambda (y) (x x) (y))))
 (lambda (x) (f (lambda (y) (x x) (y)))))

This will result in:

(f (lambda (y)
    ((lambda (x) (f (lambda (y) (x x) (y))))
     (lambda (x) (f (lambda (y) (x x) (y)))))))

Now, due to the lambda expression wrapper, the contents of the lambda expression will not be evaluated immediately. Rather, they will be evaluated when needed in the body of f.

6. Sources

7. see also

baking the y combinator from scratch