What's the least we can put in a programming language and still have it be able to do anything other languages can [1]? It turns out we can get away with including only a single built-in feature: function definition, written using $\lambda$. The language that has only this one feature is called "lambda calculus" or "$\lambda$ calculus." We also need a way to apply our functions to inputs; we write $f\ x$ to indicate the value of the function $f$ on input $x$, perhaps more familiarly written $f(x)$.
A $\lambda$ expression is something of the form $\lambda x.y$, which means "take in an input $x$, and return $y$." For example, $\lambda x.x+1$ takes in a number and returns one more than it. We only directly have functions with one input, but we can build functions with multiple inputs out of this. For example, applying $\lambda x.\lambda y.x+y$ to $3$, and applying the result to $4$, gives $3+4$. We can think of this as a two-input function, and abbreviate it as $\lambda xy.x+y$; "$\lambda xy$" is simply a shorthand for "$\lambda x.\lambda y$." We don't lose any power by restricting ourselves to unary functions, because we can think of functions with multiple inputs as taking only one input at a time, and returning another function at each intermediate step.
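In Python, which spells $\lambda$ as lambda, this one-input-at-a-time style looks like the following sketch (the name add is ours):

add = lambda x: lambda y: x + y
add_three = add(3)   # applying add to 3 returns a function that adds 3
print(add(3)(4))     # 7
print(add_three(4))  # 7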
We said the only feature we have is function definition, so what's this about "$1$" and "$+$"? These symbols don't exist in $\lambda$ calculus, so we'll have to define numbers and operations on them using $\lambda$ [2]. But $\lambda$ expressions are useful even when we have other features as well; programming languages such as Python allow function definition using lambda. It's also useful when trying to understand what $\lambda$ means to see examples with functions you're more familiar with.
We're not going to put constraints on what inputs $\lambda$ expressions can take. If we allowed expressions like $\lambda x.x+1$, it would get confused when we try to give it something other than a number; this is why we can't have $1$ and $+$ built in. So what sorts of objects will we give our functions? The only objects we have: other functions, of course!
Let's formally define $\lambda$ calculus. First, the language of $\lambda$ calculus contains
- the symbol $\lambda$,
- the period $.$,
- parentheses $($ and $)$, and
- an infinite supply of variables $x, y, z, \ldots$
What counts as a $\lambda$ expression? We define them recursively:
- Any variable $x$ is a $\lambda$ expression.
- If $x$ is a variable and $M$ is a $\lambda$ expression, then $(\lambda x.M)$ is a $\lambda$ expression.
- If $M$ and $N$ are $\lambda$ expressions, then $(M\ N)$ is a $\lambda$ expression, meaning "$M$ applied to $N$."
We also want to define free and bound variables, and also do this recursively:
- In the expression $x$, the variable $x$ is free.
- The free and bound instances of variables in $(\lambda x.M)$ are the same as in $M$, except that every free instance of $x$ becomes bound by the $\lambda x$.
- The free and bound instances of variables in $(M\ N)$ are the same as in $M$ and $N$.
We can use the rules to build expressions like $((\lambda x.(\lambda y.(x\ y)))\ x)$, in which the first instance of $x$ and the only instance of $y$ are bound, and the second instance of $x$ is free. We have an oppressive amount of parentheses; we'll establish informal conventions to reduce them in the next section.
If we want to be extremely careful, we should write the add-two-numbers function as $(\lambda x.(\lambda y.(x+y)))$, using parentheses as in the formal definition. But we often don't need them, and use the following rules:
- We drop outermost parentheses: $\lambda x.M$ means $(\lambda x.M)$.
- The body of a $\lambda$ extends as far to the right as possible: $\lambda x.M\ N$ means $\lambda x.(M\ N)$, not $((\lambda x.M)\ N)$.
- Application is left-associative: $M\ N\ P$ means $((M\ N)\ P)$.
- We contract nested $\lambda$s: $\lambda xy.M$ is shorthand for $\lambda x.\lambda y.M$.
Of course, we don't have to omit parentheses whenever we can. These rules govern what it means when we don't have all the parentheses strictly necessary, but we can include them if we want to be extra clear.
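For example, under these conventions $\lambda x.\lambda y.x\ y\ x$ abbreviates the fully parenthesized $(\lambda x.(\lambda y.((x\ y)\ x)))$: each $\lambda$'s body extends all the way to the right, and the applications inside group to the left.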
So far, $\lambda$ expressions don't have any meaning; we've said what we're allowed to write down but not how they relate to each other. What we usually do with a $\lambda$ expression is reduce it, and there are three rules that we can use: $\beta$-reduction, $\alpha$-conversion, and $\eta$-conversion.
$\beta$-reduction is the fancy name for "stick the inputs into a function." For example, to evaluate $((\lambda x.\lambda y.x+y)\ 3)\ 4$, we first notice that $3$ is the input to a $\lambda$ expression that starts $\lambda x$, and substitute $3$ for $x$ in the expression, giving us $(\lambda y.3+y)\ 4$. Now $4$ is substituted for $y$, giving us $3+4$.
Formally, for a variable $x$ and $\lambda$ expressions $M$ and $N$, $\beta$-reduction converts $((\lambda x.M)\ N)$ to $M[x := N]$, i.e. $M$ with every free instance of $x$ replaced with $N$. The expressions $((\lambda x.M)\ N)$ and $M[x := N]$ are equivalent.
$\alpha$-conversion says we can rename variables; $\lambda x.x$ is the same as $\lambda y.y$. Formally, if $M$ is a $\lambda$ expression containing $\lambda x$, and no instance of $x$ bound by that $\lambda x$ is within the scope of a $\lambda y$, then we can replace every $x$ in the scope of the $\lambda x$ with a $y$ to get an equivalent expression. We need the second criterion because otherwise we could replace the $x$s in $\lambda x.\lambda y.x$ with $y$s to get $\lambda y.\lambda y.y$, and then replace the outer $y$ with $x$ to get $\lambda x.\lambda y.y$, which is not equivalent (the second substitution here is valid; the first one is the problem).
Finally, $\eta$-conversion says that if two expressions give the same result on any input, then they're equivalent. We should have $\lambda x.(f\ x)$ and $f$ be the same, since $\beta$-reduction converts $(\lambda x.(f\ x))\ y$ to $f\ y$, so they're the same on any input. Formally, if $x$ isn't free in $f$, then $\lambda x.(f\ x)$ is equivalent to $f$.
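To make substitution and capture concrete, here is a toy sketch in Python; the tuple representation and all names are ours, and it performs just one $\beta$-reduction step at the top of a term:

# Terms are tuples: ('var', name), ('lam', name, body), or ('app', fun, arg).

def free_vars(t):
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'lam':
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

counter = [0]
def fresh(name):
    # generate a new variable name (a toy scheme; real evaluators must
    # also avoid names already in use)
    counter[0] += 1
    return f"{name}{counter[0]}"

def subst(t, x, n):
    # compute t[x := n]: replace every free instance of x in t with n
    if t[0] == 'var':
        return n if t[1] == x else t
    if t[0] == 'app':
        return ('app', subst(t[1], x, n), subst(t[2], x, n))
    y, body = t[1], t[2]
    if y == x:
        return t  # this lambda re-binds x, so there is no free x inside
    if y in free_vars(n):
        # alpha-convert (lam y. body) so it doesn't capture the free y in n
        z = fresh(y)
        body = subst(body, y, ('var', z))
        y = z
    return ('lam', y, subst(body, x, n))

def beta(t):
    # one beta-reduction at the top: ((lam x. M) N) -> M[x := N]
    if t[0] == 'app' and t[1][0] == 'lam':
        return subst(t[1][2], t[1][1], t[2])
    return t

# ((lam x. (lam y. (x y))) y) reduces to (lam y1. (y y1)), not (lam y. (y y))
term = ('app', ('lam', 'x', ('lam', 'y', ('app', ('var', 'x'), ('var', 'y')))), ('var', 'y'))
print(beta(term))  # ('lam', 'y1', ('app', ('var', 'y'), ('var', 'y1')))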
Let's look again at the function that adds two numbers, $\lambda x.\lambda y.x+y$. There are two different ways we can think of this: first, as a function that takes in two numbers, and returns their sum. If we're talking about natural numbers, this is a function $\mathbb{N} \times \mathbb{N} \to \mathbb{N}$.
We could also think of the function as taking one input at a time. It says "take in input $x$, return the expression $\lambda y.x+y$." For example, on input $3$ we have $\lambda y.3+y$. Now we have a function that takes in a number, and gives us a function that takes in a number and returns a number. We can write this as $\mathbb{N} \to (\mathbb{N} \to \mathbb{N})$.
This equivalence between functions on two (or more) inputs, and functions on one input that return a new function, is known as Currying, and is important in $\lambda$ calculus, where the only objects are functions. We have to get used to the idea that a function can take other functions as arguments and give functions as return values.
$\lambda$ calculus doesn't have a built-in way to define a variable. Suppose we have the function $D = \lambda f.\lambda x.f\ (f\ x)$, which applies its first input to its second input twice. Currying, $D$ takes in a function and returns that function composed with itself. Say we want to know what $D\ D$ is. We could just write $D$ out twice, to get $(\lambda f.\lambda x.f\ (f\ x))\ (\lambda f.\lambda x.f\ (f\ x))$. $\beta$-reduction eventually reduces this to $\lambda f.\lambda x.f\ (f\ (f\ (f\ x)))$, the function that applies its first input to its second input four times (as you might have expected, applying a function twice, twice, is the same as applying it four times).
We can also get $D\ D$ while only writing $D$ out once. Take the expression $(\lambda d.d\ d)\ D$ or, writing out $D$, $(\lambda d.d\ d)\ (\lambda f.\lambda x.f\ (f\ x))$. A single step of $\beta$-reduction turns this into the previous version, where we wrote out $D$ twice. The expression we started with was shorter, and expressed better the idea that we wanted to use the same value of $D$ twice. In general, if we want to say something like "let $x = M$ in $E$" (i.e. define the variable $x$, and then use it in some expression $E$), we can write $(\lambda x.E)\ M$, first building the expression we want to use the value of $x$ in, and then substituting in the value using function application. This will be very useful in the next section when we make recursive functions.
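In Python, the same trick looks like this (a sketch, with D the doubling function from above):

D = lambda f: lambda x: f(f(x))
inc = lambda x: x + 1
print(D(D)(inc)(0))                 # 4: apply inc four times
print((lambda d: d(d))(D)(inc)(0))  # 4: "let d = D in d(d)", writing D once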
It might seem like $\lambda$ expressions are very finite, and $\beta$-reduction will always finish after a reasonable amount of time. But for $\lambda$ calculus to be as powerful as normal programming languages, there must be expressions that don't reduce in a finite amount of time. The way to resolve this is to figure out how to build while loops in $\lambda$ calculus; if you can make a while loop, you can write a program that runs forever pretty easily.
Since all we have is functions, we'll use recursion to make loopy programs. For example, consider this factorial function in Python:
factorial = lambda n: 1 if n==0 else n*factorial(n-1)
How can we write this in lambda calculus? Let's assume we have numbers, arithmetic, booleans, and if expressions, all of which can be defined using only $\lambda$ [2].
Starting to translate from Python to $\lambda$ calculus, we have $\text{factorial} = \lambda n.\ \text{if } n = 0 \text{ then } 1 \text{ else } n \times \text{factorial}(n-1)$, which says "if $n$ is $0$, then $1$, else $n \times \text{factorial}(n-1)$." But we can't define $\text{factorial}$ like this; we have to use the method described in the previous section. In order for each recursive call of the factorial function to know how to call further recursive calls, we need to pass in $\text{factorial}$ to itself. Consider the expression

$$\lambda f.\ \lambda n.\ \text{if } n = 0 \text{ then } 1 \text{ else } n \times (f\ f\ (n-1)).$$
Let $F$ be this expression, and let $\text{factorial} = F\ F$. Then $F$ is applied to itself. Plugging in $F$ for $f$, $F\ F = \lambda n.\ \text{if } n = 0 \text{ then } 1 \text{ else } n \times (F\ F\ (n-1))$. But since $\text{factorial} = F\ F$, this is just $\lambda n.\ \text{if } n = 0 \text{ then } 1 \text{ else } n \times \text{factorial}(n-1)$, exactly what we wanted $\text{factorial}$ to be.
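This construction runs as-is in Python; a sketch, with F the expression above:

F = lambda f: lambda n: 1 if n == 0 else n * f(f)(n - 1)
factorial = F(F)
print(factorial(5))  # 120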
Let's generalize this. Say we want to define a recursive function $f$ satisfying $f\ x = g\ f\ x$, where $g$ is any expression (in the factorial example, $g = \lambda r.\lambda n.\ \text{if } n = 0 \text{ then } 1 \text{ else } n \times r(n-1)$). We define

$$F = \lambda h.\ g\ (h\ h)$$

and use this to build

$$f = F\ F = (\lambda h.\ g\ (h\ h))\ (\lambda h.\ g\ (h\ h)).$$

We can express this process taking any $g$ to the recursive function $f$ as a $\lambda$ expression:

$$Y = \lambda g.\ (\lambda h.\ g\ (h\ h))\ (\lambda h.\ g\ (h\ h)).$$
For any $g$ which takes as arguments a function $f$ and an input $x$ and returns some expression using them, $Y\ g$ recursively calls $g$. Specifically, letting $f = Y\ g$, we have $f\ x = g\ f\ x$.
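We can't transcribe $Y$ directly into Python, because Python evaluates arguments eagerly and $h\ h$ would loop forever before $g$ ever ran; the $\eta$-expanded variant, often called the Z combinator, works. A sketch:

Z = lambda g: (lambda h: g(lambda x: h(h)(x)))(lambda h: g(lambda x: h(h)(x)))
fact = Z(lambda f: lambda n: 1 if n == 0 else n * f(n - 1))
print(fact(5))  # 120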
For fun, let's make a $\lambda$ expression that enters an infinite loop by reducing to itself. If we make the function $\lambda x.x\ x$, which applies $x$ to $x$, calling it on itself should accomplish this, since it's analogous to the Python function f = lambda x: f(x)
, which just loops forever. So let $\Omega = (\lambda x.x\ x)\ (\lambda x.x\ x)$. Now $\beta$-reduction substitutes $\lambda x.x\ x$ for $x$ in $x\ x$, giving us back exactly $\Omega$.
Continued $\beta$-reduction doesn't get us anywhere. This expression is also interesting because it's a simple quine, a program that outputs its own code, and this is a sort of template you can use to make quines in other languages.
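The Python analogue is a sketch like:

omega = lambda x: x(x)
# omega(omega) calls itself forever; Python cuts it off with a RecursionError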
This strategy of building recursive functions results in some very long intermediate computations. If $f$ is the factorial function, we compute $f(4)$ as $4 \times f(3)$, then as $4 \times (3 \times f(2))$, etc. By the time the recursive calls finally stop, we have the very long expression $4 \times (3 \times (2 \times (1 \times 1)))$, which takes a lot of space to write down or store in a computer. Many programming languages will give you an error if you try to do this, complaining that you exceeded the max recursion depth.
The fix is something called tail_recursion, which roughly says "return only a function evaluation, so you don't need to remember to do something with it when it finishes." The general way to accomplish this is to pass along the intermediate computation to the recursive calls, instead of applying it to the result of the recursive calls. We can make a tail-recursive version of our Python factorial like so [3]:
f = lambda n, k: k if n==0 else f(n-1, k*n)
We pass along $k$ as an intermediate value. It has to start at $1$; we can do this by defining a new function
g = lambda n: f(n, 1)
or by using Python's default value feature to make $k$ be $1$ by default. Now if we want $4$ factorial, we get $f(4, 1) = f(3, 4) = f(2, 12) = f(1, 24) = f(0, 24) = 24$. We compute the products as we go instead of all at the end, and only ever have one call of $f$ active at a time.
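As a sketch, the default-value version is:

f = lambda n, k=1: k if n == 0 else f(n - 1, k * n)
print(f(4))  # 24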
Writing a tail-recursive version of factorial in $\lambda$ calculus is left as an exercise for the reader.
You might be wondering why we called the recursive-function-maker $Y$, so let's look at that again. We said earlier that $(Y\ g)\ x = g\ (Y\ g)\ x$. By $\eta$-conversion, $Y\ g = g\ (Y\ g)$. That means $Y\ g$ is a fixed point of $g$! For any function $g$, $Y$ finds a value $f = Y\ g$ such that $g\ f = f$. This kind of object is called a fixed_point_combinator; this particular one is called the Y_combinator, conventionally denoted $Y$.
Why is a fixed point finder the right way to make recursive functions? Consider again the example of factorial. If a function $f$ says "do factorial for $n$ recursive steps," then $g\ f$ says "do factorial for $n+1$ recursive steps;" $g$ was designed to be the function which, called recursively, gives factorial. Naming the identity function $I$, $I$ does factorial for $0$ steps, so $g\ I$ does it for $1$ step, $g\ (g\ I)$ for $2$, and so on. To completely compute factorial, we need a function that says "do factorial for infinitely many recursive steps." But doing one more step than infinity is the same as just doing infinity, so it must be true that $g\ (\text{factorial}) = \text{factorial}$; i.e. that $\text{factorial}$ is a fixed point of $g$. There may be other fixed points of $g$, but the Y combinator finds the fixed point that corresponds to recursively calling $g$.
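We can watch these approximations in Python; a sketch, with g and I as in the discussion above:

g = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
I = lambda x: x
approx = g(g(g(I)))   # "do factorial for 3 recursive steps"
print(approx(2))      # 2; correct whenever n < 3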
What do we get, and what do we lose, if we allow the use of $Y$?
$Y$ takes a lambda term which may have corresponded to a primitive_recursive function, and outputs a lambda term which need not correspond to a primitive recursive function. That is because it allows us to perform an unbounded search: it expresses fully general looping strategies. (As discussed earlier, without a fixed-point finder of some kind, we can only create lambda terms corresponding to programs which provably halt.)
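For example, here is a sketch of an unbounded search in Python, using the Z combinator from earlier and a hypothetical predicate p:

Z = lambda g: (lambda h: g(lambda x: h(h)(x)))(lambda h: g(lambda x: h(h)(x)))
p = lambda n: n * n > 50   # hypothetical predicate; swap in any test
search = Z(lambda f: lambda n: n if p(n) else f(n + 1))
print(search(0))  # 8, the smallest n with n*n > 50; loops forever if p is never true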
On the other side of the coin, $Y$ may produce non-well-typed terms, while terms without a fixed-point finder are guaranteed to be well-typed. That means it's hard or even impossible to perform type-checking (one of the most basic forms of static analysis) on terms which invoke $Y$.
Relatedly, in a rather specific sense through the Curry-Howard correspondence, the use of $Y$ corresponds to accepting non-constructiveness in one's proofs. Without a fixed-point finder, we can always build a constructive proof corresponding to a given lambda term; but with $Y$ involved, we can no longer keep things constructive.
[1] I.e., Turing_complete.
[2] The standard way of doing this is called the Church encoding.
[3] Python will still give you a "maximum recursion depth exceeded" error with this, but the strategy works in $\lambda$ calculus and in languages that optimize tail calls, such as Scheme and Haskell; Python is just easier to demonstrate with.