**Contents**

- Their common ancestor: Lambda Calculus!
- Early Functional Languages
- Modern Functional Languages
- A Collection of Links
- References

The most fascinating aspect of functional programming is that functional programming languages are all based on a simple and elegant mathematical foundation --- lambda calculus.

As an effort to capture the intuitive computational model based on mathematical functions, Church (1933-1934, 1941) defined a calculus that can express the behavior of functions. First, lambda calculus uses lambda abstractions to define functions. E.g., a one-argument function f that increments its argument by one can be defined by

f = $\lambda $ x. x + 1

Church chose the symbol $\lambda $ for function abstraction, hence the name lambda calculus.

With a function at hand, the next natural thing to do is to apply this function to an argument. E.g., we can apply the above function f to argument 3 and the syntax for this function application is "f 3".

Finally, to do computations, lambda calculus has a reduction rule that gives the meaning of a function application such as "f 3" (this reduction rule is where the computation happens, and it is called the $\beta $ reduction rule):

f 3 => 3 + 1

Essentially, that's all there is to lambda calculus: function abstraction, function application, and $\beta $ reduction. They correspond naturally to our intuition about using functions.

A formal presentation of lambda calculus using BNF syntax is as follows:

e ::= x | $\lambda $ x. e | e1 e2

where x is a variable, "$\lambda $ x. e" is a lambda abstraction, and "e1 e2" is a lambda application.

The $\beta $ reduction rule can be formalized by

($\lambda $ x. e1) e2 => [e2/x] e1.

where [e2/x] e1 means the substitution of e2 for all free occurrences of x in e1. Another example of this rule is ($\lambda $ x. x + x) 5 => 5 + 5.
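The abstraction/application/substitution machinery above can be sketched as a tiny Python program (the constructors Var, Lam, App and the functions subst and beta are hypothetical names, and the sketch deliberately ignores the variable-capture problem that a full treatment of substitution must handle):

```python
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Lam:          # lambda abstraction: lambda param. body
    param: str
    body: object

@dataclass
class App:          # function application: fn arg
    fn: object
    arg: object

def subst(term, name, value):
    """[value/name] term: replace free occurrences of name in term."""
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Lam):
        # The bound variable shadows name: do not substitute under it.
        if term.param == name:
            return term
        return Lam(term.param, subst(term.body, name, value))
    return App(subst(term.fn, name, value), subst(term.arg, name, value))

def beta(term):
    """One beta step: (lambda x. e1) e2  =>  [e2/x] e1."""
    if isinstance(term, App) and isinstance(term.fn, Lam):
        return subst(term.fn.body, term.fn.param, term.arg)
    return term

# (lambda x. x x) y  =>  y y
t = beta(App(Lam("x", App(Var("x"), Var("x"))), Var("y")))
```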

The first impression of lambda calculus may be that it is too simple to be of any use. However, according to Church's Thesis, it can compute anything that your desktop computer can compute!

**Church's Thesis.** *Effectively computable functions from positive integers to positive integers are just those definable in the lambda calculus.*

Here is some evidence for Church's Thesis. First, Church showed that numbers (like 3) and operations on numbers (like the plus operation) can be defined in the calculus. Furthermore, booleans, pairs, conditional branches, recursive functions and many more can also be defined (interested readers may see Section 2.2 of the book by Barendregt for more material).
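Church's encoding of numbers can be played with directly in Python, since Python has lambdas: a Church numeral n is the function that applies its argument f n times. This is a sketch, with to_int as a hypothetical helper that converts a Church numeral back to a machine integer:

```python
# Church numerals: zero applies f no times; succ adds one application.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# plus m n: apply f m times on top of n applications of f.
plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
four = plus(three)(succ(zero))
```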

As further evidence for Church's Thesis, Turing, who proposed the Turing machine as a computational model for computers, proved that Turing computability is equivalent to lambda-definability (Turing 1937). Kleene showed that lambda-definability is equivalent to Gödel and Herbrand's notion of recursiveness (Kleene 1936).

When I first came in touch with functional programming, I was puzzled by the term "side effects". Isn't the intrinsic meaning of "n:=3" to assign 3 to variable n? Why would people call this a side effect? Here is my take now: when an expression e is computed, if the behavior of e is solely reflected in its value, then e is side-effect free. For example, suppose we have

f = $\lambda $ x. x + 1

g = $\lambda $ x. (n:=x; x + 1)

The value of computing both "f 3" and "g 3" is 4; however, the value 4 does not summarize the whole behavior of "g 3", which also changes the implicit store. Thus, "g 3" has side effects while "f 3" does not.

We get more reasoning power for a language without side effects (this property is called referential transparency). Suppose in a program we have "x = f(a); x+x"; if f has no side effects, we can transform the program to "f(a)+f(a)". However, if f has the side effect of printing out the value of a, then "f(a)+f(a)" will print the value twice!
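The broken transformation can be demonstrated concretely. In this sketch, the hypothetical function f records its argument in a log instead of printing, so the side effect can be observed: both versions compute 8, but the transformed version performs the effect twice.

```python
log = []

def f(a):
    log.append(a)      # side effect: record a (stands in for printing)
    return a + 1

# Original program: x = f(a); x + x  -- f runs once.
x = f(3)
pure_result = x + x

# "Equivalent" transformed program: f(a) + f(a)  -- f runs twice.
log.clear()
effectful_result = f(3) + f(3)
```

The values agree, but `log` ends up with two entries instead of one, so the two programs are not interchangeable once f has side effects.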

As McCarthy (1978) recalled about the design of LISP:

"To use functions as arguments, one needs a notation for functions, and it seemed natural to use the $\lambda $ -notation of Church (1941). I didn't understand the rest of his book, so I wasn't tempted to try to implement his more general mechanism for defining functions."

Nevertheless, LISP had many innovations which were influential on both theoretical and practical aspects of functional programming:

- The use of garbage collection as a method of reclaiming unused cells (for a funny story about the demonstration of the earliest garbage collector, see the appendix section).
- The use of closures to implement static scoping. For example, in "let x = 3 in f = $\lambda $ y. x + y", function f is not closed, since it has a free variable x. To use f, it has to be bound to an environment that gives the value of x. A function paired with an environment for its free variables is called a closure.
- The invention of the conditional expression and its use for writing recursive functions. This has something to do with the evaluation order of expressions (see lazy evaluation).
- The use of higher-order operations on lists, such as the *mapcar* function. We will say more about higher-order functions.
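The closure idea from the list above, a function bundled with the bindings of its free variables, falls out naturally in any language with nested functions. A Python sketch of "let x = 3 in f = $\lambda $ y. x + y" (make_adder is a hypothetical name):

```python
def make_adder():
    x = 3                   # x is free in the inner lambda
    return lambda y: x + y  # the returned function closes over x

f = make_adder()
result = f(4)
```

Even though x has gone out of scope by the time f is called, the closure still remembers that x is 3.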

To see the usefulness of higher-order functions, let us consider operations on a list of integers. If we want to increment every element of a list, say [2,3,5,7], by one, we can scan through the list and increment each element. If we instead want to double every element, we can do it in the same way, except we use the double operation.

Now comes the interesting part. We can define a function g which takes another function f and a list of integers as arguments; the job of g is to apply f to every element in the list. That is, g takes a function f (from integers to integers) as an argument, which makes g a higher-order function. (g is called *mapcar* in LISP.) Interested readers may see the paper by Hughes for more examples.
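The function g is a one-liner in Python; the two list operations above then differ only in the function passed to g:

```python
def g(f, xs):
    # Apply f to every element of xs; this is LISP's mapcar.
    return [f(x) for x in xs]

incremented = g(lambda x: x + 1, [2, 3, 5, 7])
doubled = g(lambda x: 2 * x, [2, 3, 5, 7])
```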

ML is strongly and statically typed. Its core is based on a typed lambda calculus (the original lambda calculus by Church is untyped).

Types are used to classify expressions. E.g., expression 3 is of type integer; function f = $\lambda $ x. x + 1 maps integers to integers, so it has a function type, int -> int.

By statically giving types to expressions, compilers can catch errors at compile time. For example, the rule to check a function application "f e" requires that f is a function and that the type of e matches the domain type of f. Using this rule, compilers reject expressions such as "3 7" (apply 3 to 7), since 3 does not have a function type.
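For contrast, a dynamically typed language only notices "apply 3 to 7" when the program runs, whereas ML's type checker rejects it before the program ever starts. A small Python illustration:

```python
# In Python, applying a non-function is only caught at run time.
try:
    (3)(7)            # "apply 3 to 7"
    applied = True
except TypeError:     # 'int' object is not callable
    applied = False
```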

ML also allows parametric polymorphism, which can give an expression a polymorphic type. For instance, the function *map* (mapcar in LISP) that takes a function and a list and applies the function to every element in the list is given the following type, which is polymorphic over t and t'.

map : $\forall $ t,t'. (t -> t') -> t list -> t' list.

For a statically typed language like ML, there is a burden on the programmer to provide type annotations for expressions in a program. Fortunately, Hindley (1969) and Milner (1978) independently discovered a restricted polymorphic type system in which most types can be inferred automatically. With *type inference*, statically typed functional programming becomes practical.
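The polymorphic type of map can be transcribed into Python's optional type hints, where TypeVar plays the role of the quantified variables t and t' (poly_map is a hypothetical name, and Python's checkers verify such annotations only externally, e.g. with mypy):

```python
from typing import Callable, TypeVar

T = TypeVar("T")   # t
U = TypeVar("U")   # t'

# map : (t -> t') -> t list -> t' list
def poly_map(f: Callable[[T], U], xs: list[T]) -> list[U]:
    return [f(x) for x in xs]

# Instantiating t = str, t' = int:
lengths = poly_map(len, ["a", "bb", "ccc"])
```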

datatype inttree = Empty | Node of int * inttree list

It means that an inttree is either an empty tree (denoted by the value constructor Empty), or a node with an integer value and a list of inttrees (denoted by the value constructor Node). Given a value of type inttree, we can use pattern matching to tell whether it is an empty tree or a node with a list of subtrees.

Expressions in the lambda calculus use the $\beta $ reduction rule to do computations. However, an expression may admit different $\beta $ reductions. For example, the expression ($\lambda $ x. x + x) (5 * 4) has two possible reductions:

reduction one: __($\lambda $ x. x + x) (5 * 4)__ => (5 * 4) + (5 * 4)

reduction two: ($\lambda $ x. x + x) __(5 * 4)__ => ($\lambda $ x. x + x) 20

A *normal-order reduction* is a sequential reduction in which, whenever there is more than one reducible expression, the leftmost one is chosen first. Reduction one above is a normal-order reduction. For a function application "f e", this strategy reduces "f e" first, before reducing e to a value. For this reason, it is often called *call-by-name* parameter passing.

An *applicative-order reduction* is a sequential reduction in which the leftmost innermost reducible expression is chosen. Reduction two above is an applicative-order reduction. For a function application "f e", this strategy first reduces e to a value v, then reduces "f v". For this reason, it is often called *call-by-value* parameter passing.

The good thing about normal-order reduction is that if an expression has a normal form (a form that cannot be further reduced), normal-order reduction is guaranteed to find it. This is not true for applicative-order reduction; an example is "($\lambda $ x. 1) e", where e is a non-terminating expression.

However, normal-order reduction can be awfully inefficient, since the bound variable may appear in the function body many times and e can be arbitrarily large. If we reduce "f e" first, the computation of e will be duplicated. This can be seen in the example below, where the computation (5 * 4) is done twice in the normal-order reduction.

Normal-order reduction: ($\lambda $ x. x + x) (5 * 4) => (5 * 4) + (5 * 4) => 20 + (5 * 4) => 20 + 20 => 40

Applicative-order reduction: ($\lambda $ x. x + x) (5 * 4) => ($\lambda $ x. x + x) 20 => 20 + 20 => 40

For this efficiency reason and for ease of implementation, languages like LISP, FP and ML use applicative-order evaluation.

On the other hand, the efficiency of normal-order reduction can be improved, without sacrificing its termination property, by using *lazy evaluation*. The idea is that when reducing "($\lambda $ x. e1) e2", we first create a pointer to expression e2 and then reduce "($\lambda $ x. e1) e2" to e1', which is e1 with every x replaced by the pointer to e2. If we need the pointer's value while reducing e1', we reduce the expression e2 that the pointer refers to and remember the result. The point is that the next time we meet this pointer in e1', we do not need to reduce e2 again, since it was already reduced the first time. Furthermore, if e1' does not contain the pointer at all, e2 is never reduced. This strategy is therefore called *call-by-need*: we evaluate e2 only when we need it, and it is evaluated *at most once*.
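The pointer-plus-memoization idea can be sketched with a thunk in Python (make_thunk is a hypothetical name): the argument is wrapped in a function, and forcing it the first time caches the result, so the (5 * 4) example from above is computed once even though the body x + x demands it twice.

```python
calls = []

def make_thunk(compute):
    """Wrap a computation so it runs at most once, on first demand."""
    cache = []
    def force():
        if not cache:               # first demand: evaluate and remember
            cache.append(compute())
        return cache[0]             # later demands: reuse the result
    return force

def expensive():
    calls.append("eval")            # record each actual evaluation
    return 5 * 4

# (lambda x. x + x) (5 * 4) under call-by-need:
e2 = make_thunk(expensive)
result = e2() + e2()                # forced twice, computed once
```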

Lazy evaluation uses no more reduction steps than applicative-order reduction (although with more implementation and runtime cost) and is still guaranteed to find the normal form if there is one.

The best thing about lazy evaluation may be the ability to compute with unbounded, "infinite" data structures like an infinite list. For more on this, please see Hughes's paper.
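Python's generators give a taste of this: they produce elements only on demand, so we can build an "infinite list" and map over it, as long as we only ever consume a finite prefix.

```python
from itertools import count, islice

naturals = count(0)                  # the infinite list 0, 1, 2, ...
evens = (2 * n for n in naturals)    # lazily map (*2) over it
first_five = list(islice(evens, 5))  # demand only the first five elements
```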

- Common Lisp | Scheme | elisp
- Standard ML '97 | SML/NJ | CAML
- Tutorial Papers in Functional Programming by John Hughes
- Functional programming page in Wikipedia
- LISP Page by Christopher Browne

[1] J. Backus. Can Programming be Liberated from the von Neumann Style? Communications of the ACM, 21(8):613-641, 1978.

[2] Henk Barendregt. Lambda calculi with types. In Handbook of Logic in Computer Science, Volumes 1 (Background: Mathematical Structures) and 2 (Background: Computational Structures), Abramsky, Gabbay & Maibaum (Eds.), Clarendon, volume 2, 1992.

[3] Alonzo Church. A set of postulates for the foundation of logic. Annals of Mathematics, 33:346-366 and 34:839-864, 1933-1934.

[4] Alonzo Church. The calculi of lambda conversion. Princeton University Press, 1941.

[5] M. Gordon, R. Milner, L. Morris, M. Newey, and C. Wadsworth. A metalanguage for interactive proof in LCF. In Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 119-130. ACM Press, 1978.

[6] J. R. Hindley. The principal type-scheme of an object in combinatory logic. Transactions of the American Mathematical Society, 146:29-60, 1969.

[7] P. Hudak. Conception, Evolution, and Application of Functional Programming Languages. ACM Computing Surveys, 21(3):359-411, 1989.

[8] J. Hughes. Why Functional Programming Matters. The Computer Journal, 32(2):98-107, 1989.

[9] John McCarthy. History of LISP. In Richard L. Wexelblat, editor, History of Programming Languages: Proceedings of the ACM SIGPLAN Conference, pages 173-197. Academic Press, June 1-3 1978.

[10] R. Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17(3):348-375, 1978.

[11] S. C. Kleene. Lambda-definability and recursiveness. Duke Mathematical Journal, 2:340-353, 1936.

[12] Alan M. Turing. Computability and lambda-definability. Journal of Symbolic Logic, 2(4):153-163, 1937.

[13] D. A. Turner. Miranda: a Non-Strict Functional Language with Polymorphic Types. In J.-P. Jouannaud, editor, Proceedings of the IFIP International Conference on Functional Programming Languages and Computer Architecture (FPCA'85), Nancy, France, volume 201 of Lecture Notes in Computer Science, pages 1-16. Springer-Verlag, Berlin, 1985.

The references were generated from a BibTeX file by bibtex2html.

Last updated: *3/11/2004*

Author : Gang Tan, Penn State