# A Brief History of Functional Programming

Contents

## Their common ancester: Lambda Calculus!

The most fascinating aspect about functional programming is that functional programming languages are all based on a simple and elegant mathematical foundation --- Lambda Calculus.

As an effort to capture the intuitive computational model based on mathematical functions, Church(1932;1933, 1941) defined a calculus that can express the behavior of functions. First, lambda calculus can use lambda abstractions to define a function. E.g., an one-argument function f that increments its argument by one can be defined by

f = $\lambda$ x. x+1

Church chose the symbol $\lambda$ to do function abstraction and hence the name lambda calculus.

With a function at hand, the next natural thing to do is to apply this function to an argument. E.g., we can apply the above function f to argument 3 and the syntax for this function application is "f 3".

Finally, to do computations, lambda calculus has a reduction rule to tell the meaning of function application such as "f 3" (this reduction rule is where the computation happens and it is called the $\beta$ reduction rule)

f 3 => 3 + 1

Essentially, that's all about lambda calculus: function abstraction, function application and $\beta$ reduction. They corresponds naturally to our intuition of using functions.

A formal presentation of lambda calculus using BNF syntax is like the following

e ::= x | $\lambda$ x. e | e1 e2

where x is a variable, "$\lambda$ x. e" is for lambda abstraction and "e1 e2" is for lambda application.

The $\beta$ reduction rule can be formalized by

($\lambda$ x. e1) e2 => [e2/x] e1.

where [e2/x] e1 means the substitution of e2 for all the occurrence of x in e1. An another example of this rule is ($\lambda$ x. x + x) 5 => 5 + 5.

### What can lambda calculus do?

The first impression of Lambda calculus maybe that it is too simple to be any useful. However, according to Church's Thesis, it can compute anything that your desktop computer can compute!

[Church's Thesis] Effectively computable functions from positive integers to positive integers are just those definable in the lambda calculus.

Here is some evidence to the Church's Thesis. First, Church has shown numbers (like 3) and operations on numbers (like the plus operation) can be defined in the calculus. Furthermore, booleans, pairs, conditional branch, recursive functions and many more can also be defined (interested readers may see section 2.2 of the book by Barendregt for more materials).

As further evidence to Church's Thesis, Turing, who proposed Turing machine as a computational model for computers, proved that Turing computability was equivalent to lambda-definability (Turing37). Kleene showed that lambda-definability was equivalent to Godel and Herbrand's notions of recursiveness (Kleene36).

### Side effects: comparison between functional programming and imperative programming

Imperative languages like C and Java assume having an implicit state which can be modified by constructs in the language. For example, the assignment statement "n:=3" changes the value of the variable n in the state to 3. In this case, we say that statement "n:=3" has side effects.

When I first got in touch with Functional Programming, I was puzzled by the term of "side effects". Isn't the intrinsic meaning of "n:=3" to assign 3 to variable n? Why would people call this a side effect? Here is my taking now: when an expression e is computed, if the behavior of e is solely reflected in its value, then e is side-effect free. For example, suppose we have

f = $\lambda$ x. x + 1

g = $\lambda$ x. (n:=x; x + 1)

The value of computing both "f 3" and "g 3" is 4; however, value 4 cannot summarize the whole behavior of "g 3", which also changes the implicit store. Thus, "g 3" has side effects while "f 3" does not.

We can get more reasoning power for a language without side effects (so called referential transparency). Suppose in a program, we have "x = f(a); x+x"; if f has no side effects, we can transform the above program to "f(a)+f(a)". However, suppose that f has the side effect to print our the value of a, then "f(a)+f(a)" will print out the value twice!

## Early Functional Languages

McCarthy in the late 1950s defined the first versions of LISP, which uses Church's $\lambda$ notation for functions. Peter Landin's Language Iswim and Iverson's APL in the 1960s pushed the functional programming paradigm significantly. Backus gave a Turing Award lecture to argue why imperative programming was "bad" and why functional programming was "good". In his lecture, he also defined the language FP which is very influential in the further development of functional languages.

### LISP

McCarthy's seminal work in the late 1950s defined LISP, which was extremely successfuly and is still used today after 40 years. The dialects now include Common Lisp, Scheme and elisp for emacs. Interestingly, by McCarthy's own account(1978), Lambda Calculus had little impact on the development of LISP other than the lambda notation.
"To use functions as arguments, one needs a notation for functions, and it seemed natural to use the $\lambda$ -notation of Church (1941). I didn't understand the rest of his book, so I wasn't tempted to try to implement his more general mechanism for defining functions."
Nevertheless, LISP had many innovations which was influential on both theorectic and practical aspects of functional programming:
• The use of garbage collection as a method of claiming unused cells (a funny story about the demonstration of the earlist garbage collector; see the appendix section).
• Use closures to implement static scoping. For example, in "let x = 3 in f = $\lambda$ y. x + y", function f is not closed since it has a free variable x. To use f, it has to be bound with an environment that tells the value of x. A function with the environment for its free variables is called a closure.
• Invention of the conditional expression and its use for writing recursive functions. This has something to do with the evaluation order of expressions(see lazy evaluation).
• Use higher-order operations on list such as the mapcar function. we will have say more about higher-order functions.
Nowadays, nearly every functional programming system uses garbage collection and uses closures to implement static scoping.

### FP

Backus defined FP in his Turing Award lecture (1978). His lecture gave a huge lift to functional language field. An important idea in FP is that of higher-order functions, i.e., functions that take functions as arguments and return them as results.

To see the usefulness of higher-order functions, let us consider opertaions on a list of integers. If we want to increment every element by one in the list, say [2,3,5,7], we can do it by scan through each element and do the incrementation. What if we want to double every element, we can do it in the same way except we use the double operation.

Now comes the interesting part. We can define a function g which take another function f and a list of integers as arguments; the job of function g is to apply f to every element in the integer list. That is, this function g takes a higher-order function f (from integers to integers) as an argument.(g is called mapcar in LISP). Interested readers may see the paper by Hughes for more examples.

## Modern Functional Languages

Modern functional languages such as ML, Miranda and Haskell have more advanced features to make functional programming practical. These features include: static type systems, polymorphism, type inference, algebraic datatype, pattern matching and lazy evaluation.

### ML

In the mid 1970s, at the U. of Edinburgh, researchers (Gordon, Milner et al.) had been working on an automatic theorem proving system (LCF). They needed a language to describe proof search strategies. Hence, they defined ML (meta language) and soon found out they could use ML as a general programming language (1978). Two important dialects now include Standard ML of New Jersey and CAML.

#### Static type systems

ML is strongly and statically typed. Its core is based on a typed lambda calculus (the original lambda calculus by Church is untyped).

Types are used to classify expressions. E.g. expression 3 is of type integer, function f = $\lambda$ x. x + 1 is a function from integers to integers, thus it has a function type, int -> int.

By statically giving types to expressions, compilers can prevent errors during compile time. For example, the rule to check function application "f e" requires that f is a function and the type of e matches the domain type of f. Using this rule, compilers reject expressions such as "3 7" (apply 3 to 7) since 3 is not of function type.

ML also allows parametric polymorphism, which can give an expression a polymorphic type. For instance, the function map (mapcar in LISP) that takes a function and a list and applies the function to every element in the list is given the following type, which is polymorphic over t and t'.

map : $\forall$ t,t'. (t -> t') -> t list -> t' list.

For a static typed language like ML, there is a burden on the programmer to provide type annotations to expressions in a program. Fortunately, Hindley (1969) and Milner (1978) independently discovered a restricted polymorphic type system where most of the types can be infered automatically. With the type inference, functional programming becomes practical.

#### Algebraic datatypes and pattern matching

One of the most beloved features in ML is user-defined algebraic datatypes. For example, if we want to define a integer tree type, we can define it in ML:

datatype inttree = Empty | Node of int * int inttree list

It means an inttree is either an empty node (denoted by value constructor Empty), or a node with an intger value and a list of inttrees (denoted by value constructor Node).

Given a value of type inttree, we can use pattern matching to tell if its an empty node or node with a list of trees.

### Miranda

David Turner developed Miranda in Turner85. It is a language with lazy evaluation (non-strict evaluation).

#### Lazy evaluation

The following paragraphs are a sketch of section 2.2 by Hudak.

Expressions in the Lambda Calculus can use the $\beta$ reduction rule to do computations. However, an expression may have different ways to do $\beta$ reductions. For example,the expression ($\lambda$ x. x + x) (5 * 4) can have two possible reductions:

reduction one: ($\lambda$ x. x + x) (5 * 4) => (5 * 4) + (5 * 4)

reduction two: ($\lambda$ x. x + x) (5 * 4) => ($\lambda$ x. x + x) 20

A normal-order reduction is a sequential reduction in which, whenever there is more than one reducible expression, the left most one is chosen first. Reduction one above is using the normal-order reduction. For a function application "f e", this reduction strategy will reduce "f e" first before reducing e to a value. For this reason, it is often called call-by-name parameter passing.

A applicative-order reduction is a sequential reduction in which the leftmost innermost reduction expression is chosen. Reduction two above is using the applicative-order reduction. For a function application "f e", this reduction strategy will first reduce e to a value v, then reduce "f v". For this reason, it is often called call-by-value parameter passing.

The good thing about normal-order reduction is that if there is a normal form (cannot be further reduced) for an expression, the normal-order reduction can always find it. This is not true for applicative-order reduction; an example is "($\lambda$ x. 1) e" and let e be a non-terminating expression.

However, the normal-order reduction is awfully inefficient, since e may appear in f many times and e can be arbitrarily large. If we reduce "f e" first, then the computation of e will be duplicated. This can be seen in the example below, where the computation (5 * 4) has been done twice in the normal-order reduction.

```Normal-order reduction :
($\lambda$ x. x + x) (5 * 4)
=> (5 * 4) + (5 * 4)
=> 20 + (5 * 4)
=> 20 + 20
=> 40
```
```Applicative-order reduction :
($\lambda$ x. x + x) (5 * 4)
=> ($\lambda$ x. x + x) 20
=> 20 + 20
=> 40
```

For this efficiency reason and the ease of implementation, languages like LISP, FP and ML use applicative-order evaluation.

On the other hand, the efficiency of normal-order reduction can be improved without sacrificing its termination property by using lazy evaluation. The idea is when reducing "($\lambda$ x. e1) e2", we can first create a pointer to expression e2 and then reduce "($\lambda$ x. e1) e2" to e1', which is e1 with all x replaced by the pointer to e2. If we need to reduce the pointer when reducing e1', we can reduce the expression (e2) pointed by the pointer. The point here is next time we meet this pointer in e1', we don't need to reduce e2 again since it has already been reduced the first time. Furthermore, if e1' does not contain that pointer, we do not need to reduce e2 at all. In this sense, this strategy is called "call-by-need"; we evaluation e2 whenever we need it and it will be evaluated at most once.

Lazy evaluation uses less reduction steps than applicative-order reduction (although with more implementation and runtime cost) and yet guarantees to find the normal form if there is one.

The best thing about lazy evalution maybe the ability to compute with unbounded "infinite" data structures like an infinite list. For more on this, please see Hughes's paper.

As an effort to define a standard functional language with modern features, a committee is formed in 1987 to serve that purpose. The result is a purely functional language Haskell, named after the logician Haskell B. Curry. Haskell has all the modern functional language features such as higher-order functions, type inference, lazy evaluation and user-defined datatypes. See history of Haskell for more information.