patternModerate

Define a list using only the Hindley-Milner type system

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

thehindleysystemtypeusingdefinelistonlymilner

Problem

I'm working on a small lambda calculus compiler that has a working Hindley-Milner type inference system and now also supports recursive let's (not in the linked code), which I understand should be enough to make it Turing complete.

The problem now is I have no idea how to make it support lists, or whether it already does support them and I just need to find a way to encode them. I'd like to be able to define them without having to add new rules to the type system.

The easiest way I can think of a list of x is as something that is either null (or the empty list), or a pair that contains both an x and a list of x. But to do this I need to be able to define pairs and or's, which I believe are the product and the sum types.

Seems that I can define pairs this way:

pair = λabf.fab
first = λp.p(λab.a)
second = λp.p(λab.b)

Since pair would have the type a -> (b -> ((a -> (b -> x)) -> x)), after passing, say, an int and a string, it'd yield something with type (int -> (string -> x)) -> x, which would be the representation of a pair of int and string. What bothers me here is that if that represents a pair, why is that not logically equivalent to, nor implies the proposition int and string?. However, is equivalent to (((int and string) -> x) -> x), as if I could only have product types as parameters to functions. This answer seem to address this problem, but I have no idea what the symbols he uses mean. Also, if this does not really encode a product type, is there anything I can do with product types I couldn't do with my definition of pairs above (considering I can also define n-tuples the same way)? If not, wouldn't this contradict the fact that you cannot express (AFAIK) conjunction using only implication?

Also, how about the sum type? Can I somehow encode it using only the function type? If so, would this be enough to define lists? Or else, is there any other way to define lists without having to extend my type system? And if not, w

Solution

Pairs

This encoding is the Church encoding of pairs. Similar techniques can encode booleans, integers, lists, and other data structures.

Under the context x:a; y:b, the term pair x y has the type (a -> b -> t) -> t. The logical interpretation of this type is the following formula (I use standard mathematical notations: $\to$ is implication, $\vee$ is or, $\wedge$ is and, $\neg$ is negation; $\iff$ is equivalence of formulas):
$$ \begin{align}
(a \to b \to t) \to t
& \quad\iff\quad \neg (\neg a \vee \neg b \vee t) \vee t \\
& \quad\iff\quad (a \wedge b \wedge \neg t) \vee t \\
& \quad\iff\quad (a \wedge b) \vee t \\
\end{align}
$$
Why “a and b or t”? Intuitively, because pair is a function that takes a function as an argument and applies it to the pair. There are two ways this can go: the argument function can make use of the pair, or it can produce a value of type t without using the pair at all.

pair is a constructor for the pair type and first and second are destructors. (These are the same words used in object-oriented programming; here the words have a meaning that is related to the logical interpretation of the types and terms that I won't go into here.) Intuitively, the destructors let you access what is in the object and the constructors pave the way for the destructor by taking as an argument a function that they apply to the parts of the object. This principle can be applied to other types.

Sums

The Church encoding of a discriminated union is essentially dual to the Church encoding of a pair. Where a pair has two parts that must be put together and you can choose to extract one or the other, you can choose to build the union in either of two ways and when you use it you need to allow for both ways. Thus there are two constructors, and there is a single destructor which takes two arguments.

let case w = λf. λg. w f g           case : ((a->t) -> (b->t) -> t) -> (a->t) -> (b->t) -> t
  (* or simply let case w = w *)
let left x = λf. λg. f x             left : a -> ((a->t) -> (b->t) -> t)
let right y = λf. λg. g x            right : b -> ((a->t) -> (b->t) -> t)

Let me abbreviate the type (a->t) -> (b->t) -> t as SUM(a,b)(t). Then the types of the destructors and constructors are:

case : SUM(a,b)(t) -> (a->t) -> (b->t) -> t
left : a -> SUM(a,b)(t)
right : b -> SUM(a,b)(t)

Thus

case (left x) f g → f x
case (rightt y) f g → g y

Lists

For a list, apply again the same principle. A list whose elements have the type a can be built in two ways: it can be an empty list, or it can be an element (the head) plus a list (the tail). Compared with pairs, there's a little twist when it comes to the destructors: you can't have two separate destructors head and tail because they would only work on non-empty lists. You need a single destructor, with two arguments, one of which is a 0-argument function (i.e. a value) for the nil case, and the other a 2-argument function for the cons case. Functions like is_empty, head and tail can be derived from that. Like in the case of sums, the list is directly its own destructor function.

let nil = λn. λc. n
let cons h t = λn. λc. c h t
let is_empty l = l true (λh. λt. false) 
let head l default = l default (λh. λt. h)
let tail l default = l default (λh. λt. t)

Each of these functions is polymorphic. If you work the types of these functions, you'll note than cons is not uniform: the type of the result is not the same as the type of the argument. The type of the result is a variable — it is more general than the argument. If you chain cons calls, the successive calls to build a list instantiate cons at different types. This is crucial to make lists work in the absence of recursive types. You can make heterogeneous lists that way. In fact, the types that you can express are not “list of $T$”, but “list whose first elements are of types $T_1,\ldots,T_n$”.

As you surmise, if you want to define a type that only contains homogeneous lists, you need recursive types. Why? Let's look at the type of a list. A list is encoded as a function which takes two arguments: the value to return on empty lists, and the function to compute the value to return on a cons cell. Let a be the element type, b be the type of the list, and c be the type returned by the destructor. The type of a list is

a -> (a -> b -> c) -> c

Making the list homogeneous is saying that if it's a cons cell, the tail has to have the same type as the whole, i.e. it adds the constraint

a -> (a -> b -> c) -> c = b

The Hindley-Milner type system can be extended with such recursive types, and in fact practical programming languages do that. Practical programming languages tend to disallow such “naked” equations and require a data constructor, but this is not an intrinsic requirement of the underlying theory. Requiring a data constructor simplifies type inference, and in practice tends to avoids accepting functions

Code Snippets

let case w = λf. λg. w f g           case : ((a->t) -> (b->t) -> t) -> (a->t) -> (b->t) -> t
  (* or simply let case w = w *)
let left x = λf. λg. f x             left : a -> ((a->t) -> (b->t) -> t)
let right y = λf. λg. g x            right : b -> ((a->t) -> (b->t) -> t)

case : SUM(a,b)(t) -> (a->t) -> (b->t) -> t
left : a -> SUM(a,b)(t)
right : b -> SUM(a,b)(t)

case (left x) f g → f x
case (rightt y) f g → g y

let nil = λn. λc. n
let cons h t = λn. λc. c h t
let is_empty l = l true (λh. λt. false) 
let head l default = l default (λh. λt. h)
let tail l default = l default (λh. λt. t)

a -> (a -> b -> c) -> c

Context

StackExchange Computer Science Q#20088, answer score: 13

Revisions (0)

No revisions yet.