# Taming Heterogeneous Lists in Haskell

The list data type is perhaps the most frequently used functional data structure in Haskell. Heterogeneous lists are used less frequently compared to the usual lists, simply because we don’t need a list of values from different types all that often. However, they do show up from time to time. In particular, heterogeneous lists are a very natural model of rows from a database table. For example, if I had a database table of my spending history:

Item | Quantity | Unit Price | Purchase Time |
---|---|---|---|

Apple | 10 | 0.5 | 2018/1/1 |

Coffee | 2 | 3 | 2018/1/2 |

Lunch | 1 | 6 | 2018/1/2 |

Text Book | 1 | 100 | 2018/1/2 |

I can model each row of this database by a 4-element heterogeneous list that contains the item name, the quantity, the unit price and the purchase time.

Although heterogeneous lists are flexible in the types of values they contain, it is not immediately clear how to define useful higher-order functions such as `map`

and `foldr`

over them. The type of `map :: (a -> b) -> [a] -> [b]`

does not make sense with heterogeneous lists, since each element in a heterogeneous list can have a completely different type. How can we solve this problem? Is it possible to define higher-order functions for heterogeneous lists just as we would for ordinary lists?

I will explore how to construct such a solution from the ground up in this post. Many ideas in this post are well-established results from the Strongly Typed Heterogeneous Collections paper. But I want to present the important ingredients assuming only experience with vanilla Haskell. Hacking at the type level requires us to enable many language extensions from `ghc`

. When I first encountered these extensions, I had no idea what they entailed and what combinations of them allow me to do what I want. As we walk through defining and writing functions over heterogeneous lists, I will pay close attention to the important language extensions that enable our solution. I hope this post would serve as a guide on basic type level programming to my past self, and also to anyone else who interested in tapping into the power of the Haskell type system.

## Defining Heterogeneous List data type

To define the `HList`

data type, one of the first features we need to enable is `GADTs`

, which stands for generalized algebraic data types. To enable this extension, we use the following pragma in our Haskell source code

`{-# LANGUAGE GADTs #-}`

We have all heard of algebraic data types since those are the default user-definable data types. GADTs give us the ability to provide a more precise type signature for each constructor under definition. One of the canonical use cases of GADTs is typed abstract syntax trees. To present the power of GADTs, let’s take a little detour and assume we are building the abstract syntax trees for a simple expression language that operates over strings and integers. We might define its syntax with the following declaration:

```
data Expr = SLit String
| SAppend Expr Expr
| ILit Int
| IAdd Expr Expr
```

The four constructors represents string literals, string append operation, integer literals and integer addition operation respectively. We could then implement an interpreter for this language:

```
interp :: Expr -> Either String Int
interp (SLit s) = Left s
interp (SAppend s1 s2) =
case (interp s1, interp s2) of
(Left s1', Left s2') -> Left $ s1' ++ s2'
_ -> error "Type mismatch!"
interp (ILit i) = Right i
interp (IAdd i1 i2) =
case (interp i1, interp i2) of
(Right i1', Right i2') -> Right $ i1' + i2'
_ -> error "Type mismatch!"
```

This works as expected:

```
*Main> interp (ILit 10 `IAdd` ILit 20)
Right 30
*Main> interp (SLit "foo" `SAppend ` SLit "bar")
Left "foobar"
```

However, there are two glaring uses of `error`

in the definition of `interp`

! There are many terms in in the `Expr`

type that encode computations that does not make sense in this little language, such as appending two integers, or adding an integer and a string.

Here is where GADTs shine. We can index the type `Expr`

by the sort of values its computation produces, and since GADTs allows us to give type signatures for each constructor, we can restrict the constructors for append and addition to only take string-producing and integer-producing computations. Putting this into code, we have a new definition for `Expr`

.

```
data Expr a where
SLit :: String -> Expr String
SAppend :: Expr String -> Expr String -> Expr String
ILit :: Int -> Expr Int
IAdd :: Expr Int -> Expr Int -> Expr Int
```

Notice that each constructor explicitly mentions its sort of values. Now, we can define an interpreter in the usual way:

```
interp :: Expr a -> a
interp (SLit s) = s
interp (SAppend s1 s2) = interp s1 ++ interp s2
interp (ILit i) = i
interp (IAdd i1 i2) = interp i1 + interp i2
```

This `interp`

is better than the old `interp`

in several ways:

- It gets rid of
`error`

; - Its type signature is much simpler: instead of having
`Either String Int`

, we just have`a`

; - Its definition is also much simpler: we don’t need to pattern match on the result of recursive calls to
`interp`

.

All of these improvements come from the type signatures we added to the constructors for `Expr`

: we have encoded the “correctness” of an `Expr`

term into its constructors, and ruled out all of the invalid ones with GADTs.

This can be verified by trying to construct an invalid term in `ghci`

:

```
*Main> SLit "foo" `IAdd` SLit "bar"
<interactive>:10:1: error:
• Couldn't match type ‘[Char]’ with ‘Int’
Expected type: Expr Int
Actual type: Expr String
• In the first argument of ‘IAdd’, namely ‘SLit "foo"’
In the expression: SLit "foo" `IAdd` SLit "bar"
In an equation for ‘it’: it = SLit "foo" `IAdd` SLit "bar"
<interactive>:10:19: error:
• Couldn't match type ‘[Char]’ with ‘Int’
Expected type: Expr Int
Actual type: Expr String
• In the second argument of ‘IAdd’, namely ‘SLit "bar"’
In the expression: SLit "foo" `IAdd` SLit "bar"
In an equation for ‘it’: it = SLit "foo" `IAdd` SLit "bar"
```

Here, `ghci`

is telling us that it expects the two operands of `IAdd`

to have type `Expr Int`

, but we gave it two `Expr String`

instead. This is what we expect, thanks to the type signature of `IAdd`

.

So how does all of this help us defining a heterogeneous list? Just like we indexed `Expr`

with the sort of computations it produces, we need to index a heterogeneous list with a list of the types it contains. We are used to lists at the value level in Haskell, there is actually also support for type-level lists in GHC as well. A type-level list is a list that contains types instead of values. For example, `'[Int, Float, Bool]`

is a type-level list. In fact, a large family of types enjoy automatic promotion to the type level through a general mechanism called `DataKinds`

. To enable this mechanism, we use the pragma

`{-# LANGUAGE DataKinds #-}`

A kind can be though of as the “type of a type”. All of our familiar types such as `Int`

, `Bool`

, `String`

, `Float`

and any other value-containing type has the kind `*`

, pronounced star. The arrow type constructor for functions `->`

has kind `* -> * -> *`

, which means it takes two value-containing types, and produces another value-containing type. But there can be kinds other than `*`

and combinations of `*`

and arrows. By enabling `DataKinds`

, all the promotable types become kinds in the type system, and the constructors of the type become type constructors in the promoted kind. `DataKinds`

came from the paper Giving Haskell a Promotion, and it’s a delightful read for those interested in the theoretical details behind `DataKinds`

.

With GADTs and DataKinds, we are ready to define the heterogeneous list data type, which we will call `HList`

.

```
data HList :: [*] -> * where
HNil :: HList '[]
HCons :: x -> HList xs -> HList (x ': xs)
```

The two constructors correspond to the empty list and a cons cell respectively. This is just like the definition of the usual list type. The indices to `HList`

is what makes it special. In the previous example for GADTs with `Expr`

, the index to `Expr`

is a value-containing types of kind `*`

. However, the index for `HList`

has the kind `[*]`

, a *type-level list* of value-containing types.

In the `HNil`

constructor, the index is an empty type-level list written `'[]`

. The tick in the front is used to distinguish the value level empty list from the type-level empty list.

In the `HCons`

case, the arguments to the constructor is a value of type `x`

and another `HList`

whose values have types in the type-level list `xs`

. Naturally, this should construct an `HList`

whose values have types in `x ': xs`

.

To make the notations easier to work with, let’s introduce an infix operator instead of writing `HCons`

everywhere:

```
infixr 5 %:
(%:) :: x -> HList xs -> HList (x ': xs)
(%:) = HCons
```

We can now build a heterogeneous list that represents a row from the table shown earlier.

```
*Main> let row = "Apple" %: 10 %: 0.5 %: "2018/1/1" %: HNil
*Main> :t row
row :: (Num x1, Fractional x) => HList '[[Char], x1, x, [Char]]
```

Since we did not specify the type for the literals `10`

and `0.5`

explicitly, `ghci`

tried to infer the most general type for us.

Unfortunately, we don’t have a `Show`

instance for `HList`

s yet, and `ghci`

refuses to print `row`

for us for that reason. Let’s fix this next.

`Show`

and `Eq`

for `HList`

, inductively

We may have seen proofs by induction in math classes before: to prove a property P for all natural numbers, we first prove P for 0, and then assume P holds for n as an inductive hypothesis followed by a proof of P for the successor n+1.

Proof by induction in fact applies to many things other than natural numbers. In particular, it applies to our `HList`

type. The general idea is that in order to prove some property P for all values of a particular type, we just need to prove P for all possible ways to construct values in this type assuming the appropriate inductive hypothesis. For `HList`

s, the `HNil`

and `HCons`

constructors are all the possible ways of construction.

Although we are not proving anything for `HList`

s here, we can consider defining `Show`

instances as “proving” to the compiler that we can `Show`

heterogeneous lists.

Following this intuition, we need to define `Show`

instances separately for the empty case and cons case. This is easy for the empty case:

```
instance Show (HList '[]) where
show HNil = "HNil"
```

For the non-empty case, we might try something like this:

```
instance Show (HList (x ': xs)) where
show (HCons x xs) = ...
```

But we are stuck here. We would like to recursively call `show`

on `x`

and `xs`

, but since we don’t have a `Show`

instance for `x`

and the tail of the list `xs`

. This suggests we need to add some constraints to the instance head here:

```
instance (Show x, Show (HList xs)) => Show (HList (x ': xs)) where
show (HCons x xs) = show x ++ " %: " ++ show xs
```

Notice how the constraint `Show (HList xs)`

is just like an inductive hypothesis here! Now, `ghci`

will happily `show`

the `row`

value we defined just now:

```
*Main> row
"Apple" %: 10 %: 0.5 %: "2018/1/1" %: HNil
```

There are two language extensions that we needed to enable here to have the compiler accept these two instance definitions:

`{-# LANGUAGE FlexibleInstances #-}`

`{-# LANGUAGE FlexibleContexts #-}`

By default, instance heads must be of the form `C (T a1 a2 ... an)`

where `C`

is the typeclass name, `T`

is a type constructor and each of `ai`

are distinct type variables. In our `Show`

instances, `HList`

is the `T`

here, but `'[]`

and `x ': xs`

are not type variables.

Similarly, by default the constraints for instances must have the form `C x`

where `x`

is a type variable.

These two restrictions are in place to help ensure termination of instance resolution, which is undecidable in the most general case. By enabling the two `Flexible`

extensions, we relax these two restrictions.

We can define an `Eq`

instance for `HList`

following the same principle:

```
instance Eq (HList '[]) where
HNil == HNil = True
instance (Eq x, Eq (HList xs)) => Eq (HList (x ': xs)) where
(HCons x xs) == (HCons y ys) = x == y && xs == ys
```

## Higher-order Functions

Just having `HList`

values is not interesting if we can’t perform the usual list operations on them. If we can implement `map`

and `foldr`

over `HList`

s, then we can implement many other operations based on `map`

and `foldr`

.

Since the usual type for `map`

does not apply to `HList`

s, what can we do? Intuitively, we need to supply `map`

over `HList`

with a polymorphic function that accepts values from any type in the heterogeneous list as an argument. But we don’t have any constraints over what types can appear in an `HList`

from its construction, and any function that’s polymorphic enough to operate over values of any type is likely not interesting enough for general computations over `HList`

s.

To overcome this dilemma, we can apply the idea of de-functionalization. De-functionalization means using opaque symbols to represent the function being used in some environment, thus adding an extra level of indirection. This idea can be traced back to the seminal paper Definitional interpreters for higher-order programming languages by John Reynolds.

To implement this idea, we first introduce a new typeclass called `Apply`

:

```
class Apply f a b where
apply :: f -> a -> b
```

Here, the type variable `f`

is the symbol representing a function, and the type variable `a`

is the type of the argument, and the type variable `b`

is the type of the result from applying `f`

to `a`

. This typeclass is different from the other ones we have seen so far, it contains 3 type variables instead of just a single type variable. Indeed, the language extension

`{-# LANGUAGE MultiParamTypeClasses #-}`

allows us to create these typeclass declarations. As its name suggests, it enables the definition of typeclasses with multiple parameters.

The idea behind `Apply`

is that we will ask the typechecker to search for an `Apply`

instance at each application of the function mapping over an `HList`

, and similarly for folds as well. This suggests map and fold should be functions of a typeclass instead of standalone functions for `HList`

s so we can use `Apply`

as a constraint. Let’s define these two typeclasses:

```
class MapH f xs ys where
mapH :: f -> HList xs -> HList ys
class FoldrH f acc xs where
foldrH :: f -> acc -> HList xs -> acc
```

For `MapH`

, we have a function symbol `f`

which will be applied to each element in the list `HList xs`

, producing another list `HList ys`

. For `FoldrH`

, we have a function symbol `f`

which will be used to combine an accumulator with each element of the list `HList xs`

, and returns the final value of the accumulator.

The instance of these two typeclasses are easy to define for empty lists:

```
instance MapH f '[] '[] where
mapH _ _ = HNil
instance FoldrH f acc '[] where
foldrH _ acc _ = acc
```

For non-empty lists, we need to use the `Apply`

typeclass defined earlier as a constraint, so we know how to apply `f`

to each element:

```
instance (Apply f x y, MapH f xs ys)
=> MapH f (x ': xs) (y ': ys) where
mapH f (HCons x xs) = apply f x %: mapH f xs
instance (Apply f x (acc -> acc), FoldrH f acc xs)
=> FoldrH f acc (x ': xs) where
foldrH f acc (HCons x xs) = apply f x $ foldrH f acc xs
```

Let’s test these functions out by implementing a better pretty printer for `HList`

s. We may want the same syntax `[x, y, ..., z]`

as for ordinary lists when printing `HLists`

, instead of printing each cons with the symbol `%:`

. First, we need a function symbol for converting things to `String`

through `mapH`

:

```
data Pretty = Pretty
instance Show x => Apply Pretty x String where
apply _ x = show x
```

Next, we can use the `foldrH`

function to combine the outputs from `mapH`

. We need another symbol for combining these strings into the list format:

```
data Concat = Concat
instance Apply Concat String (String -> String) where
apply _ x = \y -> x ++ ", " ++ y
```

Now, pretty printing a list is just first mapping `Pretty`

over the list, and then folding the result with `Concat`

.

`prettyHList xs = "[" ++ (foldrH Concat "" $ mapH Pretty xs) ++ "]"`

However, although this is what we would like to do operationally, the compiler needs a little more guidance on the intermediate result of `mapH Pretty xs`

since it uses the type of that result to select an instance for `MapH`

. In particular, we need to tell the compiler that the intermediate result is an `HList`

whose type index is a list of `String`

types.

Here, we need another language extension called `TypeFamilies`

. Type families are functions at the type-level: functions that take types and return another type. We can enable it with the pragma:

`{-# LANGUAGE TypeFamilies #-}`

Let’s define a type family called `ConstMap`

that takes a list of types, and returns another list of the same length, but replaces each type in the original list by some given type.

```
type family ConstMap (t :: *) (xs :: [*]) :: [*] where
ConstMap _ '[] = '[]
ConstMap t (x ': xs) = t ': (ConstMap t xs)
```

Now, we can define `prettyHList`

, with a type signature that instructs the compiler which instance of `mapH`

it should select.

```
prettyHList :: forall xs.
( MapH Pretty xs (ConstMap String xs)
, FoldrH Concat String (ConstMap String xs)
) => HList xs -> String
prettyHList hlist =
"[" ++ (foldrH Concat "" $ mapH @_ @_ @(ConstMap String xs) Pretty hlist) ++ "]"
```

In the definition of `prettyHList`

, we used two other language extensions:

`{-# LANGUAGE ScopedTypeVariables #-}`

`{-# LANGUAGE TypeApplications #-}`

The first extension brings the type variables in the signature into scope for the definition of `prettyHList`

, as long as we use an explicit `forall`

to declare them.

The second extension allows us to explicitly unify type variables with a type of our choice. The types of our choice are prefixed by the `@`

sign. We can also supply an underscore as a wildcard type, this would still allow the compiler to infer the type instead of unifying with a specific choice. This is useful when the type variable we would like to unify does not appear first, as is the case in `mapH @_ @_ @(ConstMap String xs)`

: we would like to unify the `ys`

type variable, but it appears last for `mapH`

.

Now, let’s define a better show instance with `prettyHList`

:

```
instance ( MapH Pretty xs (ConstMap String xs)
, FoldrH Concat String (ConstMap String xs)
) => Show (HList xs) where
show = prettyHList
```

Now, `HList`

s are rendered in ghci just like regular lists:

```
*Main> row
["Apple", 10, 0.5, "2018/1/1", ]
```

A final remark is that in fact we need an extension called `UndecidableInstaces`

for the `Apply`

, `MapH`

and `FoldrH`

instances defined earlier. This is because the constraints for these instances are “larger” than the instance themselves, which can potentially lead to an infinite loop during instance resolution. By enabling this extension, we are telling the compiler to rest assured that instance resolution will terminate.

## Overcoming overlapping instances

In the final section, I would like to write about some tricks we can use to deal with overlapping instances in Haskell. To motivate the problem of overlapping instances, let’s consider the extensible record problem.

Extensible records are maps from labels to values of a certain type, and they are extensible in the sense that we can extend any record with another label-value pair.

Of course, we can implement such a data structure with `Data.Dynamic`

and `Data.Map`

, using labels as keys to the map, and convert everything to a `Dynamic`

for storage in the map. But ideally, we want a solution that performs less runtime checks.

Recall earlier that Haskell has many different kinds of types, such as `*`

for value-containing types, `[*]`

for type-level lists, etc. Haskell also has a special kind called `Symbol`

from the module `GHC.TypeLits`

, which is the kind for type-level strings.

We can use type-level strings from the `Symbol`

kind as labels for extensible records with a key-value pair type like this:

```
data KV :: Symbol -> * -> * where
V :: a -> KV s a
```

The `KV`

type has two type indices: a key `s`

and a type `a`

, with `a`

being the type of value it contains.

Now, an extensible record is simply an `HList`

of these `KV`

values, and it is perfectly extensible since we can just `HCons`

another `KV`

to the list :)

```
extensibleRow :: HList '[KV "item" String, KV "quantity" Int, KV "price" Float, KV "date" String]
extensibleRow = V "Apple" %: V 10 %: V 0.5 %: V "2018/1/1" %: HNil
```

Here, `extensibleRow`

contains the same values as the `row`

variable defined above, but also contains the labels of the original table.

As with any record, we would like to select a certain field from it. We can define a `select`

function using another typeclass.

```
class Member (s :: Symbol) xs t where
select :: HList xs -> t
```

Here, an instance of the typeclass `Member s xs t`

says `s`

should only hold if `s`

is a label in `xs`

, and it maps to a value of type `t`

. Using the `select`

function from this typeclass on such an `HList`

will return the value with type `t`

.

Now, we should define two general instances of `Member`

, with the first one corresponding to the case where the first label in the list is the one we are looking for, and the second instance being the recursive case. This is just how we would write a `lookup`

function at the value level. We use `T.==`

from the `Data.Type.Equality`

module for testing equality of symbols at the type level. This module is qualified as `T`

to avoid naming conflicts with other top-level names defined in this post.

```
instance Member s (KV s t ': tail) t where
select (HCons (V t) _) = t
instance ((s T.== s') ~ 'False, Member s tail t) => Member s (KV s' t' ': tail) t where
select (HCons _ xs) = select @s @tail @t xs
```

This passes the typechecker just fine, until we try to use it. And we are hit with the overlapping instance error message from `ghci`

:

```
• Overlapping instances for Member
"item"
'[KV "item" String,
KV "quantity" Int,
KV "price" Float,
KV "date" String]
String
```

How could these two instances overlap each other? In one case, the first symbol in the list is equal to the one we are looking for. In the other case, the first symbol is different from the one we are looking for. These are two logically separate scenarios!

The answer lies in an important property of the instance resolution process for Haskell: ** ghc does not check the instance constraints while matching instance heads**.

If we stand in `ghc`

’s shoes for a second, and try to see things from its perspective, then the two instance heads for `Member`

defined above are indeed overlapping!

`Member s (KV s t ': tail) t`

`Member s (KV s' t' ': tail) t`

Namely, the second instance head is strictly more general than the first instance head.

So… I was thinking too “functionally” instead of “logically” when programming the instances for `Member`

. I can perhaps be a better logical programmer and write two non-overlapping instances to solve this problem. But, I would much rather change the problem itself and keep the same “functional” ideas behind these two instances :)

Recall type families, which are functions at the type level. Type families work more or less as expected when interacting with instance resolution: type families will reduce as much as they can before instance resolution happens. We can exploit behavior, and make these two instances non-overlapping by adding another parameter to the typeclass `Member`

.

```
class Member (s :: Symbol) xs t (isHead :: Bool) where
select' :: HList xs -> t
```

Here, the parameter `isHead`

is a type-level boolean that states whether `s`

is the first label in the list. To compute such a boolean, let’s define a type family:

```
type family IsHead (s :: Symbol) (xs :: [*]) :: Bool where
IsHead s (KV s' _ ': _) = s T.== s'
IsHead s _ = 'False
```

Now, we are ready to define the two instances!

```
instance Member s (KV s t ': tail) t 'True where
select' (HCons (V t) _) = t
instance Member s tail t (IsHead s tail)
=> Member s (KV s' t' ': tail) t 'False where
select' (HCons _ xs) = select' @s @tail @t @(IsHead s tail) xs
```

These two instances are definitely not overlapping because their `isHead`

parameters are different. Notice that we also used the type family `IsHead`

in the constraint to the second instance definition in order to search for `s`

recursively down the tail.

However, `select'`

is not very usable because there are many ambiguous type variables when we apply `select'`

to a record, such as `t`

and `isHead`

, but `t`

and `isHead`

can both be computed before instance resolution. The latter is computed by `IsHead`

, so we just need to define a type family `Lookup`

that computes `t`

.

```
type family Lookup (s :: Symbol) (xs :: [*]) :: * where
Lookup s '[] = TypeError (('Text "Cannot find the label: ") ':<>: ('Text s))
Lookup s (KV s' t ': tail) = If (s T.== s') t (Lookup s tail)
Lookup s (badType ': tail) =
TypeError (('Text "Expecting a KV in the type list, instead have: ")
':<>:
(ShowType badType))
```

Now, we can finally define a user-facing `select`

function as the following:

select :: forall s xs. Member s xs (Lookup s xs) (IsHead s xs) => HList xs -> Lookup s xs select = select’ @s @xs @(Lookup s xs) @(IsHead s xs)

And it works just as expected!

```
*Main> select @"item" extensibleRow
"Apple"
*Main> select @"price" extensibleRow
0.5
```

You may be wondering what those `TypeError`

calls are in the definition of `Lookup`

. Using type-level features can make `ghc`

output some incredibly obscure error messages difficult for a user to digest; in our case, if we tried to `select @"nonexistentField" extensibleRow`

, `ghc`

would try really hard to search for an instance where `s`

is `"nonexistentField"`

, and of course it would fail. This will result in an error message about not being able to deduce such a `Member`

instance.

But in this case, we can catch this error early and report a much better error message in terms of unrecognized label! If we can’t find `s`

in the application of `Lookup`

, then the field label does not exist. So we use the `TypeError`

type family exported from `GHC.TypeLits`

to report a customized type error. This will be much more informative for the users of our extensible records.

Testing out in `ghci`

indeed produces a reasonable error message:

```
*Main> select @"nonexistentField" extensibleRow
<interactive>:32:1: error:
• Cannot find the label: nonexistentField
```

## Conclusion

So there we have it, a small exercise of type-safe heterogeneous lists in Haskell. Although we have used `HList`

as the carrier of our discussion throughout the post, I hope I explained what the required language extensions enable. In my opinion, language extensions are what makes Haskell both amazing and confusing. It gives programmers a cutting edge playground with the latest theoretical enhancements, but it can be daunting to see a dozen unfamiliar extensions in the first lines of a Haskell file. This post is by no means comprehensive, as I have only covered the most important extensions I am familiar with so far, and I hope this helps you find the right references when you encounter these language extensions in your own Haskell adventures.