Posted on May 9, 2018 by hengchu

Taming Heterogeneous Lists in Haskell

The list data type is perhaps the most frequently used functional data structure in Haskell. Heterogeneous lists are used less frequently compared to the usual lists, simply because we don’t need a list of values from different types all that often. However, they do show up from time to time. In particular, heterogeneous lists are a very natural model of rows from a database table. For example, if I had a database table of my spending history:

Item Quantity Unit Price Purchase Time
Apple 10 0.5 2018/1/1
Coffee 2 3 2018/1/2
Lunch 1 6 2018/1/2
Text Book 1 100 2018/1/2

I can model each row of this database by a 4-element heterogeneous list that contains the item name, the quantity, the unit price and the purchase time.

Although heterogeneous lists are flexible in the types of values they contain, it is not immediately clear how to define useful higher-order functions such as map and foldr over them. The type of map :: (a -> b) -> [a] -> [b] does not make sense with heterogeneous lists, since each element in a heterogeneous list can have a completely different type. How can we solve this problem? Is it possible to define higher-order functions for heterogeneous lists just as we would for ordinary lists?

I will explore how to construct such a solution from the ground up in this post. Many ideas in this post are well-established results from the Strongly Typed Heterogeneous Collections paper. But I want to present the important ingredients assuming only experience with vanilla Haskell. Hacking at the type level requires us to enable many language extensions from ghc. When I first encountered these extensions, I had no idea what they entailed and what combinations of them allow me to do what I want. As we walk through defining and writing functions over heterogeneous lists, I will pay close attention to the important language extensions that enable our solution. I hope this post would serve as a guide on basic type level programming to my past self, and also to anyone else who interested in tapping into the power of the Haskell type system.


Defining Heterogeneous List data type

To define the HList data type, one of the first features we need to enable is GADTs, which stands for generalized algebraic data types. To enable this extension, we use the following pragma in our Haskell source code

{-# LANGUAGE GADTs #-}

We have all heard of algebraic data types since those are the default user-definable data types. GADTs give us the ability to provide a more precise type signature for each constructor under definition. One of the canonical use cases of GADTs is typed abstract syntax trees. To present the power of GADTs, let’s take a little detour and assume we are building the abstract syntax trees for a simple expression language that operates over strings and integers. We might define its syntax with the following declaration:

data Expr = SLit String
          | SAppend Expr Expr
          | ILit Int
          | IAdd Expr Expr

The four constructors represents string literals, string append operation, integer literals and integer addition operation respectively. We could then implement an interpreter for this language:

interp :: Expr -> Either String Int
interp (SLit s)        = Left s
interp (SAppend s1 s2) =
  case (interp s1, interp s2) of
    (Left s1', Left s2') -> Left $ s1' ++ s2'
    _                    -> error "Type mismatch!"
interp (ILit i)        = Right i
interp (IAdd i1 i2)    =
  case (interp i1, interp i2) of
    (Right i1', Right i2') -> Right $ i1' + i2'
    _                      -> error "Type mismatch!"

This works as expected:

*Main> interp (ILit 10 `IAdd` ILit 20)
Right 30
*Main> interp (SLit "foo" `SAppend ` SLit "bar")
Left "foobar"

However, there are two glaring uses of error in the definition of interp! There are many terms in in the Expr type that encode computations that does not make sense in this little language, such as appending two integers, or adding an integer and a string.

Here is where GADTs shine. We can index the type Expr by the sort of values its computation produces, and since GADTs allows us to give type signatures for each constructor, we can restrict the constructors for append and addition to only take string-producing and integer-producing computations. Putting this into code, we have a new definition for Expr.

data Expr a where
  SLit    :: String -> Expr String
  SAppend :: Expr String -> Expr String -> Expr String
  ILit    :: Int -> Expr Int
  IAdd    :: Expr Int -> Expr Int -> Expr Int

Notice that each constructor explicitly mentions its sort of values. Now, we can define an interpreter in the usual way:

interp :: Expr a -> a
interp (SLit s)        = s
interp (SAppend s1 s2) = interp s1 ++ interp s2
interp (ILit i)        = i
interp (IAdd i1 i2)    = interp i1 + interp i2

This interp is better than the old interp in several ways:

  1. It gets rid of error;
  2. Its type signature is much simpler: instead of having Either String Int, we just have a;
  3. Its definition is also much simpler: we don’t need to pattern match on the result of recursive calls to interp.

All of these improvements come from the type signatures we added to the constructors for Expr: we have encoded the “correctness” of an Expr term into its constructors, and ruled out all of the invalid ones with GADTs.

This can be verified by trying to construct an invalid term in ghci:

*Main> SLit "foo" `IAdd` SLit "bar"

<interactive>:10:1: error:
    • Couldn't match type ‘[Char]’ with ‘Int’
      Expected type: Expr Int
        Actual type: Expr String
    • In the first argument of ‘IAdd’, namely ‘SLit "foo"’
      In the expression: SLit "foo" `IAdd` SLit "bar"
      In an equation for ‘it’: it = SLit "foo" `IAdd` SLit "bar"

<interactive>:10:19: error:
    • Couldn't match type ‘[Char]’ with ‘Int’
      Expected type: Expr Int
        Actual type: Expr String
    • In the second argument of ‘IAdd’, namely ‘SLit "bar"’
      In the expression: SLit "foo" `IAdd` SLit "bar"
      In an equation for ‘it’: it = SLit "foo" `IAdd` SLit "bar"

Here, ghci is telling us that it expects the two operands of IAdd to have type Expr Int, but we gave it two Expr String instead. This is what we expect, thanks to the type signature of IAdd.

So how does all of this help us defining a heterogeneous list? Just like we indexed Expr with the sort of computations it produces, we need to index a heterogeneous list with a list of the types it contains. We are used to lists at the value level in Haskell, there is actually also support for type-level lists in GHC as well. A type-level list is a list that contains types instead of values. For example, '[Int, Float, Bool] is a type-level list. In fact, a large family of types enjoy automatic promotion to the type level through a general mechanism called DataKinds. To enable this mechanism, we use the pragma

{-# LANGUAGE DataKinds #-}

A kind can be though of as the “type of a type”. All of our familiar types such as Int, Bool, String, Float and any other value-containing type has the kind *, pronounced star. The arrow type constructor for functions -> has kind * -> * -> *, which means it takes two value-containing types, and produces another value-containing type. But there can be kinds other than * and combinations of * and arrows. By enabling DataKinds, all the promotable types become kinds in the type system, and the constructors of the type become type constructors in the promoted kind. DataKinds came from the paper Giving Haskell a Promotion, and it’s a delightful read for those interested in the theoretical details behind DataKinds.

With GADTs and DataKinds, we are ready to define the heterogeneous list data type, which we will call HList.

data HList :: [*] -> * where
  HNil  :: HList '[]
  HCons :: x -> HList xs -> HList (x ': xs)

The two constructors correspond to the empty list and a cons cell respectively. This is just like the definition of the usual list type. The indices to HList is what makes it special. In the previous example for GADTs with Expr, the index to Expr is a value-containing types of kind *. However, the index for HList has the kind [*], a type-level list of value-containing types.

In the HNil constructor, the index is an empty type-level list written '[]. The tick in the front is used to distinguish the value level empty list from the type-level empty list.

In the HCons case, the arguments to the constructor is a value of type x and another HList whose values have types in the type-level list xs. Naturally, this should construct an HList whose values have types in x ': xs.

To make the notations easier to work with, let’s introduce an infix operator instead of writing HCons everywhere:

infixr 5 %:
(%:) :: x -> HList xs -> HList (x ': xs)
(%:) = HCons

We can now build a heterogeneous list that represents a row from the table shown earlier.

*Main> let row = "Apple" %: 10 %: 0.5 %: "2018/1/1" %: HNil
*Main> :t row
row :: (Num x1, Fractional x) => HList '[[Char], x1, x, [Char]]

Since we did not specify the type for the literals 10 and 0.5 explicitly, ghci tried to infer the most general type for us.

Unfortunately, we don’t have a Show instance for HLists yet, and ghci refuses to print row for us for that reason. Let’s fix this next.


Show and Eq for HList, inductively

We may have seen proofs by induction in math classes before: to prove a property P for all natural numbers, we first prove P for 0, and then assume P holds for n as an inductive hypothesis followed by a proof of P for the successor n+1.

Proof by induction in fact applies to many things other than natural numbers. In particular, it applies to our HList type. The general idea is that in order to prove some property P for all values of a particular type, we just need to prove P for all possible ways to construct values in this type assuming the appropriate inductive hypothesis. For HLists, the HNil and HCons constructors are all the possible ways of construction.

Although we are not proving anything for HLists here, we can consider defining Show instances as “proving” to the compiler that we can Show heterogeneous lists.

Following this intuition, we need to define Show instances separately for the empty case and cons case. This is easy for the empty case:

instance Show (HList '[]) where
  show HNil = "HNil"

For the non-empty case, we might try something like this:

instance Show (HList (x ': xs)) where
  show (HCons x xs) = ...

But we are stuck here. We would like to recursively call show on x and xs, but since we don’t have a Show instance for x and the tail of the list xs. This suggests we need to add some constraints to the instance head here:

instance (Show x, Show (HList xs)) => Show (HList (x ': xs)) where
  show (HCons x xs) = show x ++ " %: " ++ show xs

Notice how the constraint Show (HList xs) is just like an inductive hypothesis here! Now, ghci will happily show the row value we defined just now:

*Main> row
"Apple" %: 10 %: 0.5 %: "2018/1/1" %: HNil

There are two language extensions that we needed to enable here to have the compiler accept these two instance definitions:

  1. {-# LANGUAGE FlexibleInstances #-}
  2. {-# LANGUAGE FlexibleContexts #-}

By default, instance heads must be of the form C (T a1 a2 ... an) where C is the typeclass name, T is a type constructor and each of ai are distinct type variables. In our Show instances, HList is the T here, but '[] and x ': xs are not type variables.

Similarly, by default the constraints for instances must have the form C x where x is a type variable.

These two restrictions are in place to help ensure termination of instance resolution, which is undecidable in the most general case. By enabling the two Flexible extensions, we relax these two restrictions.

We can define an Eq instance for HList following the same principle:

instance Eq (HList '[]) where
  HNil == HNil = True
instance (Eq x, Eq (HList xs)) => Eq (HList (x ': xs)) where
  (HCons x xs) == (HCons y ys) = x == y && xs == ys

Higher-order Functions

Just having HList values is not interesting if we can’t perform the usual list operations on them. If we can implement map and foldr over HLists, then we can implement many other operations based on map and foldr.

Since the usual type for map does not apply to HLists, what can we do? Intuitively, we need to supply map over HList with a polymorphic function that accepts values from any type in the heterogeneous list as an argument. But we don’t have any constraints over what types can appear in an HList from its construction, and any function that’s polymorphic enough to operate over values of any type is likely not interesting enough for general computations over HLists.

To overcome this dilemma, we can apply the idea of de-functionalization. De-functionalization means using opaque symbols to represent the function being used in some environment, thus adding an extra level of indirection. This idea can be traced back to the seminal paper Definitional interpreters for higher-order programming languages by John Reynolds.

To implement this idea, we first introduce a new typeclass called Apply:

class Apply f a b where
  apply :: f -> a -> b

Here, the type variable f is the symbol representing a function, and the type variable a is the type of the argument, and the type variable b is the type of the result from applying f to a. This typeclass is different from the other ones we have seen so far, it contains 3 type variables instead of just a single type variable. Indeed, the language extension

{-# LANGUAGE MultiParamTypeClasses #-}

allows us to create these typeclass declarations. As its name suggests, it enables the definition of typeclasses with multiple parameters.

The idea behind Apply is that we will ask the typechecker to search for an Apply instance at each application of the function mapping over an HList, and similarly for folds as well. This suggests map and fold should be functions of a typeclass instead of standalone functions for HLists so we can use Apply as a constraint. Let’s define these two typeclasses:

class MapH f xs ys where
  mapH :: f -> HList xs -> HList ys
class FoldrH f acc xs where
  foldrH :: f -> acc -> HList xs -> acc

For MapH, we have a function symbol f which will be applied to each element in the list HList xs, producing another list HList ys. For FoldrH, we have a function symbol f which will be used to combine an accumulator with each element of the list HList xs, and returns the final value of the accumulator.

The instance of these two typeclasses are easy to define for empty lists:

instance MapH f '[] '[] where
  mapH _ _ = HNil
instance FoldrH f acc '[] where
  foldrH _ acc _ = acc

For non-empty lists, we need to use the Apply typeclass defined earlier as a constraint, so we know how to apply f to each element:

instance (Apply f x y, MapH f xs ys)
  => MapH f (x ': xs) (y ': ys) where
  mapH f (HCons x xs) = apply f x %: mapH f xs

instance (Apply f x (acc -> acc), FoldrH f acc xs)
  => FoldrH f acc (x ': xs) where
  foldrH f acc (HCons x xs) = apply f x $ foldrH f acc xs

Let’s test these functions out by implementing a better pretty printer for HLists. We may want the same syntax [x, y, ..., z] as for ordinary lists when printing HLists, instead of printing each cons with the symbol %:. First, we need a function symbol for converting things to String through mapH:

data Pretty = Pretty
instance Show x => Apply Pretty x String where
  apply _ x = show x

Next, we can use the foldrH function to combine the outputs from mapH. We need another symbol for combining these strings into the list format:

data Concat = Concat
instance Apply Concat String (String -> String) where
  apply _ x = \y -> x ++ ", " ++ y

Now, pretty printing a list is just first mapping Pretty over the list, and then folding the result with Concat.

prettyHList xs = "[" ++ (foldrH Concat "" $ mapH Pretty xs) ++ "]"

However, although this is what we would like to do operationally, the compiler needs a little more guidance on the intermediate result of mapH Pretty xs since it uses the type of that result to select an instance for MapH. In particular, we need to tell the compiler that the intermediate result is an HList whose type index is a list of String types.

Here, we need another language extension called TypeFamilies. Type families are functions at the type-level: functions that take types and return another type. We can enable it with the pragma:

{-# LANGUAGE TypeFamilies #-}

Let’s define a type family called ConstMap that takes a list of types, and returns another list of the same length, but replaces each type in the original list by some given type.

type family ConstMap (t :: *) (xs :: [*]) :: [*] where
  ConstMap _      '[]  = '[]
  ConstMap t (x ': xs) = t ': (ConstMap t xs)

Now, we can define prettyHList, with a type signature that instructs the compiler which instance of mapH it should select.

prettyHList :: forall xs.
               ( MapH   Pretty xs     (ConstMap String xs)
               , FoldrH Concat String (ConstMap String xs)
               ) => HList xs -> String
prettyHList hlist =
  "[" ++ (foldrH Concat "" $ mapH @_ @_ @(ConstMap String xs) Pretty hlist) ++ "]"

In the definition of prettyHList, we used two other language extensions:

  1. {-# LANGUAGE ScopedTypeVariables #-}
  2. {-# LANGUAGE TypeApplications #-}

The first extension brings the type variables in the signature into scope for the definition of prettyHList, as long as we use an explicit forall to declare them.

The second extension allows us to explicitly unify type variables with a type of our choice. The types of our choice are prefixed by the @ sign. We can also supply an underscore as a wildcard type, this would still allow the compiler to infer the type instead of unifying with a specific choice. This is useful when the type variable we would like to unify does not appear first, as is the case in mapH @_ @_ @(ConstMap String xs): we would like to unify the ys type variable, but it appears last for mapH.

Now, let’s define a better show instance with prettyHList:

instance ( MapH   Pretty xs     (ConstMap String xs)
         , FoldrH Concat String (ConstMap String xs)
         ) => Show (HList xs) where
  show = prettyHList

Now, HLists are rendered in ghci just like regular lists:

*Main> row
["Apple", 10, 0.5, "2018/1/1", ]

A final remark is that in fact we need an extension called UndecidableInstaces for the Apply, MapH and FoldrH instances defined earlier. This is because the constraints for these instances are “larger” than the instance themselves, which can potentially lead to an infinite loop during instance resolution. By enabling this extension, we are telling the compiler to rest assured that instance resolution will terminate.


Overcoming overlapping instances

In the final section, I would like to write about some tricks we can use to deal with overlapping instances in Haskell. To motivate the problem of overlapping instances, let’s consider the extensible record problem.

Extensible records are maps from labels to values of a certain type, and they are extensible in the sense that we can extend any record with another label-value pair.

Of course, we can implement such a data structure with Data.Dynamic and Data.Map, using labels as keys to the map, and convert everything to a Dynamic for storage in the map. But ideally, we want a solution that performs less runtime checks.

Recall earlier that Haskell has many different kinds of types, such as * for value-containing types, [*] for type-level lists, etc. Haskell also has a special kind called Symbol from the module GHC.TypeLits, which is the kind for type-level strings.

We can use type-level strings from the Symbol kind as labels for extensible records with a key-value pair type like this:

data KV :: Symbol -> * -> * where
  V :: a -> KV s a

The KV type has two type indices: a key s and a type a, with a being the type of value it contains.

Now, an extensible record is simply an HList of these KV values, and it is perfectly extensible since we can just HCons another KV to the list :)

extensibleRow :: HList '[KV "item" String, KV "quantity" Int, KV "price" Float, KV "date" String]
extensibleRow = V "Apple" %: V 10 %: V 0.5 %: V "2018/1/1" %: HNil

Here, extensibleRow contains the same values as the row variable defined above, but also contains the labels of the original table.

As with any record, we would like to select a certain field from it. We can define a select function using another typeclass.

class Member (s :: Symbol) xs t where
  select :: HList xs -> t

Here, an instance of the typeclass Member s xs t says s should only hold if s is a label in xs, and it maps to a value of type t. Using the select function from this typeclass on such an HList will return the value with type t.

Now, we should define two general instances of Member, with the first one corresponding to the case where the first label in the list is the one we are looking for, and the second instance being the recursive case. This is just how we would write a lookup function at the value level. We use T.== from the Data.Type.Equality module for testing equality of symbols at the type level. This module is qualified as T to avoid naming conflicts with other top-level names defined in this post.

instance Member s (KV s t ': tail) t where
  select (HCons (V t) _) = t
instance ((s T.== s') ~ 'False, Member s tail t) => Member s (KV s' t' ': tail) t where
  select (HCons _ xs) = select @s @tail @t xs

This passes the typechecker just fine, until we try to use it. And we are hit with the overlapping instance error message from ghci:

• Overlapping instances for Member
                              "item"
                              '[KV "item" String,
                                KV "quantity" Int,
                                KV "price" Float,
                                KV "date" String]
                              String

How could these two instances overlap each other? In one case, the first symbol in the list is equal to the one we are looking for. In the other case, the first symbol is different from the one we are looking for. These are two logically separate scenarios!

The answer lies in an important property of the instance resolution process for Haskell: ghc does not check the instance constraints while matching instance heads.

If we stand in ghc’s shoes for a second, and try to see things from its perspective, then the two instance heads for Member defined above are indeed overlapping!

  1. Member s (KV s t ': tail) t
  2. Member s (KV s' t' ': tail) t

Namely, the second instance head is strictly more general than the first instance head.

So… I was thinking too “functionally” instead of “logically” when programming the instances for Member. I can perhaps be a better logical programmer and write two non-overlapping instances to solve this problem. But, I would much rather change the problem itself and keep the same “functional” ideas behind these two instances :)

Recall type families, which are functions at the type level. Type families work more or less as expected when interacting with instance resolution: type families will reduce as much as they can before instance resolution happens. We can exploit behavior, and make these two instances non-overlapping by adding another parameter to the typeclass Member.

class Member (s :: Symbol) xs t (isHead :: Bool) where
  select' :: HList xs -> t

Here, the parameter isHead is a type-level boolean that states whether s is the first label in the list. To compute such a boolean, let’s define a type family:

type family IsHead (s :: Symbol) (xs :: [*]) :: Bool where
  IsHead s (KV s' _ ': _) = s T.== s'
  IsHead s _              = 'False

Now, we are ready to define the two instances!

instance Member s (KV s t ': tail) t 'True where
  select' (HCons (V t) _) = t
instance Member s tail t (IsHead s tail)
  => Member s (KV s' t' ': tail) t 'False where
  select' (HCons _ xs) = select' @s @tail @t @(IsHead s tail) xs

These two instances are definitely not overlapping because their isHead parameters are different. Notice that we also used the type family IsHead in the constraint to the second instance definition in order to search for s recursively down the tail.

However, select' is not very usable because there are many ambiguous type variables when we apply select' to a record, such as t and isHead, but t and isHead can both be computed before instance resolution. The latter is computed by IsHead, so we just need to define a type family Lookup that computes t.

type family Lookup (s :: Symbol) (xs :: [*]) :: * where
  Lookup s '[] = TypeError (('Text "Cannot find the label: ") ':<>: ('Text s))
  Lookup s (KV s' t ': tail) = If (s T.== s') t (Lookup s tail)
  Lookup s (badType ': tail) =
    TypeError (('Text "Expecting a KV in the type list, instead have: ")
               ':<>:
               (ShowType badType))

Now, we can finally define a user-facing select function as the following:

select :: forall s xs. Member s xs (Lookup s xs) (IsHead s xs) => HList xs -> Lookup s xs select = select’ @s @xs @(Lookup s xs) @(IsHead s xs)

And it works just as expected!

*Main> select @"item" extensibleRow
"Apple"
*Main> select @"price" extensibleRow
0.5

You may be wondering what those TypeError calls are in the definition of Lookup. Using type-level features can make ghc output some incredibly obscure error messages difficult for a user to digest; in our case, if we tried to select @"nonexistentField" extensibleRow, ghc would try really hard to search for an instance where s is "nonexistentField", and of course it would fail. This will result in an error message about not being able to deduce such a Member instance.

But in this case, we can catch this error early and report a much better error message in terms of unrecognized label! If we can’t find s in the application of Lookup, then the field label does not exist. So we use the TypeError type family exported from GHC.TypeLits to report a customized type error. This will be much more informative for the users of our extensible records.

Testing out in ghci indeed produces a reasonable error message:

*Main> select @"nonexistentField" extensibleRow
<interactive>:32:1: error:
    • Cannot find the label: nonexistentField

Conclusion

So there we have it, a small exercise of type-safe heterogeneous lists in Haskell. Although we have used HList as the carrier of our discussion throughout the post, I hope I explained what the required language extensions enable. In my opinion, language extensions are what makes Haskell both amazing and confusing. It gives programmers a cutting edge playground with the latest theoretical enhancements, but it can be daunting to see a dozen unfamiliar extensions in the first lines of a Haskell file. This post is by no means comprehensive, as I have only covered the most important extensions I am familiar with so far, and I hope this helps you find the right references when you encounter these language extensions in your own Haskell adventures.