Taming Heterogeneous Lists in Haskell
The list data type is perhaps the most frequently used functional data structure in Haskell. Heterogeneous lists are used less frequently compared to the usual lists, simply because we don’t need a list of values from different types all that often. However, they do show up from time to time. In particular, heterogeneous lists are a very natural model of rows from a database table. For example, if I had a database table of my spending history:
Item | Quantity | Unit Price | Purchase Time |
---|---|---|---|
Apple | 10 | 0.5 | 2018/1/1 |
Coffee | 2 | 3 | 2018/1/2 |
Lunch | 1 | 6 | 2018/1/2 |
Text Book | 1 | 100 | 2018/1/2 |
I can model each row of this database by a 4-element heterogeneous list that contains the item name, the quantity, the unit price and the purchase time.
Although heterogeneous lists are flexible in the types of values they contain, it is not immediately clear how to define useful higher-order functions such as map
and foldr
over them. The type of map :: (a -> b) -> [a] -> [b]
does not make sense with heterogeneous lists, since each element in a heterogeneous list can have a completely different type. How can we solve this problem? Is it possible to define higher-order functions for heterogeneous lists just as we would for ordinary lists?
I will explore how to construct such a solution from the ground up in this post. Many ideas in this post are well-established results from the Strongly Typed Heterogeneous Collections paper. But I want to present the important ingredients assuming only experience with vanilla Haskell. Hacking at the type level requires us to enable many language extensions from ghc
. When I first encountered these extensions, I had no idea what they entailed and what combinations of them allow me to do what I want. As we walk through defining and writing functions over heterogeneous lists, I will pay close attention to the important language extensions that enable our solution. I hope this post would serve as a guide on basic type level programming to my past self, and also to anyone else who interested in tapping into the power of the Haskell type system.
Defining Heterogeneous List data type
To define the HList
data type, one of the first features we need to enable is GADTs
, which stands for generalized algebraic data types. To enable this extension, we use the following pragma in our Haskell source code
{-# LANGUAGE GADTs #-}
We have all heard of algebraic data types since those are the default user-definable data types. GADTs give us the ability to provide a more precise type signature for each constructor under definition. One of the canonical use cases of GADTs is typed abstract syntax trees. To present the power of GADTs, let’s take a little detour and assume we are building the abstract syntax trees for a simple expression language that operates over strings and integers. We might define its syntax with the following declaration:
data Expr = SLit String
| SAppend Expr Expr
| ILit Int
| IAdd Expr Expr
The four constructors represents string literals, string append operation, integer literals and integer addition operation respectively. We could then implement an interpreter for this language:
interp :: Expr -> Either String Int
interp (SLit s) = Left s
interp (SAppend s1 s2) =
case (interp s1, interp s2) of
(Left s1', Left s2') -> Left $ s1' ++ s2'
_ -> error "Type mismatch!"
interp (ILit i) = Right i
interp (IAdd i1 i2) =
case (interp i1, interp i2) of
(Right i1', Right i2') -> Right $ i1' + i2'
_ -> error "Type mismatch!"
This works as expected:
*Main> interp (ILit 10 `IAdd` ILit 20)
Right 30
*Main> interp (SLit "foo" `SAppend ` SLit "bar")
Left "foobar"
However, there are two glaring uses of error
in the definition of interp
! There are many terms in in the Expr
type that encode computations that does not make sense in this little language, such as appending two integers, or adding an integer and a string.
Here is where GADTs shine. We can index the type Expr
by the sort of values its computation produces, and since GADTs allows us to give type signatures for each constructor, we can restrict the constructors for append and addition to only take string-producing and integer-producing computations. Putting this into code, we have a new definition for Expr
.
data Expr a where
SLit :: String -> Expr String
SAppend :: Expr String -> Expr String -> Expr String
ILit :: Int -> Expr Int
IAdd :: Expr Int -> Expr Int -> Expr Int
Notice that each constructor explicitly mentions its sort of values. Now, we can define an interpreter in the usual way:
interp :: Expr a -> a
interp (SLit s) = s
interp (SAppend s1 s2) = interp s1 ++ interp s2
interp (ILit i) = i
interp (IAdd i1 i2) = interp i1 + interp i2
This interp
is better than the old interp
in several ways:
- It gets rid of
error
; - Its type signature is much simpler: instead of having
Either String Int
, we just havea
; - Its definition is also much simpler: we don’t need to pattern match on the result of recursive calls to
interp
.
All of these improvements come from the type signatures we added to the constructors for Expr
: we have encoded the “correctness” of an Expr
term into its constructors, and ruled out all of the invalid ones with GADTs.
This can be verified by trying to construct an invalid term in ghci
:
*Main> SLit "foo" `IAdd` SLit "bar"
<interactive>:10:1: error:
• Couldn't match type ‘[Char]’ with ‘Int’
Expected type: Expr Int
Actual type: Expr String
• In the first argument of ‘IAdd’, namely ‘SLit "foo"’
In the expression: SLit "foo" `IAdd` SLit "bar"
In an equation for ‘it’: it = SLit "foo" `IAdd` SLit "bar"
<interactive>:10:19: error:
• Couldn't match type ‘[Char]’ with ‘Int’
Expected type: Expr Int
Actual type: Expr String
• In the second argument of ‘IAdd’, namely ‘SLit "bar"’
In the expression: SLit "foo" `IAdd` SLit "bar"
In an equation for ‘it’: it = SLit "foo" `IAdd` SLit "bar"
Here, ghci
is telling us that it expects the two operands of IAdd
to have type Expr Int
, but we gave it two Expr String
instead. This is what we expect, thanks to the type signature of IAdd
.
So how does all of this help us defining a heterogeneous list? Just like we indexed Expr
with the sort of computations it produces, we need to index a heterogeneous list with a list of the types it contains. We are used to lists at the value level in Haskell, there is actually also support for type-level lists in GHC as well. A type-level list is a list that contains types instead of values. For example, '[Int, Float, Bool]
is a type-level list. In fact, a large family of types enjoy automatic promotion to the type level through a general mechanism called DataKinds
. To enable this mechanism, we use the pragma
{-# LANGUAGE DataKinds #-}
A kind can be though of as the “type of a type”. All of our familiar types such as Int
, Bool
, String
, Float
and any other value-containing type has the kind *
, pronounced star. The arrow type constructor for functions ->
has kind * -> * -> *
, which means it takes two value-containing types, and produces another value-containing type. But there can be kinds other than *
and combinations of *
and arrows. By enabling DataKinds
, all the promotable types become kinds in the type system, and the constructors of the type become type constructors in the promoted kind. DataKinds
came from the paper Giving Haskell a Promotion, and it’s a delightful read for those interested in the theoretical details behind DataKinds
.
With GADTs and DataKinds, we are ready to define the heterogeneous list data type, which we will call HList
.
data HList :: [*] -> * where
HNil :: HList '[]
HCons :: x -> HList xs -> HList (x ': xs)
The two constructors correspond to the empty list and a cons cell respectively. This is just like the definition of the usual list type. The indices to HList
is what makes it special. In the previous example for GADTs with Expr
, the index to Expr
is a value-containing types of kind *
. However, the index for HList
has the kind [*]
, a type-level list of value-containing types.
In the HNil
constructor, the index is an empty type-level list written '[]
. The tick in the front is used to distinguish the value level empty list from the type-level empty list.
In the HCons
case, the arguments to the constructor is a value of type x
and another HList
whose values have types in the type-level list xs
. Naturally, this should construct an HList
whose values have types in x ': xs
.
To make the notations easier to work with, let’s introduce an infix operator instead of writing HCons
everywhere:
infixr 5 %:
(%:) :: x -> HList xs -> HList (x ': xs)
(%:) = HCons
We can now build a heterogeneous list that represents a row from the table shown earlier.
*Main> let row = "Apple" %: 10 %: 0.5 %: "2018/1/1" %: HNil
*Main> :t row
row :: (Num x1, Fractional x) => HList '[[Char], x1, x, [Char]]
Since we did not specify the type for the literals 10
and 0.5
explicitly, ghci
tried to infer the most general type for us.
Unfortunately, we don’t have a Show
instance for HList
s yet, and ghci
refuses to print row
for us for that reason. Let’s fix this next.
Show
and Eq
for HList
, inductively
We may have seen proofs by induction in math classes before: to prove a property P for all natural numbers, we first prove P for 0, and then assume P holds for n as an inductive hypothesis followed by a proof of P for the successor n+1.
Proof by induction in fact applies to many things other than natural numbers. In particular, it applies to our HList
type. The general idea is that in order to prove some property P for all values of a particular type, we just need to prove P for all possible ways to construct values in this type assuming the appropriate inductive hypothesis. For HList
s, the HNil
and HCons
constructors are all the possible ways of construction.
Although we are not proving anything for HList
s here, we can consider defining Show
instances as “proving” to the compiler that we can Show
heterogeneous lists.
Following this intuition, we need to define Show
instances separately for the empty case and cons case. This is easy for the empty case:
instance Show (HList '[]) where
show HNil = "HNil"
For the non-empty case, we might try something like this:
instance Show (HList (x ': xs)) where
show (HCons x xs) = ...
But we are stuck here. We would like to recursively call show
on x
and xs
, but since we don’t have a Show
instance for x
and the tail of the list xs
. This suggests we need to add some constraints to the instance head here:
instance (Show x, Show (HList xs)) => Show (HList (x ': xs)) where
show (HCons x xs) = show x ++ " %: " ++ show xs
Notice how the constraint Show (HList xs)
is just like an inductive hypothesis here! Now, ghci
will happily show
the row
value we defined just now:
*Main> row
"Apple" %: 10 %: 0.5 %: "2018/1/1" %: HNil
There are two language extensions that we needed to enable here to have the compiler accept these two instance definitions:
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
By default, instance heads must be of the form C (T a1 a2 ... an)
where C
is the typeclass name, T
is a type constructor and each of ai
are distinct type variables. In our Show
instances, HList
is the T
here, but '[]
and x ': xs
are not type variables.
Similarly, by default the constraints for instances must have the form C x
where x
is a type variable.
These two restrictions are in place to help ensure termination of instance resolution, which is undecidable in the most general case. By enabling the two Flexible
extensions, we relax these two restrictions.
We can define an Eq
instance for HList
following the same principle:
instance Eq (HList '[]) where
HNil == HNil = True
instance (Eq x, Eq (HList xs)) => Eq (HList (x ': xs)) where
(HCons x xs) == (HCons y ys) = x == y && xs == ys
Higher-order Functions
Just having HList
values is not interesting if we can’t perform the usual list operations on them. If we can implement map
and foldr
over HList
s, then we can implement many other operations based on map
and foldr
.
Since the usual type for map
does not apply to HList
s, what can we do? Intuitively, we need to supply map
over HList
with a polymorphic function that accepts values from any type in the heterogeneous list as an argument. But we don’t have any constraints over what types can appear in an HList
from its construction, and any function that’s polymorphic enough to operate over values of any type is likely not interesting enough for general computations over HList
s.
To overcome this dilemma, we can apply the idea of de-functionalization. De-functionalization means using opaque symbols to represent the function being used in some environment, thus adding an extra level of indirection. This idea can be traced back to the seminal paper Definitional interpreters for higher-order programming languages by John Reynolds.
To implement this idea, we first introduce a new typeclass called Apply
:
class Apply f a b where
apply :: f -> a -> b
Here, the type variable f
is the symbol representing a function, and the type variable a
is the type of the argument, and the type variable b
is the type of the result from applying f
to a
. This typeclass is different from the other ones we have seen so far, it contains 3 type variables instead of just a single type variable. Indeed, the language extension
{-# LANGUAGE MultiParamTypeClasses #-}
allows us to create these typeclass declarations. As its name suggests, it enables the definition of typeclasses with multiple parameters.
The idea behind Apply
is that we will ask the typechecker to search for an Apply
instance at each application of the function mapping over an HList
, and similarly for folds as well. This suggests map and fold should be functions of a typeclass instead of standalone functions for HList
s so we can use Apply
as a constraint. Let’s define these two typeclasses:
class MapH f xs ys where
mapH :: f -> HList xs -> HList ys
class FoldrH f acc xs where
foldrH :: f -> acc -> HList xs -> acc
For MapH
, we have a function symbol f
which will be applied to each element in the list HList xs
, producing another list HList ys
. For FoldrH
, we have a function symbol f
which will be used to combine an accumulator with each element of the list HList xs
, and returns the final value of the accumulator.
The instance of these two typeclasses are easy to define for empty lists:
instance MapH f '[] '[] where
mapH _ _ = HNil
instance FoldrH f acc '[] where
foldrH _ acc _ = acc
For non-empty lists, we need to use the Apply
typeclass defined earlier as a constraint, so we know how to apply f
to each element:
instance (Apply f x y, MapH f xs ys)
=> MapH f (x ': xs) (y ': ys) where
mapH f (HCons x xs) = apply f x %: mapH f xs
instance (Apply f x (acc -> acc), FoldrH f acc xs)
=> FoldrH f acc (x ': xs) where
foldrH f acc (HCons x xs) = apply f x $ foldrH f acc xs
Let’s test these functions out by implementing a better pretty printer for HList
s. We may want the same syntax [x, y, ..., z]
as for ordinary lists when printing HLists
, instead of printing each cons with the symbol %:
. First, we need a function symbol for converting things to String
through mapH
:
data Pretty = Pretty
instance Show x => Apply Pretty x String where
apply _ x = show x
Next, we can use the foldrH
function to combine the outputs from mapH
. We need another symbol for combining these strings into the list format:
data Concat = Concat
instance Apply Concat String (String -> String) where
apply _ x = \y -> x ++ ", " ++ y
Now, pretty printing a list is just first mapping Pretty
over the list, and then folding the result with Concat
.
prettyHList xs = "[" ++ (foldrH Concat "" $ mapH Pretty xs) ++ "]"
However, although this is what we would like to do operationally, the compiler needs a little more guidance on the intermediate result of mapH Pretty xs
since it uses the type of that result to select an instance for MapH
. In particular, we need to tell the compiler that the intermediate result is an HList
whose type index is a list of String
types.
Here, we need another language extension called TypeFamilies
. Type families are functions at the type-level: functions that take types and return another type. We can enable it with the pragma:
{-# LANGUAGE TypeFamilies #-}
Let’s define a type family called ConstMap
that takes a list of types, and returns another list of the same length, but replaces each type in the original list by some given type.
type family ConstMap (t :: *) (xs :: [*]) :: [*] where
ConstMap _ '[] = '[]
ConstMap t (x ': xs) = t ': (ConstMap t xs)
Now, we can define prettyHList
, with a type signature that instructs the compiler which instance of mapH
it should select.
prettyHList :: forall xs.
( MapH Pretty xs (ConstMap String xs)
, FoldrH Concat String (ConstMap String xs)
) => HList xs -> String
prettyHList hlist =
"[" ++ (foldrH Concat "" $ mapH @_ @_ @(ConstMap String xs) Pretty hlist) ++ "]"
In the definition of prettyHList
, we used two other language extensions:
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
The first extension brings the type variables in the signature into scope for the definition of prettyHList
, as long as we use an explicit forall
to declare them.
The second extension allows us to explicitly unify type variables with a type of our choice. The types of our choice are prefixed by the @
sign. We can also supply an underscore as a wildcard type, this would still allow the compiler to infer the type instead of unifying with a specific choice. This is useful when the type variable we would like to unify does not appear first, as is the case in mapH @_ @_ @(ConstMap String xs)
: we would like to unify the ys
type variable, but it appears last for mapH
.
Now, let’s define a better show instance with prettyHList
:
instance ( MapH Pretty xs (ConstMap String xs)
, FoldrH Concat String (ConstMap String xs)
) => Show (HList xs) where
show = prettyHList
Now, HList
s are rendered in ghci just like regular lists:
*Main> row
["Apple", 10, 0.5, "2018/1/1", ]
A final remark is that in fact we need an extension called UndecidableInstaces
for the Apply
, MapH
and FoldrH
instances defined earlier. This is because the constraints for these instances are “larger” than the instance themselves, which can potentially lead to an infinite loop during instance resolution. By enabling this extension, we are telling the compiler to rest assured that instance resolution will terminate.
Overcoming overlapping instances
In the final section, I would like to write about some tricks we can use to deal with overlapping instances in Haskell. To motivate the problem of overlapping instances, let’s consider the extensible record problem.
Extensible records are maps from labels to values of a certain type, and they are extensible in the sense that we can extend any record with another label-value pair.
Of course, we can implement such a data structure with Data.Dynamic
and Data.Map
, using labels as keys to the map, and convert everything to a Dynamic
for storage in the map. But ideally, we want a solution that performs less runtime checks.
Recall earlier that Haskell has many different kinds of types, such as *
for value-containing types, [*]
for type-level lists, etc. Haskell also has a special kind called Symbol
from the module GHC.TypeLits
, which is the kind for type-level strings.
We can use type-level strings from the Symbol
kind as labels for extensible records with a key-value pair type like this:
data KV :: Symbol -> * -> * where
V :: a -> KV s a
The KV
type has two type indices: a key s
and a type a
, with a
being the type of value it contains.
Now, an extensible record is simply an HList
of these KV
values, and it is perfectly extensible since we can just HCons
another KV
to the list :)
extensibleRow :: HList '[KV "item" String, KV "quantity" Int, KV "price" Float, KV "date" String]
extensibleRow = V "Apple" %: V 10 %: V 0.5 %: V "2018/1/1" %: HNil
Here, extensibleRow
contains the same values as the row
variable defined above, but also contains the labels of the original table.
As with any record, we would like to select a certain field from it. We can define a select
function using another typeclass.
class Member (s :: Symbol) xs t where
select :: HList xs -> t
Here, an instance of the typeclass Member s xs t
says s
should only hold if s
is a label in xs
, and it maps to a value of type t
. Using the select
function from this typeclass on such an HList
will return the value with type t
.
Now, we should define two general instances of Member
, with the first one corresponding to the case where the first label in the list is the one we are looking for, and the second instance being the recursive case. This is just how we would write a lookup
function at the value level. We use T.==
from the Data.Type.Equality
module for testing equality of symbols at the type level. This module is qualified as T
to avoid naming conflicts with other top-level names defined in this post.
instance Member s (KV s t ': tail) t where
select (HCons (V t) _) = t
instance ((s T.== s') ~ 'False, Member s tail t) => Member s (KV s' t' ': tail) t where
select (HCons _ xs) = select @s @tail @t xs
This passes the typechecker just fine, until we try to use it. And we are hit with the overlapping instance error message from ghci
:
• Overlapping instances for Member
"item"
'[KV "item" String,
KV "quantity" Int,
KV "price" Float,
KV "date" String]
String
How could these two instances overlap each other? In one case, the first symbol in the list is equal to the one we are looking for. In the other case, the first symbol is different from the one we are looking for. These are two logically separate scenarios!
The answer lies in an important property of the instance resolution process for Haskell: ghc
does not check the instance constraints while matching instance heads.
If we stand in ghc
’s shoes for a second, and try to see things from its perspective, then the two instance heads for Member
defined above are indeed overlapping!
Member s (KV s t ': tail) t
Member s (KV s' t' ': tail) t
Namely, the second instance head is strictly more general than the first instance head.
So… I was thinking too “functionally” instead of “logically” when programming the instances for Member
. I can perhaps be a better logical programmer and write two non-overlapping instances to solve this problem. But, I would much rather change the problem itself and keep the same “functional” ideas behind these two instances :)
Recall type families, which are functions at the type level. Type families work more or less as expected when interacting with instance resolution: type families will reduce as much as they can before instance resolution happens. We can exploit behavior, and make these two instances non-overlapping by adding another parameter to the typeclass Member
.
class Member (s :: Symbol) xs t (isHead :: Bool) where
select' :: HList xs -> t
Here, the parameter isHead
is a type-level boolean that states whether s
is the first label in the list. To compute such a boolean, let’s define a type family:
type family IsHead (s :: Symbol) (xs :: [*]) :: Bool where
IsHead s (KV s' _ ': _) = s T.== s'
IsHead s _ = 'False
Now, we are ready to define the two instances!
instance Member s (KV s t ': tail) t 'True where
select' (HCons (V t) _) = t
instance Member s tail t (IsHead s tail)
=> Member s (KV s' t' ': tail) t 'False where
select' (HCons _ xs) = select' @s @tail @t @(IsHead s tail) xs
These two instances are definitely not overlapping because their isHead
parameters are different. Notice that we also used the type family IsHead
in the constraint to the second instance definition in order to search for s
recursively down the tail.
However, select'
is not very usable because there are many ambiguous type variables when we apply select'
to a record, such as t
and isHead
, but t
and isHead
can both be computed before instance resolution. The latter is computed by IsHead
, so we just need to define a type family Lookup
that computes t
.
type family Lookup (s :: Symbol) (xs :: [*]) :: * where
Lookup s '[] = TypeError (('Text "Cannot find the label: ") ':<>: ('Text s))
Lookup s (KV s' t ': tail) = If (s T.== s') t (Lookup s tail)
Lookup s (badType ': tail) =
TypeError (('Text "Expecting a KV in the type list, instead have: ")
':<>:
(ShowType badType))
Now, we can finally define a user-facing select
function as the following:
select :: forall s xs. Member s xs (Lookup s xs) (IsHead s xs) => HList xs -> Lookup s xs select = select’ @s @xs @(Lookup s xs) @(IsHead s xs)
And it works just as expected!
*Main> select @"item" extensibleRow
"Apple"
*Main> select @"price" extensibleRow
0.5
You may be wondering what those TypeError
calls are in the definition of Lookup
. Using type-level features can make ghc
output some incredibly obscure error messages difficult for a user to digest; in our case, if we tried to select @"nonexistentField" extensibleRow
, ghc
would try really hard to search for an instance where s
is "nonexistentField"
, and of course it would fail. This will result in an error message about not being able to deduce such a Member
instance.
But in this case, we can catch this error early and report a much better error message in terms of unrecognized label! If we can’t find s
in the application of Lookup
, then the field label does not exist. So we use the TypeError
type family exported from GHC.TypeLits
to report a customized type error. This will be much more informative for the users of our extensible records.
Testing out in ghci
indeed produces a reasonable error message:
*Main> select @"nonexistentField" extensibleRow
<interactive>:32:1: error:
• Cannot find the label: nonexistentField
Conclusion
So there we have it, a small exercise of type-safe heterogeneous lists in Haskell. Although we have used HList
as the carrier of our discussion throughout the post, I hope I explained what the required language extensions enable. In my opinion, language extensions are what makes Haskell both amazing and confusing. It gives programmers a cutting edge playground with the latest theoretical enhancements, but it can be daunting to see a dozen unfamiliar extensions in the first lines of a Haskell file. This post is by no means comprehensive, as I have only covered the most important extensions I am familiar with so far, and I hope this helps you find the right references when you encounter these language extensions in your own Haskell adventures.