July 9, 2021
Written by Artyom Kazak @ Monadfix
For a while, reinventing monad transformers was a fun weekend activity. Then the community has switched to van Laarhoven lenses and raytracers (or were raytracers always there? I forgot). Anyway, I propose a new weekend project: array-backed anonymous records.
I am not going to walk you through the entire implementation, just enough to get you going.
I want to have record types without data
. Something like this:
profile :: { id :: Int, name :: Text, ... }
profile = { id = 13, name = "Alexandra Botez", ... }
So, something like tuples, but with field names.
I want to have O(1) field access — so, the field values should be in an array rather than a list. Otherwise it's going to be too easy (apologies if I'm jumping ahead and all the Symbol
stuff is new to you). Something like this:
data KV (key :: Symbol) val = KV val
data Rec xs where
RNil :: Rec '[]
RCons :: val -> Rec xs -> Rec (KV key val ': xs)
profile :: Rec '[KV "id" Int, KV "name" Text]
profile = RCons 13 (RCons "Alexandra Botez" RNil)
We have O(1) cons, but O(n) access. Usually access is much more common, so I'd rather have O(1) access and O(n) cons.
Also, array-backed records are more complex and therefore more fun. A good mix of type-level stuff and "hey you actually have to mutate an array".
Vector
. Access should be several times faster than with a hashmap, even for tiny records.Array
does not support O(1) slices and therefore uses less memory than a Vector
. Its cousin SmallArray
uses less memory still. Both are more obscure and low-level than Vector
so there's extra fun involved.SmallArray
becomes more expensive to garbage collect when you hit the 128 element limit, so you'll need to figure out how to transparently upgrade it to an Array
once you hit that limit. (I didn't do this.)My experience is primarily with superrecord
, which I've had to modify for a project at Juspay. Since it uses SmallArray
, I'll go with that too:
data Rec (lts :: [*]) = MkRec { _unRec :: SmallArray# Any }
Any
is something like Dynamic
but without carrying the type information alongside. You can cast any lifted type in and out of Any
with unsafeCoerce
, but there are no guardrails if you accidentally get the type wrong.
The easiest way is the KV
approach we've seen above. Just tag values with a Symbol
.
However, it's better to have some type-level guarantees that you've listed the values in the right order when constructing the record. Ideally, we want field names both in the record type and in the record body.
Something like this:
data FldProxy (t :: Symbol) = FldProxy
data label := value = FldProxy label := value
And then we can write FldProxy @"id" := 13
on the value level, which is ugly for sure unless you know about overloaded labels. Then we can define an IsLabel
instance:
instance l ~ l' => IsLabel (l :: Symbol) (FldProxy l') where
fromLabel = FldProxy
And now suddenly our world is filled with sunshine. We can write key–value pairs like this: #id := 13
. Sweet.
On the type-level, we still have the same key–value pairs, but with :=
instead of KV
and it's just nicer: Rec '["id" := Int, "name" := Text]
.
There are a few options.
The easiest is to provide two functions — one for creating an empty record and another for consing a new field to it. This means O(n2) construction, though, because we'll have to reallocate the array at every step.
A slightly trickier option is to define a separate type for key–value list records, and a function that would convert a list-based record into an array-backed record. Internally it would count the elements and then allocate the array of exactly the right size. This is the approach of vinyl's ARec
.
The option I went with is — falling back on the tuple syntax. It is great to be able to write something like this:
mkRec (#id := 13, #name := "Alexandra Botez")
Unfortunately, it means defining instances for all possible tuple sizes (currently up to 62). But hey, you can write a script or a Template Haskell bit to generate those.
A pattern synonym makes this syntax even more nice:
pattern Rec :: RecTuple tuple fields => tuple -> Rec fields
pattern Rec a <- (toTuple -> a)
where Rec = fromTuple
Rec (#id := 13, #name := "Alexandra Botez")
One more option is to implement a Template Haskell based construction function that would use haskell-src-exts
to parse a fake record and convert it into an anonymous record. Something like this: [rec|X {id = 13, name = "Alexandra Botez"}|]
. Mark Karpov's Template Haskell tutorial should get you going, and if you get stuck with haskell-src-exts
, Well-Typed's large-records
library should serve as inspiration. (Finally — yes of course you shouldn't be using haskell-src-exts
, but GHC's native quasiquote parser doesn't like references to undefined field names.)
This is going to be the easiest operation, and I'll show how to implement it. All other operations are on you, though.
First, write a type family that would figure out a field's index in the record's fieldlist (lts
):
type family RecIndex (i :: Nat) (l :: Symbol) (lts :: [*]) :: Nat where
RecIndex idx l (l := t ': lts) = idx
RecIndex idx m (l := t ': lts) = RecIndex (1 + idx) m lts
-- Optional but nice
RecIndex idx m '[] = TypeError ('Text "Could not find label " ':<>: 'Text m)
The last case is a custom type error. When hitting it, GHC's error message will say Could not find label name
rather than Could not match RecIndex ...
or whatever.
Then just grab a value at that index. Since we are using a raw SmallArray#
, we'll need a lot of #
-es. If you are using Vector
, you won't need them.
unsafeGet :: forall lts. Int -> Rec lts -> Any
unsafeGet (I# index#) (MkRec vec#) =
let size# = sizeofSmallArray# vec#
-- NB: I decided to physically store the values in the opposite order,
-- but forgot why
in case indexSmallArray# vec# (size# -# index# -# 1#) of
(# a# #) -> a#
{-# INLINE unsafeGet #-}
get :: forall l v lts. Has l lts v => FldProxy l -> Rec lts -> v
get _ r =
let !index = fromIntegral (natVal' (proxy# :: Proxy# (RecIndex 0 l lts)))
in unsafeCoerce $ unsafeGet index r
{-# INLINE get #-}
The Has
type family says that a field with a certain name and value type has to be contained in a fieldlist:
type Has l lts v =
( RecTy l lts ~ v,
, KnownNat (RecIndex 0 l lts) )
-- | Get value type if you have the name
type family RecTy (l :: Symbol) (lts :: [*]) :: k where
RecTy l (l := t ': lts) = t
RecTy q (l := t ': lts) = RecTy q lts
Is the fun creeping in already?
set
, modify
, instance Eq
, instance Ord
.insert
, delete
, union
, intersection
. You'll have to make decisions about whether you want to allow duplicates or not, and how you want to handle them.Generic
instance (which would give you JSON encoding/decoding and a bunch of other things for free),Look at the jrec
library if you get stuck.
superrecord
if you want to go via this route.red-black-record
does. Awful compilation times, but fun. Probably fun.Either
) using the same type-level machinery you've had to painstakingly implement for records.Enjoy!