Generate PureScript Data Types From Haskell Data Types

Ong Yi Ren
Published in The Startup
11 min read · Jan 13, 2021

Full stack with two purely functional programming languages

The need arose when I was writing a full-stack web application with PureScript on the frontend and Haskell on the backend. The two languages are largely similar, but they are still different languages, so code cannot be shared directly between them: every data type defined in Haskell has to be rewritten in PureScript. This looked like an automatable process, so I decided to find a way to let the computer do the work for me! Let’s start with my preferred way of structuring data types in both languages.

PureScript
Dealing with data types in PureScript is pretty effortless thanks to its support for row type polymorphism. You can always define the data types as follows:

type User = { name :: String, address :: City }
type City = { name :: String, postalCode :: Int }
x = { name : "Richard" , address : y }
y = { name : "Singapore" , postalCode : 10000 }

and access the postalCode of x using dot notation, x.address.postalCode, just as you normally would in most OOP languages. Updating a field is also simple: writing z = x { address { postalCode = 20000 } } creates a new z with the same fields as x, but with a postalCode value of 20000.

Haskell
In contrast, Haskell doesn’t support row type polymorphism by default, and it doesn’t let you define duplicate record fields in the same module. If you try to define the following in a single module, it won’t compile.

-- type User = { name :: String, address :: City }
-- It is not possible to define a data type as above in Haskell; instead we must define data as follows.
-- Won't work because we have the duplicate record field "name"
data User = User { name :: String, address :: City }
data City = City { name :: String, postalCode :: Int }

Because of this, it is common to add Hungarian Notation in front of Haskell record fields.

data User = User { uName :: String, uAddress :: City}
data City = City { cName :: String, cPostalCode :: Int }
x = User { uName = "Richard" , uAddress = y }
y = City { cName = "Singapore" , cPostalCode = 10000 }

To access the cPostalCode of x, you can’t use dot notation; you need to either pattern match or use the accessor functions.

-- Use pattern matching
postalCode :: Int
postalCode = case x of
  User { uName = _ , uAddress = City { cName = _ , cPostalCode = p } } -> p

-- Or use the accessor functions (named postalCode' so both versions can coexist)
postalCode' :: Int
postalCode' = (cPostalCode . uAddress) x

To create a new z with a different cPostalCode from x:

-- Use pattern matching
z :: User
z = case x of
  User { uName = a , uAddress = City { cName = b, cPostalCode = _ } } ->
    User { uName = a , uAddress = City { cName = b, cPostalCode = 20000 } }

-- Or use the accessor function with record update syntax
z' :: User
z' = x { uAddress = (uAddress x) { cPostalCode = 20000 } }

As you might notice, both the Hungarian Notation prefixes and the way of getting/updating record fields can get ugly real quick. One solution is to use the GHC extension DuplicateRecordFields for the former and the lens library for the latter. This article summarizes the technique in detail.

Requirements
Now let’s talk about the requirements. I want the computer to auto-generate most data types and type class instances for me because writing them manually is so boring…

1. No fancy types. Even if they exist, they will be minimal and I don't mind rewriting them manually; the majority of my data types are just simple data types.
2. Prefer to utilize row type polymorphism in Haskell so that I can generate something like type User = { name :: String, address :: City } in PureScript. There are probably libraries/GHC extensions that could help me achieve that.
3. Auto-generate instances of standard type classes such as Eq and Ord.
4. Auto-generate JSON encode/decode instances.
5. Auto-generate optics such as Lens, Prism and Iso.
6. Minimal effort. I would not be able to write a library from scratch, so it is more feasible to leverage existing libraries whenever possible.
7. My Haskell code needs to convert third-party JSON data into Haskell data types. Because I use the approach mentioned earlier (underscore-prefixed fields with makeFieldsNoPrefix), I need to set the option fieldLabelModifier = drop 1 in Aeson TH to drop the underscore prefix so that the field names fit the JSON data.

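{-# LANGUAGE DeriveGeneric, DuplicateRecordFields, FlexibleInstances, FunctionalDependencies, TemplateHaskell #-}
import Control.Lens  (makeFieldsNoPrefix)
import Data.Aeson.TH (defaultOptions, deriveJSON, fieldLabelModifier)
import GHC.Generics  (Generic)

-- Given the JSON data {"name": "Richard", "address": {"name": "Singapore", "postalCode": 10000}},
-- I need to prefix my record fields with _ to use makeFieldsNoPrefix. However, if I derive JSON
-- directly without dropping the underscore prefix, the result would be
-- {"_name": "Richard", "_address": {"_name": "Singapore", "_postalCode": 10000}}.
data User = User { _name    :: String
                 , _address :: City
                 } deriving (Eq, Generic)
data City = City { _name       :: String
                 , _postalCode :: Int
                 } deriving (Eq, Generic)
makeFieldsNoPrefix ''User
makeFieldsNoPrefix ''City
-- "drop 1" drops the underscore prefix. City is derived first so that its
-- JSON instances are in scope when the User splice is type-checked.
deriveJSON defaultOptions { fieldLabelModifier = drop 1 } ''City
deriveJSON defaultOptions { fieldLabelModifier = drop 1 } ''User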

First attempt
I started by searching for a way to utilize row type polymorphism in Haskell. Here is a list of libraries that support extensible records and/or extensible variants. I had three concerns.
1. Some libraries seem to be proofs of concept and are no longer maintained, such as extensible-data and named-records.
2. Some libraries are well maintained but are technically challenging for me, such as vinyl and row-types.
3. None of them has a companion library that generates PureScript data types.

Second attempt
I found a nice Haskell library, purescript-bridge, that generates PureScript data types from common Haskell data types. Is that the end of the story? Not really. The library has no way to modify record field names when encoding/decoding JSON, which my 7th requirement needs. So I dived into the underlying PureScript library, purescript-foreign-generic, and found an option, fieldTransform, that is analogous to fieldLabelModifier in Aeson TH. This looked like a simple fix: by adding an option to the library, I could drop the underscore prefix. However, purescript-bridge also generates Lens definitions for data types, and it ends up generating duplicate function names when I use the same record field in two different Haskell data types.

-- Data type definition in Haskell
data User = User { _name    :: String
                 , _address :: City
                 } deriving (Eq, Generic)
data City = City { _name       :: String
                 , _postalCode :: Int
                 } deriving (Eq, Generic)
makeFieldsNoPrefix ''User
makeFieldsNoPrefix ''City

-- Generated code in PureScript
newtype User = User { _name :: String, _address :: City }
newtype City = City { _name :: String, _postalCode :: Int }
-- ... skipped ...
name :: Lens' User String
name = _Newtype <<< prop (SProxy :: SProxy "_name")
-- Duplicate function name
name :: Lens' City String
name = _Newtype <<< prop (SProxy :: SProxy "_name")

In hindsight, the approach from the final attempt would have solved this issue, but I wasn’t aware of it yet.

Third attempt
I am sharing the third attempt anyway. I decided to stick with Hungarian Notation; to be precise, I decided to prefix every record field with the name of its data constructor. In the purescript-bridge library, I added an option to drop the data constructor prefix.

-- Data type definition in Haskell
data User = User { _userName    :: String
                 , _userAddress :: City
                 } deriving (Eq, Generic)
data City = City { _cityName       :: String
                 , _cityPostalCode :: Int
                 } deriving (Eq, Generic)
-- toLower works on a single Char in Haskell, hence the map.
-- City is derived first so that its instances are in scope for User.
deriveJSON defaultOptions
  { fieldLabelModifier = \x -> case stripPrefix "_city" x of
      Just y  -> map toLower (take 1 y) <> drop 1 y
      Nothing -> x
  } ''City
deriveJSON defaultOptions
  { fieldLabelModifier = \x -> case stripPrefix "_user" x of
      Just y  -> map toLower (take 1 y) <> drop 1 y
      Nothing -> x
  } ''User

-- Generated code in PureScript
newtype User = User { _userName :: String, _userAddress :: City }
newtype City = City { _cityName :: String, _cityPostalCode :: Int }
-- ... skipped ...
instance encodeUser :: Encode User where
  encode = genericEncode $ defaultOptions
    { unwrapSingleConstructors = false
    , fieldTransform = \x -> case stripPrefix (Pattern "_user") x of
        Just y  -> toLower (take 1 y) <> drop 1 y
        Nothing -> x
    }
instance decodeUser :: Decode User where
  decode = genericDecode $ defaultOptions
    { unwrapSingleConstructors = false
    , fieldTransform = \x -> case stripPrefix (Pattern "_user") x of
        Just y  -> toLower (take 1 y) <> drop 1 y
        Nothing -> x
    }
instance encodeCity :: Encode City where
  encode = genericEncode $ defaultOptions
    { unwrapSingleConstructors = false
    , fieldTransform = \x -> case stripPrefix (Pattern "_city") x of
        Just y  -> toLower (take 1 y) <> drop 1 y
        Nothing -> x
    }
instance decodeCity :: Decode City where
  decode = genericDecode $ defaultOptions
    { unwrapSingleConstructors = false
    , fieldTransform = \x -> case stripPrefix (Pattern "_city") x of
        Just y  -> toLower (take 1 y) <> drop 1 y
        Nothing -> x
    }
userName :: Lens' User String
userName = _Newtype <<< prop (SProxy :: SProxy "_userName")
cityName :: Lens' City String
cityName = _Newtype <<< prop (SProxy :: SProxy "_cityName")
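
The field-name transformation used on both sides above can be pulled out into one helper and checked on its own. The following is a minimal sketch using only base; the name dropFieldPrefix is hypothetical and is not part of Aeson or purescript-bridge.

```haskell
module Main where

import Data.Char (toLower)
import Data.List (stripPrefix)

-- | Hypothetical helper (not part of any library): remove a record-field
-- prefix such as "_user" and lower-case the first remaining character.
-- Field names that do not start with the prefix, or that consist of the
-- prefix only, are returned unchanged.
dropFieldPrefix :: String -> String -> String
dropFieldPrefix prefix field =
  case stripPrefix prefix field of
    Just (c : rest) -> toLower c : rest
    _               -> field

main :: IO ()
main = do
  putStrLn (dropFieldPrefix "_user" "_userName")       -- name
  putStrLn (dropFieldPrefix "_city" "_cityPostalCode") -- postalCode
  putStrLn (dropFieldPrefix "_user" "_cityName")       -- _cityName (prefix does not match)
```

On the Haskell side, such a helper could be passed as fieldLabelModifier = dropFieldPrefix "_user" instead of repeating the lambda for every type.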

But this turned out to be the wrong call, for two reasons.
1. Sum types confused me: how am I going to write the function that drops the data constructor prefix when there is more than one constructor? Prefixing record fields with the type constructor instead is an option, but it feels weird.

-- Data type definition in Haskell
data Foo = Bar  { _barName     :: String
                , _barAddress  :: City }
         | Bazz { _bazzName    :: String
                , _bazzAddress :: City
                } deriving (Eq, Generic)
-- What should I put inside?
-- deriveJSON defaultOptions { fieldLabelModifier = ? } ''Foo

-- I can write this, but it looks weird
data Foo = Bar  { _fooName    :: String
                , _fooAddress :: City }
         | Bazz { _fooName    :: String
                , _fooAddress :: City
                } deriving (Eq, Generic)
deriveJSON defaultOptions
  { fieldLabelModifier = \x -> case stripPrefix "_foo" x of
      Just y  -> map toLower (take 1 y) <> drop 1 y
      Nothing -> x
  } ''Foo

2. It takes a lot more keystrokes when I create a data type.
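
For the sum-type question in point 1, one possible answer is to try each constructor's prefix in turn. This is only a sketch under that assumption, using base alone; dropAnyPrefix is a hypothetical helper, and it does nothing about the extra typing from point 2.

```haskell
module Main where

import Data.Char  (toLower)
import Data.List  (stripPrefix)
import Data.Maybe (mapMaybe)

-- | Hypothetical helper: strip the first prefix in the list that matches
-- the field name, then lower-case the first remaining character.
-- The order of the list matters if one prefix is a prefix of another.
dropAnyPrefix :: [String] -> String -> String
dropAnyPrefix prefixes field =
  case mapMaybe (`stripPrefix` field) prefixes of
    ((c : rest) : _) -> toLower c : rest
    _                -> field

main :: IO ()
main = do
  let modifier = dropAnyPrefix ["_bar", "_bazz"]
  putStrLn (modifier "_barName")     -- name
  putStrLn (modifier "_bazzAddress") -- address
  putStrLn (modifier "_fooName")     -- _fooName (no prefix matches)
```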

This approach is less ideal than the second attempt. In the second attempt, I could at least delete all the Lens definitions and just use Iso to access the fields. Iso is less powerful than Lens, but at least I wouldn’t need so many keystrokes just to define a data type.

Final attempt
Then I thought of the function makeFieldsNoPrefix in Lens TH: how come it can use the same function name for different types without any issue? It turns out that makeFieldsNoPrefix works as follows:

-- Data type definition in Haskell
data User = User { _name    :: String
                 , _address :: City
                 } deriving (Eq, Generic)
data City = City { _name       :: String
                 , _postalCode :: Int
                 } deriving (Eq, Generic)
makeFieldsNoPrefix ''User
makeFieldsNoPrefix ''City

-- Generated code in Haskell; it might not look the same but is equivalent to
class HasName s a | s -> a where
  name :: Lens' s a
class HasAddress s a | s -> a where
  address :: Lens' s a
class HasPostalCode s a | s -> a where
  postalCode :: Lens' s a
instance HasName User String where
  name = lens (_name :: User -> String) ((\x y -> x { _name = y }) :: User -> String -> User)
instance HasName City String where
  name = lens (_name :: City -> String) ((\x y -> x { _name = y }) :: City -> String -> City)
instance HasAddress User City where
  address = lens _address (\x y -> x { _address = y })
instance HasPostalCode City Int where
  postalCode = lens _postalCode (\x y -> x { _postalCode = y })
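
To see why one name lens can serve both types, here is a self-contained sketch of the same class-per-field idea that depends only on base. Lens', lens, view and set are minimal stand-ins written for this example rather than the lens library's own definitions, and the instances use pattern matching instead of record syntax so that no duplicate-field disambiguation is needed.

```haskell
{-# LANGUAGE DuplicateRecordFields  #-}
{-# LANGUAGE FlexibleInstances      #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE RankNTypes             #-}
module Main where

import Data.Functor.Const    (Const (..))
import Data.Functor.Identity (Identity (..))

-- Minimal stand-ins for the lens library's Lens', lens, view and set.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

lens :: (s -> a) -> (s -> a -> s) -> Lens' s a
lens getter setter f s = setter s <$> f (getter s)

view :: Lens' s a -> s -> a
view l = getConst . l Const

set :: Lens' s a -> a -> s -> s
set l a = runIdentity . l (const (Identity a))

data City = City { _name :: String, _postalCode :: Int } deriving (Eq, Show)
data User = User { _name :: String, _address :: City } deriving (Eq, Show)

-- One class per field name; the functional dependency s -> a lets the
-- record type determine the field type.
class HasName s a | s -> a where name :: Lens' s a
class HasAddress s a | s -> a where address :: Lens' s a
class HasPostalCode s a | s -> a where postalCode :: Lens' s a

instance HasName User String where
  name = lens (\(User n _) -> n) (\(User _ a) n -> User n a)
instance HasName City String where
  name = lens (\(City n _) -> n) (\(City _ p) n -> City n p)
instance HasAddress User City where
  address = lens (\(User _ a) -> a) (\(User n _) a -> User n a)
instance HasPostalCode City Int where
  postalCode = lens (\(City _ p) -> p) (\(City n _) p -> City n p)

sampleUser :: User
sampleUser = User "Richard" (City "Singapore" 10000)

main :: IO ()
main = do
  putStrLn (view name sampleUser)             -- Richard
  putStrLn (view (address . name) sampleUser) -- Singapore
  -- Update the nested postal code, then read it back.
  print (view (address . postalCode) (set (address . postalCode) 20000 sampleUser)) -- 20000
```

The same name is resolved to a different instance depending on whether it is applied to a User or a City, which is what removes the duplicate-function-name problem from the second attempt.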

The existing purescript-bridge didn’t use a has-typeclass approach so I decided to change its implementation to resemble the approach in Haskell.

-- Generated code in PureScript
class HasName s a | s -> a where
  name :: Lens' s a
class HasAddress s a | s -> a where
  address :: Lens' s a
class HasPostalCode s a | s -> a where
  postalCode :: Lens' s a
instance hasNameUser :: HasName User String where
  name = _Newtype <<< prop (SProxy :: SProxy "_name")
instance hasNameCity :: HasName City String where
  name = _Newtype <<< prop (SProxy :: SProxy "_name")
instance hasAddressUser :: HasAddress User City where
  address = _Newtype <<< prop (SProxy :: SProxy "_address")
instance hasPostalCodeCity :: HasPostalCode City Int where
  postalCode = _Newtype <<< prop (SProxy :: SProxy "_postalCode")

Not there yet
Now I can use my approach in Haskell, which allows duplicate fields, and just drop the underscore prefix to fit my JSON data, and I can do the same in PureScript, right? Almost. When a JSON object has a key named type, we can’t use _type as the record field in Haskell: the generated function would be named type, which is a reserved keyword. A workaround is to use __type as the record field and strip both underscores for that field. So while dropping one character works most of the time, there are edge cases where it doesn’t.

-- This would not work because it would attempt to generate a function named "type"
-- Data type definition in Haskell
data Foo = Foo { _type :: String
               } deriving (Eq, Generic)
deriveJSON defaultOptions { fieldLabelModifier = drop 1 } ''Foo
makeFieldsNoPrefix ''Foo

-- Generated lens function in Haskell; this would not work because type is a reserved keyword
class HasType s a | s -> a where
  type :: Lens' s a
instance HasType Foo String where
  type = lens _type (\x y -> x { _type = y })

A workaround is

-- Data type definition in Haskell
data Foo = Foo { __type :: String
               } deriving (Eq, Generic)
deriveJSON defaultOptions { fieldLabelModifier = (\x -> if x == "_type" then "type" else x) . drop 1 } ''Foo
makeFieldsNoPrefix ''Foo

-- Generated lens function in Haskell
class Has_type s a | s -> a where
  _type :: Lens' s a
instance Has_type Foo String where
  _type = lens __type (\x y -> x { __type = y })
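
The modifier in the workaround above is a plain String -> String function, so its behaviour can be checked without Aeson. A minimal sketch using only base follows; the names jsonFieldName and stripUnderscores are hypothetical, and stripUnderscores is an alternative I am assuming would also fit, not something taken from the original setup.

```haskell
module Main where

-- | The modifier from the workaround: drop the first underscore, then
-- special-case the one field whose JSON key is a reserved word.
jsonFieldName :: String -> String
jsonFieldName = (\x -> if x == "_type" then "type" else x) . drop 1

-- | Hypothetical alternative: strip every leading underscore, so no
-- per-field special case is needed.
stripUnderscores :: String -> String
stripUnderscores = dropWhile (== '_')

main :: IO ()
main = do
  putStrLn (jsonFieldName "_name")     -- name
  putStrLn (jsonFieldName "__type")    -- type
  putStrLn (stripUnderscores "__type") -- type
```

The alternative is only safe if no JSON key is itself expected to start with an underscore.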

Besides that, identical typeclass definitions might end up being generated in different files, which results in conflicts. The workaround is to put all typeclass definitions in a single file.

Other Concern
The existing purescript-foreign-generic library doesn’t support every option that Aeson TH supports. For example, the SumEncoding option UntaggedValue isn’t supported in purescript-foreign-generic; I am not sure whether it simply hasn’t been implemented yet or was left out for a reason. Besides purescript-foreign-generic, there are alternative libraries such as purescript-simple-json, which uses the same underlying library, generic-rep, as purescript-foreign-generic; this document explains how to get it done.
There is also a related discussion on the PureScript Discourse.

Conclusion
In this article, I shared my attempts and my imperfect, opinionated but functional approach to generating PureScript data types from Haskell data types. As humans, we shouldn’t do repetitive work that a machine can do better, in much less time and without errors. Doing it twice is doing it once too many times!

Edited on 30 August 2021: I found a better approach that uses the DuplicateRecordFields extension, and it replaces the approach described in this article. Basically, the steps are

1. Do not use an underscore prefix to generate lens and Aeson instances
2. Use generic-lens and the OverloadedLabels extension instead

You may check out a working example here.
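
As a rough idea of what the two steps look like, here is a minimal sketch that assumes the lens and generic-lens packages are available. It is my own illustration rather than the linked example, and sampleUser is a made-up value.

```haskell
{-# LANGUAGE DeriveGeneric         #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedLabels      #-}
module Main where

import Control.Lens         ((&), (.~), (^.))
import Data.Generics.Labels ()        -- IsLabel instance from generic-lens
import GHC.Generics         (Generic)

-- No underscore prefixes: both records can use the field "name".
data City = City { name :: String, postalCode :: Int }
  deriving (Eq, Show, Generic)
data User = User { name :: String, address :: City }
  deriving (Eq, Show, Generic)

sampleUser :: User
sampleUser = User { name = "Richard"
                  , address = City { name = "Singapore", postalCode = 10000 } }

main :: IO ()
main = do
  -- Labels such as #name resolve per record type through Generic.
  putStrLn (sampleUser ^. #name)
  print (sampleUser ^. #address . #postalCode)
  -- Nested update through composed labels.
  print (sampleUser & #address . #postalCode .~ 20000)
```

Because the field names no longer carry a prefix, Aeson's default options would already produce keys such as name and address, so no fieldLabelModifier is needed for the common case.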

PS: Feel free to point out any mistakes.
