Haskell and Type Theory for Real-World Applications

An Ideal Data-Model-First Development Approach

Can we have a better real-world application development experience with a better type system?

Anton Antich
Superstring Theory

--

Any real-world application uses a database or seventeen in the backend. Developing and maintaining such applications has largely remained a nightmare experience, despite many attempts to fix the problem. Enterprise Java Beans, Ruby-on-Rails, Meteor — many tried, but none succeeded in achieving a seemingly straightforward and very desirable goal:

I want to model my data types first and then automagically receive an ability to work with them in the business logic layer, in the persistence layer, in the UI layer, etc, without writing huge amounts of error-prone boilerplate code

Let’s look at a simplified example — a blog platform application. We have a Person type that deals with information about our bloggers and a Post type that represents blog posts. There is a “one-to-many” relationship between a Person and a Post so that we always know who is the author of a given post or can retrieve all posts for a given author.

Data Types For a Simple Blogging App

In the ideal world, the picture above should represent (almost) all we need to design manually — two data types (with more fields than shown, of course) and a relationship between them. I shouldn’t need to worry about or write a ridiculous amount of boilerplate code to take care of storing my data persistently or using it on the front-end, be it in the web browser or some native UI. Instead, it should be elegantly generated for me by a mechanism similar to Haskell typeclasses (Java generics or C++ templates give you some pale idea of what is possible in Haskell but are much less powerful).

In the real world, I must write boilerplate. To persist the data, I need to add id to all my types — absolutely useless clutter in the business logic code! — and learn the API of each new database separately. UI? Don’t get me started. I have to manually code all of the layers below, from data types to functions:

Pure Types, Persistence Layer, UI

But it’s just that — annoying boilerplate! My pure types and functions on them [should] already describe everything I need in my application domain to solve my business task(s). There are beautiful and elegant mathematical abstractions buried somewhere deep in this picture that should allow us to take care of the boilerplate and generate all the needed “service” types and conversion functions between Forms, Pure Types, Entities, etc. Defining these abstractions and empowering developers with them should make us more productive and our code more maintainable.

Ironically, Meteor has arguably come the closest to this goal from the practical point of view. Using Typescript, you can have your data types modeled properly and use them both on the backend and frontend. However, this is Typescript, which tries to hide all the imperative JavaScript ugliness under a thin pretend cover of a relatively primitive type system. You can’t even have a Functor or other parametric types for that matter, so you can’t provide any reasonable level of abstraction to model more complex systems or databases other than MongoDB that Meteor uses currently.

One approach to solving this with Types

It’s a sketch, not a detailed and technically correct proposal. I’m simply sharing ideas, asking questions, and looking for feedback — so please do share this feedback!

In the ideal type-driven environment (which would require dependent types of a full intuitionistic Type Theory, not the System-FC Haskell uses), I would like to be able to define my types similar to (in pseudo-Haskell):

data Person = Person {
name :: String,
dob :: Date,
... }
data Post = Post {
title :: String,
body :: String,
createdAt :: Date,
... }
type WhoIsAuthor = Post -> Person
type FindPosts = Person -> [Post]

And then use some powerful parametric types mechanism to take care of the rest. E.g., I would want to have an Entity parametric type that abstracts away all the CRUD (and more) operations regardless of the storage backend I use, something like:

data Entity a::Type db::DatabaseContext = 
Entity {
data :: a,
<Some magic that adds an "id", table name, etc, and takes care of all the CRUD complexities under the hood>
}

Since the above type depends on the value of the type DatabaseContext (it may also be a whole type with a typeclass, or something called a ∑-type in Type Theory, rather than a value) this construct is, unfortunately, impossible in Haskell. But boy, do I want to have it! (Is it possible in Idris?)

If it were possible, I’d write Entity Post MongoContext or Entity Person AmazonRedshiftContext and boom! — I can save, delete, find, and otherwise manipulate my pure data types without a single line of ugly, repetitive boilerplate. Here, DatabaseContent will hide all the low-level database-related machinery.

However, this only takes care about persisting my pure types, but how about relationships between them? I do want to know who the author of the post is. Here, we would use a more complex multiparameter type class, something like:

instance Database MongoContext Person Post WhoIsAuthor FindPosts where ...

Such a typeclass would look at my types, including the ones that define relationships (WhoIsAuthor and FindPosts), destructure them as needed, and automatically take care of all the ids and APIs for my data model and store it in a given Database (defined by MongoContext in this case). This is where real magic should happen, which is definitely currently impossible, neither in Haskell nor in Idris. It would generate Entity Post MongoContext etc instances under the hood, with the ids and table names, as well as some functions of my relationship types that would use those ids and table names to implement the lookups I need, something like:

whoIsAuthor :: WhoIsAuthor -- i.e., Post -> Person
getAllPosts :: FindPosts -- i.e., Person -> [Post]

Of course, in the real design, we’ll need to create more complex functions based on our pure ones — something like Entity Post MongoContext -> Entity Person MongoContext but as long as it’s done automatically, it still simplifies our life significantly enough.

The problem is, we can’t define such a typeclass, because it’s not only dependent on other types and potentially values, it’s not only multi-parameter, but it must take a variable number of type parameters to be useful in modeling real-world databases. We would have to define it along the lines of:

class Database db::DatabaseContext [Type] [Type -> Type] [Type -> [Type]] where ...

So, it’s a ∑-type that takes a type (or maybe even a value, depends on design) of a DatabaseContext that takes care of the database interfaces, a list of arbitrary pure types (that will be stored in our database tables), and a list of functions between these pure types that define our relationships — and then generates a proper CRUD, search, etc API for our data model.

Now, this is exactly what I want to be able to write. This will provide the same abstraction for any storage backend, and we can use a similar approach for UI elements generation as well!

Unfortunately, no language on earth allows such constructs. My question to you is — would you like to have one? :-)

Now, we can get pretty close to this flexibility in Haskell using Template Haskell, which is already used by some production-ready libraries for databases. However, this is clearly not as clean, concise, and beautiful as a proper dependent type implementation, not by a long shot. Dreadful compilation times and difficulty maintaining are the other two drawbacks of Template Haskell that anyone who tried to use it in anything resembling a production app knows well.

Type Families and generics is our next best bet, which may allow automating single-type CRUD operations, but not model the database with all the relationships as a whole — since this requires multi-parameter and variable parameter type families. Also, there are not so many options of passing a database context except hiding it in a Read or State monad, while table names will likely need to be passed explicitly.

To compound the problem, Haskell is known for terrible relations with the UI, be it web or native, so until there is a good JavaScript compiler with little overhead (ghc-js is the heaviest thing in the universe after some black holes), we stand little chance of creating a good (providing all the usual Haskell benefits of “if it compiles it runs” but with an easy-to-grasp API) end-to-end application development platform based on Haskell.

However, if we want to have this eventually, and make the task of creating and maintaining the real-world data-driven production apps much easier than now, the answer likely lies in using an even more powerful Type System than Idris and Haskell with all the GHC extensions can provide. Also, Haskell gets a bit confusing by differentiating types which are basically the same thing into Sum Types, Typeclasses, Type Familes, and the like. A cleaner design might be more conducive to making Type-based functional programming much more mainstream.

Thank you for reading this sketchy train of thought, and I will be very curious and grateful for your feedback to develop it further!

--

--

Anton Antich
Superstring Theory

How to scale startups and do AI and functional programming. Building Integrail.ai: pragmatic AGI platform. Built Veeam from 0 to 1B in revenue in under 10 years