Take ownership of your data — Part 1

Data analysis with Elixir

There are a few things that come to mind when developing a new application. One thing we get excited about, and sometimes delve deep into, is system design: how the parts fit together, with high-level abstractions, boxes, arrows and names. But architecture is fluid. It grows over time, and more often than not an upfront design hurts more than it helps. A good design requires a solid understanding of the requirements, workflows, interfaces and integrations, and ideally it should evolve and adapt constantly.
One thing, though, is stable enough to justify an upfront design: the domain. The domain of a business, or of the problem to be solved, is usually more stable than its implementation, design, architecture and workflows, which makes it a good place to start thinking. Without diving too deep into Domain-Driven Design, thinking about a system through a domain lens can be useful in many cases.
But as with any habit, it’s tempting to think about the domain in terms of how it is persisted. Many times I’ve found myself drawing tables and relationships when thinking about a new application, and then shaping the application itself around those designs. More on that in Part 2.
What we need to practice here is decoupling data modeling from database modeling. This coupling comes from the Integration Database pattern, which we’ve relied on for too long without properly considering how damaging it can be to applications and systems in the long run. When this anti-pattern is the default, we model the data database-first, trying to come up with a single model that suits every application sharing the data, and we handle the necessary distinctions in SQL.
Perhaps the most damaging consequence is the numbing effect it has on how we see our data. Habit leads us to assume that all data is relational and to force it into tables, rows and relationships. We change the data to fit the model rather than the other way around. The results range from poor data representation to complex, sprawling and unmaintainable SQL queries. It also violates encapsulation, because different applications access the same data in different ways.
To break away from that habit, a useful exercise is to model your data inside your application first, without thinking about persistence. This does a few things:

  1. It lets you think outside the relational model and shape your data according to its nature;
  2. It helps you use the full potential of your programming language’s native data structures, customized as much as you need;
  3. It forces you to think about state inside your application, instead of relying on one-way requests to the database.

The result of your data modeling will probably be messy at first, because you’re not used to doing this. How granular should your entities be? How do the parts connect? Is your data relational? Is it hierarchical? Or is it network-style, like a graph?
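To make those questions concrete, here is a minimal sketch of a hypothetical school domain expressed in three shapes, using only native Elixir data structures. The first is a struct that refers to classes by id. The second is a nested map for hierarchical data. The third is an adjacency map for graph-like prerequisites. The module names, fields and course data are invented for illustration.

```elixir
# Hypothetical domain: all names and data below are illustrative only.
defmodule Modeling.Student do
  # Relational-style: the student refers to classes by id instead of embedding them.
  defstruct [:id, :name, class_ids: MapSet.new()]
end

defmodule Modeling do
  # Hierarchical data kept as a nested map rather than flattened into tables.
  def curriculum do
    %{
      "Science" => %{"Physics" => ["Mechanics", "Optics"], "Biology" => ["Genetics"]},
      "Arts" => %{"Music" => ["Harmony"]}
    }
  end

  # Collect the leaf courses of the nested map.
  def courses(tree) when is_map(tree),
    do: Enum.flat_map(tree, fn {_area, subtree} -> courses(subtree) end)

  def courses(leaves) when is_list(leaves), do: leaves

  # Network-style data: course prerequisites as an adjacency map.
  def prerequisites do
    %{"Optics" => ["Mechanics"], "Mechanics" => ["Algebra"], "Algebra" => []}
  end

  # Every course needed, directly or transitively, before `course`.
  # Assumes the prerequisite graph has no cycles.
  def required_before(graph, course) do
    graph
    |> Map.get(course, [])
    |> Enum.flat_map(fn pre -> [pre | required_before(graph, pre)] end)
    |> Enum.uniq()
  end
end

Modeling.curriculum() |> Modeling.courses() |> Enum.sort()
# => ["Genetics", "Harmony", "Mechanics", "Optics"]

Modeling.required_before(Modeling.prerequisites(), "Optics")
# => ["Mechanics", "Algebra"]
```

None of these shapes needs a table. Which one fits depends on how the data is naturally read and changed.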
The next question is which CRUD-style operations and queries you want to perform on your data. CQRS (Command Query Responsibility Segregation) is a pattern that describes well how to handle data differently from regular CRUD or SQL-style operations. It separates the commands that change state from the queries that read it. This makes you think about how your data changes, how to validate inputs and transformations, and how to keep consistency within your reach.
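As a rough illustration, the sketch below splits a hypothetical class roster into commands and queries. Commands are validated and return either a new state or an error. Queries only read the state. This is a simplified command/query split over plain data, not a full CQRS setup with separate read models or event storage. The module name `ClassRoster` and the message shapes are assumptions.

```elixir
defmodule ClassRoster do
  # Commands validate their input and return {:ok, new_state} or {:error, reason}.
  def handle_command(state, {:enroll, student}) do
    cond do
      MapSet.member?(state.students, student) -> {:error, :already_enrolled}
      MapSet.size(state.students) >= state.capacity -> {:error, :class_full}
      true -> {:ok, %{state | students: MapSet.put(state.students, student)}}
    end
  end

  def handle_command(state, {:drop, student}) do
    if MapSet.member?(state.students, student) do
      {:ok, %{state | students: MapSet.delete(state.students, student)}}
    else
      {:error, :not_enrolled}
    end
  end

  # Queries only read the state; they never change it.
  def query(state, :roster), do: state.students |> MapSet.to_list() |> Enum.sort()
  def query(state, :seats_left), do: state.capacity - MapSet.size(state.students)
end

state = %{capacity: 2, students: MapSet.new()}
{:ok, state} = ClassRoster.handle_command(state, {:enroll, "ana"})
{:ok, state} = ClassRoster.handle_command(state, {:enroll, "bo"})

ClassRoster.handle_command(state, {:enroll, "cy"})
# => {:error, :class_full}

ClassRoster.query(state, :roster)
# => ["ana", "bo"]
```

All validation lives on the command side, so every state change passes through one place where consistency rules can be enforced.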
At this point, you start to worry about concurrency. Is my state owned by a single process, owner or user? From how many sources will operations come? How can I represent changes safely and avoid conflicts, inconsistencies and race conditions?
To be fair, most languages lag a little behind in this respect. When runtimes like the JVM were adapted to the web, their concurrency model didn’t keep up gracefully, and concurrency was still delegated to the database. Libraries and frameworks have since caught up, and we can use models like the Actor Model to manage concurrency more easily. Elixir, running on the BEAM, comes with built-in concurrency primitives that make managing concurrency hassle-free.
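Here is a small sketch of what that buys you, using only Elixir’s standard library. Many concurrent tasks increment one counter held by an `Agent` process. The process handles its messages one at a time, so the updates are serialized without explicit locks and none is lost. The module name `ConcurrentCounter` is hypothetical.

```elixir
defmodule ConcurrentCounter do
  # Spawns `n` tasks that each increment a shared counter once, then returns
  # the final value. The Agent process owns the state and applies the updates
  # one message at a time, so no increment is lost.
  def run(n) when is_integer(n) and n > 0 do
    {:ok, counter} = Agent.start_link(fn -> 0 end)

    1..n
    |> Enum.map(fn _ -> Task.async(fn -> Agent.update(counter, &(&1 + 1)) end) end)
    |> Enum.each(&Task.await/1)

    result = Agent.get(counter, & &1)
    Agent.stop(counter)
    result
  end
end

ConcurrentCounter.run(1_000)
# => 1000
```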

[Image: receive example]

Elixir is powered by lightweight processes that can hold state. In the BEAM, Elixir’s runtime, processes are isolated, share nothing with each other and have a small memory footprint. Processes communicate through message passing, and the messages sent to each process are queued in its mailbox. The accompanying example shows how a process can live for a long time, holding state and waiting for messages that read or modify that state.

A message of the form {:get, pid} asks the process to send its state back to the caller’s PID, while {:update, new_state} replaces the current state with new_state. The result of the receive block is piped into the loop() function, so the process loops indefinitely and stays idle while it waits for a new message. More details can be found here.
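The original image is not reproduced in the text, so the following is a reconstruction based on the description above, not the author’s exact code. The module name `StateLoop`, the `start/1` helper and the `{:state, state}` reply format are assumptions.

```elixir
defmodule StateLoop do
  # Start a process whose only job is to hold `initial_state`.
  def start(initial_state), do: spawn(fn -> loop(initial_state) end)

  # Wait for a message, react to it, then recurse with the (possibly new) state.
  # The recursive call is a tail call, so the loop runs indefinitely without
  # growing the stack. Between messages the process sits idle in `receive`.
  def loop(state) do
    receive do
      {:get, pid} ->
        # Reply to the caller and keep the current state.
        send(pid, {:state, state})
        state

      {:update, new_state} ->
        # Replace the state with the one carried by the message.
        new_state
    end
    |> loop()
  end
end

pid = StateLoop.start(%{count: 0})
send(pid, {:update, %{count: 1}})
send(pid, {:get, self()})

receive do
  {:state, state} -> state
end
# => %{count: 1}
```

Because messages from one sender to one receiver arrive in the order they were sent, the update is always handled before the get in this usage.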

[Image: data modeling through process]

This example is verbose because it uses the built-in mechanism for holding state directly. Abstractions like Agents and GenServers help you build a cleaner representation, depending on your needs. Still, this gives us the foundation for holding state in our application and starting to model our data. The accompanying picture shows a more concrete example, where each square is a process holding state.

The connections are not hardwired and can be as flexible as you want. An arrow simply means that one process holds the identifier of another process, called a PID, so it knows where to send messages. It can then send messages to that process to modify its state, optionally including its own PID so it can receive a response when applicable. The classes a student holds are not a redundant copy of the class data. The class process controls its own concurrency, and the student confirms its enrolment only after receiving a message back from the class confirming it. This is a fairly granular approach, but a coarser-grained one could be applied as well.
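The enrolment flow described above could be sketched with GenServers, the abstraction over the hand-written receive loop mentioned earlier. In this hypothetical version, the class process owns its roster and capacity, and the student records a class PID only after the class replies `:confirmed`. The module names, messages and capacity rule are illustrative assumptions. `GenServer.call/2` sends the caller’s PID along implicitly, so the reply can come back.

```elixir
defmodule SchoolClass do
  use GenServer

  def start_link(capacity), do: GenServer.start_link(__MODULE__, capacity)

  @impl true
  def init(capacity), do: {:ok, %{capacity: capacity, students: MapSet.new()}}

  # The class owns its roster, so it alone decides whether a seat is free.
  @impl true
  def handle_call({:enroll, student_pid}, _from, state) do
    if MapSet.size(state.students) < state.capacity do
      {:reply, :confirmed, %{state | students: MapSet.put(state.students, student_pid)}}
    else
      {:reply, :full, state}
    end
  end
end

defmodule SchoolStudent do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)
  def enroll(student, class), do: GenServer.call(student, {:enroll, class})
  def classes(student), do: GenServer.call(student, :classes)

  @impl true
  def init(name), do: {:ok, %{name: name, classes: []}}

  # The student holds only the class PID and records it only after
  # the class process confirms the enrolment.
  @impl true
  def handle_call({:enroll, class}, _from, state) do
    case GenServer.call(class, {:enroll, self()}) do
      :confirmed -> {:reply, :ok, %{state | classes: [class | state.classes]}}
      :full -> {:reply, {:error, :full}, state}
    end
  end

  def handle_call(:classes, _from, state), do: {:reply, state.classes, state}
end

{:ok, class} = SchoolClass.start_link(1)
{:ok, ana} = SchoolStudent.start_link("Ana")
{:ok, bo} = SchoolStudent.start_link("Bo")

SchoolStudent.enroll(ana, class)
# => :ok
SchoolStudent.enroll(bo, class)
# => {:error, :full}
```

A coarser-grained variant could keep all classes of a school in a single process. That trades some parallelism for simpler coordination.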
You may notice at this point that there’s no mention of database modeling or persistence. That’s because you shouldn’t worry about it yet: persistence should follow from the data modeling, not the other way around. Once you’ve modeled a piece of your domain, the persistence operations should come more naturally and be less tied to a single type of database. You can then consider which kind of database suits your data, leaving room for non-relational databases as well.