Inside Pica Pica’s stock API architecture

After a few requests about how the API works, I thought it would be nice to outline it a bit more in a blog post. As you might've noticed I'm not the biggest writer, but this seemed like a good topic. I'm very, very sorry if this blog post is somewhat boring, but I'd rather explain things properly than cut corners.


Microservices everywhere

I'm a big believer in the single responsibility principle, and that shows in the API's setup. For every data source you create in Pica Pica, a dedicated process is started that handles only its assigned data source. There is no overlap with other data sources at all; they simply don't interact with each other.

Aside from the data sources you create, a couple of other processes are started as well. The newest addition, for example, is the event_logger, which makes sure all events across the processes are logged in their respective log files, and which sends new events out through a socket.io connection. Another is the admin process, which handles everything from authenticating the administrator to returning the logs and the available data sources.
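
To give you an idea, a minimal sketch of what the event_logger could look like follows below. The port, the event name and the log file paths are all my assumptions for illustration, not the actual implementation:

    // Sketch only: port, event name and paths are assumptions.
    const fs = require('fs');
    const io = require('socket.io')(4010); // standalone socket.io server, assumed port

    function logEvent(processName, message) {
      const line = `${new Date().toISOString()} [${processName}] ${message}\n`;

      // Append the event to the process's own log file...
      fs.appendFile(`/var/log/${processName}.log`, line, (err) => {
        if (err) console.error(err);
      });

      // ...and broadcast it to any connected socket.io clients.
      io.emit('event', { process: processName, message: message });
    }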

Of course, to keep everything running smoothly and stably, two higher level processes run on the system as well.

The first is the nginx process. For those not familiar with nginx: it's a web server, or in layman's terms a service that (in our case, simply put) bundles all of our API processes together and creates a single point of access for them. It's the service you reach when you go to http://localhost:3000/api/admin/, and it routes the request to the right API (in this case the admin API).
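
A location-based setup along these lines would do that kind of routing. The ports and upstream addresses here are assumptions for illustration, not the real configuration:

    # Hypothetical sketch; ports and addresses are assumptions.
    server {
        listen 3000;

        # Requests to /api/admin/ go to the admin process...
        location /api/admin/ {
            proxy_pass http://127.0.0.1:4000/;
        }

        # ...and each data source process gets its own location block.
        location /api/my_data_source/ {
            proxy_pass http://127.0.0.1:4001/;
        }
    }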

The last, but equally important, process is Supervisor. It starts up all the data sources, the event logger and nginx, and then makes sure they all stay running. So if one of the processes should crash, Supervisor restarts it to make sure everything keeps on running.
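
For illustration, a supervisord program entry along these lines would give exactly that behaviour; the command and file paths are assumptions, not the real config:

    ; Hypothetical entry; command and paths are assumptions.
    [program:event_logger]
    command=node /app/event_logger/index.js
    autostart=true
    autorestart=true
    stdout_logfile=/var/log/event_logger.log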

Whenever you execute docker logs your_api_name you will get to see the output of Supervisor. It tells you which processes have started, crashed, restarted and stopped. Very nifty!


The API architecture

When we talk about the API there are two things I will focus on: first, what technology I'm using, and second, how the API works on an abstract level.

The technology I'm using is JavaScript on Node.js 6.11.1 LTS, like many other services out there at the moment. I chose JavaScript because it's currently one of the most popular languages around and it's highly customisable without too many restrictions. The default database technology is SQLite, which performs well under quite heavy load and still has some flexibility to squeeze a little more out of it.

Besides that, I use a range of packages within the API to make sure it all works as it does now; packages for unit tests, end-to-end tests and code coverage, for example, are all part of that.

The second part is much more interesting as now I’m going to explain how the API works on an abstract level.

I personally think that one of the best approaches to architecture is the one Uncle Bob describes in his article(s) about Clean Architecture. In essence it all comes down to layers that are interchangeable without too much fuss.

So, for example, when a service needs the database, it calls an intermediary file instead (called the repository). The repository then decides which database adapter to call. This means that if we ever want to change our database engine from SQLite to, say, MongoDB, we can simply create a new database adapter for MongoDB and point the repository to that one instead. The change is so trivial that it doesn't affect any code other than the repository (and perhaps a configuration file or two).
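
Here is a minimal sketch of that repository idea. The adapter modules and method names are illustrative assumptions, not the actual Pica Pica source:

    // Sketch only: adapter modules and method names are assumptions.
    const sqliteAdapter = require('./adapters/sqlite');
    // const mongoAdapter = require('./adapters/mongo'); // the only swap needed

    const repository = {
      // Services only ever talk to this object; the adapter behind it
      // can change without them noticing.
      adapter: sqliteAdapter,

      create(table, data) {
        return this.adapter.insert(table, data);
      },

      find(table, id) {
        return this.adapter.findById(table, id);
      }
    };

    module.exports = repository;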

This loose coupling not only makes the API easier to maintain, but also much easier to test, since (almost) every component is self-contained.

The main thing to take away is that Pica Pica's stock API has a stable and customisable architecture that makes it really simple to adapt to any situation.

High-level path taken per request

So let's take the example of a user creating a new recipe record in the API. What happens from start to finish…

  1. The request comes in at the API’s controller.
  2. The controller fires off a new use case with the details of our request. In this case the domain will be recipe, and the action will be create. It also passes in the current user (if logged in), the respond function, and the service the use case needs, which is the ability to create records. Let's call that service serviceCreate.
  3. The use case for create will first validate the data against the domain (recipe) to make sure all data is valid, before sending the domain instance off to the serviceCreate function.
  4. Then the serviceCreate function will send the data to the repository to be inserted.
  5. The repository will then determine which database adapter to use (SQLite by default), and send the create request to it.
  6. After a successful insertion, it will return an instance of the created record, which is picked up by the use case again.
  7. Lastly, the use case will return the record through the respond method given by the controller.
  8. The controller then finally outputs it back to the user in the format they want the data in, which by default is JSON.
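
To make these steps a bit more concrete, here is a condensed sketch of that flow. All names below (createUseCase, validate, serviceCreate) and the Express-style req/res objects are illustrative assumptions, not the actual Pica Pica source:

    // 1-2. The controller receives the request and fires off the use case.
    function recipeController(req, res) {
      createUseCase({
        domain: 'recipe',
        data: req.body,
        user: req.user,                         // current user, if logged in
        respond: (record) => res.json(record),  // JSON output by default
        service: serviceCreate                  // the ability to create records
      });
    }

    function createUseCase(options) {
      // 3. Validate the data against the domain before handing it off.
      const instance = validate(options.domain, options.data);

      // 4-6. The service sends the data through the repository, which picks
      //      the database adapter (SQLite by default) and returns the record.
      options.service(options.domain, instance).then((record) => {
        // 7. Hand the record back through the controller's respond function.
        options.respond(record);
      });
    }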

And that's all there is to it. It seems complicated, but in reality all of this happens so fast that it doesn't noticeably delay the request. The beauty of it is that we can change every individual part of the application without breaking anything.

For example, if we want to output HTML instead of JSON, we can just change the respond function passed into the use case. Or if we want to change the way data is created, we can change the serviceCreate service, or the create use case.
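
As a tiny, hypothetical illustration, swapping the output format could be as small as this (renderRecipe is an assumed template helper, not part of the API):

    // Respond with HTML instead of JSON; renderRecipe is an assumed helper.
    const respond = (record) => res.send(renderRecipe(record));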

Of course we need to be careful about some things. The controller, for example, knows about the use case and what it can expect from it, but the use case has no idea about the outside world. This means the use case has a single responsibility and cannot go beyond it. We can't just start including random functionality from outside the use case's reach, as that would compromise the loose coupling, and with it the maintainability.

Let me explain with a situation: creating new users. New users need a hashed password, but where does the hashing actually need to happen? You could argue it should happen in the database adapter or the service, as that's what actually does the creating. But if we do that, then who does the authentication and the verification of the password hash when we need it? Since database adapters and services have no idea what's going on beyond their own scope, this would complicate things considerably.

We want to do this in the use case, but as we said before, we don't want the use case to use functionality outside its own scope. So putting the hashing in the use case directly isn't a great plan either, as then every use case would need to know how to hash and verify passwords. You perhaps feel this coming already: we need to put the functionality to hash (and verify) the password inside the controller, and pass it on to the use case. That way the use case can make use of it, while the controller is the one that knows how it works. Once we start using the authentication layer of the API, the controller is the one that needs to do the authenticating, so that level is the logical place for password hashing and verification.
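
Here is a sketch of that idea. The bcrypt package and every function name below are my own assumptions for illustration, not the API's actual code:

    // Sketch only: bcrypt and all names here are assumptions.
    const bcrypt = require('bcrypt');

    // The controller owns the knowledge of *how* hashing works...
    const hashPassword = (plain) => bcrypt.hash(plain, 10);

    function userController(req, res) {
      createUserUseCase({
        data: req.body,
        respond: (record) => res.json(record),
        service: serviceCreate,
        hashPassword // ...and hands the ability down to the use case.
      });
    }

    function createUserUseCase(options) {
      // The use case applies the hashing it was given, without knowing
      // anything about bcrypt or the hashing strategy behind it.
      options.hashPassword(options.data.password)
        .then((hash) => {
          options.data.password = hash;
          return options.service('user', options.data);
        })
        .then(options.respond);
    }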


Conclusion

I hope my ramblings make a bit of sense, as I've tried to make this as clear as I can. I've been putting off explaining it, as it's usually a bit tedious to walk through. The main thing I want to stress about this architecture is that since all components only look further inward, it's very easy to understand, to maintain and, most of all, to customise.
