How the ideal cloud-native framework should be built (part 2)
In the first post, I presented what, in my opinion, the ideal framework/stack should include and why. In this post, I'll go deeper into the technical details and explain precisely how we can build it.
The backend
First and foremost, there is the backend itself, which will contain the CRUD libraries. I won't go too deep into this subject because I already wrote another post about it, but I shall say more about the language choice: Golang.
Golang is a very promising language: almost as fast in execution as C, multi-threaded, and statically typed. Moreover, it's a fantastic language for beginners; not only is it quite simple, it is also very unforgiving, which is a good thing. You're kinda forced to follow development best practices, because the bad ones are simply not supported. This may be a major source of criticism of Golang, but I find it fantastic.
Read my other post to learn more, but one example is reflection. In almost all languages, you'd use reflection, which is slow at execution, without even realizing it. In Golang you can use it, but you have to explicitly say so; otherwise, you have to redesign your application the right way (for example, using code generation).
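To make this concrete, here is a tiny sketch (the User struct and its tags are invented for the illustration): reflection in Go is an explicit act, you import the reflect package and inspect types yourself, nothing happens behind your back.

```go
package main

import (
	"fmt"
	"reflect"
)

// User is a hypothetical model; the db tags mimic what an ORM
// or a code generator would read.
type User struct {
	Name  string `db:"name"`
	Email string `db:"email"`
}

func main() {
	// Using reflection means importing "reflect" and walking the
	// type yourself: the cost is visible, never hidden.
	t := reflect.TypeOf(User{})
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		fmt.Printf("column %q maps to field %s\n", f.Tag.Get("db"), f.Name)
	}
}
```

A code generator would produce the equivalent mapping at build time instead, trading this runtime cost for a compilation step.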
I want to avoid reinventing the wheel as much as possible, but I'm not aware of any good Golang framework yet. I considered Buffalo, Micro, and Gin, but I didn't feel they were what I was looking for. Gin seems to be gaining a lot of traction, but some important concepts are missing (no protobuf at the heart of the objects, no code generation, etc.).
For the microservice chassis, Go kit, on the other hand, seems to bring everything we are looking for while being very modular, and it shall be one of the dependencies we use.
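To give an idea of the shape a Go kit service takes, here is a minimal sketch; the getUser endpoint and its stubbed logic are invented for the illustration, and in our case the HTTP transport would typically be swapped for gRPC.

```go
package main

import (
	"context"
	"encoding/json"
	"net/http"

	"github.com/go-kit/kit/endpoint"
	httptransport "github.com/go-kit/kit/transport/http"
)

type getUserRequest struct{ ID string }
type getUserResponse struct{ Name string }

// makeGetUserEndpoint wraps plain business logic into Go kit's
// endpoint.Endpoint abstraction, which any transport can expose.
func makeGetUserEndpoint() endpoint.Endpoint {
	return func(_ context.Context, request interface{}) (interface{}, error) {
		req := request.(getUserRequest)
		return getUserResponse{Name: "user-" + req.ID}, nil // stubbed logic
	}
}

func main() {
	handler := httptransport.NewServer(
		makeGetUserEndpoint(),
		// Decode the incoming HTTP request into our request struct.
		func(_ context.Context, r *http.Request) (interface{}, error) {
			return getUserRequest{ID: r.URL.Query().Get("id")}, nil
		},
		// Encode the response struct back to JSON.
		func(_ context.Context, w http.ResponseWriter, resp interface{}) error {
			return json.NewEncoder(w).Encode(resp)
		},
	)
	http.Handle("/user", handler)
	http.ListenAndServe(":8080", nil)
}
```

The interesting part is how little is imposed: logging, tracing, or circuit breaking are middlewares you wrap around the endpoint only if you want them.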
The database
There are two databases we can consider.
The first is good old PostgreSQL. I already use it and love it, because it's the database behind Odoo. Reliable, supported by a robust open-source community (no single vendor behind it), it's probably the default choice for most projects nowadays, for good reasons.
PostgreSQL has only one issue: it doesn't scale horizontally. You can't just add another server to scale your database; you have to configure a master/replica setup, partition your data, etc. That's not easy, and not what we want in a microservice environment.
This is not really PostgreSQL's fault: horizontally scaling a relational database without giving up its guarantees is a notoriously hard problem. But these past years, some new databases have finally been solving this old challenge, and one of them is gaining a lot of traction: CockroachDB. It also uses the same SQL dialect as PostgreSQL, which means applications compatible with PostgreSQL have a good chance of being compatible with CockroachDB.
PostgreSQL is still useful in the more straightforward cases, and it is far more mature and reliable; this is why both shall be supported.
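That shared dialect is what makes supporting both realistic. Here is a minimal sketch showing that the exact same Go code talks to either one, only the connection string changes (the hostnames and database names are hypothetical).

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // the standard PostgreSQL driver works for both
)

func main() {
	// Swap one DSN for the other: the application code stays identical,
	// because CockroachDB speaks the PostgreSQL wire protocol and dialect.
	// dsn := "postgresql://app@postgres:5432/erp?sslmode=disable"
	dsn := "postgresql://app@cockroach:26257/erp?sslmode=disable"

	db, err := sql.Open("postgres", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var version string
	if err := db.QueryRow("SELECT version()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	log.Println(version) // reports PostgreSQL or CockroachDB accordingly
}
```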
Now I’ll say what shall not be considered: NoSQL. Especially MongoDB, which keeps being popular for some reason.
MongoDB is a schemaless database, which means you can insert any data, in any shape, without any control whatsoever. This may ease developers' lives during the initial phase, but once in production, it's a nightmare for database administrators. You also don't have constraints, which can easily lead to corrupted data.
Cassandra is a little better: you define the schema first. But you still don't have constraints or foreign keys to prevent corruption.
I understand there are some use cases where you want unstructured data (a list of product attributes, for instance) in JSON format, but PostgreSQL now has a JSONB column type for this purpose, providing the same indexing features as MongoDB.
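As a sketch of how far that gets you (the product table is invented for the example), structured and unstructured data can live in the same PostgreSQL table, with a GIN index keeping lookups inside the JSON fast.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Hypothetical local database, just for the demonstration.
	db, err := sql.Open("postgres", "postgresql://app@localhost:5432/erp?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Constrained, structured columns next to a free-form JSONB one.
	stmts := []string{
		`CREATE TABLE IF NOT EXISTS product (
			id    serial PRIMARY KEY,
			name  text NOT NULL,
			attrs jsonb NOT NULL DEFAULT '{}'
		)`,
		`CREATE INDEX IF NOT EXISTS product_attrs_idx ON product USING gin (attrs)`,
		`INSERT INTO product (name, attrs) VALUES ('T-shirt', '{"color": "red", "size": "M"}')`,
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			log.Fatal(err)
		}
	}

	// Query inside the JSON document, exactly like a document store would.
	var name string
	if err := db.QueryRow(
		`SELECT name FROM product WHERE attrs @> '{"color": "red"}'`,
	).Scan(&name); err != nil {
		log.Fatal(err)
	}
	log.Println(name)
}
```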
Many people use NoSQL databases because they are supposed to be either faster or horizontally scalable. If they are faster, it's because they have no constraints to check: configure PostgreSQL with a schema without constraints, and the speed will be comparable. And if you want horizontal scaling, you can look at new database engines like CockroachDB, which don't forfeit constraints.
At one meetup, I spoke with an entrepreneur who had developed an accounting cloud service for freelancers. When I asked which database he used, he said MongoDB. For accounting. I thought about the countless times database constraints have saved my customers in production. Let's not be that guy.
The admin interface
In an ERP system, the admin interface is the web interface you use to browse your data. You just log in and go through all the resources you have access to via their corresponding menus.
In a microservice architecture, keeping the same experience may prove to be a challenge. We want a single admin interface where employees connect, but there will be dozens of independent, loosely coupled services providing the data behind it. How shall we design this?
As with the custom web frontend we will talk about later, we shall look for a headless frontend, independent from its backend. I like the React Admin project, which does precisely this: it provides a headless admin interface and connects to any API to fetch the data and execute CRUD operations.
I love it. It's still at quite an early stage, but it has a good foundation. I'm sure that, given some time, it can evolve into something equivalent to the far more advanced web interface of Odoo, without being tied to a particular backend.
The API and the API protocols
We spoke about the backend, but we didn't say how we would contact it to execute our CRUD operations.
The most used API protocol today is REST, which is quite good but has some limitations. For instance, you can only execute one operation per request, so a lot of the design complexity ends up being managed by the clients who use the API.
Another API protocol appeared these past years and looks very promising: GraphQL. It aims to bring a query language to your API, just like SQL did for databases, adding a lot of flexibility while making the API easier to use. Clients ask for, and get, exactly what they want and nothing more, reducing both the number of requests and unneeded compute on the backend.
It's also really good for API composition: you have only one entry point, the GraphQL gateway, which redirects your requests to the corresponding services behind it.
For the public API, GraphQL is perfect because it's easy to use for clients and external developers. But for internal communication, where our main concerns are speed and network reliability, there is another protocol we want to use: gRPC. gRPC is harder to use because it's statically typed and relies on code generation, but thanks to this, it brings unrivaled speed compared to REST and GraphQL. It also works exceptionally well with Golang (both come from Google).
In practice, you want your GraphQL gateway to receive the request and convert it into gRPC requests sent to your backend services. All inter-service communication shall also happen over gRPC.
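Here is a rough sketch of what such an internal gRPC call looks like from Go; the userpb package and its UserService are hypothetical stubs that protoc would generate from a .proto file.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"

	// Hypothetical package generated by protoc from a user.proto
	// that defines a UserService with a GetUser RPC.
	pb "example.com/erp/gen/userpb"
)

func main() {
	// The GraphQL gateway (or any other service) dials the user
	// service once and keeps the connection alive.
	conn, err := grpc.Dial("user-service:50051", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := pb.NewUserServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// A statically typed call: the request and response structs come
	// straight from the generated code, so mismatches fail at compile time.
	resp, err := client.GetUser(ctx, &pb.GetUserRequest{Id: "42"})
	if err != nil {
		log.Fatal(err)
	}
	log.Println(resp.GetName())
}
```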
Finally, we mentioned the GraphQL protocol, but we didn't say how our CRUD functions shall be named. Is there any standard? The GraphCool / Prisma teams created a well-thought-out specification for this, called OpenCRUD. I think we shall follow it, so the work done here will also be compatible with other GraphQL backends like GraphCool and Prisma.
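To illustrate the naming scheme, here is a sketch of an OpenCRUD-style query sent to the gateway from Go; the endpoint URL and the users schema are assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	// OpenCRUD-style naming: a plural `users` query with a `where`
	// filter argument, asking only for the fields we need.
	query := `{
		users(where: {name_contains: "Ada"}) {
			id
			name
		}
	}`

	body, _ := json.Marshal(map[string]string{"query": query})
	resp, err := http.Post("http://gateway:8080/graphql", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	data, _ := ioutil.ReadAll(resp.Body)
	log.Println(string(data))
}
```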
In the end, this is how our architecture will look: the admin interface connects to the GraphQL gateway, which itself decomposes the request and forwards it over gRPC to the corresponding services handling the CRUD operations.
The authentication
In order to have a proper ERP-like system, there is only one thing missing: the authentication system.
This will be another uncommon challenge, because the authentication system needs to span the whole architecture. You authenticate in the admin interface, access is granted at the API gateway, and the backend services verify that you have the rights to perform the operations, thanks to the access rules. All of them need to talk to the authentication system to do so, which is why it needs to be an independent service.
One authentication system I like for this purpose is ORY Hydra. It implements OAuth2, so all the complexity inherent to authentication is centralized in the Hydra service, but that service delegates to another "consent" service to control access. This consent service can be anything, and thanks to this design, we keep full control over the design of our authentication service.
This means the consent service handles the login page, the user list, their passwords, their access rules, etc., and then hands control back to Hydra. The other services then connect to the Hydra service to perform their authentication tasks.
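As a sketch of what those other services would do, here is a minimal Go function asking Hydra whether an access token is still valid, using the standard OAuth2 token introspection flow (the Hydra admin URL is an assumption to adapt to your deployment).

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"net/url"
	"strings"
)

// introspect asks Hydra whether an access token is active, following
// the standard OAuth2 token introspection flow (RFC 7662).
func introspect(token string) (bool, error) {
	form := url.Values{"token": {token}}
	resp, err := http.Post(
		"http://hydra:4445/oauth2/introspect", // hypothetical admin endpoint
		"application/x-www-form-urlencoded",
		strings.NewReader(form.Encode()),
	)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var result struct {
		Active  bool   `json:"active"`
		Subject string `json:"sub"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return false, err
	}
	return result.Active, nil
}

func main() {
	ok, err := introspect("some-access-token")
	if err != nil {
		log.Fatal(err)
	}
	log.Println("token active:", ok)
}
```

A backend service would run this check (or verify a token locally) in a middleware, then apply its own access rules before executing the operation.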
The clients
Now that everything is in place to resolve the ERP use case, let's check what's missing for a regular web application. We already have everything we need on the backend side, accessible through a great API. The only thing missing is a public interface for our service: a website, a mobile application, an IoT device, etc.
You can use pretty much whatever you want here, since you will get your data through the API. Still, I recommend React for building websites, thanks to its innovative concepts and great community. Also, with the Apollo libraries, it's at the forefront of GraphQL clients.
Another reason is related to mobile applications: you can use React Native to create native applications on both Android and iOS. This also opens new possibilities: centralizing the code for business logic and API communication in the same functions, then providing different views depending on the platform. You could develop your website and your mobile applications at the same time, sharing the same codebase!
I don't want to go into too much detail here; the other topics are more important for now, and I don't want to overthink this. Let's just say that I'm strongly considering Next.js for the React side: it takes care of a lot of pain points and provides isomorphic (server-rendered) websites, which are extremely important for ranking well on search engines.
The cloud-native tools
Once we have all the components of our stack, it'll be time to set up the environment to run it.
For this, we shall look at the Cloud Native Computing Foundation (CNCF), which is now the organization stewarding the best tools in the microservices community. Have a look at their trail map too; it does a pretty good job of summarizing each step to go cloud-native.
Once all the services are containerized, we'll need to set up the orchestrator that will pilot our server farm. Kubernetes, the first and main project of the CNCF, is already the standard component for this job and is being deployed everywhere. You can install it yourself on your servers, or use the managed Kubernetes service of your cloud provider.
Kubernetes will take the containers of your services and deploy them on your servers, making sure you can easily replicate them when you need to scale, managing intra-service communication, etc.
In the previous post, I mentioned how our CRUD framework should be able to generate events so other services can subscribe to them. Those events and subscriptions need to be managed by a specialized, scalable, and reliable messaging system. Here again, the CNCF provides us with a tool for this: NATS.
When two services need to communicate, it will either be a synchronous call through gRPC or an asynchronous one through NATS. The first service emits an event, and the subscribing services get the data and do their work without the first service even knowing.
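Here is a minimal sketch of that asynchronous side using the official Go client; the subject name and cluster address are invented for the example.

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://nats:4222") // hypothetical cluster address
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// A subscribing service reacts to events without the publisher
	// ever knowing it exists.
	if _, err := nc.Subscribe("user.created", func(m *nats.Msg) {
		log.Printf("received event: %s", string(m.Data))
	}); err != nil {
		log.Fatal(err)
	}

	// The CRUD service publishes an event after a successful write.
	if err := nc.Publish("user.created", []byte(`{"id": "42"}`)); err != nil {
		log.Fatal(err)
	}
	nc.Flush()
	// A real subscriber would block here (e.g. select {}) instead of exiting.
}
```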
Then you need a reliable observability system to know what's going on. In a multi-service, multi-threaded environment, you can't just check one service to see if an error was thrown; you need to set up a global platform that monitors everything:
- The logs of your services will be collected and centralized by Fluentd
- Metrics (server load, request latency, etc.) will be stored in Prometheus (see the sketch after this list)
- Each stage of every request will report to Jaeger to provide distributed tracing. This lets us see the whole journey of a request and instantly spot which service went wrong.
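As a small taste of the metrics part, here is a sketch of a Go service exposing a Prometheus counter; the metric name and route are illustrative.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A counter Prometheus will scrape, labeled by request path.
var requests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Number of HTTP requests handled, by path.",
	},
	[]string{"path"},
)

func main() {
	prometheus.MustRegister(requests)

	http.HandleFunc("/user", func(w http.ResponseWriter, r *http.Request) {
		requests.WithLabelValues(r.URL.Path).Inc() // count each request
		w.Write([]byte("ok"))
	})

	// Prometheus scrapes this endpoint periodically.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```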
With this platform in place, you shall be alerted when a service faces too much load and needs to be scaled, and even when a new deployment causes a sudden rise in errors and needs to be rolled back. Without it, once in production, you will be blind when an error appears, with no way to understand what is happening.
Finally, you'll need to manage the communication between your services; this is the role of a service mesh like Envoy and Istio. It will encrypt all communication so you can build a zero-trust network, report each request to your observability system, cut communication when a service is too slow to answer (circuit breaking), etc.
Note that most of these tools are written in Golang. This is another major reason why I want to use this language for the backend: the cloud-native community and the Golang community are largely the same people, and it makes so much sense to be among them.
Setting up this whole environment shall be an essential part of the stack, with the installation made as easy as possible. This is what's currently hard to do alone when you set up a microservice environment, and yet it is often overlooked in most stacks and tutorials.
The development process
The last part of our architecture is how we will manage the evolution of our code. We need a VCS to store the code, host the team's private discussions and merge requests, and a reliable CI/CD solution to test the code, build the containers, and ship them to Kubernetes.
I have to admit, I'm a huge fan of GitLab. Most of the product is open source, and it provides almost everything you need for your DevOps process, from code hosting to CI/CD, vulnerability scanning, etc. And as a company, they have an amazing story: entirely remote, usually open-sourcing popular features when the community asks for them, and even their internal company documentation is publicly available.
Given their dedication to open source, I don't understand why most open-source projects are still using GitHub. It's beyond me.
This is why I'd like the development of the stack itself to happen on GitLab.com and avoid GitHub. I understand this may cut us off from many potential contributors, but using GitLab makes so much more sense, by conviction and simply because it shall itself be part of the stack.
About the license
Finally, what shall be the legal setting of this stack? For such a tool, we shall use a permissive open-source license like the Apache License, which allows anyone to do whatever they want with it while also protecting against patent claims.
I don't want this technology to be backed by a single company acting as its vendor. I don't plan to make any money from it; all I care about is having the tools I need to do my job.
If we can make it happen, we will be able to create awesome paid products and services easily enough with the stack. The stack shall be backed by a community, like the cloud-native tools I mentioned above, like PostgreSQL, and in the long run, I can only hope it will become one of the projects hosted by the respected open-source foundations, like the CNCF or the Linux Foundation.
I don't know whether the prototype I made will be useful, or whether we will manage to create the community it needs. But I'm sure of one thing: someone has to do it; we are all waiting for something like this.
We will need more people, more contributors, more companies willing to fund this research. If you agree with this vision, don't just read this post: let's chat! Let us know what your challenges are and how we could solve them. Contact me, join the chatroom on http://empower.sh, star the prototype on https://gitlab.com/empowerlab/example, and share this post around you.
Thank you for your attention, I hope you found what you were looking for.