Software Systems at Mizu

Turgay Özgür
Published in ÇSTech · 15 min read · Jan 25, 2022


Software systems can be thought of as an organism whose parts work in harmony: team members, the software itself, tools, principles, best practices, and so on.

In this article, we will dive into the parts of that organism: the ideas, the lessons learned, and some other details of the software system designed and maintained at Mizu.

Take Control at Scale

In software, the smallest piece from which the entire system is built is a single line of code. Start thinking there: taking control at that level is a must, and it is the building block for taking control at scale.

Clean Code

What makes code clean? Naming, conventions, approaches, principles, patterns… There is far more than can be covered in a single section, or even a single article. However, some of the following parts of this article will give you an idea of how clean code is one of our main concerns.

Coding and Application Level Standards

As software developers, we work on many projects, even within the same company. A service, an implementation, or even a single line of code written by one developer should be easy for another to understand. That is one of the main concerns of coding standards.

On the other hand, application-level standards should also be considered. To take full advantage of standards, define them at both the code level and the application level.

At Mizu, we have created a Git repository to keep the standards organized and maintainable. Having a static site for the standards makes it easy to check them quickly.

The advantages of having standards:

  • The code itself and the application can easily be understood by developers.
  • New team members can quickly adapt to the applications developed by the team.
  • Adaptation becomes easier for the other team members responsible for analysis, testing, deployment, and so on.
  • Some possible bugs are eliminated at the development stage.
  • They prevent overthinking the same concepts again and again while developing.
  • They make it possible to discuss and evolve the standards.

Some of the topics to consider:

  • Naming, formatting.
  • Configuration, error handling, caching, logging, monitoring.
  • Authentication/authorization.
  • Localization/globalization.
  • Communication. (e.g. REST, RPC, Messaging.)
  • Routing. (e.g. deciding on a RESTful maturity level.)
  • Application architecture. (e.g. N Layer, Clean Architecture.)
  • Project/Folder structure. (Depending on the programming language.)

Consider configuring an .editorconfig file that enforces the formatting standards you have defined.
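For example, a minimal .editorconfig might look like the following (the values here are illustrative, not our actual standards):

```
# Top-most EditorConfig file for the repository.
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = space
indent_size = 4

[*.{json,yml,yaml}]
indent_size = 2
```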

Expand the standards to every aspect of the technology you maintain: Development, Test Automation, DevOps, Infrastructure, Security, and Data.

Documentation


The standards I mentioned above can be thought of as part of the documentation. However, there is more to write down in the documentation:

  • Usage details for the internal libraries.
  • How-tos.
  • Solutions for incidents.
  • Standards.

We prefer https://docusaurus.io/ as our documentation tool.

Project Template

The code itself turns into a living example of the standards when it is developed by following them.

Most of the time, when we look at code written by another developer, it is easy to assume that the code is the source of truth and safe to copy or take inspiration from. When it comes to standards, project templates are one of the things that keep developers from copying code that does not follow them: the template itself is a trustworthy starting point.

At Mizu, we have created project templates for several languages, such as .NET and Golang. These templates come with examples that follow the standards, making it easy to start a new project rapidly.

Don’t overthink it, and don’t try to implement everything in the project template. Keep the template as simple as possible and make sure it is always up to date. Yo!

We prefer https://yeoman.io/ as our project template engine.

Clean Architecture

Web API and MVC projects must have a well-defined project structure and a well-defined dependency direction between the project layers. This is where the clean architecture principles come in.

Clean architecture has become a common approach for modern web applications. It is also known as onion architecture.

Representation of the dependencies between layers of clean architecture, by Microsoft.

There are two GitHub projects implementing clean architecture that we follow when designing internal APIs and web projects: jasontaylordev and ardalis.
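The heart of clean architecture is the dependency rule: source-code dependencies always point inward. A minimal sketch in Go, with illustrative names of our own (not taken from those repositories), could look like this:

```go
package main

import "fmt"

// Inner layer (use cases / domain): defines the abstraction it needs.
type OrderRepository interface {
	FindTotal(orderID string) (float64, error)
}

// The use case depends only on the interface, never on infrastructure.
type GetOrderTotal struct {
	Repo OrderRepository
}

func (uc GetOrderTotal) Execute(orderID string) (float64, error) {
	return uc.Repo.FindTotal(orderID)
}

// Outer layer (infrastructure): implements the interface defined inside,
// so the dependency arrow points from the outside in.
type InMemoryOrderRepository struct {
	totals map[string]float64
}

func (r InMemoryOrderRepository) FindTotal(orderID string) (float64, error) {
	total, ok := r.totals[orderID]
	if !ok {
		return 0, fmt.Errorf("order %s not found", orderID)
	}
	return total, nil
}

func main() {
	uc := GetOrderTotal{Repo: InMemoryOrderRepository{totals: map[string]float64{"o-1": 42.5}}}
	total, _ := uc.Execute("o-1")
	fmt.Println(total) // 42.5
}
```

Swapping the in-memory repository for a real database implementation requires no change to the use case, which is exactly what makes this architecture easy to test and maintain.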

Don’t Push for DDD

Having anemic entities/domain objects may not be good practice at all. However, DDD is a lot more than that. Its other concerns include bounded contexts, aggregate roots, value objects, specifications, domain events, and so on. OO languages like C# and Java can easily reflect the principles of DDD. However, implementing DDD principles as-is may not be easy in some other programming languages. Is it still worth implementing?

We develop microservices or microservice-ready monolith applications. Most of them are simple CRUD services. Given their simple business requirements, it should not be that hard to develop them by following clean architecture and clean code principles.

In my opinion, don’t push every service/microservice to be developed on top of the DDD principles. Instead, consider implementing your business requirements with the functional domain objects and use cases described in the clean architecture principles. Trying to simplify things by using DDD may end up producing complicated microservices.

Focus on Performance and Maintainability

Performance

While designing and developing systems ready for high traffic, performance is one of the most important concerns. The programming language itself may help you gain performance, but there are many things to consider before thinking about which programming language to use:

  • What are the common best practices to gain performance improvements?
  • What are the programming language-specific best practices?
  • Which database technology is the best one for the current implementation?
  • Indexes, database queries.
  • Is it worth applying CQRS at the application/service level?
  • In-memory, distributed, or response caching. Which one is the best fit? Is it possible to use them together?
  • Startup/warmup time of the service. This is important for systems that need to scale in seconds and for serverless systems.

Maintainability

A fresh start on any project comes with new tools and technologies and excites us. If you are thinking of rewriting an existing project from scratch, you should know that it is not an easy task, and throwing away a mature, bug-free, and stable application may come at a huge cost. Instead, focus on adapting the existing application/architecture to the latest technologies and best practices part by part.

Stay away from big-bang changes to the software architecture!

Whether it is a fresh start or not, there are many principles and techniques, like the following, to ensure the application and the system itself stay maintainable.

Maintainable Code

  • Should be developed with guiding principles such as DRY, KISS, YAGNI, and SOLID in mind.
  • Should follow the coding standards mentioned above.
  • Should implement a well-defined, sustainable architecture, e.g. the clean architecture mentioned above.
  • Should be covered by unit and integration tests.

At Mizu, we prefer to push teams to develop integration tests for as many cases as possible instead of forcing a defined percentage of unit test coverage. Even so, unit tests run in the CI step and must pass.
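As a minimal sketch of what we mean by an integration-style test in Go (the handler and endpoint here are hypothetical), the test exercises a real HTTP round trip rather than a single unit:

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// healthHandler is a hypothetical handler standing in for a real service endpoint.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	w.Write([]byte(`{"status":"ok"}`))
}

// TestHealthEndpoint starts the handler in-process and asserts on the
// actual HTTP response, covering the scenario end to end.
func TestHealthEndpoint(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(healthHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		t.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200, got %d", resp.StatusCode)
	}
}
```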

Maintainable Architecture

  • Easily reflects technology changes.
  • Has decoupled components: microservices, isolated databases, and isolated technologies and tools, as much as possible.
  • Has strong monitoring, logging, and alert management capabilities.
  • Follows the everything-as-code approach.
  • Can deploy applications/services frequently and with zero downtime.

At Mizu, we are building decoupled microservices that can be deployed multiple times a day. There is no hard limit on deployments.

Be Polyglot, Not Over Polyglot

Building complex systems comes with many edge cases, and using only one programming language limits the system’s capabilities.


Sometimes, a library written in another language for a specific scenario may be better than the library available in the language already in use. The library may not even exist for the language we use. Performance concerns and resource usage can also be reasons to choose another programming language for some services/applications.

Microservices allow new services to be developed in any programming language, thanks to their isolated, standardized communication. However, developing with more than 3 or 4 languages brings more complexity than benefit.

Some side effects of using more than 3–4 programming languages:

  • Learning curve and development speed.
  • Adjusting the coding-level and the application-level standards for each one.
  • Implementing common libraries for each one.
  • Configuring build and deployment steps for each one.
  • Difficulties in finding developers for some of them.

Programming languages must be picked carefully. Wanting to give one a try is not enough; make sure the language covers some scenarios better than the languages already in use.

In the production environment, we use .NET, Golang, Node.js, and Python for backend services.

Don’t Develop a Framework, Develop Small Functional Libraries

At the beginning of every new project, there is always someone who says: “Let’s make a new framework. Implement everything in it and don’t let the developers write even a line of code.” Please, don’t do that.

Here are some reasons to stop trying to develop framework-like solutions:

  • Every code change requires an update and a deployment for every service/application using it.
  • A similar framework must be developed for every language in use in a polyglot architecture.
  • Keeping it up to date is an annoying task.
  • Some implementations end up as abstractions over abstractions, which is an antipattern.
  • Developer motivation will get worse.

Instead of developing framework-like solutions, consider developing libraries only for cross-cutting concerns and for the implementations that require careful design, such as caching, logging, monitoring, and exception handling.
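As an illustration of what “small functional library” means here, the following hypothetical Go package does exactly one thing behind a tiny API, a retry helper, instead of trying to own the whole application:

```go
// Package retry is a hypothetical example of a small, focused library:
// one cross-cutting concern, one tiny API, no framework around it.
package retry

import "time"

// Do runs op up to attempts times, sleeping delay between failed
// attempts, and returns the last error if every attempt fails.
func Do(attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return err
}
```

A service can adopt, replace, or drop such a library independently, which is exactly what a shared framework makes hard.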

Microservices

Microservice-based architectures have become common practice in architecture design. One of the most important benefits of microservices is horizontal scalability. They also allow independent configuration, deployment, and development, at the cost of additional complexity, and they bring difficulties around caching, consistency, and networking.

Here are some pointers, based on our experience, for creating an efficient microservice architecture:

  • Keep them as pure as possible. Don’t develop complex solutions for simple things.
  • Code duplication is not that bad. Sometimes it saves you from referencing extra dependencies and taking on more complexity.
  • Pick one or two programming languages as the main ones, and up to two or three more for special requirements.
  • Define code-level and application-level standards.
  • Run unit and integration tests on every commit. Consider expanding the tests with CDC and performance tests.
  • Make sure you split the databases as well, and don’t let one domain access a database owned by another domain.
  • Embrace eventual consistency where possible.
  • Consider creating a single repository for the files used in CI/CD pipelines.
  • Dockerize your applications and host them using container orchestration tools like Kubernetes.
  • Implement pull-based logging and monitoring approaches instead of pushing logs and metrics.
  • Consider using an APM or a service mesh that makes tracing easier.

Microservices might be the best fit for large-scale applications and large teams. For startups and small teams, consider designing an architecture that relies on a modular monolith approach, designed to be microservice-ready.

BFFs

We prefer to use BFF (backend for frontend) services and keep the other services private behind those BFFs and/or behind proxies.

A BFF service can turn into a single point of failure. Consider hosting a separate instance for each frontend, such as mobile and web. Even better, consider enabling path-based direct access to the services, via a service mesh or load balancers, for requests that don’t need to be modified for any frontend.

BFFs & Path-based routing.
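A minimal sketch of this path-based routing in Go (the hostnames and paths are illustrative): frontend-specific traffic goes through the matching BFF, while requests that need no frontend-specific shaping are proxied straight to the service.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// proxyTo builds a reverse proxy for the given upstream address.
func proxyTo(rawURL string) http.Handler {
	target, err := url.Parse(rawURL)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(target)
}

func main() {
	mux := http.NewServeMux()
	// Frontend-specific traffic goes through the matching BFF instance.
	mux.Handle("/mobile/", proxyTo("http://mobile-bff:8080"))
	mux.Handle("/web/", proxyTo("http://web-bff:8080"))
	// Requests that need no per-frontend modification bypass the BFFs.
	mux.Handle("/catalog/", proxyTo("http://catalog-service:8080"))

	log.Fatal(http.ListenAndServe(":80", mux))
}
```

In practice this routing lives in a service mesh or load balancer rather than a hand-written proxy, as mentioned above.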

Database Design & CQRS

Database design is quite a hard task. Choosing a relational database is usually the fast and sufficient option. However, NoSQL options such as document stores, key-value stores, and event stores are the other choices in this area.

Splitting the database between commands and queries is a suitable option for workloads whose query and command traffic behave differently.

The preferred option at Mizu is MySQL, a relational database. Every service/application has isolated database(s) accessible only to the service itself or to the domain the service belongs to. This approach makes the database side more scalable and failure tolerant.

We prefer to follow CQRS design principles on both the code side and the database side. The database technologies we are familiar with are MySQL, MSSQL, Elasticsearch, Redis, MongoDB, and S3.

Applying CQRS on the code level.

Applying CQRS at the code level does not by itself bring scalability for both commands and queries. However, the approach allows easy splitting at the database level, and at the service level as well, when needed.

Applying CQRS on the service and the database level.
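As a minimal sketch of code-level CQRS in Go (illustrative names, not our production code): the command and query sides have separate models and handlers, even while they share one store, so either side can later be moved to its own database or service.

```go
package main

import "fmt"

// Command side: writes go through commands and a command handler.
type CreateProduct struct {
	SKU  string
	Name string
}

type CommandHandler struct{ store map[string]string }

func (h *CommandHandler) Handle(cmd CreateProduct) {
	h.store[cmd.SKU] = cmd.Name
}

// Query side: reads have their own model and handler, kept free of
// any write logic.
type ProductView struct {
	SKU  string
	Name string
}

type QueryHandler struct{ store map[string]string }

func (h QueryHandler) ProductBySKU(sku string) (ProductView, bool) {
	name, ok := h.store[sku]
	return ProductView{SKU: sku, Name: name}, ok
}

func main() {
	store := map[string]string{}
	commands := &CommandHandler{store: store}
	queries := QueryHandler{store: store}

	commands.Handle(CreateProduct{SKU: "p-1", Name: "Keyboard"})
	view, _ := queries.ProductBySKU("p-1")
	fmt.Println(view.Name) // Keyboard
}
```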

Eventual Consistency & Outbox Pattern

Committing a transaction is a simple task for monolith applications and for services that can complete the whole operation inside a single transaction. However, a single transaction may concern different consumer services at the same time, for example, sending a confirmation email after a successful operation. This is the point where implementing one of the eventual consistency patterns becomes a must.

We have designed a system that implements the outbox pattern to provide eventual consistency.

The following points are the major ones of this implementation (a sketch follows the list):

  • An outbox table that guarantees the event is recorded inside the transaction block.
  • A service responsible for checking the table regularly and publishing the events that have not been published yet.
  • Trying to send the event right after the transaction completes, without any delay. If this fails, the service mentioned above will try to send it again.
  • Handling failures: if something goes wrong on a consumer, save the failed event to the outbox table and publish it again.

Representation of the outbox pattern with a failure scenario.
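A minimal sketch of the write side and the relay in Go, assuming hypothetical `orders` and `outbox` tables (a real program also needs a SQL driver):

```go
package outbox

import (
	"database/sql"
	"time"
)

// SaveOrderWithOutbox writes the business row and the outbox row in the
// same transaction, so the event cannot be lost even if publishing fails.
func SaveOrderWithOutbox(db *sql.DB, orderID string, payload []byte) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	if _, err := tx.Exec(
		`INSERT INTO orders (id, created_at) VALUES (?, ?)`,
		orderID, time.Now()); err != nil {
		return err
	}
	if _, err := tx.Exec(
		`INSERT INTO outbox (event_type, payload, published) VALUES (?, ?, false)`,
		"order.created", payload); err != nil {
		return err
	}
	return tx.Commit()
}

// PublishPending is what the relay service runs regularly: it picks up
// rows that have not been published yet and sends them to the broker.
func PublishPending(db *sql.DB, publish func(eventType string, payload []byte) error) error {
	rows, err := db.Query(
		`SELECT id, event_type, payload FROM outbox WHERE published = false`)
	if err != nil {
		return err
	}
	defer rows.Close()

	for rows.Next() {
		var id int64
		var eventType string
		var payload []byte
		if err := rows.Scan(&id, &eventType, &payload); err != nil {
			return err
		}
		if err := publish(eventType, payload); err != nil {
			continue // stays unpublished; retried on the next pass
		}
		if _, err := db.Exec(
			`UPDATE outbox SET published = true WHERE id = ?`, id); err != nil {
			return err
		}
	}
	return rows.Err()
}
```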

We also abstracted the message broker behind a service that allows sending/consuming events via simple HTTP calls. This made it possible to keep overall control of the messaging system and to send/consume events from any service without implementing language-specific SDKs.
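Publishing an event then becomes a plain HTTP call from any language; a Go sketch against a hypothetical endpoint of that abstraction service:

```go
package gateway

import (
	"bytes"
	"fmt"
	"net/http"
)

// PublishEvent sends an event to the (hypothetical) broker-abstraction
// service over plain HTTP, so no broker-specific SDK is needed.
func PublishEvent(eventType string, payload []byte) error {
	url := fmt.Sprintf("http://event-gateway/events/%s", eventType)
	resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("publish failed: %s", resp.Status)
	}
	return nil
}
```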

Failure Tolerant Distributed Caching

Caching is one of the cross-cutting concerns of the application and of the system as well. In-memory caching is simple, secure(?), and easy to implement. However, distributed caching is way more complicated than in-memory caching because of its network-related nature.

Distributed caching helps services stay stateless, scalable, and memory efficient. These capabilities become even more important when designing microservices or similar distributed systems.

There are some important points for keeping the system up even when the caching server/cluster fails, and for making the implementation more efficient by querying less often for a cached item:

  • Develop failure-tolerant code for interacting with the cache server.
  • Caching should be lightning fast. Network timeouts are a nightmare; make sure you are not waiting for the full timeout to expire.
  • Make sure a failover scenario is implemented. Consider querying the source of the data directly when the distributed cache fails.
  • Consider storing the cached data in memory as well, for a very limited time and size, to reduce network load. This can make a huge difference for services that mostly serve the same data.

We use this distributed caching approach in our backend services, with the capabilities mentioned above (a sketch follows below).

Representation of the failure tolerant distributed caching.
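A minimal sketch of the lookup path in Go, with the local layer, distributed cache client, and data source passed in as functions (all names here are illustrative):

```go
package cache

import (
	"context"
	"time"
)

// GetWithFallback tries a tiny in-memory layer first, then the distributed
// cache under a tight deadline, and finally falls back to the source of the
// data, so a cache outage never takes the service down.
func GetWithFallback(
	ctx context.Context,
	key string,
	local func(key string) ([]byte, bool), // short-lived in-memory layer
	remote func(ctx context.Context, key string) ([]byte, error), // distributed cache
	source func(ctx context.Context, key string) ([]byte, error), // database or upstream
) ([]byte, error) {
	if v, ok := local(key); ok {
		return v, nil
	}

	// Never wait for a full network timeout on the cache path.
	cacheCtx, cancel := context.WithTimeout(ctx, 50*time.Millisecond)
	defer cancel()
	if v, err := remote(cacheCtx, key); err == nil {
		return v, nil
	}

	// Cache miss or cache failure: go straight to the source of the data.
	return source(ctx, key)
}
```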

Infrastructure

We are active on both AWS and GCP; currently, AWS is the main cloud provider for our workloads. There is no limit on the number of deployments in a day, or even in an hour. The CI/CD pipeline we have created allows us to deploy our services in at most 7 minutes, even with the 4 different test suites explained below.

We follow best practices in every aspect of DevOps and the cloud. Below are some of the topics I picked to mention.

Twelve-Factor App

The focus of the twelve-factor app is defining a clean methodology for delivering services to cloud platforms and managing them there. It was introduced in 2012 and still remains valid.

The twelve-factor app is always in our bag, and we try to keep the system aligned with its principles. However, some parts of it were interpreted a little differently when implemented in our systems.

Dealing with CI/CD Scripts

A DevOps pipeline always requires a set of scripts and configuration files to automate things. A common practice recommends keeping these files together with the service itself. However, whenever any of these files needs to change, every service must then be updated to keep the DevOps scripts and configuration files aligned.

We prefer to keep the scripts and configuration files in a single repository, outside of the service repositories. Before every build, this repository is cloned into the “build” environment and configured for the service being built.

Representation of dealing with scripts in CI environments.

Test Steps on CI

Automated testing is one of the most important steps to make sure the actual artifact is working as expected.

There are 4 fully isolated test steps that we run on every build in the pipeline.

  • Unit tests. There is no predefined coverage limit, so the developer takes responsibility for writing unit tests for the units where testing is meaningful.
  • Integration tests. This is the most valuable step, trying to cover as many real-world scenarios as possible. The developer and the test engineer work together to develop them.
  • CDC tests, to ensure the communication between the services is still valid.
  • Performance tests. These are simple compared to the system-wide load tests we run regularly. However, they are worth doing to be sure the related commit does not make the service slower, even by 5 ms.

I should also mention that all of the test steps are truly isolated. Every component like a database or an external service is created from scratch in the “build” environment and there is no connection to the outside.

Representation of testing steps on CI.

Creating an isolated environment from scratch means no requests target the testing environment or anywhere outside of the test suite: databases are created from scratch, the data is migrated on the fly, and external API calls are completely mocked. This approach ensures the test results stay the same until a developer breaks one of them.

We also use SonarQube as a code quality and security analysis tool. All of the tests and the SonarQube analysis must pass, so any failure in this workflow breaks the pipeline.

Hosting the Services

At Mizu, services have been hosted on Kubernetes since one of its first releases. We tried Mesos at first and gave up quickly :). We have gained a ton of experience in managing Kubernetes clusters and configuring the entire environment into a highly available, production-grade system.

Some quick tips and tricks we’ve learned over the years:

  • Make sure there is a failover scenario even for a cluster-wide disaster. Consider creating the same cluster twice, where one of them has no pods scaled yet (disaster recovery) but is ready to scale in seconds.
  • Consider defining different node pools for services that behave differently.
  • Make sure there is a cluster autoscaler and a pod autoscaler. At this point, the configuration of pod resources becomes more important.
  • Especially for large systems, Kubernetes by itself is not enough, so a service mesh or an API gateway should also be integrated into the system for more visibility, security, and so on.
  • Stop implementing in code what can be handled by a service mesh or similar approaches: circuit breaking, retry logic, service discovery, tracing, logging, and so on. Implementing these in every service brings a lot of headaches in distributed systems. Keep services as pure as possible.

We use Kubernetes with Istio, one of the most popular service mesh solutions. We have also integrated the logging and monitoring tools and practices mentioned below.

Logging and Monitoring

Visibility is another concern of distributed architectures. We focused on making logging and monitoring completely isolated from the services, and even from the programming language.

We prefer a pull-based mechanism instead of pushing logs and monitoring data.

One-line JSON logging to stdout became our standard approach for providing log messages to the collectors.
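A minimal sketch of this in Go, using the standard library’s log/slog (available since Go 1.21): one JSON object per line on stdout, ready for a collector such as Filebeat to pick up.

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// Every log entry is a single JSON line on stdout; the field names
	// below are illustrative, not our actual logging standard.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	logger.Info("order created",
		"service", "order-api",
		"order_id", "o-123",
		"duration_ms", 42,
	)
}
```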

For logging we currently use Filebeat, Logstash, and Loggly; on the monitoring side, Prometheus, Grafana, and Instana APM.

https://medium.com/cstech/better-logging-approach-for-microservices-3cc2c45e7aaa

Closing

We are always excited and motivated to build a consistent, reliable, sustainable, and robust system by applying best practices and the newest technologies.

There is also a ton to say about frontend, big data, and security. You can find out more on the ÇSTech Medium page.

Don’t forget to check out our GitHub page. There are some projects already open-sourced, and there will be more soon!

Thanks for reading.
