About Reliability

Devashish Patil
Published in CodeByte
3 min read · Jul 5, 2024
Designed by Devashish Patil

Everyone has a rough idea of what it means for something to be reliable. For software systems, the following are reasonable expectations of reliability:

  • The software does what the user expects, with the performance the user needs.
  • It keeps working when the user makes mistakes or uses the software in unintended ways.
  • It allows only authorized access and prevents abuse.

A system can be called reliable when it continues to work correctly even when things go wrong. The things that go wrong can be hardware failures, software errors, human errors, or use of the system in ways that were not intended.

Importance of reliability

Reliability is not just important for critical things like vehicles and food quality. It is also important for software systems.

Imagine you are storing all your photos (read: memories) in a cloud-based service and suddenly the data gets corrupted. How would you feel?

Apart from a bad user experience, unreliability can cost businesses revenue. For example, a payment gateway going down directly affects payment transactions, and an e-commerce website that cannot show products indirectly reduces sales.

Unreliability may also have legal or monetary implications if data is reported incorrectly or the system is down for longer than was agreed to (read more about Service Level Agreements, or SLAs).
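To make the "down for longer than was agreed to" part concrete, here is a minimal sketch of the arithmetic behind an availability target. The function names and the 30-day billing period are assumptions made for illustration; a real SLA defines its own measurement window and exclusions.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    The 30-day default period is an assumption for this example.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - availability_percent) / 100


def sla_breached(downtime_minutes: float, availability_percent: float,
                 period_days: int = 30) -> bool:
    """Return True when the observed downtime exceeds the budget."""
    return downtime_minutes > allowed_downtime_minutes(availability_percent, period_days)


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
    print(sla_breached(downtime_minutes=60, availability_percent=99.9))
```

A single one-hour outage in a month is already enough to miss a 99.9% target, which is why even short incidents matter once an SLA is in place.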

Can you compromise on reliability?

When you are launching something new or testing a product with an MVP, it makes sense to focus on shipping the software as soon as possible and to compromise on reliability to save costs.

But once you have a user base consuming services from your application, it becomes absolutely necessary to keep the application reliable. Anything less has bad consequences.

How to make your systems reliable?

Once you have decided that your application needs to be reliable, which will be true in most cases, the following approaches can be considered.

Testing

When you are trying to build reliable applications, testing helps a lot. Automated testing helps catch bugs, breaking changes, and similar problems before they ship, so you have more confidence when making changes to your application.

Many organizations follow test-driven development, where you write the tests first and then write the actual code to pass those tests. This is usually done for unit testing. To improve reliability further, you should also incorporate functional and integration testing into your applications.
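As a small illustration of the test-first flow, here is a sketch in Python. The `apply_discount` function and its rules are hypothetical, invented only for this example. In a test-driven workflow the three test functions would be written first and would fail until the function is implemented. They are plain functions using `assert`, in the style a runner such as pytest would collect (using pytest is an assumption, not a requirement).

```python
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent, rounded to two decimal places."""
    if price < 0:
        raise ValueError("price must not be negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


# Unit tests: in TDD these exist before apply_discount does.
def test_regular_discount():
    assert apply_discount(200.0, 25) == 150.0


def test_zero_discount_keeps_price():
    assert apply_discount(99.99, 0) == 99.99


def test_invalid_percent_is_rejected():
    try:
        apply_discount(100.0, 150)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```

The third test is the kind that pays off for reliability: it pins down how the code behaves when it is used in a way that was not intended.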

Introduce Chaos

Chaos engineering is a fairly new concept, but an effective one. It is the practice of intentionally injecting faults into a system to check its resiliency.

Chaos engineering is similar to how a vaccine works: you inject your body with a small amount of something potentially harmful to build resistance.

The goal of chaos engineering is to find potential issues early, so that you can mitigate them and prevent outages and disruptions.

Examples include terminating virtual machines or containers at random, introducing memory leaks to cause resource exhaustion, and introducing a flapping firewall or an unreliable network.
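Real chaos experiments run against real infrastructure, usually with dedicated tooling. The idea can still be sketched in a few lines of Python. The example below simulates an unreliable network by wrapping a call so that a fraction of invocations fail, then measures how often a retrying client still succeeds. Every name here (`with_chaos`, `call_with_retries`, `fetch_profile`, `run_experiment`) is hypothetical and exists only for this illustration.

```python
import random


class FlakyNetworkError(Exception):
    """Simulated fault injected by the chaos wrapper."""


def with_chaos(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls raise an injected fault."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise FlakyNetworkError("injected fault")
        return func(*args, **kwargs)
    return wrapper


def call_with_retries(func, attempts):
    """Call func up to `attempts` times; re-raise if the last attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except FlakyNetworkError:
            if attempt == attempts:
                raise


def fetch_profile():
    """Hypothetical dependency standing in for a real network call."""
    return {"user": "alice", "plan": "free"}


def run_experiment(failure_rate, attempts, calls=1000, seed=42):
    """Return the fraction of calls that succeed despite injected faults."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky_fetch = with_chaos(fetch_profile, failure_rate, rng)
    successes = 0
    for _ in range(calls):
        try:
            call_with_retries(flaky_fetch, attempts)
            successes += 1
        except FlakyNetworkError:
            pass
    return successes / calls


if __name__ == "__main__":
    print("no retries:", run_experiment(failure_rate=0.3, attempts=1))
    print("5 attempts:", run_experiment(failure_rate=0.3, attempts=5))
```

Comparing the two numbers shows how much resilience the retry logic adds, and whether that is enough for the failure rate you expect in production.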

Security Hardening

Unreliability may be caused by insecure or poorly guarded systems. Attackers can cause an outage, for example with a DDoS attack, or may exploit a known vulnerability to gain unauthorized access.

This is why hardening the systems that make up your application is absolutely necessary.

Examples:

  • Having a Web Application Firewall to help protect against DDoS attacks, and addressing the risks listed in the OWASP Top 10.
  • Regularly scanning your code, OS, and container images for vulnerabilities and acting to mitigate them.
  • Having security constructs such as authentication, authorization, and encryption.
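To show how authentication and authorization differ, here is a minimal sketch using only Python's standard library. The secret key, the role table, and all function names are assumptions made for this example. A production system would load the key from a secret store and would normally rely on an established authentication framework rather than hand-rolled checks.

```python
import hashlib
import hmac

# Assumption: a demo-only key; never hard-code real secrets.
SECRET_KEY = b"example-only-secret"

# Assumption: a tiny role table invented for this example.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
}


def sign(user_id: str) -> str:
    """Return an HMAC-SHA256 signature that proves who issued the token."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()


def is_authenticated(user_id: str, signature: str) -> bool:
    """Authentication: is the caller who they claim to be?"""
    expected = sign(user_id)
    # compare_digest runs in constant time, which avoids timing attacks.
    return hmac.compare_digest(expected.encode(), signature.encode())


def is_authorized(role: str, action: str) -> bool:
    """Authorization: is this role allowed to perform the action?"""
    return action in ROLE_PERMISSIONS.get(role, set())


def handle_request(user_id: str, signature: str, role: str, action: str) -> int:
    """Return an HTTP-style status code for the request."""
    if not is_authenticated(user_id, signature):
        return 401  # identity could not be verified
    if not is_authorized(role, action):
        return 403  # identity verified, but action not permitted
    return 200
```

Keeping the two checks separate makes failures easier to reason about: a 401 means the caller's identity could not be verified, while a 403 means a verified caller asked for something the role does not allow.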

These are just a few ways to improve the security of your application. Security is not limited to these and is a whole other topic of its own.

If you made it this far, be sure to check out my other articles at Devashish Patil.

Keep Learning.
