Stoic Philosophy and Software Engineering

Published in Cermati Group Tech Blog · 15 min read · Oct 22, 2018

Marcus Aurelius, a Roman emperor and renowned Stoic philosopher, famously wrote words of wisdom in personal notes meant for his own reflection. These notes were posthumously compiled and published as a book, popularly titled Meditations.

Marcus Aurelius, a Roman emperor and also a Stoic philosopher (image from History.com).

Marcus Aurelius is regarded as a philosopher king, Plato’s concept of a ruler who possesses a love of knowledge, intelligence, reliability, and humility. In his personal notes, he mentioned that he had no control over anything but his own mind and his perceptions of whatever befell him.

The emperor wrote on how to conduct ourselves in the face of adversity. Many of his ideas focus on living in harmony with the Providence — the higher power governing the cosmos.

His notes basically talk about how to live our lives and keep our conduct in an unreliable, and even hostile, environment.

Does anything else come to mind that needs to be deployed in an unreliable and hostile environment? For us software engineers, software might be the first thing we think of.

Getting Philosophical

I had a chance to give a talk at Cermati’s internal brown bag session in September 2018. The brown bag session isn’t exclusive to tech-related topics or engineers: any employee can speak to share their knowledge, and anyone can attend. It’s scheduled biweekly on Thursdays and held during lunchtime.

A translated version of Marcus Aurelius’ writings.

To prepare for the session, I read Meditations by Marcus Aurelius, since he was one of the sources for my talk’s material. While the book itself focuses on how the emperor thought a human being should live their life, some of the qualities he thought a good human being should have are qualities we might want in ourselves when designing and maintaining our systems.

The key idea of Stoic philosophy is a valuable thing to remember: the only real power we have is the power over ourselves. In this article, we’re going to see how we can apply Stoic philosophy — based on some of Marcus Aurelius’ writings in Meditations — to the challenges we may encounter in our software engineering work.

We’re going to use example cases that focus specifically on development and maintenance of Internet-based application software. This category of software is a common product for tech startups, so it allows us to pick more relatable example cases.

Every section following this one will begin with a (supposedly) relevant quote taken from Meditations.

Implementing the System

You have power over your mind — not outside events. Realize this, and you will find strength.
— Marcus Aurelius

In the development and maintenance of a large-scale system, using components provided by parties outside our team is unavoidable. The behaviors of parts that we don’t build and maintain ourselves are generally out of our control.

The more parts the system has, the more difficult it is for us to predict their behaviors. Therefore, we might occasionally encounter unexpected behaviors from the libraries and third-party APIs that our system depends on. These unexpected behaviors may affect how our system works, and might cause problems within our system at times.

A good system is a reliable system. For the system to be reliable, we need to anticipate unexpected behaviors from the libraries and third-party APIs we’re using. Handling these behaviors the wrong way may lead to technical problems, which may also affect the whole company’s performance.

Suppose we have a service (let’s call it service A) that interacts with an HTTP API from a third-party web-based service (we’ll call it service B). Service A is a part of our production system which handles monetary transactions, while service B is an e-banking API endpoint managed by a local bank.

Service A sends HTTP requests to service B and receives a JSON HTTP response. In this case, we had an agreement with the local bank’s software engineers that their API will return a response JSON with the following format.

{
  "status": "OK",     // OK for success, ERROR for failure
  "transaction": {},  // affected object
  "message": null     // null if empty, string if not
}

According to our contract, the status field should contain only one of two string values, depending on whether the payment was processed successfully: OK or ERROR.

Trusting that the developers of service B will never violate the API response format, we built our system with the assumption that service B’s status field must always be either OK or ERROR. Here’s how our code ended up.

...
let response = JSON.parse(serviceBResponseBody);
if (response.status === 'OK') {
  paymentSuccessAction();
} else {
  paymentFailureAction();
}
...

paymentSuccessAction() will mark the transaction in service A’s database as successfully paid, while paymentFailureAction() will mark the transaction as not paid and to be canceled. No other actions are performed in those functions.

We defined three states of transactions in our database: PENDING, SUCCESS, and FAILURE. The PENDING state is used for marking unprocessed transactions, SUCCESS for those we successfully processed, and FAILURE for the failed ones.

One day, service B behaves a bit differently from what we expected from our agreement. It returns the strings ok or error with lower-case characters instead of upper-case as we previously agreed on.

Every transaction that’s successfully processed by service B is marked as a failed transaction in our system. The customers’ balances at the bank are debited for the payment, but the paid transactions are marked as failed on our side.

Dante and Virgil sightseeing in Hell, a scene from The Divine Comedy volume I — Inferno.

Hell breaks loose. Our customers panic, and our company’s staff work hard to resolve the issue.

When we complained to service B’s team, we learned that one of their new hires had made a mistake that turned the status field value in the response JSON into lower-case strings. It broke our system, since we assumed that service B would always return the OK string in upper-case letters on success, and we treated anything other than OK as a failed transaction. We never anticipated a mistake on their part.

On our system’s side, we fixed the problem by comparing the transaction records provided by service B’s developers with our production transaction records to find which transactions were supposed to be successful but were marked as failures. We then wrote a script to process the list of successful transactions we got from service B’s team and handle them properly in our system.
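The comparison step could look roughly like the sketch below. The record shapes (id, status, state) and the function name findMismarkedTransactions are assumptions made for illustration; the actual records from service B’s team and from our database would look different.

```javascript
// Hypothetical record shapes: service B reports { id, status }, while our
// database stores { id, state }. Both are assumptions made for this sketch.
function findMismarkedTransactions(bankRecords, ourRecords) {
  // Collect the IDs that the bank actually processed successfully,
  // ignoring the letter case of the status value.
  const paidIds = new Set(
    bankRecords
      .filter((r) => String(r.status).toUpperCase() === 'OK')
      .map((r) => r.id)
  );
  // Anything the bank paid but we marked as FAILURE needs to be corrected.
  return ourRecords
    .filter((t) => t.state === 'FAILURE' && paidIds.has(t.id))
    .map((t) => t.id);
}

// Made-up sample data for illustration only.
const bankRecords = [
  { id: 'tx-1', status: 'ok' },
  { id: 'tx-2', status: 'error' },
  { id: 'tx-3', status: 'ok' },
];
const ourRecords = [
  { id: 'tx-1', state: 'FAILURE' },
  { id: 'tx-2', state: 'FAILURE' },
  { id: 'tx-3', state: 'SUCCESS' },
];

const toFix = findMismarkedTransactions(bankRecords, ourRecords);
console.log(toFix);
```

In this made-up data set, only tx-1 was paid at the bank but marked as a failure on our side, so it is the only transaction the script would need to correct.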

Learning from our experience, we decided that we could no longer naively trust service B’s team. We introduced another transaction state, which we call DISPUTE, and modified our transaction-handling code into this.

...
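let response = JSON.parse(serviceBResponseBody);
// Normalize the status; a missing or non-string status falls through to DISPUTE.
let status = typeof response.status === 'string' ? response.status.toUpperCase() : null;
if (status === 'OK') {
  paymentSuccessAction();
} else if (status === 'ERROR') {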
  paymentFailureAction();
} else {
  paymentDisputeAction();
}
...

paymentDisputeAction() sets the transaction state in the database to DISPUTE, so it’s easier for service A’s developers to query whenever service B returns an ambiguous status. We also added an automated job that periodically checks for DISPUTE transactions in the database and notifies us whenever any are found, so we can check whether anything is wrong with the system.
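A minimal sketch of such a periodic check might look like this. The in-memory transactions array stands in for the database, and findDisputes, checkDisputes, and notifyTeam are hypothetical names, not part of any real system described here.

```javascript
// In-memory stand-in for the transactions table (an assumption for this sketch).
const transactions = [
  { id: 'tx-1', state: 'SUCCESS' },
  { id: 'tx-2', state: 'DISPUTE' },
  { id: 'tx-3', state: 'PENDING' },
];

// Hypothetical query helper: returns every transaction currently in DISPUTE.
function findDisputes(rows) {
  return rows.filter((t) => t.state === 'DISPUTE');
}

// Runs one check and calls `notify` only when there is something to look at.
function checkDisputes(rows, notify) {
  const disputes = findDisputes(rows);
  if (disputes.length > 0) {
    notify(disputes.map((t) => t.id));
  }
  return disputes.length;
}

// Hypothetical notifier; a real one might send an email or a chat message.
const notified = [];
const notifyTeam = (ids) => notified.push(...ids);

const disputeCount = checkDisputes(transactions, notifyTeam);
console.log(disputeCount, notified);

// To run the check periodically, something like this would work:
// setInterval(() => checkDisputes(transactions, notifyTeam), 5 * 60 * 1000);
```

Keeping the check separate from the notifier makes it easy to swap the notification channel later without touching the query logic.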

The story can be extended to other cases such as server or network failures when service A is communicating with service B, but the story so far should’ve shown how things outside our control can change and affect our system’s behavior.

We can make our system more reliable by approaching things a bit differently. By assuming that anything we depend on will fail, we’ll be able to see the weaknesses on our own side.

From the weaknesses we spotted, we can improve our system to limit the damage whenever any of our dependencies fails.
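As one possible illustration of limiting the damage, the sketch below also treats a response body that isn’t valid JSON, or that lacks a usable status, as a dispute instead of letting it crash the handler or count as a failure. The function name classifyResponse is an assumption for this sketch; the outcome names mirror the transaction states used earlier.

```javascript
// Maps a raw response body from service B to one of our own outcomes.
// `classifyResponse` is a hypothetical helper, not code from the real system.
function classifyResponse(rawBody) {
  let response;
  try {
    response = JSON.parse(rawBody);
  } catch (err) {
    // Not valid JSON at all (for example, an HTML error page from a proxy).
    return 'DISPUTE';
  }
  // JSON.parse can also return null, numbers, strings, or arrays.
  if (response === null || typeof response !== 'object') {
    return 'DISPUTE';
  }
  const status =
    typeof response.status === 'string' ? response.status.toUpperCase() : null;
  if (status === 'OK') return 'SUCCESS';
  if (status === 'ERROR') return 'FAILURE';
  // Unknown, missing, or non-string status: let a human look at it.
  return 'DISPUTE';
}

console.log(classifyResponse('{"status": "ok"}'));
console.log(classifyResponse('{"status": "ERROR"}'));
console.log(classifyResponse('{"status": 200}'));
console.log(classifyResponse('<html>502 Bad Gateway</html>'));
```

The design choice here is that only an explicit ERROR cancels a transaction; everything we can’t interpret with confidence is parked for review.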

Responding to Feedback

If someone is able to show me that what I think or do is not right, I will happily change, for I seek the truth, by which no one was ever truly harmed. It is the person who continues in his self-deception and ignorance who is harmed.
— Marcus Aurelius

Feedback always holds some value. Our personal preferences about its delivery and content don’t change that fact. Even so, we should try our best to be considerate when delivering it ourselves.

When receiving feedback, we can start by looking at the argument it makes and seeing how it aligns with the purpose we serve. Good feedback contains logical arguments, supporting evidence, and maybe recommendations for improvement.

Let’s get back to service A from the previous section (we’ll call it the payment service from now on), which processes payments through an e-banking API from the local bank with the biggest user base in the country. The payment service is planned to support other e-banking APIs in the near future, and we’re currently working on it.

Suppose that recently there’s a trend that more and more online transactions are conducted using credit cards or cryptocurrency instead of e-banking. With that as an argument, and provided with research data, our business intelligence analysts suggested that we prioritize adding support for credit cards and cryptocurrency payments instead of other e-banking APIs for now.

As developers, we’re reluctant to take the suggestions. Our payment service handles a lot of business logic, and our payment business flow is built with only e-banking in mind — which used to be the only payment method used by the customers on our market for online transactions.

Adding credit cards and cryptocurrency support to the payment flow might take some time unless we perform a “hack” to the current payment flow to add two more payment methods. But this might mess up the codebase and leave us with some technical debt.

Wanting to comply with good software development practices, we’re leaning towards rejecting the request to go for the quick-and-dirty solution. We argue that we should finish the halfway-done integration with the other e-banking APIs first, and then redesign the payment flow properly to support the additional payment methods. We estimate that the redesign project will start about two months from now.

The business intelligence team argues that a two-month delay is too long. Our competitors are already adding these payment methods to their systems. Ignoring the trend would mean losing potential orders from customers who prefer to pay with credit cards or cryptocurrency to our competitors, and that could be a lot of orders, since the trend has been shifting rapidly towards those payment methods.

Should we go with the quick-and-dirty solution now?

If the gains from taking the quick-and-dirty solution right now outweigh the cost of dealing with the technical debt later, go for it.

The software we’re building exists to support the organization in fulfilling its mission and realizing its vision. If the quick-and-dirty solution brings the organization closer to our shared goals, and if the setbacks caused by the technical debt can be dealt with reasonably later, it is the correct step to take.

Not going for the quick-and-dirty solution might be the correct textbook step for software engineering. But unless the sole purpose of the software we’re building is to implement the best practices, there will be times when the right step is to ignore them and go with the “hack”.

Following the best practices is a good long-term investment since it would make our code cleaner and our system more maintainable. But being too rigid, we might miss the opportunity to better contribute to the realization of our organization’s visions. Being flexible and open to adapt to various conditions will let us know when we need to break the rule to achieve something else.

Retiring Old Components

That which has died falls not out of the universe. If it stays here, it also changes here, and is dissolved into its proper parts, which are elements of the universe and of thyself. And these too change, and they murmur not.
— Marcus Aurelius

The software we’ve built will be retired someday. But retired software isn’t necessarily failed software.

Software might be dropped or rewritten into something different to adapt to changes in requirements and environment over a long period of time.

Let’s get back to the payment service we talked about back in the previous section. We’re going to see how the payment service evolved over the years.

Originally, we had a payment service that only supported communications with e-banking APIs. Due to the rise of credit cards and cryptocurrency as methods of payment in online transactions, we added support for both payment methods. After some time, we added more payment channels for all three payment methods and split up the payment service into a few standalone services.

The services we ended up having for the payment flow after a few years of development.

One day, the government banned the use of cryptocurrency in online transactions. The government decided that cryptocurrency can’t be used as a legal means of monetary transactions. They issued a regulation regarding the ban of cryptocurrency, stating that any local business supporting cryptocurrency as a payment method shall be severely punished.

Since we couldn’t keep the cryptocurrency payment method in our payment flow anymore, we removed the cryptocurrency service from our system and now support only credit cards and e-banking as payment methods. The service was taken down from the production system.

There’s no longer any value in running the cryptocurrency service in production, so taking it down frees up computing resources for the other services. The service provided value for us, but the time has come for it to be retired.

The former developers and maintainers of the service can be assigned to join some other teams, allowing their experience working on the cryptocurrency module to be utilized in the other teams’ development and maintenance activities. Or they can be kept together as a team and get reassigned to another fresh new project — depending on what the company requires at the time.

Consider the service’s purpose for the company fulfilled. The knowledge gained during the development and maintenance of the service will live on to serve some other purpose.

Anticipating Future Needs

Never let the future disturb you. You will meet it, if you have to, with the same weapons of reason which today arm you against the present.
— Marcus Aurelius

Future-proofing our system helps minimize the future cost of adjusting it to changes in environment and requirements. But when we can’t really foresee the future, future-proofing might cost us more than it benefits us.

To tell whether a future scenario should be anticipated, we need to estimate the likelihood of the scenario happening and the opportunity (or risk) involved in the event. A scenario with a high likelihood of happening that poses a high opportunity (or risk) to the organization obviously needs to be anticipated, but one with a low likelihood of happening and a low opportunity (or risk) is likely not worth the effort.

The cost of anticipating the scenario should not exceed the gain (from opportunity) or loss (from risk) from the event. Otherwise, it might not be worth anticipating. The likelihood of happening, the potential gain (opportunity), and the potential loss (risk) from the event can be assessed by looking at the direction our organization is moving towards.
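One rough way to make this comparison concrete is to weigh the potential gain or loss by its likelihood and compare the result against the cost of preparing. The sketch below does exactly that; the expected-value framing, the function name isWorthAnticipating, and all the numbers are assumptions made for illustration rather than a rule to follow blindly.

```javascript
// Rough heuristic: anticipate a scenario only when its likelihood-weighted
// impact is larger than the cost of preparing for it. This expected-value
// framing is an assumption of this sketch.
function isWorthAnticipating({ likelihood, impact, cost }) {
  // likelihood: probability between 0 and 1
  // impact: potential gain (opportunity) or loss (risk), in the same unit as cost
  return likelihood * impact > cost;
}

// Made-up numbers for illustration only.
const likelyAndBig = { likelihood: 0.8, impact: 1000, cost: 200 };
const unlikelyAndSmall = { likelihood: 0.05, impact: 100, cost: 200 };

console.log(isWorthAnticipating(likelyAndBig));     // weighted impact of about 800 vs. cost 200
console.log(isWorthAnticipating(unlikelyAndSmall)); // weighted impact of about 5 vs. cost 200
```

In practice the inputs are estimates at best, so the result is only as good as our reading of where the organization is heading.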

Illustration of a tactical war map (image from Lone Sentry).

Let’s assume we’re working for an early-stage tech start-up company, building our first MVP. Imagine it’s Cermati in its early days, in a parallel universe. Cermati is a marketplace for financial products, serving the Indonesian market, that’s aiming to enable financial inclusion in the country.

There are a few key features planned to be included in the system, and several additional nice-to-have features waiting on the backlog. What we need to focus on is launching the product as soon as possible and iterating quickly on the product development after the launch to allow the company to survive its early days.

At this point, we shouldn’t think too much about scalability or fancy features. The product doesn’t exist yet and the user base is practically nonexistent. At this stage, the company’s aim is to get its product out and gain its first foothold in the market. Failure to do so would mean the demise of the company.

After a few months of development, Cermati managed to finish its MVP and publicly launch the first version of its product. Having secured its first piece of the market, the company plans to strengthen its position and expand its market share.

To expand the market share, we might need to focus on supporting a few more financial product variants. Also, we might need to work a bit on SEO and features related to digital marketing to allow the company to market the product to a wider audience.

The company managed to gain more customers due to our work in SEO and digital marketing features. The web application is now serving more and more users, and the servers seem to need some upgrades. Some parts of the business logic are also noticeably slower nowadays since they weren’t written with performance in mind at first.

At this point, we need to start planning for system scalability and optimizing parts of our code and database queries. While we might also need to upgrade our machines, optimizing the code, database queries, and system architecture might let our software utilize the resources more efficiently. That would allow us to get higher performance from the same machine setup, which lowers the company’s operational cost.

And then the company decided to build a mobile-based client for its product. We need to refactor the code for reusability, so the same back-end logic can be used for both the web application and the mobile app. After that, the company decided to support another line of financial products, and we need to build the missing parts because the current version of the system doesn’t support the flow that we need.

There goes the story, on and on. Later on, we might be adding some machine learning capabilities in the product or redesigning our infrastructure to handle a much higher load. The tasks might look more intimidating, but we’re still in the same cycle of product iterations.

It’s basically an endless cycle, as long as the company lives on (image from Engineering is Elementary).

How far into the future we need to anticipate is highly dependent on the stage of the company, the company’s strategy, and the execution. In the story of Cermati’s parallel universe counterpart, the company’s still in an early stage and doesn’t plan that far ahead. Since what steps the company’s going to take beyond the current goals are still uncertain and prone to changes, prematurely anticipating them might end up with us building something only to be discarded later.

Anticipating the wrong things can be costly. Resources would be wasted on solving the wrong problems, while the important issues aren’t dealt with properly.

Conclusion

‘It is my bad luck that this has happened to me.’ No, you should rather say: ‘It is my good luck that, although this has happened to me, I can bear it without pain, neither crushed by the present nor fearful of the future.’ Because such a thing could have happened to any man, but not every man could have borne it without pain. So why see more misfortune in the event than good fortune in your ability to bear it?
— Marcus Aurelius

Marcus Aurelius’ writings can give us some perspectives that might be useful in handling various situations and making decisions. Software engineering requires us to make decisions in our own contexts, from whether we should put that extra if in the code to whether a certain project is worth working on, or whether we need to rewrite this service, drop that one altogether, change the communication protocol, or migrate our servers from one data center to another.

Sometimes we must choose between what is ideal and what needs to be done. Most of the time the purpose of our work isn’t to be ideal; it’s to get things done so other parts of the organization can perform their duties. Yet deviating too much from the ideal form creates a mess later. So we need to keep a balance when making the decision.

At times, we might need to handle emergencies, make major changes to the product, and rearchitect systems that, for good reason, weren’t very well designed at the beginning. It might not always be fun to do, but it’s an important part of the job and can be critical to the performance of the overall organization.

Many problems might seem intimidating at the beginning, but a lot of them aren’t unsolvable. After gaining more experience, our ability to solve the problems will improve.

Don’t forget to congratulate ourselves whenever we manage to solve a tough problem we couldn’t have solved before, as it marks another milestone in our growth as engineers, and also as human beings.