David Bowie’s Five Years

Five Years of Scala, Part I: A Retrospect

Guntur Akhmad
Style Theory Engineering & Data
Oct 20, 2020

We grew from zero to something with Scala; here is our retrospective.

Our backend started as a Scala monolith back in 2015. Over time we split it into several microservices, still in Scala, to achieve faster release cycles and all the other benefits that come with them.

In this post I want to look back on Style Theory's backend technology, which this month marks five years since its first commit: what we did right, what went wrong, and how we are currently working toward a better future.

Happy Days

For years, all of our backend engineers, Tech Leads included, understood Scala. We had enough engineers and time to translate business and product requirements into Scala code, and our releases covered every user story we had.

Using the Lagom framework for Scala helped, big time, with an idiomatic code structure and the built-in features most microservices need. Our release cycle became faster, and hotfixes could be deployed quickly and in a non-breaking way.

We crafted a shared library and a Scala plugin containing reusable functionality, so that each of our Lagom microservices could leverage them to shorten development time.

Inter-service calls were hassle-free: just import another service's API project in our build.sbt and call any of its endpoints by invoking functions from the imported API project.
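For readers unfamiliar with Lagom, importing another service's API project looks roughly like the build.sbt fragment below. It is a minimal sketch rather than our actual build: the organization `com.example`, the artifact `service-x-api`, and the version `1.4.0` are hypothetical, and it assumes Service X publishes its API project as a library. `LagomScala`, `lagomScaladslApi`, and `lagomScaladslServer` come from the Lagom sbt plugin.

```scala
// build.sbt of Service Y (hypothetical names; assumes the Lagom sbt plugin is enabled in project/plugins.sbt)
lazy val `service-y-api` = (project in file("service-y-api"))
  .settings(
    // Service Y's own API: its Service trait and message case classes
    libraryDependencies += lagomScaladslApi
  )

lazy val `service-y-impl` = (project in file("service-y-impl"))
  .enablePlugins(LagomScala)
  .dependsOn(`service-y-api`)
  .settings(
    libraryDependencies ++= Seq(
      lagomScaladslServer,
      // Service X's published API project (hypothetical coordinates)
      "com.example" %% "service-x-api" % "1.4.0"
    )
  )
```

Inside Service Y's application wiring, a client for Service X can then be obtained with Lagom's `serviceClient.implement[ServiceX]`, where `ServiceX` stands for the (hypothetical) service trait exported by that API project, and each endpoint is called through its `ServiceCall` with `.invoke()`.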

Creating unit and functional tests was a breeze thanks to the syntactic sugar that ScalaTest and MockitoScala offer: mocks were easy to write, and pull requests containing many tests were easy to review.

What Went Wrong?

We expanded our business verticals from apparel rental alone into bag rental, and then into reselling preloved apparel and luxury bags. We also expanded from Singapore to Indonesia, and then to Hong Kong.

With a growing business comes great tech debt.

(1)

The number of features we needed to implement increased enormously; we basically hit the upper limit of how much we could contribute before plunging into the burnout abyss.

Our engineering center is in Jakarta, Indonesia. We were short on backend engineers because hiring did not go smoothly, mostly because we went down the Scala road: the supply of backend engineers in the Jakarta area who have previously worked with Scala falls far short of our demand.

It is true that anyone can hire backend engineers without prior experience in a specific language, and we did exactly that. We previously said that new hires could make their first pull request within a week, but the reality turned out to be:

  1. That outcome does not hold for every scale of contribution: anyone can make a pull request, but in the end quantity is not what really matters.
  2. Since our product grew bigger than ever, all of our engineers were busy implementing features and fixing critical bugs, which left the onboarding process crippled and far from ideal.
  3. Scala's steep learning curve for those without functional programming experience made this worse, since most of our backend engineers came from non-functional languages. On average, it took a month for freshly onboarded engineers to grasp the Scala basics.
  4. The statement in the previous article was valid when it was written, but it did not stand the test of time.

Stand!

There are other languages that are interesting, and sexy, and exciting. Scala is a great example, but trying to understand what happens in a Scala program, it takes a PhD. — Brian Ketelsen, Microsoft.

(2)

With a bigger product and more business verticals to handle, a huge number of production bugs, service issues, and outages started popping up here and there.

Debugging issues in microservices is not easy, and it got worse for us: without good distributed tracing in our current microservices implementation, we could not reliably detect which service(s) caused other service(s) to fail. Because of this shortcoming, we often blamed the wrong service(s) in our RCA (root cause analysis) documents. This is not healthy.

We were short on time and manpower to research and tinker with the inner workings of Scala and Lagom in order to build custom functionality that fixes our distributed tracing issues from the ground up.
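The core idea behind distributed tracing is trace-context propagation: every inbound request gets a trace id that each service reuses and forwards on its outgoing calls, so the logs from all hops of one user request can be joined and the failing service identified. The plain-Scala sketch below only illustrates that idea; the `X-Trace-Id` header, the `Request`/`LogLine` types, and the simulated X → Y → Z chain are hypothetical, and a real setup would rely on a tracing library rather than hand-rolled code.

```scala
import java.util.UUID

object TraceSketch {
  // Hypothetical header name; real systems use standards such as W3C `traceparent` or Zipkin's B3 headers.
  val TraceHeader = "X-Trace-Id"

  final case class Request(service: String, headers: Map[String, String])
  final case class LogLine(traceId: String, service: String, message: String)

  // Reuse the caller's trace id, or start a new trace when the request comes from outside.
  def traceIdOf(req: Request): String =
    req.headers.getOrElse(TraceHeader, UUID.randomUUID().toString)

  // Every outgoing call forwards the current trace id unchanged.
  def callDownstream(traceId: String, target: String): Request =
    Request(target, Map(TraceHeader -> traceId))

  // Simulates X -> Y -> Z where Z fails; each service logs under the trace id it resolved.
  def simulate(): List[LogLine] = {
    val idX = traceIdOf(Request("service-x", Map.empty)) // external request: starts a new trace
    val idY = traceIdOf(callDownstream(idX, "service-y"))
    val idZ = traceIdOf(callDownstream(idY, "service-z"))
    List(
      LogLine(idX, "service-x", "calling service-y"),
      LogLine(idY, "service-y", "calling service-z"),
      LogLine(idZ, "service-z", "ERROR: query timed out")
    )
  }

  def main(args: Array[String]): Unit =
    // Grouping by trace id reconstructs the call chain and shows where the failure originated.
    simulate().groupBy(_.traceId).foreach { case (id, lines) =>
      val failedAt = lines.find(_.message.startsWith("ERROR")).map(_.service).getOrElse("none")
      println(s"trace $id: ${lines.map(_.service).mkString(" -> ")} (failed at $failedAt)")
    }
}
```

Because all three log lines share one id, the error is attributed to `service-z` instead of to `service-x`, which merely surfaced it to the user.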

All we could do was try our best within our knowledge: optimizing SQL queries, reworking our code, reallocating service resource budgets, and reassigning our containers to their own node pools in Kubernetes. These approaches only fixed some issues in the short term, while incurring tech debt in the longer term.

(3)

Inter-service communication went nuts because we underestimated the risk of relying on code-based API contracts. For example, we import Service X's API project into Service Y's implementation project in order to call the endpoints Service X defines.

For a while, this approach eased inter-service integration, since everyone could just import another service's api-project into their build.sbt. It certainly eliminates the need to write additional code to call other services.

Happy dependency.

However, along the way this gave birth to the dreaded dependency hell.

Service X calls Service Y, Service Y calls Service Z, and Service Z calls an endpoint in Service X. While this kind of dependency is not a fundamental problem in microservices design, it creates a circular dependency at the code level, because each service's build imports the API project of the service it calls. All hell breaks loose every time we need to bulk-update several api-projects in such a dependency chain.

An illustration of a cyclic dependency we came through.
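A chain like X → Y → Z → X is simply a cycle in a directed graph, and it can be found mechanically with a depth-first search. The sketch below is illustrative only: the service names and the `findCycle` helper are hypothetical, and it assumes the dependency graph has already been extracted into a `Map` from each service to the services whose API projects it imports.

```scala
import scala.collection.mutable

object DependencyCycle {
  // service -> services whose API projects it imports
  type Graph = Map[String, List[String]]

  /** Returns the first cycle found as a path that starts and ends at the same service, if any. */
  def findCycle(graph: Graph): Option[List[String]] = {
    val done = mutable.Set.empty[String] // fully explored without finding a cycle

    // `path` is the current DFS stack, most recent node first
    def dfs(node: String, path: List[String]): Option[List[String]] =
      if (path.contains(node)) {
        // Back edge: the cycle is the stack segment from `node` onward, closed with `node`
        Some(path.reverse.dropWhile(_ != node) :+ node)
      } else if (done(node)) {
        None
      } else {
        val found = graph.getOrElse(node, Nil).iterator
          .map(dep => dfs(dep, node :: path))
          .collectFirst { case Some(cycle) => cycle }
        if (found.isEmpty) done += node
        found
      }

    // Sorted start order keeps the reported cycle deterministic
    graph.keys.toList.sorted.iterator
      .map(start => dfs(start, Nil))
      .collectFirst { case Some(cycle) => cycle }
  }

  def main(args: Array[String]): Unit = {
    // The X -> Y -> Z -> X chain described above
    val services: Graph = Map(
      "service-x" -> List("service-y"),
      "service-y" -> List("service-z"),
      "service-z" -> List("service-x")
    )
    println(findCycle(services)) // prints Some(List(service-x, service-y, service-z, service-x))
  }
}
```

Run against the whole service graph before bulk-updating api-projects, a check like this names the exact services involved in a loop instead of leaving it to be discovered mid-upgrade.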

This tech debt arose because our Scala build setup permitted cyclic dependencies between services and we were not aware of the risk. The oversight happened because we needed to get things done quickly, and no one was in a position to prevent it from happening in the first place. I'll say it again: everyone was busy implementing features and fixing bugs.

(4)

Writing tests is an integral part of our backend engineering. We grew our product from a handful of features into the huge product it is now, and with every feature we add or change, we modify or create several unit, functional, and integration tests.

Writing tests in Scala with ScalaTest and MockitoScala is fun because they are feature-complete and offer syntactic sugar everywhere, so our test suites are easy to read and modify.

Back when each of our microservices had only a handful of features, running tests took no significant time. Today most of our microservices are larger than ever, test runs take far too long, and they often fail rather mysteriously.

We don't really know whether we wrote our tests incorrectly or it is just a configuration issue, but with our current test configuration we waste a lot of time running tests on our local machines. They are so slow that we could hold a literal ping pong tournament in our Senayan office while waiting. Of course, that was before COVID-19; now that we work fully from home, we can only talk to our pets or plants.

Ping pong in our Senayan office.

Most of the time, our test runs stop halfway because they run out of memory, even in Jenkins! Other times they fail with an incomprehensible error: the test spits a lengthy runtime stack trace into the console, googling the stack trace turns up nothing, and re-running the suite several times fixes it rather magically. To this day, there is too much unpredictable behavior when running the tests.

It is tedious, and we waste too much time just running our tests. We learned the hard way that syntactic sugar in tests does not guarantee fast and reliable test execution.

Speaking of time spent, running our microservices locally requires several minutes for each cold start because compilation is slow. Building Docker images in Jenkins also takes noticeable time, even though we already cache our dependency artifacts on the runner machine. Deploying to Kubernetes is slow too, because of large container images and JVM cold starts.

Path to Resolution

It took us a long time, but the issues are now mostly identified, and we are proceeding to fix them!

After many discussions, debates, and careful thinking, we arrived at strategic, long-term action items that we pick up regularly with whatever resources we have. Some of the items are currently in the works and some are still in our backlog.

  1. Create technical design documents for our microservices so that we avoid services that are too large to handle, too complex to comprehend, and likely to break or slow down test runs.
  2. Action item 1 also means designing a better dependency graph to avoid complications when modifying any dependency in the chain.
  3. Replace sbt with Mill. The Scala community has said good things about Mill, especially about shorter compilation times.
  4. Set aside dedicated time to dig into the inner workings of Scala and Lagom to fix our distributed tracing issues.
  5. Implement, experiment with, and benchmark other languages. We are currently incubating Go and Node.js as our secondary languages, and the results look promising: most of our long-standing issues were handled gracefully from the ground up in both. I plan to share the results in the upcoming parts.
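To give a feel for action item 3, here is how an inter-project dependency (Service Y's implementation compiling against Service X's API) might be declared in a Mill build. This is a minimal sketch under stated assumptions: the module names and Scala version are hypothetical, it uses the Mill 0.x `build.sc` format, and since Lagom officially supports only sbt and Maven, no Lagom-specific wiring is shown.

```scala
// build.sc (Mill 0.x format; module names and Scala version are hypothetical)
import mill._
import mill.scalalib._

// Service X's API module: service traits and message types only
object serviceXApi extends ScalaModule {
  def scalaVersion = "2.13.3"
}

// Service Y's implementation, compiled against Service X's API module
object serviceYImpl extends ScalaModule {
  def scalaVersion = "2.13.3"
  // Mill's counterpart to sbt's dependsOn
  def moduleDeps = Seq(serviceXApi)
}
```

With this file in place, `mill serviceYImpl.compile` compiles `serviceXApi` first and then `serviceYImpl`.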

We are not perfect, but we are constantly learning

Identifying and acknowledging where we went wrong, synthesizing that into actionable items, and executing them strategically is what we do at Style Theory.

Thank you for reading this long post; let's have a constructive discussion.

Another five years coming in hot!
