Introducing Kōjō (工場)

Brad Wilson
Neighborhoods.com Engineering
Jul 26, 2018
With Kōjō, we are going to space today.

At Neighborhoods.com Engineering we have tech debt like any other organization that has ever written any code. One of our more problematic technical children is our legacy task “manager” system called Looper.

Looper is aptly named. It loops, and it uses PHP's pcntl extension to fork child processes that manage worker processes. It more or less blindly runs background tasks without much insight into what it's actually doing. Sometimes Looper sleeps. Sometimes Looper crashes. Sometimes Looper fork bombs.

Looper is not a lot of fun in production.

Looper started life with some pretty simple tasks around managing a few data feeds and SQS queue workers. As we grew, it grew, but it wasn’t necessarily built to grow in the ways we needed it to. However, as much as Looper is a pain now, it taught us a lot about where we wanted to go, and there is a lot of value in that.

Earlier this year we set out to solve our Looper problem. When listing our design goals, we wanted the replacement to address all of the pain points that we had experienced with Looper.

We wanted the following:

  • Fast - low machinery overhead. We wanted the service to execute business logic as fast as possible.
  • Transparent - provide easily understood insight into what jobs are being worked and how they are behaving.
  • Sandboxed - if a worker is misbehaving, it should not affect its neighbors.
  • Scalable - do exactly as much work as is necessary. If there is nothing to do, be quiet and idle. If there is work to do, scale immediately and appropriately and then scale back down after the work is done.
  • Resilient - the world is dark and full of terrors.
  • Adoptable - be able to migrate existing workloads without much, if any, refactoring. Make it easy by default to turn existing loops and similar code into massively parallel work. Be easy to deploy and run.
  • Stable - use widely adopted and mature technologies.
  • Distributed - all components should work automatically across any arbitrary number of execution environments that come into and out of existence dynamically.

We looked at many awesome projects that provide similar behavior, but found that none of them checked all of our boxes. We also wanted to be able to change our collection of technologies easily and to add features and improvements aggressively.

To accomplish this, we had to build a few different parts. We needed:

  • An actor-aware cooperative distributed mutex system.
  • A process model.
  • A task management system.
  • A dynamic and static scheduling system.

Actor-aware cooperative distributed mutex

We want to know when a process has crashed on any given execution environment, or has been cut off by a network partition, without using polling or arbitrary timeouts.
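One way to notice a crashed lock holder without polling or timeouts is to tie the lock's lifetime to the holder's own liveness, so the lock disappears the moment the holder does. The sketch below is a single-host analogue in Python, not Kōjō's implementation (Kōjō is PHP, and its mutex spans machines). It relies on the `flock(2)` behavior on Linux and BSD where the kernel releases a lock when the holding process exits, even on `SIGKILL`. The holder script and the `try_lock`/`demo` names are hypothetical.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Hypothetical worker: takes an exclusive lock, reports it, then "works" for a while.
HOLDER_SCRIPT = r"""
import fcntl, sys, time
lock_file = open(sys.argv[1], "a")
fcntl.flock(lock_file, fcntl.LOCK_EX)
print("locked", flush=True)
time.sleep(60)
"""


def try_lock(path):
    """Try to take the lock without blocking; return the open file or None."""
    lock_file = open(path, "a")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return lock_file
    except OSError:  # EWOULDBLOCK/EAGAIN: another process holds the lock
        lock_file.close()
        return None


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    holder = subprocess.Popen([sys.executable, "-c", HOLDER_SCRIPT, path],
                              stdout=subprocess.PIPE, text=True)
    holder.stdout.readline()              # wait until the holder owns the lock
    taken_while_alive = try_lock(path)    # expected None: the holder is alive
    holder.kill()                         # simulate a crash: no cleanup code runs
    holder.wait()
    holder.stdout.close()
    taken_after_crash = try_lock(path)    # the kernel released the dead holder's lock
    result = (taken_while_alive is None, taken_after_crash is not None)
    if taken_after_crash is not None:
        taken_after_crash.close()
    os.remove(path)
    return result
```

A waiting process would call a blocking `fcntl.flock(f, fcntl.LOCK_EX)` instead. That call returns as soon as the holder dies, with no poll loop and no timeout. A distributed version needs the same property across machines.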

Process Model

We want to isolate worker processes and be 100% interrupt driven, meaning that our processes react to events instead of polling. As a result, controlling the behavior of the product becomes a simple asynchronous pub/sub communication model.
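To show the difference between reacting to events and polling, here is a minimal in-process sketch in Python (not Kōjō's PHP code). A worker blocks on its inbox until a control message is published, rather than waking on a timer to check for work. The `Broker` class, the `worker.control` topic, and the `pause`/`shutdown` messages are hypothetical. A real deployment would use a networked broker so that messages can cross process and machine boundaries.

```python
import queue
import threading
from collections import defaultdict


class Broker:
    """Minimal in-process topic broker, standing in for a real pub/sub service."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._lock = threading.Lock()

    def subscribe(self, topic):
        inbox = queue.Queue()
        with self._lock:
            self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        with self._lock:
            inboxes = list(self._subscribers[topic])
        for inbox in inboxes:
            inbox.put(message)


def worker(inbox, log):
    """React to control messages; there is no sleep-and-check loop."""
    while True:
        message = inbox.get()  # blocks until an event is delivered
        log.append(message)
        if message == "shutdown":
            return


def run_demo():
    broker = Broker()
    log = []
    inbox = broker.subscribe("worker.control")
    thread = threading.Thread(target=worker, args=(inbox, log))
    thread.start()
    broker.publish("worker.control", "pause")
    broker.publish("worker.control", "shutdown")
    thread.join(timeout=5)
    return log, thread.is_alive()
```

The worker uses no CPU while idle and responds as soon as a message arrives. That is the property "interrupt driven" asks for.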

Task Management

Jobs have state, and for any job that is being worked we want to be able to understand where it is in its life cycle.

In addition, we want metrics on how a particular job is behaving: how many times it has been worked, has crashed, has been retried, or has been held. We also want to stream the behavior of the worker processes and their business logic to ELK and CloudWatch in real time.

Dynamic and Static Scheduling

We want to be able to schedule jobs dynamically through both PHP and REST APIs. We also want to be able to schedule jobs from any cron expression. Because, idk, some vendor has to have something sent to them at exactly 2:26 AM on the last Thursday of November, only in leap years, in East Africa Time. Don't worry though, Kōjō has your back.
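Standard five-field cron syntax cannot say "last Thursday of the month" or "leap years only". One way to express the vendor's schedule is to pair a coarse cron expression with a filter in code. The Python sketch below assumes the expression `26 2 * 11 4` (02:26 every Thursday in November) and checks the remaining conditions each time it fires. The function names are hypothetical, and this is not how Kōjō evaluates cron expressions. East Africa Time is UTC+3 with no daylight saving, so a fixed offset is enough here.

```python
import calendar
from datetime import datetime, timedelta, timezone
from typing import Optional

# EAT is a fixed UTC+3 offset with no daylight saving time.
EAT = timezone(timedelta(hours=3), "EAT")


def last_weekday_of_month(year: int, month: int, weekday: int) -> int:
    """Day of month of the last given weekday (Monday=0 .. Sunday=6)."""
    last_day = calendar.monthrange(year, month)[1]
    last_day_weekday = calendar.weekday(year, month, last_day)
    return last_day - ((last_day_weekday - weekday) % 7)


def vendor_run_time(year: int) -> Optional[datetime]:
    """02:26 EAT on the last Thursday of November, only in leap years."""
    if not calendar.isleap(year):
        return None
    day = last_weekday_of_month(year, 11, calendar.THURSDAY)
    return datetime(year, 11, day, 2, 26, tzinfo=EAT)


def should_fire(now: datetime) -> bool:
    """Filter applied whenever '26 2 * 11 4' fires; `now` must be timezone-aware."""
    local = now.astimezone(EAT)
    target = vendor_run_time(local.year)
    return target is not None and local.replace(second=0, microsecond=0) == target
```

For example, 2020 and 2024 are leap years, so the job would run on November 26, 2020 and November 28, 2024, and not at all in 2021 through 2023.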

The Birth of Kōjō

We ended up naming the product Kōjō (工場), which is Japanese for "factory." We probably lost weeks naming it; naming is, after all, one of the two hardest things.

The fun even extended to a console command alias to start the service:

$ vendor/bin/kojo gō-gō

We have mō-mō fun here (I’m sorry).

Each of the pieces I have described so far is complex enough that I will be writing several follow-up articles on how we approached them.

Currently we are in the process of migrating our existing Looper tasks to Kōjō jobs, and it’s going splendidly. We are simply giddy about what we are seeing in production.

Git (see what I did there) Started With Kōjō Today!

In fact, we are so happy with its behavior that I am thrilled to announce that we have open sourced it, and it is available now on GitHub and Packagist!

At Neighborhoods.com Engineering, not only are we in a constant state of evolution, but we also highly value shipping early. Kōjō will absolutely be updated by the next time that I blog about it (which will be very soon).

We will improve how it works under the hood, as well as add new features to user space. We will also continue to update and improve guides and examples on how to use it.

We are very happy and excited to share this first release with you today, and we are very much looking forward to your feedback and PRs. We strongly believe in open source and are consumers of many open source projects. Today we hope to give back a little, and get back a lot, by exposing Kōjō to the vast ocean of smart brains for review and use.

Please stay tuned for deeper dives and more to come from Kōjō!
