I was recently asked, almost out of the blue, about an old repo of mine on GitHub. This person had been looking over my GitHub account (a wasteland of half-finished projects) and wanted to know more about cp-investigator. The repo exists because of the following:
- I built a data replication engine for my employer RJMetrics which depended upon Sqoop.
- Because it used Sqoop, we had to use an environment which had Hadoop installed.
- Because it needed Hadoop and because at the time we were on the AWS stack, we used EMR.
- We had to be able to deploy our codebase, so all of our deploy tooling was now built around EMR, its lifecycle, etc.
- We ripped out Sqoop and rewrote the application, but decided that, because we had just changed a lot of things, this was maybe not the best time to change the deploy tooling as well.
- We now had a simple Clojure/Java application being run by Hadoop.
- Months later, we started encountering Out Of Memory exceptions and I started having very sleepless nights.
- We did something drastic: I wrote code which would spin up child processes (new JVMs) off of the root JVM, and those children would run our jobs. Now an OOM exception would only take down a child process and life would be fine.
- In order to debug issues with spinning up this child JVM without having to wait for the painfully long boot times of my main repo, I created a little toy repo which I could spin up and tear down in under a minute. My previous iteration time had been something like 15 minutes.
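The child-JVM idea from the list above can be sketched in plain Java (not the Clojure code from the original project, which is not shown here). This is a minimal sketch, assuming the parent's `java.class.path` system property really describes everything the child needs, which may not hold under launchers that use custom class loaders. The class name `ChildJvm`, its method names, and the `-Xmx` heap cap are hypothetical choices for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ChildJvm {
    /**
     * Builds the command line for a child JVM that reuses the parent's
     * java binary and classpath (assumption: java.class.path is complete).
     */
    static List<String> command(String mainClass, String maxHeap, List<String> args) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeap);                        // cap the child's heap
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));   // clone the parent's classpath
        cmd.add(mainClass);
        cmd.addAll(args);
        return cmd;
    }

    /**
     * Starts the child, forwards its stdout/stderr to the parent's,
     * and blocks until it exits. Returns the child's exit code.
     */
    static int run(String mainClass, String maxHeap, List<String> args)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command(mainClass, maxHeap, args))
                .inheritIO()
                .start();
        return child.waitFor();
    }
}
```

With something like this, the parent can treat a non-zero exit code from `run` as a failed job and carry on, instead of dying along with the job when it runs out of memory.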
I learned a lot from that project, and hope to never have to use most of it. But the code that I ended up writing to spin up the child processes was written in such a way that you could interact with it almost like a normal clojure.core.async thread…at least for usage in the context of RJMetrics inside the DBReplicators internal repo. Since then I have poked around with trying to make a core.async implementation where you can choose between three thread types:
- Go: used for non-IO-bound operations, mainly good for lightweight operations and coordination
- Thread: used for operations which need memory and are probably going to be slightly blocking. Relies upon a fixed thread pool.
- Process: used for an operation which could deliver a fatal OOM. Spins up a child JVM which is managed just like a normal core.async/chan.
My biggest issue has always been trying to do this cleanly. Sockets seemed like the logical way to allow the two JVMs to talk with each other, but those were a pain to manage…until Clojure 1.8. In 1.8 we are given tools for managing socket servers easily, which means sockets are now much simpler to work with.
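To make the two-JVMs-over-a-socket idea concrete, here is a rough sketch using only `java.net`, not the Clojure 1.8 socket server tooling mentioned above. It assumes a one-line-per-message protocol on the loopback interface. To stay self-contained, a thread plays the parent side where a real setup would have a second JVM. The class `SocketLink`, its method names, and the `ack:` reply format are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class SocketLink {
    /** Parent side: accept one connection, read one line, reply with an acknowledgement. */
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket peer = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(peer.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(peer.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println("ack:" + in.readLine());
        }
    }

    /** Child side: connect to the parent, send one line, return the one-line reply. */
    static String request(int port, String message) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(message);
            return in.readLine();
        }
    }

    /** Demo: binds an ephemeral loopback port and runs both sides in this JVM. */
    static String roundTrip(String message) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread parent = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException e) {
                    e.printStackTrace(); // demo only: report and let the thread end
                }
            });
            parent.start();
            String reply = request(server.getLocalPort(), message);
            parent.join();
            return reply;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("socket round trip failed", e);
        }
    }
}
```

Even this toy version needs its own framing, flushing, and cleanup, which is roughly the pain I mean when I say raw sockets were awkward to manage.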
My hope is to continue blogging about trying to build this extension to core.async and share the trials and tribulations along the way. My first goal is to get my classpath-cloning Process code into a repo which others can use as a library. It’ll have caveats, but it’s the first step. After that, I want to implement a simple library which shares Transit data between two JVMs. Then I’ll need to hook things together and play around with macros and how to serialize or interpret functions over a socket (I have a couple of ideas for this). Right now, I’m planning to use this repo by Bguthrie as a sort of guidebook.
I’m hoping to post a follow-up next week with some progress (to keep myself accountable).