“The Morning Paper”s of the Year

For the busiest of the busy among you, I present here my highly personal opinion of the best papers presented by @adriancolyer in his exceedingly well-curated The Morning Paper.

Some papers reviewed this year turned out to be classics from years and years ago, making quite a point about just how mature our space is given those ideas are still so applicable. Several were about deep learning and ML and AI (of course). Some covered topics related — often only indirectly — to the volume, variety, and velocity of data being captured and analyzed. And, finally, there were quite a few related to the new age of distributed systems, with Cloud as the new OS.

These are the few papers that I felt compelled to read through, beyond Adrian’s own very nice summaries, even when I didn’t quite grasp all of the material.

So here’s the list and happy reading!

1. HyperLogLog

Yep, we’re doing so so so damn much of some things on the web (searching, liking, playing cat videos) that just the act of counting them needs the invention of new algorithms.
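For the curious, here is roughly what that "new algorithm" looks like. This is a minimal sketch of the HyperLogLog idea, not the paper's tuned implementation: hash each item, use the first few bits to pick a register, and remember the longest run of leading zeros seen in the rest. The class name, the choice of SHA-1 as the hash, and the default of 2^12 registers are all my own assumptions for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog counter (illustrative sketch, not production code)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this closed form is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: SHA-1 truncated to 64 bits stands in for a good hash.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        # Harmonic mean of 2^register across all registers, scaled by alpha.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(10000):
    hll.add(f"cat-video-{i}")
print(round(hll.count()))  # an estimate near 10000, using only 4096 small registers
```

The point is the memory footprint: a few kilobytes of registers give an estimate within a couple of percent, no matter how many times the same item shows up.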

2. Deep Learning Overview

Good an’ all. I lack much of the background to really grok all of this but I appreciated the breadth and (attempt at) context. For a less technical and vastly more accessible overview, I’d personally choose the Nature paper by LeCun et al.

3. Gorilla

At the intersection of so much that we think about (or like to think we think about) these days, this paper covers performance and scale of an in-memory TSDB and the innards that deliver the goods. Apparently we don’t tolerate any degradation in our ability to like or post, so much so that the use case of monitoring all the infrastructure that makes it possible was motivation enough to build an in-memory TSDB from scratch. The notes on their approach to time series compression were pretty interesting as well, and @adriancolyer did a great job with his summary explanation of those approaches.
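To give a flavor of the timestamp side of that compression: as I understand it, the trick is delta-of-delta encoding, which works because monitoring data tends to arrive at near-fixed intervals. The sketch below is my own simplified illustration of that idea, with hypothetical function names. It skips the variable-length bit packing that delivers the real space savings, and it doesn't touch how the values themselves are compressed.

```python
def delta_of_delta_encode(timestamps):
    """Encode timestamps as [first, delta-of-delta, delta-of-delta, ...]."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_ts = timestamps[0]
    prev_delta = 0
    for ts in timestamps[1:]:
        delta = ts - prev_ts
        # Regularly spaced points produce mostly zeros here.
        encoded.append(delta - prev_delta)
        prev_ts, prev_delta = ts, delta
    return encoded


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    prev_delta = 0
    for dod in encoded[1:]:
        delta = prev_delta + dod
        timestamps.append(timestamps[-1] + delta)
        prev_delta = delta
    return timestamps


# Points arriving roughly every 60 seconds, with one that is a second late.
points = [1000, 1060, 1120, 1181, 1241]
packed = delta_of_delta_encode(points)
print(packed)  # [1000, 60, 0, 1, -1]
assert delta_of_delta_decode(packed) == points
```

Once most entries are zero or tiny, each one can be stored in a handful of bits instead of a full 64-bit timestamp.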

4. DBSherlock

This paper was interesting as a real-world application of work happening in the broad arena of time-series analysis, and @adriancolyer summed it up well:

Note that the principles used to build DBSherlock are not tied to the domain of database performance explanation in any deep way, so it should be entirely possible to take these ideas and apply them in other contexts — for example: “why has the latency just shot up on requests to this microservice?”

5. Internet-Scale Services

Its talk of system administrators and system-to-administrator ratios of “as high as 2500:1” may make it seem almost quaint and difficult to see as applicable. Heck, it doesn’t mention DevOps or SRE even once! But this one is a true classic: most of the underlying ideas still apply even when the system-to-administrator ratio is 2–4 orders of magnitude higher than what the paper describes as state of the art. @adriancolyer even put together a nice checklist on GitHub to make it easy to run a quick audit over the architecture and operations of internet-scale services that you might be building. Like many other papers that make Classic status, the ideas are much more broadly applicable than the paper claims, whether intentionally or not. E.g., would you ignore circuit breakers even for a small service only serving 1000s of users? (I hope you realized that’s a rhetorical question.)

6. Decomposing Systems

A paper from 1971 has valuable insights for 2016. You throw “microservices blah blah” and “Conway’s Law blah blah” around? Read this damn paper first.

‘Nuff said.

7. Kraken

Trouble simulating production load for your cloud service? Easy! Just run the tests in production! *gulp* I suppose the easy summary of this is “Chaos Monkey for cloud service load testing”. If this kind of thing interests you, I hope you’ve also seen stuff like one of Netflix’s Strata talks that includes the nugget “Netflix is a log generation company that also happens to stream movies”. C’mon now, that was funny.

8. Honorable Mentions

A few papers came close but didn’t quite meet my arbitrary criteria to make the arbitrarily-sized best seven papers of the year. Here they are: