Weekend Reading

My favorite computer science, data, and programming links of the last week. Also, three open positions if you'd like to come work with me.

Jobs at the Square Foot

Experienced Data Engineer

Senior Devops Engineer

Lead API Engineer

Distributed systems

Blog: A Critique of the CAP Theorem | Paper: https://arxiv.org/pdf/1509.05393v2.pdf

How to build a distributed counter: http://www.cakesolutions.net/teamblogs/how-to-build-a-distributed-counter
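
One common way to build a distributed counter is a grow-only counter (G-Counter) CRDT. Whether the linked post lands on exactly this design is an assumption on my part, and the class below is a hypothetical sketch. Each node increments only its own slot, the value is the sum of all slots, and replicas merge by taking the per-node maximum, so merging is commutative, associative, and idempotent:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per node."""
    node_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A node only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a G-Counter only supports increments")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over every node's slot.
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> None:
        # Element-wise max: replicas converge no matter the order in which
        # merges arrive or how many times the same state is merged.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a = GCounter("a")
b = GCounter("b")
a.increment()
a.increment()
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging the same state again changes nothing
print(a.value(), b.value())  # 5 5
```

Supporting decrements takes a second G-Counter for subtractions (a PN-Counter), which is where the design trade-offs start to get interesting.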

Data engineering and operations

Techniques to Achieve High Write Throughput With Elasticsearch (The Hoard)

CockroachDB Stability Post-Mortem: From 1 Node to 100 Nodes: https://www.cockroachlabs.com/blog/cockroachdb-stability-from-1-node-to-100-nodes/

How Kafka’s Storage Internals Work (The Hoard)
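
Kafka stores each partition as a log split into segment files named by their base offset, and each segment carries a sparse offset index. Below is a toy, in-memory model of the read path under simplifying assumptions: the class names are made up, and the index gets an entry every few records (real Kafka spaces entries by bytes written). A read binary-searches the segments, then the sparse index, then scans forward:

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stand-in for one log segment file plus its offset index."""
    base_offset: int
    records: list[tuple[int, bytes]] = field(default_factory=list)
    # Sparse index of (offset, position in `records`) pairs.
    index: list[tuple[int, int]] = field(default_factory=list)


class PartitionLog:
    def __init__(self, segment_size: int = 4, index_interval: int = 2):
        self.segment_size = segment_size
        self.index_interval = index_interval
        self.segments = [Segment(base_offset=0)]
        self.next_offset = 0

    def append(self, payload: bytes) -> int:
        active = self.segments[-1]
        if len(active.records) >= self.segment_size:
            # Roll a new segment, keyed by the next offset it will hold.
            active = Segment(base_offset=self.next_offset)
            self.segments.append(active)
        position = len(active.records)
        if position % self.index_interval == 0:
            active.index.append((self.next_offset, position))
        active.records.append((self.next_offset, payload))
        self.next_offset += 1
        return self.next_offset - 1

    def read(self, offset: int) -> bytes:
        if not 0 <= offset < self.next_offset:
            raise IndexError(offset)
        # 1. Find the segment whose base offset is the largest one <= offset.
        bases = [s.base_offset for s in self.segments]
        segment = self.segments[bisect.bisect_right(bases, offset) - 1]
        # 2. Find the last sparse index entry <= offset.
        keys = [o for o, _ in segment.index]
        _, position = segment.index[bisect.bisect_right(keys, offset) - 1]
        # 3. Scan forward from that position to the exact offset.
        for record_offset, payload in segment.records[position:]:
            if record_offset == offset:
                return payload
        raise KeyError(offset)


log = PartitionLog(segment_size=4, index_interval=2)
for i in range(10):
    log.append(f"msg-{i}".encode())
print([s.base_offset for s in log.segments])  # [0, 4, 8]
print(log.read(6))  # b'msg-6'
```

Keeping the index sparse is what lets it stay small enough to live in memory; the price is a short sequential scan, which is cheap on an append-only file.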

The world beyond batch, parts 1 and 2 (a little dated now, but still worth reading): https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 and https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
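
A core idea in these posts is windowing by event time rather than by processing time. Here is a minimal sketch of fixed (tumbling) event-time windows in plain Python; the function name is mine, and it deliberately leaves out the watermarks and triggers the posts use to decide when a window's result is ready to emit:

```python
from collections import defaultdict


def fixed_window_sums(events, window_size):
    """Sum values per fixed event-time window.

    `events` is an iterable of (event_time, value) pairs in arrival order,
    which may differ from event-time order.
    """
    sums = defaultdict(int)
    for event_time, value in events:
        # The window is chosen by when the event happened,
        # not by when it happened to arrive.
        window_start = event_time - event_time % window_size
        sums[window_start] += value
    return dict(sorted(sums.items()))


# Events arriving out of order: the one from t=3 shows up last.
arrivals = [(1, 5), (7, 2), (12, 4), (3, 1)]
print(fixed_window_sums(arrivals, window_size=5))  # {0: 6, 5: 2, 10: 4}
```

The late event still lands in the [0, 5) window, which is exactly the property that processing-time windows lose.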

An introduction to Kafka Streams: https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/

How Spark DStreams work: https://people.eecs.berkeley.edu/~haoyuan/papers/2012_hotcloud_spark_streaming.pdf
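
The DStream model treats a stream as a series of small, deterministic batch computations over fixed time intervals, with state carried from one batch to the next. Here is a rough plain-Python sketch of that idea (no Spark APIs; `micro_batches` and `word_count` are hypothetical helpers), assuming records arrive already sorted by arrival time:

```python
from collections import Counter
from itertools import groupby


def micro_batches(stream, batch_interval):
    """Split (arrival_time, record) pairs, sorted by arrival time, into
    consecutive batches of `batch_interval` seconds."""
    def batch_of(item):
        return int(item[0] // batch_interval)

    # Simplification: an interval with no records yields no batch at all.
    for batch_id, group in groupby(stream, key=batch_of):
        yield batch_id, [record for _, record in group]


def word_count(batch):
    # An ordinary, deterministic batch computation run on one micro-batch.
    return Counter(word for line in batch for word in line.split())


stream = [(0.2, "a b"), (0.7, "b"), (1.1, "a"), (2.5, "c a")]
running = Counter()  # state carried from one batch to the next
history = []
for batch_id, batch in micro_batches(stream, batch_interval=1.0):
    running += word_count(batch)
    history.append((batch_id, dict(running)))
print(history)
```

Because each batch is deterministic, a lost batch can simply be recomputed from its inputs, which is the fault-tolerance argument the paper builds on.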

TPOT automates machine learning for you: https://github.com/rhiever/tpot

auto-sklearn also automates machine learning for you, built on scikit-learn: http://automl.github.io/auto-sklearn/stable/

Scala

API design for heaps in Scala: http://typelevel.org/blog/2016/11/17/heaps.html
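
For a feel of what an immutable heap looks like before reading about its API, here is a persistent leftist min-heap sketched in Python. This is my own illustration, not the data structure or API from the post: every operation returns a new heap and shares unchanged nodes with the old one, so earlier versions stay valid:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    rank: int               # length of the right spine
    value: int
    left: Optional[Node]
    right: Optional[Node]


Heap = Optional[Node]  # None is the empty heap


def _rank(h: Heap) -> int:
    return h.rank if h is not None else 0


def _make(value: int, a: Heap, b: Heap) -> Node:
    # Keep the higher-rank child on the left (the "leftist" property).
    if _rank(a) < _rank(b):
        a, b = b, a
    return Node(_rank(b) + 1, value, a, b)


def merge(h1: Heap, h2: Heap) -> Heap:
    if h1 is None:
        return h2
    if h2 is None:
        return h1
    if h2.value < h1.value:
        h1, h2 = h2, h1
    # Recurse down the short right spine; existing nodes are never mutated.
    return _make(h1.value, h1.left, merge(h1.right, h2))


def insert(h: Heap, value: int) -> Heap:
    return merge(h, Node(1, value, None, None))


def find_min(h: Heap) -> int:
    if h is None:
        raise IndexError("find_min on empty heap")
    return h.value


def delete_min(h: Heap) -> Heap:
    if h is None:
        raise IndexError("delete_min on empty heap")
    return merge(h.left, h.right)


h = None
for x in [5, 1, 4, 2, 3]:
    h = insert(h, x)
old = h
out = []
while h is not None:
    out.append(find_min(h))
    h = delete_min(h)
print(out)            # [1, 2, 3, 4, 5]
print(find_min(old))  # 1, the earlier version is untouched
```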