PostgreSQL performance degrades rapidly with more connections. Credit:

Over the last few years, the software development community’s love affair with the popular open-source relational database has reached a bit of a fever pitch. This Hacker News thread covering a piece titled “PostgreSQL is the worlds’ best database”, bursting at the seams with fawning sycophants lavishing unconditional praise, is a perfect example of this phenomenon.

While much of this praise is certainly well-deserved, the lack of meaningful dissent left me a bit bothered. No software is perfect, so exactly what are PostgreSQL’s imperfections?

I’ve been hands-on with PostgreSQL in production since 2003 with deployments ranging from small (gigabytes) to…

“It tastes like a foot” — Cliff Moon (2020)

A little detour from our regularly scheduled program. I find the pre-made mixes in the store to be lacking in flavor and considering that… quite expensive. Now that the bars are closed thanks to the Roni, I’m stuck fending for myself.

You’ll need a blender of some kind (or a very powerful arm) to accomplish this one. And probably a lack of health conditions that prevent consuming more than the recommended daily allowance of sodium. Enjoy!

Not an actual photo of my bloody mary. Trust me, I’ll never make anything this good looking.

Makes around 12 strong ones 💪


46oz bottle of V8
2 lemons, juiced
4 tbsp…

After years of trials and tribulations in San Francisco apartments, the single best thing I did for my home network was to switch most devices to the 5GHz band and limit it to 20MHz channel width. Once I did that pretty much all of my WiFi woes went away. But why?

Very recently, as many of the more blessed among us have shifted to working from home to reduce the spread of the SARS-CoV-2 virus, it has become apparent that many of the home WiFi setups that were perfectly fine for flipping through Instagram or streaming movies are an absolute…

It’s 10:30 on a Monday morning. Daily standup time. You walk into the room where it’s been held every day since the company moved into this building. The room is named Anyone Can Change The World, which is also the title of the talk your CEO gave over five years ago when the company debuted at TechCrunch Disrupt.

Your teammates begin to shuffle in, one after the other. Engineers aren’t usually morning people, and it is a Monday, so it’s not really the cheeriest crew. You pull out your iPhone and open up Hacker News. Something about “Goodbye Microservices” is…

“You should go all in on your single points of failure,” I say with obnoxiously casual confidence, moments before incredulous glares dart my way.

In 2015, Amazon’s DynamoDB database service suffered from a multi-day outage in their US East Coast region that had major ripple effects. Queueing (SQS), auto-scaling for compute (EC2), and metrics (CloudWatch) services within AWS were severely impacted as their core functionality depends on DynamoDB. There was much weeping and gnashing of teeth.

It seems reasonable that Amazon would then try to reduce dependency on DynamoDB in the future, but they did exactly the opposite!

This was extracted from my piece on counting production incidents to reduce its footprint.

Unfortunately for die-hard metrics folks, the reality is that the higher-ups really do want a single figure that fits nicely into one cell of a spreadsheet. Here’s a rough methodology that can be a starting point, but do take liberty in tuning for specific needs:

  1. As a rule of thumb I just invented, each system should have around 5–10 quality metrics. Find the target acceptable quality threshold for each metric. Use data to drive this discovery. For example, if there is strong evidence that unrecoverable request…

“What can be counted doesn’t always count, and not everything that counts can be counted.” — William Bruce Cameron

Source: US NTSB Aviation Incident Database

Aviation incidents and fatalities go down over time. The chart above tells us that much. It also tells us something else: fatalities per incident are all over the place! Just measuring the number of incident reports would obscure this very important point.
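To make the point concrete, here is a minimal sketch of the ratio in question. The yearly figures are invented purely for illustration and are not taken from the NTSB database; `fatalities_per_incident` is a hypothetical helper name.

```python
def fatalities_per_incident(incidents, fatalities):
    """Return the fatalities-per-incident ratio for each period.

    Periods with zero incidents yield 0.0 to avoid dividing by zero.
    """
    return [f / i if i else 0.0 for i, f in zip(incidents, fatalities)]


# Made-up numbers for three consecutive periods (NOT real NTSB data).
incidents = [100, 80, 60]   # incident count falls steadily
fatalities = [50, 120, 30]  # severity does not follow the same trend

ratios = fatalities_per_incident(incidents, fatalities)
for period, ratio in enumerate(ratios, start=1):
    print(f"period {period}: {ratio:.2f} fatalities per incident")
```

In this invented series the incident count drops every period, yet the middle period is three times as severe per incident as its neighbors. A report that only counted incidents would hide that.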

High-performing Internet service teams all have one thing in common: an effective incident management and blameless post-mortem process. …


This is a graph of latency in milliseconds. It’s the latency of Segment’s streaming pipeline fetching a critical piece of customer-specific configuration. This pipeline often handles north of 500,000 messages per second. Normally when I see a graph like this, it makes me very anxious. How can the exact same work suddenly become over 20X faster?

The story starts two and a half years ago, when it seemed like major incidents were happening all the time at Segment. We knew that very soon customers would begin to lose trust. To stop the bleeding, some extra process¹ was introduced and developers…


This blew me away. I have been using AWS in production for ten years. Until today, I made the assumption that their Relational Database Service (RDS) carried around a 33% premium over EC2. That’s the way it started out. AWS usually drops prices over time, right?

RDS uses the same MySQL and PostgreSQL software available for download to the public. Other than that, it’s basically a bunch of well-oiled provisioning, failover, and backup scripts¹. The tacked-on support isn’t life-changing, to say the least. But a 33% premium seems pretty reasonable to avoid some drudgery. …

I’m a bit of a low-key UUID buff, and not just the RFC 4122 variety. I use the term loosely, as a category for unique identifiers generated by stateless, distributed algorithms. I even wrote a Go library called ksuid, a stateless and lexicographically sortable mashup of Snowflake IDs and RFC 4122 UUIDs.
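For readers who have not seen a sortable identifier before, here is a rough sketch of the general idea: put a fixed-width timestamp in front of a random payload, so that plain string comparison orders IDs by creation time. This is an illustration only and does not reproduce ksuid's actual layout or encoding; `sortable_id` and its field sizes are assumptions made for the example.

```python
import os
import time


def sortable_id(now=None):
    """Return a 40-character hex ID: 4-byte timestamp + 16 random bytes.

    The timestamp is encoded big-endian at a fixed width, so IDs generated
    in different seconds compare lexicographically in time order. No
    coordination or shared state is needed to generate one.
    """
    seconds = int(time.time() if now is None else now)
    return seconds.to_bytes(4, "big").hex() + os.urandom(16).hex()


# Hypothetical usage: an ID from an earlier second sorts before a later one.
earlier = sortable_id(now=1_600_000_000)
later = sortable_id(now=1_600_000_001)
print(earlier < later)
```

Two IDs generated within the same second share a prefix and fall back to random order, which is the usual trade-off for this style of identifier.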

I think developers overuse UUIDs. In short, if you don’t need to generate unique identifiers in a distributed, stateless fashion, you don’t need UUIDs. They will only make things worse.

If you’re using a relational database as your application’s source of truth, it has some variant…

Rick Branson

I do Software Engineering on High-Impact, Large-Scale Internet Services.
