Live System Testing Done Safely and Accurately

New features and products are released all the time! They often undergo alpha testing by internal users and beta testing by select customers before their full release. Since the feature or product is new, it is generally understood and expected that issues could be encountered.

What about releasing new architecture for an established feature, product, or service where issues are not expected?

There are few things more nerve-racking than implementing a brand new system architecture for a critical, high-throughput service AND releasing it! Even the smallest issue can quickly become disastrous for highly utilized services.

Imagine a traffic signal…

A smarter approach to filtering by using a Star Schema

We live in a world that is bursting at the seams with data. The largest online storage providers estimate their raw storage to be in the millions of terabytes. Humans aren’t able to effectively analyze large amounts of data all at once. That’s a job for AI. In order to see the bigger picture, we must focus our analysis down to no more than a few dimensions at a time.

In other words, we must sequentially analyze individual subsets of data. It follows then that our analysis is heavily reliant upon our ability to efficiently filter down…

A pragmatic approach to computed metric alerting

“False Alarm”

  • How many nights have you or your operations team been woken up unnecessarily?
  • How many critical alerts have you seen resolve within minutes of firing?
  • How many times have you had to investigate and declare a “false alarm”?

The answer is most likely “too many”. Perhaps attempts have been made to “tune” these fragile alerts individually, but somehow they keep resurfacing, forcing you to play a never-ending game of whack-a-mole. This brittleness points to a fundamental flaw in your alerting design, one that must be corrected in order to reach a stable solution.

Understanding the current alerting design

Before jumping into a new…

Once upon a time, Rails had a feature under ActiveRecord called IdentityMap.

Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.
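
To make that description concrete, here is a minimal plain-Ruby sketch of the identity map pattern. It is not Rails' actual implementation: the `IdentityMap` class, its `fetch` method, the `load_count` counter, and the `BlogPost` struct are all hypothetical names introduced for illustration, and the block passed to `fetch` stands in for a database query.

```ruby
# Minimal identity map sketch (hypothetical names; not Rails' real code).
class IdentityMap
  attr_reader :load_count

  def initialize
    @records = {}    # { [class_name, id] => loaded object }
    @load_count = 0  # how many times the loader block actually ran
  end

  # Returns the cached object for (klass, id).
  # The block (the "query") runs only on a cache miss.
  def fetch(klass, id)
    key = [klass.name, id]
    @records.fetch(key) do
      @load_count += 1
      @records[key] = yield
    end
  end
end

# Stand-in for a model; a real app would load this row from the database.
BlogPost = Struct.new(:id, :title)

map = IdentityMap.new
first  = map.fetch(BlogPost, 1) { BlogPost.new(1, "Hello") }
second = map.fetch(BlogPost, 1) { BlogPost.new(1, "Hello") }

puts first.equal?(second) # true: both names point at one object in memory
puts map.load_count       # 1: the loader ran only once
```

Keying on the class name plus the id keeps records from different tables with the same id from colliding in the map.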

When does Rails proactively cache ActiveRecord objects?

Answer: On associations when the inverse_of option is properly configured or inferred by convention.

Navigating to the associated record(s), then navigating back, should not incur an additional query.

# 1 query for blog_posts table
> blog_post = BlogPost.first
SELECT * FROM `blog_posts` LIMIT 1
# 1 query for comments table
> comment = blog_post.comments.first
SELECT * FROM `comments` WHERE blog_post_id =…

Once upon a time, there was an AWS S3 outage for the US-EAST-1 region which lasted ~5 hours. This led to cascading outages for many online services. Sites became unavailable. Financial services were interrupted, causing significant monetary loss. In Invoca’s case, some calls were negatively impacted because some IVR voice prompts were not cached across all servers and could not be downloaded from S3 in real time. However, we were able to quickly remedy the issue by syncing our IVR voice prompt cache across all relevant servers. Calls are our lifeblood and any negative impact is unacceptable.

Official outage response:

Cross-Region Replication

Gabriel Kent

Senior Software Engineer at Invoca
