Simulating Customized Chaos in Golang using Toxiproxy

“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” ― Abraham Lincoln

Accidents are always unexpected and sometimes unavoidable.
Some piece of infrastructure might just be having a bad day and fail.
Or there might be a datacenter outage causing chaos in your microservices environment.
Or maybe a bad deployment leaves your services unable to serve requests.

Whatever the reason or the problem is,
in the utopia of software systems,
and in the hopes and dreams of engineers all around the world,
we really want to live a life without any incidents happening in our services.

That’s why, in software systems, we have unit tests, integration tests, regression tests, and acceptance tests: to build confidence and improve safety in the software development process.

And to help us simulate chaos in network conditions, whether for unit testing or integration testing, we can use Toxiproxy.

1. What is Toxiproxy?

Toxiproxy is a framework for simulating network conditions. It’s made specifically to work in testing, CI and development environments, supporting deterministic tampering with connections, but with support for randomized chaos and customization. Toxiproxy is the tool you need to prove with tests that your application doesn’t have single points of failure.

Toxiproxy has client libraries for several languages, such as:

  • toxiproxy-ruby
  • toxiproxy-go
  • toxiproxy-python
  • toxiproxy.net
  • toxiproxy-php-client
  • toxiproxy-node-client
  • toxiproxy-java
  • toxiproxy-haskell
  • toxiproxy-rust

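It also ships with several toxics that can be customized to simulate the chaos we are looking for:

  • latency : Adds a delay to all data going through the proxy
  • down : Brings a service down (done by disabling the proxy rather than by adding a toxic)
  • bandwidth : Limits the bandwidth of a connection
  • slow_close : Delays the closing of the TCP socket
  • timeout : Stops all data from getting through and closes the connection after a timeout
  • slicer : Slices the data into small chunks, optionally adding a delay between the sliced packets
  • limit_data : Closes the connection once the transmitted data exceeds a limit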

2. Implementation in Golang

  • First, download and install the Toxiproxy binaries from here.
    This installs toxiproxy-cli and toxiproxy-server.
    Then start toxiproxy-server.
  • Next, initialize the client and the proxies we need.
    The same setup can also be done with the toxiproxy-cli binary.
    The tests only use the single Redis proxy.
  • Simulate Redis being down by disabling the proxy.
  • Simulate a high-latency Redis by injecting a latency toxic.
  • Here is the full code:
package main

import (
	"testing"
	"time"

	"github.com/Shopify/toxiproxy/client"
	"github.com/garyburd/redigo/redis"
)

var (
	client  *toxiproxy.Client
	proxy   *toxiproxy.Proxy
	proxies []*toxiproxy.Proxy
)

func init() {
	// Init the Toxiproxy client. "=" (not ":=") is used so the
	// package-level client is assigned instead of being shadowed.
	client = toxiproxy.NewClient("localhost:8474")

	// 1. Init a single proxy: localhost:26379 -> localhost:6379.
	var err error
	proxy, err = client.CreateProxy("redis", "localhost:26379", "localhost:6379")
	if err != nil {
		panic(err)
	}

	// 2. Init multiple proxies. We could read the configs from a file
	// and populate the proxies from it.
	proxies, err = client.Populate(
		[]toxiproxy.Proxy{
			{
				Name:     "redis_1",
				Listen:   "localhost:26380",
				Upstream: "localhost:6380",
			},
			{
				Name:     "grpc",
				Listen:   "localhost:29090",
				Upstream: "localhost:9090",
			},
		})
	if err != nil {
		panic(err)
	}
}

func TestRedisDown(t *testing.T) {
	// Disable the proxy so no connection can pass through.
	proxy.Disable()
	defer proxy.Enable()

	// Dialing Redis through the disabled proxy must fail.
	_, err := redis.Dial("tcp", ":26379")
	if err == nil {
		t.Fatal("Unexpected successful connection to redis")
	}
}

func TestRedisSlow(t *testing.T) {
	// Add a latency toxic (500 ms, downstream) to simulate slow responses
	// from Redis. The toxic is named explicitly so that it can be removed
	// by the same name afterwards.
	_, err := proxy.AddToxic("latency_down", "latency", "downstream", 1.0, toxiproxy.Attributes{
		"latency": 500,
	})
	if err != nil {
		t.Fatal("Failed to add latency toxic:", err)
	}
	defer proxy.RemoveToxic("latency_down")

	start := time.Now()

	// Connect to Redis through the proxy.
	conn, err := redis.Dial("tcp", ":26379")
	if err != nil {
		t.Fatal("Connection to redis failed:", err)
	}
	defer conn.Close()

	// Run any command against Redis through the proxy.
	_, err = conn.Do("SET", "key", "value")
	if err != nil {
		t.Fatal("Unexpected error sending command to redis:", err)
	}

	// The round trip must take at least as long as the injected latency.
	elapsedTime := time.Since(start)
	if elapsedTime < 500*time.Millisecond {
		t.Fatal("Unexpected fast response time from redis:", elapsedTime)
	}
}

3. Pros & Cons

Toxiproxy is a chaos engineering tool built around a proxy service. That makes the setup more complicated: application traffic has to be routed through the proxy, which adds complexity to the deployment process.

Compare it with Chaos Monkey from Netflix, the first open source chaos engineering tool. Chaos Monkey integrates more deeply with the deployment process but offers only one experiment type (shutdown), while Toxiproxy offers a wide variety of toxics.

From this, we can conclude that Toxiproxy has the following pros and cons.

Pros:

  • Easy to configure and use, and a great starting point for learning about chaos engineering
  • A wide variety of toxics

Cons:

  • Proxy-based tools add complexity to the deployment process
  • Only network-based toxics. Not recommended for validating production systems

4. In Summary

Everyone knows that accidents are unexpected, but that doesn’t mean we shouldn’t be prepared. By simulating chaos in our testing process (unit testing and integration testing), we can identify failures before they become outages, remove single points of failure, and build more resilient systems. In other words, to “break things on purpose” is to prepare for the real breakage.

At Tokopedia, resilience is at the core of our system. Moving from a monolith to microservices was the first step on our journey toward reliability. We keep improving each service through a series of improvements, tests, and simulated accidents, aiming for 0% unexpected downtime.

As always, we have an opening at Tokopedia.
We are an Indonesian technology company with a mission to democratize commerce through technology and help everyone achieve more.
Find your Dream Job with us in Tokopedia!
https://www.tokopedia.com/careers/

https://github.com/Shopify/toxiproxy

https://www.gremlin.com/community/tutorials/chaos-engineering-the-history-principles-and-practice/#:~:text=Chaos%20Engineering%20is%20Preventive%20Medicine,end%20up%20in%20the%20news.

https://www.gremlin.com/community/tutorials/chaos-engineering-tools-comparison

Timothy Agustian

Proud Indonesian and Tokopedia Software Engineer
