Introducing Hystrix for Resilience Engineering
by Ben Christensen
In a distributed environment, failure of any given service is inevitable. Hystrix is a library designed to control the interactions between these distributed services, providing greater tolerance of latency and failure. It does this by isolating points of access between the services, stopping cascading failures across them, and providing fallback options, all of which improve the system’s overall resiliency.
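To make the three ideas above concrete (isolation, stopping cascades, and fallbacks), here is a minimal plain-JDK sketch. It is not Hystrix's API: the class `IsolatedCall` and its methods `newBoundedPool` and `execute` are hypothetical names invented for this illustration, and the pool size and timeout values are arbitrary assumptions. Each dependency gets its own small, bounded thread pool, every call is given a time limit, and a fallback value is returned whenever the call fails, times out, or is rejected because the pool is saturated.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not part of Hystrix. */
final class IsolatedCall {

    /**
     * Builds a small pool dedicated to a single dependency, so a slow
     * dependency can exhaust only its own threads. The queue is bounded and
     * the default rejection policy throws when both pool and queue are full.
     */
    static ExecutorService newBoundedPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(threads));
    }

    /** Runs primary on the given pool; returns the fallback on rejection, error, or timeout. */
    static <T> T execute(ExecutorService pool, Callable<T> primary,
                         Supplier<T> fallback, long timeoutMillis) {
        Future<T> future;
        try {
            future = pool.submit(primary);
        } catch (RejectedExecutionException e) {
            return fallback.get(); // pool saturated: shed load instead of queueing
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback.get();
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true); // interrupt the worker so the thread is freed
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = newBoundedPool(2);
        try {
            // Healthy dependency: the primary result is returned.
            System.out.println(execute(pool, () -> "primary", () -> "fallback", 500));
            // Slow dependency: the caller waits at most 100 ms, then falls back.
            System.out.println(execute(pool, () -> {
                Thread.sleep(2000);
                return "late";
            }, () -> "fallback", 100));
            // Failing dependency: the exception is contained and the fallback is used.
            System.out.println(execute(pool, () -> {
                throw new IllegalStateException("boom");
            }, () -> "fallback", 500));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The design choice worth noting is that the caller's thread never blocks longer than the timeout, and a misbehaving dependency can tie up only the threads in its own pool, which is what keeps one failure from cascading into others.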
Hystrix evolved out of resilience engineering work that the Netflix API team began in 2011. Over the course of 2012, Hystrix continued to evolve and mature, eventually leading to adoption across many teams within Netflix. Today, tens of billions of thread-isolated and hundreds of billions of semaphore-isolated calls are executed via Hystrix every day at Netflix, and its use has brought a dramatic improvement in uptime and resilience. The following links provide more context around Hystrix and the challenges that it attempts to address:
Maintaining high availability and resiliency for a system that handles a billion requests a day.
How our API and other systems isolate failure, shed load and remain resilient to failures.
Hystrix is available on GitHub at http://github.com/Netflix/Hystrix
You can get and build the code as follows:
$ git clone https://github.com/Netflix/Hystrix.git
$ cd Hystrix/
$ ./gradlew build
In the near future we will also be releasing the real-time dashboard that we use at Netflix to monitor Hystrix.
We hope you find Hystrix to be a useful library. We’d appreciate any and all feedback on it, and we look forward to forks, pull requests, and other forms of contribution as we work on its roadmap.
Are you interested in working on great open source software? Netflix is hiring! http://jobs.netflix.com
Originally published at techblog.netflix.com on November 26, 2012.