The sweetest fruit salad recipe for energy-saving services
Chapter N. Time to enjoy the dessert
(This story begins in Chapter 0. Launching an energy-saving rocket to the stars)
You have reached the release phase; we hope it took you less time than it took us. Once we had completed all the steps described, our repository held a robust, fault-tolerant, fully documented and testable service that met all the goals we had set.
Let’s recap the challenges we went through:
- Asynchronous: Avoid blocking the consumer while the request is being processed.
- Outbound pressure control: Protect the consumed services from being overloaded, according to their SLAs.
- Retry mechanism: Properly react when the consumed services are unavailable or have issues (either retryable or non-retryable errors).
- HealthCheck: Provide a complete health check summary and monitoring of the service.
- Profiling: Provide necessary elements to supervise the internals of our components.
- Caching: Avoid unnecessary and repeated calls.
- Human-friendly software lifecycle: Define the required behaviour first and develop the code seamlessly to match it (BDD helps us here).
- Discoverable/Testable: Make the service's features easy to understand and each of them easy to test.
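Outbound pressure control, in its simplest form, means capping how many requests are in flight towards a consumed service at the same time, and failing fast when no slot frees up in time instead of queueing without limit. The following is a minimal Python sketch of that idea, assuming an asyncio-based client; `BoundedClient`, `max_connections`, `acquire_timeout` and the simulated `_send` call are hypothetical names chosen for illustration, not the service's actual implementation or stack.

```python
import asyncio


class BoundedClient:
    """Hypothetical client that caps concurrent in-flight calls to one consumed service."""

    def __init__(self, max_connections: int, acquire_timeout: float):
        self._slots = asyncio.Semaphore(max_connections)
        self._acquire_timeout = acquire_timeout
        self.in_flight = 0
        self.peak_in_flight = 0  # observed maximum, handy for monitoring

    async def call(self, request):
        # Wait for a free slot, but give up after acquire_timeout seconds
        # (raises asyncio.TimeoutError) instead of queueing without limit.
        await asyncio.wait_for(self._slots.acquire(), timeout=self._acquire_timeout)
        try:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            return await self._send(request)
        finally:
            self.in_flight -= 1
            self._slots.release()

    async def _send(self, request):
        # Stand-in for the real non-blocking call to the downstream service.
        await asyncio.sleep(0.05)
        return f"ok:{request}"


async def main():
    client = BoundedClient(max_connections=3, acquire_timeout=1.0)
    # Ten concurrent requests, but never more than three reach the service at once.
    results = await asyncio.gather(*(client.call(i) for i in range(10)))
    print(results, "peak in flight:", client.peak_in_flight)


if __name__ == "__main__":
    asyncio.run(main())
```

Because the caller awaits instead of blocking a thread, this keeps the asynchronous goal intact while protecting the collaborator; the semaphore is created inside the running event loop, as `main()` does.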
Thanks to all these achievements, we were able to deploy the service smoothly and keep it fully monitored and under control. But that was not the end, just the beginning of its evolution. Several new requirements arrived; the most impactful has been a change in the service's expected SLA, which now requires it to handle very large messages in addition to its existing high volume of tiny IoT messages.
Throughout its life, and in response to these changes, we have taken advantage of all the tuning features we built into the service, such as:
- Scaling it up and down with Kubernetes to match variations in the required throughput.
- Alongside scaling, tuning the outbound pressure control towards the consumed services through properties such as maximum connections, acquisition timeout, maximum idle time, maximum lifetime, etc., so that a higher throughput does not overwhelm the consumed services (our collaborators, which we want to use with care).
- Refining the retry policy by adapting the exponential backoff configuration (first backoff, maximum backoff, maximum retries, etc.) to the consumed services' behaviour, ensuring that no message is lost while still using those services safely.
- Balancing the cache between hit ratio and size (maximum size, expire after access, expire after write, etc.), raising the hit ratio to avoid unnecessary calls to the consumed services while keeping the service's memory usage reasonable.
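The retry policy mentioned above follows the classic capped exponential backoff: the wait before retry number n (counting from 0) is `first_backoff * 2**n`, never more than `max_backoff`, for at most `max_retries` retries, and only retryable errors are retried at all. Here is a minimal Python sketch under those assumptions; the exception classes, function names and default values are hypothetical and do not reflect the service's real configuration or retry library.

```python
import random
import time


class RetryableError(Exception):
    """Transient failure (e.g. timeout, HTTP 503): worth retrying."""


class NonRetryableError(Exception):
    """Permanent failure (e.g. HTTP 400): retrying will not help."""


def backoff_delays(first_backoff, max_backoff, max_retries, jitter=0.0, rng=random.random):
    """Yield the wait before each retry: first_backoff * 2**n, capped at max_backoff."""
    for n in range(max_retries):
        delay = min(first_backoff * 2 ** n, max_backoff)
        if jitter:
            # Randomise by +/- jitter * delay so many instances do not retry in
            # lockstep, then re-apply the cap.
            delay = min(delay * (1 + jitter * (2 * rng() - 1)), max_backoff)
        yield delay


def call_with_retry(operation, first_backoff=0.1, max_backoff=2.0, max_retries=5,
                    jitter=0.0, sleep=time.sleep):
    """Call operation(); retry only on RetryableError, following the backoff schedule.

    Any other exception (e.g. NonRetryableError) is not caught and propagates at once.
    """
    delays = backoff_delays(first_backoff, max_backoff, max_retries, jitter)
    while True:
        try:
            return operation()
        except RetryableError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error to the caller
            sleep(delay)
```

Injecting `sleep` keeps the policy testable without real waits. With `first_backoff=0.1`, `max_backoff=1.0` and `max_retries=6`, the schedule is 0.1, 0.2, 0.4, 0.8, 1.0 and 1.0 seconds, which shows why `max_backoff` matters as much as `first_backoff` when tuning against a slow collaborator.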
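The cache parameters listed above (maximum size, expire after write, expire after access) and the hit ratio they trade off against can be illustrated with a small in-memory cache. This is a hypothetical Python sketch, not the service's cache: it uses plain LRU eviction for simplicity, whereas production caching libraries usually rely on more sophisticated eviction policies, and the class name, `loader` callback and injectable `clock` are assumptions made for illustration.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """Size-bounded cache with expire-after-write and expire-after-access (plain LRU)."""

    def __init__(self, maximum_size, expire_after_write=None, expire_after_access=None,
                 clock=time.monotonic):
        self.maximum_size = maximum_size
        self.expire_after_write = expire_after_write
        self.expire_after_access = expire_after_access
        self._clock = clock
        self._entries = OrderedDict()  # key -> [value, written_at, accessed_at]
        self.hits = 0
        self.misses = 0

    def _expired(self, entry, now):
        _, written_at, accessed_at = entry
        if self.expire_after_write is not None and now - written_at >= self.expire_after_write:
            return True
        if self.expire_after_access is not None and now - accessed_at >= self.expire_after_access:
            return True
        return False

    def get(self, key, loader):
        """Return the cached value, or call loader(key) on a miss and store the result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and not self._expired(entry, now):
            self.hits += 1
            entry[2] = now                  # refresh the access time
            self._entries.move_to_end(key)  # mark as most recently used
            return entry[0]
        self.misses += 1
        value = loader(key)                 # the call to the consumed service we want to avoid
        self._entries[key] = [value, now, now]
        self._entries.move_to_end(key)
        while len(self._entries) > self.maximum_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A larger `maximum_size` or longer expirations raise the hit ratio at the cost of memory (and staleness); watching `hit_ratio()` while adjusting them is the balancing act described in the list.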
Since its initial release, more services have joined the family, following the same architecture, design principles, technical stack and approach of delivering complete services. Each of them has been an opportunity for further evolution, improvement and refinement, which has then been applied back to the existing services, keeping the whole family aligned.
Tests as documentation? ;-) Our Cucumber test suite became a complete description of the service's features in human language: