DevOps Zen
When a system enters DevOps Zen it has achieved calm, unthinking persistence. I categorize the path to DevOps Zen into four stages: three stages of life and the final elevation into life eternal.
Infant
First there is an idea, and then a team of developers is brought together to make this idea real through code. As coders bring this idea to life, they can build it so that it grows into a healthy application, or they can build it so that the application requires constant care and attention just to maintain basic functionality.
To achieve DevOps Zen, coders must Code Well. Good code communicates the nature of what the system is meant to do. Good code is testable. Good code is concise.
How to Code Well
- Use software design patterns; start with the Gang of Four and Martin Fowler
- DRY (Don’t Repeat Yourself): when duplicate code is identified, immediately encapsulate it into a shared function, class, or module
- Follow the single responsibility principle and write granular functions that perform a single task
- Use Test-Driven Development and Behavior-Driven Development
- Manage code changes using a standardized Git workflow
- Use pull requests and code reviews to collaboratively validate code changes
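As a small illustration of the DRY and single-responsibility points, here is a minimal sketch in Python. The function names and the email rules are hypothetical and chosen only for the example. Each function does one job, so each can be tested on its own and reused instead of copied.

```python
def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase an email address."""
    return raw.strip().lower()


def is_valid_email(email: str) -> bool:
    """Rough check only: exactly one '@' with text on both sides."""
    local, sep, domain = email.partition("@")
    return bool(local) and bool(sep) and bool(domain) and "@" not in domain


def register_user(raw_email: str, registry: set) -> bool:
    """Add the email to the registry if it is valid and not already present."""
    email = normalize_email(raw_email)
    if not is_valid_email(email) or email in registry:
        return False
    registry.add(email)
    return True
```

Because normalization and validation live in their own functions, a second feature that needs them (say, a login form) can call them rather than duplicate the logic.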
Adolescent
Things change. As applications evolve, they gain more features. More features mean more code. If these features are Coded Well, they can be tested in an automated fashion. Every change should run these tests without human intervention, providing immediate feedback on the quality of the newly added features and on how their introduction affects the quality of the old ones.
To achieve DevOps Zen, development teams must Automate Quality. All new features must be tested as soon as possible on every code change to ensure they meet expectations. All old features must be tested as soon as possible on every code change to ensure they still work as expected.
How to Automate Quality
- Continuous Integration with build tools like Jenkins, GoCD, Travis CI, etc.
- Continuous Deployment
- Unit tests
- Integration tests
- Functional tests
- Static code quality analysis
- Static security vulnerability detection
- Include docstrings in source code so that code documentation can be generated automatically
- Passive penetration testing on every build
- Active penetration testing every night
- Load test against a production-like environment every night
- Configuration management and orchestration with tools like Chef, Puppet, Ansible, CloudFormation, etc.
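To make the unit test and docstring items concrete, here is a minimal sketch using Python's standard doctest module. The percentile function and the doctest_failures helper are hypothetical examples, not part of any particular toolchain. The docstring serves as documentation and as a test that a build server could run on every change.

```python
import doctest


def percentile(values, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers.

    >>> percentile([10, 20, 30, 40], 50)
    20
    >>> percentile([10, 20, 30, 40], 100)
    40
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest-rank method: ceil(pct / 100 * n), never lower than rank 1
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def doctest_failures() -> int:
    """Run the examples in percentile's docstring and return the failure count."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        percentile.__doc__, {"percentile": percentile}, "percentile", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test).failed
```

A build step could call doctest_failures() and fail the build on any non-zero result, so the documentation cannot quietly drift away from the code.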
Adult
An idea has been made real and an application is launched. End users are using the application for the first time. These users do unexpected things. Integrated services and 3rd-party software do unexpected things. The addition of new features over time may introduce unexpected behavior. The application should constantly stream information about its well-being to a central location where it can be monitored and analyzed.
To achieve DevOps Zen, development teams and IT support teams need to be notified when anomalies are detected in the application. They need to see application and hosting infrastructure state from any point in time to discover the root cause of these anomalies.
How to Monitor All Things
- Catch and record all errors into application logs
- Aggregate application logs into a single searchable index
- Record hosting infrastructure and network metrics
- Trigger alerts to development and support staff when application or infrastructure issues are detected
- Trigger alerts to development staff when automated builds are broken
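The first two items can be sketched with Python's standard logging module. This is a minimal example under assumptions: the JSON field names, the build_logger helper, and the charge_customer operation are all hypothetical. Writing one JSON object per line is one common way to make logs easy for an aggregator to index, but the right format depends on the log pipeline in use.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the full traceback so the root cause can be found later
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def charge_customer(logger: logging.Logger, amount_cents: int) -> bool:
    """Hypothetical operation that records its failure instead of hiding it."""
    try:
        if amount_cents <= 0:
            raise ValueError(f"invalid amount: {amount_cents}")
        return True
    except ValueError:
        logger.exception("charge failed")
        return False
```

In a real deployment the stream would be a file or standard output collected by a log shipper, and alerting rules would watch the aggregated index for records at the ERROR level.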
Immortal
The application has achieved criticality. Lives will be impacted if it is not running continuously. The application cannot go down.
To achieve DevOps Zen, development teams and infrastructure management teams must design applications to self-heal and automatically identify, isolate, and contain issues so they cannot spread. Individual features may be allowed to lose capability, but the application will remain alive when one part is in error.
How to Live Forever
- Automatically fail over to redundant infrastructure when primary resources are lost
- Active/Active data replication to standby environments in remote data centers
- Encode Circuit Breakers around service integrations
- Harden hosting infrastructure and network topology to reduce security vulnerabilities
- Route all network traffic through Intrusion Detection Systems/Intrusion Prevention Systems
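The Circuit Breaker item is the easiest of these to show in code. Here is a minimal single-threaded sketch in Python. The class name, the thresholds, and the injectable clock are my own assumptions for illustration; production libraries add thread safety, metrics, and more careful half-open handling.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down has elapsed: half-open, let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each service integration in its own breaker and catch CircuitOpenError to serve a degraded response. That way one failing dependency costs a single feature rather than the whole application.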