Dev, Ops and the fine line in between

DevOps. A term that features prominently in almost every technical forum and blog. Many promising startups have built tools to make ‘DevOps’ easier, each of them making the lives of engineers, and in turn of the users of many business-critical applications, simpler one step at a time.

Each of them has its own view of how to solve DevOps problems, which is good. After all, no two applications are the same, and neither are the engineers who build and manage them. Some tools scan the application logs, some follow certain workflows, some watch server statistics, some work from real-time user access patterns and behaviors, some react to commits going into the revision control system, and some need hooks included in the code.

Everything else is pretty much fine. But when it comes to adding DevOps triggers into the code, some, or rather most, of us give it a second thought.

An example (read as persuasion). Say an important piece of code, user login for instance, throws an error because it couldn’t connect to the database to fetch some vital information, the user’s login and profile details in this case. The user sitting in front of the computer gets a simple, familiar error message: “Login Failed” or “Invalid Credentials” in a less mature application or, if the developers had a sense of humor, “Something went wrong”. Now the user of this ‘StockMarketBanking IGotToGetThisWorkingLikeRightNow’ application panics and tries to call support, only to be put on hold and told that their waiting token number is 91. There are numerous others on the line waiting to report the same thing. Each of them will be promised that the Operations team is on it, all hands, literally hacking stuff to get this resolved.

The engineers are foraging through the logs, through the endless stacks of exceptions, transaction data and the like. The exact message they are looking for might be buried deep. If only the team had been notified of this earlier, they would not be facing the wrath of several angry people desperately refreshing their screens. This situation could have been handled differently, and how.

Coulda Woulda Shoulda.

Set up Runscope tests with test accounts for the workflows that are never supposed to break. Add hooks in the code to log the most critical exception cases to Sentry. Stream application and access logs to PaperTrail, making them easier to search and to follow transaction by transaction. Configure CloudWatch alarms on AWS to be notified of any anomalies in infrastructure parameters. If some web application flows break for some users, plug in Rollbar, which, instead of printing something on the browser console, sends the JS error messages from your Angular or React application to a nicely organized dashboard along with the user’s browser version, operating system and the IP address/locale they access from. Most of these tools provide cool integrations with other such tools: when a Runscope test fails, bring down the application health on StatusPage, create a JIRA ticket and assign it to the relevant person. When the tests start passing again, mark it healthy. Yes, almost all of this can be automated within these tools’ domains, and all of the tools discussed above have configurations to send alerts over email, Slack or HipChat.
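A code hook of this kind can be tiny. Here is a minimal sketch in Python, under loud assumptions: `DatabaseUnavailable`, `fetch_user` and `report` are hypothetical names made up for illustration, and `report` stands in for whatever call your error tracker's client library actually provides. The point is only that an infrastructure failure and a wrong password take different paths, and only the first one wakes up the team.

```python
import logging

logger = logging.getLogger("login")


class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the user store cannot be reached."""


def login(username, password, fetch_user, report):
    """Return a message for the user and report infrastructure failures.

    fetch_user and report are placeholders: fetch_user looks up a user
    record (or returns None), and report forwards an exception to the
    team's error tracker.
    """
    try:
        user = fetch_user(username)
    except DatabaseUnavailable as exc:
        # The user did nothing wrong: notify the team and say so honestly.
        logger.error("login: user store unreachable", exc_info=exc)
        report(exc)
        return "We are having trouble on our side. Please try again shortly."
    # Simplified for the sketch: real code compares password hashes.
    if user is None or user["password"] != password:
        # A genuine credentials problem is not worth an alert.
        return "Invalid credentials"
    return "Welcome, " + user["name"]


if __name__ == "__main__":
    reported = []

    def broken_store(username):
        raise DatabaseUnavailable("connection refused")

    # The exception lands in `reported` instead of hiding in a log file.
    print(login("asha", "secret", broken_store, reported.append))
    print(len(reported))
```

Passing the reporter in as an argument keeps the login code independent of any one vendor, so swapping the tracker later does not touch the business logic.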

Or you have Jenkins, the big brother of this kingdom, which can help stitch such events together. Set up GitHub webhooks to a Jenkins job so that the job is triggered as soon as new code is pushed to the repository: it builds the code, deploys to your cloud through Chef or Puppet, executes the tests and, if they pass, toggles your Blue/Green switch, executes the Runscope tests, tags the build on GitHub and records the deployment on Sentry. All of this done and dusted by the time your coffee is ready.
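The shape of such a job is an ordered list of steps that stops at the first failure, so a failed test run never reaches the Blue/Green switch. The sketch below is plain Python, not Jenkins configuration; `run_pipeline`, `example_steps` and the step names are made up for illustration, and each action is a stand-in for the real build, deploy or tagging command.

```python
def run_pipeline(steps):
    """Run (name, action) pairs in order and stop at the first failure.

    Each action is a zero-argument callable returning True on success.
    Returns (names of completed steps, name of the failed step or None).
    """
    completed = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


def example_steps(tests_pass):
    """Hypothetical step list mirroring the order described above."""
    def ok():
        return True

    return [
        ("build", ok),
        ("deploy", ok),
        ("run tests", lambda: tests_pass),
        ("toggle blue/green", ok),
        ("run API checks", ok),
        ("tag build", ok),
        ("record deployment", ok),
    ]


if __name__ == "__main__":
    # With failing tests the switch is never toggled.
    print(run_pipeline(example_steps(tests_pass=False)))
    print(run_pipeline(example_steps(tests_pass=True)))
```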

And having your Infrastructure as Code would come in handy in the most unexpected of situations. It is 2017. Let’s ‘dev’ and ‘ops’ our application ecosystem like it is 2017. Hyphenated.