Going Live in a Cloudy, Agile Environment

Gavin Perrie
DataReply
4 min read · Mar 11, 2019
Image by Torkild Retvedt via Flickr

In a previous life, before focusing on cloud solutions, I was working with monolithic enterprise software. There were racks of servers, reams of cables, flashing lights and spinning disks plus teams for each piece of the architecture puzzle.

A nerd's dream. My dream.

To understand how requests flow through the system meant understanding how the packets moved through each of the many blocks drawn on a Visio diagram that spanned multiple A3 pages when printed.

And each of these blocks fell under the responsibility of a different team.

Then between the blocks were the networking team and the firewall team.

When it came to rolling out new releases it was like trying to herd cats. Was there any network change planned for the same time? Did the OS team have a patch planned? Were any security patches for the database outstanding? Did anyone inform the backup team to pause the backups while we were installing? Who informed the users? And don't forget that the firewall changes had to be approved 4 weeks before going live!

On the rare occasions that everything came together we’d be waiting for one of the users to remember that they’d planned an important customer event the day after the go-live so please don’t change anything.

It was a small miracle every time we brought a new release live, on time and without unexpected side effects. The post-release parties and celebrations were fuelled jointly by pizza and relief.

While there was a lot of stress and anxiety involved in these situations, I still consider myself lucky to have seen these moments. The team dynamic changes at 3am when something isn't working quite as expected. Everyone pulls together, helping colleagues from different areas and focusing on the collective success. Where else can you see Project Managers acting as debugging ducks in the middle of the night at a random customer's office?

Image by Alan Cleaver via Flickr

Now in the modern software process we’ve become used to CI/CD pipelines automatically taking care of deploying updates to microservices multiple times an hour. With each process decoupled from neighbouring ones, teams are empowered to push updates independently without having to jump through the hoops we previously did. Innovation is no longer hampered by organisational processes.

I was recently reminded of how far we've come when we deployed a chatbot for a customer (you can read more about it here). We'd created an MVP to showcase how a bot could enable the users to work independently, removing their dependency on the engineering team for simple platform actions. There was no benefit to us being involved in simple tasks like scaling a cluster when the approval was already there; we only slowed them down and got distracted from our own tasks.

So with the MVP in place we showed it to our product owner, and he took it into a meeting to show the innovative direction we were moving in. The others in the meeting immediately wanted to have it, and were given the link to the (draft) documentation. Everyone left the meeting happy and pleased at how things were progressing.

Image by Deepti Jutura via Flickr

Then we started seeing some traffic to the chatbot backend.

And getting congratulation emails from platform users.

Then feature requests.

As they left the meeting and showed the chatbot to their team members, our test users created a snowball effect. Each user who saw how easy it was to manage their own environment immediately wanted access and excitedly passed on the good news to their peers. In the 5 hours after the initial meeting, over half of our user community were actively using the chatbot!

I guess we’re live now? ¯\_(ツ)_/¯

So how did we manage to go live without any long-winded planning meetings, organisational workshops or a fine-grained timeline?

We'd taken the decision at the start of the development process to go with serverless options wherever possible. Our bot was backed by AWS API Gateway, Lambda, Lex and DynamoDB: all services where AWS takes care of the maintenance and scaling for you. It made no difference to us whether we had 1 or 1,000 users; the capacity would be provisioned to support them. We also knew that we'd have to operate the bot in the future, so we'd built in security and logging mechanisms from the beginning. Along with the (draft) documentation, we actually had everything in place that we needed. Improvements could also be brought live through our CI/CD pipeline without interruptions to the users.
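To make that architecture concrete, here's a minimal sketch of what a Lambda fulfilment handler behind Lex might look like. The `ScaleCluster` intent and its `ClusterSize` slot are hypothetical names for illustration; the response shape follows the Lex (V1) Lambda fulfilment format, and the logging-from-the-start idea is baked in with a structured log line per request.

```python
import json
import logging

# Log every request in a structured form from day one --
# this is the "build in logging from the start" part.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def close(message):
    """Build a Lex (V1) fulfilment response that ends the conversation."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        }
    }


def handler(event, context):
    """Entry point Lex invokes for each user utterance."""
    intent = event["currentIntent"]["name"]
    slots = event["currentIntent"].get("slots") or {}
    logger.info(json.dumps({"intent": intent, "slots": slots}))

    if intent == "ScaleCluster":
        # Hypothetical intent: this is where the real platform action
        # (e.g. a boto3 call) would go before confirming to the user.
        size = slots.get("ClusterSize", "the requested size")
        return close(f"Scaling the cluster to {size} nodes.")

    return close("Sorry, I can't help with that yet.")
```

Because the handler is stateless and the services underneath it scale on demand, the same code serves 1 user or 1,000 without any capacity planning on our side.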

In the end, the snowball effect and excitement around the new feature saved us the effort of planning how to roll out the bot. This confirmed our belief that building from day 1 with an eye on how to run later in productive mode is the correct design choice and makes the transition easier to manage. For us that meant:

  • Serverless where possible
  • Build in security and logging from the start
  • Documentation should be part of the process, not an afterthought

Keep these in mind next time you start coding a new solution. Happy building and let us know what cool things you’re developing.

Data Reply is a Reply Group Company specialising in applying Advanced Analytics tools and techniques to address client needs in three key areas: Speed, Efficiency and Insight. We are an agile network of Data Scientists and Engineers in the UK, Germany and Italy with a very practical focus on technical solution delivery.


Solutions architect focused on AWS & Big Data technologies. Cloud Evangelist. Alexa developer. Spartan.