But it worked in Dev…

Joe McGrath
Jan 23, 2017

Remember: every line of code you write, every test, and every design decision should, with a few exceptions, target Production.

What do I mean by this?

It is easy to abstract yourself away from Production. We write code on our device of choice. If we can, we run a suite of tests locally which will tell us if it performs as expected. We test it locally using tools such as Vagrant or, if we are lucky, by running it in a container, to further make sure that the code does what it is supposed to.

We check it in to our source code repo and it’s tested further — I hope. Automated tests can run on your CI infrastructure and tell everyone what you did wrong. Peers will review and comment on your code, telling you what is good, what is bad, and what you need to change — again hopefully.

If it gets the rubber stamp of approval it will be merged into your mainline branch, where further tests will occur, and then hopefully automatically deployed to your pipeline environment.

This should kick off even more tests: browser tests, integration tests, all the tests! Tests are good, we like tests.

So all the tests are good in your dev and test environments. You get sign off, if appropriate. (If you have fully automated CD and go from checking your code in all the way through to production, then I take my hat off to you and bow down in awe of your clear magnificence and superiority.) Then boom, it doesn’t work…

Well why not?

As I said at the top of this, it’s easy to forget about Production. Production environments often operate with a number of constraints that you certainly won’t have within your local environment, and that for whatever reason have not been replicated within your pipeline.

New environment variables that your application requires may not have been added to the production environment, and so your application won’t start.
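One way to catch this earlier is to have the application check its own configuration on startup, and to run the same check in every environment of the pipeline. The following is a minimal sketch of that idea, not a definitive implementation; the function name `missing_env_vars` and the variable names in `REQUIRED_VARS` are hypothetical and only there for illustration.

```python
import os

# Hypothetical variable names, for illustration only.
REQUIRED_VARS = ["DATABASE_URL", "PAYMENT_API_KEY"]


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to exercise in a test.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def check_or_exit(required=REQUIRED_VARS):
    """Stop the process with a readable message if anything is missing."""
    missing = missing_env_vars(required)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Calling `check_or_exit()` as the first step of startup means a missing variable fails loudly in the first environment that lacks it, rather than as a confusing error somewhere deep in the application in Production.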

Permissions on databases are a bit more restrictive, and so you need someone, or something, to create a new user that you expected to be there.

Networks may be slightly more locked down and there is a process to get that new port you decided to use for your application opened up in the intrazone firewalls so that your new service can communicate as appropriate.
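A small connectivity check, run from the host where the service will actually live, can surface a closed port long before go live. This is a rough sketch using only the Python standard library; the function name `can_connect` is my own, and the host and port in the usage note are placeholders rather than real services.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Connection refused, timeouts and DNS failures all surface as OSError
    subclasses, so they are all reported as "cannot connect".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("orders.internal.example", 8443)` could be run as a smoke test in each pipeline environment. It only proves that a TCP connection can be opened, not that the service behind the port is healthy, but that is often enough to start the firewall conversation early.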

That third party service that you integrate with has one API for development and another for production, with different keys, and you didn’t get a key for production. Or you need to be on a whitelist to access it, and again this hasn’t been communicated to the third party, or you have no facility in Production to do this and it needs to be built.

You are pulling down dependencies directly from the Internet when you are running your Configuration Management code but your production service doesn’t have outbound internet access.

In Dev and Test the microservices all run on the same node (to save costs, after all). But in production they are split out in various ways and don’t necessarily have direct access to the same resources, even though it may have been assumed that they do.

You may have to deal with a Managed Service Provider who has built your infrastructure from a gold image that they provide, and who then applies a level of hardening and configuration on top of that. They may also mandate certain policies that restrict what commands users can and cannot run. You might find that some of your deployment scripts won’t run because they assume the user will have the correct permissions, as they did in dev/test.
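Deployment scripts can make these assumptions explicit by checking them up front instead of failing halfway through. Below is a minimal sketch of such a preflight check; the function name `preflight` and the commands and directories in the usage note are hypothetical, and it only covers what the standard library can see (commands on the `PATH` and directory write access), not every policy a provider might enforce.

```python
import os
import shutil


def preflight(commands, writable_dirs):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for cmd in commands:
        # shutil.which returns None when no executable of that name is on PATH.
        if shutil.which(cmd) is None:
            problems.append(f"command not found on PATH: {cmd}")
    for path in writable_dirs:
        if not os.path.isdir(path):
            problems.append(f"directory does not exist: {path}")
        elif not os.access(path, os.W_OK | os.X_OK):
            problems.append(f"no write access to directory: {path}")
    return problems
```

A deploy script might call `preflight(["systemctl", "tar"], ["/opt/myservice"])` as the deploying user and refuse to continue if the list is not empty. Running the same check in a hardened pipeline environment is what makes it useful.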

You have implemented a service that is available from your hosting provider, be that Azure, AWS or A.N. Other, but it has not been approved or accredited by the responsible team on the customer side, and so a complete reworking is required. This one usually can’t be caught by tests, but it is still an example of something we need to take into consideration.

You have a database migration that runs like a dream in your dev environment with its 100 records, but production has 10 million and suddenly you’re looking at 3 days for your migration to complete because you perform a table scan for every record you want to update.
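With only 100 records a table scan is invisible, so one option is to inspect the query plan rather than the run time. The sketch below uses SQLite from the Python standard library purely as a stand-in for whatever database you actually run; the `customers` table, the index name and the helper functions are hypothetical, and other databases have their own equivalent of `EXPLAIN QUERY PLAN` with a different output format.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


def does_full_scan(conn, sql, params=()):
    """True if any step of the plan scans a whole table instead of searching an index."""
    return any(detail.startswith("SCAN") for detail in query_plan(conn, sql, params))


def demo():
    """Check the same UPDATE before and after adding an index on the lookup column."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, status TEXT)"
    )
    update = "UPDATE customers SET status = 'migrated' WHERE email = ?"
    before = does_full_scan(conn, update, ("a@example.com",))
    conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
    after = does_full_scan(conn, update, ("a@example.com",))
    conn.close()
    return before, after
```

Here `demo()` reports a full scan before the index exists and none afterwards, regardless of how many rows are in the table. A check like this in the pipeline flags the per-record table scan even when the test data is tiny, although it is no substitute for running the migration against a realistic volume.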

This all sounds like an awful pain.

Well, it often can be, and sometimes there are only a handful of people who are aware of what these restrictions might be. An easy way to help focus the mind on this way of thinking is to start it early in your development process. It’s not necessary for everyone in the team to know every single restriction, but if this detail is known it can be captured in the stories that are being written.

If anything requires clarification you should be able to speak to your architects about it, and likewise if something changes then the architects should communicate this as early as possible to the rest of the team.

What can we do?

When we start building a service it is easy to focus on what is directly in front of us: get our local environments set up, get the CI server configured to build artefacts and run our tests, and then get started actually building things. We want to have something to show at the end of our sprint: progress and features, not to forget Pointz! In this rush it is easy to forget Production.

If we start thinking about this early in the development process then it is easier to build in the restrictions and requirements that may be in place in the Production environment. This also helps us drive the design of the Production environment from an Infrastructure and services perspective. It can help start a lot of the conversations that we may have to engage in with other teams on the customer side, such as Assurance, Security, and the Support Teams where appropriate.

I’m not saying that we have to implement Barracuda WAFs in all our dev environments, as that is just overkill, but if we can mock one up then we can simulate these constraints earlier in the pipeline and catch problems before they become an issue.

The same goes for any other differences that may be present in the Production Environment. Services should be designed to be logically separate and configured as such. They should be able to exist in isolation, and coexist on the same hosts with little or no configuration changes required.

Your pipeline should include an environment that is as close to production as possible, including any additional hardening or technologies that are in place.

If at all possible you should also have a realistic data volume here as well. Actual production data would be great, but this is not always feasible, so look to generate data, or apply some sort of acceptable obfuscation to production data, and use that instead. This should allow you to catch any of these issues before they go to Production, because we really don’t want to have to roll back a deployment in Production and then answer the awkward questions about why it went wrong, or why it’s taking so long.
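As one illustration of what obfuscation could look like, the sketch below replaces email addresses with stable pseudonyms using a keyed hash. The function name `pseudonymise_email`, the secret and the `example.invalid` domain are all assumptions of mine, and whether this level of obfuscation is acceptable is a decision for your security and assurance teams, not for this snippet.

```python
import hashlib
import hmac


def pseudonymise_email(email, secret):
    """Map an email address to a stable fake address using HMAC-SHA256.

    The same input and secret always give the same output, so joins
    between tables still line up, but the original address does not
    appear in the result. `secret` must be bytes and kept out of the
    test environment's reach.
    """
    normalised = email.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalised, hashlib.sha256).hexdigest()
    # .invalid is a reserved top-level domain, so no mail can ever be delivered.
    return f"user-{digest[:16]}@example.invalid"
```

Because the mapping is deterministic, the row counts and relationships of the production data survive, which is the part that matters for finding slow migrations and queries.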

An added benefit to starting this process off early is that we are less likely to be caught out by unexpected constraints which suddenly get dropped on us one sprint before go live, with hilarity/panic ensuing as we duct tape a solution together in order to get our service out the door.

We can also look at the services available from our hosting provider if appropriate, and how we can use these to make our lives easier by offloading the resiliency and management of complex systems to them, freeing us up to build the parts of the service that actually do what we are being asked. Or a solution that we have in Dev may not be right for Production and so we need to find a replacement.

So the next time you write a line of code, or develop some new functionality, and if it is appropriate, you should ask yourself: “How will this run in Production?”
