Configuration Management: A Guide for Startups

Jake Miller
Engineered Innovation Group
Jan 16, 2023

What Is Configuration Management?

When I started my journey as CTO at MetaCX, I was well equipped with an understanding of how engineering should handle processes like incident management, change management, and configuration management. At least, I understood them from the perspective of a large software company like Salesforce, my previous employer.

Unfortunately, there is such a thing as too much process, especially for a startup. Startups often don't have the resources to set up intricate processes, let alone follow them and then prove they're following them.

Since I founded and launched the Engineered Innovation Group, I’ve evangelized the need for startups to adopt lightweight processes as soon as possible.

Processes for a startup need to be right-sized. More importantly, they should align with your organization's policies so that you are prepared for SOC 2, ISO 27001, or other compliance and security frameworks. These frameworks are good guides for establishing policies and procedures because they are well established and reflect industry-standard expectations.

Configuration management is a set of policies and procedures that prescribe how configurations are created, stored, and managed, and how changes to them are approved.

A change is any modification to a configuration: adding, removing, or updating a value, a service, and so on. At its core, a change is anything that can affect the behavior or structure of an environment. For example, if a port needs to be opened for Redis within a VPC, the request might read something like this:

Configure Redis's port to 6382, and update the firewall to allow traffic on that port within the VPC between Kubernetes and Redis. Why not the default Redis port (6379)? Good catch: avoiding default ports is a common security practice.
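A request like this is easier to review when it is captured as a structured record instead of free text. The sketch below is a hypothetical illustration, not part of any tool. The `PortChange` type, its fields, and the small table of well-known default ports are all assumptions. It flags requests that reuse a service's default port or fall outside the valid TCP port range.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical table of well-known default ports to avoid (assumption: extend as needed).
DEFAULT_PORTS = {"redis": 6379, "postgresql": 5432, "mysql": 3306}


@dataclass(frozen=True)
class PortChange:
    """A hypothetical structured version of a port-change request."""
    service: str
    port: int
    network: str      # where the firewall rule applies, e.g. a VPC
    description: str


def review_port_change(change: PortChange) -> List[str]:
    """Return review warnings; an empty list means no obvious issues were found."""
    warnings = []
    if not 1 <= change.port <= 65535:
        warnings.append(f"port {change.port} is outside the valid TCP range")
    default = DEFAULT_PORTS.get(change.service.lower())
    if default is not None and change.port == default:
        warnings.append(f"{change.service} is using its default port {default}")
    return warnings


request = PortChange(
    service="Redis",
    port=6382,
    network="VPC: Kubernetes <-> Redis",
    description="Open 6382 on the firewall between Kubernetes and Redis",
)
print(review_port_change(request))  # []
```

A reviewer still makes the call. The check only catches the mechanical mistakes, such as a fat-fingered port number outside the valid range.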

Why Do We Need Change Management?

Change management is necessary because it records what changed, who changed it, when, and why. This makes it much easier to determine what might have caused an incident, an outage, or a performance issue. The record of who approved and implemented each change is also necessary for security audits and forensic analysis.
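To make the what, who, when, and why concrete, here is a minimal sketch of a change log as an in-memory Python structure. The `ChangeRecord` fields, the `changes_between` query, and the people and timestamps in the usage example are hypothetical. In practice this history would live in a ticketing tool or a database. The query shows why the record matters during an incident: it lists every change applied in the window before the problem was reported.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class ChangeRecord:
    what: str          # the configuration that changed
    who: str           # who implemented the change
    approved_by: str   # who approved it (kept for audits)
    when: datetime     # when it was applied (timezone-aware)
    why: str           # the reason for the change


class ChangeLog:
    """A minimal, in-memory, append-only change log (illustration only)."""

    def __init__(self) -> None:
        self._records: List[ChangeRecord] = []

    def record(self, change: ChangeRecord) -> None:
        self._records.append(change)

    def changes_between(self, start: datetime, end: datetime) -> List[ChangeRecord]:
        """Return changes applied in [start, end], newest first."""
        hits = [r for r in self._records if start <= r.when <= end]
        return sorted(hits, key=lambda r: r.when, reverse=True)


# Usage: list everything that changed in the hour before an incident was reported.
incident_at = datetime(2023, 1, 16, 15, 0, tzinfo=timezone.utc)
log = ChangeLog()
log.record(ChangeRecord("api: release new feature", "carol", "dave",
                        incident_at - timedelta(minutes=60), "Most requested feature"))
log.record(ChangeRecord("firewall: open Redis port between Kubernetes and Redis",
                        "erin", "frank", incident_at - timedelta(minutes=10),
                        "Move Redis off its default port"))
log.record(ChangeRecord("dns: update TTL", "carol", "erin",
                        incident_at - timedelta(days=2), "Faster failover"))

for r in log.changes_between(incident_at - timedelta(hours=1), incident_at):
    print(r.when.isoformat(), r.what, "by", r.who, "approved by", r.approved_by)
```

The firewall change is listed first because it is the most recent. That is usually the first suspect when something breaks.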

Change management in SaaS is vital from a technical perspective because it ensures that new software and updates are properly integrated into existing systems without causing disruptions or conflicts. This includes testing and validation, identifying and addressing compatibility issues, and creating and implementing a plan for data migration.

Without proper change management, there is a risk of software bugs, data loss, and other technical problems that can negatively impact the performance and reliability of the entire system.

Change management keeps the DevOps and infrastructure teams informed about what changes are made, when, and by whom. A robust change management strategy minimizes these risks and ensures a smooth transition to new software, ultimately allowing the organization to take full advantage of its features and capabilities.

From experience, I know how challenging and time-consuming change management can be, but it's worth the effort for two reasons:

  1. Debugging and troubleshooting
  2. Security, Privacy, and Compliance

Debugging and Troubleshooting

Imagine one of your developers has just released a shiny new feature into production. This was an important one. A big one! 98% of your clients had asked for it, and you've just sent release notes proclaiming that this excellent new feature is now live.

An hour later, while you're sipping champagne and waiting to hear your users' delight, your phone buzzes and you excitedly check the text message. It's from your lead engineer: the system has... slowed down. No, it's not responding at all now.

“What do you mean slowed down?” you ask as you open your laptop and go to app.yourdomain.com. All you see is the browser loading indicator spin and spin and spin.

It was working fine an hour ago! What happened?

This is where you choose your own adventure.

Ending A

Your lead engineer isn't sure what could have caused the issue. It was working fine an hour ago, and there are no anomalies in CPU or memory utilization. He suspects that new code is causing the problem, but he can't pinpoint what it could be. He looks through the GitHub repo and recent commits and sees nothing suspicious. He scours the logs and finds that no requests are reaching the database, even though the APIs are responding.

This takes nearly an hour. Finally, after texting every engineer on the team to ask whether they changed anything, one team member mentions that he made a firewall change. It turns out he fat-fingered the port number. The team has to go into the system manually, correct the port, and redeploy the machines.

By the time they’re done, it’s been 1.5 hours of downtime.

Ending B

Your lead engineer pulls up the change management board and realizes another engineer updated the firewall ten minutes ago. Ah ha! He quickly reverts that configuration change, and now the application responds again. The release was a red herring, and because of good documentation, the issue was resolved within minutes of being reported.

Security, Privacy, and Compliance

Why do you need to track changes for security, privacy, and compliance? Ports, cipher and encryption keys, and other sensitive values are all configuration. You need to be able to tell when each one last changed for tasks like rotating cipher keys or updating database credentials.
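Once changes are recorded, "when did this last change?" becomes a simple query. The sketch below assumes a hypothetical history of `(key, timestamp)` events and a hypothetical 90-day rotation policy. Neither comes from the article or any specific tool. It reports which secrets are overdue for rotation, including any that have never been rotated.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# A change event is (configuration key, time the change was applied).
# In practice this history would come from your change log or ticketing tool.
ChangeEvent = Tuple[str, datetime]


def last_changed(history: List[ChangeEvent], key: str) -> Optional[datetime]:
    """Return the most recent change time for `key`, or None if it was never changed."""
    return max((when for k, when in history if k == key), default=None)


def rotation_overdue(history: List[ChangeEvent], key: str,
                     max_age: timedelta, now: datetime) -> bool:
    """True if `key` was never rotated, or was last rotated more than `max_age` ago."""
    last = last_changed(history, key)
    return last is None or now - last > max_age


history = [
    ("db_password", datetime(2022, 6, 1, tzinfo=timezone.utc)),
    ("tls_cipher_key", datetime(2022, 12, 1, tzinfo=timezone.utc)),
    ("db_password", datetime(2022, 9, 1, tzinfo=timezone.utc)),
]
now = datetime(2023, 1, 16, tzinfo=timezone.utc)
policy = timedelta(days=90)  # hypothetical rotation policy

for key in ("db_password", "tls_cipher_key", "api_signing_key"):
    status = "overdue" if rotation_overdue(history, key, policy, now) else "ok"
    print(key, status)
# prints one line each: db_password overdue, tls_cipher_key ok, api_signing_key overdue
```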

Tracking also lets you enforce that a change isn't made by the same person who approves it. This drastically reduces the risk of someone making malicious changes on their own, whether for data exploitation or otherwise.

Plan Your Process and Policies

There is a difference between processes (also called procedures) and policies. A policy is the set of rules to follow, and a process is the set of instructions for following them.

At the Engineered Innovation Group, we're working on a solution for this, so you can focus on innovation rather than mundane process while still following a process that satisfies SOC 2 controls.

Here are the most important things that should be included in your startup change management policy:

  1. Changes to any environment are always recorded in writing
  2. Changes to a production environment must be tested in a test environment first
  3. The person implementing the configuration change cannot be the person who approves it
  4. Every change documents its reason
  5. Every change documents a rollback plan
  6. Every change documents the impact on the application, users, and business if it doesn't work
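The policy items above can also be checked mechanically before a request reaches an approver. Here is a minimal sketch, assuming a hypothetical `ChangeRequest` shape with lowercase environment names where `"prod"` is production. It returns the list of policy items a request violates. It is one possible encoding of the six rules, not a compliance tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeRequest:
    description: str                                      # rule 1: recorded in writing
    environments: List[str]                               # target environments, e.g. ["int", "prod"]
    tested_in: List[str] = field(default_factory=list)    # where it was already tested
    implemented_by: str = ""
    approved_by: str = ""
    reason: str = ""                                      # rule 4
    rollback_plan: str = ""                               # rule 5
    impact_if_failed: str = ""                            # rule 6


def policy_violations(cr: ChangeRequest) -> List[str]:
    """Return a human-readable list of broken rules; empty means the request passes."""
    problems = []
    if not cr.description.strip():
        problems.append("change is not described in writing")
    # Rule 2: a production change must first be tested somewhere other than prod.
    if "prod" in cr.environments and not any(env != "prod" for env in cr.tested_in):
        problems.append("production change was not tested in a non-production environment")
    # Rule 3: separation of duties between implementer and approver.
    if not cr.approved_by or cr.implemented_by == cr.approved_by:
        problems.append("implementer and approver must be different people")
    for name in ("reason", "rollback_plan", "impact_if_failed"):
        if not getattr(cr, name).strip():
            problems.append(f"missing {name}")
    return problems


cr = ChangeRequest(
    description="Configure Redis on port 6382 and allow it through the VPC firewall",
    environments=["int", "prod"],
    tested_in=["dev", "int"],
    implemented_by="alice",   # hypothetical names
    approved_by="alice",
    reason="Avoid the default Redis port",
    rollback_plan="Restore the previous port and firewall rule",
)
print(policy_violations(cr))
```

In this example, the request fails rule 3 because Alice would approve her own change. It also fails rule 6 because the impact section is empty.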

Environments

To reduce the friction and frustration of testing changes in a cloud environment, I recommend having a dev environment that can be "nuked" and quickly rebuilt with something like a Terraform script. Developers have unrestricted access to the dev environment. They can change any configuration, change machine types, and so on. This means the change management overhead can be bypassed while developers test their changes.
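A nuke-and-rebuild step can be as small as three Terraform commands. The sketch below assumes a hypothetical directory layout where the dev configuration lives in a folder named `dev`. It refuses to run anywhere else, as a guard against pointing it at production. It defaults to a dry run that only prints the commands. The `-chdir`, `destroy -auto-approve`, and `apply -auto-approve` options are standard Terraform CLI options.

```python
import subprocess
from pathlib import Path
from typing import List


def rebuild_commands(workdir: str) -> List[List[str]]:
    """Terraform commands that tear down and recreate the environment in `workdir`."""
    chdir = f"-chdir={workdir}"  # global option; must come before the subcommand
    return [
        ["terraform", chdir, "init", "-input=false"],
        ["terraform", chdir, "destroy", "-auto-approve"],
        ["terraform", chdir, "apply", "-auto-approve"],
    ]


def nuke_and_rebuild(workdir: str, dry_run: bool = True) -> List[List[str]]:
    """Rebuild the dev environment; by default only print what would run."""
    # Hypothetical safety guard: only directories named "dev" may be nuked.
    if Path(workdir).name != "dev":
        raise ValueError(f"refusing to rebuild non-dev environment: {workdir}")
    commands = rebuild_commands(workdir)
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands


nuke_and_rebuild("infra/dev")  # dry run; pass dry_run=False to actually rebuild
```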

Once the change is validated in dev, a change management request should be made.

Recommendation

If you're using a platform like Jira or Trello, you'll be able to create ticket types. You could even track changes in a spreadsheet, but you won't get the status change history that a tool like Jira provides.

At the very least, you will want a ticket type called something like Change Request. That ticket should have the following fields:

  • Change Request Description
  • Environments: All | Int | Prod
  • Reason
  • Implemented By
  • Implemented Date
  • Approved By
  • Approved Date
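Much of what a ticketing tool adds over a spreadsheet is the status history. The sketch below models the fields listed above as a plain Python class and records every status transition. The status names, the allowed transitions, and the rule that the approver cannot implement the change are hypothetical choices for illustration. Jira lets you configure your own workflow.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional, Tuple


class Environment(Enum):
    ALL = "All"
    INT = "Int"
    PROD = "Prod"


# Hypothetical workflow: which statuses may follow each status.
ALLOWED = {
    "Requested": {"Approved", "Rejected"},
    "Approved": {"Implemented"},
    "Rejected": set(),
    "Implemented": set(),
}


@dataclass
class ChangeRequestTicket:
    description: str
    environments: Environment
    reason: str
    status: str = "Requested"
    implemented_by: Optional[str] = None
    implemented_date: Optional[datetime] = None
    approved_by: Optional[str] = None
    approved_date: Optional[datetime] = None
    # Each entry is (timestamp, old status, new status, actor).
    history: List[Tuple[datetime, str, str, str]] = field(default_factory=list)

    def _move(self, new_status: str, actor: str, at: datetime) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.history.append((at, self.status, new_status, actor))
        self.status = new_status

    def approve(self, approver: str, at: datetime) -> None:
        self._move("Approved", approver, at)
        self.approved_by, self.approved_date = approver, at

    def implement(self, implementer: str, at: datetime) -> None:
        if implementer == self.approved_by:
            raise ValueError("the approver cannot implement the change")
        self._move("Implemented", implementer, at)
        self.implemented_by, self.implemented_date = implementer, at


t = ChangeRequestTicket("Move Redis to port 6382", Environment.PROD, "Avoid default port")
t.approve("bob", datetime(2023, 1, 16, 14, 0, tzinfo=timezone.utc))
t.implement("alice", datetime(2023, 1, 16, 14, 50, tzinfo=timezone.utc))
for at, old, new, actor in t.history:
    print(at.isoformat(), f"{old} -> {new}", "by", actor)
```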

It's also unrealistic to expect your lead engineer or CEO to have the time to manage this.

How the Engineered Innovation Group Can Help

At the Engineered Innovation Group, we don’t just build products. We help build product and engineering organizations. We help you to define your software development lifecycle and tune the policies and procedures necessary to fulfill the process.
