ChatGPT as a DevOps force multiplier

Robert Sweetman
Published in Version 1
Oct 5, 2023
[Header image: 3D render of the ChatGPT logo. Photo by ilgmyzin on Unsplash]

I’ve been paying for a pro ChatGPT account, out of my own pocket, nearly from day one. It’s become so valuable to me for just getting things done that recent headlines trumpeting OpenAI and possible bankruptcy caused me some serious concern.

This emotional response raises some questions:

  • Why is ChatGPT so important to me?
  • Is there a way for me to illustrate this, especially when it comes to DevOps-related tasks?

This blog post is an attempt to shed light on this situation.

The first question is very easy to answer, and hopefully I can provide a compelling example to back it up.

As this blog’s title states, ChatGPT is a DevOps force multiplier. I reckon, conservatively, that I get twice as much done in this space as I did before it was available.

If you work in DevOps, even just on the automation side, you never know what you’re going to have to tackle next. Beyond some fundamentals (networking, monitoring, scripting, Linux or Windows servers, and various automation approaches) the field is nearly as wide as it is deep. I haven’t even mentioned “which cloud” yet… You get the point.

Let’s imagine a possible ticket to tackle the second, more important question: illustrating WHY tools like ChatGPT are so valuable.

I’ve got some automation which attaches AWS EC2 instances to an Active Directory domain hosted way off on an Azure VM. No fancy schmancy things like hosted AD here! No, we’re going old skool.

Now, having fought through VPN setup, firewall rules, NACLs and security groups we’ve got an SSM Document that will… wait for it… join newly created instances to this non-AWS domain. YAY!!

Bit of an issue though. Automation lets you blow away instances at the drop of a hat, but doing so also leaves stale machine names/computer objects behind in the domain. Something that cleans up the Active Directory entry on termination is needed…

While I’ve got some vague, hand-wavey ideas, the actual ‘how’ of achieving this would, with the old approach, look something like this:

  1. Go look at the AWS lambda docs
  2. See if anyone has written a blog post with anything remotely similar
  3. Has anyone got anything useful on GitHub?
  4. Talk to colleagues if I’m still stuck
  5. Realise that while I can probably do something it’s going to take a while…
  6. Go manage the team expectations associated with getting this done
  7. Go back to the docs and start trying to pick a language. Maybe I can do something in PowerShell, as I know it?

Turns out, after talking to the team, that PowerShell isn’t known by enough people, so that’s out. Rust, which I’d like to use, isn’t out of the ‘experimental’ stage in Lambda yet… So we’re going with Python, which I DON’T KNOW.

Let’s go ask ChatGPT all about how to do this. I created a file and here’s where I began…

okay, we’re going to start a fairly lengthy process of creating an AWS lambda function, using Python, which will do the following:

  • monitor an event stream for EC2 instance deletion events
  • based on the instance name, check whether it’s been added to dev or prod domains previously
  • if it was there, delete/remove it from the domain

Here’s what I got back, just in the first pass…

[Screenshot: ChatGPT output outlining the major steps and some options]
[Screenshot: ChatGPT’s Python output with a Lambda function example]
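Since those screenshots don’t reproduce well here, below is a rough sketch of the kind of first-pass skeleton ChatGPT suggested, reconstructed rather than copied verbatim. It assumes an EventBridge rule forwarding EC2 state-change events to the Lambda; the handler logic, the Name-tag lookup and the dev-/prod- naming convention are my illustrative assumptions.

```python
import boto3

ec2 = boto3.client("ec2")

def lambda_handler(event, context):
    # EventBridge EC2 state-change notifications carry the instance id and state in "detail"
    instance_id = event["detail"]["instance-id"]
    state = event["detail"]["state"]

    if state != "terminated":
        return {"status": "ignored", "instance_id": instance_id}

    # Terminated instances stay describable for a while, so recover the Name tag
    response = ec2.describe_instances(InstanceIds=[instance_id])
    tags = response["Reservations"][0]["Instances"][0].get("Tags", [])
    name = next((t["Value"] for t in tags if t["Key"] == "Name"), None)

    # Hypothetical naming convention: dev-* instances were joined to the dev domain
    domain = "dev" if name and name.startswith("dev-") else "prod"

    # ...remove the computer object from the relevant domain here...
    return {"status": "cleanup-needed", "instance_name": name, "domain": domain}
```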

Let’s review…
Firstly, so many questions have already been answered:

  1. The Python packages to use, or at least look at first
  2. We’re going to react to terminated events
  3. We’ve got some idea of the flow: get the instance details, then check which ‘zone’ (dev or prod domain) it was joined to…

However, there are also some issues. What I failed to make clear was that the domain isn’t in AWS, so let’s address that next…

Sadly it’s a little bit more complicated than that. The domain join/unjoin commands need to be run in PowerShell as the domain is NOT in AWS-managed AD or even in the local environment but in the Azure cloud.
Let’s assume however that there is a service principal set of credentials that can be pulled from AWS Secret Store
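As a side note, the AWS half of that assumption is straightforward: pulling a service principal’s credentials out of AWS Secrets Manager with boto3 looks roughly like the sketch below. The secret name and its JSON shape are placeholders of mine, not anything from the real environment.

```python
import json

import boto3

secrets = boto3.client("secretsmanager")

def get_service_principal(secret_name="azure-ad/cleanup-service-principal"):
    # Assumes the secret is a JSON blob with tenant_id / client_id / client_secret keys
    value = secrets.get_secret_value(SecretId=secret_name)
    return json.loads(value["SecretString"])
```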

Now we are cooking on gas. I recognise the PowerShell code in ChatGPT’s reply and it looks “fairly” decent, but we’re going to have to tackle the fact that it can’t be run on the instance itself, because that instance is busy being deleted…

Another issue is that we can’t actually run this PowerShell domain unjoin command on the instance ’cause that’s the one that is being deleted… Can we run this on ‘something’ that the Lambda function is run on or what are the options?

Happily, I know enough about the environment and constraints I’m working with to recognise that option 3 is very likely my best bet. An added bonus of this route is that the SSM Document can be managed in code and deployed with Terraform, as well as being available to run manually against a target if needed. Hooray!!

Let’s pick option 3 going forward and use a cleanup-domain SSM Document which has the action aws:runPowerShellScript. How do you call that from a lambda?

Now, there’s a bit in this code where I added a FIXME:, as there’s an obvious issue with assuming the first tag value is the instance-id. However, I do now know how to call the document and pass it some parameters.
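For anyone following along, calling an SSM document from Python with boto3 goes roughly like this. The document name, the long-lived worker instance and the parameter names are assumptions for illustration, not the real setup.

```python
import boto3

ssm = boto3.client("ssm")

def trigger_domain_cleanup(computer_name, worker_instance_id):
    # Run the cleanup document on a long-lived, domain-capable worker instance,
    # not on the instance being terminated (that one is already gone)
    response = ssm.send_command(
        InstanceIds=[worker_instance_id],
        DocumentName="cleanup-domain",
        Parameters={"ComputerName": [computer_name]},
        Comment=f"Remove {computer_name} from the domain after termination",
    )
    return response["Command"]["CommandId"]
```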

I’m going to stop showing the ‘dialogue’ between myself, the problem space and ChatGPT, and try to explain exactly why this is so valuable. It doesn’t matter a great deal if, at this stage, someone points out that this code isn’t even correct, because…

I’ve got to a workable foundation in NO. TIME. AT. ALL.

Absolute max — this has taken 30 minutes. I would have had to read/test/try and debug SO MUCH MORE THAN THIS to achieve the same basic starting point any other way.

Given the wider context, which is that I don’t know Python very well at all, I’d guess I’d be very fortunate to get to this point within a day.

I have enough background to know that, with boto3, I can run this locally with my AWS credentials in the terminal. My development/failure loop becomes extremely fast. I’m totally off to the races.
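Running it locally needs nothing more than a throwaway harness along these lines; the module name and the trimmed-down event shape are stand-ins for the real thing.

```python
# run_local.py, a throwaway harness for poking at the handler from the terminal
from lambda_function import lambda_handler

fake_event = {
    "detail": {
        "instance-id": "i-0123456789abcdef0",
        "state": "terminated",
    }
}

if __name__ == "__main__":
    print(lambda_handler(fake_event, None))
```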

I even asked ChatGPT how to add and run tests, and they worked.
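I won’t reproduce ChatGPT’s test code, but the general shape was something like the sketch below, stubbing the AWS client out with unittest.mock so nothing real gets called. The module, handler and return values match my earlier sketch rather than the actual project.

```python
# test_lambda_function.py, run with: pytest
from unittest.mock import MagicMock, patch

import lambda_function

def test_ignores_non_terminated_events():
    event = {"detail": {"instance-id": "i-123", "state": "running"}}
    result = lambda_function.lambda_handler(event, None)
    assert result["status"] == "ignored"

def test_terminated_event_triggers_cleanup():
    event = {"detail": {"instance-id": "i-123", "state": "terminated"}}
    fake_ec2 = MagicMock()
    fake_ec2.describe_instances.return_value = {
        "Reservations": [{"Instances": [{"Tags": [{"Key": "Name", "Value": "dev-web-01"}]}]}]
    }
    # Swap the module-level EC2 client for a mock so no real AWS calls happen
    with patch.object(lambda_function, "ec2", fake_ec2):
        result = lambda_function.lambda_handler(event, None)
    assert result["domain"] == "dev"
```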

I can easily create a dummy SSM document with some parameters so that I can make sure all the comms work. I can check the domain unjoin/entry-removal process independently and then plug it back in.
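Creating that dummy document doesn’t even need the console; a rough boto3 sketch, with the document name, parameter and echo step all being placeholders of mine:

```python
import json

import boto3

ssm = boto3.client("ssm")

# A do-nothing stand-in for the real cleanup document: same parameter, just echoes it back
dummy_document = {
    "schemaVersion": "2.2",
    "description": "Dummy cleanup-domain document that only echoes its input",
    "parameters": {
        "ComputerName": {"type": "String", "description": "Name of the computer to remove"}
    },
    "mainSteps": [
        {
            "action": "aws:runPowerShellScript",
            "name": "echoInput",
            "inputs": {"runCommand": ["Write-Output 'Would remove {{ ComputerName }}'"]},
        }
    ],
}

ssm.create_document(
    Content=json.dumps(dummy_document),
    Name="cleanup-domain-test",
    DocumentType="Command",
)
```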

I’m in a strong position to break other aspects of this down if something happens to not work. I can see the code running/executing and think of failure scenarios or things I need to add to manage the user input.

I’m in a position to quickly get this off the ground and realise, oh, I need to add the AWS region to (nearly) everything AWS API-related for any of it to run… but that’s fine.
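That fix is a one-liner wherever a client gets created; the region below is just an example value.

```python
import boto3

# Be explicit about the region rather than relying on the local environment being set up
ec2 = boto3.client("ec2", region_name="eu-west-2")
ssm = boto3.client("ssm", region_name="eu-west-2")
```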

Looking again to compare the time requirements for this task, pre-versus-post LLM I think the breakdown goes something like this…

Day 1 pre-LLM gets you to an outline; Day 1 post-LLM gets you code you can already iterate on and test.

Day 2 pre-LLM gets you to roughly where Day 1 post-LLM left off.

Day 2 post-LLM and you’d be done, because, don’t forget, you’re continuing the whole dialogue process to tune, debug and fix any other issues.

The same task pre-LLM would most likely finish on Day 4… so, as I mentioned at the start of this blog, I’ve become twice as productive.

At the beginning of tackling this issue, we were standing in front of a huge yak herd. So many yaks…

Working with ChatGPT and having that dialogue almost immediately (by comparison) narrowed things down, resulting in one well-behaved yak that’s open for dialogue.

I really hope this blog has helped illustrate what a ridiculous force multiplier LLMs are for tackling tasks in unfamiliar DevOps territory.

If there’s one thing I’ve noticed with DevOps, it’s all mostly unfamiliar the first time around!

About the author:

Robert Sweetman is a Consulting Engineer here at Version 1.
