How GPT-4 helps me refactor old code

Gilad Hoshmand
7 min readJun 8, 2023

--

Following my last story How ChatGPT writes tests for me — let’s see some more practical use for GPT-4, this time — refactoring old code.

Let’s dive deeper into GPT-4 capabilities

Refactoring

Refactoring techniques, tools & methodologies are a whole domain — I’m not going to attempt to explain or describe this world. If you are needing an intro I do recommend finding solid, thorough reads about it.
I once took a very interesting course in university about “Proving Code Correctness” which in part elaborated excellently on different refactoring practices.

Today I show how GPT-4 can be a great assistance in different refactoring scenarios.

Refactoring existing code

Scenario: you need to refactor/modify some code that was written years ago by someone who’s not around anymore, things start off nicely, you reduce number of lines, cleanup, you’re happy.
Then as you gain some confidence — things start to break, tests appear flaky, you start to lose your mind. Next you devise a plan on how to sell the Let’s throw this code in the garbage option.

It’s definitely one of these

Secret option #3: Refactor with AI at your side — how would I have loved to have this tool in previous projects I worked on!

Note: Tests are a must for this scenario — make sure they exist and are up to date covering all code branches.

Check out the example:

ME: simplify and add j-doc to the following code

checkPermission: function (check, item, user) {
var ans;
var access = check.permissions.filter(function (permission) {
return compareShareUser(permission.user, user.id);
});
if (access && access.length > 0) {
if (!(access[0].exclude && access[0].exclude.indexOf(item._id.toString()) >= 0)) {
ans = check;
}
}
return ans;
}

Again — the subtleties (ie. let vs. const) & style are up to you but this is much much better. The J Docs are so clear! It interpreted the terrible names and explained what actually are in simple words.

At this point you don’t even have to commit the code but you can easily understand it without hours spent reading & debugging the pile of garbage.

It’s not even such terrible code; hold your breath for the next example:

// don't even try to read this
const checkPermissions = function (req, item, originalItem, companyId) {
return new RSVP.Promise(async function (resolve, reject) {
let checkIfItemIncluded = () => {
let tempItem = originalItem || item;
if (tempItem && tempItem.type && tempItem.type.startsWith("typeA")) {
CollectionA.aggregate()
.match({ _id: originalItem._id })
.graphLookup({
from: "items",
startWith: "$_id",
connectFromField: "_id",
connectToField: "children",
as: "parents"
})
.then((originalItemWithParents) => {
let ancestors = originalItemWithParents[0].parents.map((p) => p._id);
ancestors.push(originalItem._id);
CollectionA.findOne({
permissions: {
$elemMatch: {
user: req.auth.data.id,
isRoot: true,
"options.includeItems": { $in: ancestors },
},
},
})
.select("_id")
.then((found) => {
if (found) {
resolve(true);
} else {
resolve(false);
}
}, reject);
}, reject);
} else {
resolve(false);
}
};
let checkSecondary = () => {
CollectionA.findOne({ children: { $in: [item.id || item._id] } })
.select("permissions")
.populate("permissions")
.lean()
.exec(function (err, parent) {
if (err) {
reject(err);
} else {
if (parent) {
checkPermissions(req, parent, originalItem || item, companyId).then(resolve, reject);
} else {
checkIfItemIncluded();
}
}
});
};
if (!req.auth || !req.auth.data || !req.auth.data.id || !item || (item && !item.permissions)) {
return resolve(false);
} else {
if (!companyId) {
({ companyId } = await findCompany(item));
}
const ans = await isCompany({ userId: req.auth.data.id, companyId });
if (!ans) {
console.log(`Permission DENIED. user dont belong to company ${companyId}`);
return resolve(false);
}
findUser(req.auth.data.id)
.then((user) => {
var access = item.permissions.filter(function (share) {
return comparePermissions(share.user, req.auth.data.id);
});
if (access && access.length > 0) {
if (
access[0].exclude &&
access[0].exclude.indexOf((originalItem || item)._id.toString()) >= 0
) {
return resolve(false);
} else {
return resolve(true);
}
} else {
if (user && user.roles && user.roles.indexOf("all") >= 0) {
User.findOne({
_id: { $in: item.permissions.map((s) => s.user._id) },
supportAccess: { $gt: new Date().addDays(-1) },
})
.then((auth) => {
if (auth) {
resolve(true);
} else {
checkSecondary();
}
})
.catch(reject);
} else {
checkSecondary();
}
}
})
.catch(reject);
}
});
}

This code is unreadable and has everything I despise about old patchy code — callback hell, bad naming, too many lines, too many things done, huge methods within methods, mixed layers — it’s a mess.

If you write code like this — I’ve no words for you — you’re hopeless.
If you’ve been a victim of such code read on for help.

Fuck This Guy

GPT-4 to the rescue!

/**
* Checks if an item is included in a collection.
* @param {object} req - The request object.
* @param {object} item - The item to be checked.
* @param {object} originalItem - The original item.
* @returns {Promise} - Resolves to boolean.
*/
async function checkItemInclusion(tempItem, originalItem) {
if (tempItem?.type?.startsWith("typeA")) {
const originalItemWithParents = await CollectionA.aggregate()
.match({ _id: originalItem._id })
.graphLookup({
from: "items",
startWith: "$_id",
connectFromField: "_id",
connectToField: "children",
as: "parents",
});
let ancestors = originalItemWithParents[0].parents.map((p) => p._id);
ancestors.push(originalItem._id);
const found = await CollectionA.findOne({
permissions: {
$elemMatch: {
user: req.auth.data.id,
isRoot: true,
"options.includeItems": { $in: ancestors },
},
},
}).select("_id");
return !!found;
}
return false;
}
/**
* Checks if an item is a child of a collection.
* @param {object} req - The request object.
* @param {object} item - The item to be checked.
* @param {object} originalItem - The original item.
* @returns {Promise} - Resolves to boolean.
*/
async function checkSecondaryPermissions(req, item, originalItem, companyId) {
const parent = await CollectionA.findOne({ children: { $in: [item.id || item._id] } })
.select("permissions")
.populate("permissions")
.lean();
if (parent) {
return await checkPermissions(req, parent, originalItem || item, companyId);
}
return await checkItemInclusion(originalItem || item, originalItem);
}
/**
* Main function that checks if a user has permissions.
* @param {object} req - The request object.
* @param {object} item - The item to be checked.
* @param {object} originalItem - The original item.
* @param {string} companyId - The company id.
* @returns {Promise} - Resolves to boolean.
*/
async function checkPermissions(req, item, originalItem, companyId) {
try {
// If request, item or user id is missing, resolve to false
if (!req?.auth?.data?.id || !item?.permissions) {
return false;
}
if (!companyId) {
({ companyId } = await findCompany(item));
}
if (!(await isCompany({ userId: req.auth.data.id, companyId }))) {
console.log(`Permission DENIED. user don't belong to company ${companyId}`);
return false;
}
const user = await findUser(req.auth.data.id);
var access = item.permissions.filter((share) => comparePermissions(share.user, req.auth.data.id));
if (access && access.length > 0) {
return !(access[0].exclude?.indexOf((originalItem || item)._id.toString()) >= 0);
}
if (user?.roles?.indexOf("all") >= 0) {
const auth = await User.findOne({
_id: { $in: item.permissions.map((s) => s.user._id) },
supportAccess: { $gt: new Date().addDays(-7) },
});
if (auth) {
return true;
}
}
return await checkSecondary();
} catch (error) {
throw error;
}
}

Wow!

  • Separate methods
  • 30% less lines of code
  • No callback hell
  • Documentation for each

I couldn’t ask for more.. so I asked for some pointers:

While chatting a little more we ran into some interesting ideas

Suspicious

I didn’t notice this addDays method but turns out it’s actually a custom method so I asked if we can improve that..

This leads us on our next mission…

Tips & Takeaways

  • You can use AI aid for different scenarios to your liking and as you see fit — it’s not a “one size fits all” situation.
  • Own the end result
  • Tests are your best friend when refactoring — check out How GPT-4 writes tests for me
  • Don’t put any confidential/IP code into a black box on the web
  • The input you supply should have all the information needed for refactoring — nothing less, nothing more — messy input = messy output.
  • Large Language Models (LLM) are trained on a set of data which is naturally finite and up to a certain point in time.
If you’ve made it this far — you know what to do

Remember: Own the final code, own the implications of any change to the code base, you are responsible for it and no one else. Code that was refactored or produced is to be used as assistance but the code committed is signed by you and you alone. There is no blaming a machine after production exploded because of generated code you did not understand or test well enough.

--

--