Leveraging AI for Reverse Engineering: A Comprehensive Guide

UknOwWho_Ab1r
6 min read · Feb 18, 2024


A Guide to Understanding Complex Codebases through Large Language Models


Are you a junior developer who dreads joining a company and struggling to understand somebody else’s code?

Do you wonder how you are ever going to add a feature or contribute to an existing codebase?

Do you want to understand code in depth and still get things done faster?

The tech world has recently been swept up by the generative AI revolution, and the most important questions are:

- “How can we set ourselves apart with a different way of thinking?”
- “How can one leverage LLMs to get their work done faster?”
- “How can one leverage large language models to build their favourite thing?”

To answer these questions, let’s dive into how you can use large language models to get your shit done (hacker mindset :xD).

Let’s look at some code first. The C++ code below comes from the configuration parser in the codebase of our HTTP web server.

#include "ConfigParser.hpp"

#include <cctype>    // std::tolower
#include <fstream>
#include <iostream>
#include <sstream>
#include <stdexcept> // std::runtime_error

#include "WebServerConfig.hpp"
#include "ServerConfig.hpp"
#include "LocationConfig.hpp"

ConfigParser::ConfigParser(WebServerConfig& webServerConfig)
    : _config(webServerConfig),
      _currentServerConfig(NULL),
      _currentLocationConfig(NULL),
      _parsingState(STATE_WS),
      _oldParsingState(STATE_WS),
      _key(""),
      _value(""),
      _paramterLength(0),
      _lastChar('\0'),
      _isQuoteMode(false),
      _lineCount(1),
      _charCount(0)
{
    _mulitValues.reserve(ConfigParser::MAX_VALUE_LENGTH);
    _key.reserve(ConfigParser::MAX_KEY_LENGTH);
    _value.reserve(ConfigParser::MAX_VALUE_LENGTH);

    // Dispatch table: one handler method per parsing state
    parsingStateHandlers[STATE_WS] = &ConfigParser::_handleStateWs;
    parsingStateHandlers[STATE_KEY] = &ConfigParser::_handleStateKey;
    parsingStateHandlers[STATE_OWS] = &ConfigParser::_handleStateOws;
    parsingStateHandlers[STATE_VALUE] = &ConfigParser::_handleStateValue;
    parsingStateHandlers[STATE_COMMENT] = &ConfigParser::_handleStateComment;
    parsingStateHandlers[STATE_LOCATION] = &ConfigParser::_handleStateLocation;

    // Keys allowed at http level: name -> (counter, processing function)
    _httpKeys["client_max_body_size"] = std::make_pair(0, &ConfigParser::_processClientMaxBodySize);
    _httpKeys["error_page"] = std::make_pair(0, &ConfigParser::_processDefaultErrorPage); // currently just hardcoded

    // Keys allowed inside a server block
    _serverKeys["client_max_body_size"] = std::make_pair(0, &ConfigParser::_processClientMaxBodySize);
    _serverKeys["error_page"] = std::make_pair(0, &ConfigParser::_processErrorPage);
    _serverKeys["listen"] = std::make_pair(0, &ConfigParser::_processListen);
    _serverKeys["server_name"] = std::make_pair(0, &ConfigParser::_processServerName);
    _serverKeys["location"] = std::make_pair(0, &ConfigParser::_processLocationPath);

    // Keys allowed inside a location block
    _locationKeys["root"] = std::make_pair(0, &ConfigParser::_processRoot);
    _locationKeys["index"] = std::make_pair(0, &ConfigParser::_processIndex);
    _locationKeys["cgi_extension"] = std::make_pair(0, &ConfigParser::_processCgiExtension);
    _locationKeys["upload_store"] = std::make_pair(0, &ConfigParser::_processUploadStore);
    _locationKeys["return"] = std::make_pair(0, &ConfigParser::_processReturn);
    _locationKeys["allow_methods"] = std::make_pair(0, &ConfigParser::_processMethods);
    _locationKeys["autoindex"] = std::make_pair(0, &ConfigParser::_processAutoindex);
}

ConfigParser::~ConfigParser()
{
    delete _currentServerConfig;
    delete _currentLocationConfig;
}

const WebServerConfig& ConfigParser::getConfig(void) const
{
    return (this->_config);
}

void ConfigParser::_throwConfigError(const std::string& message, char offendingChar,
                                     const std::string& offendingString, bool givePosition)
{
    std::stringstream ss;

    ss << "Error: {" << message << "}";
    if (givePosition)
        ss << " at line {" << _lineCount << "}";
    if (offendingChar)
        ss << " at character {" << _charCount << "}" << " with char {" << offendingChar << "}";
    if (!offendingString.empty())
        ss << " with string {" << offendingString << "}";
    throw std::runtime_error(ss.str());
}

void ConfigParser::parseConfig(const std::string& configPath)
{
    std::ifstream configFile; // closes automatically when it goes out of scope

    if (!_isFileNameValid(configPath))
        throw std::runtime_error("Error: Invalid config file name");
    configFile.open(configPath.c_str(), std::ifstream::in);
    if (!configFile.is_open())
        throw std::runtime_error("Error: Could not open config file");
    // Give the buffer a real size: reserve() only allocates capacity and leaves
    // size() at 0, so writing through &buffer[0] would be undefined behavior
    std::string buffer(ConfigParser::BUFFER_SIZE, '\0');
    while (configFile.good())
    {
        configFile.read(&buffer[0], ConfigParser::BUFFER_SIZE);
        if (configFile.bad())
            break;
        // gcount() is the number of chars actually read; the last chunk is shorter
        std::streamsize bytesRead = configFile.gcount();
        for (std::streamsize i = 0; i < bytesRead; i++)
            _parseOneChar(buffer[i]);
    }
    if (!configFile.eof())
        throw std::runtime_error("Error: Could not read config file");
    if (_lastChar == '\\')
        throw std::runtime_error("Error: '\\' is last char in config file");
    configFile.close();
    // TODO: validate the WebServerConfig, e.g. check that all location and server
    // blocks are closed (_currentServerConfig and _currentLocationConfig are NULL)
}

bool ConfigParser::_isFileNameValid(const std::string& fileName)
{
    if (fileName.size() < 5)
        return (false);
    return (fileName.compare(fileName.size() - 5, 5, ".conf") == 0);
}

void ConfigParser::_parseOneChar(char c)
{
    // Track the position for error messages
    if (c == '\n')
    {
        _lineCount++;
        _charCount = 0;
    }
    else
    {
        _charCount++;
    }
    // An unescaped backslash only marks the next char as escaped
    if (_lastChar != '\\' && c == '\\')
    {
        _lastChar = c;
        return;
    }
    (this->*parsingStateHandlers[_parsingState])(c);
    // An escaped backslash must not escape the following char too, so the reset
    // happens here instead of being overwritten by the caller afterwards
    if (_lastChar == '\\' && c == '\\')
        _lastChar = '\0';
    else
        _lastChar = c;
}

/* ========================== Save Char ============================ */
void ConfigParser::_addCharToKey(char c)
{
    if (_paramterLength < MAX_KEY_LENGTH)
    {
        _paramterLength++;
        // Keys are case-insensitive; cast to unsigned char because passing a
        // negative char to std::tolower is undefined behavior
        _key.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
    }
    else
    {
        _throwConfigError("Key too long", 0, _key, true);
    }
}

void ConfigParser::_addCharToValue(char c)
{
    if (_paramterLength < MAX_VALUE_LENGTH)
    {
        _paramterLength++;
        _value.push_back(c);
    }
    else
    {
        _throwConfigError("Value too long", 0, _value, true);
    }
}

/* ========================== Validation of blocks ============================ */

void ConfigParser::_validateLocationConfig(LocationConfig* currentLocationConfig)
{
    /* Check that the location block has a root directory */
    if (currentLocationConfig->rootDirectory == "")
        _throwConfigError("Location path not set", 0, "", true);
}

void ConfigParser::_validateServerConfig(ServerConfig* currentServerConfig)
{
    if (currentServerConfig->ipAddress == 0 || currentServerConfig->port == 0)
        throw std::runtime_error("Error: Server has no valid IP address or port");
    // Fall back to "localhost" when no server_name was given
    if (currentServerConfig->serverNames.empty())
    {
        std::cout << "No server_name set. Adding 'localhost'." << std::endl;
        currentServerConfig->serverNames.insert("localhost");
    }
}

/* ========================== reset block key counts ============================ */
void ConfigParser::_resetKeyCounts(std::map<std::string, std::pair<int, _processValueFunctions> >& keys)
{
    std::map<std::string, std::pair<int, _processValueFunctions> >::iterator it = keys.begin();
    for ( ; it != keys.end(); ++it)
    {
        it->second.first = 0;
    }
}

Now let’s feed the code above to a large language model and see whether it can explain the codebase to us, and what else it can show us:

Prompt used: “Help me break down the code in syntax by syntax and understand it with questions”

mistral-next

As we can see above, the model does not just describe the code; it also shows you how to go about understanding the whole thing.

To make it clearer, here’s a more refined version:

This code defines a class called ConfigParser that reads and parses a configuration file for a web server. The configuration file contains settings for the web server, such as the IP address and port to listen on, the maximum size of a client's request body, and the error pages to display for certain HTTP status codes.

The ConfigParser class has several member variables, including a WebServerConfig object that holds the parsed configuration, a pointer to the current ServerConfig being parsed, a pointer to the current LocationConfig being parsed, and several variables that track the current parsing state and the characters being read from the configuration file.
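To make those settings concrete, here is a minimal sketch of the data the parser might fill in. The real WebServerConfig, ServerConfig and LocationConfig headers are not shown in the article, so every field and type below is an assumption. The parser code only references ipAddress, port, serverNames and rootDirectory directly. validateServer and validateLocation are hypothetical free-function versions of _validateServerConfig and _validateLocationConfig.

```cpp
// Hypothetical data model: the article does not show the real headers,
// so all field names and types here are assumptions.
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

struct LocationConfig {
    std::string path;           // assumed: the path given after "location"
    std::string rootDirectory;  // filled by the "root" directive
};

struct ServerConfig {
    unsigned int ipAddress = 0;             // from "listen"; 0 means "not set"
    unsigned short port = 0;                // from "listen"; 0 means "not set"
    std::set<std::string> serverNames;      // from "server_name"
    std::size_t clientMaxBodySize = 0;      // from "client_max_body_size"
    std::map<int, std::string> errorPages;  // status code -> page, "error_page"
    std::vector<LocationConfig> locations;  // one entry per location block
};

struct WebServerConfig {
    std::vector<ServerConfig> servers;      // one entry per server block
};

// Same checks as ConfigParser::_validateServerConfig.
void validateServer(ServerConfig& server) {
    if (server.ipAddress == 0 || server.port == 0)
        throw std::runtime_error("Error: Server has no valid IP address or port");
    if (server.serverNames.empty())
        server.serverNames.insert("localhost");  // default server name
}

// Same check as ConfigParser::_validateLocationConfig.
void validateLocation(const LocationConfig& location) {
    if (location.rootDirectory.empty())
        throw std::runtime_error("Error: Location path not set");
}
```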

The ConfigParser class also has several member functions, including:

- ConfigParser(WebServerConfig& webServerConfig): the constructor. It initializes the member variables, registers one handler method per parsing state, and fills the key tables (_httpKeys, _serverKeys, _locationKeys) that map each configuration directive to the function that processes it.
- ~ConfigParser(): the destructor, which deletes the ServerConfig and LocationConfig objects still held by _currentServerConfig and _currentLocationConfig.
- getConfig(): a getter that returns a const reference to the WebServerConfig being filled in.
- _throwConfigError(const std::string& message, char offendingChar, const std::string& offendingString, bool givePosition): a helper that throws a std::runtime_error whose message can include the current line number, the character position and offending character, and the offending string.
- parseConfig(const std::string& configPath): reads the configuration file at the given path in fixed-size chunks and feeds it to the parser one character at a time, throwing if the file cannot be opened or read.
- _isFileNameValid(const std::string& fileName): a helper that checks whether the file name ends with ".conf".
- _parseOneChar(char c): processes a single character. It updates the line and character counters, swallows a backslash that starts an escape sequence, and otherwise calls the handler for the current parsing state.
- _addCharToKey(char c): appends the lowercased character to the current key, throwing an error once MAX_KEY_LENGTH is reached.
- _addCharToValue(char c): appends the character to the current value, throwing an error once MAX_VALUE_LENGTH is reached.
- _validateLocationConfig(LocationConfig* currentLocationConfig): checks that the location block being parsed has a root directory set.
- _validateServerConfig(ServerConfig* currentServerConfig): checks that the server block has a non-zero IP address and port, and adds "localhost" as the server name if none was given.
- _resetKeyCounts(std::map<std::string, std::pair<int, _processValueFunctions> >& keys): resets the counter stored next to every key in the given key table to 0.
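The three key tables (_httpKeys, _serverKeys, _locationKeys) pair each directive name with an int and a member-function pointer. The sketch below shows one plausible way such a table is used. It assumes the int counts how often a directive has appeared in the current block, which would explain why _resetKeyCounts sets it back to 0. The code that increments the counter is not shown in the article, and DirectiveTable with its two directives is hypothetical.

```cpp
// Hypothetical directive table in the style of _serverKeys / _locationKeys.
// Assumption: the int counts occurrences so duplicates can be rejected.
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

class DirectiveTable {
public:
    typedef void (DirectiveTable::*Handler)(const std::string& value);

    DirectiveTable() {
        _keys["root"] = std::make_pair(0, &DirectiveTable::_processRoot);
        _keys["index"] = std::make_pair(0, &DirectiveTable::_processIndex);
    }

    // Look up the directive, reject unknown or repeated keys, then dispatch
    // to the matching member function through the stored pointer.
    void apply(const std::string& key, const std::string& value) {
        std::map<std::string, std::pair<int, Handler> >::iterator it = _keys.find(key);
        if (it == _keys.end())
            throw std::runtime_error("Unknown key: " + key);
        assert(it->second.second != 0);  // every registered key has a handler
        if (++it->second.first > 1)
            throw std::runtime_error("Duplicate key: " + key);
        (this->*(it->second.second))(value);
    }

    // Same idea as _resetKeyCounts: call this when a new block starts.
    void resetCounts() {
        for (std::map<std::string, std::pair<int, Handler> >::iterator it = _keys.begin();
             it != _keys.end(); ++it)
            it->second.first = 0;
    }

    std::string root;
    std::string index;

private:
    void _processRoot(const std::string& value) { root = value; }
    void _processIndex(const std::string& value) { index = value; }

    std::map<std::string, std::pair<int, Handler> > _keys;
};
```

Keeping the counter next to the handler puts duplicate detection in the same lookup as the dispatch. Resetting the counts at the start of each block lets two different server blocks both use the same directive.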

The ConfigParser class parses the configuration file with a state machine. Each state (STATE_WS, STATE_KEY, STATE_OWS, STATE_VALUE, STATE_COMMENT and STATE_LOCATION) has its own handler method. The constructor stores pointers to these methods in the parsingStateHandlers table, so _parseOneChar can dispatch with (this->*parsingStateHandlers[_parsingState])(c). As characters arrive, the handlers move the parser between states and use the member variables to keep track of the current key and value, as well as the current ServerConfig and LocationConfig objects.
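Here is a minimal sketch of the same dispatch pattern: an enum of states, an array of member-function pointers indexed by that enum, and one handler per state. MiniParser and its grammar ("key value;" pairs with # comments) are hypothetical simplifications. The real parser also deals with quotes, escapes, multiple values and nested server/location blocks.

```cpp
// Hypothetical, simplified grammar: "key value;" directives and '#' comments.
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class MiniParser {
public:
    MiniParser() : _state(STATE_WS) {
        // One handler per state, indexed by the enum, like parsingStateHandlers
        _handlers[STATE_WS] = &MiniParser::_handleWs;
        _handlers[STATE_KEY] = &MiniParser::_handleKey;
        _handlers[STATE_OWS] = &MiniParser::_handleOws;
        _handlers[STATE_VALUE] = &MiniParser::_handleValue;
        _handlers[STATE_COMMENT] = &MiniParser::_handleComment;
    }

    // Feed the input one character at a time to the current state's handler.
    void parse(const std::string& text) {
        for (std::size_t i = 0; i < text.size(); ++i) {
            assert(_state < STATE_COUNT);
            (this->*_handlers[_state])(text[i]);
        }
        if (_state != STATE_WS && _state != STATE_COMMENT)
            throw std::runtime_error("Unterminated directive");
    }

    std::vector<std::pair<std::string, std::string> > directives;

private:
    enum State { STATE_WS, STATE_KEY, STATE_OWS, STATE_VALUE, STATE_COMMENT, STATE_COUNT };
    typedef void (MiniParser::*Handler)(char c);

    static bool _isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

    // Between directives: skip whitespace, start a comment or a new key.
    void _handleWs(char c) {
        if (_isSpace(c)) return;
        if (c == '#') { _state = STATE_COMMENT; return; }
        _key.assign(1, c);
        _state = STATE_KEY;
    }
    // Inside a key: whitespace ends it.
    void _handleKey(char c) {
        if (_isSpace(c)) { _state = STATE_OWS; return; }
        if (c == ';') throw std::runtime_error("Key without value: " + _key);
        _key.push_back(c);
    }
    // Optional whitespace between key and value.
    void _handleOws(char c) {
        if (_isSpace(c)) return;
        if (c == ';') throw std::runtime_error("Key without value: " + _key);
        _value.assign(1, c);
        _state = STATE_VALUE;
    }
    // Inside a value: ';' completes the directive.
    void _handleValue(char c) {
        if (c == ';') {
            directives.push_back(std::make_pair(_key, _value));
            _state = STATE_WS;
            return;
        }
        _value.push_back(c);
    }
    // Comments run until the end of the line.
    void _handleComment(char c) {
        if (c == '\n') _state = STATE_WS;
    }

    State _state;
    Handler _handlers[STATE_COUNT];
    std::string _key;
    std::string _value;
};
```

Because the handlers live in a table, the per-character loop never needs a switch statement. Adding a state means one enum value, one handler and one line in the constructor.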

GPT-4 OUTPUT:


From here, you can start making things happen in your codebase much more easily.

If you liked this and want to support our journey toward financial freedom, feel free to support us on Patreon or connect with us on LinkedIn.


UknOwWho_Ab1r

Software Engineer | Mechanical Engineer | Redis Side Quest Hackathon Winner | Gen AI Enthusiast