Simulating the Infinite Monkey Theorem

Programming in C++

Jackmcnish
6 min readJul 9, 2023
A monkey typing on a computer
A monkey typing on a computer [Image courtesy of stable diffusion]

Introduction

In the era of advanced language models like OpenAI’s ChatGPT and Google’s Bard, it becomes increasingly intriguing to delve into alternative methods of text generation. One such method that deserves attention is the fascinating concept known as the Infinite Monkey Theorem. This article sets out to delve into the depths of this theorem and replicate its essence using the powerful C++ programming language.

The Infinite Monkey Theorem

The infinite monkey theorem is a thought experiment which has always fascinated me. It suggests that if a monkey were to randomly press keys on a typewriter for an infinite amount of time, it would eventually produce any given text.

Whilst not practically possible to replicate, it is possible to simulate generating random strings until a target string is achieved, similar to a typing monkey.

Implementation in C++

Being eager to improve my skills in C++, I chose it as the language for this experiment, it also has the advantage that it is fast, notably for the execution of for-loops when compared to Python (source).

To simulate a typing monkey, we can structure our code with several functions, one called “generate” which generates a random string when called, and another called “score” which outputs a score between 0 and 1 for how close the generated string is to the target string. Based on these two functions, other utility functions can be created such as “testUntilAchieve” which keeps generating random strings until the target string is created, and “repeatTest” which repeats “testUntilAchieve” a set number of times to average the experiment results.

“generate”: Generating Random Strings

To create random strings, we can create an alphabet containing letters, then randomly select an index, and thus a letter, using C++’s rand() capabilities; we can then add this letter to the generated string until the desired length is achieved, as in the implementation below.

#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;

string generate(const string &targetString) {
char alphabet[26] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'};
string generatedString = "";
for (int i = 0; i < targetString.length(); i++) {
int randomIndex = rand() % 26;
char generatedChar = alphabet[randomIndex];
generatedString.push_back(generatedChar);
}
return generatedString;
}

int main() {
srand((unsigned)time(NULL));
string targetString = "hello";
cout << generate(targetString) << "\n";
}

It is important to note that for rand() to work properly, meaning for it to generate a different string each time it is called, we must set a random seed using the srand() , generally time is chosen as in this implementation.

“score”: Scoring Generated Strings

To score a particular string against it’s target, we can iterate over the length of the string, and see if each character matches its target; then a perfect score would be obtained when an identical string is tested, meaning it has the exact same letters, in the exact same order. This is implemented below.

#include <iostream>
#include <string>
using namespace std;

double score(const string &targetString, const string &testString) {
int correctCount = 0;
for (int i = 0; i < targetString.length(); i++) {
if (targetString[i] == testString[i]) {
correctCount++;
}
}
return (double) correctCount / targetString.length();
}

int main() {
string targetString = "hello";
string testString = "hillo";
cout << score(targetString, testString) << "\n";
}

It is important to note that for the ‘score’ function to work, the strings must be of the same length.

“testUntilAcheive”: Generating Random Strings until the Target String is Generated

The logic behind testUntilAchieve is to use generate and score to keep generating strings until the score is perfect (ie. 1.0). This can be done using a while loop as below. This also includes printing the results so far, as well as the expected amount of iterations, which is simply the amount of elements in the alphabet to the power of the string length.

#include <iostream>
#include <string>
#include <cmath>
#include <cstdlib>
using namespace std;

void testUntilAchieve(const string &targetString) {
double currentScore = 0.0;
int count = 0;
while (currentScore < 1.0) {
string generatedString = generate(targetString);
double currentScore = score(targetString, generatedString);
count++;
cout << "Target String: '" << targetString << "' -> Generated String: '" << generatedString << "' -> Score: " << currentScore * 100 << "% -> Count: " << count << "\n";
if (currentScore == 1.0) {
break;
}
}
cout << "----- PROGRAM OUTCOME -----\n";
cout << "It took " << count << " tries to obtain the target string.\n";
int expectedTries = pow(26, targetString.length());
cout << "The expected amount of tries was " << expectedTries << "\n";
}

int main() {
srand((unsigned)time(NULL));
string targetString = "hi";
testUntilAchieve(targetString);
}

Note that this function does not return anything, not even the number of different tries it had to make before reaching the target string, for this reason I made another very similar function which did not print anything but simply returned this count for use in repeatTest .

“repeatTest”: Multiple Simulation Runs

This function aims to use testUntilAcheive and run it a given number of times, and then returns the average number of tries necessary. This can be useful if only for testing purposes, as the returned average can be used to validate whether the functions have been properly defined, notably: the returned average should be close to the expected number of tries, as above, this is equal to the number of elements in the alphabet to the power of the string length. This can be implemented as below (note the use of countTestUntilAchieve):

#include <iostream>
#include <string>
#include <cmath>
#include <cstdlib>
using namespace std;

double repeatTest(int testRepeats, const string &targetString) {
int count = countTestUntilAchieve(targetString);
int countSum = 0;
for (int i = 0; i < testRepeats; i++) {
int count = countTestUntilAchieve(targetString);
countSum += count;
}
return (double)countSum / testRepeats;
}

int main() {
srand((unsigned)time(NULL));
string targetString = "hi";
int testRepeats = 1000000;
cout << repeatTest(testRepeats, targetString);
}

The full code is available on my GitHub repository here

Future Work

Future work can be done to improve the readability of the C++ code, as well as increase the speed of the computations which increase at an exponential rate due to rate of increase of the number of possibilities, multiplying by 26 for every character in the string. I suspect it could be possible to parallelize the repeatTest function so that runs can be split between different threads.

A monkey typing on a computer

Another exciting area to improve could be the use of algorithms which keep the correct letters when making a future guess, which could dramatically reduce the amount of computations needed.

Limitations

It should be noted that this is nothing more than a thought experiment and not an actually feasible way of generating text as it is, this is because we are generating random strings until the string we inputted is reached, this means we are not actually creating new strings. An interesting possibility could be to keep this base method of creating random strings and using a neural network or reinforcement learning to make a model which improves itself to generate text which makes sense.

Your turn

Over to you, think about how you can improve the code; for example, make some error handling so that the score function cannot score strings of different lengths.

If you enjoyed this article, please give it a like and subscribe for more, this really helps. Have a great day!

--

--

Jackmcnish

Civil Engineering student from Imperial College London who loves learning and coding!