ChatGPT and the future of software engineering

Ovidiu Iliescu
Dec 12, 2022


Continuing the ChatGPT trend, I spent some time today defining my own programming language and asking GPT to run programs written in it.

Did it work? Did it fail spectacularly? Read on until the end to find out. I am positive the results will blow your mind.

First, below is the prompt I used for my experiment (it contains both the language definition and a sample program).

The language definition part is surprisingly small, given that it defines an entire programming language. I purposely avoided common keywords like for and if, just to see how far I could stretch the thing. I also intentionally made some parts of the language definition a bit confusing and ambiguous, but not too much.

Let's define the following programming language, called OvidiuScript.

Instructions in OvidiuScript are given one per line.

Labels are defined as ":: <label name>".

When an exception is thrown, just output the text "Whoops! Line %line%, %reason%!", where "%reason%" is an explanation of what caused the exception to be thrown and "%line%" is the current line number. Exceptions do not cause execution to stop.

Printing is always done on a new line.

Any line beginning with "^^" is considered a comment and should be ignored.

OvidiuScript supports the following instructions:

hello (name, value) - declare a variable with the given name for later use in the current execution, and optionally set its initial value.

nomnom (name, value) - if a variable with the given name was previously declared in the current execution context, assign a new value to it. Otherwise, throw an exception. "nomnom" cannot be used to declare variables.

shout (text) - print the given text. Only for the "shout" instruction, in text, any patterns of the form %variable% are replaced with the value of the specified variable.

maybe (condition) - if the condition evaluates to true, continue execution. Otherwise, skip all instructions until the corresponding "endmaybe" special marker for the current maybe instruction.

endmaybe - not an instruction per se, acts as a special marker for the "maybe" instruction.

wee (label) - jump execution to the specified label.

tired() - first throw an exception because the program completed, after which stop execution.

Execute the following program, assuming one instruction per line. Print the output in a code block. Give no other explanations.

^^ Multiplication table program
^^ Initialize variables
hello (firstNumber, 0)
hello (secondNumber, 0)
hello (temp, 0)
hello (ctr, 0)
^^ This should throw an exception
nomnom (randomVariableA, 5)
^^ Loop for the first number
maybe (firstNumber <= 10)
nomnom (firstNumber, firstNumber + 1)

^^ Loop for the second number
maybe (secondNumber <= 10)
nomnom (secondNumber, 1)

^^ Calculate result
nomnom (temp, firstNumber * secondNumber)

^^ Increment the counter for total multiplications
nomnom (ctr, ctr+1)

^^ Output the result
shout (Multiplication #%ctr%. %firstNumber% x %secondNumber% = %temp%)

^^ In case we get bored or GPT decides the answer takes too long to compute
^^ and just stops dead mid-answer. Can be removed.
maybe (ctr == 8)
shout (Yeah, this is boring!)
wee (finish)
endmaybe
endmaybe

endmaybe
:: finish
shout (Done, after doing %ctr% multiplications!)
tired()

So what are the results?

Pressing Enter in the prompt window results in a noticeable delay, followed by the response below.
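(The response itself was a screenshot, which isn't reproduced here. Tracing the program by hand against the spec, the output should have roughly the shape below. This is my reconstruction, not the actual screenshot: the exact multiplication lines depend on how the model decides to treat the maybe blocks as loops, and the line numbers depend on whether it counts comments and blank lines.)

Whoops! Line 8, variable randomVariableA was not declared!
Multiplication #1. 1 x 1 = 1
[... multiplications #2 through #8 ...]
Yeah, this is boring!
Done, after doing 8 multiplications!
Whoops! Line 37, the program completed!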

Yup, it handled everything as expected, even the weird exception handling conditions and their implications!

NOTE: I should mention that it sometimes, though rarely, has problems with exception handling, and the “in case we get bored” condition is not always respected. But 99.9% of the time it works flawlessly. And for the times it doesn’t, a refresh or a small tweak to the prompt usually fixes the problem.

But it goes further than that. A LOT further. I then asked the bot to generate a program in OvidiuScript. I wasn’t expecting much … but what I got back was also surprising.

The program is ALMOST perfect; it just added an else statement that is NOT part of the language definition. But otherwise, it looks legit.

So naturally, I asked it to fix the problem. And … it did!

As far as I can tell, the program is legit and follows all the language definitions from my initial prompt. The fact that GPT can do this is, I think, ASTONISHING.

Remember, this is a totally new programming language that I just defined for it a few minutes ago, in a couple of text paragraphs! And it’s now capable of writing programs with it!

A couple of sidenotes

Sidenote #1: What I found quite interesting are the subtle implications that changing the prompt can have. For example, defining the exception handling part like this:

"Throwing an exception" means calling print(Whoops! Line %line%, %reason%!") where "%reason%" is an explanation of what caused the exception to be thrown and "%line%" is the current line number. Exceptions do not cause execution to stop.

changes not only the program execution but also what the output of “Convert this program to C#” looks like (yes, it can do that too, albeit the code looks kinda bad — but feel free to try it out). And there are all sorts of other tweaks that you can do to it to alter its behavior.

Sidenote #2: Another thing I would like to point out, which I found extremely interesting, is the way GPT can correct its mistakes when you point them out. For example, in one run, the output contained only 5 multiplications (even though the counter had reached the correct value of 8).

When/if that happens, it’s usually a lot of fun to simply point out any mistakes and watch the model correct itself, even though sometimes the explanations feel a bit weird.

Getting back on track

Here are some more advanced things you can try. After the initial prompt and after the program runs, just tell it the following:

shout_no_advance (text) - behaves just like shout(text), but does not advance to the next line.

Then ask it to alter the program such that, after each multiplication result is printed, an equal number of stars are printed.

NOTE: This results in some funky OvidiuScript code being generated, but it looks to be functionally OK, at least in my runs. What is interesting here is that, given how maybe is defined in the specs, technically all the weird jumping around that the code does is legal!
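To make this concrete, here is one shape such an addition could take. This is my own sketch, not GPT's output: starCtr and starLoop are names I am introducing, it assumes hello (starCtr, 0) was added next to the other declarations, and the snippet is spliced in right after the shout that prints the multiplication result:

^^ Print %temp% stars on one line after each result (my sketch)
nomnom (starCtr, 0)
:: starLoop
maybe (starCtr < temp)
shout_no_advance (*)
nomnom (starCtr, starCtr + 1)
wee (starLoop)
endmaybe

The wee back to a label sitting above a maybe is exactly the kind of jumping around the note refers to: nothing in the spec forbids it, so a maybe plus a backward wee is a perfectly legal loop.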

Next, tell it to write an OvidiuScript program that can solve the equation 2x+4y=10 by trying different values. The code it generated, at least for me, did not QUITE work, but it’s close enough.

And again, if you tell it what the mistake is, it will happily fix it.

Okay, the program is for sure suboptimal (why are there two maybes for y?), but then again, the number of available OvidiuScript code samples is quite small.
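For comparison, here is a hand-written brute-force version under the same spec. This is my own sketch, not GPT's output; it searches x and y over 0..10 using the same maybe-plus-backward-wee loop trick:

^^ My sketch: search integer values satisfying 2x + 4y = 10
hello (x, 0)
hello (y, 0)
:: loopX
maybe (x <= 10)
nomnom (y, 0)
:: loopY
maybe (y <= 10)
maybe (2 * x + 4 * y == 10)
shout (Solution found at x = %x% and y = %y%)
endmaybe
nomnom (y, y + 1)
wee (loopY)
endmaybe
nomnom (x, x + 1)
wee (loopX)
endmaybe
tired()

If the loops behave, this should report the three non-negative solutions: x = 1, y = 2; x = 3, y = 1; and x = 5, y = 0.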

NOTE: I also noticed that if you don’t give it a sample program AT ALL in the initial prompt and just give it the language specs and ask it to write programs, the results will be less than optimal. But they're still better than what a lot of people would produce in its place, i.e., when asked to write a program in a new language without having ANY sample code to look at.

But there’s more. This is the part where your mind will be blown! The thing can also come up with language improvements that follow the spirit of the existing language specs surprisingly well!

Or if you prefer something more formal … (same prompt, different run — as with everything ChatGPT related, you will frequently get different results for the same prompt)

(I particularly like the fact that the formal definition REALLY looks like it was part of the original specs that I wrote.)

Yeah, we have this fancy new foreach, but the program still doesn’t use it. Let’s ask GPT to fix this for us.

Wow! In less than a tweet, I was able to tell GPT to add a new instruction to OvidiuScript and have it rewrite the original program to take advantage of it. It did both of these tasks correctly.
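The screenshots aren't reproduced here, so to give a flavor of the result: a rewrite in the spirit of what GPT produced might look like the following. The exact foreach/endforeach syntax and the list-literal form of hello are my approximations, not GPT's actual definition:

^^ Approximate shape of the foreach rewrite (syntax is my guess)
hello (firstNumbers, 1,2,3,4,5,6,7,8,9,10)
hello (secondNumbers, 1,2,3,4,5,6,7,8,9,10)
hello (temp, 0)
foreach (firstNumber, firstNumbers)
foreach (secondNumber, secondNumbers)
nomnom (temp, firstNumber * secondNumber)
shout (%firstNumber% x %secondNumber% = %temp%)
endforeach
endforeach

Note the explicit value lists: that's the part the next tweak gets rid of.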

But we’re not done yet. The foreach helps, but I really don't like all that 1,2,3,..1000 that you have to write. Again, let's have GPT fix this problem for us. Now, I'm very lazy, so I will give it a very vague prompt telling it in broad strokes what I need it to do.

And it does it flawlessly! Honestly, the fact that it can automatically alter the program to make use of this newly invented instruction doesn't even impress me anymore.

Kidding. This is freaking amazing!
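Again, the screenshot isn't reproduced here, but the resulting program was roughly of this shape. GPT called the new instruction interval; the argument order and everything else below is my approximation:

^^ Approximate shape of the interval version (syntax is my guess)
hello (temp, 0)
interval (firstNumbers, 1, 10)
interval (secondNumbers, 1, 10)
foreach (firstNumber, firstNumbers)
foreach (secondNumber, secondNumbers)
nomnom (temp, firstNumber * secondNumber)
shout (%firstNumber% x %secondNumber% = %temp%)
endforeach
endforeach

Notice there is no hello for firstNumbers or secondNumbers, which brings us to the next point.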

But let’s continue with the mind-blowing stuff. One REALLY REALLY SUBTLE aspect that I would like to point out: in the previous screenshot, if you CAREFULLY read the definition of interval that GPT came up with, you will notice that interval both defines AND initializes a variable; what's important here is the "defines" part.

This means that there is no need to hello() a variable if you first use it via interval(), which is reflected in the code in the previous screenshot. Indeed, there is no hello(firstNumbers) or hello(secondNumbers) there, for this exact reason.

So this is both subtle and correct, but I don't like this little inconsistency. Let's have GPT fix this.

This is again freaking amazing. Here’s exactly what went down in this simple prompt:

  • GPT understood that I wanted one instruction to behave “just like” another instruction, but only in certain aspects
  • It altered the formal definition of this target instruction to closely match the desired aspects of the source one (both the direct ones AND the indirect ones), WHILE KEEPING ALL OTHER FUNCTIONALITY AND ALL OTHER ASPECTS INTACT.
  • It figured out that, with the new changes to the formal definition of this instruction, the program won’t be OvidiuScript compliant anymore, so it made the necessary alterations to the program to keep it compliant and logically consistent, WITHOUT ALTERING ITS FUNCTIONAL BEHAVIOR IN ANY WAY. Just compare the before and after screenshots: you will see the two extra hello statements (sketched below) but no other change.
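Since the before/after screenshots aren't reproduced here, the shape of the change was roughly this (again my approximation):

^^ After the fix: interval can no longer declare variables, so the
^^ lists must be hello'd first, just like any nomnom target
hello (firstNumbers, 0)
hello (secondNumbers, 0)
interval (firstNumbers, 1, 10)
interval (secondNumbers, 1, 10)
^^ ... rest of the program unchanged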

Honestly, this is something that even experienced software engineers get wrong. I cannot stress enough how amazing this is. The level of formal and abstract reasoning (or whatever this model does behind the scenes) on display here is out of this world.

OK. One more thing. I got curious: what would happen if I just let GPT define the entire language on its own, and write the sample program on its own too? Because why not? Here is the prompt I used.

Let's define a new programming language called TestScript. This language is not related to any other known programming language.
Assume TestScript has no instructions or keywords, then add the following to TestScript:
- Exactly one keyword to define variables. The type of a variable must be explicitly specified when it is defined and cannot be changed.
- Exactly one keyword to set variables.
- Exactly one keyword to iterate over variables which are lists. This keyword must have a block of code as a parameter. It explicitly exposes the iterator as a variable to its children.
- Exactly one keyword to conditionally execute blocks of code. This keyword must have a block of code as a parameter.
- Exactly one keyword to return a value from within a function.
- Exactly one keyword to define a function. A function must have a block of code as a body. A function can have a list of parameters as input arguments. A function can return a value.
- Exactly one keyword to mark the start of a block of code.
- Exactly one keyword to mark the end of a block of code.
- Exactly one keyword to append one or more elements to a list.
- Support for mathematical operators for addition, subtraction, multiplication and division, as well as parentheses.
- A builtin function to print text, variables and numbers.
There are no other keywords, operators or builtin functions in the TestScript language.
All function names and variable names must be a single word.
The start and end of blocks of code must be clearly marked and must always be present.
All TestScript programs must be syntactically and logically consistent.
Write a sample program in TestScript that prints a complete multiplication table. Output nothing else but the sample program.

Keep in mind I ran this prompt multiple times, and the results tend to be quite different. Attached are some of the interesting ones, ESPECIALLY those that have a few extra discussions in them.
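Since the attachments aren't reproduced here, here is the flavor of one plausible result. Every keyword below (make, set, walk, open, close, push, print) is illustrative only, because each run invents its own names. Note there are no comments anywhere, since the prompt never defines them:

make nums list
make product number
push nums 1 2 3 4 5 6 7 8 9 10
walk nums x open
walk nums y open
set product x * y
print x
print y
print product
close
close

(One print per value, because the prompt never defines string formatting either.)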

What amazes me is the sheer variety of languages it comes up with, and the subtle and not-so-subtle differences between them. Furthermore, once a language is generated, you can ask it questions about its syntax, ask it to write other programs using it, ask it to enhance it, remove features from it and see if it can still write usable programs, and so on.

I think this will be the premier software development tool of the future. Right now it’s far from perfect, and for the purposes of software engineering we need it to be a lot more consistent than it currently is, but… this is the future.

Anyway, feel free to use this prompt and have fun with it, e.g. maybe ask GPT to add support for comments and have it comment the program in the style of a pirate — here’s a screenshot as to how that could look. ;)

Finally, just to give you some more context on how crazy this can get: at one point I asked it to write a sample interpreter/lexer for whatever flavor of TestScript it had just generated… and it happily complied.
