GPT-3 and the Rise of Human-centric Adaptive Software — Part 1

Paolo Perazzo
9 min read · Nov 17, 2020


💡 This article is part of a series: Intro, Part 1, Part 2, Part 3

The problem with today’s computer-centric user interfaces

By definition, a User Interface (UI) is the interface between a user and a computer system. Such an interface is designed to convert the intent of the user into a set of actions to be performed by the computer.

In other words, a UI exists so that a human can “talk” to a machine.

The User Interface

Today’s interfaces are heavily unbalanced towards the language of the machine rather than the language of the user. In fact, current UIs are mostly designed around how a computer internally works.

For example, to create a task in a task management app, a user has to manually break it down into various fields (task name, description, assignee, due date, reminder, recurrence, project, etc.) so that the computer can understand it and then trigger the appropriate workflows. These fields simply mirror the database structure the computer uses to store task information, making these products little more than “glorified database interfaces”.
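As a rough illustration (the field names and the example task are hypothetical, not taken from any specific product), this is the gap between what such an app asks the user to assemble and how the same intent would be expressed naturally:

# Hypothetical record a task app asks the user to build, field by field,
# mirroring the columns of the underlying database table.
task_record = {
    "name": "Prepare Q4 board deck",
    "description": "First draft of the slides for the board meeting",
    "assignee": "maria",
    "due_date": "2020-12-01",
    "reminder": "2020-11-28T09:00",
    "recurrence": None,
    "project": "Board",
    "tags": ["slides", "q4"],
    "priority": "high",
}

# What the user would naturally say instead, in their own words:
user_intent = (
    "Maria should prepare a first draft of the Q4 board deck by December 1st, "
    "it's high priority, remind her a few days before."
)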

Each input field is then represented differently: text for the task name, a drop-down to pick one or more assignees, a calendar picker for the due date, another picker for the time, another set for the recurrence, a label for the project, an (editable) drop-down for tags, one for priority, one for status, etc.

The user has to understand what to do, what each UI component means, which button to click. The user has to think. It’s a case of heavy cognitive overload that makes products difficult to use and leaves the user unsatisfied, if not frustrated.

Why? Because with this type of UI, users are forced to learn the language of the computer to communicate their intent.

The Designer Interface

To make matters worse, there is not even one universal “language” for the user to learn to interface with a computer, as the UI is different for each product, even within the same category.

In fact, there is another cognitive layer added between the user and the computer: the product Designer Interface.

The Designer Interface is the designer’s own interpretation of the language to be used to communicate with the computer, presented to the end user via a User Interface that ends up being different from product to product.

In this layer, the designer is also trying to explain to the user how to talk to the computer: on top of how the interface is visually presented, the copy used by the designer for buttons, placeholders, hints, labels becomes part of that human-to-machine language.

The Translator Interface

Since we moved from the internal computer language to the designer’s own language (e.g. English) for the product copy, we have an additional layer between the end user and the product designer: the Translator Interface.

The Translator could be the end users themselves, reading another (human) language if the product is not translated, or an actual translator who localizes the interface created by the designer.

The first case represents an even bigger cognitive overload for the user, as they need to translate the interface copy in real time to figure out how to use the product. The second case can still cause misunderstandings due to non-native translations, and poor user experiences due to significant differences in certain languages’ morphology (think about text in German or Chinese).

One-for-all user interfaces

On top of the Telephone-game effect created by all these layers, the fundamental challenge for current user interfaces is that they offer one single, generalized instance to every user.

You can run as many A/B tests as you want to improve your UI, UX, and copy, but ultimately, the one user interface of your product is presented to all the users, at best personalized by language translation.

You can display only one label on that button, you can use only one type of widget for that input field, you can use only one sentence for that hint. You have a one-for-all user interface.

As a result, your one user interface, your “product language”, won’t be understood by everyone, no matter how optimized it is or how good you are.

Some users will understand the purpose of a button, some won’t. Some users will understand how to use your widget, some won’t. For some, your hint means something; for some, it means something else; for some, it doesn’t mean anything at all.

This challenging user experience is the consequence of the computer-centric design adopted to enable human-to-machine communication: such a design starts from the “language” of the computer, which gets turned, through various layers of “interpretation” and “translation”, into the one interface presented to all users.

How GPT-3 changes everything

Now imagine literally reversing today’s design approach to user interfaces: what if we design UIs that start from the human language rather than the machine’s? What if we let users simply speak their own native language when interacting with a computer? What if we move the cognitive overload from the humans to the machines, as it should be?

Imagine a UI that replaces a sequence of commands, formulas or user interactions with a single input field where the user enters, in their own words, what they want to accomplish, letting the computer interpret that intent and generate the appropriate set of actions.

The ultimate effect is a multitude of user interfaces, “personalized” to each user based on how they naturally communicate their intents, in whatever language they speak.

No more guesswork from the user about the meaning of that button label, no more need to interpret that obscure hint, no more learning curve to use a product.

What can make this new design approach possible?

The GPT-3 interface

When the first GPT-3 demos started to appear, I noticed one of the most interesting use cases of GPT-3’s “text in, text out” model: human language as input, machine language as output.

Some examples are the GPT-3 apps that “translate” a human description of an intent (“Change directory to /home”) into machine commands (“cd /home”): text into Linux commands, text into SQL, text into LaTeX, text into Regex, etc.
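As a minimal sketch of how such a “text to Linux” demo could be wired up, assuming the 2020-era openai Python package and its Completion endpoint (the prompt examples, engine name and parameters here are illustrative, not the code of any actual demo):

import openai  # pip install openai; requires an OpenAI API key

openai.api_key = "YOUR_API_KEY"  # placeholder

# A handful of examples in the prompt is enough to establish the pattern.
FEW_SHOT_PROMPT = """Translate English into a Linux shell command.

English: Change directory to /home
Command: cd /home

English: Show all files, including hidden ones
Command: ls -a

English: {request}
Command:"""

def english_to_command(request: str) -> str:
    # Completion call as it looked for GPT-3 at the time.
    response = openai.Completion.create(
        engine="davinci",
        prompt=FEW_SHOT_PROMPT.format(request=request),
        max_tokens=30,
        temperature=0,   # deterministic output: we want a command, not creativity
        stop=["\n"],     # stop at the end of the generated command line
    )
    return response.choices[0].text.strip()

print(english_to_command("Find every .log file larger than 10 MB"))

Whatever the exact API details, the key point is that the “interface” is just a few example pairs plus the user’s own words.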

Despite their “simplicity”, these apps are very useful to showcase the power of GPT-3 and the new model of multiple, personalized user interfaces I described. If before every user had to interface with the computer through a single, rigid Command Line Interface (CLI) and its structured commands, now each user has a flexible interface available, “personalized” to their own way of describing an intent, in their own language. Thanks to GPT-3, we ultimately have a “universal” language to interface with machines: the human language.

There is no need for humans to learn Linux, SQL, LaTeX, Regex, or any other computer language (here in the form of commands); instead, the machine has finally learned to interpret the human language, in its personalized forms too, all thanks to GPT-3.

For computing, for the way we design and build applications, this is going to be a huge shift.

We can now start to design interfaces from the user side and appropriately train GPT-3 to translate that human language into whatever language the computer understands.

GPT-3 as Creative Interpreter

Some more advanced GPT-3 demos showed how GPT-3 can be more than a straight translation between human language and the command-based language of a CLI.

In this example by Jordan Singer, GPT-3 designs a mobile app from a description provided by the user in plain English.

Here GPT-3 is not actually doing the complete design of the mobile app; rather, it “interprets” the app description and maps its elements to modules that the Figma plugin built by Jordan then displays.

Specifically, human language is mapped by GPT-3 into a JSON representation of a Figma canvas that is then rendered by the plugin code. The training dataset is incredibly small: two descriptions in plain English and the two corresponding JSON canvas structures are sufficient for GPT-3 to generate its own {description, JSON} pairs.
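To make the mechanism concrete, a few-shot prompt for this kind of mapping could look roughly like the sketch below. The JSON keys ("frame", "children", "type", "text") are invented for illustration and are not the plugin’s real canvas schema:

# Illustrative only: two {description, JSON} pairs prime GPT-3 to produce a third.
FIGMA_PROMPT = """Description: a login screen with an email field, a password field and a blue "Sign in" button
JSON: {"frame": "Login", "children": [{"type": "input", "text": "Email"}, {"type": "input", "text": "Password"}, {"type": "button", "text": "Sign in", "color": "blue"}]}

Description: a profile screen with an avatar at the top and a "Log out" button at the bottom
JSON: {"frame": "Profile", "children": [{"type": "avatar"}, {"type": "button", "text": "Log out"}]}

Description: """

def build_prompt(user_description: str) -> str:
    # The user's own description is appended; GPT-3 completes the "JSON:" line,
    # which the plugin can then parse and render as Figma layers.
    return FIGMA_PROMPT + user_description + "\nJSON:"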

You can note here the creative component of GPT-3: below the dashed line, GPT-3 generated both the app description in plain English and the corresponding JSON needed to represent it in Figma, simply based on the two initial examples. This “creative” ability of GPT-3 is likely helpful when the app description is not fully detailed.

To be clear, though, a lot of the actual “design” work here is done by the plugin code, not by GPT-3 (for now). But from a user perspective the end result is directionally impressive: a user can now “talk” to an application like Figma to generate their desired output. No visual UI, no learning curve, no Figma-specific knowledge. Just human language.

GPT-3 as Knowledge

Another remarkable aspect of GPT-3 is that it doesn’t just act as a “language” translator or interpreter to interface humans to machines: it also brings knowledge.

Look at this amazing example of a GPT-3 application from Yash Dani, which generates a balance sheet in Excel simply based on inputs entered by a user in natural language.

The number of layers between the user input and the final result that are replaced by GPT-3 is just unbelievable.

Today, to create a balance sheet you have to go through several steps:

  1. The business owner describes their transactions to the accountant
  2. The accountant interprets the business owner’s inputs and converts the human language into the accounting language
  3. The accountant first has to learn how to interact with the accounting software’s complex table- and field-based interface, which a designer created based on the computer internals
  4. The accountant then enters the accounting information in the accounting software
  5. The computer can now process the data and apply the business logic that a developer had to manually code based on the designer’s specifications
  6. Finally, the Balance Sheet is presented to the user as output.

In Yash’s solution, all these intermediate layers, including human knowledge, custom user interface and business logic, are replaced by GPT-3:

  1. The business owner simply describes their transactions to GPT-3 using human language, exactly as if they were talking to the accountant
  2. GPT-3 turns that human message into an entry of the balance sheet in Excel.

That’s it. No accountant with deep accounting knowledge, no product designer to define the UI and the business logic for the accounting software, no learning curve for the accountant to learn that UI, no UI at all besides the “Tell me about your transaction” question.

Just one question and GPT-3.

GPT-3 not only replaced an entire UI with natural language, but also the accounting knowledge of the accountant and of the accounting software’s designer. Some might argue, at a philosophical level, that GPT-3 doesn’t actually have accounting knowledge, but from an outcome perspective you wouldn’t notice the difference.

What you notice instead is the ultimate simplicity for the user, the business owner, of generating the desired report just by answering some questions, without any intermediary. You would also notice that this software was built in a few hours, possibly without much field knowledge from the developer.

This is an example of some training data provided:

Input: I bought an additional $1,200 worth of inventory which I paid for immediately.
Output: [[“add”, 1200, “Inventory”], [“remove”, 1200, “Cash”]]

The total training data consisted of eight examples like that. Eight sentences and corresponding Python. Eight.
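Although Yash’s actual code isn’t shown here, a rough sketch of the glue logic around such an output can be as small as the snippet below (the Excel-writing step is omitted; the account names and amounts come straight from GPT-3’s output):

import json
from collections import defaultdict

# Account balances that would ultimately be written into the Excel balance sheet.
balances = defaultdict(float)

def apply_transactions(gpt3_output: str) -> None:
    # GPT-3 returns something like: [["add", 1200, "Inventory"], ["remove", 1200, "Cash"]]
    for action, amount, account in json.loads(gpt3_output):
        balances[account] += amount if action == "add" else -amount

apply_transactions('[["add", 1200, "Inventory"], ["remove", 1200, "Cash"]]')
print(dict(balances))  # {'Inventory': 1200.0, 'Cash': -1200.0}

Everything between the natural-language description and these few lines of bookkeeping, the interpretation, the mapping to accounts, the “accounting knowledge”, lives inside the model.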

It’s easy now to understand the impact that GPT-3 can have on software design and development.

💡 Part 2 introduces the concept of Adaptive Software enabled by GPT-3, by analyzing the architecture of a GPT-3 application

Originally published at https://ppaolo.substack.com.
