The Shape of Interfaces to Come

Jason McAffee
29 min read · Jun 19, 2023


Natural Language Interfaces (NLIs) are revolutionary systems designed to facilitate human-computer interaction through voice or text, employing natural, everyday language. Modern exemplars of NLIs, including Alexa, Siri, and ChatGPT, utilize Artificial Intelligence to assist us with daily tasks, illustrating the power and potential of these interfaces.

Recent advancements in AI, powered by sophisticated Large Language Models like GPT, PaLM, and Llama, suggest that Natural Language Interfaces are poised to further augment, and in some cases completely supplant, traditional User Interfaces in the near future.

In this article we will explore a brief history of Natural Language Interfaces and the capabilities of Large Language Models that can be utilized through NLIs, and assess the impact LLMs are likely to have on the capabilities exposed through these interfaces.

Through this exploration I aim to establish that we are on the cusp of liberating ourselves from the confines of the traditional visual User Interfaces of smartphones and personal computers. This liberation will be made possible by innovative software products that are moving towards offering more intuitive, conversational interactions.

Appeal of Natural Language Interfaces

From social interactions and mating calls, to territorial displays, echolocation, survival strategies, and the abstraction and communication of knowledge and information, vocalizations play a crucial role in the communication systems of animals and other life forms.

Speech and writing are modes of language, enabling us to convey complex ideas to one another through the senses of sound and sight, facilitated by the interfaces of our ears and eyes.

Vocal cords first developed in complex life forms around 500 million years ago. Speech in Homo sapiens emerged sometime between 2 million and 50,000 years ago, while writing appeared about 5,000 years ago, initially as a mechanism for keeping track of food rations.

Darwin posited, “an organ constructed for one purpose may be converted into one for a wholly different purpose.” This concept was demonstrated in the lungfish’s use of its swim bladder as a lung, an example of evolution through natural selection. Victor Negus expanded on this in his treatise, The Mechanism of the Larynx, indicating that voice originated with the lungfish. He noted, “it became imperative that only air, and not water or other harmful substances, should enter [the lung]. With this objective in view, a valve was evolved to guard the entrance to the pulmonary outgrowth.” This valve forms the basic mechanism for vocalization: by pushing air against the closed valve, sound is generated as the membrane flutters.

Hearing is believed to have evolved 250–350 million years ago, with vibration detection predating it. Sight, beginning as photosensitive patches based on opsins and ion channels, emerged around 540 million years ago.

Given the clear evolutionary advantages of seeing, hearing, and vocalizing that have served life for millions of years, it follows that speech, our primary interface for interacting with others in natural settings, should extend to our interactions with computers. Since we learn to read, write, and speak from a young age, interacting with computers through a Natural Language Interface requires minimal additional learning.

Moreover, interacting via speech allows for multitasking, enabling us to accomplish other physical tasks simultaneously, like walking, driving, or cleaning the house. It allows us to engage without the distraction of a visual device. This mode of interaction also reduces the time and energy required to perform certain tasks. Instead of manually turning off every light in the house, one can simply say, “Alexa, turn off the lights,” or request a song without having to navigate through various apps and type in search fields.

This reduction in friction enhances engagement with software products, potentially leading to increased revenue for companies that integrate these capabilities. For instance, ordering food, playing a song, purchasing stock, or shopping online all become easier and more convenient, thereby making these tasks more likely to be performed.

A Brief History of Natural Language Interfaces

Recognizing the potential benefits and appeal of Natural Language Interfaces, research into their application for interfacing with computers began to emerge as early as the 1960s and 70s. Let’s delve into a brief overview of how these interfaces were initially conceived and how they have evolved over time.

ELIZA

ELIZA was developed by MIT computer science professor Joseph Weizenbaum in the mid-60s with the intent to explore communication between humans and machines. Although ELIZA only simulated conversation using pattern matching and substitution, with no capability that could be deemed true understanding of the conversations, it was still convincing enough that some users became emotionally attached to the program and would often forget that they were conversing with an application rather than an actual human being.

SHRDLU

UI for SHRDLU
A Series of Interactions with SHRDLU

SHRDLU was developed by MIT computer science professor Terry Winograd in the late-60s. SHRDLU was a program where users moved objects around in a “blocks world” using natural language entered as text prompts.

A limited vocabulary, access to interaction history, the ability to determine which states were possible in the world (using memory of interactions and physics rules), and the ability to remember the names of objects and their arrangements all combined to create a convincing illusion that the program had real understanding of the world and of what the interactions meant.

SHRDLU may be considered by some a Potemkin village: the initial demonstration looks impressive, but under the covers there isn’t enough substance to make things work in a more general way, a common issue in the “demo or die” days of AI.

Natural Language Interfaces to Databases: LUNAR and BASEBALL

Retrieving data from data sources is a specialized skill set, requiring knowledge of query syntax, table entities, and aggregation and filtering mechanisms.

LUNAR and BASEBALL, created in the 1960s and early 70s, were early natural language interfaces to databases (NLIDBs), which allowed queries to be posed in natural language. e.g. rather than writing “select * from rocks r inner join materials m on r.id = m.rock_id where material = ’magnesium’”, a researcher could simply ask, “Which rocks contain magnesium?”.

As one survey of NLIDB research concluded: “despite the large number of NLIDBs that have been implemented for more than 50 years, these systems have not attained a high enough success rate (recall close to 100%) to make them widely used in business applications.”

Watson

Notable for its 2011 appearance on Jeopardy!, where it won the one million dollar prize against former Jeopardy champions, Watson is a question answering system built by IBM that applies Natural Language Processing, information retrieval, knowledge representation, automated reasoning, and machine learning. Watson is capable of answering questions through a Natural Language Interface.

Siri

Released by Apple in 2011 as an iOS feature, Siri uses voice queries, gesture-based control, focus-tracking, and a natural-language user interface to answer questions, make recommendations, and perform actions by delegating requests to a set of Internet services.

Siri’s capabilities include: making phone calls, reading text messages, setting timers, answering general knowledge questions, setting reminders, searching the internet, controlling device settings such as brightness and wifi, taking pictures, navigation, controlling apps, and sending money.

Alexa and Google Assistant

Amazon introduced Alexa in 2014, a virtual assistant based on a Polish speech synthesizer called Ivona, acquired by Amazon in 2013. Google Assistant followed in 2016, compatible with phones and smart devices like Google Home. Both assistants share core capabilities like voice interaction, audio playback, setting alarms, making purchases, and controlling IoT smart devices. Additionally, they offer plugins or skills for integration with other apps to enhance their capabilities.

ChatGPT

Launched as a research preview in late 2022, ChatGPT’s incredibly impressive features led to it gaining 100 million users within two months of its release, making it the fastest-growing consumer app in history at the time. Put in perspective, it took TikTok 9 months to reach 100 million users, and Instagram 2.5 years.

ChatGPT’s primary function is to emulate a human conversationalist. It is adept at writing and debugging computer programs, imitating various personas by writing in their style, composing music, writing stories, writing student essays, answering test questions, writing poetry and song lyrics, language translation, sentiment analysis, few-shot learning, contextual learning, emulating computer systems, playing games… the list of capabilities is extensive, and those capabilities are progressing at a nearly exponential rate. GPT-4 is a notable enhancement over GPT-3, and has yet to be made freely available to the public.

ChatGPT plugins are similar to skills/capabilities extensions, and allow ChatGPT to integrate with third party apps, such as Instacart and Wolfram Alpha.

AI’s Current Capabilities

In order to envision the future of interfaces, it’s essential to gain a clear understanding of the current capabilities of AI. This will help us appreciate how our everyday software is poised to evolve and improve with the incorporation of AI. Let’s take a closer look at these capabilities and how they’re already shaping the world of technology.

Question Answering

Among the most striking and impressive capabilities of Large Language Models (LLMs) is their ability to comprehend questions and synthesize data to provide answers to general knowledge queries. This ability, which is progressing at an incredible pace, has ignited widespread interest practically overnight.

When posed with a question, Large Language Models, like ChatGPT, can confidently furnish a response, demonstrating their remarkable aptitude for natural language understanding and processing.
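To make this concrete, here is a minimal sketch of how an application might pose a question to an LLM through OpenAI’s Python library (the API key, model choice, and question are all placeholders):

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; supply your own key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[{"role": "user", "content": "Which rocks contain magnesium?"}],
)

print(response.choices[0].message.content)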

Speech To Text

Speech to Text technology facilitates our interaction with applications by converting spoken words into text through the microphones on our devices. The resulting text can then be sent to a Large Language Model, freeing us from the need to type while looking at a screen.

A commonly known implementation of this technology is the iPhone’s dictation feature. This functionality is available in iMessage, Notes, Mail, Safari, and any other application where text entry is required.

ChatGPT’s mobile application also provides the ability to form prompts via speech to text.
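For a sense of what speech to text looks like in code, OpenAI’s open-source Whisper model can transcribe an audio file in a few lines (the file path is a placeholder):

import whisper

model = whisper.load_model("base")          # a small general-purpose model
result = model.transcribe("recording.mp3")  # placeholder audio file
print(result["text"])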

Text To Speech

Text to Speech is another compelling capability of AI that facilitates the use of Natural Language Interfaces. This technology transforms written text into audible speech, making the interaction feel more natural and seamless.

For instance, consider the Bark model, which generates remarkably natural-sounding speech from a given text:

Hello, my name is Jason. I currently live in Utah, and enjoy hiking, meditation, playing music, and working with large language models
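Generating a clip like the one above with Bark looks roughly like this, following the usage pattern from the suno-ai/bark repository:

from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # download and cache the model weights

text = "Hello, my name is Jason. I currently live in Utah."
audio_array = generate_audio(text)  # numpy array of audio samples
write_wav("bark_output.wav", SAMPLE_RATE, audio_array)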

Eleven Labs TTS had even more impressive results:

Natural language interfaces are poised to augment traditional user interfaces in the near future.

Text Generation

Large Language Models possess the capability to generate extensive text based on a provided prompt. The AI’s applications are incredibly diverse and can range from crafting stories and formulating sales pitches to providing recipes and drafting business plans — the possibilities seem boundless.

As a case in point, I tasked GPT-4 with writing a story using only a zero-shot prompt:

Using the voice of a fantasy novel writer, write me a story about a frog and a bear that go on an adventure. There should be at least two conflicts and resolutions.

In the ageless dominion of the Murkwood, where sunbeams rarely pierce the verdant canopy and shadows whisper ancient tales, dwelt an unusual fellowship. Bromli, an aged frog sage with eyes that held galaxies of wisdom, and Torgun, a youthful bear, the last scion of the Ursan guardians. An uncanny friendship had blossomed between them, a friendship spun from the fabric of dreams, dreams that whispered tales of adventure.

Upon a day steeped in the violet hues of twilight, the Oracle of Eldertree spoke a prophecy of impending doom. The Ice Wraith, an entity of frost and death, was stirring from its millennia of slumber, threatening to drown their world in a ceaseless winter. Their salvation lay in the sacred Emberstone, a gem hidden deep within the treacherous vaults of Fireheart Mountain.

The companions ventured forth, their first tribulation awaiting in the sombre heart of the forest. The Nightstalkers, shadowy felines draped in tales of stealth and guile, prowled these woods, eyes gleaming with a predatory allure. Torgun, with his warrior’s spirit, chose to engage them in a straightforward battle but found his prodigious might useless against their nimbleness.

Bromli, invoked his profound knowledge of the forest’s lore. The Nightstalkers, for all their ferocity, were creatures of honour, bound by the unspoken laws of the wild. With a soft croak, Bromli challenged them to a contest of riddles, an age-old ritual that valued wit over violence. Bemused yet bound by tradition, the Nightstalkers agreed. Bromli, using his eons of wisdom, outwitted them, earning their grudging respect and safe passage through their realm.

Undeterred, they pressed on to their goal, the fiery veins of Fireheart Mountain. The sacred caverns were the abode of Pyrrhus, an ancient Phoenix of flame and fury, the final barrier to the Emberstone. Torgun, his Ursan pride flaring, was eager to test his mettle against Pyrrhus. But the Phoenix, being a creature of fire, only grew more potent as Torgun’s anger flared.

In the light of the flames, Bromli understood the riddle of their second challenge. The Phoenix, a symbol of rebirth and renewal, did not seek combat but a demonstration of resilience and the will to overcome. Releasing his warrior pride, Torgun humbly approached Pyrrhus, enduring the sweltering heat and the searing flames. His resolve, his willingness to adapt, impressed the Phoenix, and as a token of his respect, Pyrrhus presented them the Emberstone.

Together, the sage frog and the young bear had weathered adversities, learned from their encounters, and above all, demonstrated that even the unlikeliest of alliances could conquer the most formidable of challenges. Their tale was one of friendship and courage, a testament to the indomitable spirit of adventure that lived within them.

Sentiment Analysis

Sentiment analysis involves discerning emotions, evaluations, opinions, intentions, and aspects within a given text. This capability of AI can prove invaluable in understanding user feedback, interpreting social media sentiment, and much more.

Several types of sentiment analysis exist, but some of the key ones include:

  • Fine-Grained Sentiment Analysis: This type mirrors a five-star rating system, categorizing the text as very positive, positive, neutral, negative, or very negative.
  • Emotion Detection: In this type, the emotion evoked from a text is classified into specific categories such as anger, fear, happiness, etc.

Example texts and analyses:

  • “I love the new design of the app!” → Positive, Happy
  • “This app has really gone downhill. What a waste.” → Negative, Frustrated
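As a minimal sketch, a pre-trained sentiment classifier can be run in a couple of lines with the Hugging Face transformers library (the printed output is representative, not exact):

from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default fine-tuned model
print(classifier("I love the new design of the app!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.9998}]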

Image Generation

The advent of AI-powered tools like Midjourney and DALL-E has revolutionized the process of image creation. It’s now possible to generate images featuring any style, scenario, subject, or landscape you can envision. The quality of the output is often so high that it becomes nearly impossible to distinguish the generated images from actual photographs of real-world scenes or individuals.

a commercial photo portrait of the young instagram model looking at the camera, appropriation artist, shiny/glossy, clear edge definition, unique and one-of-a-kind pieces, light brown and light amber, Fujifilm X-T4, Sony FE 85mm f/1.4 GM

Image Classification

Image Classification has allowed us to interpret and categorize the visual world in ways that were previously unfathomable. With image classification, AI algorithms can categorize and label images based on their content, identifying elements such as objects, places, people, writing, and even actions. This technology is at the heart of facial recognition systems, self-driving vehicles, and various applications in healthcare, agriculture, and security. Whether it’s recognizing a cat in a photo, identifying cancerous cells in a medical scan, or analyzing satellite imagery to track climate change, image classification provides us with an invaluable tool to make sense of the vast amount of visual data we generate and collect every day.

The VisualGLM-6B model produced this description of the image above:

This image features a man and woman sitting on benches with various toys, including bicycles, cars, and dogs. There are also several cats in the scene, which adds to their playful atmosphere. The presence of these toys suggests that they might be enjoying spending time together or playing around outdoors while taking care of their pets.

The Grounded Segment Anything model detected and outlined the figures in the image.
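For a sense of what basic image classification looks like in code, here is a minimal sketch using the Hugging Face transformers library (the image path is a placeholder, and the default model is an illustrative choice):

from transformers import pipeline

classifier = pipeline("image-classification")  # default pre-trained vision model
for prediction in classifier("photo.jpg"):     # placeholder image path
    print(prediction["label"], round(prediction["score"], 3))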

Language Translation

Large Language Models possess the remarkable ability to translate hundreds of languages with a proficiency that is strikingly close to human capability. This not only breaks down barriers of communication but also promotes cultural exchange and understanding on a global scale.

An example of this capability in action is Google Translate’s conversation feature. This powerful tool seamlessly integrates multiple LLM functionalities to facilitate real-time conversations between speakers of different languages. The process begins with the detection of the spoken language, which is then transcribed into text. Following this, the text is translated into the desired language, and finally, a text-to-speech functionality verbalizes the translated text. This efficient and interactive process showcases the transformative impact of AI on cross-language communication.
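As a minimal sketch, open translation models can be run through the Hugging Face transformers library (the default English-to-French model here is an illustrative choice):

from transformers import pipeline

translator = pipeline("translation_en_to_fr")  # downloads a default model
result = translator("Natural language interfaces are poised to augment traditional user interfaces.")
print(result[0]["translation_text"])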

Music Generation

With the expansion of AI’s potential, Large Language Models such as GPT, Flan-T5, and others have transcended linguistic boundaries and ventured into the realm of music. These models have a comprehensive understanding of music theory, including the specifics of notes, chords, and even the intricate patterns of popular chord progressions.

This in-depth musical knowledge equips them with the capacity to generate original pieces of music, extending their creative influence into the realm of auditory art. This capability opens up a vast new world of possibilities for music composition, offering an unconventional and exciting approach to the creation of melodies and harmonies.

Write a song using chord progressions found in the key of E Major. Use a common day pop song format. Do not include lyrics. Indicate whether the chord used is major or minor.

Absolutely, here’s an example of a pop song format with chord progressions in the key of E Major:

Verse: E Major — B Major — C# minor — A Major

Pre-Chorus: F# minor — B Major — A Major — B Major

Chorus: E Major — B Major — C# minor — A Major

Bridge: F# minor — B Major — A Major — E Major

There are a myriad of sites and apps, such as AIVA, that generate audio tracks using AI.

Other examples include classical music composed by AI, as well as hip hop tracks that use AI vocal technology to perform songs in the voices of various artists.

Sequential Recommender Systems

Sequential recommender systems are a powerful feature of AI that can discern patterns in sequential user actions and use this information to predict and recommend potential future actions. These systems can be extremely valuable in numerous settings, particularly in customer service and marketing.

For instance, let’s consider a real-world scenario where a customer begins a sequence by opening a checking account, followed by applying for a credit card, and finally applying for a personal loan. This sequence, once captured and processed by the system, forms a useful predictive pattern.

So, when another customer follows the same initial steps, i.e., opening a checking account and applying for a credit card, the system can extrapolate based on the previously established sequence and anticipate the customer’s next likely move. As such, it can proactively offer a recommendation for a personal loan, ensuring a seamless and tailored user experience. This smart anticipation of user needs makes sequential recommender systems a game-changing tool in the world of AI interfaces.
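Production recommenders rely on learned sequence models, but the core idea can be sketched with a simple next-action frequency table (all action names and data below are illustrative):

from collections import Counter, defaultdict

# historical action sequences, e.g. from a bank's event log (illustrative data)
histories = [
    ["open_checking", "apply_credit_card", "apply_personal_loan"],
    ["open_checking", "apply_credit_card", "apply_personal_loan"],
    ["open_checking", "open_savings"],
]

# count how often each action follows each preceding action
transitions = defaultdict(Counter)
for sequence in histories:
    for previous, following in zip(sequence, sequence[1:]):
        transitions[previous][following] += 1

def recommend(last_action):
    # suggest the action most often observed after last_action
    counts = transitions.get(last_action)
    return counts.most_common(1)[0][0] if counts else None

print(recommend("apply_credit_card"))  # -> apply_personal_loan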

Code Generation

Code generation is an exciting application of AI, allowing Large Language Models to write or assist in writing computer programs. This ability empowers developers to speed up their workflow, automate repetitive tasks, and even bridge gaps in their own programming knowledge.

For example, AI-powered models such as Codex or GPT-4 can generate a range of code snippets, from simple HTML structures to more complex JavaScript functions, all based on a descriptive prompt given by the user. These models can be given a problem statement or a specification in plain English, and they will generate the corresponding code.
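For instance, given the plain-English prompt “Write a Python function that checks whether a string is a palindrome,” a model like GPT-4 might produce something along these lines (illustrative output, not a captured response):

def is_palindrome(text: str) -> bool:
    # ignore case, spaces, and punctuation when comparing
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # True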

What’s more, AI models can even generate entire programs based on comprehensive descriptions, handle debugging tasks, and provide suggestions for code optimization. This can make coding more accessible to beginners, and can also allow experienced developers to focus on higher-level tasks, boosting productivity and innovation in software development.

IDEs now have built-in integrations with AI tools such as GitHub Copilot and Cody, which help automatically generate code based on surrounding code, function names, comments, and more.

In addition to creating functional code from scratch, these AI-powered tools possess other handy features that streamline the coding process. They are capable of adding comments to code, thus making it easier for other developers to understand the logic and purpose of various code segments.

Moreover, these tools can automatically generate unit tests, a feature that significantly reduces the manual labor and time involved in the testing phase of software development.

One exemplary demonstration of the potential of AI in code generation was provided by OpenAI president Greg Brockman. In the demonstration, he sketched out a basic design for a website on a notebook, took a picture of the sketch, and used GPT-4 to create a fully functioning website complete with HTML, CSS, and JavaScript. This example illustrates the potential of AI to dramatically simplify and expedite the process of turning ideas into functional code.

Named Entity Recognition

Named Entity Recognition (NER) is an integral aspect of Natural Language Processing, dedicated to identifying and categorizing specific elements within a piece of text. These elements, referred to as ‘named entities’, could be people’s names, company names, geographical locations, dates, product names, or even specific numeric expressions like money and percentages.

NER is widely used in various applications such as information extraction, knowledge graph construction, question answering systems, and many more. By being able to accurately recognize and categorize these named entities, AI systems can better understand the context of a text and offer more relevant responses or actions.
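Here is a minimal sketch of NER using the Hugging Face transformers library (the sentence is illustrative, and the exact labels depend on the default model):

from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # merges sub-word tokens into whole entities
for entity in ner("Ada Lovelace was born in London in 1815."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 2))
# e.g. PER Ada Lovelace / LOC London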

Autonomous Driving

Autonomous driving represents a pinnacle of AI’s capabilities in the real world. Self-driving cars, or autonomous vehicles (AVs), harness the power of AI to interpret their surrounding environment and make decisions in real-time.

They achieve this through a variety of sensors that gather data about the environment. These include Light Detection and Ranging (LIDAR) systems that provide 3D images of the surroundings, radar and sonar sensors that measure distances, cameras that provide visual inputs, and Inertial Measurement Units (IMUs) that track velocity, orientation, and gravitational forces. Global Positioning System (GPS) technology is also used for accurate location tracking and navigation.

Once data is collected, the AI algorithms get to work. They interpret the data to recognize and track objects (other vehicles, pedestrians, traffic signs, etc.), predict their future movements, and plan the vehicle’s path accordingly. These algorithms use techniques from fields like computer vision, machine learning, and robotics.

Importantly, these decisions need to be made in a fraction of a second. Advanced AI capabilities, such as deep learning, reinforcement learning, and convolutional neural networks, are utilized for this purpose, allowing the vehicle to make decisions quickly and efficiently.

The goal of autonomous driving is to create a safer, more efficient, and accessible means of transportation.

Healthcare

The potential of AI in healthcare is enormous and continues to grow at a rapid pace. With a wealth of data at their disposal, AI applications have been instrumental in transforming several facets of the healthcare industry.

One significant application of AI in healthcare is in medical imaging. AI models can be trained to identify and diagnose diseases by analyzing medical images such as X-rays, MRI scans, and CT scans. For instance, convolutional neural networks (CNNs), a type of deep learning algorithm, are being used to detect cancerous tumors in mammograms with increasing accuracy. They can also identify anomalies or abnormalities in scans that could suggest other diseases, often with a level of precision comparable to or even exceeding that of human experts.

Another promising application is in the field of drug discovery and development. AI algorithms are capable of predicting the efficacy of certain compounds, thus identifying promising candidates for new drugs. These algorithms can analyze vast databases of molecular structures, perform complex simulations, and forecast the potential effectiveness of a compound in treating a specific disease, saving significant time and resources in the drug discovery process.

Additionally, AI has proven instrumental in predicting the onset of diseases. By analyzing patient data, AI models can detect subtle patterns and correlations that might be missed by human doctors. This ability to predict potential health risks enables early interventions, improving patient outcomes and reducing healthcare costs.

Finance

AI has significantly altered the landscape of the finance sector. Leveraging AI capabilities, financial institutions are able to automate complex processes, make more accurate predictions, and detect suspicious activities more effectively.

Fraud detection is one area where AI has shown immense promise. Machine learning algorithms, fed with historical data on fraudulent transactions, can predict and identify potential fraud based on patterns that may be too subtle or complex for human analysis. These systems can send real-time alerts to both financial institutions and their customers, thereby reducing the risk and impact of fraudulent activities.

In the stock market, AI is being used to analyze and predict market trends. Machine learning models can digest vast amounts of financial data from various sources, including market indices, economic indicators, and corporate reports, and can make predictions about stock price movements. They can also analyze sentiments from news articles and social media posts to gauge public sentiment towards specific companies or industries, which can influence stock prices.

Algorithmic trading is another domain where AI is making its mark. These systems can execute trades at superhuman speeds based on pre-set instructions, but advanced AI models can also learn and adapt their strategies based on changing market conditions, maximizing profits and minimizing losses.

In portfolio management, AI can assess the risk and potential returns of different investment strategies, assisting wealth managers and individual investors in making informed decisions.

Lastly, in regulatory reporting, AI can automate the generation of reports and help comply with changing financial regulations, thereby reducing manual effort and improving compliance accuracy.

Home & Car Automation

The fusion of AI assistants, such as Alexa and Siri, with the Internet of Things (IoT) has ushered in a new era of home and car automation. Through Natural Language Interfaces, we can now seamlessly control various devices in our homes and vehicles.

In the realm of home automation, AI assistants integrated with IoT devices enable us to effortlessly manage our lights, thermostats, dishwashers, and washing machines using voice commands. By simply interacting with the AI assistant, we can adjust the lighting ambiance, set preferred temperatures, initiate cleaning cycles, and control other smart appliances, all with the power of natural language.

Similarly, in the automotive industry, car manufacturers like Tesla have embraced voice command capabilities to enhance the driving experience. Through AI-powered voice assistants integrated into the vehicle’s system, drivers can perform a range of tasks using voice commands. This includes locking doors, adjusting climate controls, making phone calls, playing music, and even activating windshield wipers, all while keeping their hands on the wheel and eyes on the road.

This seamless integration of AI assistants and IoT devices in both homes and cars simplifies everyday tasks, enhances convenience, and contributes to a more intuitive and efficient lifestyle.

Emergent Abilities

Emergence, a phenomenon observed in various complex systems, has also been witnessed in the realm of AI. As the number of parameters and the training data increase in AI models, there is a higher likelihood of unexpected abilities emerging. These newfound capabilities, not explicitly trained on, are known as emergent abilities.

The concept of emergence transcends disciplines such as physics, biology, economics, and computer science. An analogy can be drawn to the emergence of consciousness from the intricate interconnections among billions of neurons in the brain, giving rise to subjective experiences and higher-order cognitive functions that cannot be reduced to the properties of individual neurons.

Emergence is when quantitative changes in a system result in qualitative changes in behavior.

List of emergent abilities of LLMs and scale at which the abilities emerge.

This qualitative change is also known as a phase transition — a dramatic change in overall behavior that would not have been foreseen by examining smaller-scale systems.

In the context of language models, their emergence has been primarily achieved through scaling along three key factors: computational resources, model parameters, and training dataset size. By pushing the boundaries in these aspects, researchers have uncovered remarkable emergent abilities in Large Language Models (LLMs).

Despite not being explicitly trained or programmed for specific tasks, LLMs have showcased a range of emergent abilities, including but not limited to language translation, question answering, code generation, creative writing, technical documentation and summarization, conversational engagement, and contextual understanding. These unforeseen capabilities highlight the transformative potential of scaling AI models, unlocking novel possibilities and expanding the scope of AI applications.

In Context Learning via Prompt Engineering

Another notable emergent ability in Large Language Models (LLMs) lies in their capability to learn through prompt engineering. Prompt engineering involves providing a pre-trained language model with specific prompts or instructions, allowing it to generate desired responses without further parameter updates or training.

In the context of LLMs, prompts serve as the input or instruction for a given task. By presenting the model with a prompt that includes a few input-output examples as a preamble, it can generalize and perform the task for unseen inference-time examples. This approach, known as few-shot prompting, was proposed by Brown et al. (2020).
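A classic few-shot prompt, adapted from the Brown et al. paper, looks like this:

Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>

Given just those examples as a preamble, the model continues the pattern and outputs “fromage”, even though it was never explicitly trained on this particular task format.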

Prompt engineering refers to the process of refining a language (or other generative AI) model’s input to produce the desired output, without updating the actual weights of the model as you would with fine-tuning.

LLMs possess the remarkable ability to acquire new skills and develop novel approaches to problem-solving through prompt engineering. By providing guidance within the prompt, these models can navigate complex tasks, learn from a limited set of examples, and generate meaningful and contextually appropriate responses. This demonstrates the incredible flexibility and adaptability of LLMs in acquiring new skills and knowledge without the need for extensive retraining or fine-tuning.

A few examples of prompt engineering techniques can be found below.

Chain of Thought

Chain of Thought represents a technique in prompt engineering that involves guiding the model through a series of intermediary reasoning steps, enabling it to handle complex reasoning tasks. By breaking down the problem into manageable steps and providing exemplars, the model can develop the ability to perform operations it has not been explicitly trained on.

To illustrate this concept, let’s consider an example where we guide the model through a sequence of reasoning steps to perform a subtraction operation involving words. We start by telling the model that the numeric value of each letter is twice its position in the alphabet. Then we explain that the numeric value of a word can be determined by summing the numeric values of its constituent letters.

In this context, we provide an example prompt that states “dog — cat = 4” (52 minus 48 under the rules above). By presenting this example, we prompt the model to reason about the numeric values associated with the words “dog” and “cat”. Finally, we ask the model to compute the subtraction “elephant — dog”, and it correctly responds with “110”.
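The arithmetic behind this example can be verified with a few lines of Python (the word_value helper below is purely illustrative):

def word_value(word):
    # each letter is worth twice its position in the alphabet (a=2, b=4, ..., z=52)
    return sum(2 * (ord(ch) - ord("a") + 1) for ch in word.lower())

print(word_value("dog") - word_value("cat"))       # 4
print(word_value("elephant") - word_value("dog"))  # 110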

Through this chain of thought, the model learns to perform complex operations by leveraging its ability to reason, generalize, and apply the learned rules. This demonstrates the power of prompt engineering in enabling models to handle novel tasks and acquire new skills by providing structured guidance and intermediate steps.

Affordances

Affordances play a crucial role in prompt engineering by allowing us to define the boundaries and limitations of what a model can answer. They provide a means to restrict the scope of responses, which can be particularly valuable for organizations seeking to ensure that their AI chatbots or language models focus on specific topics or domains.

For instance, consider a custom ChatGPT tailored for a finance-related application. By implementing affordances, we can establish guardrails that confine the model’s responses to questions and inquiries related to finance. This ensures that the model provides accurate and relevant information within the designated domain.

For example, a system prompt along the following lines (an illustrative sketch, not a production prompt) could put guardrails around a custom ChatGPT so that it only answers questions related to finance:
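You are a financial assistant. Only answer questions related to personal finance, banking, and investing. If the user asks about anything else, respond with: “I’m sorry, I can only help with finance-related questions.”

User: What’s a good recipe for lasagna?
Assistant: I’m sorry, I can only help with finance-related questions.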

The Near Future of Interfaces

The advent of Natural Language Interfaces, enabled by the remarkable capabilities of Large Language Models (LLMs), has opened up new horizons for the enhancement of software products. With the ability to understand and synthesize questions, commands, and instructions, LLMs have paved the way for intuitive and effortless modes of communication, namely voice and text.

Within the next few years, we can anticipate a significant shift in the landscape of major software applications. It is highly likely that most of these products will integrate some form of Natural Language Interface, harnessing the power of AI to enhance user experiences and productivity. These interfaces will not only assist users in content creation and data analysis but will also offer capabilities such as message summarization, contextual understanding, language translation, code generation, and more, as discussed in the previous sections.

The integration of Natural Language Interfaces into software products will empower users with simplified and intuitive means of interaction. Instead of relying solely on traditional graphical user interfaces, users will be able to engage with the software through voice commands or text-based conversational interactions. This shift will transform the way we interact with technology, making it more natural, efficient, and accessible to a broader range of users.

Google Duet AI

Google Duet AI is an upcoming advancement in the suite of Google products, including Gmail, Docs, Sheets, Slides, and more. This innovation aims to integrate generative AI capabilities directly into these applications, revolutionizing the way users interact with and create content.

With the introduction of Google Duet AI, users can expect remarkable enhancements in text generation. In Google Docs and Sheets, for instance, the AI-powered system will be able to generate content based on user prompts. This means that when users provide a prompt or an initial input, the generative AI will assist in dynamically creating and expanding upon the content, offering suggestions and automating portions of the writing process. This feature will be particularly beneficial for drafting documents, composing emails, or working on spreadsheets, saving time and streamlining the content creation process.

By leveraging the power of AI and natural language processing, Google Duet AI will empower users to generate high-quality content more efficiently and effortlessly. The AI capabilities will complement users’ creativity and expertise, providing valuable suggestions and assistance in real-time, thereby enhancing productivity and enabling smoother collaboration within Google’s suite of products.

Text Generation

Docs and Sheets will generate or revise content based on user prompts.

Text Summarization

With text summarization, users can summarize email chains, long document proposals, etc.

Image Generation

With image generation users can create new images tailored to slides in slide decks.

Slack GPT

One of the upcoming features is the ability to summarize entire threads. With Slack GPT, users will have the option to generate concise summaries of lengthy discussions, condensing the key points and highlights into a more digestible format. This will significantly aid in reviewing and extracting valuable insights from extensive conversations, saving time and improving productivity.

Additionally, Slack GPT will offer the functionality to adjust the length of messages. Users will be able to effortlessly shorten or expand the content of their messages while preserving the intended meaning. This flexibility allows for more effective communication, enabling users to convey information concisely or provide additional context as needed.

Another intriguing capability of Slack GPT is the ability to modify the tone of a given message. Users will have the option to adjust the tone, making it more formal, casual, friendly, or professional, depending on the desired communication style. This feature will empower users to tailor their messages to suit specific contexts and audiences, enhancing the clarity and impact of their communication.

Zoom IQ

Zoom, the video conferencing platform, introduces Zoom IQ, an advanced suite of features designed to enhance meeting productivity and analysis. With Zoom IQ, users can access powerful tools that provide comprehensive insights and metrics, transforming the way meetings are conducted and analyzed.

Text Summarization of Meetings

One of the key features of Zoom IQ is the ability to summarize entire meetings. Leveraging advanced natural language processing and machine learning algorithms, Zoom IQ can generate concise summaries of meetings, capturing the key points and discussions. This feature streamlines post-meeting reviews, allowing participants to quickly grasp the main takeaways and follow-up actions without needing to revisit the entire meeting recording.

Speech Analysis

Furthermore, Zoom IQ empowers users with in-depth analysis and metrics. It provides valuable insights such as talk time, sentiment analysis, and talk speed for individual speakers. Users can gain a deeper understanding of meeting dynamics, identifying the distribution of speaking time, the overall sentiment of discussions, and the pace at which participants are communicating. These metrics enable better meeting management, facilitate equal participation, and enhance overall engagement and collaboration.

Photoshop

Adobe Photoshop has introduced capabilities that leverage the power of natural language to manipulate and transform photos. With this innovative feature, users can now describe their desired edits using natural language, allowing Photoshop to generate images, swap clothes, erase unwanted elements, and perform a range of sophisticated manipulations.

For example, users can describe their intent, such as “Remove the background and replace it with a beach scene,” or “Swap the clothes between the two individuals in the photo.” Photoshop’s AI-powered algorithms analyze the natural language instructions and perform the necessary edits to achieve the desired results.

By incorporating natural language processing and AI capabilities, Adobe Photoshop has greatly expanded the creative possibilities for users. This feature empowers both professionals and enthusiasts to bring their visions to life with greater ease and efficiency, while maintaining the high level of control and precision that Photoshop is known for.

The Shape of Interfaces to Come

In this article, we have explored the transformative potential of Natural Language Interfaces (NLIs) powered by Large Language Models (LLMs) in software products. These interfaces enable us to interact with technology through voice or text, utilizing the capabilities of AI to understand and synthesize information. We have discussed the appeal of NLIs, their historical development, current AI capabilities, and emerging possibilities.

As we look to the future, it is evident that Natural Language Interfaces will play an increasingly significant role in shaping the way we interact with software products. Initially, we may rely on typing text prompts and commands, but ultimately, the evolution towards hands-free and intuitive voice commands is expected. By embracing speech as the primary means of interaction, we can communicate and engage with software products more efficiently, harnessing the natural speed of speech compared to typing. Speech allows us to communicate over 150 words per minute, whereas most people can only type at around 40 words per minute. The average reading rate is 183 words per minute.

Natural Language Interfaces, incorporating Speech to Text and Text to Speech capabilities, combined with the intelligence of Large Language Models, will bring forth a new era of multitasking and frictionless interaction with software products. Everyday tasks will become smoother and faster, allowing us to allocate our time and attention to other activities without being encumbered by cumbersome interfaces. The power of AI-driven interfaces will enable us to perform tasks in a more streamlined and less distracting manner.

The potential of Natural Language Interfaces has captivated our imaginations for years, as depicted in popular television shows and movies. These fictional portrayals have showcased increasingly realistic visions of the future, mirroring the advancements we are witnessing today. With every qualitative leap in AI capabilities, what was once deemed science fiction is becoming an attainable reality.

As we move forward, it is clear that Natural Language Interfaces will continue to evolve and reshape the way we interact with technology. The integration of AI-driven capabilities into software products will unlock new levels of productivity, convenience, and seamless user experiences. With each passing day, we are edging closer to a future where intuitive and conversational interactions with technology become the norm, enabling us to navigate a world of software products with greater ease and efficiency.

Here are some glimpses I have compiled, inspired by scenes from various science fiction TV shows and movies, that provide a peek into what interfaces may look like in the near future. These depictions showcase the potential of AI-powered interactions with computers and software products.
