On Natural User Interfaces

Cyril Anderson
Feb 2, 2023

Originally published May 2, 2014. Some new revisions added in December 2022 – January 2023, but much of it left the same, including some now somewhat dated references.

This article reviews some of the history of human-computer interface technology, and the major paradigms that we have seen, from batch interfaces, to command line, to graphical interfaces and, most recently, to natural user interfaces. Natural user interfaces are interfaces that more naturally match the ways we normally interact with the world and with people. This includes such technology as touch interfaces, gesture and motion, facial expression and emotion detection, avatars, natural language and speech, haptics, and AR/VR. The article also speculates about the possibilities for learning and training.

Preface

Welcome. This article is intended to give a bit of deeper background around trends in what are called “Natural User Interfaces,” or NUIs. This term refers to a computer technology trend related to how we interact with computers.

Fair warning that this article is intended to be forward looking. It is NOT about looking at tools that are currently available off the shelf. This is not about immediately applicable information. This is a look at where the technology of human-computer interfaces has come from, where it is, where it is probably going in the years to come, and what kinds of possibilities that could introduce for computer based training and learning.

So in that respect, it’s about getting yourself mentally prepared for what will be coming a few years down the road. For those who like to think ahead, to dream about future Instructional Design possibilities using the tools that haven’t been invented yet.

My recommendation: if the preface and introduction pique your interest, bookmark this article, email yourself the link, and maybe set it aside for a quiet Sunday afternoon when you have some time to read and reflect. Then you can process it and reflect on the future possibilities of what you can do with this technology. Anyway, I hope you enjoy the article.

Introduction: What is a Natural User Interface (NUI)?

In a recent article, I talked about the future potential for the Kinect sensor (or something like it) to enable on-the-fly adjustments to a presentation in eLearning. In that article, I brought up the concept of a Natural User Interface, or NUI (pronounced “noo-ey”). I introduced the term there almost in passing, but I recognize that a lot of people might not be familiar with the concept.

The intention of the present article is to go into a little more background about the concept of NUIs. I try to give some sense of the significance of this new type of human-computer interface, describing the paradigms of human-computer interaction that came before it, how it has already changed how we use computers, and how future developments promise to further shape our interactions with computers.

Finally, I will try to look ahead a bit at how these types of interfaces could shape the way we train people using computers.

Let’s get started.

Paradigms of human-computer interaction

So the first question for those unfamiliar with the notion of an NUI would be “what is a NUI?”

Well, to answer this question, it helps to go back a bit into the history of computing.

Computers as we generally know them (electronic calculation devices) have a history going back about 80 years, since the time of the Second World War. If you want to be technical, you can trace computing back to Ada Lovelace and Charles Babbage and the Difference Engine and Analytical Engine in the early to mid 1800s, but for simplicity, let’s say 80 years, starting in the early 1940s.

What started as a technology used to automate complex computations for a handful of high-end research and military institutions via massive electrical and electromechanical machines has evolved and grown over these eight decades to become a technology that is an integrated, essential part of the fabric of life (at least for people in relatively developed parts of the world). Along the way, the power, speed, and storage capacities of computers have increased exponentially, while the costs and sizes of components have at the same time shrunk at exponential rates. Computers have gone from machines numbering a handful in the whole world to numbering somewhere in the tens of billions. Billions of powerful computers are carried around in people’s pockets in the form of smartphones, and embedded computing devices appear in almost every electronic device produced today.

Along with these developments, the means through which people interface and interact with computers have also dramatically changed. This change has come both as a result of technological developments and as a driver of the uptake of computers among the general population. Throughout this, human-computer interaction has gone through a number of important paradigm shifts.

A paradigm, for those unfamiliar with the term, is a dominant contemporary pattern or way of conceptualizing and doing things. There have been a few major paradigms of human-computer interaction, with corresponding shifts as the technology moves from one dominant mode of interface to another.

I first want to speak about three major early paradigms of human-computer interaction:

  • Batch interfaces (1940s to 1960s)
  • Command Line Interfaces (1960s to 1980s)
  • Graphical User Interfaces (1980s to 2000s)

I will then speak about the recently emerging paradigm of Natural User Interfaces (NUI) for the period of the late 2000s to present. I will discuss some of the different examples of NUIs, and finally look at new possibilities for training and learning opened up by these sorts of interfaces.

A few notes about these date ranges. First, these are approximations. Second, breaking these up into distinct, non-overlapping time periods with sharp boundaries is a bit artificial. In reality these shifts were processes that took time. Third, these time periods are about the predominance of one paradigm of human-computer interface. Other earlier paradigms persist in broad use beyond the time when they are “replaced” and “overtaken” by the new paradigm. For example, 40 years after the appearance of GUI interfaces, command line interfaces are still broadly used on a daily basis by people who are serious users of computers.

First paradigm: Batch interface (1940s to 1960s)

The first computer interface paradigm was the batch interface. In this setup, users entered commands through stacks of punch cards punched by hand and fed into a card reader peripheral, which read the punched holes via optical scanning and turned the entries into electrical inputs. Programmers would carefully enter their code on the punch cards and submit their stack of cards as a batch to be scheduled and run by the administrators of the machine. The program would include commands to print the (hopefully) successful results of the program execution to a printer connected to the computer.

Remember, this was a time when computers were huge machines taking up most of a room, and a whole university or department might share one of these machines. It was a scarce, in-demand resource, so programmers had to wait their turn for their code to be run. Computers could run one program for one user at one time. This produced a serious bottleneck in performance. Users could not typically just sit at the computer by themselves and use it, because the resource was limited and the time could be used more efficiently if the programs were run together, one after another, as a batch.

Photo from Lawrence Livermore National Laboratory via Wikimedia Commons

This cycle from submission of the program as punch cards to the computer department, to organizing and scheduling the card set as part of a batch, to entering it into the computer, to running the programs could take days, depending on how busy the administrators and technicians of the computer center were. And if there was a bug, something miscoded in the punch cards, the program would fail, and the programmer would have to start again, identifying where the error was without any sort of modern compiler guidance (“syntax error on line 57,” etc.). Such aids didn’t exist. The programmer would try to track down the error in logic by hand, and then resubmit the revised program to the queue. Let’s just say it was a system that encouraged careful, refined first-draft work.

In a batch interface, the computer reads commands coded in rigidly structured messages, carries them out, and gives output through a printer. The computer would take in the programs of many people at one time and process them, one after another, as a batch. It was in this time period that the first computer languages were developed.

The frustrations of dealing with these batch processing systems were a major drive for computer science researchers of the day to look into alternate modes of human-computer interaction.

Second paradigm: Terminals and Command line interface (CLI) (1960s to 1980s)

Then followed the command line interface (CLI). This came about along with the development of early computer displays and monitors, with keyboards used as inputs. Users could input characters through a keyboard and see them displayed on the screen. This would take place at a terminal with a keyboard and display connected or networked to the main computer.

The main computer would be set up to time share between multiple users. The computer basically rapidly switches between carrying out tasks for each user, allowing the central computer to “simultaneously” handle many users at once. To get a sense of how this works, imagine getting your household chores done by doing laundry for a minute, then switching to keeping an eye on dinner for a minute, then switching to attending to your kids for a minute, then switching to tidying the living room for a minute, then switching to sweeping the floor for a minute. Then imagine this task switching a million times faster. You’re doing one thing at a time in little slices, but to a casual observer, everything is smoothly proceeding all at once. Generally, your computer at home or at work “multi-tasks” in a similar sort of way. The coordination of the time sharing created a certain amount of overhead using up computer resources, but this became less of a concern as computers became faster over time.
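To make the chore-switching analogy a bit more concrete, here is a minimal sketch of round-robin time sharing, written in modern Python rather than anything period-accurate. Each “user program” does one small slice of work and then hands control back to a simple scheduler, which cycles through the jobs until they are all finished.

    # A toy round-robin scheduler: each job runs for one "time slice"
    # (one step of its generator), then goes to the back of the queue.

    def user_program(name, steps):
        """Simulate one user's job as a series of small work steps."""
        for i in range(steps):
            print(f"{name}: step {i + 1} of {steps}")
            yield  # hand control back to the scheduler

    def round_robin(jobs):
        """Give each job one slice in turn until every job has finished."""
        queue = list(jobs)
        while queue:
            job = queue.pop(0)
            try:
                next(job)          # run one slice of this job
                queue.append(job)  # not done yet: back of the line
            except StopIteration:
                pass               # job finished: drop it from the queue

    round_robin([
        user_program("alice", 3),
        user_program("bob", 2),
        user_program("carol", 4),
    ])

The printout interleaves the three users’ steps, which is the effect terminal users of the era saw: everyone appears to be served at once, even though the machine only ever does one thing at a time.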

So the user no longer had to punch cards, and no longer had to give them to someone else to feed into the machine, and wait. The different programmers and application users could get access to a terminal, and use that to interact directly with the computer in something resembling real time. The user could input text information, and get text output back more or less immediately.

This paradigm also overlapped with the appearance of the first so-called “micro-computers” used as office business machines (e.g. the IBM era). It was also the paradigm under which the first “personal computers” were born. These were standalone computing machines small enough to fit on a desk.

The user of one of these machines could use the keyboard, aided by visual feedback from the screen, to type documents or to enter commands. The user controls the computer and performs basic file management actions such as creating, saving, deleting, copying, and moving files and directories using text-based commands typed into a command line. Text commands can also be used to run programs and utilities, join these together into pipelines, and run scripts, as well as to enter text-based commands on remote computers through utilities such as Telnet.

This can still be seen today in the bash command line in Unix-like systems such as Linux distributions like Ubuntu, and in macOS, as well as in the Command Prompt (cmd.exe) utility in Windows. MS-DOS, the first Microsoft operating system, worked via a command line like this.

This is known as a Command Line Interface or CLI. More advanced programming languages, compilers for those languages, and keyboard-driven text editors like vi and Emacs for writing programs were also developed at this time.

People who are serious about computers will even today spend a lot of time in the command line. For those who are used to the commands, it offers a beautiful, efficient, lightweight, bare-bones way to perform actions on a computer. As a technical writer in a software company that follows a Git and GitLab based “docs as code” approach, I spend a fair amount of time managing my writing in the Git Bash command line.

cd, ls, dir, git add, git commit, git push, git branch, git clone, git pull are more or less daily parts of my workflow.

Timeline of key developments

  • Assembly languages: late 1940s onward
  • Fortran programming language: 1957
  • COBOL programming language: 1959
  • ALGOL 68 programming language: 1968
  • Unix operating system: late 1960s and 1970s
  • B programming language: 1969
  • Telnet (teletype over network): 1969
  • Thompson shell: 1971
  • C programming language: 1972
  • CP/M operating system: 1974
  • Microsoft founded: 1975
  • Apple Computer founded: 1976/77
  • vi and Emacs text editors: 1976
  • Apple II computer: 1977
  • Bourne shell (sh): 1979
  • IBM PC and IBM PC DOS (the “personal computer”): 1981
  • MS-DOS: 1981
  • Bash shell: 1989

Third paradigm: Graphical User Interface (GUI) (1980s to 2000s)

The next paradigm was the Graphical User Interface or GUI (“goo-ey”). This consists of a “desktop metaphor,” with program windows, menus, virtual “buttons” and other controls on the screen with which the user interacts using a mouse and pointer. Associated with this is the acronym WIMP (Windows, Icons, Mouse, Pointer).

The earliest GUIs came from research at Xerox PARC in the 1970s. These ideas were later taken up by Apple Computer in the early Macintosh and by Microsoft in its Windows OS. Interactions simulated the way a person might interact with a real-world machine, by “pushing” (with mouse clicks) virtual buttons, turning virtual dials, etc. It was at this stage, corresponding with a sufficient miniaturization of computer components and a fall in price, that the idea of a home “personal computer,” affordable at least to moderately well-off people, took hold. With the desktop metaphor, windows, and mouse pointers, it became much more natural for everyday people to use computers. There were still many rough edges, and certain arcane bits of knowledge to learn, but overall, it became much simpler for everyday people to do basic things with computers. Computers were starting down the road to becoming a household appliance that average people would use as part of their everyday lives.

Windows 1 screenshot via Wikimedia Commons

The emerging paradigm: The natural user interface (NUI) (2000s to present)

The next paradigm of human-computer interaction is so-called Natural User Interfaces, or NUI. This can encompass a variety of types of interaction, but the overarching idea is that rather than having artificial or mechanical intermediary means of input, the user interacts with the computer in ways more like those used to interact with people and objects in the real world, and more directly. This typically means touch (including multitouch), haptic feedback, body movements/pose/hand gestures, facial expressions, speech recognition and synthesized speech outputs, and giving queries or commands to the computer in something much closer to the ambiguities of everyday natural language rather than in rigid computer syntax.

What does this mean? Well, to illustrate, let’s look at the predominant method of computer interaction that we’re just coming from and are still wrapped up with. Namely, the mouse. Or, more precisely, the mouse and pointer as a means of navigating graphical menus and control interfaces on a 2D screen display. This is generally accompanied by the keyboard for the user to enter alphanumeric data, like on some electronic typewriter. This form of interaction almost completely dominated from around 1984 right up through to around 2008, a period of 24 years. The 1984 date marks the appearance of the Apple Macintosh computer (128K), which featured a GUI and mouse. 2008, on the other hand, marked the appearance of the iPhone 3G, which helped to explode the popularity of capacitive multi-touch smartphones. (As much as I dislike Apple’s closed model and think they’re somewhat past their prime, I have to grudgingly give them credit for having been squarely at the center of both of these technological inflection points.)

The mouse has become so much a part of our daily activities, at home and at work, for so long, that it’s easy to lose sight of how awkward and unnatural a way this is of interacting with a computer. Or with anything, for that matter. You sit in front of a computer screen, staring at it. There is a button on the screen. You have to grab the mouse on the desk and drag it along the horizontal plane of a physical desktop surface in order to move a pointer arrow on the vertical plane of the “desktop” on the screen. And then you click a button on the mouse to “click” the on-screen button. Once upon a time, this was simply the only way to press an on-screen button. But what is the most natural, instinctual way to do this today, given the technology widely available now, namely touchscreens? Well, since 2008, with the iPhone, and since 2010, with the iPad, it’s simple. You reach out your hand and touch the button directly on the screen to press it. The whole action becomes much more natural and effortless.

Admittedly, it’s still kind of weird, because you’re still blocked by this two-dimensional surface as you bump up against it and touch it or move your hands over it. It’s still a little limiting and artificial. But it’s getting there. You’re completing the GUI metaphor of the desktop workspace on which you place things and move things around to do work. Instead of moving them with a mouse, you move them directly with your fingers. You’re still operating something like an old-fashioned instrument panel, but it has become a little more naturally engaging. You move like you’re actually operating an instrument panel in real life.

As mobile computing and mobile internet have taken off, this has impacted web and application design so that even on the desktop, user interface principles inspired by touchscreen usability for stubby fingers – lots of white space, simplified menus and controls, and large button targets – have become predominant. Designers try to build applications that work well on both.

Interacting with the computer in these more natural, everyday ways means that, in a sense, the interface fades from attention and becomes invisible to the user. The experience is generally smoother, more realistic, more like a real-world interaction. The distance between the user and the computer becomes smaller. In this way the computer becomes a little more like an extension of the user’s body. The user simply and smoothly interacts with the computer to do what they need to do.

We call such an interface a Natural User Interface, abbreviated NUI, and pronounced “noo-ey.” It’s the idea of an interface that drapes itself over us, fits us like a glove by letting us interact with the computer more like we interact with real world objects and people.

In popular entertainment, we see some examples of futuristic concepts for the use of NUIs. Take the computer on Star Trek: TNG, for example, which the crew commanded through voice or touchscreen control panels as they walked around the ship and did their thing.

Or the gesture interfaces Tom Cruise’s character used in the Pre-Crime unit in Minority Report.

http://www.youtube.com/watch?v=8deYjcgVgm8

Or more recently in the battle “simulation” in the film Ender’s Game.

Natural User Interfaces can take on a variety of different forms.

Multitouch

Multi-touch capacitive touchscreens, as seen in modern smartphones and tablets, are one good example of a NUI. You interact with on-screen items by touching them with one or more fingers to stretch, rotate, or shrink them, and so on.
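To give a sense of what a multitouch gesture amounts to underneath, here is a small Python sketch, deliberately framework-agnostic (it is not tied to any particular touch API): given the starting and current positions of two fingers, it computes the scale and rotation they imply, which is the core arithmetic behind pinch-to-zoom and two-finger rotate.

    import math

    def pinch_transform(p0_start, p1_start, p0_now, p1_now):
        """Compute the scale and rotation implied by two moving touch points.

        Each argument is an (x, y) position in screen coordinates: the starting
        and current positions of two fingers. Returns (scale, rotation_degrees).
        """
        def distance(a, b):
            return math.hypot(b[0] - a[0], b[1] - a[1])

        def angle(a, b):
            return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))

        scale = distance(p0_now, p1_now) / distance(p0_start, p1_start)
        rotation = angle(p0_now, p1_now) - angle(p0_start, p1_start)
        return scale, rotation

    # Two fingers move apart and twist slightly: the item grows and rotates.
    print(pinch_transform((100, 100), (200, 100), (80, 100), (240, 110)))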

Conversational interfaces via text or speech

Virtual assistants or agents such as Apple’s Siri or Microsoft’s Cortana are another aspect of natural user interface technology. Here users interact with the computer in a somewhat conversational manner using speech. Some of the predictive elements of Google Now would also be examples.
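For illustration, here is a minimal sketch of the listen-interpret-respond loop at the heart of a conversational voice interface. It assumes the third-party SpeechRecognition and pyttsx3 Python packages (and a working microphone); it shows the general pattern only, not how Siri or Cortana are actually implemented.

    # pip install SpeechRecognition pyttsx3 (plus PyAudio for microphone access)
    import speech_recognition as sr
    import pyttsx3

    recognizer = sr.Recognizer()
    voice = pyttsx3.init()

    def listen_once():
        """Capture one utterance from the microphone and return it as text."""
        with sr.Microphone() as source:
            recognizer.adjust_for_ambient_noise(source)
            audio = recognizer.listen(source)
        return recognizer.recognize_google(audio)  # cloud speech-to-text

    def respond(text):
        """Speak a reply back to the user."""
        voice.say(text)
        voice.runAndWait()

    try:
        command = listen_once()
        if "time" in command.lower():
            from datetime import datetime
            respond(f"It is {datetime.now().strftime('%H:%M')}")
        else:
            respond(f"You said: {command}")
    except sr.UnknownValueError:
        respond("Sorry, I didn't catch that.")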

Avatars

(No, not the blue aliens!) Face-to-face contact can make an interaction more personal and lively. An avatar is a graphical representation, whether static or animated, of a user, whether human or AI. Examples are the avatars in Second Life or World of Warcraft. Combined with quality text-to-speech, conversational AI, and animated faces that sync lip movement to the words, you can have a very realistic and personal conversation with a bot or computer system, especially if this is combined with generative AI to create photorealistic avatars.

Gestures, movement, and body pose

We interact with the world through bodily motion. We move, we act on our environment, physically. A computer system that can detect and interpret our movements or the orientation of our body in space can turn that movement into a natural means of control.
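As a flavour of how body pose becomes an input, here is a hedged Python sketch. It assumes a hypothetical skeleton-tracking source that delivers named joint positions for each frame (the kind of data a depth sensor such as the Kinect, discussed below, can provide) and turns a “right hand raised above the head” pose into a discrete command.

    from typing import Dict, Tuple

    Joint = Tuple[float, float, float]  # (x, y, z) in metres, y pointing up

    def detect_hand_raised(joints: Dict[str, Joint], margin: float = 0.10) -> bool:
        """Fire when the right hand is clearly above the head."""
        hand_y = joints["hand_right"][1]
        head_y = joints["head"][1]
        return hand_y > head_y + margin

    # One example frame (in reality these would stream in many times per second
    # from the sensor's body-tracking software).
    frame = {
        "head": (0.0, 1.60, 2.0),
        "hand_right": (0.25, 1.85, 1.9),
    }

    if detect_hand_raised(frame):
        print("Gesture recognized: advance to the next slide")

The pattern is always the same: continuous joint coordinates come in, and the software reduces them to discrete, meaningful gesture events that an application can act on.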

Facial expression and emotion

So much of the communication between people is non-verbal. Facial expressions and body language add subtle emphasis and convey emotion and meaning non-verbally. Computers that only pay attention to the literal words we say and write miss out on a lot of what’s going on with us. Computer sensors combined with machine learning software are starting to be able to recognize the outer signs of our body language and facial expressions and infer our emotional states. With this information, a computer system can adjust its approach to us. There is a whole field, affective computing, devoted to this.

Haptics

Haptics (touch-based interfaces) are yet another element that makes interfacing more natural, by simulating the textures, force feedback, and resistance you would get when interacting with real objects.

VR and AR

Virtual reality is another example of a natural user interface. The person interacts with a virtual world through head and body movements, receiving visual feedback via a head-mounted display. The technology goes back some decades, but it is becoming more affordable and feasible now. An example of a mass-market product is the Oculus Rift by Oculus VR (in the news at the time of the original writing for having been acquired by Facebook).

Another example is augmented reality, as in Microsoft’s HoloLens. Here, important contextual information is projected within the user’s field of view to give continuously present information.

Combinations

NUIs can also be combinations of these different types of technology. For example, the combination of speech and body/hand gestures was used in the Microsoft Kinect sensor. Microsoft opened this sensor up with free APIs and an SDK for developing NUI-enabled Windows software using the Kinect for Windows v2 sensor. The original Kinect was sold as an optional peripheral for the Xbox 360, and the updated Kinect v2 was bundled with the Xbox One gaming and home entertainment console.

http://www.youtube.com/watch?v=Hi5kMNfgDS4

This particular device featured a color camera plus an infrared depth sensor, giving it machine vision with depth perception. Software in the device could make out limbs, hand gestures, limb and finger movements, face movements, facial expressions, even the pulse of the user, and use these as inputs for control. Multiple microphones were present for noise cancellation and for recognizing the directionality of sound. There was software on board for voice recognition and facial recognition. The user could use this to control a game by voice inputs and by moving their body and hands.

This represented a more natural way to interact and brought to life some of the models of human-computer interaction foreseen by science fiction. It is not hard to foresee possible applications to training, especially with the APIs of the device open to commercial and research development. The following links and the video below give some sense of what has been done with this sensor.

More recently, in 2020, another iteration of this technology, the Azure Kinect DK (Developer Kit), was released, offering similar functionality to the Kinect v2 sensor and adding accelerometer and gyroscope sensors (to detect motion and orientation) in an overall more compact unit. The newer model also integrates more naturally with Microsoft Azure services:

http://openkinect.org/wiki/Main_Page

http://www.microsoft.com/en-us/kinectforwindows/

http://createdigitalmotion.com/2013/10/microsoft-embraces-open-creative-coding-new-kinect-openframeworks-cinder-integration/

http://blogs.msdn.com/b/kinectforwindows/archive/2013/08/13/turn-any-surface-into-a-touch-screen-with-ubi-interactive-and-kinect-for-windows.aspx

http://www.youtube.com/watch?v=Iu2XH5p_hMM

The Xbox One with Kinect represented probably the hardest push of its time for mass adoption of Natural User Interface technology in the home. The related Kinect for Windows sensor allowed games and software to be written that used the device to control a computer.

http://www.microsoft.com/en-us/kinectforwindows/develop/

Another potential route forward might come in the form of the iPad or another tablet some generations down the road, if and when it becomes possible to put something similar to the Kinect’s sensors in a tablet. The tablet would make a sophisticated control device for the TV, with the tablet mirroring to the TV screen. This hypothetical future tablet could watch you through twin cameras, read your eye movements and facial expressions, or detect hand-gesture inputs. The microphone inputs, combined with cloud services, could read speech queries or commands from you. The touchscreen would detect button presses and finger or stylus drawing inputs. The accelerometer and gyro would recognize whether you’re sitting or standing and in what orientation you’re holding the tablet. You could then hold the tablet in different orientations in space as a control surface or workspace. The problem with the Kinect sensor is that it watches from farther back, so it can’t pick up as much nuance and detail as a closer camera could. Cameras in a tablet could do that.

I wouldn’t be surprised to see Apple do something like this, getting everyone used to this method of interaction, and then hitting with the long-predicted Apple TV, integrating something like a Kinect sensor with multiple slick layers of Natural User Interfaces built in. Bang and bang. It could have a big impact.

Learning and Training Applications

All of this promises to really shake up how we interact with computers. And since interaction is such a key element of computer based training, this has implications for us as designers of instruction.

There are a number of foreseeable learning and training applications for this sort of technology. To name just a few examples:

  • Speech recognition and text-to-speech could be useful for language learning.
  • Gesture-based controls could enable more lifelike interaction with 3D models, especially when using stereoscopic 3D displays. This could potentially be used for a variety of applications in technical training:
      • Manipulating and examining virtual equipment in maintenance training.
      • Learning the structure of machinery by virtual manipulation of 3D models, including assembly and disassembly. Haptic feedback could even simulate the sensation of touching and working with the actual equipment.
      • In biochemistry, manipulating 3D models of large molecules like proteins to understand their structure and active sites, or to visualize the steps of biological reactions.
  • Virtual reality could be used to simulate the operation of certain complex equipment, including running through rare or emergency scenarios.
  • For soft skills, imagine the immersiveness of a training program where you interact with a 3D character in a scenario using simply your body language and your speech. The realism is greatly heightened. Or imagine a training program that can give feedback on your body language, verbal tics like filler words, and your facial expressions while you give a simulated presentation or sales pitch (see the sketch after this list).
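As a small taste of that last idea, here is a tiny Python sketch of one ingredient of such feedback: counting filler words in a transcript of a practice presentation. The transcript itself would come from speech recognition, and the list of filler terms here is just an illustrative assumption.

    import re
    from collections import Counter

    # Illustrative, not exhaustive: a real program would make this configurable.
    FILLERS = {"um", "uh", "er", "like", "basically", "actually"}

    def filler_report(transcript: str) -> Counter:
        """Count occurrences of common filler words in a spoken transcript."""
        words = re.findall(r"[a-z']+", transcript.lower())
        counts = Counter(w for w in words if w in FILLERS)
        two_word = transcript.lower().count("you know")  # handle a two-word filler
        if two_word:
            counts["you know"] = two_word
        return counts

    sample = "So, um, basically our Q3 numbers were, like, you know, actually pretty strong."
    for filler, n in filler_report(sample).items():
        print(f"{filler!r}: {n}")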


Cyril Anderson

Technical writer in SW industry. Montréalais. Interests: Writing/teaching/storytelling, Data sci/ML/AI, math, gardening, spirituality, running, film/TV.