An intelligent digital assistant is a software service that offers some of the abilities of a traditional human assistant, most notably answering questions and performing tasks, using voice and natural language processing (NLP) backed by artificial intelligence (AI). It may be coupled with a specialized hardware device, such as a smart speaker, or simply be a feature offered on a general purpose computing device such as a personal computer, tablet, smartphone, or wearable computer (such as a digital wristwatch).
Examples include Amazon Alexa/Echo, Apple Siri, Google Assistant, and Microsoft Cortana.
A previous paper detailed the traditional role of human assistants: What is an Assistant?
This informal paper will briefly explore the nature and capabilities of intelligent digital assistants.
Note that the concept of an intelligent digital assistant is alternatively referred to by terms such as:
- Digital assistant
- Intelligent personal assistant
- Intelligent virtual assistant
- Personal digital assistant
- Virtual assistant
- Virtual digital assistant
- Voice-enabled digital assistant
Technically, a digital assistant does not need to use voice or even natural language, but in the context of this paper, the term digital assistant will be used as shorthand for intelligent digital assistant, and such an assistant is presumed to be voice-enabled with natural language processing.
What is the purpose of a digital assistant? As Google puts it:
- Find info and get things done
Those are the twin purposes:
- Request information.
- Perform tasks.
The job of any good assistant, machine or human.
Key distinguishing features
Not that traditional digital devices and Internet services didn’t already serve those same purposes, but now, the new devices and services focus on voice-enabled natural language interaction:
- Voice input.
- Natural language processing (NLP).
- Voice output.
Two other distinguishing qualities are that execution of a request can be based not only on the raw input request or command from the user, but also on:
- Personal data of the user.
- Past history of usage by the user.
That’s where machine learning can come into play.
And, as with most devices and services, personal preferences of the user will be taken into account.
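To make that a little more concrete, here is a minimal sketch in Python of how a raw request might be combined with personal data, usage history, and preferences. Everything in it is hypothetical and for illustration only, including the names `UserProfile` and `resolve_request` and the sample data; it is not how any of the actual products work, and a simple frequency count stands in for real machine learning.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical container for what the assistant knows about a user."""
    personal_data: dict = field(default_factory=dict)  # e.g. home city
    preferences: dict = field(default_factory=dict)    # e.g. temperature units
    history: list = field(default_factory=list)        # past requests, oldest first


def resolve_request(request: str, profile: UserProfile) -> str:
    """Turn a raw request into a concrete action using the user's own data."""
    text = request.lower().strip()
    profile.history.append(text)  # every request becomes part of the usage history

    if "weather" in text:
        # Personal data and preferences fill in what the request left unsaid.
        city = profile.personal_data.get("home_city", "your current location")
        units = profile.preferences.get("units", "celsius")
        return f"Fetch weather for {city} in {units}"

    if text == "play music":
        # Stand-in for learning from usage: pick the genre requested most often.
        prefix = "play "
        genres = [h[len(prefix):] for h in profile.history
                  if h.startswith(prefix) and h != "play music"]
        if genres:
            favorite, _count = Counter(genres).most_common(1)[0]
            return f"Play {favorite}"
        return "Play popular music"

    # Anything unrecognized falls back to a plain lookup of the raw request.
    return f"Search the web for: {request}"
```

The same spoken words, "play music", produce different actions for different users, which is the difference between a personalized response and a purely canned one.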
This generic list of features of digital assistants is not intended to be absolutely comprehensive, but should be fairly representative:
- Voice-enabled, voice control, voice interaction, voice queries.
- Natural language interaction. Commands. Results.
- Find information. Weather. Traffic. News.
- Answer questions. Digital encyclopedia.
- Make recommendations.
- Perform simple actions around the home, controlling devices. Home automation.
- Media control. Selecting content, controlling volume. Music. Audio. Video. Movies. TV shows.
- Make and take phone calls.
- Send and receive messages.
- Chat. Converse with the machine.
- Foreign language translation.
- Dictionary lookup.
- Managing to-do lists.
- Setting alarms, timers, reminders, and alerts.
- Ordering take out for delivery.
- Concierge functions. Reservations. Tickets. Services.
- Access specialized Internet services. Open-ended, modules developed by third parties.
- Proactive. Perform tasks or provide information without being explicitly asked. To only a limited extent today.
- Support for multiple users on a single device. For example, Google Assistant Voice Match. Him vs. her.
- Personalization. Adaptation. Responses and actions take the user’s data (personal data, preferences, usage history) into account, rather than purely canned responses.
How intelligent are they?
The nature of intelligence, especially in the context of artificial intelligence (AI), is a fairly complex topic, which is explored in much greater depth in my paper Untangling the Definitions of Artificial Intelligence, Machine Intelligence, and Machine Learning.
For the purposes of this paper, the following points can be made about intelligence and intelligent digital assistants:
- Weak AI is the current state of the art.
- Some aspects of human-level intelligence are employed, but only in a very limited sense.
- Strong AI or General Intelligence is well beyond the current state of the art.
- Human-level intelligence in any general sense won’t be available any time soon.
- A fair amount of intelligence in digital computing is simply rules, patterns, and heuristics rather than any deep understanding of the deep, human meaning of concepts.
- Natural language processing (NLP), the ability to analyze voice input, parse natural language requests, and synthesize voice output, is the primary reason current, voice-activated digital assistants are referred to as intelligent.
- Beyond the NLP, most of the functions performed by these digital assistants can be performed by a non-AI computer interface and web-based services.
- Few of the functions performed by these digital assistants require AI per se.
- Some amount of machine learning is employed in the services performed in the cloud, but only in terms of weak AI, not strong AI. The machine is recognizing simple patterns, but not concepts or what they mean in any deep, human sense.
- The voice match feature is interesting, but once again either solely a matter of heuristics or weak AI rather than human-level intelligence. Even dogs can recognize voices.
- It’s not clear if any of the current crop of intelligent digital assistants recognize and respond to tone of voice, something that even dogs can do.
In short, the current crop of intelligent digital assistants exhibits some significant qualities normally associated with intelligence, and can even seem human-like, but only in a fairly minimal and superficial sense. It is certainly better than nothing, but just a start rather than anywhere near the finish line.
The Big Four
Although there have been digital assistants in the past and there are smaller and niche players, the Big Four of the current wave of products include:
- Amazon Alexa/Echo
- Apple Siri
- Microsoft Cortana
- Google Assistant
Samsung Bixby is a new entrant in the market.
It is beyond the scope of this paper to delve into specific product features or recommendations for such products.
The Wikipedia pages for the Big Four:
- Amazon Alexa/Echo — Echo devices
- Apple Siri
- Microsoft Cortana
- Google Assistant — Google Home smart speakers
The company web pages for the Big Four products/services:
Connected intelligence, Internet-enabled
A key aspect of the design of this latest wave of digital assistants is that they are services running on servers in the cloud, where most of the AI capabilities are in the cloud, with the connected device seen and used by the user simply serving as an input and output device.
Privacy, security, and personal data
Since these digital assistants are online and all relevant user data is online, there are some significant privacy, security, ownership, and ethical issues. This paper won’t delve into this important topic deeply, but simply note some of the top concerns:
- Who actually owns the user’s data and records of all requests and actions made by the user?
- What exactly can and can’t the digital assistant vendors do with any of that user data?
- Can the vendor give any third-parties access to that user data?
- How secure is that user data, really? Says who?
- Are user interactions with digital assistants vulnerable to man-in-the-middle attacks or to malware installed on the user’s device?
- How often are the security and privacy of user data audited, and by what technical means?
- What level of technical skill might be sufficient to hack into user data?
- Might government, foreign government, or intelligence services possess the technical skills and means to hack user data?
- What assurances does a user have that vendor staff could not theoretically hack user data as an inside job? For financial gain or to pursue a social or political agenda.
- Can any of that user data be sold?
- Does the user have any way to get access to all of the data on them?
- Can a user move their data, including complete usage history, to another vendor or a different type of device?
- Does the user have any way to scrub or delete some or all of the data on them?
- Is there a retention policy for user data?
- What rights does the user retain or forfeit with regard to court orders to access their data? Both criminal and civil.
- How vigorously will vendors defend the rights of the user in the face of court orders? Says who?
- In what legal jurisdiction(s) does the user data reside? Servers and data centers.
- Does the user have any control or ability to select a jurisdiction? Especially with regard to court orders and actions of law enforcement in those jurisdictions.
- Might the user data be kept in more than one legal jurisdiction? Multiple copies or distributed between servers in different data centers.
- Is location data given the same protection as interaction data?
- Can a user shield their location even if their interaction data is accessed, such as through a court order?
- Can a parent or legal guardian get access to user data of children or relatives?
- Can a user allow another user to access their data?
- Can users share data?
Software and hardware
The software for the various digital assistants is capable of running on a wide range of hardware platforms:
- Desktop computers
- Laptop computers
- Tablet computers
- Smart wristwatches
- Wearable computers
- Smart speakers
- Smart TVs
- Smart appliances, smart kitchen appliances
As mentioned earlier, the real intelligence is off in the cloud, with the user’s device or computer used only to communicate with the cloud-based services.
Smart speakers are all the rage right now, with Amazon Echo, Google Home, and soon Apple HomePod.
It’s a bit of a misnomer to say that the speakers themselves are smart, since the actual speakers are simply output devices and the real smarts are driven by the microphones included in the same physical box as the speakers.
The microphones pick up your voice and send it off to servers in the cloud to be processed by the actual AI algorithms, before sending audio back to the actual speakers for you to listen to the result.
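The round trip described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the class names `CloudAssistantService` and `SmartSpeaker` are made up, the "cloud" is just an object in the same program rather than a real network service, and the "audio" is simply text encoded as bytes so that the example stays self-contained.

```python
class CloudAssistantService:
    """Hypothetical cloud side: where recognition and language processing happen."""

    def speech_to_text(self, audio: bytes) -> str:
        # Stand-in for real speech recognition: the "audio" is just encoded text.
        return audio.decode("utf-8")

    def understand(self, text: str) -> str:
        # Stand-in for natural language processing and answer lookup.
        if "time" in text.lower():
            return "It is noon."
        return "Sorry, I don't know that."

    def text_to_speech(self, text: str) -> bytes:
        # Stand-in for voice synthesis.
        return text.encode("utf-8")

    def handle(self, audio: bytes) -> bytes:
        # Voice input -> natural language processing -> voice output.
        return self.text_to_speech(self.understand(self.speech_to_text(audio)))


class SmartSpeaker:
    """Hypothetical device side: microphone in, speaker out, nothing else."""

    def __init__(self, cloud: CloudAssistantService):
        self.cloud = cloud

    def listen_and_respond(self, microphone_audio: bytes) -> bytes:
        # The device does no processing of its own; it forwards and plays back.
        return self.cloud.handle(microphone_audio)
```

Note that `SmartSpeaker` contains no logic about questions or answers at all, which is the sense in which the box in your living room is not the smart part.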
As with any new and evolving technology, the terminology around intelligent digital assistants is fluid and still unsettled.
All of the following terms are roughly equivalent to intelligent digital assistant, or at least used as if equivalent despite nuanced differences:
- AI assistant
- AI digital workforce platform
- AI voice assistant
- AI-powered virtual agent
- AI-powered voice assistant
- Artificial intelligence voice assistant
- Artificial-intelligence assistant
- Artificially intelligent assistant
- Connected assistant
- Connected intelligent assistant
- Digital agent
- Digital assistant
- Digital virtual assistant
- Digital voice assistant
- Intelligent assistant
- Intelligent digital assistant
- Intelligent personal assistant
- Intelligent virtual assistant
- Personal AI assistant
- Personal assistant
- Personal assistant voice apps
- Personal digital assistant
- Smart assistant
- Smart digital assistant
- Virtual assistance
- Virtual assistant
- Virtual customer assistant
- Virtual digital assistant
- Virtual personal assistant
- Voice AI capabilities
- Voice AI–capable device
- Voice assistant
- Voice-enabled digital assistant
- Voice-powered digital assistant
Not all bots, chatbots, socialbots, or digital or virtual assistants are necessarily voice-activated or use voice response. They may use text.
Not all bots or socialbots recognize natural language. They may simply act in a way that mimics human behavior using a variety of heuristics such as recognizing keywords that are significant for the particular subject matter domain which the bot is designed for.
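A keyword-driven bot of that kind can be sketched very briefly in Python. The pizza-shop domain, the keyword table, and the function name `keyword_bot` are all invented for this illustration; the point is only that a plausible-looking reply can be produced without any parsing or understanding of the sentence.

```python
import re

# Hypothetical keyword table for a bot designed for a pizza-ordering domain.
KEYWORD_RESPONSES = {
    "hours": "We are open from 11 am to 10 pm.",
    "menu": "We serve pizza, salad, and soft drinks.",
    "delivery": "Delivery usually takes about 30 minutes.",
}


def keyword_bot(message: str) -> str:
    """Reply based on the first known keyword found; no language understanding."""
    words = re.findall(r"[a-z']+", message.lower())
    for word in words:
        if word in KEYWORD_RESPONSES:
            return KEYWORD_RESPONSES[word]
    return "Sorry, I can only help with hours, menu, and delivery."
```

Because the bot only spots keywords, a message such as "I hate your menu" gets the same cheerful menu reply as a real question about the menu, which is exactly the limitation of heuristics without natural language processing.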
Also see the online customer service section.
Some other terms that might sometimes be used to refer to digital assistants:
- Digital agent
- Intelligent agent
- Software agent
What is the proper term?
Alas, there is no single, widely acknowledged proper term for the products and services covered by this paper. To wit, here are the common characterizations of the Big Four products:
- Apple Siri — intelligent personal assistant.
- Amazon Alexa/Echo — intelligent personal assistant.
- Google Assistant — virtual personal assistant.
- Microsoft Cortana — virtual assistant.
Those are the terms used in the respective Wikipedia articles for those products and services.
Given how fluid and unsettled the use of the terminology is, this paper arbitrarily settled on the use of the term intelligent digital assistant, or digital assistant for convenience and conciseness when the context is reasonably clear.
Personal digital assistant
The term personal digital assistant, or PDA, seems like a natural candidate for these new devices and services, but the term is already taken, or at least was taken, as exemplified by the classic PalmPilot PDA device that was so popular back in the late 1990s and early 2000s, until smartphones with similar capabilities eclipsed the handheld personal information management market.
Its primary function was contact management with names, phone numbers, addresses, and notes. A vest-pocket Rolodex and notebook, to be used in conjunction with a non-smart cell phone. No question/answer or task capabilities. Actually, there were a variety of apps, games, and the like that could be downloaded to the device, but nothing like a voice or natural language interface for those functions.
Maybe with time the term will be reclaimed as a synonym for intelligent digital assistant.
In fact, the current web page for Microsoft Cortana uses the term at one point:
- Cortana is your truly personal digital assistant.
Although on a support page for Cortana they use the term digital agent:
- Cortana is your digital agent.
Thus illustrating how fluid and unsettled the terminology is for this new product/service category.
Tasks vs. goals
The current crop of digital assistants is quite amazing, but still quite limited. Despite their AI features, they still can’t compete with many of the qualities of an old-fashioned human assistant.
In particular, digital assistants are task-oriented rather than goal-oriented.
As discussed in the preceding paper, What is an Assistant?, tasks are relatively simple operations that may require a lot of effort, but generally do not require much in the way of complex reasoning, judgment, careful decision, and planning, while goals are more complex collections of tasks that require some significant level of complex reasoning, judgment, careful decision, and planning.
Granted, as that paper pointed out, much of the work of many assistants really is simply task-oriented, but more specialized or capable assistants are capable of goal-oriented work.
The AI in the current wave of digital assistants has barely enough capability to parse basic natural language and recognize an interesting but rather limited set of patterns of meaning, well short of the more complex meaning of the more advanced capabilities of human assistants.
A task is specified by detailing the operations to be performed. How to achieve the objective.
A goal is specified by stating the objective to be achieved. The objective itself rather than the details of how to achieve the objective. In fact, and generally, the specific tasks needed to achieve a goal might not be known in detail in advance and only become apparent as work towards the goal progresses.
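The distinction can be illustrated with a small Python sketch. It is a toy under stated assumptions: the `Task` and `Goal` classes, the `pursue_goal` function, and the thermostat scenario are all hypothetical, and real goal-oriented work involves far more reasoning than a one-line rule. Still, it shows the structural difference: a task carries its list of operations with it, while a goal carries only the objective and a way to decide what to do next.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """A task spells out the operations to perform, in order."""
    operations: List[str]


@dataclass
class Goal:
    """A goal states only the objective; the steps are worked out along the way."""
    objective: str
    is_achieved: Callable[[int], bool]
    next_operation: Callable[[int], str]


def run_task(task: Task) -> List[str]:
    # Nothing to decide: just carry out the listed operations.
    return list(task.operations)


def pursue_goal(goal: Goal, temperature: int, max_steps: int = 20) -> List[str]:
    """Choose one operation at a time based on the current state."""
    performed = []
    for _ in range(max_steps):
        if goal.is_achieved(temperature):
            break
        operation = goal.next_operation(temperature)
        performed.append(operation)
        # Hypothetical effect of each operation on the room temperature.
        temperature += 1 if operation == "heat" else -1
    return performed


comfortable = Goal(
    objective="Keep the room at 20 degrees",
    is_achieved=lambda t: t == 20,
    next_operation=lambda t: "heat" if t < 20 else "cool",
)
```

Calling `pursue_goal(comfortable, 18)` performs two "heat" operations, while starting from 21 performs a single "cool"; the operations were not known in advance and depend on the situation encountered.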
Current digital assistants are generally performing a single operation at a time. Google do this. Alexa do that. One question or task at a time.
Tasks generally don’t require much deep thought, just slogging through the work.
Goals tend to require deeper, more careful, and more insightful thought. And planning.
Current digital assistants can handle relatively simple tasks, but not more complex tasks or complex reasoning.
As discussed in my AI paper Untangling the Definitions of Artificial Intelligence, Machine Intelligence, and Machine Learning, these digital assistants offer weak AI, but are well short of strong AI.
Current digital assistants have only limited capabilities at best for being proactive, doing things for you without being explicitly asked. Reminders and alerts, and learning from personal data and usage, are about the best they can currently muster.
That said, future iterations of digital assistants are likely to become much more proactive, even to the point of providing us with information and services before we are even consciously aware that we might want or need them.
But that’s the future, not the present.
Online customer service
Many websites now feature some level of online customer service. Sometimes this is simply online chat with a real, human customer service representative, but more and more website operators are using AI-based chat using natural language.
This is a close cousin to the technology utilized by the kind of intelligent digital assistants covered by this paper, but websites are focused more on commercial customer service types of questions and tasks rather than consumer-oriented questions and tasks.
That said, a website chat may offer more and deeper insight into narrow niches of your online life than one of the general purpose intelligent digital assistants.
Plugin modules for websites and services
The intelligent digital assistant vendors are currently offering support for developers so that websites can in theory develop plugin modules that would permit the general purpose intelligent digital assistants to have more access to more aspects of the online services that users utilize.
That’s not common today, but will likely become more common as adoption of intelligent digital assistants grows, not unlike the fact that many websites also offer apps for smartphones.
Even before the advent of driverless vehicles, cars in recent years have incorporated quite a few smart features that automate actions that previously had to be done manually by the human driver and that involve some degree of sensing and judgment by the vehicle itself. Whether or not these features constitute intelligence per se is a matter of debate, but at a minimum they do assist the driver, so in a very real sense they can be considered a digital assistant. It’s no great stretch to consider them an intelligent digital assistant, especially when these features perform the kind of proactive tasks that even home digital assistants cannot currently muster.
Driverless vehicles are too new and unproven to draw any strong conclusions about yet. In fact, part of the problem is that they still are relatively dumb, limited, and more focused on heuristics and other weak AI capabilities rather than anything even remotely resembling strong AI.
But in coming years, AI, smart car features, and driverless vehicles will each be evolving so that it is not that big a stretch to consider cars of the future to be intelligent digital assistants. After all, personal transportation is a personal service, traditionally performed by a human assistant called a driver.
One nit is that the term virtual assistant is ambiguous. In addition to referring to one of these new voice-activated digital assistants, the term also refers to a human assistant who works remotely, such as from home or for a third-party contractor. That latter usage is common for job listings.
The one question a digital assistant can’t answer
None of the Big Four, or any other connected assistant, can answer the following question:
- Why can’t I connect to the Internet?
Why not? Because answering any question requires a network connection. No connection, no answer.
The long history of digital assistants is interesting but beyond the scope of this paper.
Wikipedia has some background on personal digital assistants.
Future directions for digital assistants
For starters, the full range of capabilities of traditional human assistants is great fodder for the future of digital assistants.
A previous paper, What is an Assistant?, details many of the capabilities of traditional human assistants.
In addition, there are likely to be a wide range of enhancements that are based on the unique capabilities of digital computing which are very different from human capabilities.
Still, it is likely to be quite some time before digital assistants can surpass human assistants. But since so many consumers are unable to hire an army of human assistants, enhanced digital assistants are a promising future even at only a small fraction of human-level capabilities.
A key gating factor for the evolution of intelligent digital assistants will be the pace of advances in AI itself, which is discussed at much greater length in my paper Untangling the Definitions of Artificial Intelligence, Machine Intelligence, and Machine Learning.
Human in the loop
One prospect not exploited in the current crop of digital assistants is the ability to integrate a human into the loop: not the user, but a third party, such as an expert or company representative, who can add value from their human intellect and subject matter expertise that current digital assistant technology can’t quite muster at this stage.
Put simply, the digital assistant would do the vast bulk of the easy tasks, falling back to human intervention only for the harder tasks.
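That fallback pattern might look something like the following Python sketch. It is hypothetical throughout: the `HybridAssistant` class, its table of known answers, and the simple list used as a queue for human experts are assumptions made for illustration, not a description of any existing product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HybridAssistant:
    """Hypothetical assistant that escalates to a human expert when unsure."""
    known_answers: Dict[str, str]
    human_queue: List[str] = field(default_factory=list)

    def handle(self, request: str) -> str:
        # Normalize case and trailing punctuation before looking up the request.
        key = request.lower().strip(" ?!.")
        if key in self.known_answers:
            # Easy case: the machine answers on its own.
            return self.known_answers[key]
        # Hard case: hand the request to a human and tell the user so.
        self.human_queue.append(request)
        return "Let me pass that to a human expert."
```

In this sketch the human only ever sees the requests that the machine could not handle, so the bulk of the easy work never reaches a person.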
Crowdsourcing is another way of putting people in the loop, lots of them rather than one, to answer more complex or subjective or current questions that a simple lookup or real-time data reference can’t answer. Or to perform tasks.
To the best of my knowledge, there are no digital assistants on the market using crowdsourcing to respond to questions.
There are a variety of crowdsourcing services on the Internet for tasks, but to the best of my knowledge none are integrated with the top intelligent digital assistants at this time. It’s probably only a matter of time before they start springing up.
There is a skill module that can be added to Amazon Alexa to invoke TaskRabbit, but the integration seems a bit primitive.
What is needed is for each of the major digital assistants to have generic features for crowdsourcing tasks that don’t require user knowledge of specific task services. Crowdsource the crowdsourcing.
Even further, the user should simply be able to state the nature of their task or problem and the digital assistant should be able to deduce what task is needed. Like:
- Alexa, my faucet is leaking. [a plumber or the building maintenance guy is needed]
- Siri, my job sucks. [career or work counseling may be needed]
- Google, my head hurts and my vision is blurry. [maybe medical referral or even 911]
Crowdsourcing is generally very open-ended and unlimited — anybody anywhere can participate, but for some activities there may be a desire or even an advantage to restrict participation to a smaller or more select group.
Call it group crowdsourcing.
It might be a group of friends or relatives. Or experts in some area. Or members of an organization. Or a selected demographic group. Or a local community, or possibly nearby communities. Any imaginable subset of everybody everywhere.
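As a rough sketch of the idea in Python, restricting the crowd amounts to filtering the pool of potential participants by group membership before a question or task is sent out. The `Participant` class, the group labels, and the `select_crowd` function are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Participant:
    name: str
    groups: FrozenSet[str]  # hypothetical group labels, e.g. "family", "plumbers"


def select_crowd(everyone: Iterable[Participant],
                 allowed_groups: Optional[Set[str]] = None) -> List[Participant]:
    """Return all participants, or only those in at least one allowed group."""
    if allowed_groups is None:
        return list(everyone)  # ordinary, wide-open crowdsourcing
    # Keep a participant if they share at least one group with the allowed set.
    return [p for p in everyone if p.groups & allowed_groups]
```

Passing no groups gives ordinary crowdsourcing, while passing, say, `{"plumbers"}` narrows the crowd to a select group.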
Granted, selectivity at least partially defeats the whole point of wide-open crowdsourcing — that you never really know where the real and most valuable expertise is located, but for some types of tasks that may be an acceptable choice.
The current voice-activated consumer digital assistants don’t incorporate video in their operation, but that is likely to change in the coming years. They will eventually sense motion and activity in the room, and eventually be able to recognize objects, people, and pets and incorporate that information into their actions.
Smart cars and driverless vehicles already have video capabilities.
One promising avenue for the future is specialized digital assistants, therapeutic assistants, to help people with mental and behavioral problems, to guide them towards more useful thinking and behavior, and also to monitor them and alert their mental health professionals or guardians to any problematic symptoms. And also to permit mental health professionals to guide the actions of the digital assistant as well.
Granted, this area has lots of ethical issues, a virtual minefield. Still, it does have real potential, to actually help people lead better lives.
A more wide-ranging but more ethically challenging application would be for a plug-in module for everyday digital assistants to detect when an otherwise normal user might be exhibiting mental or behavioral symptoms which should be brought to the attention of a mental health professional. In this scenario there would not be any explicit action by the user or some health authority to enable such monitoring, although the setup for the digital assistant might have a simple opt-out or opt-in configuration setting to monitor for potential health concerns.
Parents of at-risk children could more readily decide to explicitly enable such monitoring for their children, or for other family members or relatives who they worry might be at risk.
Some related reading
- Cognizant report: The Imminent Age of Intelligent Digital Assistants.
- Ovum report: Virtual digital assistants to overtake world population by 2021.
- Tractica report: Virtual Digital Assistants.
- Wikipedia article on Automated personal assistant.
- Wikipedia article on Virtual assistant (artificial intelligence).
For more of my writings on artificial intelligence, see List of My Artificial Intelligence (AI) Papers.