AI and the Future of Operating Systems

At we are building the future of AI interactions, and we spend a lot of time discussing what an “AI-first” operating system would look like, so we wanted to share some internal notes on what we think is coming next. We also shed some light on what we are working on.

What is the next paradigm shift in OSes?

The current generation of OSes and apps was all about translating the interaction vocabulary to a smaller device and a touch screen. The next generation of OSes will dominate the next 20 years of computing. It will be all about transforming apps and interactions to leverage the AI capabilities of the device and the user’s data to plan actions and understand intent.

Why now?

Time spent working on desktop computers has peaked, and the upgrade cycle of software and hardware is slowing as it offers minimal marginal gains in terms of productivity.

Mobile devices are not about mobility; they are about convenience. They are usually the last computing device used at night and the first in the morning, often for performing work in bed. At the office they are used even when the main desktop is within reach. Because of this, time spent on desktop and laptop devices has peaked and will never grow again, while time spent on mobile devices keeps increasing. This is a major driver for the adoption of new technologies.

There is a need for better productivity tools on mobile. Today it’s not possible to perform all work operations on mobile devices, and as a result work is postponed until the workers are back at their desks.

There is also a correlation between screen size and personal productivity on desktop. People usually have multiple applications open at the same time so data from one app can be copied onto another while reading streams of communications data.

In the transition to a mobile device, the display surface is reduced from a >20” screen to <5”, and the input mechanism goes from a full keyboard and a mouse/trackpad to a thumb.

It is because of these form factor limitations that the leading productivity apps are mostly single purpose. The most successful ones focus on solving one task in the simplest way possible, as opposed to offering an extensive set of features.

As a result, the average phone user has a large number of single-purpose apps: email, algorithmic emails, calendars, organizing meetings, instant messaging, travel, and accommodation apps. Furthermore, different apps are used in different contexts, so a mobile device might have multiple instant messaging apps: one for family, one for friends, one for work and so on.

This duplicates functions that could be provided by the OS and clutters a notification system that has no understanding of the data it is processing.

At we believe there is a better way.

The Evolution of Operating Systems

Every 20 years there is a paradigm shift in OSes where the previous computing layer is abstracted and a new form of computing is adopted that gives leverage to operations.

First it was the command line (1960s)
Then it was replaced by Windowing systems (1980s)
Then it migrated to an App economy (2000s)

The transition between paradigms always happens in phases: think of the advent of the GUI, going from DOS + Windows to Windows 95, where the command line interface was no longer the primary method of interaction. Or consider that iOS started as a fork of OS X. A new OS will probably be a fork of Android (which itself was derived from Linux).

AI-first OS

The rapid evolution of artificial intelligence technologies over the past 5 years, coupled with the push to increase productivity on mobile devices, creates the groundwork for a transition into a new computing paradigm.

Conceptually it’s a contextual card system. Previous attempts like Windows Metro and Google Now are notable examples, and it is our belief that Windows Metro was closer to what could have been a true next-generation OS.

(the future looks a lot like Metro)

Metro was an empty UI: the cards were simple representations of existing folders and did not provide any advantage over a more traditional display, yet they had the drawback of changing the user’s workflow.

Additionally there were no predictive or contextual APIs or developer tools that took advantage of the new design.

In short it was an interesting design concept but there was not enough artificial intelligence to deliver a next-gen experience.

Enter WeaveOS is developing some of the components of an AI-first OS. It is a challenging goal that requires multiple steps, but the key advancements in technology that define a next-gen OS are:

  • Contextual cards and disappearing apps
  • Extensible semantic framework
  • User modeling
  • AI grammar

Contextual cards and disappearing apps

The main interaction with a new OS would be a card system where each card can represent a micro-format with dynamic data, contextual intention and a visualization of said data.

The cards would be ranked 1) by importance to the user and 2) by likely intent, based on the historical data patterns of a specific context.
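As a rough illustration of this two-part ranking, here is a minimal sketch in Python. The Card fields, the equal 50/50 weighting and the sample history are all assumptions made for the example, not a description of any actual system.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """A contextual card: dynamic data plus the intent it serves (hypothetical shape)."""
    title: str
    intent: str        # e.g. "attend_meeting", "commute"
    importance: float  # 0..1, how much the user cares about this item
    data: dict = field(default_factory=dict)


def rank_cards(cards, intent_history, context):
    """Order cards by importance blended with how often the card's intent
    was observed in this context in the past."""
    past = intent_history.get(context, {})
    total = sum(past.values()) or 1  # avoid division by zero for unseen contexts

    def score(card):
        intent_likelihood = past.get(card.intent, 0) / total
        # Equal weighting is an arbitrary choice for this sketch.
        return 0.5 * card.importance + 0.5 * intent_likelihood

    return sorted(cards, key=score, reverse=True)


cards = [
    Card("Your next meeting", "attend_meeting", importance=0.9),
    Card("Train home", "commute", importance=0.4),
    Card("Unread newsletter", "read", importance=0.2),
]
# Counts of intents previously observed per context (made-up numbers).
history = {"weekday_evening": {"commute": 8, "read": 2}}

ranked = rank_cards(cards, history, "weekday_evening")
print([c.title for c in ranked])
```

In this made-up evening context the commute card overtakes the more "important" meeting card, because the historical pattern says commuting is what the user usually does at that time.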

The shift here is going to be the disappearance of apps in the traditional sense, so that a card can be “your next meeting” and include the people, places, and documents related to it. Based on the intention of the user, the card can then invoke a transit module (roughly equivalent to today’s Citymapper app) to navigate to the destination, or a communications module to chat with the meeting participants and automatically alert them of a possible delay (again pulled automatically from the Citymapper API). These modules are a new kind of interface between activities, carrying context and intent across your device’s features.

The key difference is that the data of the card is held by the device and that a hypothetical transportation app is a UI visualization of that data with some specialized API processing.
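To make the split between device-held data and module views concrete, here is a small Python sketch. The card fields, module names and intent labels are hypothetical; the point is only that modules receive the card and never own its data.

```python
from typing import Callable, Dict

# Device-held card data: the single source of truth (hypothetical shape).
meeting_card = {
    "type": "meeting",
    "title": "Design review",
    "location": "12 Example Street",
    "participants": ["Ana", "Ben"],
}

# A module is a function from card data to a view; it does not own the data.
Module = Callable[[dict], str]


def transit_module(card: dict) -> str:
    return f"Route to {card['location']}"


def communications_module(card: dict) -> str:
    names = ", ".join(card["participants"])
    return f"Message {names}: running late for '{card['title']}'"


# Mapping from user intent to the module that serves it (assumed labels).
MODULES: Dict[str, Module] = {
    "navigate": transit_module,
    "notify_delay": communications_module,
}


def invoke(card: dict, intent: str) -> str:
    """Pick the module that serves the user's intent and hand it the card."""
    return MODULES[intent](card)


print(invoke(meeting_card, "navigate"))
print(invoke(meeting_card, "notify_delay"))
```

Because both modules read the same card, the context (who, where, which meeting) travels with the data rather than being re-entered in each app.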

This implies that apps as we know them are bound to disappear and the full product experience will happen inside a card or its notification (again, the boundary between the two becomes fuzzy as some products could be experienced as notifications especially as the screen size is further reduced by AR and smartwatches).

Key takeaway: the card, not the app, becomes the destination because the card is tied to an intention.

Extensible semantic framework

Navigation apps know what a location is, email apps know what files and contacts are, and so on. The knowledge representation of semantic data today is app-specific, and only a few dimensions are described in the OS (usually time and space). In an AI-first OS the model is turned on its head: the OS has an internal knowledge representation and apps derive their data structures from it.

This enables different apps to communicate natively (right now your app can call an Uber only because its developer manually added the Uber API or it uses a 3rd party like Button) and also has the side effect of enabling an OS to invoke multiple apps to work on a problem.

Taking this further enables the user to express desires like “going out for dinner” and the OS to create a plan where different apps or modules satisfy each step of the plan.

Furthermore, since the required functionalities are defined by the user’s intentions as expressed in the framework’s language, the OS could automatically look online in a “feature store” for a suitable functionality if there were none on the device.
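One way to picture this planning step is the following Python sketch. The capability names, the plan for “going out for dinner” and the feature store catalogue are invented for illustration; a real system would derive the plan from the user’s intent rather than look it up in a table.

```python
# Capabilities each installed module declares (hypothetical names).
INSTALLED = {
    "find_restaurant": "restaurant_guide",
    "get_directions": "transit",
}

# Stand-in for an online "feature store" catalogue.
FEATURE_STORE = {
    "book_table": "table_booking",
    "order_taxi": "ride_hailing",
}

# A desire expands into an ordered list of required capabilities.
PLANS = {
    "going out for dinner": ["find_restaurant", "book_table", "get_directions"],
}


def plan(desire):
    """Return (capability, module, source) for each step of the plan."""
    steps = []
    for capability in PLANS[desire]:
        if capability in INSTALLED:
            steps.append((capability, INSTALLED[capability], "installed"))
        elif capability in FEATURE_STORE:
            # Nothing on the device can do this, so fall back to the store.
            steps.append((capability, FEATURE_STORE[capability], "feature store"))
        else:
            steps.append((capability, None, "unavailable"))
    return steps


for step in plan("going out for dinner"):
    print(step)
```

Here the table booking step is not covered by anything on the device, so the sketch resolves it from the feature store while the other two steps use installed modules.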

Google Now is not able to understand what an object is unless Google adds a card about it specifically in its system (like for cricket or basketball). To avoid pitfalls like this the system needs to have an extensible semantic framework in which the OS has a number of primitives that can be used by app developers to describe new objects and expand the interaction language. Semantically describing the world is a task bound to fail if it rests with the OS creator as the edge cases are almost infinite.
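A minimal sketch of such an extensible framework, in Python, might look like the following. The set of primitives and the SemanticRegistry class are assumptions for illustration: apps describe new objects using OS primitives or previously registered types, and the OS can then reason about objects it was never explicitly taught.

```python
# Primitives the OS understands natively (an assumed, minimal set).
PRIMITIVES = {"text", "time", "place", "person", "number"}


class SemanticRegistry:
    """Lets apps describe new object types in terms of OS primitives
    or previously registered types."""

    def __init__(self):
        self.types = {}

    def register(self, name, fields):
        # Validate every field before storing, so a bad type is never half-registered.
        for field_name, field_type in fields.items():
            if field_type not in PRIMITIVES and field_type not in self.types:
                raise ValueError(f"{name}.{field_name}: unknown type {field_type!r}")
        self.types[name] = dict(fields)

    def fields_of_type(self, name, wanted):
        """Which fields of an object carry a given type? This lets the OS
        answer 'where' or 'when' for objects defined by third parties."""
        return [f for f, t in self.types[name].items() if t == wanted]


registry = SemanticRegistry()
# A sports app teaches the OS what a cricket match is.
registry.register("team", {"name": "text"})
registry.register("cricket_match", {
    "home": "team", "away": "team", "starts_at": "time", "venue": "place",
})

print(registry.fields_of_type("cricket_match", "place"))
print(registry.fields_of_type("cricket_match", "time"))
```

With this, the OS can put a cricket match on a map or a calendar without its creator having ever written a cricket card.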

Key takeaway: the OS understands all the data that it is manipulating and passes that knowledge to the apps.

User modeling

The other big trend that is already happening is the move to an OS that is modeled with the user’s data. To some extent there was always a level of customization in OSes, whether scripts or color preferences, but the next step is the system reading the user’s data, understanding it, classifying it, correlating it and then responding on the basis of it.

User modeling is at the core of a set of predictive APIs that need to be available to developers, such as “what next” (which itself is a container for multiple possible options, depending on whether the next action is travel, communications, reading etc). The new Google Awareness API is an example of this.

Furthermore, this generic API would allow user feedback across all activities, since the OS could explain why it’s making a recommendation and the user could highlight or reject some of the criteria. This would turn what is today a passive activity (receiving a recommendation) into interactive query building, bound to be much more relevant to the user’s needs.
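The following Python sketch shows one possible shape for a “what next” call that exposes its reasons and accepts rejected criteria. The function name, the criteria strings and the scores are hypothetical and are not taken from the Google Awareness API or any other real API.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    action: str
    reasons: dict  # criterion -> contribution to the score


def what_next(candidates, rejected=frozenset()):
    """Pick the highest-scoring action, ignoring criteria the user rejected.
    `candidates` maps each action to its per-criterion scores."""
    best = None
    best_score = float("-inf")
    for action, criteria in candidates.items():
        kept = {c: s for c, s in criteria.items() if c not in rejected}
        score = sum(kept.values())
        if score > best_score:
            best, best_score = Suggestion(action, kept), score
    return best


# Made-up scores for two possible next actions.
candidates = {
    "travel": {"calendar: meeting in 30 min": 0.6, "usual routine": 0.1},
    "reading": {"usual routine": 0.5, "unread articles": 0.1},
}

first = what_next(candidates)
print(first.action, first.reasons)

# The user says the calendar entry is stale, so that criterion is dropped.
second = what_next(candidates, rejected={"calendar: meeting in 30 min"})
print(second.action, second.reasons)
```

Returning the reasons alongside the action is what makes the recommendation interactive: the user can see which criterion drove it and reject that criterion to reshape the result.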

Key takeaway: the OS will be able to understand the user and their uniqueness, and be aware of the user’s current activities.

AI Grammar

All the previous key technologies described here coalesce in what we at believe is going to be the litmus test of a next generation AI-first OS: an agnostic AI grammar.

By AI grammar we mean a set of operators like What, When, and Where that can be applied to data.

Some of this grammar is already part of our daily life: every time we search on Google we are performing “What” + our search query. With the introduction of mobile devices and calendars, When and Where can be answered too.

A lot of progress is being made in extracting information and making plans so that How can be answered.

But all these operators are performed by a black box where the user has to accept the results of the AI. While we do not think in terms of Why when issuing commands to an AI, the need for it becomes apparent when a query fails and we are unable to tell the system Why Not. With an architecture where users can explain themselves to the system, a failed query can be turned into training data.
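As a toy illustration of operators applied to data, here is a Python sketch in which What, Where and When are composable filters and Why Not records the user’s explanation of a failed query. The event records, dates and function names are invented for the example, and storing the explanation in a list is only a stand-in for whatever training pipeline a real system would use.

```python
from datetime import datetime

# Made-up personal data the operators are applied to.
events = [
    {"what": "dinner", "when": datetime(2024, 5, 3, 19, 30), "where": "Soho"},
    {"what": "meeting", "when": datetime(2024, 5, 3, 10, 0), "where": "Office"},
]


# Each operator is a filter over the same data, so they compose freely.
def what(items, value):
    return [i for i in items if i["what"] == value]


def where(items, place):
    return [i for i in items if i["where"] == place]


def when(items, start, end):
    return [i for i in items if start <= i["when"] < end]


training_data = []


def why_not(query, result, reason):
    """Record the user's explanation of a failed query as a labelled example."""
    training_data.append({"query": query, "result": result, "reason": reason})


evening = when(events, datetime(2024, 5, 3, 17, 0), datetime(2024, 5, 4, 0, 0))
result = where(what(evening, "dinner"), "Camden")  # empty: no dinner in Camden
if not result:
    why_not("dinner in Camden tonight", result, "the booking was moved to Soho")

print(evening)
print(training_data)
```

The failed query is not discarded: the explanation travels with it, which is the piece that today’s black-box assistants have no channel for.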

This is necessary when an app incorrectly anticipates what the user wants.

But it’s the combination of all these technologies that will deliver a truly transformative next-gen experience and give users a device that can understand their desires and help fulfill them.

This grammar doesn’t need to be verbal. It can be mapped onto an NLP system or it can be baked into a contextual card system. Once the system can handle a grammar for communication, it will be to some extent secondary how information is surfaced to the user.

Key takeaway: the OS needs to have a symbolic AI grammar, and “Why” is the operator that will unlock communicating with AIs.

What is the role of

At we are developing WeaveOS, a suite of artificial intelligence technologies that can power a next generation operating system. We have years of expertise in research, AI architecture and user-centered design. If this sounds like something you would be interested in please get in touch.

PS. I wrote this post with the help of my fellow co-founders Stéphane Bura and Mikkel Birkegaard Andersen. Thanks to Brian ‘Psychochild’ Green, David Fauchier, Peter Rood and Francesca Woodhouse for reviewing an early draft.