“Ubiquitous Firefox” Revisited (2012)
Building a zoomable Web operating system.
This post is part of a series of experiments in redesigning the Web browser. This design, from 2012, evolved from the previous ones, and so it answers some of the same questions, but goes even further: How do we solve the ‘too many tabs’ problem? How do we design a zoomable Web browser? How do we build an entire operating system around the Web?
Last year, I presented my Ubiquitous Firefox concept for redesigning the browser. (Don’t worry! Reading that is not necessary for this discussion.) The discussion proved insightful and thought-provoking. Towards the end, we discussed a number of interesting modifications. Since that time, those ideas have been slowly developing in my mind, and I would like now to revisit the issue. I will first list the ideas that we will keep from the previous discussion, the lessons we learned, and the principles that will inform our design. Then I shall present a rough sketch of my new concept: a true Web operating system. There are still a number of unanswered questions, and I hope the community can help me answer them and improve the concept, yielding something truly revolutionary.
Lessons & Principles
Administrative debris is bad, content is good. Administrative debris is anything that is there for the purpose of administrating your computer (buttons, toolbars, indicators, and so on — what is called ‘chrome’), rather than content (the stuff you actually care about). The interface should have as little debris as possible. Instead of using buttons and chrome to manipulate the content on the screen indirectly, we should try to design the interface so that direct manipulation of the content is the primary way of using it. Similarly, instead of popping up dialogues that appear out of nowhere and presenting information in ways that do not fit in with your mental model of what’s on the screen, information should preferably be presented inline within the way content is presented. The way to do this is with good information design. This lesson is especially important with modern touch interfaces, where screen real estate is precious and where we can finally have real direct manipulation — with our fingers!
The fewer the mental models and metaphors, the better. Modern computers have a dizzying number of different concepts that we must grasp. Just within Firefox, how many different ways are there of issuing a command? Menu bars, toolbars, context menus, and all of the above duplicated within web apps — that’s quite a lot of different ways of telling your computer to ‘do this’! Mozilla Labs’ Ubiquity attempted to unify all these ways into one mental model, and its model may yet succeed. What about the number of separate locations for different representations of pages in Firefox? The tab bar, tab history, bookmarks (including bookmarks toolbar, bookmarks menu, and unsorted bookmarks), history menu, browsing history, location bar — isn’t it too much? Let’s reduce the number of mental models we need to internalize.
The location bar still has to be replaced. Despite sensationalist headlines, I still think the location bar needs to go. It is administrative debris that is forced in your face at all times. Worse, it doesn’t fit in well with any mental model of how pages are displayed (yes, even with tabs-on-top). (Those interested can read in detail my arguments against the location bar’s standard design.) It should be replaced by two mechanisms: (1) displaying the page title/address/metadata inline, attached to the page, where it makes sense; and (2) using a wholly separate command mechanism for telling your device where you want to go, which would not take up much screen real estate, if any at all.
Design with an upgrade path for Ubiquity in mind. Ubiquity is still brilliant and has more potential than any other mechanism for executing commands I’ve ever seen. Nonetheless, it’s not ready yet. I’ve, therefore, made sure my design made sense with or without Ubiquity, leaving a clear path for incorporating it in the future.
Unrelated pages should be separate; spawned pages should be together. Say you’re browsing in a standard browser, and you have 3 random tabs open: ABC. You then open three links from tab A in new tabs. Now your tab bar looks like this: ADEFBC. You continue this way throughout the day and you find that your tab bar is a jumble of pages with no simple way of finding the page you want or seeing what’s in front of you. This is the problem of tab proliferation. In my previous concept, we learned a general solution to this problem: when you open pages from another page, these pages should be grouped together; when you open a new page from scratch, it should be separate from other pages. In other words, A should be grouped with D, E, and F, separate from both B and C.
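The grouping rule above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Canvas` class and its `new_page` and `open_from` methods are hypothetical names, not part of any existing browser API.

```python
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """Hypothetical model: a canvas holds groups of related pages."""
    groups: list = field(default_factory=list)

    def new_page(self, page: str) -> None:
        # A page opened from scratch starts its own group.
        self.groups.append([page])

    def open_from(self, parent: str, page: str) -> None:
        # A page spawned from another page joins its parent's group.
        for group in self.groups:
            if parent in group:
                group.append(page)
                return
        raise KeyError(f"unknown parent page: {parent}")


canvas = Canvas()
for name in "ABC":          # three unrelated pages
    canvas.new_page(name)
for name in "DEF":          # three links opened from A
    canvas.open_from("A", name)

print(canvas.groups)  # [['A', 'D', 'E', 'F'], ['B'], ['C']]
```

Compare this with the flat tab bar, where the same actions produce the single jumbled sequence ADEFBC.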
The Back/Forward system is broken. Similar to tab proliferation, there is a lot of redundancy between each tab’s back/forward history and the tab bar. When you follow a link, sometimes you stay within the current tab, adding a page to the tab’s history, and sometimes you open a new tab, adding a page to the tab bar. Conversely, the previous page in the tab history is sometimes one that was viewed in the same tab and is sometimes from a different tab altogether. Essentially, when you spawn new tabs, the tab bar replaces your tab history. Can’t we unify these two ways of browsing?
Open vs. Open In New Tab is not monotonous. Related to the previous issue is the choice we must make every time we click on a link: should I open it here or in a new tab? This choice creates a small delay every time you click on a link. Over time, these delays add up. They also contribute to a mental burden that builds up over time, especially every time you realize you should have opened a link in a new tab, so you must go back and lose even more time undoing that mistake. Worse, for those who are not comfortable with opening links in new tabs, the benefits of this form of browsing are out of reach. What if we removed this “choice” and optimized the interface for one form of browsing? Then the interface would be more monotonous (in a good way): you don’t have to think about using it — you just use it.
Note: The very first browser, WorldWideWeb (later renamed ‘Nexus’), actually didn’t have the previous two issues. Instead of relying on a Back button, clicking on a link created a new window. Of course, such a system quickly leads to too many windows, which is why the Back button was later created in the first place. An alternate solution to that problem, however, could have involved better window/document management.
The History Scroller is probably too much. My previous solution to the issues outlined in the previous paragraphs involved showing all spawned pages within the same tab, merged into the tab history. Although this seemingly solved the problems, it created a new widget that must be learned. This would have contributed to the proliferation of mental models: the tab bar (or Panorama) for organizing tabs, and the History Scroller for organizing tab histories. This actually isn’t very different from today, where we have the Back/Forward system in addition to the tab bar. On top of that, the tab history model fits in even less with Panorama (tab groups): it looks like you’re arranging all your pages on a flat plane but, in fact, each page you see comes with many others hidden. What if we truly flattened this hierarchy into one consistent interface?
Panorama rocks. In my previous concept, Panorama (formerly Tab Candy) was more of an after-thought, but it should not have been. Panorama is a simplified ZUI (Zooming User Interface). It has a lot of potential (nearly infinite!), but it currently has many limitations. For one, there are only two zoom levels, limiting how much you can put on screen. If you could view the canvas at any arbitrary magnification, though, you could place objects at different levels, at different relative sizes, and also place an arbitrary number of pages there. You could even use it to replace bookmarks! Secondly, Panorama competes with the tab bar as a way of organizing pages. There should be only one interface, not two.
If you haven’t done so before, I highly recommend trying out Panorama prior to moving on to the next section.
The browser is the operating system. Mozilla is already working on Firefox OS, which allows devices to boot directly into the Web, without the increasingly-unnecessary baggage of foreign applications and the operating-system environment. Firefox should be the operating system. Although I had this in mind all along, we can now focus on creating a design that will truly make this work. My goal is nothing short of creating a concept so compelling that people will want it to replace their operating systems.
Design for touchscreens first. In order to make this concept compelling as an interface for an operating system for the foreseeable future, it will be designed with touchscreens, and especially tablets, in mind as the primary environment. I will also, however, have desktops and smaller devices in mind, making sure the interface works well on most types of devices.
Keeping all of the above in mind, I will first present a design for the environment of Firefox-the-Operating-System. Then, I will present the details of how tab proliferation and related issues are addressed in this environment.
I. Panorama Enhanced
The above mockup represents a direction in which Firefox could evolve were its Panorama feature used as the basis for an entire Web-based environment (operating system interface). We see here a tablet displaying the home screen. You see pages arranged spatially at various locations and various sizes. Unlike Panorama, this is a true Zooming User Interface (ZUI): you may zoom in or out from here as much or as little as you want, and you may place objects at any zoom level. (Compare the original design that eventually led to Panorama, or a much earlier ZUI demo, all of which were created by Aza Raskin.) Once you zoom in enough for a page to take up the entire screen, you may use it like any other Web page. The interface includes the following elements:
- Command button. The interface has virtually no administrative debris — it is chromeless. This is one of the few buttons in the entire interface. Its purpose is to bring up commands or actions for you to run. In most software, commands are exposed in different ways, each using a different conceptual model: menu bars, buttons, drop-down menus, and right-click menus. Worse, all these actions are placed in different spots all over the screen, forcing you to hunt and peck for the command you want. With the Command button, however, all commands get added here in a unified manner. In the beginning, it would work similarly to the current Firefox button. It should, over time, be evolved to invoke a Ubiquity-like command system. These commands would work on any page you like (unlike today, where commands are stuck within each application). People should be able to add or remove commands here as they please. If you hold down the button, you can use your own voice to command the system. For example: select some text, hold down the button, say ‘highlight this’, let go, and your selected text is now highlighted. Ideally, this would be a hardware button somewhere on the device, but here is where it would go otherwise.
- Semantic zooming. Consider how many different representations objects can have in most interfaces. First, you have the actual content of that object, viewed full-screen or in a window. Besides that, you also have icons of various sizes to represent that object. Sometimes you have thumbnails instead of icons. Besides that, in some environments, you have “widgets” — smaller versions of objects, larger and more interactive than an icon but smaller than the full thing. On top of all that, these different representations tend to work within very different interfaces. This is all an artifact of ancient computers, which could not handle displaying anything more than icons when working with multiple objects. Why not have only one interface and one type of representation — the content itself? Content should represent itself.
How do we do that? In the zooming environment, objects may appear at any arbitrary size. To see more detail, zoom in. To see less detail and more objects, zoom out. But simply making pages smaller is not always the best way to show them at smaller sizes. Instead, pages change their representation automatically based on zoom level. Think about zooming in and out in Google Maps (another ZUI): small-scale details appear when zooming in, but they are replaced by large-scale details when zooming out. In the mockup, Gmail has changed its appearance based on its size on the screen. It is still interactive. This is semantic zooming. A standard will need to be drafted to allow any site to take advantage of this feature.
- Continuing the above line of reasoning, if a page appears even smaller, it is reduced to an icon, optionally with a badge (such as the unread counts on the Google Reader and Twitter examples in the mockup). This allows you to place all your pages at the ideal sizes for you to get the best overview of information.
- System area. In this environment, besides the two buttons, everything else is content — even this system area. Everything is a page. (Compare this to the Unix philosophy of ‘everything is a file’.) If you zoom in on the Wi-Fi icon, you get a full page with an interface for managing your wireless connection, thanks to semantic zooming. The same goes for the other objects in the system area. In fact, if you wished, you could replace the standard system objects with alternatives of your own choosing, because these are merely pages. The pages all appear inside a page group (similar to a tab group in Panorama). The background for this group is dark by default to indicate that it’s the system area, but the background could be changed for any group as well.
- Find button. The Find button is the only other button in the entire interface, besides the Command button. Its purpose is to take you to the phrase you seek within open pages. It searches every page in view, highlights the results, and takes you to the next one. This is like Ctrl+F in Firefox, except that it can quickly get you anywhere in the entire environment, not only within one page. Its function is so crucial that it deserves its own button, rather than appearing as another command in the list of commands. Like the Command button, holding down this button allows you to speak your search, allowing you to get to it as quickly as possible. For example: hold down the button, say ‘bunnies’, let go, and your screen jumps to the next occurrence of ‘bunnies’. This would also ideally be a hardware button.
- Stacks. This feature was proudly inspired by webOS’s stacks (and anticipates “Eel”, and bears a striking resemblance to Elastic Windows), but improves upon that feature greatly. First, pages can be stacked on top of each other. The most recently-viewed page in the stack is visible on top. Pages may be stacked together manually, but that is not the normal way stacks are created. Whenever you tap on a link, it opens as a new page behind the current page, creating a stack if there isn’t one already. Later, I will explain how stacks work in more detail. It is this simple feature that will end up being the solution for tab proliferation, along with the other problems I described earlier.
- Inline downloads. Since everything is a page, and this environment is your entire operating system, all objects go here. So, when you want to download a file, it merely appears as another page. Until the file is fully downloaded, a progress indicator appears. In the mockup, a video is being downloaded, which may also be played right there. In fact, every page that is open is saved to the device. You thus never have to worry about saving anything or downloading it to make it available offline. You never have to download a file manually so it’s available outside your browser. If it’s open, it’s saved/downloaded; and you could just move it elsewhere in the environment or organize it if you wish (even while it’s still downloading). There is no distinction between opening and downloading. Once you delete a page, however, the system is free to overwrite its space in the future as necessary.
- New page. The way to create a new page is to touch any free spot in the environment, at any zoom level you like. Since there is no location bar for entering URLs and page titles, this is the only way to open a new page. When you tap on an area (indicated by the blue circle in the mockup), a New Page icon appears at that spot to let you know that a new page will be created at that spot. When you let go, a new page is created at that spot and is immediately zoomed in, showing you something similar to Mobile Firefox’s Awesome Screen, which will allow you to open anything you desire right there. Since this is the only way to create a brand new page, it should be harder to lose track of where you put that page. Finally, together with stacks, we have a real division between related pages and unrelated pages: spawned pages are always appended to the same stack, keeping related pages together, and new pages are always created alone in a new area, keeping unrelated pages separate.
- All your stuff. Since everything is a page, and since this is the entire environment, all your files should appear here. Since much of your stuff is online, those files should also appear here. Thus, all your files, whether on the local drive or in various online services, automatically show up here. In the mockup, we see a page group that automatically shows all your documents in various locations. The page icons there indicate their origin. These pages may be rearranged or moved around in the environment, like any other page. Moreover, since all pages are downloaded (because opening = downloading), all these documents are automatically available offline.
I propose that a new standard be created to allow one to install services so that all their objects are automatically synchronized with your system and made available to you. Outside this environment, such as in other browsers, these objects would show up as smart bookmark folders, automatically managed for you. With this system, your browser could give you instant access to any object, such as documents, images, messages, conversations, contacts — anything. And they would all appear as pages, since everything is a page.
- This is another page group. This one was manually created. Unlike in Panorama, pages are not automatically rearranged when pages are added or removed from a group. Objects never move on their own. Maintaining this rule reinforces your spatial memory. You know that you can put your stuff anywhere you like and it will always appear exactly where you left it. You can, however, set a group to sort automatically for you, just like you can have a group that automatically shows certain types of pages. Also unlike in Panorama, groups must be created manually. The interface would still be completely usable if you never create any groups at all.
- History. If you zoom in here, you should see all the pages you’ve ever visited, including all deleted pages, arranged in some sort of logical way indicating historical order and stacks. Any deleted page may be visually located and restored from here. Since this is a system area, its background is dark.
To recap: how many separate metaphors and paradigms have we simplified into one? Instead of icons, widgets, tabs, and windows, we have semantically-zoomable pages. Instead of bookmarking, you simply leave a page around in the environment, and possibly move it to a better location (like with objects in the real world) and better zoom level (like you wish you could do with objects in the real world). Instead of downloading, we just open (and leave the new page around if we want to keep this “download”). Finally, instead of both the tab bar and individual tab histories, we have stacks. Pages are the sole metaphor.
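To make semantic zooming a little more concrete, here is a rough Python sketch of how a page might choose its own representation from its on-screen size. The function names and the pixel thresholds are illustrative assumptions of mine; no such standard exists yet, and the real breakpoints would be up to each site.

```python
def scaled_width(natural_width: float, zoom: float) -> float:
    # Zooming the canvas scales every page by the same factor.
    return natural_width * zoom


def representation(width_px: float, unread: int = 0) -> str:
    """Pick how a page draws itself at a given on-screen width.

    The thresholds below are made-up values for illustration only.
    """
    if width_px >= 600:
        return "full page"
    if width_px >= 150:
        # Smaller than full-screen, but still interactive.
        return "compact interactive view"
    # Smallest size: an icon, optionally with a badge (e.g. an unread count).
    return f"icon with badge {unread}" if unread else "icon"


for zoom in (1.0, 0.3, 0.05):
    print(zoom, representation(scaled_width(1024, zoom), unread=3))
```

The point of the sketch is that there is one object with one interface throughout: only the way the page draws itself changes as you zoom.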
II. Stacks: An Alternative to Tabs
Say you created a new page by tapping anywhere in your zoomable canvas, such as was done in the tablet mockup. What happens next? Everything zooms in on the area where the new page will be created, at the spot you chose…
- Create a page. You then get something like the Awesome Screen — a page designed to get you to where you want as fast as possible, including a spot to type in text. This page appears partially zoomed out so as to introduce you to the stack view. This is the zoom level where the current page and other pages in its stack will appear together, and where the interaction is designed for working within the stack of pages. Showing the stack view at this point will serve to reinforce the idea that you can always zoom out to this view. Once you’ve chosen your destination, the new page loads in place, replacing the Awesome Screen in its entirety (including the input area). When the page has loaded, the view is zoomed in even further, showing you the page and absolutely nothing else: 100% content.
- Open a link. When you click on a link, the view will zoom out slightly to the stack view and a new page appears next to the current one. In this environment, there is never any choice between opening in the current tab and opening in a new tab, because all new pages open as separate pages. Moreover, all new pages open in the background. This behaviour allows you to continue browsing your page while you wait for the new one to load, so you don’t get this annoying delay in between pages like you do in normal browsers. Moreover, the current page (the frontmost one in the stack) is still interactive in the stack view, so you can continue to use it and even to open up more pages from it. You could also zoom back in and concentrate only on the current page.
The stack view also brings us inline page info overlaid at the top of the page. This info always appears over the top page in the stack view, but nowhere else.
- Follow the link. At any point you wish, you can tap on the new page that the link opened, allowing you to follow that link as soon as you wish, without delay.
Note that the interaction here uses one of the few tap actions in the entire interface. In general, in touch interfaces, it makes more sense to use swiping rather than tapping. For one, tapping is too easy to do accidentally. More importantly, tapping usually means pressing some button, which is not direct manipulation. In order to achieve direct manipulation, the interface needs to let you move objects around, stretch them, throw them; these are generally actions that require forms of swiping. In this instance, tapping is allowed because we don’t want to force people to spend more time than necessary to follow a link; it needs to be as fast as it is in current interfaces. Moreover, tapping is acceptable here because it is being used for selecting objects, as that feels more like direct manipulation than tapping a button.
- Go “back”. In order to go back to the previous page, you use a long edge swipe from the left inwards. Edge swipes are important in our interface: when you swipe from the edge of the screen inwards, you are indicating that you want to drag into view something from beyond the visible area of the canvas. Swiping inwards from the left across the screen is an indication that you want to bring the previous page in the stack into view. You are literally pulling that page onto the screen, and this gesture will get animated as such (perhaps something like this). Similarly, a long edge swipe from the right moves you forward in the stack.
Note that this isn’t really going back and forward in the same way as you do within a tab history, nor is this the same as moving left or right along a tab bar. This is better: you’re moving back and forth within your history, but only within related pages (those that ultimately descend from the first page you created in the stack), and only within pages that you wanted to keep around. Moreover, the visual metaphor here is far more obvious than either having tab history hidden behind a Back button or even tabs on a tab bar.
A similar swipe from either of the top corners will bring up either the Command or Find interfaces at any time without the need to zoom out.
- Open a new link. If you tap on a new link now, it will open immediately to the right of the current page. It does not appear at the end of the stack, because you’ve already read that other page. This prevents new pages from appearing at the end of a thick stack, which would make it difficult to keep your train of thought. This is similar to how background tabs are arranged in most browsers: new tabs appear immediately to the right of the current one, not at the end of the tab bar, so you can immediately go to the tab you’re most likely to need.
- And another link. If you tap on yet another link, that page opens behind the other unread page. So, the page for Link 2 appears first, followed by Link 3, followed by all other pages. Thus, you can open up a bunch of links from a page and, when you’re ready to read them, you get to read them in order. Pages that were not opened from the current page or that were previously read will not get in the way. This is also similar to how background tabs are arranged in most browsers.
- Remove a page. Throw a page up from the bottom to remove it from the stack (similar to webOS). There should be some animation that fits within the visual metaphor (though I’m not sure what it should be). Undo will always be possible. In this instance, the next page in the stack is immediately zoomed in, because it was waiting unread. Thus, this is how you follow a link when you don’t want to keep the parent page around: tap on the link, swipe up from the bottom, and that’s it!
Although this seems like more work than simply clicking on a link, these two actions will quickly feel like one quick gesture. Unlike clicking on links, you never have to wait several seconds while a new page loads, during which the previous page becomes unresponsive unpredictably. Not only that, but you don’t have to waste time wondering if you should just click on a link or open it in a new tab: you just tap on the link and then, while that page is loading and not yet useful, you can decide if you want to keep the current page around. Unlike using middle-clicks or a contextual menu, this interaction should be obvious and simple to anyone using this system, both advanced and novice.
This gesture will also work as a long edge swipe from the bottom when a page is full-screen. This allows the following scenario: you have pages opened in a stack (in reading order, naturally); as you read a page, you keep swiping up to scroll down; when you get to the very bottom of the page, you continue swiping up; this final swipe pulls the page up even more, throwing it away; the next page in the stack slides in, full-screen, allowing you then to read that page. The same upward swipe thus allows you both to read individual pages and to throw them away in order to move to the next page. This means that reading through a stack of pages is all done through the same gesture.
- Zoom out to the stack. At any point, you can always zoom out from your page to see the stack of pages. Not only that, but you can always do a short edge swipe from any side in order to bring the context into view without switching pages. This additional gesture is necessary because the pinch gesture may sometimes be used by the current page, so you need a guaranteed system-wide way to zoom out from your page.
Note the reopen command embedded in the page info of the purple page. Since this page was spawned by the white page, which is now gone, this command appears here so you can always bring back a parent page in case you want to restore it. Tapping on this command will make the white page slide back into its previous location. Since this command always appears when needed, you need not worry about swiping away pages that you think you don’t need but might need in the future — you can always bring them back easily.
- Browse the stack. In the stack view, you can swipe back and forth in the stack to flip between pages. Thus, in order to flip through the pages to the right of the current one, you can do a short edge swipe from the right followed by a few more similar swipes until you find the page you’re looking for. This all works together to feel like one gesture.
- Zoom out to the overview. In order to see more than just the current stack, just zoom out (by pinching), as little or as much as you wish. Since this is a ZUI, zooming in and zooming out are always the primary way to navigate this space. No other metaphors are necessary.
Did you notice how we solved the tab proliferation problem? The solution here is conceptually identical to the one in my original Ubiquitous Firefox concept, but this one is far more discoverable and obvious (and hopefully more enjoyable). Moreover, this one has the advantage of only keeping around pages you care about. By always placing new pages in a fresh space, you never have unrelated pages grouped together. By presenting all spawned pages together in a logical order in one stack, that stack in essence becomes your reading list, ordered the way you naturally would want it. By merging tab history and the tab bar, we’ve made it a much simpler task to find your page within a stack of related pages. And this was all done by removing metaphors until we were left with just one: the page.
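The stack behaviour walked through in this section can be sketched as a small Python model. Everything here is hypothetical (the `Page` and `Stack` classes and all method names are mine), and `reopen_parent` simplifies the design by restoring a removed parent directly in front of its child rather than tracking its exact previous location.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare pages by identity, not by field values
class Page:
    title: str
    parent: "Page | None" = None
    read: bool = False


@dataclass
class Stack:
    pages: list = field(default_factory=list)
    current: int = 0

    def _focus(self, index: int) -> None:
        # Clamp to the ends of the stack and mark the focused page as read.
        self.current = max(0, min(index, len(self.pages) - 1))
        self.pages[self.current].read = True

    def open_link(self, title: str) -> Page:
        """Open a link from the current page as a background page."""
        here = self.pages[self.current]
        at = self.current + 1
        # Queue behind unread pages already spawned from the current page.
        while (at < len(self.pages)
               and self.pages[at].parent is here
               and not self.pages[at].read):
            at += 1
        page = Page(title, parent=here)
        self.pages.insert(at, page)
        return page

    def follow(self, page: Page) -> None:
        self._focus(self.pages.index(page))

    def back(self) -> None:
        self._focus(self.current - 1)

    def forward(self) -> None:
        self._focus(self.current + 1)

    def remove_current(self) -> None:
        """Throw the current page away; the next page takes its place."""
        del self.pages[self.current]
        if self.pages:
            self._focus(self.current)

    def reopen_parent(self, page: Page) -> None:
        """Restore a removed parent just before its child (a simplification)."""
        if page.parent is not None and page.parent not in self.pages:
            at = self.pages.index(page)
            self.pages.insert(at, page.parent)
            if at <= self.current:
                self.current += 1

    def titles(self) -> list:
        return [p.title for p in self.pages]


stack = Stack([Page("A", read=True)])
link1 = stack.open_link("Link 1")
stack.follow(link1)                 # read Link 1...
stack.back()                        # ...then return to A
link2 = stack.open_link("Link 2")
stack.open_link("Link 3")
print(stack.titles())               # ['A', 'Link 2', 'Link 3', 'Link 1']
stack.remove_current()              # throw A away
print(stack.pages[stack.current].title)  # Link 2
stack.reopen_parent(link2)          # the reopen command brings A back
print(stack.titles())               # ['A', 'Link 2', 'Link 3', 'Link 1']
```

Note how the unread links queue up in the order they were opened, ahead of the already-read Link 1, so the stack reads like a reading list.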
- How should the stack be visually designed? I had a lot of trouble trying to figure this out. It needs to fit within the general visual metaphor (a 2-dimensional infinite space with flat objects placed on it). It needs to be tight enough so that you can see many pages at a glance without needing to flip through the stack too much. It needs to work for very large stacks as well. And it should generally be fun and easy to work with, while also looking pretty in the overview. After all, if it looks too messy, people might have a strong urge to “clean” it up when they don’t really want to.
- Flipping a page upwards to throw it away may be fine, but is it consistent with the visual metaphor of this interface? After all, there is a physical space above the stack, so people might find it confusing that throwing a page upwards deletes it, rather than moving it to that area. Perhaps a good visual design could remedy such confusion. Even then, is it still consistent with the metaphor? After all, there’s a real “physical” location for old pages: the History area at the bottom-right corner, and that location is not at the top. So, how should deleting a page be visually presented, and is there a better gesture/metaphor? And how should deleting a page from the overview work?
- What about multitasking? Since this environment is meant to replace all others, we need the ability to show multiple pages on the screen at one time. I think some research is needed to figure out which use cases are actually necessary. My suspicion is that only two forms of multitasking are necessary to cover 99% of needs: side-by-side tiled windows, and a floating window in the corner. In any case, how should such a view be invoked? Is there some sort of relatively-logical gesture that could be used, or should it simply be left in the Command menu? We can leave this question for the future, but it’s something to remember as necessary for a real operating system.
- Are there any other problems? Any scenarios I’ve missed? Any comments on anything else?
Very little of all of this is genuinely new. Quite a lot of the concepts here are based on The Humane Interface (and most of this should be familiar to anyone who has read this book), together with design patterns found in many different digital environments (such as webOS). True zooming interfaces have been around in research form for a long time, but have never quite made it to the consumer space in any large form, outside specialized applications (such as Google Maps). I’ve spent several years thinking about how a ZUI browser would work (including one mockup), but I’ve never come across a truly satisfying answer. It was only once I finally got an idea of what the essential problems with browsers were and some possible solutions that I finally figured out how a zooming browser might look — one that has a real chance of succeeding. In such an environment, we no longer need the classic Desktop Metaphor, no windows, icons, menus, or pointers, no opening or saving, no downloading, no applications, and no other outdated counter-productive concepts. Nor do we need the metaphors that Web browsers have had for a long time: the Back button, bookmarks, and tabs. (Wouldn’t it be great if we could get rid of Reload too?) Finally, we don’t need chrome any more either. All we have is content. Everything is unified in this environment: the browser is the operating system. (And, unlike Firefox OS, there is no odd browser-app–within–browser-operating-system.)
Let’s make it work!
Afterword: Making It Work
Thanks to a fruitful and insightful discussion, I’d like to revisit the concept with some answers to the questions I asked at the end of my proposal, along with some new ideas to modify and complement the design. I will also discuss how we might implement this system, and what needs to be done now.
Stack Design. Instead of the physically-confusing way in which I originally placed cards in a stack, the stack should look more like a left-handed fan of cards, the topmost one at the left.
Additionally, the fan would be parted in the middle, with the currently-focused page sticking out. Thus, two pages are always fully visible: the first page and the focused page. Since the first page is often the page that originated the entire stack, and since that page is often a Web app (such as when you’ve opened many links from Gmail), having this page visible is useful.
How do we make sure the stack isn’t too cluttered? Semantic zooming. In the canvas view, the stack is compressed tightly all within the space of the original single page. As you zoom in, the stack magically fans out within that space, revealing more detail, as the view seamlessly transitions to stack view. Initially, when zooming in to stack view, you see all the pages in the stack. From there, you can continue to zoom in, until you have reached the point where the system seamlessly switches to page view.
Making stack view zoomable (like with canvas view and page view) has several benefits. First, large stacks become easier to browse. Instead of slowly flipping through each page until you find the one you want, you can just zoom out a little to get more of an overview of the entire stack, arranged in a way ideal for that zoom level. Imagine if zooming out from page view (or zooming in on a stack from the canvas) showed you the stack arranged as an easy-to-browse grid (kind of like zooming in on an album on the iPad). This need not be the exact visual design, but the point is the same: semantic zooming allows us to make the stack as easy to browse as the overview. With proper design, we don’t need to introduce a button or gesture in order to show a different view of the stack.
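To make the idea of semantic zooming a little more concrete, here is a minimal sketch in Python of how a single zoom value could select both the view and the arrangement of a stack, with no button or mode switch involved. The threshold values, the names `view_for_zoom` and `stack_layout`, and the rule for when the grid gives way to the fan are all assumptions of mine for illustration, not part of the design.

```python
from enum import Enum


class View(Enum):
    CANVAS = "canvas"
    STACK = "stack"
    PAGE = "page"


# Hypothetical thresholds. "zoom" is the fraction of the screen width that a
# single page would occupy at the current magnification.
STACK_THRESHOLD = 0.15  # below this, stacks stay compressed (canvas view)
GRID_THRESHOLD = 0.40   # below this, a large stack is shown as a grid
PAGE_THRESHOLD = 0.85   # at or above this, one page fills the screen
GRID_MIN_PAGES = 6      # smaller stacks never need the grid overview


def view_for_zoom(zoom: float) -> View:
    """Pick the view purely from the zoom level."""
    if zoom < STACK_THRESHOLD:
        return View.CANVAS
    if zoom < PAGE_THRESHOLD:
        return View.STACK
    return View.PAGE


def stack_layout(zoom: float, page_count: int) -> str:
    """Choose how a stack is drawn at this zoom level."""
    view = view_for_zoom(zoom)
    if view is View.CANVAS:
        return "compressed"  # the whole stack fits in one page's footprint
    if view is View.PAGE:
        return "single"      # only the focused page is shown
    # Inside stack view: far out, a large stack becomes an overview grid;
    # closer in, it is drawn as the fan of cards.
    if page_count >= GRID_MIN_PAGES and zoom < GRID_THRESHOLD:
        return "grid"
    return "fan"
```

With these made-up numbers, `stack_layout(0.2, 20)` picks the grid, while the same zoom level on a three-page stack keeps the fan, since a small stack is already easy to browse.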
A second benefit is what happens when you open a new page from a link. Instead of the display zooming out to show the entire stack, we zoom out only enough to show the new page’s placement within the stack. This ensures that the current page is given the maximum screen real-estate, leaving it usable even without zooming back in. Similarly, the currently-focused page in the stack should be made usable as soon as the stack view’s zoom level is deep enough such that we see the page and only a few nearby pages.
History Area. In order to solve the history area issue along with some others, we first need to redesign the system area. The system area should appear as a strip across the top of the canvas. All system objects should appear there. The history area should be placed in the middle of the system strip, rather than at the corner of the content area. The rest of the space below the system strip is 100% reserved for your content, organized as you wish, without limitation. This solves the page-deletion problem, because we can visualize throwing a page upwards as the history object sliding in from the top of the screen while the deleted page gets smaller and is made history, as it were. It should also be possible to throw a page (or stack or group) away from the overview (like on Android). The system strip should also include a command area by the Command button. If you zoom in on that area of the system strip, you can manage your commands from there. Search should probably also be moved there as another command.
This separation of content and system information allows us to present the environment as a clear physical hierarchy. The system strip and the content area are both subsets of everything that belongs to the user. Zooming out from here will show you other users along with other devices on the network. Perhaps zooming out from there will also show you other computers elsewhere.
Besides cleaning up the system area, we also introduce a new element: a status bar that appears at the top whenever we are not in page view. Although this kind of breaks the no–administrative-debris rule, this is a necessity due to the items within it: Command button at the top left (as before), and notifications at the right end of the bar. These notifications can come from any open page, both system pages (like the ones that display time, connection, and volume status) and regular pages (such as your messages). When you access one of these items by pulling it down from the top, the item should increase in size and slide down, appearing as a page overlaid on the screen. These pages can also be swiped away just like a page in a stack. Finally, notifications can temporarily appear on the status bar, sliding down the status bar into view if you’re in page view.
Multitasking. My hunch is that the main types of multitasking that we should concentrate on are side-by-side tiles and a small floating window in the corner, though research is needed. Each window is a separate viewport onto the environment. Theoretically, two people could use two viewports simultaneously, each viewing a separate user account. What we absolutely do not want is partially-overlapping windows (nor tabs).
There were several ideas about how we can facilitate multitasking. First, if you drag a page to any edge of the screen, you create a tile on that side displaying that page zoomed in. The other half of the screen will hold another tile displaying whatever zoom level you were at before this action. If you drag a page to a corner instead, you get a floating window in that corner.
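The edge-versus-corner rule above can be sketched as a small hit test. This is only an illustration in Python: the function name `drop_action`, the 40-pixel margin, and the returned labels are hypothetical, and a real implementation would need to tune the margin for each screen size.

```python
from typing import Optional, Tuple


def drop_action(x: float, y: float, width: float, height: float,
                margin: float = 40) -> Tuple[str, Optional[str]]:
    """Decide what dropping a dragged page at (x, y) should create.

    Coordinates are in pixels with the origin at the top-left corner.
    Returns ("tile", side), ("float", corner), or ("none", None).
    """
    if x <= margin:
        horizontal = "left"
    elif x >= width - margin:
        horizontal = "right"
    else:
        horizontal = None

    if y <= margin:
        vertical = "top"
    elif y >= height - margin:
        vertical = "bottom"
    else:
        vertical = None

    if horizontal and vertical:
        # Near two edges at once means a corner: make a floating window.
        return ("float", f"{vertical}-{horizontal}")
    if horizontal or vertical:
        # Near exactly one edge: tile the page against that side.
        return ("tile", horizontal or vertical)
    return ("none", None)
```

On a 1024 by 768 screen, dropping a page at (5, 300) gives a tile on the left, and dropping it at (1020, 760) gives a floating window in the bottom-right corner.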
Alternatively, we can take advantage of multitouch to allow you to create a “paradox” for the system: hold a page in place (at any zoom level) and then perform an action that would have moved the page. For example, tap and hold on the screen in page view with one finger and perform an edge swipe with another finger; the system must then interpret this gesture as displaying the page in one side of the screen while showing you stack view in the other side. Similarly, you can hold a page in place in stack view but then tap on another page. Or, you can tap on two pages at the same time.
Another possibility is to use a special gesture, such as dragging the screen from the centre outwards with both hands, as if you were ripping the screen into two pieces. This proposal, however, is the least discoverable. Alternatively, we could simply add a New Window command. Such a method would be much more discoverable, but would not follow the direct manipulation feel of the other proposals (nor would it be as much fun). Whatever method is chosen in the end, it should be discoverable, quick, consistent with how the system works (especially with direct manipulation), and, importantly, fun (so it’s not a mental burden whenever you think about multitasking).
Special Groups. As I mentioned briefly in the concept outline, you should be able to install online services such that all their objects automatically appear in your environment. For example, if you install the Google Drive service, all your Drive documents appear as pages automatically synchronized within a special group. We can take this idea further: what if we allowed online services to escape the boundaries of the page? Web apps could then manipulate their own groups to display their content in any way they like. So, for example, Flickr could give you not a page, but a whole group that displayed all your photos in a way optimized for photos, together with proper controls or whatever else is desired for that specific task. A music service, on the other hand, would organize your music in a wholly different manner. The system could take advantage of this functionality by showing you your history of pages in a manner optimized for viewing your history and restoring old pages. Meanwhile, the main unit of content (pages) from these services would continue to be exposed to the system, allowing you to search for these pages and to copy them elsewhere. Thus, the system is built with extensibility in mind, allowing you to customize it for different types of content beyond the base design of the ZUI.
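One way to picture this split between a service-controlled presentation and system-visible pages is the following Python sketch. Everything in it is hypothetical (the `Page` and `Group` classes, the `Layout` signature, and the `photo_grid` example); it only shows that a group can swap in its own layout while its pages remain ordinary objects that the system can search.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Page:
    title: str
    url: str


# A layout turns a list of pages into rows of titles (a made-up shape that
# stands in for whatever a real renderer would need).
Layout = Callable[[List[Page]], List[List[str]]]


def default_layout(pages: List[Page]) -> List[List[str]]:
    """The system's stock arrangement: one page per row."""
    return [[p.title] for p in pages]


def photo_grid(pages: List[Page], columns: int = 3) -> List[List[str]]:
    """An example service-supplied layout: rows of `columns` thumbnails."""
    titles = [p.title for p in pages]
    return [titles[i:i + columns] for i in range(0, len(titles), columns)]


@dataclass
class Group:
    name: str
    pages: List[Page] = field(default_factory=list)
    layout: Optional[Layout] = None  # set by an installed service, if any

    def render(self) -> List[List[str]]:
        # Fall back to the system layout when the service supplies none.
        return (self.layout or default_layout)(self.pages)


def search(groups: List[Group], term: str) -> List[Page]:
    """Pages stay searchable whatever layout their group uses."""
    term = term.lower()
    return [p for g in groups for p in g.pages if term in p.title.lower()]
```

The design choice here is that the layout is optional data on the group rather than a different kind of object, so copying a page out of a special group needs no special handling.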
The following is my best estimation for how this system should be implemented. Ultimately, though, implementation will be up to the programmers. (I hope my description makes sense to the programmers out there.)
At first, the system should be coded as an application, rather than solely as an operating system. Doing so will allow us to gain as much exposure as possible. The core of the application should be written such that it can also easily be loaded as an entire operating system environment (such as on top of Firefox OS’s kernel). The application should mostly be written as a Web app. It should be available as an Open Web App, so that it can be run anywhere Firefox runs: desktop operating systems, Android, and Firefox OS. Hopefully, with some extra code, the app should also run in iOS.
On the filesystem level, each page should automatically be saved either as a single file (perhaps .html), as a special folder (similar to .app), or as an archived folder (similar to OpenDocument). All metadata related to the page, such as location on the canvas and the page’s state, are saved along with it in the filesystem. Groups would simply appear as folders. If a page appears in more than one location, we use a hard link for each instance. Previous versions of each page are also stored, so that you do not lose content when refreshing a page. If the user already has files (such as when running this system as an app), they would appear as pages within the environment. Ultimately, it should be easy for anyone with more direct access to the filesystem to copy pages and to share them with other operating systems.
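As a rough illustration of the single-file option, here is a Python sketch in which groups are folders and a page that appears in two groups is one file with two hard links. The helper names, the `page-meta` comment used to carry metadata, and the folder names are all my own assumptions. I use the single-file format here because most filesystems do not allow hard links to directories, so the folder-based formats would need a different mechanism. Storing previous versions is left out of this sketch.

```python
import json
import os
import tempfile
from pathlib import Path


def save_page(path: Path, html: str, metadata: dict) -> None:
    """Write a page as one file, with its metadata in a leading comment.

    The "page-meta" comment is a made-up convention for this sketch.
    """
    header = "<!-- page-meta: " + json.dumps(metadata) + " -->\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Opening for writing truncates the existing file in place, so every
    # hard link to it sees the new content.
    path.write_text(header + html, encoding="utf-8")


def add_to_group(page: Path, group: Path) -> Path:
    """Make the same page appear in another group (folder) via a hard link."""
    group.mkdir(parents=True, exist_ok=True)
    link = group / page.name
    os.link(page, link)
    return link


def demo() -> dict:
    """Save a page, link it into a second group, then refresh it."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        page = base / "Reading" / "article.html"
        save_page(page, "<h1>Hello</h1>", {"x": 120, "y": 40})
        link = add_to_group(page, base / "Favourites")
        save_page(page, "<h1>Hello again</h1>", {"x": 120, "y": 40})
        return {
            "same_file": os.path.samefile(page, link),
            "link_count": page.stat().st_nlink,
            "linked_text": link.read_text(encoding="utf-8"),
        }
```

Because both names point at the same file, anyone browsing the folders from another operating system sees an ordinary .html file in each group and can copy it without any special tools.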
From there, one possible goal is to get this system incorporated into Firefox OS (as well as the Firefox browser). Firefox OS’s system of apps could co-exist within our environment: when creating a new page, installed apps will be some of the objects suggested by the New Page screen. Moreover, apps could all appear in the canvas as pages. Beyond Firefox (OS), perhaps the design could even get incorporated into other Web-based systems, like Open webOS or Chrome OS.
The Next Step
I, with countless others, have been frustrated by the inhumane design of most digital environments for far too long. Alas, it is only recently that I got a more concrete idea of how to fix this mess. For me, there is a moral imperative to try to give the world a better alternative to the interfaces that confuse people and scare them away from computers. The time has come to do something about it.
In order to make this happen, I need someone to help me create an interactive mockup of the concept so that people can play around with it, fall in love with it, and want to help turn it into a reality. I also need developers to help code the actual app. More importantly, I need eyes: as many people as possible should see this. Ideally, I would love to generate interest within Mozilla.
Finally, for any potential employers out there, I am available to work on such a project (and available for UX jobs as well).
Now, let’s make it work!