Will Mixed Reality Replace Phone Calls?
Mixed Reality conferencing and the future of collaboration
In the summer of 1995 I was lucky enough to take part in an event that changed my life forever. As I put on a Virtual Reality (VR) head mounted display at the University of Washington, I was virtually transported from Seattle across the Pacific to join other people in a small virtual tea room in Japan. I could talk to the others in the room, turn to look at them with my virtual face, and wave at them with a disembodied virtual hand. This was the HIT Lab’s Greenspace project, and it was the first time that VR had been used to create a trans-Pacific shared virtual world.
At the time the hardware cost over a million dollars, and the communications bill for the few days the link was open ran to tens of thousands of dollars. Now, twenty years later, VR conferencing is becoming more commonplace. Companies such as High Fidelity, Sansar and Facebook are all developing collaborative VR spaces. AltspaceVR regularly has over 30,000 monthly active users of its shared VR platform, and collaborative VR applications like Rec Room and Big Screen are growing in popularity.
One of the big attractions of using VR for conferencing is that it allows people to use some of the same communication cues they use in face to face settings. Not only can they speak to one another, but users have virtual bodies that allow them to turn to face each other, shake hands, and make a rich range of non-verbal communication gestures. Finally, they can interact with the virtual environment around them, for example by pointing to and talking about objects in the space, or playing games together. Rec Room players can play table tennis, while people in Facebook Spaces can sketch together in 3D. This means that there can be a much higher degree of Social Presence than in traditional audio or video conferencing.
Collaborative Mixed Reality — The Next Telephone?
VR conferencing has a lot of potential, and companies are investing tens of millions of dollars into collaborative VR platforms. However, Mixed Reality conferencing may have even more impact, because of its connection back to the real world.
Mixed Reality (MR) is broadly defined as technology that mixes real and virtual worlds. Unlike VR, which separates people from the real world, MR tries to enhance interaction in the real world or bring elements of the real world into VR environments.
The benefits of using Mixed Reality for remote collaboration include:
- enabling people to get help from remote users on real world tasks
- bringing remote virtual people into a user’s real space
- supporting transitions from shared AR to VR views
- using MR imagery to provide enhanced remote communication cues
- enabling users to share viewpoints and see through each other’s eyes
- overcoming the seam between task space and communication space
- supporting natural spatial cues for remote collaboration
Perhaps the biggest benefit is that MR conferencing typically focuses on sharing the view of a user’s workspace with a remote collaborator, not their face as in traditional videoconferencing. For many real world tasks, such as remote maintenance, it’s more important to see what the person is working on than to see their face while you’re talking to them. This capability opens up a wide range of domains for MR conferencing, from remote expert assistance in industry, to medical support in the operating theatre, to enhanced shared gaming, among others.
Early MR Collaboration Systems
Nearly 20 years ago, I helped to develop one of the first collaborative MR systems. This was an Augmented Reality (AR) conferencing application that placed live virtual video avatars of remote people in a user’s real environment. The user could turn over real name cards and see virtual people appearing in front of them. This was viewed in a head mounted display, enabling the user to have a hands-free collaborative experience. The main benefit was that it moved remote conferencing from a computer screen into the real world. We found that seeing a remote collaborator as a life-sized virtual video avatar in the real workspace provided a much higher degree of Social Presence than seeing them on a computer screen.
I distinctly remember one study in which we were comparing AR conferencing to a monitor based video conferencing system. One of the people using the video conferencing system moved very close to the monitor, assuming that this would help the remote person hear them better. However, when they tried the AR system and saw a life-sized virtual head facing them, they immediately moved back and gave it the same personal space as they would in a face to face conversation. This unconscious act was a strong indication of the increased feeling of Social Presence provided by the AR application.
Around the same time we also showed the benefit of spatial cues in MR conferencing by developing WearCom. This was a wearable AR conferencing application that allowed multiple virtual people to appear around a person using a wearable computer and head worn display. In this case we used spatial audio to locate the voices of the people at their virtual avatars. Just like in a face to face conversation at a crowded party, we found that using spatial audio enabled people to easily disambiguate between multiple speakers, even when they were saying almost the same thing.
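The core idea behind this kind of spatial audio cue can be sketched very simply. The following is a minimal illustration, not WearCom's actual audio pipeline: each remote speaker's voice is panned across the stereo field according to where their avatar sits relative to the listener's head direction, using a standard constant-power panning law.

```python
import math

def stereo_gains(listener_yaw, avatar_azimuth):
    """Constant-power stereo panning: place a speaker's voice at the
    azimuth of their avatar relative to the listener's head direction.
    Angles are in radians; 0 is straight ahead, positive is to the right.
    Returns (left_gain, right_gain)."""
    # Relative bearing of the avatar, wrapped to [-pi, pi]
    rel = (avatar_azimuth - listener_yaw + math.pi) % (2 * math.pi) - math.pi
    # Map the bearing to a pan position in [0, 1] (0 = hard left, 1 = hard right)
    pan = (math.sin(rel) + 1.0) / 2.0
    # Constant-power law keeps perceived loudness steady while panning
    theta = pan * math.pi / 2.0
    return math.cos(theta), math.sin(theta)

def mix_voices(voices, listener_yaw):
    """Mix several mono voice samples into one stereo frame.
    `voices` is a list of (sample, avatar_azimuth) pairs."""
    left = right = 0.0
    for sample, azimuth in voices:
        gl, gr = stereo_gains(listener_yaw, azimuth)
        left += sample * gl
        right += sample * gr
    return left, right
```

Because each voice arrives from a distinct direction, listeners can use the same "cocktail party" effect they rely on in real rooms. A full system would use head-related transfer functions rather than simple panning, but the principle is the same.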
Although this research showed a lot of promise, one problem was that the video avatars of the remote people were flat rectangles. If you looked at the people from side-on they would disappear, and they certainly didn’t feel like 3D virtual people in your space. This problem was solved a couple of years later with the 3D Live system, where we used multiple cameras to capture people and enable them to be viewed from any angle. This created the illusion that a live 3D virtual person was standing in your real space. Once captured, the virtual copy of the real person could be streamed live into either AR or VR environments. This enabled remote collaborators to use the same full body movement and gestures as in face to face collaboration.
The term Mixed Reality describes a continuum of interface technology from the purely real world to a fully immersive virtual environment. Most collaborative systems exist at discrete points on this continuum, such as face to face collaboration in the real world, or VR conferencing in an immersive space. However, MR conferencing can enable people to transition along the MR continuum. For example, the MagicBook project was an interface that supported collaboration in a face to face, AR, or VR view, or a mixture of these. Using this system two people could read a normal book, but they could also look at the book pages through a handheld display and see AR content popping out of them. When a user saw an AR scene that interested them, they could flick a switch on the handle and transition into an immersive VR experience. When in a VR scene the user could look up and see their partner in the real world looking down at them as a giant virtual head in the sky. In this way the MagicBook supported seamless transitions between shared real world, AR and VR experiences.
These research prototypes showed that MR technology can be used to seamlessly place virtual collaborators into a user’s real world, and provide unique collaborative experiences. Unlike VR conferencing, the MR interface enhances real world collaboration, and can enable a user to get help with real world tasks. Tests with these systems and others have found that people collaborate more naturally with an MR interface, and feel a much higher degree of Social Presence. MR conferencing can be much more like face to face collaboration than video conferencing is.
Current MR Conferencing
The emergence of a new generation of AR and VR displays has led to the creation of new collaborative MR experiences. Microsoft, with its HoloLens head mounted display, has been showing an MR version of Skype. A user can position a virtual Skype window anywhere in space and see live video from a remote collaborator. At the same time the HoloLens camera can stream live video to the remote collaborator, so that they can see the local user’s workspace and add AR annotations to it to help them complete real world tasks. In this way the remote collaborator can feel like they are seeing through the eyes of the HoloLens user.
However, just like the AR conferencing application of nearly 20 years earlier, Skype on HoloLens places live video into a flat virtual rectangle. The Microsoft Holoportation project overcomes this by using multiple depth sensing cameras to capture and live stream a 3D virtual model of the remote user into the local user’s real world. Combining this with the HoloLens means that the local user can see a life-sized virtual copy of a remote user in their real world. Just like the earlier 3D Live system, Holoportation allows people to research how full body communication cues can be transmitted into remote real spaces.
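The first step in this kind of 3D capture, turning each depth camera's image into a cloud of 3D points, follows the standard pinhole camera model. The sketch below is illustrative only (Holoportation's real pipeline also fuses and textures the clouds from many cameras); the intrinsics `fx, fy, cx, cy` are assumed to come from camera calibration.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Unproject a depth image into 3D points using the pinhole camera model.

    `depth` is a 2D list of metric depth values (metres); (fx, fy) are the
    camera's focal lengths in pixels and (cx, cy) its principal point.
    Returns a flat list of (x, y, z) points in the camera's frame; zero
    depth (no sensor reading) is skipped."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no depth reading at this pixel
            # Invert the pinhole projection: pixel (u, v) at depth z
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Merging the point clouds from several calibrated cameras around the person, every frame, is what produces the viewable-from-any-angle virtual copy.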
There are also a number of startups beginning to work in this space. Mimesys describes itself as providing the first holographic meeting platform, which can capture real people and bring them together into shared virtual spaces. Similarly, DoubleMe’s Holoportal provides a lightweight version of Microsoft’s Holoportation for shared meetings. These are both focused on capturing people, while Envisage AR is focusing on capturing a user’s real surroundings and sharing them with remote collaborators. Over the next few years, expect to see a lot more activity in this space.
These systems show that MR conferencing experiences can be delivered on today’s commercial AR and VR platforms. As the displays, tracking technology, capture systems and bandwidth continue to improve, MR conferencing systems will continue to get better and better.
There are a number of developments occurring that will continue to improve MR conferencing and enable people to collaborate together more effectively than ever before. In particular, there are three important technology trends:
- Natural Collaboration: As networking speeds increase, it becomes possible to send richer communication cues than ever before (e.g. video replacing audio only), leading to more natural collaboration.
- Experience Capture: Technology is being developed that enables people to capture more of their experiences and surroundings than ever before, e.g. going from photography to 3D scene capture.
- Implicit Understanding: Computers can now understand more about users and their surroundings than ever before. This allows them to recognise implicit behaviour, such as where a person is looking.
At the junction of these three trends is a research area we call Empathic Computing. This is research focused on developing systems that allow us to share what we are seeing, hearing and feeling with others. Unlike traditional conferencing tools, Empathic systems are designed to let you more deeply understand your collaborator’s viewpoint: to see through their eyes, hear what they are hearing, and, to some extent, know what they are feeling.
Our Empathy Glasses give one example of the type of MR collaborative experience that can be developed from an Empathic Computing perspective. This is an AR display that can also recognise facial expressions and track eye gaze. When a person wears them, video of their workspace is sent to a remote collaborator along with their facial expression and eye gaze information. The remote collaborator can see what the local user is seeing, but also knows exactly where they are looking and when they are feeling confused or unhappy about the task they are doing. This is one of the first wearable collaborative systems to share eye gaze in a head worn AR display. Using it, we are just beginning to explore the impact of Empathic technology on remote collaboration.
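At the transport level, a system like this has to deliver each video frame together with the gaze and expression data that belongs to it. The sketch below shows one simple way to do that; the field names and length-prefixed framing are illustrative assumptions, not the Empathy Glasses' actual protocol.

```python
import json
import time

def make_frame_message(frame_id, gaze_xy, expression, frame_bytes):
    """Bundle one captured video frame with the wearer's gaze point and a
    coarse expression label, so the remote collaborator's view can overlay
    them. Hypothetical format: a length-prefixed JSON header, then the raw
    frame bytes."""
    header = {
        "frame": frame_id,
        "timestamp": time.time(),
        "gaze": {"x": gaze_xy[0], "y": gaze_xy[1]},  # normalised image coords
        "expression": expression,  # e.g. "neutral", "confused"
    }
    head = json.dumps(header).encode()
    # 4-byte big-endian header length lets the receiver split header from frame
    return len(head).to_bytes(4, "big") + head + frame_bytes

def parse_frame_message(msg):
    """Inverse of make_frame_message: recover the header dict and frame bytes."""
    n = int.from_bytes(msg[:4], "big")
    header = json.loads(msg[4:4 + n].decode())
    return header, msg[4 + n:]
```

Keeping gaze and expression in the same message as the frame they annotate means the remote view can draw the gaze cursor on exactly the image the wearer was looking at, with no cross-stream synchronisation needed.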
In twenty years, using a mobile phone to connect with friends will feel as old fashioned as using a landline does today. By then, Mixed Reality technology will enable people to see virtual copies of their friends in the real world with them, or to see through their friends’ eyes and help them with real world tasks.
The VR and AR systems of today show glimpses of what is possible with Mixed Reality. MR systems can share rich communication cues in the real world and enable remote people to collaborate in ways that were never before possible. Next time you make a phone call, imagine being able to see what your friends are seeing, hear what they are hearing, and feel what they are feeling.
 Mandeville, J., Furness, T., Kawahata, M., Campbell, D., Danset, P., Dahl, A., … & Schwartz, P. (1995, October). Greenspace: Creating a distributed virtual environment for global applications. In Proceedings of IEEE Networked Virtual Reality Workshop.
 Szalavári, Z., Schmalstieg, D., Fuhrmann, A., & Gervautz, M. (1998). “Studierstube”: An environment for collaboration in augmented reality. Virtual Reality, 3(1), 37–48.
 Kato, H., & Billinghurst, M. (1999). Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality (IWAR’99) (pp. 85–94). IEEE.
 Billinghurst, M., Bowskill, J., Jessop, M., & Morphett, J. (1998, October). A wearable spatial conferencing space. In Digest of Papers, Second International Symposium on Wearable Computers (pp. 76–83). IEEE.
 Prince, S., Cheok, A. D., Farbiz, F., Williamson, T., Johnson, N., Billinghurst, M., & Kato, H. (2002). 3D Live: Real time captured content for mixed reality. In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR 2002) (pp. 7–317). IEEE.
 Billinghurst, M., & Kato, H. (2002). Collaborative augmented reality. Communications of the ACM, 45(7), 64–70.
 Billinghurst, M., Kato, H., & Poupyrev, I. (2001). The MagicBook: a transitional AR interface. Computers & Graphics, 25(5), 745–753.
 Masai, K., Kunze, K., & Billinghurst, M. (2016, May). Empathy Glasses. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (pp. 1257–1263). ACM.