Tzina: Symphony of Longing — Conclusions

Avner Peled
Dec 17, 2016 · 15 min read


Tzina: Symphony of Longing is a WebVR documentary for the Chrome browser / HTC Vive. It takes place on the main city square in downtown Tel Aviv, "Dizengoff Square", also known as "Tzina Square" after the wife of Tel Aviv's first mayor, Meir Dizengoff. The documentary tells the story of the different individuals who inhabit the square regularly, spending most of their day sitting on the benches that surround its circular structure, with the monumental fountain sculpture at its center. Whether it is because they are homeless, poor or just lonely, they all find themselves there, pondering lost loves and missed opportunities.

Tzina Square — Dizengoff

Shirin Anlen, the director of the film and a friend, was able to find common ground with the people of the square and decided to tell their story to the world. An additional motive for making the film was the fact that the square as we know it, built high above the street, was about to be demolished and leveled down to street level. There is no way of telling what will become of the people who currently occupy the space, so the film also carries real archival value.

Being the avid new media pioneer that she is, Shirin wanted to use nothing less than the newest, most experimental technologies that exist for interactive storytelling. She assembled a team of very talented, multinational people, recruited me to be the lead developer, and we set off on this WebVR journey.

Screenshot from tzina.space

Project Outcome

I am very proud of the result that we achieved during less than six months of development (which followed about six months of research and shooting that I was not part of).

The film is both a Vive and a Web experience. It contains a modeled and greatly enriched version of the square, with 3D point cloud scans of the trees and the passers-by of the square.

It contains over 45 minutes of footage with 10 interviewed characters that were shot using DepthKit depth shooting technology, edited, rotoscoped and then embedded into the virtual world. Each character projects an animation that was tailor-made for their dialogue. We were also able to implement a multi-user feature in which viewers of the film see other viewers walking around in real time as pigeons. We implemented an interaction mechanism that allows the viewer to change between different times of the day by gazing at one of the suns blazing over the square. In VR mode, the content rotates toward viewers instead of them having to teleport around the square.

The film was selected for the DocLab exhibition of IDFA, the International Documentary Film Festival Amsterdam. We barely managed to complete the project in time for the festival, so the experience was not always fluid, but some of the feedback that we got was really positive and assured us that we were able to deliver the desired message of the film.

Reflections

I would like to divide my reflections and conclusions from this project into 3 distinct parts:

  1. Technological
  2. Organizational
  3. VR Storytelling

Technological conclusions

WebVR

The idea to work with WebVR was proposed even before I joined the project, but I fully supported it. In retrospect, I did not really know what I was getting myself into, but the important fact is that I was working with open standards and software that is in line with my ideology. Choosing to develop strictly for the browser, using only open source technologies, not only drastically increases the audience's exposure to the experience, it also contributes to the dispersion of the technology to the general public and moves toward a state in which more and more people will be able to produce content. Indeed, during the development process, I was already able to give back to the community by reporting bugs and suggesting fixes. The complete source of the experience is also publicly available. Proprietary engines such as Unreal and Unity do provide options to export content to WebGL, but at this time they are still very immature and lacking in that respect.

When working with WebVR, that is, connecting an HMD such as the HTC Vive to the browser, one has to use an experimental version of Chromium or Firefox. At the time of development, Chromium was more advanced and robust than Firefox, so it was chosen as the target platform. The provided builds, however, were still very experimental, and the more we pushed the limits of the platform, the more issues we came across. Every month or so, a new version of Chromium WebVR was released, maintained mostly by Brandon Jones. The new versions did fix issues, but they also frequently introduced new bugs and breaking changes to the API. To this date, the two most recent versions of Chromium available offer a dilemma: the September version has a bug in which the browser crashes every time the Vive loses its sensor tracking, while the October version fixes that bug but introduces memory leaks that crash the browser with an "Out of memory" exception after a reload of the experience. We opted for the September version. Using the experimental Chromium version also introduced substantial obstacles in terms of marketing the project, because curators have to put in extra effort to download the Chromium build and enable WebVR. However, WebVR is destined to land in an upcoming release of Chrome very soon. All in all, I am pleased with our choice to use WebVR, as we were able to push the envelope on the technology and create something truly cutting edge.
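
For reference, this is roughly the WebVR 1.1 entry point that the experimental builds exposed at the time. It is a minimal sketch: enterVR and render are hypothetical names, and the exact call sequence shifted between Chromium releases.

```javascript
// Minimal WebVR 1.1 bootstrap (illustrative, not our production code).
function enterVR(renderer) {
  if (!navigator.getVRDisplays) {
    console.warn('WebVR not available, falling back to desktop mode');
    return;
  }
  navigator.getVRDisplays().then(function (displays) {
    if (displays.length === 0) { return; }
    var vrDisplay = displays[0];
    // requestPresent must be triggered by a user gesture (e.g. a click).
    vrDisplay.requestPresent([{ source: renderer.domElement }]).then(function () {
      // Drive the render loop from the HMD instead of window.requestAnimationFrame.
      vrDisplay.requestAnimationFrame(function onVRFrame() {
        render();                 // app-specific render function (assumed)
        vrDisplay.submitFrame();
        vrDisplay.requestAnimationFrame(onVRFrame);
      });
    });
  });
}
```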

3D Engine

Once the decision was made to use open source software, the choice of a web 3D engine was narrowed down to two candidates: Three.JS and Babylon.JS. In retrospect, it may have been worthwhile to perform a more extensive examination of Babylon.JS. I chose Three.JS for three main reasons:

  1. Very big community and example database, including people working closely with WebVR.
  2. More lightweight, less monolithic, not affiliated with large corporations (Babylon.JS is affiliated with Microsoft).
  3. I have worked with it before and the deadline was tight.

When it comes to frameworks, I am rather fixated on issues such as modularity, lightness and flexibility, and the sheer size and weight of Babylon.JS put me off. However, I think many of the issues that I dealt with could have been avoided by using Babylon.JS. It is fair to say that Three.JS is not really designed to support large scale projects such as Tzina. When it comes to performance, though, the lightness of Three.JS is where it shines. I had full control over the rendering loop and could embed my own modules, such as fast collision detection using Box-Intersect and FBO based particle engines. The latter might actually be a good enough reason on its own to stick with Three.JS. I have been polishing my Three.JS-FBO skills over the last couple of years, and using FBO allowed us to render, animate and morph millions of particles without losing any FPS, and as I will get to later, particles really excel in VR.
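
To give a flavor of the FBO approach, here is a heavily condensed sketch of the simulation pass, assuming the Three.JS API of the time; the seeding of initial positions and the points material that consumes the output texture are omitted.

```javascript
// FBO particle idea: positions live in a float texture, a full-screen
// simulation pass writes new positions each frame, and the points'
// vertex shader reads them. Initial positions would be seeded into
// targets[0] via a DataTexture (omitted here).
var SIZE = 512;                           // SIZE * SIZE particles
var simScene = new THREE.Scene();
var simCamera = new THREE.OrthographicCamera(-1, 1, 1, -1, 0, 1);

function createTarget() {
  return new THREE.WebGLRenderTarget(SIZE, SIZE, {
    type: THREE.FloatType,                // requires OES_texture_float
    minFilter: THREE.NearestFilter,
    magFilter: THREE.NearestFilter
  });
}
var targets = [createTarget(), createTarget()];  // ping-pong pair

var simMaterial = new THREE.ShaderMaterial({
  uniforms: { positions: { value: null }, time: { value: 0 } },
  vertexShader: 'void main(){ gl_Position = vec4(position, 1.0); }',
  fragmentShader: [
    'uniform sampler2D positions; uniform float time;',
    'void main(){',
    '  vec2 uv = gl_FragCoord.xy / ' + SIZE.toFixed(1) + ';',
    '  vec3 pos = texture2D(positions, uv).xyz;',
    '  pos.y += 0.001 * sin(time + uv.x * 10.0);   // toy motion rule',
    '  gl_FragColor = vec4(pos, 1.0);',
    '}'
  ].join('\n')
});
simScene.add(new THREE.Mesh(new THREE.PlaneBufferGeometry(2, 2), simMaterial));

function simulate(renderer, time) {
  simMaterial.uniforms.positions.value = targets[0].texture;
  simMaterial.uniforms.time.value = time;
  // Render-to-target via the third argument, as in the Three.JS versions of the time.
  renderer.render(simScene, simCamera, targets[1]);
  targets.reverse();                      // swap read/write targets
  return targets[0].texture;              // feed this to the points' vertex shader
}
```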

However, Three.JS lacks optimization for garbage collection and requires manual disposal of most resources after they are used. Babylon.JS seems to be optimized for garbage collection, a very important aspect when using Javascript.

In terms of workflow, we started off using Unity as the designers' editor of choice and then exported to Three.JS using an exporter from the Asset Store. That turned out to be a mistake. The exporter was buggy, missed features and later became unsupported by its developer; many properties looked visually different in Unity than in Three.JS, and the exported JSON files were bloated and messy. On top of that, it also cost $35. A better choice would have been the Blender exporter, which we did end up using for several modifications toward the end. Unfortunately, we had to resort multiple times to manually editing the exported JSON files. Three.JS does have its own editor in development, but it is at a very early stage. Babylon.JS has a very comprehensive and impressive looking editor, but until recently there wasn't really a way to connect the editor with custom code, thus limiting it to basic prototyping. Now there seems to be a way to use TypeScript and Visual Studio Code alongside the web based editor to get a full featured coding environment, though this again raises concerns about tight coupling to Microsoft products. What I ended up doing was simulating my own editor on top of Three.JS using the dat.gui library. I can't say it is a very scalable solution, but we managed to achieve a rather efficient and semi-automatic pipeline toward the end. Going from Blender to Three.JS through dat.gui may require a lot of supporting development, but I ended up with a completely corporate free, pure Javascript ES6, semi-large-scale development platform.
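
As an illustration of the dat.gui "editor" idea, here is a minimal sketch of the kind of panel we attached to scene objects; the names, ranges and the dump helper are illustrative, not our actual tooling.

```javascript
// Expose a loaded object's transform, tweak it live in the browser,
// then copy the values back into the exported scene description.
var gui = new dat.GUI();

function exposeObject(object3d) {
  var folder = gui.addFolder(object3d.name || 'object');
  folder.add(object3d.position, 'x', -50, 50).step(0.1);
  folder.add(object3d.position, 'y', -50, 50).step(0.1);
  folder.add(object3d.position, 'z', -50, 50).step(0.1);
  folder.add(object3d.rotation, 'y', -Math.PI, Math.PI).step(0.01);
  folder.add(object3d.scale, 'x', 0.1, 10).name('scale').onChange(function (s) {
    object3d.scale.set(s, s, s);          // uniform scale for simplicity
  });
  // Dump the current transform so it can be pasted into the scene JSON.
  folder.add({ dump: function () {
    console.log(JSON.stringify({
      name: object3d.name,
      position: object3d.position.toArray(),
      rotation: object3d.rotation.toArray(),
      scale: object3d.scale.toArray()
    }, null, 2));
  } }, 'dump');
}
```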

Performance

It didn’t take long before I realized that the main technical challenge of the project would be the performance. With VR, a drop to even 5 frames below the venerated 90FPS has major implications over the viewer’s experience. I would argue that my main lesson from the battle over 90FPS is actually an organizational one, but here are some of the tips that I have learned:

Video to Texture: When using DepthKit, the result of the depth shoot is a webm video file that contains RGBD data, that is, both the RGB color and the depth image. I will not go into detail in this article, but to get a reasonable quality we had to hack the DepthKit format and shaders so that we could generate a decent looking mesh in real time from the RGBD video. However, the process of converting an HTML5 <video> texture into a WebGL texture is a heavy duty challenge that has caused lots of performance issues in browsers, especially Chrome. We found that (see the sketch after this list):

  1. The performance between different Chrome versions is not consistent. The Sep 29 WebVR Chromium build seemed to have the best results.
  2. We gained a significant improvement by making sure that the video file's resolution conforms to a power of two.
  3. VP8 encoding yielded better performance than VP9. There wasn't any noticeable improvement from varying the compression ratio and video file size.
  4. Starting with Windows 10 RS1, hardware accelerated VPx decoding is available and does improve performance.
  5. Manually decreasing the FPS of videos that do not need to be viewed at full rate improved performance.
  6. Whenever we could, we paused other videos while the viewer was focusing on one video.
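
The sketch below illustrates points 2, 3 and 6: a VideoTexture fed by a webm file and a small helper that keeps only the focused character's video decoding. The file path and the pauseOthers helper are hypothetical.

```javascript
// Video-to-texture path plus the "pause what you are not watching" tip.
var video = document.createElement('video');
video.src = 'assets/character_rgbd.webm';   // VP8-encoded, power-of-two resolution (assumed path)
video.loop = true;
video.muted = true;                          // audio handled separately as positional audio
video.play();

var texture = new THREE.VideoTexture(video);
texture.minFilter = THREE.LinearFilter;      // no mipmaps for video textures
texture.magFilter = THREE.LinearFilter;

// Keep only the character currently in focus decoding at full rate.
function pauseOthers(activeVideo, allVideos) {
  allVideos.forEach(function (v) {
    if (v !== activeVideo && !v.paused) { v.pause(); }
  });
  if (activeVideo.paused) { activeVideo.play(); }
}
```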

FBO: It’s a no brainer that using GPU shaders anywhere possible would most likely increase performance, but as previously mentioned, using FBO for particle simulations allowed us to generate and animate thousands of particles easily.

Potree: The project makes extensive use of 3d point clouds. The standard THREE.Points/PointsMaterial that are bundled with Three.JS are in no way scalable. Instead, we opted for the Potree library that can efficiently present thousands of points in real time.

Collision Detection: As noted earlier, the use of Box-Intersect for collision between objects was relatively light on performance. I also consolidated the use of raycasting into one collision manager and tried to avoid it when not necessary. With the dat.gui system, I developed an interface for manual adjustment of object box colliders.
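
A hedged sketch of the Box-Intersect idea, using the flat AABB format that the box-intersect library expects; the helper names are illustrative and assume a bundler that resolves the npm module.

```javascript
import boxIntersect from 'box-intersect';

// Flatten a Three.JS object's bounding box into [minX, minY, minZ, maxX, maxY, maxZ].
function toFlatBox(object3d) {
  var box = new THREE.Box3().setFromObject(object3d);
  return [box.min.x, box.min.y, box.min.z, box.max.x, box.max.y, box.max.z];
}

// "Is the viewer near this collider" checks without per-frame raycasting.
function checkViewerCollisions(viewerObject, colliderObjects, onHit) {
  var viewerBox = [toFlatBox(viewerObject)];
  var colliderBoxes = colliderObjects.map(toFlatBox);
  // Bipartite mode: test the single viewer box against all collider boxes.
  boxIntersect(viewerBox, colliderBoxes, function (i, j) {
    onHit(colliderObjects[j]);
  });
}
```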

VSync off: As a general tip that seems to work especially well for VR, it is wise to turn off VSync in the GPU settings to get a performance boost.

Organizational conclusions

Branches

In this project I took the role of the lead developer. Besides developing the core platform for the experience, I was also in charge of integrating the work of the other coders and designers. Code-wise, after some trial and error, I was able to create a standard for Three.JS ES6 objects that the other developers had to conform to (sketched below). Each module was developed autonomously and integrated smoothly into the platform. However, in retrospect, I was not strict enough regarding the branching policy and ended up doing a lot of debugging to find the causes of performance hits.
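
To illustrate the kind of standard I mean, here is a hypothetical sketch of the module interface; the real base class had more hooks, but the contract was essentially: construct without side effects, load asynchronously, update per frame, and dispose of your own resources.

```javascript
// Hypothetical base class each scene module had to extend (name illustrative).
export default class TzinaModule extends THREE.Object3D {
  constructor(config) {
    super();
    this.config = config;
  }
  init(loadingManager) {
    // Load assets through the shared LoadingManager and return a Promise.
    return Promise.resolve();
  }
  update(dt, cameraPosition) {
    // Called once per frame by the platform.
  }
  dispose() {
    // Release geometries, materials and textures explicitly,
    // since Three.JS will not garbage-collect GPU resources for us.
  }
}
```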

As an example of the branching problem: we had one branch named design, which our design director used for basically everything, from setting up trees, to texturing the floor, to updating the model of the square. On one sleepless night close to the opening of the festival, I pulled the recent changes from the design branch. To my dismay, the FPS dropped significantly, but I did not know why. I had to first go over the merge to see everything that had changed, and then use dat.gui and code changes to repeatedly turn off and revert changes until the performance went back to normal. Instead, what I should have done was enforce a strict performance regime in which every feature or change has its own branch and is not integrated before testing that the performance remains the same. I should add, though, that even if I had had that, the simple measure of actual FPS is not always good enough, because sometimes you only see the FPS drop after an accumulation of multiple resources. More accurate statistics are required to assess the performance hit of every feature. The Debug Layer of Babylon.JS looks like an appropriate statistics panel:

Babylon.JS statistics panel
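
On the Three.JS side, a rough equivalent can be pieced together from the counters the renderer already exposes. A minimal sketch follows; field names vary slightly between Three.JS releases.

```javascript
// Log draw call and GPU memory counters so a merge's cost can be compared
// against the previous state of the scene.
function logRenderStats(renderer, label) {
  var info = renderer.info;
  console.log(label, {
    drawCalls: info.render.calls,
    geometries: info.memory.geometries,
    textures: info.memory.textures
  });
}
// e.g. call once before and once after merging a branch's changes:
// logRenderStats(renderer, 'design branch merged');
```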

VR First

Whether it was lack of resources, lack of knowledge or simply fear of the unknown, we developed the project "Desktop First". This means that most of the content was developed as a web experience, and only then did we go on to test it on the Vive. That was obviously a mistake. Just as there is now a "Mobile First" paradigm, there should be a "VR First" paradigm. The logic behind it is simply that it's much easier to adapt a VR experience to the Web than vice versa. Here are just some of the requirements that a VR experience has but that are not crucial on the desktop:

  1. VR needs at least 90 FPS, while desktop needs at least 60.
  2. VR cannot display overlay HTML; instead you need to project a plane onto the camera, which works on the desktop as well (see the sketch after this list).
  3. VR needs realistic scale adjustments, while desktop viewers don't necessarily notice them.
  4. VR cannot perform nauseating camera or world movements, while on the desktop they are possible.
  5. In VR it's crucial to have positional audio, while on the desktop it's mostly a benefit.
  6. VR has no mouse or keyboard.
  7. In VR you may not use collision detection for movement and must allow the viewer to move through objects; on the desktop it's usually possible to go with either strategy.
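
A minimal sketch of point 2: a textured plane parented to the camera so that UI stays in view in both VR and desktop modes. The createHUDPlane name and the sizes are illustrative.

```javascript
// Attach a HUD plane to the camera instead of using HTML overlays.
function createHUDPlane(camera, canvasTexture) {
  var material = new THREE.MeshBasicMaterial({
    map: canvasTexture,     // e.g. a THREE.CanvasTexture with rendered text
    transparent: true,
    depthTest: false        // draw on top of the scene
  });
  var hud = new THREE.Mesh(new THREE.PlaneBufferGeometry(0.4, 0.2), material);
  hud.position.set(0, -0.3, -1);    // slightly below eye level, one meter in front
  hud.renderOrder = 999;
  camera.add(hud);                   // the camera itself must be added to the scene
  return hud;
}
```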

Having said that, there are some things that work well in VR but might not work so well on the desktop. I would still argue that by taking the "VR First" strategy, the Desktop experience would be far less hindered than the VR experience is by taking the "Desktop First" approach. I will talk further about VR specific elements in the next section.

VR Storytelling conclusions

Exploration of VR technology has only just begun, but it is already clear that it has large implications for a whole variety of fields and industries. In the realm of storytelling and documentaries, it is clear that VR has the potential to increase our empathy with the protagonists, our immersion into the story and the story's effect on us. Of course, with great power comes great responsibility, and the main challenge is getting viewers to experience the content the way you would like them to, while still maintaining the level of freedom and immersion. I will specify several 'Tips & Tricks' that proved useful in Tzina and may be used successfully in other experiences and documentaries:

Depth shooting for empathy

There are several ways to create a 3D representation of a human being in a virtual world. Perhaps the most realistic way at present is to use complex 360 video shooting in a specially crafted studio and project the result into the virtual or augmented world, as done for Microsoft's HoloLens.

Another way would be to model and rig an animated character and texture it using scans of the real person. The animation could then be recorded using motion capture or tuned manually.

The method that we used is perhaps the most 'ad hoc', as it can be used in the open field without a studio. Using the DepthKit software, an SLR camera and a Kinect, the characters were recorded along with their depth data into a combined video file. The video was then edited and sent for rotoscoping. Finally, in the 3D engine, the depth data was translated into a 3D mesh in real time. This makes great use of the WebVR platform, as the entire character data is streamed in real time from the web server as webm files. Since we used only one Kinect, we could only get 180 degrees of depth. The back side of the character stayed open, a shortcoming which we integrated into the platform by filling the back side with animations. Judging from the reactions we received, even though the characters were very noisy, they were still very human; viewers did not experience the 'Uncanny Valley' effect and were able to bond with the characters.
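
The sketch below illustrates the general idea only, not the actual DepthKit packing or depth decoding, which are more involved: a dense plane is displaced in the vertex shader by depth sampled from one half of the RGBD video texture, while color is sampled from the other half. videoTexture is assumed to be the VideoTexture created from the character's webm file, as in the earlier sketch.

```javascript
// Illustrative RGBD-video-to-mesh shader (not DepthKit-accurate).
var geometry = new THREE.PlaneBufferGeometry(1, 1, 255, 255);  // dense grid
var material = new THREE.ShaderMaterial({
  uniforms: { rgbdMap: { value: videoTexture } },
  vertexShader: [
    'uniform sampler2D rgbdMap;',
    'varying vec2 vColorUv;',
    'void main() {',
    '  // Assumed packing: bottom half depth, top half color.',
    '  vec2 depthUv = vec2(uv.x, uv.y * 0.5);',
    '  vColorUv     = vec2(uv.x, 0.5 + uv.y * 0.5);',
    '  // Vertex texture fetch requires MAX_VERTEX_TEXTURE_IMAGE_UNITS > 0.',
    '  float depth = texture2D(rgbdMap, depthUv).r;',
    '  vec3 displaced = position + vec3(0.0, 0.0, -depth);',
    '  gl_Position = projectionMatrix * modelViewMatrix * vec4(displaced, 1.0);',
    '}'
  ].join('\n'),
  fragmentShader: [
    'uniform sampler2D rgbdMap;',
    'varying vec2 vColorUv;',
    'void main() { gl_FragColor = texture2D(rgbdMap, vColorUv); }'
  ].join('\n')
});
var characterMesh = new THREE.Mesh(geometry, material);
```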

Screenshot from tzina.space

However, we did have to work a lot with the Vive until we found the perfect scale for the characters, one that wouldn't seem strange. That work usually had to be done in pairs, with one person wearing the HMD and another at the computer adjusting the scale. Modern engines might have a better solution for this, using the Vive's controllers as design tools.

Interestingly, we also had depth shootings of dogs, but as viewers we were much less sensitive to their scale.

It was the director's decision that all of the characters would be sitting (on a bench, on the fountain's edge or on the floor), so a standing viewer would in fact be looking at them from above, not really able to catch the character's face. While not all viewers did, every viewer who realized that it was possible to simply sit on the room's floor, and decided to do so, felt great satisfaction and empathy from the situation that was created. It seems that the act of sitting down in order to level yourself with the character created a bonding effect. We still need to find more creative ways to get viewers to sit on the floor, for example by placing some attraction, a moving or shiny object, that will arouse the viewer's curiosity to sit down. It would also help to place an actual comfortable carpet in the room, because many viewers are simply not comfortable sitting on the floor.

Confusion regarding the speaking character

Each chapter of Tzina contains a few depth shot moving characters, and also some scanned “extras” presented as still point clouds. We were quick to realize that it’s difficult for the viewers to understand which character is currently speaking. At first, we didn’t even use positional 3D audio, so the viewers were quite baffled. Adding 3D sound did reduce the confusion, but not completely. There is still room for improvement in making the 3D sound more realistic, but it may be required to add some visual indicator for the character that is currently speaking.
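
A hedged sketch of the positional audio setup in Three.JS, together with a trivial visual cue driven by the audio level; camera, characterMesh and the file path are assumed to exist elsewhere in the scene code.

```javascript
// Positional audio attached to a character, plus an indicator that pulses
// with the analyser's average volume.
var listener = new THREE.AudioListener();
camera.add(listener);

var voice = new THREE.PositionalAudio(listener);
new THREE.AudioLoader().load('assets/character_voice.ogg', function (buffer) {
  voice.setBuffer(buffer);
  voice.setRefDistance(2);      // distance at which the volume starts to fall off
  voice.play();
});
characterMesh.add(voice);        // the sound now comes from the character's position

// Optional speaking indicator driven by the audio level.
var analyser = new THREE.AudioAnalyser(voice, 32);
function updateIndicator(indicatorMesh) {
  var level = analyser.getAverageFrequency() / 256;   // roughly 0..1
  indicatorMesh.scale.setScalar(1 + level);
}
```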

Viewer exploration and surprise

A truly immersive and interactive experience lets viewers explore and experience the world by themselves, not feel like they are being led or dragged. The content then naturally reveals itself to them, using subtle signals to direct the viewer's attention. The element of surprise has a great effect on the viewer's experience. VR users are able to approach objects very closely and look inside and behind them. In Tzina we left small gestures, such as text inside a building or a heart inside a point cloud, and also larger gestures, such as animated objects hiding on the back side of the speaking characters. It is quite fine that not every viewer will notice those 'easter eggs', but the ones who do are greatly rewarded.

360 immersion and particles

It is almost trivial to say that a VR experience should make good use of the 360 view and surround the viewer with elements. However, during the making of Tzina we came to realize that particles, and small particle-like elements floating around you, have a particularly magical and mesmerizing effect on the viewer. Particles have always been a great element in games and animation, but the effect is amplified in VR.

Screenshot from tzina.space

Limitation of movement

One of the most obvious limitations of VR is that even with the most modern HMD, the HTC Vive, it's not possible to move beyond a roughly 3 by 3 meter room. If the experience takes place in a space larger than that, some alternative solution for movement needs to be implemented. One of the most common and intuitive solutions is teleportation using the controllers. However, the teleport method takes away from the immersion and breaks the subjectivity of the viewer existing in the space, here and now. To avoid teleportation, one can use 'dolly' style movements such as moving platforms or sitting in a car. In Tzina, we were able to try out the method of moving the content toward the user instead of the other way around. The viewer first chooses to change the current time of day, i.e. the current episode, by gazing at one of the 5 suns in the sky. Then the benches, with the new characters sitting on them, rotate inside the circular square toward the viewer. That way we were able to create a sensation of change and progress without the viewer having to actually move. We received good feedback from viewers about this method.
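
A simplified sketch of this mechanism: all benches and characters live under one pivot group, and choosing a sun tweens that group around the square's center until the chosen bench faces the viewer. The group name, easing and timing are placeholders, not our production values.

```javascript
// Rotate the square's content toward the viewer instead of teleporting the viewer.
var benchesPivot = new THREE.Group();      // benches and characters added as children
scene.add(benchesPivot);

function rotateToEpisode(targetAngle, duration) {
  var startAngle = benchesPivot.rotation.y;
  var startTime = performance.now();
  function step(now) {
    var t = Math.min((now - startTime) / duration, 1);
    var eased = t * t * (3 - 2 * t);       // smoothstep easing
    benchesPivot.rotation.y = startAngle + (targetAngle - startAngle) * eased;
    if (t < 1) { requestAnimationFrame(step); }
  }
  requestAnimationFrame(step);
}
// e.g. gazing at the third of the 5 suns: rotateToEpisode(2 * Math.PI * 3 / 5, 4000);
```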

Tzina: Symphony of Longing is currently available for private viewing (with a password) at http://tzina.space

Credits:

Director & Producer: Shirin Anlen // Design Director: Ziv Schneider // Technical Director: Or Fleisher // Lead Developer: Avner Peled // Creative Coder: Laura Juo-Hsin Chen // Script Editor: Udi Ben-Arie


Avner Peled

New Media researcher and creator of Soft Robotics, AI and Bio-Art at Aalto University’s Media Lab, Finland. Now working on a PhD in soft robotic telepresence.