Creating the Microsoft Edge DevTools 3D View

Jose Leal
Web Dev @ Microsoft
15 min read · Mar 3, 2020

Last month we released 3D View, a tool that allows you to see the content of your web application from a different perspective, find areas of deep DOM nesting, and help with z-index debugging. It was inspired by ‘Tilt’ from Mozilla, and we are so glad it’s back.

Why and how did we build it?

Coming up with a prototype

When I joined the Edge DevTools team last year, we had the opportunity to dedicate 10% of our time to coming up with ideas for new features to work on. I was looking back at the things that excited me about web development in the first place, and the Tilt extension for Firefox came to mind. I was in college when I first saw it and was completely blown away. Sadly, it was deprecated, so I thought it would be cool to prototype something similar for Chromium and bring it back.

The main idea was to represent the structure of the DOM tree in 3D: create a box for each node and stack them on top of each other following their parent-child relationships. This would create a city-like landscape where tall columns, like buildings, represent areas of deep nesting. A user could then move the camera around to reveal elements that would otherwise be hard to find. I just needed JavaScript and a WebGL framework.

Until then, I had done all my previous side projects with three.js, another amazing WebGL library by Ricardo Cabello. But back in 2018, I had attended a generative art meetup where the folks from Babylon.js presented some of their projects, so when I was trying to decide what library to use for this prototype, I went ahead and gave it a try.

To my surprise Babylon.js was easy to get started with, and their visualizations looked beautiful. Sadly, I cannot say the same about my first attempts with the 3D View project.

Prototype 1: A browser extension

I was very new to the Chromium codebase and the idea of creating an entirely new tool seemed intimidating. Instead, I took a shortcut and made an extension.

This extension used a content script that was injected into the website and ran a function that accessed the body element and recursively iterated through all its children, collecting their computed styles and bounding rects in the process.

The script looked something like this:

Content script for the early browser extension
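The original gist isn't reproduced here, but a minimal sketch of that traversal, with illustrative names and output shape rather than the original code, looks roughly like this:

function buildNode(element) {
  // Gather the data the 3D scene will need: computed styles and bounding rect.
  const style = window.getComputedStyle(element);
  const rect = element.getBoundingClientRect();
  return {
    tagName: element.tagName,
    rect: { x: rect.x, y: rect.y, width: rect.width, height: rect.height },
    styles: { backgroundColor: style.backgroundColor, zIndex: style.zIndex },
    // Recurse into the element's children to preserve the tree structure.
    children: Array.from(element.children).map(buildNode),
  };
}

const tree = buildNode(document.body);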

Later on, I used chrome.runtime.sendMessage to pass this structure from the website to the extension, where all the logic to create the 3D scene lived. The main steps of the scene script were the following (a rough sketch follows the list):

  • Create a box for each element, using its corresponding width and height from the computed styles.
  • Position that box with the x, y values from the bounding rect. The vertical position in the stack was determined by the nesting level of the tree.
  • Use that same level value to calculate a different shade of red to apply to the material.
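Roughly, in Babylon.js terms, those steps could look like this (an illustrative sketch, not the original extension code):

const LEVEL_HEIGHT = 10; // arbitrary thickness of one nesting level in scene units

function addBox(scene, node, level) {
  const box = BABYLON.MeshBuilder.CreateBox('node', {
    width: node.rect.width,
    depth: node.rect.height,
    height: LEVEL_HEIGHT,
  }, scene);

  // x/y come from the bounding rect; the nesting level sets the vertical position.
  box.position.x = node.rect.x;
  box.position.z = node.rect.y;
  box.position.y = level * LEVEL_HEIGHT;

  // Each level gets a different shade of red based on its depth.
  const material = new BABYLON.StandardMaterial('mat', scene);
  material.diffuseColor = new BABYLON.Color3(1, Math.min(level / 20, 1), 0);
  box.material = material;

  node.children.forEach((child) => addBox(scene, child, level + 1));
}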

The result looked like a bonfire. 🔥

3D View — Browser extension prototype showing a heatmap of facebook.com

The weird orientation of the geometry and the use of color made it look like Lego blocks stacked on top of each other. I showed it to a couple of friends, and it was hard for them to understand it was meant to represent a webpage.

By the way, we published the prototype code on GitHub if you are curious to see how the very first proof of concept was made. 👀

A good solution for this initial confusion is to take an image of the website and apply it as a texture to the elements in the scene. That way, a user can correlate each box to an element on-screen. To do so, I started to play around with the CDP (Chrome DevTools Protocol) call to get a screenshot.

The screen capture model exposes a method called captureScreenshot, which returns a base64-encoded string with the image data. This is great because we can use it to initialize a texture and apply it to a material. The next step was to show the image on only one side of the box. To do that, you can construct a multi-material, which lets you define a different material for each face of the box. I showed the texture on the top face and a color on the other five.
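As a rough sketch of both pieces, assuming screenshotBase64 came back from the captureScreenshot call (which face index is the "top" one depends on the box's face ordering, so treat that as an assumption):

// Build a texture from the CDP screenshot data.
const texture = BABYLON.Texture.CreateFromBase64String(
    'data:image/png;base64,' + screenshotBase64, 'screenshot', scene);

const topMaterial = new BABYLON.StandardMaterial('top', scene);
topMaterial.diffuseTexture = texture;

const sideMaterial = new BABYLON.StandardMaterial('side', scene);
sideMaterial.diffuseColor = BABYLON.Color3.Red();

// A multi-material lets each face pick its own sub-material.
const multiMat = new BABYLON.MultiMaterial('multi', scene);
multiMat.subMaterials.push(topMaterial, sideMaterial);
box.material = multiMat;

// Split the box into one sub-mesh per face (6 indices each).
box.subMeshes = [];
const totalVertices = box.getTotalVertices();
for (let face = 0; face < 6; face++) {
  const materialIndex = face === 4 ? 0 : 1; // texture on one face (assumed index), color on the rest
  new BABYLON.SubMesh(materialIndex, 0, totalVertices, face * 6, 6, box);
}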

Well, it was… something.

3D View — Browser extension prototype testing multi-material with color and textures

These first explorations were good because they allowed me to validate the idea, but at the same time, I started to notice the limits of what it could do. For example, I wanted to link certain actions inside the scene to changes in the DevTools themselves, such as a box selection that would trigger an overlay on the webpage or a selection inside the Elements panel.

I realized that I needed to dive deep into the DevTools code to make it work.
Time to go back and start again. 👨‍💻

Prototype 2: A panel inside DevTools

As it turns out, the tools are easy to extend and creating a new panel wasn’t that bad. I added the required module.json describing the new panel and its scripts, re-used some of the 3D scene logic from the previous extension, and modified the build scripts to include these new files. After a couple of days, we had something working.

3D View prototype — panel inside DevTools

The previous extension prototype could handle simple websites like the front page of facebook.com but had performance issues with large websites like cnn.com or reddit.com. To improve the load time, I needed a better way to get the computed styles of all the elements in one call.

I found out about the CDP call to getSnapshot. It returns a structure with three flattened arrays: the list of nodes, their layout properties, and their computed styles. I wrote a parser to traverse the data and create a tree structure that kept only the information needed to calculate each node's size and position in 3D space.
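A rough sketch of that parser, using the field names from the DOMSnapshot.getSnapshot response (the DevTools plumbing that issues the call is omitted):

function buildTree(snapshot) {
  const { domNodes, layoutTreeNodes, computedStyles } = snapshot;

  // Index the layout entries by the DOM node they belong to.
  const layoutByDomIndex = new Map();
  for (const layoutNode of layoutTreeNodes) {
    layoutByDomIndex.set(layoutNode.domNodeIndex, layoutNode);
  }

  function toNode(index) {
    const domNode = domNodes[index];
    const layout = layoutByDomIndex.get(index);
    return {
      name: domNode.nodeName,
      boundingBox: layout ? layout.boundingBox : null,
      styles: layout && layout.styleIndex !== undefined
          ? computedStyles[layout.styleIndex].properties : [],
      children: (domNode.childNodeIndexes || []).map(toNode),
    };
  }

  return toNode(0); // the document node is typically the first entry
}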

To create the start animation, I used a simple timeout function to delay the creation of boxes from level to level.
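In sketch form, with the per-level delay as an arbitrary value:

function animateLevels(levels, addLevelToScene) {
  // `levels` maps a nesting depth to the list of boxes at that depth.
  levels.forEach((boxes, level) => {
    setTimeout(() => addLevelToScene(boxes, level), level * 100); // 100 ms per level is illustrative
  });
}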

From prototype to production

After spending some time working on a passion project, it is easy to get attached to it, but at some point we need to make the hard choice of letting it go. I think it boils down to the question: "Is this a useful feature?"

Tools that add value

It was the middle of 2019, and I had something called “3D DOM View” (because I’m bad with names). The visualization looked pretty good and I liked to spend time just exploring and spinning around the scene; it was fun.

Debugging emojipedia.com

As I mentioned in the crbug issue, I had some doubts. I was mainly worried that just because I found something fun to play with, it didn't mean that it solved a real problem. I was also hesitant to add yet another graph to the DevTools, which some users already found to be too complex.

We started to think about what other problems we could solve using this paradigm. Visualizing the stacking context seemed like a good direction.

Debugging z-index has been historically difficult since a lot of developers think of the browser as having a single global stacking context instead of a hierarchy of them. Tracking which context encloses your element is complicated. Most people, including me, end up appending new nodes to the document root, assigning them an absolute position and an incredibly high z-index value (like 999999) to ensure that they are shown on top of everything else.

Having this visualization available would help clear the confusion.

By the way, MDN has a great set of articles explaining how the stacking context works.

3D View — Early demo showing stacking context and z-index values

I was able to refactor the previous DOM visualization class into more general modules that could handle data coming from different sources. Getting the data for stacking context and z-index values was easy because I could modify the computedStyleWhitelist parameter from the getSnapshot method and get those properties.
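The exact call site lives inside the DevTools code, but the shape of the request is simply the same snapshot call with a longer whitelist (the property list here is illustrative):

// Same getSnapshot command, with the whitelist extended for the z-index visualization.
const params = {
  computedStyleWhitelist: ['position', 'z-index', 'width', 'height', 'background-color'],
};
// Issued through whatever CDP client is available, e.g.:
// session.send('DOMSnapshot.getSnapshot', params);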

After convincing myself and other people that this was a tool worth investing in, we needed to make sure that we were contributing high-quality code back to the community and that the tool was easy to use.

3D View — debugging stacking contexts of facebook.com and google.com

Tools with reliable performance

Developers use DevTools on all kinds of devices (desktops, laptops, mobile devices, all with different specs); we need to be mindful of our use of resources and create an experience that everyone can use.

Getting performance to work right was a team effort with our friends at Babylon.js. I was lucky to work across the street from their offices and that @sebavanjs was always quick to respond to my calls even when he was working on the other side of the world.

We started by running the DevTools Performance tool (it's a good perk to be able to use our own tools to debug new features). It turned out that each animation frame was causing a lot of CPU computation and taking ~79 ms.

They suggested a couple of things: for starters, stop using the glow layer to highlight a node and instead just change the emissive color of the mesh; then freeze the materials and reduce world matrix computations by calling freezeWorldMatrix on each mesh. Here is a good article with tips and tricks for optimizing your scene.
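Those suggestions map to a handful of standard Babylon.js calls; roughly:

// Highlight a node by changing the emissive color instead of using a glow layer.
mesh.material.emissiveColor = BABYLON.Color3.White();

// Freeze materials and world matrices for meshes that won't change every frame.
mesh.material.freeze();    // skip re-evaluating the material
mesh.freezeWorldMatrix();  // stop recomputing this mesh's world matrix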

These initial changes helped to reduce CPU time and made the animation faster, but we still weren't reaching 60 FPS. I was also heavily using cloned materials for each mesh, which increased our memory usage.

Working with instances

Sebastien called one day to help us look for ways to increase the performance. He told us about the introduction of instances in the latest version of Babylon; in their own words:

Instances are an excellent way to use hardware accelerated rendering to draw a huge number of identical meshes.

Given that the whole scene in the 3D View is basically a bunch of boxes, we could benefit from instancing. At the end of the day, when Babylon.js communicates with WebGL, it makes a drawElements() call for every mesh registered in the scene. This means that visualizing a site like reddit.com would make thousands of calls to WebGL per frame. Using instances, we could register only one mesh in the scene and then create an instance of it for each node. In theory, this would translate to a single drawElementsInstanced() call that takes care of rendering all the boxes. Even though an instance has the same geometry as its root mesh, we can still modify its position, scaling, and color using custom buffer data.
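A sketch of how the scene changes with instancing (names are illustrative):

// One source box registered in the scene; every DOM node becomes an instance of it.
const sourceBox = BABYLON.MeshBuilder.CreateBox('source', { size: 1 }, scene);
sourceBox.isVisible = false; // hide the template itself; the instances still render
sourceBox.registerInstancedBuffer('color', 4); // per-instance color via a custom buffer

const LEVEL_HEIGHT = 10; // arbitrary thickness per nesting level

function createNodeInstance(node, level, color) {
  const instance = sourceBox.createInstance('node');
  instance.scaling = new BABYLON.Vector3(node.width, LEVEL_HEIGHT, node.height);
  instance.position = new BABYLON.Vector3(node.x, level * LEVEL_HEIGHT, node.y);
  instance.instancedBuffers.color = color; // a BABYLON.Color4
  return instance;
}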

We started experimenting with a basic scene inside the playground just to make sure that all the use cases were covered. We were excited to find that in just half an hour Sebastien was able to translate almost all our requirements to work with instances. 🧙‍♂️

For the other cases, where meshes use a multi-material with a texture, I found a great example in the forum that uses a shader material to receive unique faceUV coordinates for each box via instance buffers. I modified it to fit our use case; you can see it here.

Testing the improvements

After rewriting the code it was time to compare the two approaches. There is a very handy browser extension called Spector.js that captures all the available information from a frame and shows a list of commands with their associated visual states. I used it to inspect the commands issued by each scene.

The extension, as is, only works for WebGL scenes inside a normal webpage. To debug content inside the DevTools, I had to build my own version of the tools that included the source files of Spector.js, with some modifications to start capturing commands after the 3D View was opened. The results were impressive.
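For reference, triggering a capture from code with Spector.js's embedded API looks roughly like this (a sketch, assuming the 3D View canvas is already on the page):

const spector = new SPECTOR.Spector();
spector.onCapture.add((capture) => {
  // `capture` holds the recorded command list for the frame.
  console.log(capture);
});
spector.captureCanvas(document.querySelector('canvas'));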

Left: Without instancing, thousands of draw calls (one per mesh) | Right: Only 3 draw calls

The pictures above show the comparison between the commands needed to render one frame with and without instances.

  • On the left, without instances: there are thousands of commands to construct a frame, since it goes one box at a time, calling drawElements().
  • On the right, using instances: each frame requires only 3 calls. The first draws all the boxes with a single drawElementsInstanced() call, the second draws the environment helper box, and the third draws the GUI controls.

With these changes we also decreased our use of materials, since each box now relies on its instance buffer data to control its color.

Creating benchmarks

To test the performance of 3D View across multiple machines, we needed a reliable source of content. We could have done this by asking all the engineers on our team to install the tools and navigate to the same website. The caveat is that, because of personalized content, ads, and experiments, a website will usually display different content depending on the user, location, and device used to browse the page.

To have a consistent test page across devices, I created two very simple websites that populate the same number of elements with different sizes, colors, and positions each time.
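A test page along those lines can be generated with a short script like this (the count and ranges are arbitrary, not the actual benchmark values):

const NODE_COUNT = 2000; // the same element count on every run, for comparable scenes

for (let i = 0; i < NODE_COUNT; i++) {
  const div = document.createElement('div');
  div.style.position = 'absolute';
  div.style.left = Math.random() * window.innerWidth + 'px';
  div.style.top = Math.random() * window.innerHeight + 'px';
  div.style.width = 20 + Math.random() * 200 + 'px';
  div.style.height = 20 + Math.random() * 200 + 'px';
  div.style.backgroundColor = `hsl(${Math.random() * 360}, 70%, 60%)`;
  document.body.appendChild(div);
}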

Overview of the system architecture

How is it all connected?

To make the 3D view extensible, we created a more agnostic renderer that can display different data visualizations, and we introduced the concept of providers. A provider is a combination of UI controls and logic: it knows how to get, parse, and render a particular set of data.

Selecting the corresponding tab triggers a scene swap. Both scenes (one for DOM and one for Z-Index) reuse the same canvas element from the main view and the same render engine. They are paused when their tab is not selected, meaning they stop listening to events and stop updating to avoid unnecessary work.
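In outline, a provider bundles a few responsibilities like these (method names are illustrative, not the actual interface):

class Provider {
  async getData() {}              // fetch the raw data, e.g. through getSnapshot
  parse(rawData) {}               // turn it into renderable nodes
  createScene(engine, canvas) {}  // build the Babylon.js scene for this data set
  pause() {}                      // stop listening and updating while the tab is hidden
  resume() {}                     // pick events and rendering back up
}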

Scenes

A scene has the logic that creates different visualizations. Each provider has a 3D Scene, and despite being different, all scenes share some common logic. For example, the code to set up cameras, create empty environments, and reset view controls.

This is the basic structure of the scene used by the DOM provider with an explanation of its most important methods.

Create scene
The purpose is to produce an empty scene with proper dimensions as fast as possible. This will be rendered in the Main view while the getSnapshot function is running. It uses the body’s metrics to create a dummy mesh. This mesh box is important since its size helps the environment helper create a skybox big enough to encapsulate the entire final model. The helper mesh will be deleted as soon as the real content data is received and the initialize scene function is run.

Initialize texture
The function receives a base64 png image produced by the ScreenCaptureModel. It uses that data to create a new BABYLON.Texture and sets its wrap property to clamp.

this._texture.wrapU = BABYLON.Constants.TEXTURE_CLAMP_ADDRESSMODE;
this._texture.wrapV = BABYLON.Constants.TEXTURE_CLAMP_ADDRESSMODE;

Initialize scene
This is where the magic happens. The function receives a map of all the boxes at each level and iterates it to create a box in the scene for each element. Two important calculations happen at this time:

  1. Position & size
    An element comes with an x, y position in pixels relative to the screen. It also has a width and a height. The current key of the map tells us the level. We combine all of this to create the coordinates in the scene.
    - When positioning the new box we also need to translate the mesh by ½ width and ½ height because BABYLON.js coordinates start at the center of the mesh, not on the top left corner.
  2. UV Coordinates
    Face UV coordinates tells the material what portion of the texture needs to be applied to it.
    We start with one big screenshot of the web page, the one we got during Initialize texture. We reuse this texture for all the boxes, but we don’t want to show the same whole picture everywhere. Each box should display the corresponding texture of its bounding box. The uv value is the normalized version of the x, y position of the element in respect to the screenshot.
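Sketched out, with pageWidth/pageHeight standing in for the screenshot dimensions, `element` carrying x, y, width, and height in page pixels, and the box created as a unit cube that gets scaled (the v-axis flip is an assumption that depends on how the texture is set up):

const LEVEL_HEIGHT = 10; // arbitrary thickness per nesting level

// 1. Position & size: offset by half the size because Babylon.js positions a mesh by its center.
box.scaling = new BABYLON.Vector3(element.width, LEVEL_HEIGHT, element.height);
box.position.x = element.x + element.width / 2;
box.position.z = element.y + element.height / 2;
box.position.y = level * LEVEL_HEIGHT;

// 2. UV coordinates: normalize the element's bounding box against the screenshot size
// so the box's top face shows only its own slice of the page texture.
const uv = new BABYLON.Vector4(
    element.x / pageWidth,                          // left
    1 - (element.y + element.height) / pageHeight,  // bottom
    (element.x + element.width) / pageWidth,        // right
    1 - element.y / pageHeight);                    // top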

Reacting to events
Mouse actions are defined in the mesh action manager as follows:
· Mouse over swaps to the highlighted material and starts the overlay.
· Mouse out resets the material to its previous value.
· Mouse click sets the highlighted material to the mesh and also communicates the selection to the overlay agent.
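With Babylon's action manager, that wiring looks roughly like this (the overlay helpers are illustrative stand-ins for the DevTools plumbing):

mesh.actionManager = new BABYLON.ActionManager(scene);

mesh.actionManager.registerAction(new BABYLON.ExecuteCodeAction(
    BABYLON.ActionManager.OnPointerOverTrigger, () => {
      mesh.material = highlightMaterial;      // swap to the highlighted material
      showPageOverlay(mesh.metadata.nodeId);  // illustrative: start the overlay on the page
    }));

mesh.actionManager.registerAction(new BABYLON.ExecuteCodeAction(
    BABYLON.ActionManager.OnPointerOutTrigger, () => {
      mesh.material = mesh.metadata.previousMaterial; // restore the previous material
    }));

mesh.actionManager.registerAction(new BABYLON.ExecuteCodeAction(
    BABYLON.ActionManager.OnPickTrigger, () => {
      mesh.material = highlightMaterial;
      notifySelection(mesh.metadata.nodeId);  // illustrative: report the selection to the overlay agent
    }));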

When a new node is selected on the Elements tool:
· If the selected node is a box, it sets its material to the highlighted material.
· Non-DOM-rendering elements like <style/> or <script/> are filtered out.

Team effort

I can’t emphasize enough how bringing this prototype to life wouldn’t have been possible without all the talented people who work on Edge. Thank you for making our experimentation and feedback systems possible. Also, it wouldn’t have been possible without the opportunity to utilize other OSS tools, and being able to work closely with a team of designers who are always striving build great user experiences.

Always improving

We don’t expect to be perfect right off the bat, we depend on user feedback to prove that our ideas work in the real world. Luckily, having short release cycles and multiple channels [Canary, Dev, Beta] allows us to iterate over the design and improve on our features faster.

Our PM @hiamerica conducted user studies to test for usability. Combining the results of these studies with the feedback received from the community we came up with some new features: on-screen camera controls, better element highlighting, and a simplified UI on the left panel.

The on-screen camera controls allow the user to zoom and pan the camera. This can also be done with the mouse and the keyboard.

Hovering over boxes in the 3D scene now triggers an overlay over the corresponding element on the webpage and highlights the corresponding node in the tree of the Elements panel.

The many options in our UI left customers confused. We removed the different color options for the z-index visualization and re-arranged the input controls in an order that made more sense.

Areas to keep working on

Here are some of the things on our radar; what would you like to see implemented?

· Updating in real time as the DOM structure changes.
· Responding to animations and other CSS updates.
· Making the tool more discoverable.
· Saving color preferences in settings.
· Integrating the Layers tool so it uses the same rendering engine.
· Making the screen texture full-screen and responsive to scaling.

DevTools as a place to learn

Most of the things that I ended up using at work I didn't learn at school; I learned them by doing.

Back in college, I had a couple of part-time jobs developing websites; I’m not going to lie, I was pretty bad at them. When a client asked me to build something, I’d confidently say “Sure I can do that” then run back to the library and spend hours reading blogs about the subject because I had no idea what I was doing. I became very acquainted with MDN docs, W3Schools, and StackOverflow.

I remember when I discovered the Styles panel in Chrome DevTools: suddenly the box model made sense; the way padding, margin, and borders interact with the size of an element was clear. On top of that, being able to manipulate CSS values in real time and see the results on the screen closed the gap between the theory I read on blogs and the implementation I was writing in code.

I think DevTools has an opportunity to teach new developers how the web works, and we hope this tool can help with some of that.

Get more information

Thank you for reading! Leave us a comment or reach out to us on Twitter: @EdgeDevTools / @jose_luisleal

Follow the wizards from @babylonjs!

All the information about this project is public; read more about it here:
📝Explainer 🐞Issue 💻Blog

GitHub / Play with the prototype code
https://github.com/MicrosoftEdge/DevToolsSamples/tree/master/3DView

Spector.js / Debug your WebGL scene
https://github.com/BabylonJS/Spector.js
