Rendering a web page: Construction of the DOM, the CSSOM, and the Render Tree

Brooke Yalof
5 min read · Jan 24, 2019

Understanding the request-response cycle is one thing, but if you are interested in optimizing the speed at which your website is displayed, it is important to take a look at what is happening under the hood when your browser receives an HTML file.

Constructing the DOM

Process of parsing an HTML file

When a browser receives the HTML markup that contains the content of a web page, it converts those bytes into a Document Object Model. The Document Object Model (DOM) is a tree structure that specifies how all of the elements on a page relate to one another.

The construction of the DOM takes place in several stages. When the browser makes an initial request to a URL, it reads the raw bytes of HTML off the network and converts them to characters, based on the file's encoding. Then, following the HTML standard, it parses this string of characters into meaningful tokens (for example html, body and div). Each token has its own set of rules, which the browser applies as it parses the HTML.
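The bytes-to-characters-to-tokens steps can be sketched with Python's standard html.parser module. This is only an illustration of the idea, not how a browser engine is implemented; the TokenCollector class, the tokenize helper, and the token labels are names made up for this example.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Records start tags, end tags, and text as a flat list of tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("StartTag", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("EndTag", tag))

    def handle_data(self, data):
        # Ignore whitespace-only text between tags.
        if data.strip():
            self.tokens.append(("Character", data.strip()))


def tokenize(raw_bytes: bytes, encoding: str = "utf-8"):
    # Step 1: bytes -> characters, using the document's encoding.
    characters = raw_bytes.decode(encoding)
    # Step 2: characters -> tokens.
    collector = TokenCollector()
    collector.feed(characters)
    collector.close()
    return collector.tokens


tokens = tokenize(b"<html><body><div>Hello</div></body></html>")
for token in tokens:
    print(token)
```

Each printed pair is one token: three start tags, the text, then three end tags, in document order.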

After the characters have been tokenized, the tokens are lexed (read: converted into nodes). These nodes are objects that define each element's properties and rules. With this information, the DOM can be constructed.

Because the HTML markup defines relationships between the different tags, the created objects are linked in a tree structure, where the document is the root node, with many child nodes.
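To make the tree-building step concrete, here is a minimal sketch that links nodes into a tree with the document as the root. It assumes well-formed markup with no void elements such as br or img, which a real parser handles separately; Node, TreeBuilder, build_tree, and outline are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.document = Node("document")  # root node of the tree
        self.open_elements = [self.document]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        # The new node is a child of whichever element is currently open.
        self.open_elements[-1].children.append(node)
        self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Close the current element only if the end tag matches it.
        if len(self.open_elements) > 1 and self.open_elements[-1].name == tag:
            self.open_elements.pop()

    def handle_data(self, data):
        if data.strip():
            self.open_elements[-1].children.append(Node("#text", data.strip()))


def build_tree(markup: str) -> Node:
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.document


def outline(node, depth=0):
    """Returns the tree as indented lines, one per node."""
    label = f'"{node.text}"' if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


document = build_tree(
    "<html><head><title>Hi</title></head>"
    "<body><p>Hello <span>world</span></p></body></html>"
)
print("\n".join(outline(document)))
```

The indentation of the printed outline mirrors the parent-child relationships that the markup defines.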

The DOM captures the tree structure and content of the web page; however, it does not provide any information about how the page should be rendered. That's where the CSS Object Model, or CSSOM, comes in.

Constructing the CSSOM

In constructing the DOM, the browser reads the head tag before the contents of the body tag. Inside the head tag, there’s usually a link to a stylesheet, which will define rules about how to render the page.

By default, CSS is render-blocking. This means that the browser will request the CSS resource and build the CSSOM from it before it renders anything to the screen. This is important: if CSS weren't render-blocking, nodes would be painted before the browser had finished fetching and processing the stylesheet, and all of our web pages would first appear as bare, unstyled content.

This is called a Flash of Unstyled Content. Every page would look like that until the CSSOM finished constructing in the background.

The process of creating the CSSOM is very similar to creating the DOM: the content is parsed from bytes to characters, to tokens, to nodes, and finally into the CSSOM. The tree structure constructed here helps determine exactly which styles will be applied to each node. The browser starts with the most general rule that applies to a node and then recursively refines the computed styles by applying more specific rules as it traverses the CSSOM.
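The "general rule first, then refine" idea can be sketched in a few lines. This is a deliberately simplified model: it assumes only tag-name selectors and treats every property as inherited, neither of which holds for real CSS. The rules dictionary and the compute_style function are hypothetical.

```python
def compute_style(path, rules):
    """Computes the style for the last tag in `path`.

    `path` lists tag names from the outermost ancestor to the target node.
    Styles from ancestors are applied first, so more specific rules
    further down the path override them.
    """
    computed = {}
    for tag in path:
        computed.update(rules.get(tag, {}))
    return computed


# Hypothetical stylesheet, keyed by tag name.
rules = {
    "body": {"font-size": "16px", "color": "black"},
    "p": {"font-weight": "bold"},
    "span": {"color": "red"},
}

span_style = compute_style(["body", "p", "span"], rules)
print(span_style)
```

Here the span keeps the font-size it picked up from body and the font-weight from p, while its own color rule overrides the one set on body.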

Bringing it all Together: The Render Tree

The render tree is a combination of the DOM and the CSSOM. It captures every visible node on the page along with its computed styles, which is the information the browser needs to paint pixels to the screen.

How, exactly, does the render tree get created?

The browser starts at the root of the DOM tree and traverses each of the visible nodes. Visible is the key word here: elements that have a display property set to none, and elements that appear inside the head, will not be part of the render tree.

Then, for each of these visible nodes, the browser looks up the matching rules in the CSSOM and applies them.

Finally, the visible nodes are emitted with their content and computed styles, ready to be laid out and painted to the page.
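The traversal described above might look roughly like the sketch below. It assumes each DOM node already carries its computed style, and the DomNode and RenderNode classes, the NON_VISUAL_TAGS set, and build_render_tree are all invented for this example rather than taken from any browser.

```python
from dataclasses import dataclass, field


@dataclass
class DomNode:
    tag: str
    style: dict = field(default_factory=dict)  # assumed: already-computed styles
    children: list = field(default_factory=list)


@dataclass
class RenderNode:
    tag: str
    style: dict
    children: list


# Tags that never produce visible output (a partial, illustrative list).
NON_VISUAL_TAGS = {"head", "script", "meta", "link", "title"}


def build_render_tree(node):
    """Returns a RenderNode for a visible node, or None if it is skipped."""
    if node.tag in NON_VISUAL_TAGS or node.style.get("display") == "none":
        return None  # the node and all of its descendants are left out
    rendered = (build_render_tree(child) for child in node.children)
    visible_children = [child for child in rendered if child is not None]
    return RenderNode(node.tag, dict(node.style), visible_children)


dom = DomNode("html", children=[
    DomNode("head", children=[DomNode("title")]),
    DomNode("body", {"font-size": "16px"}, [
        DomNode("p", {"font-weight": "bold"}),
        DomNode("div", {"display": "none"}, [DomNode("span")]),
    ]),
])

render_tree = build_render_tree(dom)
body = render_tree.children[0]
print([child.tag for child in body.children])
```

Note that the span never reaches the render tree even though it has no display rule of its own, because its parent div is hidden.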

The Box Model

There are many types of CSS rules that will be a part of the CSSOM. One such type that is central to how the browser lays out the render tree is positioning. The positioning of elements on a page has to do with the Box Model, which captures the exact size and position of each element on the page.

Every element in the render tree has its own box. The innermost area of the box is its content. Around that, you can apply padding, then a border, then a margin. The padding determines the space between an element's content and its border, whereas the margin determines the space between an element's border and the next-closest element.
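As a quick worked example of how these layers add up, the sketch below computes the horizontal space a box takes. It assumes the same padding, border, and margin on every side and the default content-box sizing; the Box class and the pixel values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    content_width: float
    padding: float = 0.0  # space between the content and the border
    border: float = 0.0   # thickness of the border itself
    margin: float = 0.0   # space between the border and neighboring elements

    def border_box_width(self) -> float:
        # Content plus padding and border on both the left and right sides.
        return self.content_width + 2 * (self.padding + self.border)

    def margin_box_width(self) -> float:
        # Everything above plus the margin on both sides.
        return self.border_box_width() + 2 * self.margin


box = Box(content_width=200, padding=10, border=2, margin=20)
print(box.border_box_width())  # 200 + 2 * (10 + 2) = 224
print(box.margin_box_width())  # 224 + 2 * 20 = 264
```

So an element declared 200px wide ends up occupying 264px of horizontal space once padding, border, and margin are counted.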

Once this positioning has taken effect, the browser then paints this layout onto a web page, which is what you see when you hit a URL!

Optimization

How long it takes to construct the render tree depends on the size of the HTML file that needs to be rendered, as well as the applied styles and the device that the site is being loaded on. The larger the document and the more complicated the styles, the longer it'll take to render to the screen.

In order to optimize the load time of a web page, we need to minimize what's called the critical path. The critical path is made up of all of the critical resources, which are the resources that block the initial render of the page. There are three general ways to shorten it:

  1. Minimize the number of critical resources: eliminate them, defer them, or mark them as async.
  2. Reduce the number of critical bytes, and the number of times that the screen needs to be repainted (on scroll, for example).
  3. Download the remaining critical resources as early as possible, in order to shorten the critical path length.
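The first two points can be illustrated with a small bookkeeping sketch that counts critical resources and critical bytes for a page. The Resource class, the file names, and the byte sizes are all hypothetical; in practice you would read these numbers from your browser's developer tools.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    url: str
    size_bytes: int
    blocks_render: bool  # True if the page cannot render without it


def critical_path_summary(resources):
    """Counts the render-blocking resources and their total size."""
    critical = [r for r in resources if r.blocks_render]
    return {
        "critical_resources": len(critical),
        "critical_bytes": sum(r.size_bytes for r in critical),
    }


# Hypothetical page where the script blocks rendering.
before = [
    Resource("index.html", 5_000, True),
    Resource("style.css", 9_000, True),
    Resource("app.js", 20_000, True),
]

# Same page with the script marked async, so it no longer blocks rendering.
after = [
    Resource("index.html", 5_000, True),
    Resource("style.css", 9_000, True),
    Resource("app.js", 20_000, False),
]

print(critical_path_summary(before))
print(critical_path_summary(after))
```

Under these made-up numbers, taking the script off the critical path drops the page from three critical resources and 34,000 critical bytes to two resources and 14,000 bytes.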

There are plenty of ways to optimize critical bytes, from being selective about what CSS styles you apply (box-shadow can be pretty expensive) to correctly placing your script tags — and I’ll dive deeper into this in my next post :)

Sources:

Google’s Web Fundamentals by Ilya Grigorik
