The Science Behind the DOM

How Web Browsers render HTML

Edwin Walela
Webtips
5 min read · Jun 9, 2020


Photo by Jeremy Bishop from Pexels

As web developers, we work with browsers day in, day out, constantly hitting refresh to see the result of the changes we’ve made to our stylesheets or HTML files. Rarely do we pause to ask ourselves:

“How is the browser transforming my HTML tags and creating such a beautiful UI?”

JavaScript arrived in Netscape Navigator in 1995, and Microsoft followed with JScript in Internet Explorer in 1996. Both browsers shipped with built-in scripting engines that could execute scripts embedded in a page. JavaScript and JScript gave developers the ability to access and manipulate webpages, marking the birth of interactive websites.

A few years later, alongside the standardization of the scripting language itself as ECMAScript, the W3C published the DOM specification. It defined a uniform way to access and manipulate both HTML and XML documents, regardless of the language being used.

Interactivity meant that the webpage had to be dynamic. User actions could change the appearance of elements on the screen. Clicking a button could display an alert message and form input could be validated before submission.

To achieve this, browsers needed an efficient way of accessing elements on a webpage.

Inheritance vs Interfaces

To better understand the science behind the DOM and its construction, let’s remind ourselves what inheritance and interfaces are.

An interface is a data type that specifies the functions and attributes that any class or object implementing it should have.

Source: W3Schools

When a class ‘implements’ an interface, it must define the functions specified in the interface. When a class ‘extends’ another class, however, the methods and attributes of the inherited class (parent class) are included in the subclass (child class).

Inheritance in Java
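
The same distinction can be sketched in JavaScript, the language the rest of this article deals with. JavaScript has no ‘implements’ keyword, so the interface below is only a list of method names that we check by hand. Drawable, Shape, Square and implementsInterface are hypothetical names invented for this illustration.

```javascript
// A hand-rolled "interface": just the method names an implementer must define.
const Drawable = ["draw", "area"];

// Returns true when obj has a function for every name in the interface.
function implementsInterface(obj, iface) {
  return iface.every((name) => typeof obj[name] === "function");
}

// Parent class: its methods are inherited by every subclass.
class Shape {
  constructor(name) {
    this.name = name;
  }
  describe() {
    return `I am a ${this.name}`;
  }
}

// Child class: gets describe() from Shape and defines the Drawable methods itself.
class Square extends Shape {
  constructor(side) {
    super("square");
    this.side = side;
  }
  draw() {
    return `[${this.side}x${this.side}]`;
  }
  area() {
    return this.side * this.side;
  }
}

const square = new Square(3);
console.log(square.describe());                                 // inherited from Shape
console.log(implementsInterface(square, Drawable));             // Square defines draw and area
console.log(implementsInterface(new Shape("blob"), Drawable));  // Shape defines neither
```

Square never defines describe, yet it can call it: that is inheritance. The Drawable check only asks whether the promised methods exist, which is the role an interface plays.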

Back to the browser.

Once an HTML file is sent back from a server, the browser reads the file and creates objects, accessible from JavaScript, to represent each HTML element.

HTML elements represented as JavaScript Objects

So what is DOM?

The Document Object Model is the representation of the objects that describe the structure and content of an HTML document.

Put simply — an object-oriented representation of a webpage.

The DOM defines interfaces for the objects which make up the HTML elements. DOM is not a language but a specification that describes how the browser should create objects to represent HTML elements and the functions that should be made available to developers to manipulate these elements.

It’s through the DOM API that we are provided with methods to add, remove and change the appearance of elements on a webpage. Since the elements are objects, we can call various methods on them which allow us to manipulate them.

JavaScript’s DOM API Implementation

The interfaces define the functions and attributes we usually use to access and manipulate webpages. For instance, the function addEventListener is defined in the EventTarget interface (which Node, and therefore every element, inherits from), the style attribute used for inline CSS is defined in the HTMLElement interface, and the method attribute we use to state a form’s HTTP method is defined in the HTMLFormElement interface.

Interfaces defined by the DOM API
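
As a rough sketch, the function below touches one member from each of three interfaces on a single form object: method from HTMLFormElement, style from HTMLElement, and addEventListener, which elements inherit from EventTarget. The function name prepareForm, its onSubmit callback and the border value are assumptions made up for this example.

```javascript
// Configures a form using members that come from three different interfaces.
// `form` is expected to be an HTMLFormElement (or anything shaped like one).
function prepareForm(form, onSubmit) {
  form.method = "post";                 // HTMLFormElement: the form's HTTP method
  form.style.border = "1px solid gray"; // HTMLElement: inline CSS
  form.addEventListener("submit", (event) => { // EventTarget: event handling
    event.preventDefault();             // stop the browser's default submission
    onSubmit(form);
  });
  return form;
}

// In a browser this could be called as:
//   prepareForm(document.querySelector("form"), (f) => console.log(f.method));
```

All three members are reached through the same object, which is what the chain of interfaces makes possible.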

Let’s look at the structure of the object a browser creates to represent a form element on a webpage:

The object created to represent a form element implements the HTMLFormElement interface, which inherits from the HTMLElement interface, which inherits from the Element interface, which inherits from the Node interface. (Node in turn inherits from EventTarget, which supplies event handling.) The Node interface is the key base interface that all elements inherit from. Hence all elements are considered to be Nodes.

HTML Form element’s interfaces
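
One way to see this chain for yourself is to walk an object’s prototype chain. The helper below is a minimal sketch; prototypeChain and the Sketch* classes are hypothetical names, and the stand-in classes exist only so the example also runs outside a browser.

```javascript
// Walks an object's prototype chain and collects the constructor names.
function prototypeChain(obj) {
  const names = [];
  let proto = Object.getPrototypeOf(obj);
  while (proto !== null) {
    names.push(proto.constructor.name);
    proto = Object.getPrototypeOf(proto);
  }
  return names;
}

// Stand-in classes that only mimic the shape of the real chain.
// They are not the DOM classes themselves.
class SketchNode {}
class SketchElement extends SketchNode {}
class SketchHTMLElement extends SketchElement {}
class SketchHTMLFormElement extends SketchHTMLElement {}

console.log(prototypeChain(new SketchHTMLFormElement()).join(" -> "));
// SketchHTMLFormElement -> SketchHTMLElement -> SketchElement -> SketchNode -> Object

// In a browser console, try: prototypeChain(document.createElement("form"))
```

Run against a real form element in a browser, the same helper should list HTMLFormElement, HTMLElement, Element, Node, EventTarget and finally Object.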

That’s a lot of inheritance.

Inheritance enforces uniformity of the methods and attributes elements possess. For instance, the id and className properties of HTML elements are defined in the Element interface. That's why all elements have the id and className properties: they all inherit from this interface.

The functions we call to select and manipulate elements on a webpage, such as getElementById, addEventListener and many more, are defined by the interfaces provided by the DOM. They are available on the document and on elements because those objects implement the DOM interfaces mentioned above.

So when an HTML file is sent from the server, the browser uses its built-in implementation of the DOM (typically native code that is exposed to JavaScript) to create a Node for each element in the HTML document.

Raw HTML Document

The heading element (h1) will be represented by creating an object which implements the HTMLHeadingElement interface. The form element will be represented by creating an object which implements the HTMLFormElement interface. The two input elements will be represented by creating two separate objects, both implementing the HTMLInputElement interface, and so on.
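
The mapping described above can be pictured as a lookup from tag name to interface name. The table below is a hand-written subset covering only the three tags mentioned here, and INTERFACE_BY_TAG and interfaceNameFor are hypothetical names; a real browser knows many more tags and does not consult a table like this one.

```javascript
// Hand-written subset of the tag-to-interface mapping; real browsers know many more.
const INTERFACE_BY_TAG = new Map([
  ["h1", "HTMLHeadingElement"],
  ["form", "HTMLFormElement"],
  ["input", "HTMLInputElement"],
]);

// Returns the interface name for a tag, or null if the tag is not in this sketch.
function interfaceNameFor(tagName) {
  return INTERFACE_BY_TAG.get(tagName.toLowerCase()) ?? null;
}

// One object per element, even when several elements share an interface.
const tags = ["h1", "form", "input", "input"];
console.log(tags.map((tag) => `${tag} -> ${interfaceNameFor(tag)}`));

// In a browser, the real check is:
//   document.createElement("h1") instanceof HTMLHeadingElement
```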

The browser then needs to replicate the nested structure of the HTML document using the Nodes created.

It achieves this by creating a tree-structure called the DOM Tree.

The DOM Tree
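
To make the tree idea concrete, here is a deliberately tiny model of it in plain JavaScript. TreeNode, findById and outline are hypothetical names for this sketch, and the depth-first search is only an illustration of why a tree makes lookups possible; real engines may use more optimized lookups.

```javascript
// A tiny stand-in for a DOM node: a tag, an optional id, a parent and children.
class TreeNode {
  constructor(tag, id = null) {
    this.tag = tag;
    this.id = id;
    this.parent = null;
    this.children = [];
  }
  // Attaches child under this node and returns it, like appendChild in the DOM.
  appendChild(child) {
    child.parent = this;
    this.children.push(child);
    return child;
  }
}

// Depth-first search for the first node with a matching id.
function findById(node, id) {
  if (node.id === id) return node;
  for (const child of node.children) {
    const found = findById(child, id);
    if (found) return found;
  }
  return null;
}

// Renders the tree as indented lines, one node per line.
function outline(node, depth = 0) {
  const line = "  ".repeat(depth) + node.tag;
  return [line, ...node.children.flatMap((c) => outline(c, depth + 1))];
}

// Mirror a small document: a heading and a form with two inputs.
const html = new TreeNode("html");
const body = html.appendChild(new TreeNode("body"));
body.appendChild(new TreeNode("h1"));
const form = body.appendChild(new TreeNode("form", "signup"));
form.appendChild(new TreeNode("input"));
form.appendChild(new TreeNode("input"));

console.log(outline(html).join("\n"));
console.log(findById(html, "signup") === form); // true
```

The nesting of the HTML becomes parent and child links, so any element can be reached by walking down from the root.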

The entire DOM Tree is exposed to scripts through the document object, which sits at the root of the tree.

By creating the DOM Tree, the browser now has an efficient method for selecting, adding, removing and changing the contents of elements on a webpage.

Adding an element to the DOM Tree
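
In code, adding an element usually takes two steps: create the node, then attach it to a parent that is already in the tree. The sketch below uses the standard createElement, textContent and appendChild members; the wrapper name appendParagraph and the choice of a paragraph element are assumptions for illustration.

```javascript
// Creates a <p> with the given text and attaches it to the end of <body>.
// `doc` is expected to be a Document, such as the global `document` in a browser.
function appendParagraph(doc, text) {
  const paragraph = doc.createElement("p"); // new node, not yet in the tree
  paragraph.textContent = text;
  doc.body.appendChild(paragraph);          // now part of the DOM Tree
  return paragraph;
}

// In a browser: appendParagraph(document, "Hello, DOM!");
```

Passing the document in as a parameter keeps the function easy to try against a stand-in object when no browser is available.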

The introduction of the DOM redefined how websites are built, paving the way for the development of interactive websites. Browsers expose the DOM API to JavaScript, which is why it might seem that the DOM API is part of JavaScript itself.

With Chrome, Firefox and other browsers exposing the DOM to JavaScript, we can have the following approximation equation:

API ≈ DOM + Javascript

The API is the set of functions we call to manipulate the webpage, the DOM is the interface that specifies the functions and attributes that should be made available to developers, and JavaScript is the scripting language through which the webpage is manipulated.

However, the DOM itself is language agnostic. It only defines the functions that the objects created by a browser should implement, whether the document is HTML or XML.

Source: MDN Web Docs

Conclusion

The DOM Tree construction is only the first step a browser takes when loading a webpage. CSS also has to be loaded and parsed into a tree of its own (the CSSOM). The two are then combined into a Render Tree, and only after that does the browser lay out and paint the elements on the screen.

So many trees.

When working with front-end frameworks and libraries, most of this work is abstracted away. For instance, React maintains its own DOM, the virtual DOM, on top of the browser’s DOM, providing a simpler and often more efficient approach to manipulating webpages.

You tell React what state you want the UI to be in, and it makes sure the DOM matches that state. This abstracts out the attribute manipulation, event handling, and manual DOM updating that you would otherwise have to use to build your app. — React Docs

Want to learn more about the browser rendering cycle?

Thanks for taking the time to read this. I appreciate it.
