Introduction

In the previous HTML DOM — Introduction chapter, we got a detailed introduction to the Document Object Model in JavaScript, covering its capabilities, its significance and, most interestingly, its history since 1995.

Now that we know what exactly the DOM is, in this chapter we'll start to explore how to work with it. We'll cover the DOM tree, which is a logical representation of the relationships between all the elements in the document; the most common interfaces in the DOM API; the document object; the getElementById() element-selection method; and much more.

By the end of this chapter, we'll have a solid grounding in how the DOM abstracts a document into a system of objects through which we can easily access any given element and, later on, even configure its characteristics.

The DOM tree

Recall from the last HTML DOM — Introduction chapter that the DOM is simply an object-based model of an HTML document.

Typically, this model is depicted graphically in the form of a tree, following from the hierarchical nature of HTML. This tree representation is best understood with the help of an example.

Consider the following HTML document:

<!DOCTYPE html>
<html>
<head>
   <meta charset="utf-8">
   <title>Learning the DOM</title>
</head>
<body>
   <!--A simple comment-->
   <h1>Learning the <b>DOM</b></h1>
   <p>The DOM is simple.</p>
</body>
</html>

It is merely a piece of text; that's it.

If we now convert this into a DOM tree, we'd get the following:

A simple DOM tree representation.

It's now time to take note of some important terms, as detailed below.

Tree terminology

Each of the rectangles drawn here is known as a node, or a vertex (from graph terminology). The topmost node, where the tree begins, is called the root node. In the case above, the root is the whole HTML document.

The nodes on the next level, connected to the root node by edges, are children of the root node. That's simply because they are the first things within the document. Conversely, the document node is called the parent of the <html> node.

Nodes with the same parent are siblings. Hence, the <html> node and the doctype declaration node are siblings of each other. Simple.

Moving on, the children of <html> are <head> and <body>. The <body> node has three children of its own: the comment node, <h1> and <p>. The <p> node has just one child.
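To make the parent, child and sibling vocabulary concrete, here's a minimal sketch that models a cut-down version of the simple tree above with plain JavaScript objects. The makeNode(), siblingsOf() and names() helpers and the name, parent and children fields are hypothetical, chosen only for illustration; they are not part of the DOM API (the real DOM exposes similar links through properties such as parentNode and childNodes).

```javascript
// Hypothetical node shape for illustration: a name, a parent link and a list of children.
function makeNode(name, children) {
  var node = { name: name, parent: null, children: children || [] };

  // Point each child back at its parent, so that we can also move up the tree.
  node.children.forEach(function (child) {
    child.parent = node;
  });
  return node;
}

// Siblings are the other children of the same parent.
function siblingsOf(node) {
  if (node.parent === null) {
    return []; // The root node has no parent, hence no siblings.
  }
  return node.parent.children.filter(function (other) {
    return other !== node;
  });
}

// Joins the names of a list of nodes, for printing.
function names(nodes) {
  return nodes.map(function (n) { return n.name; }).join(', ');
}

// A cut-down version of the simple tree above.
var doctype = makeNode('doctype');
var body = makeNode('body', [makeNode('h1'), makeNode('p')]);
var html = makeNode('html', [makeNode('head'), body]);
var documentNode = makeNode('document', [doctype, html]);

console.log(html.parent.name);         // document
console.log(names(html.children));     // head, body
console.log(names(siblingsOf(html)));  // doctype
```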

See how the tree data structure helps us visualize the nesting of the HTML document right away. That's the benefit of using a tree: it maps naturally onto the idea of elements nested inside other elements in HTML.

It's worth noting however that the tree shown in the figure above is a very simple one, with many abstractions. What browsers typically create is much more detailed and complex than this one.

Let's see that complicated version as well.

Below is the detailed DOM tree of the HTML code above:

A detailed DOM tree representation.

At first glance, there are two things to note:

  1. The size of the tree is much larger than before.
  2. Nodes don't just represent elements; they represent text and comments as well, in addition to the whole document and the doctype declaration.

The first question that comes to mind is: where do all these text nodes come from? Well, it's simple. Do you notice the whitespace between every pair of tags in the HTML source code above?

That whitespace gets converted into text nodes of the DOM tree by the browser engine. The literal textual content within the elements <h1>, <p> and <b> is also represented as individual text nodes.

The comment <!--A simple comment-->, in <body>, is converted into a separate comment node.

The tree shown here is surely complex, but still based on simple elementary ideas. The same notion of parents, children and siblings obviously applies here as well.

For instance, the text node 'DOM' is the child of <b>. Similarly, <title> is the parent of the text node 'Learning the DOM'. The comment node is a sibling of the <h1> element, which is a sibling of the <p> element. The <html> node has a total of 3 children. And so on and so forth.

Moving on, if we're on any element or node in the DOM tree shown above, we can navigate our way to any other node by means of tree traversal.

Tree traversal simply means going up/down and/or left/right on the tree through the edges between adjacent nodes.

Said another way, tree traversal is simply to 'move' across a tree by means of navigating across the links between nodes. This kind of activity is routinely done in JavaScript programs.
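Here's a minimal sketch of both kinds of movement on a cut-down version of the tree above, using hypothetical plain objects (name, parent, children) rather than real DOM nodes: walk() goes down and left to right in depth-first order, while pathToRoot() goes up by following parent links. Both helpers are made up for this illustration.

```javascript
// Hypothetical node shape for illustration: a name, a parent link and a list of children.
function makeNode(name, children) {
  var node = { name: name, parent: null, children: children || [] };
  node.children.forEach(function (child) {
    child.parent = node;
  });
  return node;
}

// Moving down and across: a depth-first (pre-order) walk that visits a node,
// then each of its children from left to right.
function walk(current, visit, depth) {
  depth = depth || 0;
  visit(current, depth);
  current.children.forEach(function (child) {
    walk(child, visit, depth + 1);
  });
}

// Moving up: follow parent links until we run past the root.
function pathToRoot(current) {
  var path = [];
  while (current !== null) {
    path.push(current.name);
    current = current.parent;
  }
  return path;
}

var bold = makeNode('b');
var root = makeNode('document', [
  makeNode('html', [
    makeNode('head', [makeNode('title')]),
    makeNode('body', [makeNode('h1', [bold]), makeNode('p')])
  ])
]);

// Prints each node name, indented by its depth in the tree.
walk(root, function (n, depth) {
  console.log('  '.repeat(depth) + n.name);
});

console.log(pathToRoot(bold).join(' -> ')); // b -> h1 -> body -> html -> document
```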

We'll explore tree traversal in detail in the next HTML DOM — Selecting Elements chapter.

To summarize, the DOM tree is an essential concept for every JavaScript developer to understand.

Not only does it help us appreciate the relationships between different elements on an HTML page, but it also familiarizes us with the basic operations of one of the most useful data structures in computer science, i.e. trees.

Important HTML DOM classes

By this point, we are all on the same page regarding one idea: the DOM is an API to programmatically work with the structure and content of HTML/XML documents. And if we specifically consider the HTML DOM, then it's also about working with the style of documents.

This doesn't, however, mean that the DOM is a single standalone class that exposes various properties and methods on its own. Rather, it's a collection of classes that together define a system of interaction with a given document.

In this section, we aim to discuss all those classes briefly, including the ones meant specifically for the HTML DOM.

Starting with the most obvious one, each node in the DOM tree shown above is represented by the Node class.

Furthermore, each element node is represented by Element.

As for the text and comment nodes, they are represented as Text and Comment, respectively. Both Text and Comment extend the CharacterData class which denotes textual information in a document.

Attributes have their own Attr class, although it isn't used much.

Talking about the main document:

  • Document represents the whole document — the root node of the DOM tree.
  • HTMLDocument extends Document and simply represents an HTML document.

Anything that has to be done with the DOM has to essentially originate from a Document instance. In the context of a browser, this instance is the global document object; more on that later in this chapter.

There is even a separate class to represent a lightweight version of a document — the DocumentFragment class.

The <!DOCTYPE html> declaration in an HTML document and similar declarations in XML documents are represented by the DocumentType class.

Coming to HTML documents, HTML elements are represented by the HTMLElement class. Shown below is a brief overview of some of the HTML-specific classes in the DOM API:

Class                   Purpose
HTMLElement             Represents an HTML element node.
HTMLHtmlElement         Represents the <html> element node.
HTMLHeadElement         Represents the <head> element node.
HTMLBodyElement         Represents the <body> element node.
HTMLDivElement          Represents a <div> element node.
HTMLSpanElement         Represents a <span> element node.
HTMLParagraphElement    Represents a <p> element node.
HTMLFormElement         Represents a <form> element node.
HTMLInputElement        Represents an <input> element node.
HTMLAudioElement        Represents an <audio> element node.
HTMLVideoElement        Represents a <video> element node.
...                     ...

See how there is a class for almost every individual tag in HTML? Also notice the naming: each class name is prefixed with 'HTML' to indicate right away that the class is meant for HTML documents only.

Moving on, when we see how to select a list of elements in a document, we'll come across the NodeList and HTMLCollection classes.

Both produce array-like instances, allowing bracket-notation access to each of the contained items. However, they aren't actual arrays (i.e. they don't inherit from the Array class) and hence don't have access to array methods such as indexOf(), slice(), etc.

To use array methods on such an array-like list, we can either convert the list to an array or invoke the array methods on it using the call() or apply() function methods. We'll explore all these details in the next chapter.
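Both approaches can be sketched as follows. NodeList and HTMLCollection only exist in browsers, so a plain array-like object (numbered keys plus a length property) stands in for one here; in a browser, you could instead use a real collection, such as the one returned by document.getElementsByTagName('p').

```javascript
// A plain array-like object standing in for a NodeList or HTMLCollection:
// indexed entries plus a length, but no Array methods.
var arrayLike = { 0: 'head', 1: 'body', 2: 'p', length: 3 };

console.log(arrayLike[1]);             // body (bracket notation works)
console.log(typeof arrayLike.indexOf); // undefined (no Array methods)

// Option 1: convert the list into a real array first.
var asArray = Array.from(arrayLike);
console.log(asArray.indexOf('p'));     // 2

// Option 2: borrow an Array method and invoke it on the list via call().
var firstTwo = Array.prototype.slice.call(arrayLike, 0, 2);
console.log(firstTwo.join(', '));      // head, body

// apply() works the same way, except that the arguments are passed in an array.
console.log(Array.prototype.indexOf.apply(arrayLike, ['body'])); // 1
```

Converting once is usually more convenient when several array methods are needed; borrowing via call() or apply() avoids creating a copy for a one-off operation.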

Class hierarchy

After learning about the various interfaces involved in the DOM API, the next logical step is to get familiar with their relationship with one another, i.e. the complete hierarchy of the classes.

This helps us understand the DOM API much better, and at the same time appreciate the level of abstraction possible in object-oriented programming. Each class inherits from another class, and in this way the hierarchy creates a nice abstraction of the underlying document without repeating much information.

Anyway, let's get to the real discussion.

Every node on a DOM tree is a Node instance, which inherits from the EventTarget class.

All elements, represented by Element, are nodes as well; accordingly, the Element class inherits from Node. Text and comment nodes, represented by Text and Comment, respectively, are both sequences of text that inherit from the superclass CharacterData, which in turn inherits from Node.

The main Document class is also a Node. The same goes for the DocumentType class — it also inherits from Node.

Now let's turn our attention to HTML documents.

The main document is represented by the HTMLDocument class which extends Document.

For every element in the document, we have the HTMLElement class, which extends Element. As we saw above, there is a separate class for almost every element in an HTML document, and each of these classes extends HTMLElement. (Some elements, such as <b>, are represented directly by HTMLElement itself.)

For example, <head> is represented by HTMLHeadElement, which extends HTMLElement, which extends Element.

If we were to graphically show all these relationships in one single diagram, we'd get the following.

Hierarchy of classes in the DOM.
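One way to check this hierarchy yourself is to walk an object's prototype chain and collect the constructor name at each level. The prototypeChain() helper below is a hypothetical utility built only on Object.getPrototypeOf(); the DOM part runs only where a document object exists, i.e. in a browser.

```javascript
// Walks an object's prototype chain and collects the constructor name at each level.
function prototypeChain(obj) {
  var chain = [];
  var proto = Object.getPrototypeOf(obj);
  while (proto !== null) {
    chain.push(proto.constructor.name);
    proto = Object.getPrototypeOf(proto);
  }
  return chain;
}

// Only run the DOM part where a DOM is actually available.
if (typeof document !== 'undefined') {
  console.log(prototypeChain(document.createElement('p')).join(' -> '));
  // HTMLParagraphElement -> HTMLElement -> Element -> Node -> EventTarget -> Object

  console.log(prototypeChain(document).join(' -> '));
  // HTMLDocument -> Document -> Node -> EventTarget -> Object
}
```

The same helper works on any JavaScript object, which makes it a handy way to compare the DOM's class hierarchy with ordinary class hierarchies.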

The question is: are these class hierarchies important to know?

Well, this extra knowledge clearly won't magically turn anyone into an expert programmer; however, the hierarchies definitely help in better understanding how object-oriented programs work and how they should ideally be designed.

The beautiful architecture of the DOM is a testimony to the powerful, flexible and simple nature of object-oriented design.

The DOMString interface

There is one peculiar interface mentioned in the DOM standard that we never actually get to see in JavaScript, and that often confuses developers about which class really represents strings in the language. That is DOMString.

In this section, we aim to address many of the questions and confusions related to DOMString in JavaScript.

First of all, DOMString is an interface meant to represent strings in the DOM API.

It is defined in the Web IDL standard, which the DOM standard uses to describe its interfaces, in the following way:

The DOMString type corresponds to the set of all possible sequences of code units [16 bit unsigned integer code units]. Such sequences are commonly interpreted as UTF-16 encoded strings...

Now you might be wondering: what's the purpose of a separate string type in the DOM standard? After all, JavaScript has a string type of its own.

Well, recall that the DOM isn't implemented only in JavaScript. Back in the day, multiple languages implemented it on the web alone, namely VBScript, JScript and Java. Also recall that the goal of the DOM standard is to define a platform- and language-agnostic interface.

With DOMString, the standard gets an interoperable string type that can be used across different platforms and languages.

For instance, if we wanted to implement a DOM library in Python, its native str type wouldn't match DOMString exactly: a Python string is a sequence of Unicode code points, not of UTF-16 code units. (For example, len('😀') is 1 in Python, whereas '😀'.length is 2 in JavaScript.) Instead, we'd first have to create a separate DOMString class in Python, with mappings to its native string type, and then use this class throughout the DOM API.

This would make sure that the API works the same way in Python as it does in JavaScript in the browser.

Said another way, DOMString helps implementers of the DOM standard to achieve a consistent string interface across various platforms and languages.

If there were no DOMString type, each language implementing the DOM API would have used its own string type in the various properties and methods of the API, leading to inconsistencies between implementations and, consequently, possible bugs.

Having said that, we might now ask: what's the difference between DOMString and String in JavaScript? Well, this deserves a separate snippet.

Difference between DOMString and String in JavaScript

As stated in the spec, DOMString is a sequence of 16-bit code units, representing a UTF-16 string. If you recall from the JavaScript String — Unicode chapter, a JavaScript string, represented by the String class, is exactly the same thing.

There is just no need to implement a separate DOMString type in JavaScript for strings in the DOM API — the String class is functionally equivalent to DOMString.
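A quick way to see that a JavaScript string really is a sequence of 16-bit code units, just as DOMString is described, is to inspect a character outside the Basic Multilingual Plane, such as the emoji U+1F600. In UTF-16, it takes two code units (a surrogate pair):

```javascript
// U+1F600 (grinning face) doesn't fit in 16 bits, so UTF-16 stores it
// as two code units: a high surrogate followed by a low surrogate.
var emoji = '\u{1F600}';

console.log(emoji.length);                      // 2 (counts 16-bit code units)
console.log(emoji.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(emoji.charCodeAt(1).toString(16));  // de00 (low surrogate)
console.log(emoji.codePointAt(0).toString(16)); // 1f600 (the full code point)
console.log(Array.from(emoji).length);          // 1 (iteration goes by code point)
```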

And so, there is almost no difference between DOMString and String; DOMString maps directly to String.

This is also the reason why there is literally no such class as DOMString in JavaScript — simply because there's no need for it.

For example, if we go to the console and type in DOMString, we get the following error:

DOMString
Uncaught ReferenceError: DOMString is not defined at <anonymous>:1:1

This confirms that nothing such as DOMString exists in the global context in JavaScript.

The document object

It won't be wrong to say that the most used object in JavaScript programs working with the DOM is the document object. It's the cornerstone of the DOM API — without it, nothing is possible.

The document object exists in the global context as a property of window, which means that we could either access it as window.document or simply as document.

For an HTML document, which is typically the case, it's an HTMLDocument instance inheriting from Document, or else it is simply a Document instance.

It's the root node in the DOM tree of every single document — everything else in the tree originates from it.

Many classes in the DOM API are meaningless without an associated document. Consequently, their instances are returned by factory methods on the document object, not by directly instantiating objects out of those classes.

For instance, creating a new element node for an HTML document is meaningless without a document. So, instead of allowing something like new HTMLElement(), the DOM API asks us to write document.createElement().
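The idea behind such factory methods can be sketched with two hypothetical classes, ToyDocument and ToyElement, which are not part of any real API: the document creates the element and records itself as its owner, so an element never exists without a document. In the real DOM, the corresponding link is the element's ownerDocument property, and calling new HTMLElement() directly throws a TypeError in browsers.

```javascript
// Hypothetical classes, for illustration only.
class ToyElement {
  constructor(tagName, ownerDocument) {
    this.tagName = tagName.toUpperCase(); // HTML tag names are reported in uppercase.
    this.ownerDocument = ownerDocument;   // Every element knows which document it belongs to.
  }
}

class ToyDocument {
  // The factory method: the intended way to obtain a ToyElement.
  createElement(tagName) {
    return new ToyElement(tagName, this);
  }
}

var doc = new ToyDocument();
var heading = doc.createElement('h1');

console.log(heading.tagName);               // H1
console.log(heading.ownerDocument === doc); // true
```

Unlike the real DOM, this sketch doesn't actually forbid new ToyElement(); it only shows the shape of the pattern, where the document is the one handing out elements tied to itself.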

Throughout the rest of this course, almost every single code snippet that we'll see will use the document object. It's that important.

A quick example

We'll end this chapter with a small and simple example of how to work with the DOM API.

The idea is that we are given an HTML document and have to change the content of the <h1> tag to 'Hello, from JavaScript!'.

Here's the document:

<!DOCTYPE html>
<html>
<head>
   <meta charset="utf-8">
   <title>Working with the DOM</title>
</head>
<body>
   <h1 id="heading">Learning the DOM</h1>
</body>
</html>

Notice the id attribute given to <h1> here. This will be used later on when selecting the element from JavaScript.

Now let's come to the main part, i.e. the JavaScript.

The getElementById() method of the document object takes a string argument representing an ID in the HTML document and returns the first element with that very ID, or null if no such element exists. The returned element is an HTMLElement instance that has a whole lot of properties and methods available.

One of them is innerHTML. It represents all the HTML content that's inside that element, in the form of a string.

Technically, innerHTML is an accessor property with a getter and setter function.

  • When accessed in a 'get' context, it returns a string containing all the content within the given element, including HTML tags.
  • When accessed in a 'set' context, it parses the given string as HTML source code, and once parsed, puts the content inside the element in the DOM tree.
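An accessor property is simply a property backed by a get and a set function. The hypothetical Box class below mimics the shape of innerHTML with a plain string (no HTML parsing involved), and the hypothetical isAccessor() helper checks whether a property on a given prototype is an accessor. In current browsers, isAccessor(Element.prototype, 'innerHTML') should return true, since innerHTML is defined on Element.prototype.

```javascript
// Returns true if `name` is defined directly on `proto` as an accessor (getter/setter) property.
function isAccessor(proto, name) {
  var descriptor = Object.getOwnPropertyDescriptor(proto, name);
  return descriptor !== undefined &&
    (typeof descriptor.get === 'function' || typeof descriptor.set === 'function');
}

// A hypothetical stand-in for an element, with an innerHTML-like accessor.
class Box {
  constructor() {
    this._content = '';
  }

  get content() {
    // 'get' context: hand back the stored string.
    return this._content;
  }

  set content(value) {
    // 'set' context: the real innerHTML setter would parse `value` as HTML here;
    // this sketch just stores it as a string.
    this._content = String(value);
  }
}

var box = new Box();
box.content = 'Hello, <b>DOM</b>!';                // Invokes the setter.
console.log(box.content);                          // Hello, <b>DOM</b>!
console.log(isAccessor(Box.prototype, 'content')); // true

// Only check the real innerHTML where a DOM is available.
if (typeof Element !== 'undefined') {
  console.log(isAccessor(Element.prototype, 'innerHTML'));
}
```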

In our case, since we need to change the content of <h1>, we'll first need to select it based on its id attribute and then set its innerHTML to 'Hello, from JavaScript!'.

This is accomplished below:

// First, select the element.
var h1Element = document.getElementById('heading');

// Next, change its content.
h1Element.innerHTML = 'Hello, from JavaScript!';

One very important thing to take care of while working with the DOM is the placement of the script in the HTML document.

Ideally, we should place scripts that access the DOM at the end of <body>, unless they are deferred (using the defer attribute) or access the DOM only after the page, or just the DOM tree, has completely loaded.
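For the last option, one common approach is to wait for the DOMContentLoaded event, which fires on document once the DOM tree has been completely built. The sketch below also checks for null, since getElementById() returns null when no matching element exists (yet). The updateHeading() function is a hypothetical helper; it takes the document as a parameter only so that it can be tried out with a stand-in object outside a browser. (As for defer, it only has an effect on external scripts loaded via the src attribute.)

```javascript
// Hypothetical helper: sets the content of the element with the given ID,
// but only if that element is actually present in the DOM tree.
function updateHeading(doc, id, text) {
  var element = doc.getElementById(id);
  if (element === null) {
    return false; // Not parsed yet, or no such element at all.
  }
  element.innerHTML = text;
  return true;
}

// In a browser, run the update only once the whole DOM tree has been built.
// This works even if the script is placed inside <head>.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    updateHeading(document, 'heading', 'Hello, from JavaScript!');
  });
}
```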

Here's the previous HTML document with the script added:

<!DOCTYPE html>
<html>
<head>
   <meta charset="utf-8">
   <title>Working with the DOM</title>
</head>
<body>
   <h1 id="heading">Learning the DOM</h1>

   <script>
      // First, select the element.
      var h1Element = document.getElementById('heading');

      // Next, change its content.
      h1Element.innerHTML = 'Hello, from JavaScript!';
   </script>
</body>
</html>

If we load up this HTML page, the script gets to perform its action, just as expected.

If we place the script in the <head> element or, in general, before the <h1> element, we'd get an error, at least in this case:

<!DOCTYPE html>
<html>
<head>
   <meta charset="utf-8">
   <title>Working with the DOM</title>

   <script>
      // First, select the element.
      var h1Element = document.getElementById('heading');

      // Next, change its content.
      h1Element.innerHTML = 'Hello, from JavaScript!';
   </script>
</head>
<body>
   <h1 id="heading">Learning the DOM</h1>
</body>
</html>
Uncaught TypeError: Cannot set properties of null (setting 'innerHTML') at dom-simple-example-2:11:24

Now the question is: what's the reason for this error?

Well, it comes down to the way browsers parse HTML, execute JavaScript, and build the corresponding DOM tree. The discussion below explains this in detail.

How do browsers parse HTML and execute the scripts therein?

When the browser receives an HTML document, it reads and parses it, i.e. works out the elements used, their nesting, the attributes they have, and whether there are any errors in the source code. This is the first stage.

Using the parsed document, the browser then builds a DOM tree. This is the second stage. (In practice, browsers interleave the two stages: the tree is built incrementally as the source code is parsed, often before the whole document has even been received.)

The way this DOM tree is built is also pretty simple. The parsed document is read starting with the very first thing, i.e. the document itself, and every single thing therein is added to the DOM tree one by one, in the same order in which it appears in the document's source code.

For instance, first the document's node (which is the root node) is added to the DOM tree, followed by the addition of the <!DOCTYPE html> declaration node, followed by the <html> node, followed by the <head> node, followed by everything inside <head>, followed by the <body> node, and so on and so forth.

Now if we add JavaScript into the equation, things get a little bit interesting.

During this DOM-tree-building stage, if the browser encounters a <script>, it pauses the construction of the DOM tree right at that point to handle the script, switching on the JavaScript engine to process it.

The JavaScript engine reads the script, understands what's happening in there, and then finally executes it. While a script executes, the DOM tree is always available to it by means of the document object. However, this DOM tree is limited to whatever has been built up to the point of the script's execution.

For instance, if the <script> appears inside <head> immediately after <title>, as in the example above, the DOM tree built up to that point would contain just the document node, followed by the <!DOCTYPE html> declaration node, the <html> node, the <head> node, the <meta> and <title> nodes (along with the text nodes around and inside them), and finally the current <script> node.

There might be thousands of things following the current <script> element in the HTML source code, but they won't be included in the DOM tree while the browser is executing the script.

Once the script completes, the browser resumes construction of the DOM tree right from where it left off.
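This pause-and-resume behavior can be simulated with a toy model. The tokens array and the buildTree() function below are hypothetical simplifications (a real browser tokenizes the markup and builds a proper tree, not a flat list); the point is only that a script runs against the nodes added so far, and that construction then continues from where it stopped.

```javascript
// Hypothetical token list, in source-code order. Scripts carry a `run` function
// that receives the tree built so far.
var tokens = [
  { name: 'html' },
  { name: 'head' },
  { name: 'title' },
  {
    name: 'script',
    run: function (tree) {
      return tree.slice(); // Record a snapshot of what this script can see.
    }
  },
  { name: 'body' },
  { name: 'h1' }
];

function buildTree(tokenList) {
  var tree = []; // A flat list of node names stands in for the DOM tree.
  var seenByScripts = [];

  tokenList.forEach(function (token) {
    tree.push(token.name); // A <script> node is added to the tree before it runs.
    if (typeof token.run === 'function') {
      // Construction is effectively paused here: the next token isn't
      // processed until the script returns.
      seenByScripts.push(token.run(tree));
    }
  });

  return { tree: tree, seenByScripts: seenByScripts };
}

var result = buildTree(tokens);
console.log(result.seenByScripts[0].join(', ')); // html, head, title, script
console.log(result.tree.join(', '));             // html, head, title, script, body, h1
```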

But what's the point of this pause behavior while building the DOM tree?

Well, if we think about this for a while, the answer isn't that complicated. When a <script> is executed, it might go on and modify the DOM tree available at that point, or even inject new markup into the document being parsed using document.write().

The benefit of pausing the construction of the tree while the script executes is that any changes made by the script to the DOM tree would eventually be there once the tree resumes its construction.

If there were no pause behavior when encountering a <script> element, the whole DOM tree would have been available to every single <script> in the document. The problem with this approach is that if a script changed the DOM tree, those changes wouldn't be observed by any subsequent elements.

Wasn't this simple?

In the next HTML DOM — Selecting Elements chapter, we'll learn about more than a handful of ways to select a particular element or a collection of elements from an HTML document and then process it in a number of ways.