Introduction

When JavaScript was launched in Netscape Navigator 2.0, it brought with it a remarkable idea: being able to access certain parts of the HTML document programmatically.

This idea is what we today call the DOM, although today's DOM is far larger, more sophisticated and more complex than the one in Navigator.

Every frontend developer, literally every single one, has to know about the DOM. In fact, JavaScript developers spend most of their time working with the DOM in one way or another; it's that common a piece of technology.

In this chapter, we'll understand what exactly the DOM is and what significance it holds for JavaScript in the browser. We'll also look at the capabilities of the DOM, i.e. what we can do with it. In the end, we'll take a deep dive into the history of the DOM as we use it today, starting from the early days of Netscape Navigator and Microsoft Internet Explorer.

In short, this chapter is crucial for understanding all the following ones, so try your best to grasp every detail thoroughly.

What is the DOM?

To start with, DOM stands for Document Object Model. In simple terms,

The Document Object Model is an interface that, at its very core, allows us to program the structure and content of XML and HTML documents.

Using the DOM in JavaScript, we can easily and intuitively retrieve or change the content of given elements, or modify the structure and/or styles of the document as a whole.

Getting the content of the first <h1> element; removing the last <p> element from <body>; changing the text of <title>; adding a new <script> element in <head>; modifying the styles of <strong>; replacing the second-last <div> with a <span> — all of this is the DOM in action.
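To make these operations concrete, here is a minimal sketch. It assumes Python's standard-library xml.dom.minidom module, a W3C-style DOM implementation, instead of a browser, so it can run anywhere; the methods it calls (getElementsByTagName, createElement, appendChild, removeChild, replaceChild) carry the same names that JavaScript exposes on document. The sample markup and the app.js file name are made up, and styling <strong> is left out because minidom has no styling API.

```python
from xml.dom import minidom

# Hypothetical sample page (minidom parses it as XML, so it must be well-formed).
markup = (
    "<html><head><title>Old title</title></head><body>"
    "<h1>Hello DOM</h1>"
    "<div>one</div><div>two</div><div>three</div>"
    "<p>First paragraph</p><p>Last paragraph</p>"
    "</body></html>"
)
document = minidom.parseString(markup)
head = document.getElementsByTagName("head")[0]
body = document.getElementsByTagName("body")[0]

# Get the content of the first <h1> element.
h1_text = document.getElementsByTagName("h1")[0].firstChild.data

# Remove the last <p> element from <body>.
body.removeChild(body.getElementsByTagName("p")[-1])

# Change the text of <title>.
title = document.getElementsByTagName("title")[0]
title.firstChild.data = "New title"

# Add a new <script> element to <head> ("app.js" is a made-up file name).
script = document.createElement("script")
script.setAttribute("src", "app.js")
head.appendChild(script)

# Replace the second-last <div> with a <span>.
span = document.createElement("span")
span.appendChild(document.createTextNode("replaced"))
body.replaceChild(span, body.getElementsByTagName("div")[-2])

print(h1_text)
print(document.documentElement.toxml())
```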

But what exactly the DOM is remains a much more involved question.

Let's answer it step-by-step...

If we consider the phrase 'Document Object Model', it's actually quite a good name to explain the underlying concept. It says that the DOM is an object-based model of a document.

To elaborate further, the DOM is a model, i.e. a representation, of an XML or HTML document, built out of objects that can be worked with programmatically.

Say we have the following HTML:

<!DOCTYPE html>
<html>
<head>
   <title>Learning the DOM</title>
</head>
<body>
   <p>The DOM (Document Object Model) is amazing!</p>
</body>
</html>

The DOM is what's responsible for converting this textual markup into a system of objects that we can easily work with in a programming language, most commonly JavaScript.
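Here is a small sketch of that conversion, with one assumption: it uses Python's standard-library xml.dom.minidom (a W3C-style DOM implementation) rather than a browser, so the markup above is parsed as XML. The helper element_tree is hypothetical; it simply walks the resulting objects and lists each element's tag name, indented by its depth in the tree.

```python
from xml.dom import minidom

# The markup shown above, as a Python string.
markup = """<!DOCTYPE html>
<html>
<head>
   <title>Learning the DOM</title>
</head>
<body>
   <p>The DOM (Document Object Model) is amazing!</p>
</body>
</html>"""

# Parsing turns the text into a tree of node objects.
document = minidom.parseString(markup)

def element_tree(node, depth=0, lines=None):
    """Collect element tag names, indented by their depth in the tree."""
    if lines is None:
        lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        depth += 1
    # Text nodes and the doctype have no children, so recursion stops there.
    for child in node.childNodes:
        element_tree(child, depth, lines)
    return lines

print("\n".join(element_tree(document)))
```

The indentation of the printed names mirrors the nesting of the tags: <title> sits inside <head>, and <p> inside <body>.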

Since this system revolves around objects, it's fair to say that the DOM is an interface designed with object-oriented languages in mind, such as JavaScript, Python, PHP and Java.

The DOM API defines a whole bunch of classes, each with its own properties and methods, along with the relationships between those classes in this system of objects. It's these properties and methods that allow us to retrieve information from a document and to modify it.
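As a rough illustration, again assuming Python's xml.dom.minidom as a stand-in for the browser's implementation, every parsed node is an instance of a class such as Document, Element or Text, and properties like parentNode, previousSibling and firstChild express the relationships between those objects. In the browser, the same nodes would be instances of classes such as Document, HTMLParagraphElement and Text.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<html><head><title>Learning the DOM</title></head>"
    "<body><p>The DOM (Document Object Model) is amazing!</p></body></html>"
)
p = document.getElementsByTagName("p")[0]
text = p.firstChild

# Every node is an object, i.e. an instance of some DOM class.
print(type(document).__name__, type(p).__name__, type(text).__name__)

# Properties describe how the objects relate to one another.
print(p.parentNode.tagName)                    # the enclosing <body>
print(p.parentNode.previousSibling.tagName)    # <head>, the sibling before <body>
print(text.data)                               # the paragraph's text
```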

Hence, we say the DOM is an API, a programming interface that specifies nothing more than how a given XML or HTML document can be accessed programmatically.

Moving on, the DOM was originally standardized by the W3C, alongside other standards such as HTML, CSS and XML.

Quoting from the 'What is the Document Object Model?' section of the last-published W3C DOM standard:

The Document Object Model (DOM) is an application programming interface (API) for valid HTML and well-formed XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated.

Just a few lines later, the specification further adds that:

With the Document Object Model, programmers can build documents, navigate their structure, and add, modify, or delete elements and content.
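The "build documents" part can be shown without parsing any markup at all. This sketch assumes Python's xml.dom.minidom, whose DOMImplementation.createDocument takes a namespace, a qualified name and a doctype as in the W3C interface; browsers expose a comparable document.implementation.createDocument(). The element names, attribute and text are purely illustrative.

```python
from xml.dom import minidom

# Create an empty document whose root element is <html> (no namespace, no doctype).
impl = minidom.getDOMImplementation()
document = impl.createDocument(None, "html", None)
html = document.documentElement

# Build: add a <body> and a paragraph with some text.
body = document.createElement("body")
html.appendChild(body)
p = document.createElement("p")
p.appendChild(document.createTextNode("Built with the DOM"))
body.appendChild(p)

# Modify: give the paragraph an attribute.
p.setAttribute("class", "intro")

# Delete: add a temporary element, then remove it again.
temp = document.createElement("p")
body.appendChild(temp)
body.removeChild(temp)

print(html.toxml())
```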

To emphasize it once again, the DOM is simply an interface for representing an HTML/XML document. It's not a programming language, a framework or a library; it's just an API.

In addition to this, the DOM is platform- and language-agnostic.

The motivation behind the W3C DOM specification was to create a platform-neutral interface that could easily be implemented in any language — the DOM wasn't meant for just one particular language, such as JavaScript.

As the spec states:

As a W3C specification, one important objective for the Document Object Model is to provide a standard programming interface that can be used in a wide variety of environments and applications. The DOM is designed to be used with any programming language.

Remember that when the first DOM spec was released in 1998, JavaScript wasn't the only language in the browser. We also had VBScript and JScript (in Internet Explorer), and even Java. Drafting the DOM for JavaScript only would clearly have been impractical.

This language-agnostic notion of DOM has allowed other platforms to easily implement it. For instance, we can use DOM libraries created for languages such as PHP or Python to work with HTML/XML documents just like we'd work with them in JavaScript.
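For instance, Python ships with xml.dom.minidom in its standard library. The sketch below, built around a tiny made-up page, pairs each minidom call with the JavaScript you would write in a browser; the method and property names are the same because both follow the DOM interfaces.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<html><head><title>Learning the DOM</title></head><body></body></html>"
)

# JS: document.getElementsByTagName("title")[0].firstChild.data
title_text = document.getElementsByTagName("title")[0].firstChild.data

# JS: const h1 = document.createElement("h1");
h1 = document.createElement("h1")

# JS: h1.appendChild(document.createTextNode("Hello from Python"));
h1.appendChild(document.createTextNode("Hello from Python"))

# JS: document.getElementsByTagName("body")[0].appendChild(h1);
document.getElementsByTagName("body")[0].appendChild(h1)

print(title_text)
print(document.documentElement.toxml())
```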

Isn't this amazing?

One extremely important thing to note in this regard is that the DOM is NOT part of JavaScript. If we look through the ECMAScript 2023 spec, there is no mention of the 'DOM'.

That's simply because the DOM is an interface that browsers implement and then expose within the runtime environment of their respective JavaScript engines. In other words, the DOM is merely one of the many APIs available to JavaScript in the context of a browser.

The core JavaScript language is just what's defined in the ECMAScript spec — nothing more than that.

Capabilities and significance of the DOM

As we'll see throughout the rest of this unit, there is a lot that can be done with an HTML document using the DOM API.

Let's take a quick glimpse of it right now.

In particular, the DOM can be used to:

Select given elements from the document,
Add or remove elements from the document,
Replace a given element with another element,
Change the content of an element,
Add or remove attributes from an element,
Modify the existing attributes of an element,
Change the styles of an element,
Add further styles to an element,

and so on...
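Most of these operations map onto a handful of methods. Here is a minimal sketch using Python's xml.dom.minidom (assumed so the code runs outside a browser) on a made-up link element; the paths are placeholders. Styling is deliberately left out: minidom has no styling API, and in the browser the style property comes from the CSS-related specifications (CSSOM) rather than from the Core DOM.

```python
from xml.dom import minidom

document = minidom.parseString('<body><a href="/old" title="Old link">Docs</a></body>')

link = document.getElementsByTagName("a")[0]   # select an element
link.setAttribute("target", "_blank")          # add an attribute
link.setAttribute("href", "/docs")             # modify an existing attribute
link.removeAttribute("title")                  # remove an attribute
link.firstChild.data = "Read the docs"         # change the element's content

print(link.toxml())
```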

This, however, is a really generic overview of the different things that can be done using the DOM.

Using some more specific examples, we can say that the DOM API can be used to:

Load scripts, load stylesheets, load images, and load media programmatically using JavaScript.
Expand or collapse dropdown menus.
Play/pause audio or video files.
Automate the addition of elements to the document via a looping construct (such as for).
Sort table columns in ascending order.
Perform form input validation.
Create text editing programs.
Create graphic design software using just JavaScript.
...and the list goes on and on and on.
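Two of these, adding elements in a loop and sorting a table by one of its columns, combine nicely. The sketch below assumes Python's xml.dom.minidom instead of a browser and uses a few made-up names as data; cell_text is a hypothetical helper. Sorting works by reading each row's text, ordering the rows in Python, and re-appending them, since appending a node that is already in the tree moves it rather than copying it (browsers behave the same way).

```python
from xml.dom import minidom

document = minidom.parseString("<table><tbody></tbody></table>")
tbody = document.getElementsByTagName("tbody")[0]

# Add one row per name using an ordinary loop (the names are sample data).
for name in ["Charlie", "alice", "Bob"]:
    row = document.createElement("tr")
    cell = document.createElement("td")
    cell.appendChild(document.createTextNode(name))
    row.appendChild(cell)
    tbody.appendChild(row)

def cell_text(row):
    """Return the text of the first <td> in a row (hypothetical helper)."""
    return row.getElementsByTagName("td")[0].firstChild.data

# Sort by the first column, ascending and case-insensitive, then re-append
# each row: appending a node that is already in the tree moves it.
ordered = sorted(tbody.getElementsByTagName("tr"), key=lambda r: cell_text(r).lower())
for row in ordered:
    tbody.appendChild(row)

print([cell_text(row) for row in tbody.childNodes])  # ['alice', 'Bob', 'Charlie']
```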

The JavaScript we know today and use all day long is the one where we can query our HTML documents, select given elements, retrieve their content, change their styles, modify their attributes, add new elements, and so on.

We rarely use JavaScript without interacting with the DOM in some way. Almost all websites that push the limits of innovation with JavaScript combine the newer features of the language with the power of the DOM to create amazing experiences.

Recall that JavaScript is described as a language that adds interactivity to web pages. That notion of 'interactivity' comes straight from the DOM.

Without the DOM, there would be no interactivity on web pages, no excitement in JavaScript, just about nothing! Yes, really.

With the DOM, we are able to add, modify and/or remove stuff from an HTML/XML document with loads and loads of properties and methods to consider.

In short, JavaScript in the browser is known for its DOM API.

Core DOM vs. HTML DOM

The W3C DOM specification, first released in 1998 as the Document Object Model (DOM) Level 1 Specification, was divided into three broad categories, one of which was called the Core DOM.

The Core DOM was that part of the specification that every single DOM implementation back then had to abide by. It defined all the core features of a DOM implementation, common to both HTML and XML documents.

For instance, if we wanted to create a DOM implementation in PHP, we had to follow the entire set of interfaces specified in the Core DOM.

The Core DOM specified the functionality to get and change the structure and content of HTML and XML documents. There was no notion of configuring the styles of given elements on a web page, quite reasonably so, since styling was more or less an HTML-specific concern.
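To see that the Core DOM is about structure and content rather than HTML, here is a sketch that runs the same kind of calls on a made-up XML document with no HTML in it. It assumes Python's xml.dom.minidom, a minimal implementation of the W3C DOM interfaces which, fittingly, offers nothing for styles.

```python
from xml.dom import minidom

# A made-up, non-HTML XML document: nothing here is an HTML element.
document = minidom.parseString(
    '<library><book year="1998">DOM Level 1</book>'
    '<book year="2004">DOM Level 3 Core</book></library>'
)

# Core DOM: read the structure and content...
books = document.getElementsByTagName("book")
titles = [book.firstChild.data for book in books]
newest = max(books, key=lambda book: int(book.getAttribute("year")))

# ...and change it. There is no styling API to call here.
newest.setAttribute("latest", "true")
document.documentElement.appendChild(document.createElement("magazine"))

print(titles)
print(document.documentElement.toxml())
```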

In the browser environment, however, we mostly dealt (and still deal) with HTML documents, and the Core DOM didn't specifically target HTML, or even XML.

For HTML, and even XML, there existed separate documents.

The one covering HTML-specific DOM features was referred to as the HTML DOM. The HTML DOM described interfaces and their features that solely applied to HTML documents on the web.

However, as the job of standardizing and drafting the DOM specification shifted increasingly to the WHATWG (Web Hypertext Application Technology Working Group), a community devoted to standardizing various web technologies, the scenario changed.

The three branches of the W3C DOM specification, namely the Core DOM, the HTML DOM and the XML DOM, were more or less merged into one single document, with some aspects left for other standards to address.

If we head over to the latest WHATWG DOM standard right now, we see that the Core DOM functionality (from the W3C spec), along with HTML-specific and XML-specific functionality, is all under one roof. There is no longer a separate Core DOM or HTML DOM section in the standard.

The WHATWG DOM standard is not peaches and cream for anyone wanting to implement the DOM in a particular programming language. It's highly technical and precisely stated, and every single detail matters in the final implementation.

Anyways, the takeaway of this discussion is that the standalone term 'DOM' should refer to generic DOM functionalities, i.e. those that apply to both HTML and XML documents, whereas the term 'HTML DOM' should refer to the DOM along with HTML-specific features.

Even though the latest standard merges these separate parts into one single unit, we should at least be aware that there are distinctions between them.

As the name of this unit, 'HTML DOM', suggests, we'll be dealing with HTML documents and the DOM API specifically meant for them.

History of the DOM

So where did the DOM begin?

In 1995, JavaScript was publicly announced as part of the upcoming Netscape Navigator 2.0 browser. The browser was officially released in March 1996, and with it people got access to JavaScript for the very first time.

Along with this nascent scripting language, Netscape also introduced a set of features in its browser that allowed JavaScript programs to access certain elements of the HTML document, specifically forms, inputs, images and links, as objects and then work with them.

This set of features has today grown into what we know as the DOM.

Back then, the DOM had only limited capabilities. For instance, we could retrieve the value entered in an input field, perform elementary validation routines on it, maybe even write something back to an input element, and so on. There wasn't much we could do apart from working with <form> and <input> elements, plus a couple of others.

When Internet Explorer entered the market, it had to implement this Netscape DOM exactly as it was in Netscape. Not doing so would have been impractical: every web page designed for Navigator and opened in Internet Explorer would have ended up in a plethora of errors complaining that there was no such thing as the DOM!

This early implementation of the DOM on Netscape and even on IE eventually became known as the Legacy DOM, or the DOM Level 0. Legacy DOM was never formally defined as a standalone standard specification.

However, with the standardization of CSS in 1996, the rise of JavaScript, and the need for more cool features in this scripting language, DHTML (Dynamic HTML) became the talk of the town. Developers were talking about taking the interactivity of web pages to the next level, close to the interactivity and capability of native desktop applications.

DHTML, back then, was merely a means of referring to the modern ways to create interactive web pages powered by JavaScript and the DOM API.

In this regard, Netscape and Microsoft took their own proprietary approaches. Netscape introduced the idea of layers that could be graphically moved across the HTML document, and also proposed a JavaScript-centric interface for working with the styles of given elements, called JSSS (JavaScript Style Sheets).

As for Microsoft, it didn't complicate things much. It put forward a very simple yet powerful idea: the DOM should give programmers access to every single element on an HTML page, not just forms, images or links. Moreover, elements would be styled from JavaScript according to the CSS standard, using a couple of simple interfaces.

Overall, Microsoft's idea was very practical, and soon people began using IE more and more. Many things could be done in IE that couldn't be done in Netscape, or at least not as easily.

The DOM architecture of IE was extremely ambitious, yet based on a very simple idea — provide access to literally everything.

During this time, the DOM features created by IE and Netscape eventually became known as the Intermediate DOM.

In response to the growing inconsistencies and incompatibilities between the DOM features of Navigator and IE, the standardization body W3C (World Wide Web Consortium) began drafting a standard DOM specification that both vendors could agree on and implement in their respective browsers, thereby achieving consistency and compatibility.

The model they chose for the DOM API was almost the same as that of Microsoft, i.e. the DOM should provide access to literally every single element in the HTML document.

As Scott Isaacs, one of the architects of DHTML in IE, said in an interview,

... Microsoft DHTML is basically functionally equivalent to this [W3C] model. There are just some property naming differences that can be easily resolved.

This testifies to the fact that IE's approach to the DOM was far more practical and powerful than Netscape's.

In 1998, almost a year after the release of Netscape Navigator 4.0 and IE 4.0, the W3C published the first ever formal specification of the DOM. This became known as DOM Level 1. It included separate sections for the Core DOM and for the HTML DOM.

Soon both Netscape and Microsoft began implementing this specification in their respective browsers.

From here onwards, the W3C published two further major specifications related to the Core DOM:

DOM Level 2 Core Specification, published in November 2000.
DOM Level 3 Core Specification, published in April 2004.

Later on, the WHATWG took over the job of drafting and publishing the DOM specification.

Today, the WHATWG maintains the DOM as a living standard that's updated regularly to keep everything fresh and sound.
