Course: JavaScript

HTML DOM - Documents

Chapter 51 41 mins

Learning outcomes:

  1. The Document interface
  2. Properties for quick access of certain nodes
  3. Working with the document's title
  4. Properties mirroring the values of certain HTTP headers
  5. Document stream methods — write(), open() and close()

Introduction

Up to this point in this unit, we've learnt a great deal about the DOM API: its purpose; its history; some of its vital concepts, such as nodes, elements, attributes and document fragments; how to query elements; and much more.

But still, we are short of one equally important and core concept of the DOM API — one without which we couldn't even use the API. That concept is the Document interface. The entry point into the DOM API has been the Document interface and will obviously remain so.

It won't be wrong to say that it's the most important interface for any developer, who is interested in working with the DOM, to know. In this chapter, we'll explore the Document interface thoroughly, consider the document object which is an instance of this interface and available on window, see a huge array of its properties and methods, and much more.

The Document interface

Do you recall the document object that we've been using extensively throughout this whole HTML DOM unit? Well, it's based on the Document interface.

As stated before, the Document interface is the entry point into the DOM API. Whether you need to select an element, create a new node, change an attribute's value, remove a given node, everything has to essentially go through the Document interface.

The DOM API could be thought of as a big room and the Document interface as the door to enter into that room. Whenever you need to go in the room, you need the door. At the same time, also note that the door is also a part of the room, i.e. the Document interface is also part of the DOM API.

Analogies are often useful.

Now, let's come to the real point: what exactly is the Document interface, apart from being an entry point into the DOM?

As the name suggests:

The Document interface represents an HTML/XML document.

It's the model for an HTML/XML document. Hence, to operate in any way on an HTML/XML document, we ought to use its corresponding Document instance.

The Document interface defines properties and methods to work with HTML/XML documents. Based on the type of the current document under inspection, certain properties and methods don't work.

For instance, the Document interface contains the method write(), which we'll explore in detail below. When the current document is an HTML document, write() allows us to write some content to the document. However, when it's an XML document, calling write() throws an error.

Anyways, shown below are some of the properties on Document, a fraction of which we'll explore in this chapter (shown in bold typeface):

Property | Purpose
-------- | -------
children | An HTMLCollection containing all the child elements of the document.
childElementCount | The length of children.
URL | The fully-qualified URL of the document.
doctype | The document type node of the document.
documentElement | The root element of the document. For HTML documents, this is the <html> element.
head | The <head> element of the HTML document.
body | The <body> element of the HTML document.
images | An HTMLCollection containing all the <img> elements in the HTML document.
links | An HTMLCollection containing all the <a> (and <area>) elements in the HTML document that have an href attribute.
scripts | An HTMLCollection containing all the <script> elements in the HTML document.
forms | An HTMLCollection containing all the <form> elements in the HTML document.
title | Enables us to get and set the title of the HTML document.
referrer | A string containing the value of the Referer HTTP header associated with the document.
lastModified | A string containing the value of the Last-Modified HTTP header associated with the document, if there is any, or else the current date.
contentType | A string containing the value of the Content-Type HTTP header associated with the document.
cookie | Enables the retrieval and setup of given HTTP cookies on the current domain.

You might have noticed something odd here: some properties exist on the Document interface that we've seen on the Element interface before, e.g. children, childElementCount, etc.

How is this the case?

Well, let's understand it carefully.

How do Element properties show up on Document?

First off, recall that Document inherits from Node and Node inherits from EventTarget (which then inherits from Object, ending the prototype chain).

So there is absolutely no way that an Element property could make it to Document via inheritance.

What you see above is, according to the spec, a consequence of something called mixins. A mixin is a set of properties that gets copied onto given objects, i.e. mixed into them directly, instead of being inherited. In the DOM standard, children and childElementCount are defined by the ParentNode mixin, which both Document and Element include.

We don't need to worry about exact implementation details as to how JavaScript engines create these mixins, or whether they even create them or not. We just need to be aware of the fact that Document properties such as children, childElementCount, etc. are separate from the ones with the same name on Element.

Obviously, since the naming is the same, the properties have the exact same underlying behavior. But the property getter and setter functions stored internally, in memory, aren't the same. The good news is that we can even confirm this with the help of a very simple piece of code.

In the following snippet, we retrieve the getter of children from both the interfaces, Document and Element, directly via their prototype properties, and then compare them for identity using ===:

Object.getOwnPropertyDescriptor(Document.prototype, 'children').get === Object.getOwnPropertyDescriptor(Element.prototype, 'children').get
false

Since the return value of the comparison operation is false, it's clear that the children getters of both these interfaces aren't the same, and consequently, even the children property isn't the same.
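
To make the idea of a mixin more concrete, here is a minimal sketch in plain JavaScript. It is only an illustration, not how browser engines actually build Document and Element: the names FakeDocument, FakeElement and includeParentNodeLike are hypothetical, and children is modeled as a simple array. Each call to the mixin function installs a fresh getter, which is why the identity comparison yields false, just as in the console snippet above.

```javascript
// Hypothetical mixin: installs a fresh childElementCount getter on each target.
function includeParentNodeLike(target) {
   Object.defineProperty(target.prototype, 'childElementCount', {
      // A new function object is created on every call of the mixin.
      get: function() { return this.children.length; },
      configurable: true,
      enumerable: true
   });
}

// Stand-ins for Document and Element; children is just an array here.
function FakeDocument() { this.children = ['html']; }
function FakeElement() { this.children = ['head', 'body']; }

includeParentNodeLike(FakeDocument);
includeParentNodeLike(FakeElement);

var docGetter = Object.getOwnPropertyDescriptor(FakeDocument.prototype, 'childElementCount').get;
var elemGetter = Object.getOwnPropertyDescriptor(FakeElement.prototype, 'childElementCount').get;

console.log(new FakeDocument().childElementCount); // 1
console.log(new FakeElement().childElementCount);  // 2
console.log(docGetter === elemGetter);             // false
```

The behavior is the same on both stand-ins, yet neither inherits the property from the other.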

The semantics of all such properties on Document, that are otherwise available on Element as well, are the exact same as the semantics of the similar properties on Element.

Likewise, we won't be going over these properties in this chapter again. To learn more about them, please refer to the chapter HTML DOM — Elements.

Alright, with this out of the way, let's start exploring the rest of the properties of Document (the ones shown in bold typeface).

Document's URL

Retrieving the URL of a given HTML document is a very common task these days, as single-page applications, analytics libraries, and other sophisticated URL-processing mechanisms are prevalent.

In this regard, we can use the URL property of the given Document instance to obtain the URL of the document.

Another property, semantically identical to URL, is the documentURI property. But since it's more to type, we'll stick to URL.

document.URL isn't strictly the only way to retrieve the URL of a given HTML document. As we'll see in the chapter JavaScript Location Interface, the location object, available on window, allows us to work with the document's URL in multiple ways.

Let's see URL in action.

The shortest and simplest way to do this is to open up the console for the current document and then inspect the value of document.URL. This is done below:

document.URL
'https://www.codeguage.com/courses/js/html-dom-documents'

As can be seen in the return value of the given expression, the URL of the current page is logged just as it's shown in the address bar above.

Try adding a hash (#) at the end of the URL in the address bar above, and then inspect document.URL again. This time, the returned value would contain the hash as well.
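
Since document.URL is just a string, it can be handed to the standard URL constructor to pick out individual parts, such as the hash mentioned above. The sketch below assumes a hypothetical helper named describeURL and a made-up sample URL ending in #top (used only when no document is available, e.g. outside a browser).

```javascript
// Hypothetical helper (not part of the DOM API): splits a URL string,
// such as the value of document.URL, into a few of its components.
function describeURL(urlString) {
   var parsed = new URL(urlString); // the standard URL constructor
   return {
      origin: parsed.origin,
      path: parsed.pathname,
      hash: parsed.hash
   };
}

// Outside a browser there is no document, so fall back to a made-up sample.
var sampleURL = 'https://www.codeguage.com/courses/js/html-dom-documents#top';
var currentURL = typeof document !== 'undefined' ? document.URL : sampleURL;

console.log(describeURL(currentURL));
```

For the sample URL, the hash component would be '#top' and the path '/courses/js/html-dom-documents'.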

Quick access to certain nodes

As we've seen in the chapter HTML DOM — Nodes, there are a couple of properties on Node instances to traverse up/down and left/right in the DOM tree from a given Node instance.

Since Document inherits from Node, all these properties are available on it as well. For instance, in an HTML document that begins with <!DOCTYPE html>, we can access the document type node by referring to the first child node of document. Similarly, we can access <html> by accessing the second child node of document (assuming no comments sit between the two).

Technically, all nodes in a given HTML document can be reached solely via these traversing properties. However, the Document interface provides a handful of shortcut properties to directly obtain certain nodes, or collections of nodes.
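
As a rough illustration of traversing from the document itself, the sketch below collects the nodeName of each top-level child node. The helper name listTopLevelNodes is hypothetical; the browser-only part is guarded so that the snippet also runs where no document exists.

```javascript
// Hypothetical helper: lists the names of a document's top-level child nodes.
function listTopLevelNodes(doc) {
   var names = [];
   for (var i = 0; i < doc.childNodes.length; i++) {
      names.push(doc.childNodes[i].nodeName);
   }
   return names;
}

// Only meaningful in a browser, where the global document exists.
if (typeof document !== 'undefined') {
   console.log(listTopLevelNodes(document));
   // For a page with a doctype, the first child is the document type node.
   console.log(document.firstChild === document.doctype);
}
```

On a page consisting of just <!DOCTYPE html> followed by <html>, the list would typically hold the doctype's name followed by 'HTML'.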

doctype

The doctype property returns the document type node associated with the current HTML/XML document. As expected, it works on both HTML and XML documents.

An example is illustrated below:

document.doctype instanceof DocumentType
true

documentElement

The documentElement property returns back the root element of the document. For HTML documents, this is the <html> element.

Root element vs. root node

Note that there is a fine distinction between the root element of a given HTML/XML document and the root node of its corresponding DOM tree.

The root node of the DOM tree would always be a Document instance whose children include the document type node and the root element node of the document.

The root element, on the other hand, would always be the element that contains all other elements in the document. As stated before, the root element of HTML documents is <html>.

Simple?

In the following snippet, we access document.documentElement, and then inspect a couple of its properties:

document.documentElement.nodeName
'HTML'
document.documentElement.parentNode.nodeName
'#document'
document.documentElement === document.childNodes[1]
true
document.documentElement === document.querySelector('html')
true

body

The body property returns back the <body> element of the given HTML document. Not surprisingly, it's only meant for HTML documents.

Since the body property is an HTML-specific feature (not common to HTML and XML documents), it's defined in the WHATWG HTML standard (which defines HTML-specific features), not in the WHATWG DOM standard (which defines the core API used across HTML and XML documents).

Below we inspect document.body:

document.body.nodeName
'BODY'
document.body.parentNode.nodeName
'HTML'
document.body === document.documentElement.children[1]
true
document.body === document.querySelector('body')
true

images

Recall from the chapter HTML DOM Introduction that the very first stage of the DOM API, in the Netscape Navigator browser, provided access to only a small set of elements, e.g. anchors, forms and images.

This was referred to as the Legacy DOM, or DOM Level 0, and is still supported in modern browsers for backwards compatibility reasons.

The following properties of the Document interface are all remnants of the legacy DOM: images, scripts, links, forms. In this section, we start off with exploring images.

The images property returns back an HTMLCollection containing all of the <img> elements in the underlying HTML document. Practically, it's the same as calling getElementsByTagName('img') on the document, but obviously much shorter to write.

Let's take a look at an example.

In the following HTML code, we have three <img> elements:

<img src="image-1.jpeg" alt="A normal-sized image">
<img src="large-image-1.png" alt="A large-sized image">
<img src="large-image-2.svg" alt="A large-sized image">

The JavaScript below logs the src of each of these images, by iterating over document.images and retrieving the value of the src attribute from each element:

var imageElements = document.images;

for (var i = 0, len = imageElements.length; i < len; i++) {
   console.log(imageElements[i].getAttribute('src'));
}

Here's the console output:

image-1.jpeg
large-image-1.png
large-image-2.svg

links

The links property returns back an HTMLCollection containing all the links in the HTML document. More specifically, it only contains those <a> (and <area>) elements that have the href attribute set.

There is a property of the Document interface similar to links which we should watch out for when working with the interface. The snippet below elaborates on it.

links vs anchors

Apart from links, there is another property on the Document interface, called anchors, that might seem similar to links.

At first glance, one might be tempted to think that anchors returns back a collection of all the anchor (<a>) elements in the HTML document and that links returns back a collection of all the <link> elements.

However, this is just NOT the case.

anchors is a deprecated legacy feature of the legacy DOM (too much legacy here). Back in the day, it used to return a collection of all the <a> elements containing the name attribute. Even today, most browsers support it, but only for backwards compatibility.

It's recommended that the property not be used at all in newer code.

Let's take an example.

Consider the following HTML markup:

<a href="home">Home</a>
<a href="about">About</a>
<a href="contact">Contact</a>

In the code below, we go over the document.links collection and convert the href attribute of each <a> element into a data-href attribute:

var anchorElements = document.links;

while (anchorElements.length) {
   var href = anchorElements[0].getAttribute('href');
   anchorElements[0].setAttribute('data-href', href);
   anchorElements[0].removeAttribute('href');
}

After the execution of this code, the HTML document would look something as follows:

<a data-href="home">Home</a>
<a data-href="about">About</a>
<a data-href="contact">Contact</a>

Notice how we don't use a for loop to iterate over document.links here. This is because the returned HTMLCollection is live.

As soon as we remove the href attribute from a given <a> element, the HTMLCollection instance anchorElements would no longer have that element in it. Therefore, we directly access the first <a> element of the collection — after the processing, the next <a> element would take up this position.

Using a for loop here, as we used in the previous code snippets, would lead to undesired results.
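
If a for loop is still preferred, one possible workaround is to first copy the live collection into a static array, so that removing href no longer shifts the items being iterated. The sketch below assumes a hypothetical helper named moveHrefToDataHref; Array.from() is the standard way of making such a copy.

```javascript
// Hypothetical helper: moves href into data-href for every item of a collection.
function moveHrefToDataHref(linkCollection) {
   // Array.from() copies the (possibly live) collection into a static array.
   var snapshot = Array.from(linkCollection);

   for (var i = 0; i < snapshot.length; i++) {
      var href = snapshot[i].getAttribute('href');
      snapshot[i].setAttribute('data-href', href);
      snapshot[i].removeAttribute('href');
   }

   // Number of elements processed.
   return snapshot.length;
}

// Only meaningful in a browser, where the global document exists.
if (typeof document !== 'undefined') {
   moveHrefToDataHref(document.links);
}
```

The snapshot keeps its length throughout the loop, whereas document.links itself would shrink with each removed href.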

Document's title

Retrieving the title of an HTML document is a fairly simple task.

We can get it and even set it by first selecting the <title> element in the document's <head> and then getting or setting its textContent property, respectively.

An example is illustrated below:

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <title>Working with title</title>
</head>
<body>
<script>
   var title = document.querySelector('title').textContent;
   document.write(`<h1>Title: <i>${title}</i></h1>`);
</script>
</body>
</html>

Live Example

As can be seen, we manually select the <title> element and then retrieve its textContent property to get the title of the document. This is then finally output to the document.

Now as you might agree, this whole procedure can be a little too much just for the sake of obtaining the document's title or changing it to some other value.

A much shorter and simpler way is to use the title property of the Document interface.

In particular, when the title property is set, the text of the <title> element is set to the provided value. And when it's read, the text of the <title> element is retrieved.

Precisely speaking, there is more to the title property than what's described above. When title is set, if there is no <title> element in <head>, one is created and then its textContent is set. Similarly, when title is read, if there is no <title> element in the document, '' is returned.

The title property has been part of the standardized DOM API since DOM Level 1, where it's defined on the HTMLDocument interface.

Simple?

Shown below is an example. It's the same as before, just this time document.title is used instead of approaching the <title> element manually:

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <title>Working with document.title</title>
</head>
<body>
<script>
   document.write(`<h1>Title: <i>${document.title}</i></h1>`);
</script>
</body>
</html>

Live Example

Let's take another example, this time changing the title of the document.

In the code below, we set document.title to change the document's title from 'Working with document.title' to 'Setting document.title':

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <title>Working with document.title</title>
</head>
<body>
<p>Look at the top of the browser window and notice the title of this document.</p>
<script>
   document.title = 'Setting document.title';
</script>
</body>
</html>

In the link below, notice the title of the document as shown at the top of the browser window.

Live Example

Document headers

When an HTML/XML document is requested and then received in a browser, a good number of HTTP headers are exchanged in the request and the response. The DOM API exposes some of these headers via a handful of its properties, as we shall see in this section.

referrer

The referrer property of the Document interface returns back the value of the Referer HTTP request header associated with the document.

And what's the Referer header?

Well, the Referer request header, sent to a server upon requesting for a given resource, holds the URL of the page where the given HTTP request originated. By 'originated' we mean that a link on that page was clicked to dispatch an HTTP request.

The spelling of Referer, without two 'r's, is not a typo here — it's the exact same spelling used in the HTTP specification.

Note that if a given document is opened up directly by entering its URL manually into the address bar, or by opening it up via the bookmarks manager, or via the home page of a browser with shortcut links, it would have NO Referer HTTP header. And in this case, the referrer property would return ''.

Consider the code below:

document.write(`Referrer: <code>${document.referrer}</code>`);

It's just meant to display the value of document.referrer out on the document.

Here's a link to this page:

Live Example

The moment we click on it and the page loads, we'll see the URL of the current page displayed there.

Try copying the URL of the link above and then directly navigating to it in a new tab, by entering the copied address in the address bar. This time, there shouldn't be any output in the document since document.referrer is ''.
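
As a sketch of how the value of document.referrer might be put to use, the hypothetical helper below labels a visit as 'direct', 'internal' or 'external' by comparing origins with the standard URL constructor. The labels and the function name classifyReferrer are assumptions made for illustration. Also keep in mind that an empty referrer doesn't always mean a typed-in address; for instance, a referrer policy can suppress the header as well.

```javascript
// Hypothetical helper: classifies a visit based on the referrer string.
function classifyReferrer(referrer, currentURL) {
   // '' means that no Referer header was associated with the document.
   if (referrer === '') {
      return 'direct';
   }

   var sameOrigin = new URL(referrer).origin === new URL(currentURL).origin;
   return sameOrigin ? 'internal' : 'external';
}

// Only meaningful in a browser, where the global document exists.
if (typeof document !== 'undefined') {
   console.log(classifyReferrer(document.referrer, document.URL));
}
```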

contentType

The contentType property holds the value of the Content-Type HTTP response header received for the underlying document.

The Content-Type header represents the MIME (Multipurpose Internet Mail Extensions) type of the underlying resource. In other words, it simply tells us about the type of the resource, i.e. whether it's a text file, an HTML file, a JPEG image, an MP3 audio file, a binary object, etc.

It is sent by the server to the client, and is then relayed back to the DOM API via the contentType property of the Document interface.

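For HTML documents, contentType is the string 'text/html'. For XML documents, it's typically 'application/xml' or 'text/xml'. Note that the property only holds the MIME type itself, without any parameters (such as charset) that the Content-Type header might carry.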

In the snippet below, we retrieve the contentType of the current resource, which is an HTML document. Ideally, the return value should be 'text/html':

document.contentType
'text/html'

And it indeed is 'text/html'.
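
A value like this can be used to branch on the kind of document at hand. The sketch below assumes a hypothetical helper named describeContentType; it treats 'text/xml', 'application/xml' and any type ending in '+xml' (for example 'application/xhtml+xml') as XML, and everything other than 'text/html' and those as 'other'.

```javascript
// Hypothetical helper: maps a contentType string to a coarse document kind.
function describeContentType(contentType) {
   if (contentType === 'text/html') {
      return 'HTML';
   }
   if (contentType === 'text/xml' ||
       contentType === 'application/xml' ||
       /\+xml$/.test(contentType)) {
      return 'XML';
   }
   return 'other';
}

// Only meaningful in a browser, where the global document exists.
if (typeof document !== 'undefined') {
   console.log(describeContentType(document.contentType));
}
```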

lastModified

The lastModified property returns back the value of the Last-Modified HTTP response header for the given document.

If there is no such header in the response received for the document, lastModified returns back the current date, in the form of a string.

Keep in mind that lastModified is NOT a Date instance; instead, it's a string whose format is 'MM/DD/YYYY hh:mm:ss', expressed in the user's local time zone.

In the following snippet, we log the lastModified property of the current document:

document.lastModified
'<current date and time, as MM/DD/YYYY hh:mm:ss>'

Since the current document doesn't have any Last-Modified header in its HTTP response, lastModified returns the current date.
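
Because lastModified is a string, it has to be parsed before any date arithmetic can be done on it. The following is a minimal sketch, assuming the 'MM/DD/YYYY hh:mm:ss' local-time format that current browsers produce; the helper name parseLastModified is hypothetical.

```javascript
// Hypothetical helper: converts a lastModified-style string into a Date.
// Returns null if the string isn't in the assumed MM/DD/YYYY hh:mm:ss format.
function parseLastModified(value) {
   var match = /^(\d{2})\/(\d{2})\/(\d{4,}) (\d{2}):(\d{2}):(\d{2})$/.exec(value);
   if (!match) {
      return null;
   }

   // Months are zero-based in the Date constructor; the result is in local time.
   return new Date(
      +match[3], +match[1] - 1, +match[2],
      +match[4], +match[5], +match[6]
   );
}

// Only meaningful in a browser, where the global document exists.
if (typeof document !== 'undefined') {
   console.log(parseLastModified(document.lastModified));
}
```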

cookie

Perhaps, one of the most useful of these four properties that's frequently used in almost all web applications these days is cookie.

The cookie property of the Document interface allows us to get the values of existing HTTP cookies as well as create new cookies for the underlying document's domain.

When read, cookie returns a string listing the cookies that the browser would send in the Cookie HTTP request header for the document (leaving out the ones marked HttpOnly). And when set, it triggers the browser to process the given value and then create a new cookie, or update or delete an existing one with the same name, depending on the value.

We'll explore cookies in detail in the chapter JavaScript Cookies.
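
As a small preview, the string returned by document.cookie is a list of name=value pairs separated by '; '. The sketch below turns such a string into a plain object. The helper name parseCookieString is hypothetical, the sketch skips entries without an '=' sign, and it doesn't decode percent-encoded values.

```javascript
// Hypothetical helper: turns 'a=1; b=2' into { a: '1', b: '2' }.
function parseCookieString(cookieString) {
   var cookies = {};
   if (cookieString === '') {
      return cookies;
   }

   cookieString.split('; ').forEach(function(pair) {
      var index = pair.indexOf('=');
      if (index === -1) {
         return; // Not a name=value pair; ignored in this sketch.
      }
      // Everything after the first '=' belongs to the value.
      cookies[pair.slice(0, index)] = pair.slice(index + 1);
   });

   return cookies;
}

// Only meaningful in a browser, where the global document exists.
if (typeof document !== 'undefined') {
   console.log(parseCookieString(document.cookie));
}
```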

Document stream methods

The DOM API provides three near-legacy methods to work with a document as a stream of data. They are write(), open() and close().

The write() method is simply meant to write a piece of content to the document stream; open() is meant to open up a new document stream where write() can operate; and close() is meant to close an open document stream.

Now before we proceed any further in understanding each of these three methods, it's paramount to understand that as per the WHATWG HTML specification, it's advised NOT to use them. They have inconsistent behavior across different implementations, and sometimes inconsistent behavior within a given implementation as well.

To write data to the document, we must always use the methods that we discussed in the previous chapters. write() might be used ONLY in simple demonstrative cases, just like we've been doing at times in this course; it's NOT meant to be used in production code.

Alright, with this clear, let's now understand each of these methods in detail.

write()

The write() method is used to write some HTML markup to the document stream.

If it's called while the current document is still loading, it effectively writes the given piece of content right next to the <script> element where it was written.

Otherwise, if it's called after the document is finished loading, it opens up an entirely new document stream, clearing up all of the content in the document, and then writes the given piece of text to the stream.

Here's its syntax:

documentObject.write(html)

The html argument is a string containing the HTML markup to write to the document stream. It's parsed for nodes, just like innerHTML.

Let's see a couple of examples of the usage of write().

In the following code, we use document.write() to write a piece of content inside the <b> tag where the <script> is written:

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <title>Working with document.write()</title>
</head>
<body>
   Language: <b><script>document.write('JavaScript')</script></b>
</body>
</html>

The invocation of write() here falls into the former case as mentioned above, i.e. it happens while the document is still loading. Thus, the provided value is parsed and then the respective nodes inserted right after the <script> element.

Here's the output produced:

Language: JavaScript

We could even go a step further and inspect the DOM tree produced by the code above. The figure below shows the DOM tree:

Inspecting the DOM tree after executing document.write()

Notice the text 'JavaScript' (as highlighted) right after the <script> element. This is the result of the document.write() call in the code above.
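
The two cases described earlier (still loading vs. finished loading) can be told apart at runtime. The sketch below is one possible guard, assuming a hypothetical helper named addMarkup: it relies on document.readyState, a standard Document property not covered in this chapter, and falls back to insertAdjacentHTML() once the document is no longer loading, so that existing content is never wiped. It's meant for simple inline scripts only, not as a recommendation to use write() in production code.

```javascript
// Hypothetical helper: adds markup without ever clearing the document.
function addMarkup(doc, html) {
   if (doc.readyState === 'loading') {
      // Still parsing: write() inserts the markup at the current position.
      doc.write(html);
      return 'write';
   }

   // Past the loading phase: write() would open a new stream and erase
   // the document, so append the markup to <body> instead.
   doc.body.insertAdjacentHTML('beforeend', html);
   return 'insertAdjacentHTML';
}

// Only meaningful in a browser, where the global document exists.
if (typeof document !== 'undefined') {
   addMarkup(document, '<b>JavaScript</b>');
}
```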

open()

The open() method is used to open up a new document stream for writing.

Here's the syntax of open():

documentObject.open()

Opening up a new stream simply means that the previous stream and the entire DOM tree associated with it are both flushed out, leaving us with a new and empty stream.

When document.write() is called for the first time after the current document is loaded, it automatically calls the routine used by document.open(). Thus a new document stream is opened up and all the content of the document is erased.

Calling open() during the loading phase of the document simply has no effect.

Thus, the code below:

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <title>Working with document.write()</title>
</head>
<body>
<script>
   document.open(); // Has no effect.
   document.write('JavaScript');
</script>
</body>
</html>

is identical to the code below, without the call to document.open():

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <title>Working with document.write()</title>
</head>
<body>
<script>
   document.write('JavaScript');
</script>
</body>
</html>

close()

The close() method is used to close an open document stream.

It requires no arguments, as shown in the syntax below:

documentObject.close()

When write() is called on a closed document stream, it automatically invokes the open() method.

So technically, if we need to open/close a document stream, we only need to use one of the two methods open() and close().

For instance, if we wish to clear away the content of the document, once it has loaded, and then add some other content, we have two options:

  • Call document.open() followed by calling document.write() to write the other content.
  • Call document.close() followed by calling document.write() to write the other content.

Akin to open(), the close() method also has no effect when it's invoked during the loading phase of an HTML document.
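
Putting the three methods together, replacing the content of an already-loaded document could be sketched as follows. The helper name replaceDocumentContent is hypothetical, and the explicit open() call is optional since write() would open the stream on its own; the trailing close() tells the browser that nothing more will be written.

```javascript
// Hypothetical helper: replaces the whole content of a loaded document.
function replaceDocumentContent(doc, html) {
   doc.open();      // Flush the old content and start a new stream.
   doc.write(html); // Write the new markup to the stream.
   doc.close();     // Signal that writing is finished.
}
```

In a browser, it might be called from a load handler, e.g. window.addEventListener('load', function() { replaceDocumentContent(document, '<p>New content</p>'); }). As with write() in general, this is only suitable for simple demonstrations.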
