HTML Serialization Exercise - HTML DOM

Objective

Create a function to replicate the behavior of innerHTML when it's retrieved.

Description

In the chapter HTML DOM - Elements, we came across the innerHTML property of element nodes. When retrieved, it returns the HTML content inside the element node as a string. When set, it replaces the content inside the element node with the given value.

The algorithm used when we get innerHTML is referred to as the HTML serialization algorithm.

Serialization, in this context, means going from the object representation of a node to its corresponding string representation.

The algorithm used when we set innerHTML is referred to as the HTML parsing algorithm.

It is a bit more complex than serialization, as we have to analyze the provided value (the string that innerHTML is set to) for correct grammar and for a bunch of other things, such as attributes and their values, starting and ending tags, comments, pieces of text, and so on.

In this exercise, you have to create a method getInnerHTML() on the Element interface that replicates the HTML serialization algorithm, as used by innerHTML when the property is accessed in a get context.

You must reverse-engineer the algorithm, i.e. see how innerHTML works on a handful of examples, and then implement the method to work in the same way.

New file

Inside the directory you created for this course on JavaScript, create a new folder called Exercise-47-HTML-Serialization and put the .html solution files for this exercise within it.

Solution

Let's start by setting up the wireframe of the method:

Element.prototype.getInnerHTML = function() {
   // Code to go here.
}

In the method getInnerHTML(), we just need to go over each of the child nodes of the given element node, serialize it, concatenate the result to an accumulator variable, and then return this variable in the end.

Simple.

In the code below, we lay out the general setup required for the method:

Element.prototype.getInnerHTML = function() {
   var html = '';

   for (var i = 0, len = this.childNodes.length; i < len; i++) {
      var node = this.childNodes[i];

      // Code to go here.
   }

   return html;
}

The variable html is an accumulator meant to hold the value returned at the end of the execution of getInnerHTML(). The for loop is meant to iterate over each child node of the element (this), store it in the variable node, serialize it, and finally concatenate the serialized string to html.

As for the serialization of each node, we need to tackle different cases here.

That is, we need one serialization mechanism for element nodes, another for text nodes, and yet another for comment nodes.

For an element node, implementing the Element interface, we just need to call getInnerHTML() on it recursively to obtain the serialized version of all the content inside it, and then wrap that content in a couple of extra pieces.

These pieces are the element's starting and ending tags, and building them requires us to access the node's tagName and attributes properties.
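Building the starting tag is the fiddliest part of this case, so it can help to look at it on its own first. The sketch below is a hypothetical helper, startTag(), that takes a lowercase tag name and an array of { name, value } pairs, the same two properties the method reads from each Attr node. Giving every attribute its own leading space makes the no-attributes case fall out naturally, as an alternative to the empty-string check used in the code that follows. Attribute values are copied as-is here, which is a simplification.

```javascript
// Hypothetical helper: build a starting tag from a lowercase tag name and an
// array of { name, value } pairs. Values are copied as-is (no escaping).
function startTag(tagName, attributes) {
   // Each attribute carries its own leading space, so an empty array
   // contributes nothing and no stray space appears before '>'.
   const attrs = attributes
      .map(attr => ' ' + attr.name + '="' + attr.value + '"')
      .join('');
   return '<' + tagName + attrs + '>';
}

console.log(startTag('p', [])); // <p>
console.log(startTag('div', [
   { name: 'data-id', value: '100' },
   { name: 'class', value: 'text-blue' }
])); // <div data-id="100" class="text-blue">
```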

This case, which is the hardest to implement amongst all three, is addressed below:

Element.prototype.getInnerHTML = function() {
   var html = '';

   for (var i = 0, len = this.childNodes.length; i < len; i++) {
      var node = this.childNodes[i];

      if (node instanceof Element) {
         var tagName = node.tagName.toLowerCase();

         // Get an array of all the Attr nodes of the current node.
         var attributesArray = Array.prototype.slice.call(node.attributes);

         // Map each Attr node to the string 'name="value"', where
         // name is the name of the Attr node and value is its value,
         // and then join the mapped array using ' ' as the delimiter.
         var attributesStr = attributesArray.map(function(attribute) {
            return attribute.name + '="' + attribute.value + '"';
         }).join(' ');

         // If there are no attributes, attributesStr must be empty.
         // Otherwise, it must have a space at its start so that we can
         // easily concatenate it with tagName.
         attributesStr = attributesStr === '' ? '' : (' ' + attributesStr);

         html += '<' + tagName + attributesStr + '>' +
                 node.getInnerHTML() +
                 '</' + tagName + '>';
      }
   }

   return html;
}

So far, so good.
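The element code above copies attribute values exactly as they are, which works for the test markup in this exercise but not in general. When a browser serializes an attribute value, it replaces & with &amp;, a non-breaking space (U+00A0) with &nbsp; and " with &quot;, so a value containing a double quote cannot end the attribute early; newer browsers may additionally escape < and >. The sketch below models the first three replacements with a hypothetical escapeAttributeValue() helper.

```javascript
// Hypothetical helper: escape an attribute value roughly the way the HTML
// serializer does. '&' is replaced first so the entities introduced by the
// later replacements are not escaped a second time.
function escapeAttributeValue(value) {
   return value
      .replace(/&/g, '&amp;')
      .replace(/\u00A0/g, '&nbsp;')
      .replace(/"/g, '&quot;');
}

console.log(escapeAttributeValue('say "hi" & bye'));
// say &quot;hi&quot; &amp; bye
```

With such a helper, the map callback in the element case would concatenate escapeAttributeValue(attribute.value) instead of attribute.value.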

For a text node, we just need to append its nodeValue, i.e. the text content of the node, to html.

This second case is addressed below:

Element.prototype.getInnerHTML = function() {
   var html = '';

   for (var i = 0, len = this.childNodes.length; i < len; i++) {
      var node = this.childNodes[i];

      if (node instanceof Element) {
         // ... (element case, as shown above)
      }

      else if (node instanceof Text) {
         html += node.nodeValue;
      }
   }

   return html;
}

Still great.
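Strictly speaking, innerHTML does not copy a text node's nodeValue verbatim either. It escapes & as &amp;, a non-breaking space as &nbsp;, < as &lt; and > as &gt;, unless the text node's parent is an element whose content is raw text, such as script or style. The sketch below models this with a hypothetical escapeText() helper that takes the parent's lowercase tag name. The list of raw-text parents follows the HTML standard's serialization algorithm, except that noscript, which belongs on the list only when scripting is enabled, is left out as a simplifying assumption.

```javascript
// Parents whose text children the HTML serializer emits without escaping
// (noscript is omitted here; it only counts when scripting is enabled).
const RAW_TEXT_PARENTS = new Set([
   'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes', 'plaintext'
]);

// Hypothetical helper: serialize the data of a text node whose parent has
// the given lowercase tag name.
function escapeText(text, parentTagName) {
   if (RAW_TEXT_PARENTS.has(parentTagName)) {
      return text;
   }
   // '&' goes first so the entities added below are not re-escaped.
   return text
      .replace(/&/g, '&amp;')
      .replace(/\u00A0/g, '&nbsp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;');
}

console.log(escapeText('1 < 2 && 3 > 2', 'p'));
// 1 &lt; 2 &amp;&amp; 3 &gt; 2
console.log(escapeText('if (a < b) {}', 'script'));
// if (a < b) {}
```

The test markup in this exercise contains none of these characters, which is why appending nodeValue unchanged still matches innerHTML there.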

Moving on, for a comment node, we just need to wrap its nodeValue in '<!--' and '-->' to form a comment tag.

This third and last case is addressed below:

Element.prototype.getInnerHTML = function() {
   var html = '';

   for (var i = 0, len = this.childNodes.length; i < len; i++) {
      var node = this.childNodes[i];

      if (node instanceof Element) {
         var tagName = node.tagName.toLowerCase();
         var attributesArray = Array.prototype.slice.call(node.attributes);
         var attributesStr = attributesArray.map(function(attribute) {
            return attribute.name + '="' + attribute.value + '"';
         }).join(' ');
         attributesStr = attributesStr === '' ? '' : (' ' + attributesStr);

         html += '<' + tagName + attributesStr + '>' +
                 node.getInnerHTML() +
                 '</' + tagName + '>';
      }

      else if (node instanceof Text) {
         html += node.nodeValue;
      }

      else if (node instanceof Comment) {
         html += '<!--' + node.nodeValue + '-->';
      }
      // Any other node type is not serialized by this method.
   }

   return html;
}

This completes our method.
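The method now handles element, text and comment nodes. One difference from innerHTML is still easy to run into: void elements such as br, img and input have no ending tag, so innerHTML serializes a line break as <br>, while getInnerHTML() as written produces <br></br>. The sketch below is a hypothetical standalone function, serializeChildren(), rather than a method on Element.prototype, so that it can be tried outside a browser on plain objects shaped like DOM nodes. It switches on nodeType (1 for elements, 3 for text, 8 for comments) instead of using instanceof, which also keeps it working for nodes from another window, such as an iframe's document. To stay short, it leaves out text and attribute escaping and the contents of template elements.

```javascript
// Void elements from the HTML standard: serialized as a starting tag only.
// (The serializer also treats a few obsolete elements, such as basefont and
// frame, this way; they are left out here.)
const VOID_ELEMENTS = new Set([
   'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
   'input', 'link', 'meta', 'source', 'track', 'wbr'
]);

// Hypothetical standalone counterpart of getInnerHTML(). `parent` may be a
// DOM element or a plain object with the same properties. Escaping and the
// contents of <template> elements are NOT handled.
function serializeChildren(parent) {
   return Array.from(parent.childNodes).map(node => {
      switch (node.nodeType) {
         case 1: { // Element node
            const attrs = Array.from(node.attributes)
               .map(attr => ' ' + attr.name + '="' + attr.value + '"')
               .join('');
            const start = '<' + node.localName + attrs + '>';
            // A void element gets no content and no ending tag.
            if (VOID_ELEMENTS.has(node.localName)) {
               return start;
            }
            return start + serializeChildren(node) + '</' + node.localName + '>';
         }
         case 3: // Text node
            return node.nodeValue;
         case 8: // Comment node
            return '<!--' + node.nodeValue + '-->';
         default: // Any other node type is skipped in this sketch.
            return '';
      }
   }).join('');
}

// <p>Line 1<br>Line 2</p>, modeled with plain objects.
const p = {
   nodeType: 1, localName: 'p', attributes: [],
   childNodes: [
      { nodeType: 3, nodeValue: 'Line 1' },
      { nodeType: 1, localName: 'br', attributes: [], childNodes: [] },
      { nodeType: 3, nodeValue: 'Line 2' }
   ]
};

console.log(serializeChildren(p)); // Line 1<br>Line 2
```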

Let's now test it. Ideally, the return value of getInnerHTML() should be the same as innerHTML.

Consider the following HTML markup:

<div id="main">
   <p>A paragraph</p>
   <div data-id="100" class="text-blue">A div</div>
   <!--A comment-->
</div>

In the console snippet below, we log the value of getInnerHTML() and innerHTML on the #main element node:

var mainElement = document.getElementById('main')

undefined

mainElement.innerHTML

'
   <p>A paragraph</p>
   <div data-id="100" class="text-blue">A div</div>
   <!--A comment-->
'

mainElement.getInnerHTML()

'
   <p>A paragraph</p>
   <div data-id="100" class="text-blue">A div</div>
   <!--A comment-->
'

mainElement.innerHTML === mainElement.getInnerHTML()

true

How the console displays the string returned by innerHTML may vary from one browser to another.

As the snippet above shows, our getInnerHTML() method returns exactly the same string as innerHTML for this markup.

That's a lot more than just amazing.
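A single piece of markup only exercises a few code paths. In the spirit of reverse-engineering innerHTML from a handful of examples, a small harness can run both serializers over several snippets and report where they disagree. The sketch below uses a hypothetical findMismatches() helper that takes the snippets and a render function returning the pair of strings to compare, so the comparison logic itself can run without a browser.

```javascript
// Hypothetical helper: run `render` on every sample and collect the samples
// whose two outputs differ. `render(sample)` must return [expected, actual].
function findMismatches(samples, render) {
   const mismatches = [];
   for (const sample of samples) {
      const [expected, actual] = render(sample);
      if (expected !== actual) {
         mismatches.push({ sample, expected, actual });
      }
   }
   return mismatches;
}

// In a browser, with getInnerHTML() defined on Element.prototype, render
// could parse each sample into a scratch element:
//
//    const scratch = document.createElement('div');
//    const render = sample => {
//       scratch.innerHTML = sample;
//       return [scratch.innerHTML, scratch.getInnerHTML()];
//    };
//
// Outside a browser, a stub stands in for render. It imitates one of the
// method's gaps: <br> has no ending tag in innerHTML but gets one from
// getInnerHTML().
const stubRender = sample => [sample, sample.replace(/<br>/g, '<br></br>')];

console.log(findMismatches(['<p>Hi</p>', 'a<br>b'], stubRender).map(m => m.sample));
// [ 'a<br>b' ]
```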
