HTML Entities
Learning outcomes:
- What are HTML entities
- Named and numeric entities
- Commonly used named entities
- Numeric entities
What are entities?
Let's say we want to add the less-than symbol, <, literally inside a <p> element (or for that matter, in any element). How could we do so?
Well, based on what we've learnt so far, we might go on and do the following:
<p>2 < 3</p>
2 < 3
This works, at least in this case, but it is NOT considered a good practice at all.
The < character is reserved for a special purpose by HTML, i.e. to denote the beginning of a tag, and should therefore not be used as it is in code.
To better understand this, let's say we want to denote the text 'This is <code>' literally in HTML. We couldn't do the following:
<p>This is <code></p>
since <code> will be treated as an element, NOT literally as the text '<code>'.
So what to do now?
The correct way to literally denote a character such as < or > in HTML, that is otherwise reserved for a special purpose, is to use an HTML entity.
An entity begins with an ampersand (&) and ends with a semicolon (;). Between these we put some characters that together represent another character.
An entity is sometimes also known as a character reference in HTML.
There are multiple ways of specifying an entity:
- Named entities, whereby a short abbreviation is used to represent the underlying character.
- Numeric entities, whereby a number representing the underlying character is expressed as a decimal integer or a hexadecimal integer.
Since they're easier to remember compared to numeric entities, we'll typically use named entities when writing HTML code.
Anyways, let's now see how to denote < using an entity.
From mathematics, recall that the < symbol is called the less-than symbol. In HTML, the named entity &lt; denotes this < character. (You can obviously guess what 'lt' means here.)
When an HTML parser encounters &lt;, it realizes that it's an entity. It then goes through its large collection of named entities, deduces that &lt; corresponds to <, and replaces &lt; with < in the final output.
The source code of the page would obviously still contain &lt; but the rendered output would be different, containing the < character.
Here's a quick example of using &lt;:
<p>2 &lt; 3</p>
2 < 3
In the next section, we shall learn about some of the most commonly used entities in HTML.
Commonly used named entities
HTML defines a jaw-dropping number of named entities, covering a huge variety of characters and symbols from a diverse set of languages and areas of study.
Now that we understand what exactly an HTML entity is, let's spend a couple of minutes getting to know some of the most commonly used ones.
Non-breaking spaces
Recall the fact that in HTML, each and every sequence of whitespace characters (spaces, tabs, newlines, etc.) is replaced with a single whitespace character.
A non-breaking space character, however, doesn't get treated as such a whitespace character in HTML even though it produces whitespace.
So what does this mean? Let's find out.
Denoted as &nbsp;, a non-breaking space represents a space character that follows two basic rules:
- It does NOT get treated as a regular whitespace character, which means that it remains in the output as it is in the code.
- It does NOT allow a line break at its position (whereas a regular space lets a line of text wrap on to a new line when all of the text can't fit on one line in the output).
Expanding upon the first rule, if we have 10 &nbsp; entities, we'll get exactly 10 corresponding space characters in the output; HTML doesn't strip these off.
Following is an example:
<p>Here we have 5 spaces: "     "</p>
<p>Here we have 5 non-breaking spaces: "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"</p>
Here we have 5 spaces: " "
Here we have 5 non-breaking spaces: "     "
The sequence of normal space characters in the first <p> gets reduced down to one single space (as per the default behavior of HTML). However, in the second <p>, the non-breaking spaces, each denoted as &nbsp;, show up exactly as they are written in the source code.
The &nbsp; entity can be really handy when we just want to add an extra space or two somewhere in our HTML but don't necessarily want the power of preformatting for this.
The second rule, which is where the non-breaking space character gets its name from, means that a line of text can't be broken at this character, unlike at a regular space.
Shown below is an example:
<p>This is an overflowing word.</p>
<p>This is an overflowing&nbsp;word.</p>
We have two paragraphs with the same text, except that the second one has a non-breaking space between 'overflowing' and 'word' (and hence the second paragraph can't be broken at this space).
Using some CSS (which we'll explore later on in this course), we emulate the scenario of there not being sufficient width to fit the word 'word' on the same line. Take a look at both the paragraphs as follows:
This is an overflowing
word.

This is an
overflowing word.
To fit the entire text inside the <p> element, the browser is configured to break the line of text at any regular whitespace character.
- In the first paragraph, the space after 'overflowing' is taken to be the breaking point for the line of text, simply because it's a regular space.
- In the second paragraph, however, &nbsp; denotes a non-breaking space which the browser can't break at; it instead breaks the line of text at the space following 'an' (since that's a regular space).
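The cramped layout described above could be reproduced with something like the following. This is only a minimal sketch: the inline style attribute, the monospace font, and the 24ch width are assumptions chosen for illustration, not the styling actually used for the paragraphs shown here.

```html
<!-- Assumed styling: in a monospace font every character is 1ch wide,
     so a 24ch box fits "This is an overflowing" (22 characters) but not
     the full sentence (28 characters). -->
<p style="font-family: monospace; width: 24ch;">This is an overflowing word.</p>
<p style="font-family: monospace; width: 24ch;">This is an overflowing&nbsp;word.</p>
```

With these values, the first paragraph should wrap right before 'word.', while the second should wrap right before 'overflowing', since the browser can't split 'overflowing' from 'word.' at the non-breaking space.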
In a way, we can think of &nbsp; as grouping two words together so that they become a single word, although, obviously, that's not the case visually.
Less-than (<) and greater-than (>)
We already saw the &lt; entity in the previous section above but let's quickly see it once again, along with &gt;.
The less-than (<) symbol is given by the &lt; entity while the greater-than (>) symbol is given by the &gt; entity.
If we need to use either of these symbols (< and >) in HTML, we should use their corresponding entities, since both the symbols are reserved for a special purpose in HTML, i.e. to denote HTML tags.
In the following code, we solve the problem discussed above, where we wanted to represent the text 'This is <code>' in HTML:
<p>This is &lt;code&gt;</p>
Let's see the output:
This is <code>
Voila! Just as we wanted.
Ampersand (&)
Suppose we want to represent literally the text '&gt;' in HTML. How could we do this?
Well, if we write &gt; as it is, we'll obviously get the corresponding > character, not the text itself, as can be seen below:
<p>Greater-than (>)</p>
Greater-than (>)
What we need to do here is to replace the ampersand (&) in &gt; so that the sequence isn't treated as an entity by the browser. And that's exactly where &amp; enters the game.
The ampersand (&) character is given by the named entity &amp;.
Coming back to our question, to represent '&gt;' literally in HTML, we just need to use &amp; in place of the & character. In that way, the whole sequence won't be treated as an entity but rather as plain text.
The following code demonstrates this:
<p>Greater-than (&amp;gt;)</p>
Greater-than (&gt;)
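As a further illustration, here is a small sketch that combines the three entities covered so far in one line of text. The expression being displayed is made up purely for this example.

```html
<!-- Each reserved character in the text is written as its named entity. -->
<p>if (a &lt; b &amp;&amp; b &gt; c)</p>
```

The browser should render this as the text: if (a < b && b > c)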
Numeric entities
While there are hundreds and hundreds of named entities in HTML, they still don't altogether represent the complete set of characters possible in Unicode.
For that, we use another kind, one that specifies the code point (the number) associated with a given character, as a decimal or hexadecimal integer.
As we know, such entities are referred to as numeric entities.
- For a decimal representation, the code point is written as it is between the & and ; characters, prefixed with # (which means that a number follows).
- For a hexadecimal representation, the code point is converted to its corresponding hexadecimal integer and then written between & and ;, prefixed with # and additionally x.
So, in general, a decimal numeric entity can be expressed as &#code; whereas a hexadecimal numeric entity can be expressed as &#xcode;, where code denotes the number representing the code point of the underlying character.
Using a numeric entity, we can express just about any character in an HTML document.
Let's take the example of the < character.
< has the code point 60 in Unicode, sometimes also expressed more technically as U+003C (where 003C is the hexadecimal representation of the number 60).
60 is already a decimal number, so the decimal entity for < would trivially be &#60;. Converting 60 to hexadecimal gives us the number 3C, hence the hexadecimal entity would be &#x3c; (or equivalently, &#x3C;, with an uppercase C).
The code below expresses < in three different ways:
<p>Less-than symbol: &lt;</p>
<p>Less-than symbol: &#60;</p>
<p>Less-than symbol: &#x3c;</p>
Less-than symbol: <
Less-than symbol: <
Less-than symbol: <
Entities are amazing, aren't they?
If a character doesn't have a corresponding named entity, you can always use a numeric entity to express it in HTML, given that you know its code point.
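For instance, the grinning face emoji (U+1F600, that is, 128512 in decimal) has no named entity, so it can only be written as itself or as a numeric entity. Here's a minimal sketch, with the emoji chosen arbitrarily for illustration:

```html
<!-- Decimal and hexadecimal numeric entities for the code point U+1F600 -->
<p>Grinning face: &#128512;</p>
<p>Grinning face: &#x1F600;</p>
```

Both paragraphs should display the same character, provided the font in use has a glyph for it.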