This article is part of the Beginner Web Developer Series. The series is aimed at people who’d like to start serious web development, as well as people who are already web developers and want to solidify their knowledge of the fundamentals while possibly filling in some holes. If you find yourself tinkering with HTML, CSS, or JavaScript until you sort of get it to work, this series is for you. The material in this series is closely tied to my top-rated Coursera course.
First things first. What IS an HTML Character Entity Reference?
An HTML character entity reference is a special sequence of characters (a code) that the browser displays as a special character or symbol corresponding to that code.
The general format of an HTML character entity reference is &, followed by some code, followed by ;, without any spaces in between.
For example, if you place &copy; in your HTML code, the browser will display the copyright symbol ©.
There are many applications for HTML character entity references. In this article, I will concentrate on the most common problems that HTML character entity references solve.
Reserved Characters
Like any language, HTML has a set of special characters which browsers recognize as part of the HTML language itself. For example, browsers know that when they encounter a < character in the HTML code, they are to interpret it as the beginning of a tag.
Thus, the < character is a reserved character. It’s reserved by the HTML language as having special meaning, i.e., signifying the beginning of a tag.
But what happens when you want to use one of those reserved characters as part of the content of your document, not as part of the HTML code that dictates the structure of the document?
We need a way to tell the browser not to interpret them as HTML code, but as regular content.
HTML character entity references to the rescue! 💪
<, >, and & Character References
There are 3 reserved characters that should always be substituted with their corresponding character entity references.
- Instead of <, use &lt;
- Instead of >, use &gt;
- Instead of &, use &amp;
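As a quick illustration, here is a minimal sketch of a paragraph whose text needs all three reserved characters. The sentence is made up for this example and is not part of the article's sample files.

```html
<!-- Hypothetical snippet: the sentence is invented for illustration -->
<p>
  In HTML, the &lt;p&gt; tag marks up a paragraph &amp; the
  &lt;h1&gt; tag marks up a top-level heading.
</p>
```

The browser should display the text with the real characters in place: In HTML, the <p> tag marks up a paragraph & the <h1> tag marks up a top-level heading.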
Example: Reserved Characters in HTML
Let’s take a look at the following HTML document (html-entities-before.html). (This document contains one of my favorite quotes from Theodore Roosevelt):
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>HTML Entities</title>
</head>
<body>
<h1>
Don't be afraid to be <then a
100% success & >more:
</h1>
<p>
"It is not the critic who counts;
not the man who points out how
the strong man stumbles, or where
the doer of deeds could have done
them better. The credit belongs to
the man who is actually in the
arena, whose face is marred by
dust and sweat and blood; who
strives valiantly; who errs, who
comes short again and again,
because there is no effort without
error and shortcoming; but who does
actually strive to do the deeds;
who knows great enthusiasms, the
great devotions; who spends himself
in a worthy cause; who at the best
knows in the end the triumph of high
achievement, and who at the worst,
if he fails, at least fails while
daring greatly, so that his place
shall never be with those cold and
timid souls who neither know victory
nor defeat."
</p>
<p>Theodore Roosevelt
1910 Copyright
</p>
</body>
</html>
Oh, what a great quote, isn’t it?! 🤔
However, besides that great quote, pay attention to the fairly weird-looking content wrapped in the <h1> tag. (Obviously, that’s not part of the quote. I made it up just for this example.)
Let’s take a look at how this document renders in the browser:
Hmm… A good portion of our heading has disappeared!
In fact, it didn’t disappear.
The browser was interpreting the content of the <h1> tag when it stumbled upon the < character.
“Aha!” said the browser.
“It’s the start of a new tag!”
“But what tag is that? <then? What is that? I guess I’d better keep going to see if I can find the end of this opening <then tag.”
“Found it! It’s the closing tag character >, right at the start of >more.”
“I have no idea how to display this <then> tag, so I guess I’ll just skip rendering it to the display.”
And that’s how we ended up with the heading Don't be afraid to be more: instead of what we actually wanted.
Let’s fix this by substituting HTML character entity references for the reserved characters <, >, and &.
The following example (html-entities-after-1.html) shows the updated code:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>HTML Entities</title>
</head>
<body>
<h1>
Don't be afraid to be &lt;then a
100% success &amp; &gt;more:
</h1>
<p>
"It is not the critic who counts;
not the man who points out how
the strong man stumbles, or where
the doer of deeds could have done
[... cut out to shorten ...]
shall never be with those cold and
timid souls who neither know victory
nor defeat."
</p>
<p>Theodore Roosevelt
1910 Copyright
</p>
</body>
</html>
Here is html-entities-after-1.html rendered in the browser:
Much better! 👍
Chrome Developer Tools Gotcha
If you are looking at the Elements tab of the Chrome Developer Tools (CDT), you may be fooled into believing that regular reserved characters (<, >, &) are being used in the content, and not the HTML character entity references.
For example, take a look at the heading of html-entities-after-1.html displayed in the CDT:
Don’t be fooled by this!
That’s just Chrome Developer Tools helping make the content more readable. In the raw HTML, the character entity references are there.
To see them, right-click anywhere on the page (without selecting any content!) and choose the View Page Source menu option. The raw HTML code will be displayed as shown below:
Not On My Keyboard!
Another reason HTML character entity references exist is to provide us with the ability to quickly output special characters not readily available on our keyboards.
One such character that is used quite often is the copyright symbol, ©.
The character entity reference code for © is &copy;.
Let’s augment our Roosevelt quote HTML document by placing the copyright symbol as part of the copyright line at the bottom of the document (html-entities-after-2.html):
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>HTML Entities</title>
</head>
<body>
<h1>
Don't be afraid to be &lt;then a
100% success &amp; &gt;more:
</h1>
<p>
"It is not the critic who counts;
not the man who points out how
the strong man stumbles, or where
the doer of deeds could have done
[... cut out to shorten ...]
shall never be with those cold and
timid souls who neither know victory
nor defeat."
</p>
<p>Theodore Roosevelt
&copy; 1910 Copyright
</p>
</body>
</html>
Now, the browser is showing the © symbol next to the year:
Much better! 😁
Non-Breaking Space
I would be remiss if I didn’t mention another very commonly used HTML character entity reference, &nbsp;, also known as the non-breaking space.
Let me explain what &nbsp; does with an example.
Take a look at the <h1> heading of the following document (non-breaking-space-before.html):
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>
Non-breaking Space Entity Reference
</title>
</head>
<body>
<h1>
ClearlyDecoded.com provides HTML, CSS, and
Javascript tutorials that are crystal clear.
The content is just great ( amazing )!
</h1>
<p>
The heading above is absolutely and totally
100% objective. It represents pure, unbiased
truth that can't be denied.
</p>
</body>
</html>
Shown below, as I decrease the width of the browser window, the browser wraps the heading text onto the next line, word by word:
As you can see, the entire heading does not fit on one line, so the browser moves whole words down to the next line.
Note that the browser does not wrap the content character by character, and it shouldn’t!
How weird would it be if the line of text ended with tutorials that ar and continued on the next line with e crystal clear?!
But take a look at the ugly wrapping that happens when we get to the ( amazing ) part of the heading. One of the parentheses wraps onto the next line, leaving the part enclosed in the parentheses on the previous line:
We have a dilemma. On one hand, we’d like to keep the spaces between the word amazing and the parentheses around it. On the other hand, we need to keep the whole thing, ( amazing ), as one word, without allowing the browser to break it apart at those spaces.
Hey! It’s almost like… wait for it… we need non-breaking spaces, or &nbsp;!
Genius! 🤓 😂
Let’s use the non-breaking space characters instead of the regular space characters (non-breaking-space-after.html):
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>
Non-breaking Space Entity Reference
</title>
</head>
<body>
<h1>
ClearlyDecoded.com provides HTML, CSS, and
Javascript tutorials that are crystal clear.
The content is just great (&nbsp;amazing&nbsp;)!
</h1>
<p>
The heading above is absolutely and totally
100% objective. It represents pure, unbiased
truth that can't be denied.
</p>
</body>
</html>
Note that there are no regular spaces around the word ‘amazing’: (&nbsp;amazing&nbsp;).
Let’s see how things wrap now:
Amazing, indeed! 😁
Common Beginner Mistake - Don’t Misuse &nbsp;
I’ve mentioned several times in this series of articles (e.g., Anatomy of an HTML Tag) that HTML ignores extra spaces.
You can place 100 spaces between two words in your content and the browser will still display just one space.
It would seem that the non-breaking space character entity reference gives you the power to overcome that rule.
Just place 100 &nbsp; references one after another, right?
Wrong!
While that will somewhat work, it’s a total misuse of this entity reference. The &nbsp; is meant to do one thing: substitute for regular space characters so that the browser doesn’t break up content in an undesired way. It is not meant for expressing margins within your text.
(If you wanted margins in the middle of a sentence, you would wrap that content with a <span> tag and apply margin-left: 20px; to that <span> or some such, but we haven’t covered that yet in this series.)
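For the curious, a minimal sketch of that approach might look like the following. It assumes an inline style attribute purely to keep the example in one place; the sentence and the 20px value are only illustrative.

```html
<!-- Sketch only: an inline style is assumed here for brevity -->
<p>
  The content is just great
  <span style="margin-left: 20px;">(amazing)</span>!
</p>
```

The extra gap before the parenthesized word comes from CSS rather than from a pile of non-breaking spaces, so it can be adjusted in one place.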
Displaying Quotes in Attributes
The double quote, written as &quot;, is another character entity reference. It is commonly used as part of the value of an attribute of an HTML element.
(If you don’t remember what an HTML attribute is, see my article Anatomy of an HTML Tag.)
For example, we can place a title attribute on the <h1> tag (html-entities-after-3.html):
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>HTML Entities</title>
</head>
<body>
<h1 title="&quot;Not the critic&quot;">
Don't be afraid to be &lt;then a
100% success &amp; &gt;more:
</h1>
<p>
"It is not the critic who counts;
not the man who points out how
the strong man stumbles, or where
the doer of deeds could have done
[... cut out to shorten ...]
shall never be with those cold and
timid souls who neither know victory
nor defeat."
</p>
<p>Theodore Roosevelt
&copy; 1910 Copyright
</p>
</body>
</html>
Pay attention to the opening <h1> tag. Since attribute values are usually enclosed in quotes, we can’t simply leave the quotes in the attribute value like so: title=""Not the critic"".
The browser would consider the second quote in title="" as the closing quote. Then, the browser would see the next set of characters, Not the critic"", as an invalid attempt at specifying another attribute.
So, one solution is to use the &quot; entity reference instead of the " character in the attribute value.
Another solution is to use the fact that single quotes and double quotes are interchangeable as attribute value delimiters in HTML. The same title attribute can then be written as title='"Not the critic"'.
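Side by side, the two options look like this. Both are sketches of just the heading element, with placeholder heading text standing in for the real content.

```html
<!-- Option 1: double-quoted value, inner quotes written as &quot; -->
<h1 title="&quot;Not the critic&quot;">Heading text</h1>

<!-- Option 2: single-quoted value, inner double quotes left as-is -->
<h1 title='"Not the critic"'>Heading text</h1>
```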
With either solution, the by-product of specifying a title attribute on an element is that hovering over that element shows a tooltip with the value specified in the title attribute, as shown below.
Summary
Let’s give a quick summary of what we’ve covered in this article:
- HTML character entity references allow us to display reserved characters as part of our content
- Content with the characters <, >, and & will be interpreted as HTML code by the browser and can break the HTML code, causing unwanted side effects (like skipping part of the content)
- Character entity references can be used for characters not readily found on common keyboards (e.g., the © character)
- The non-breaking space character entity reference, &nbsp;, can be used to force the browser not to break up space-separated words when wrapping content
- Repeating &nbsp; should not be used for larger visual spacing between words. That’s a misuse of the non-breaking space character
- HTML attribute values that need to contain quotes can either use the quote character entity reference &quot; or, if feasible, interchange between the single and double quotes
Resources
- Code used in this article
- Mozilla Developer Network (MDN) Entity description
- Nice Character Entity Reference Chart
Questions?
If something is not clear about what I wrote in this article, please ask away in the comments below!