Hi, I'm Yaakov Chaikin. I teach grad web development at Johns Hopkins University and on Coursera.org (1 MILLION students & counting!).
By day, I am a software developer.

HTML Character Entity References

8 min read | by Yaakov Chaikin

This article is part of the Beginner Web Developer Series. The series is aimed at people who’d like to start serious web development, as well as people who are already web developers and want to solidify their knowledge of fundamentals while possibly filling in some holes. If you find yourself tinkering with HTML, CSS, or JavaScript until you sort of get it to work, this series is for you. The material in this series is closely tied to my top-rated Coursera course.

First things first. What IS an HTML Character Entity Reference?

An HTML character entity reference is a special sequence of characters (a code) that the browser displays as the special character or symbol corresponding to that code.

The general format of an HTML character entity reference is &, followed by some code, followed by ;, without any spaces in between.

For example, if you place &copy; in your HTML code, the browser will display the copyright symbol ©.

There are many applications for HTML character entity references. In this article, I will concentrate on the most common problems that HTML character entity references solve.

Reserved Characters

Like any language, HTML has a set of special characters which browsers recognize as part of the HTML language itself. For example, browsers know that when they encounter a < character in the HTML code, they are to interpret it as a beginning of a tag.

Thus, the < character is a reserved character. It’s reserved by the HTML language as having special meaning, i.e., signifying the beginning of a tag.

But what happens when you want to use one of those reserved characters as part of the content of your document, not as part of the HTML code that dictates the structure of the document?

We need a way to tell the browser not to interpret them as HTML code, but as regular content.

HTML character entity references to the rescue! 💪

&lt;, &gt; and &amp; Character References

There are 3 reserved characters that should always be replaced with their corresponding character entity references.

  • Instead of <, use &lt;
  • Instead of >, use &gt;
  • Instead of &, use &amp;
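To make the three substitutions concrete, here is a minimal sketch. The paragraph and its wording are made up purely for illustration; it is not one of the example files used in this article.

```html
<!-- Hypothetical snippet: each reserved character in the content
     is written as its character entity reference -->
<p>
  In HTML, a paragraph starts with &lt;p&gt; and
  the &amp; character starts an entity reference.
</p>
```

The browser should display this as: In HTML, a paragraph starts with <p> and the & character starts an entity reference.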

Example: Reserved Characters in HTML

Let’s take a look at the following HTML document (html-entities-before.html). (This document contains one of my favorite quotes from Theodore Roosevelt):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>HTML Entities</title>
</head>
<body>
<h1>
  Don't be afraid to be <then a 
  100% success & >more:
</h1>
<p>
  "It is not the critic who counts;
  not the man who points out how
  the strong man stumbles, or where
  the doer of deeds could have done
  them better. The credit belongs to
  the man who is actually in the
  arena, whose face is marred by
  dust and sweat and blood; who
  strives valiantly; who errs, who
  comes short again and again,
  because there is no effort without
  error and shortcoming; but who does
  actually strive to do the deeds;
  who knows great enthusiasms, the
  great devotions; who spends himself
  in a worthy cause; who at the best
  knows in the end the triumph of high
  achievement, and who at the worst,
  if he fails, at least fails while
  daring greatly, so that his place
  shall never be with those cold and
  timid souls who neither know victory
  nor defeat."
</p>
<p>Theodore Roosevelt 
   1910 Copyright
</p>
</body>
</html>

Oh, what a great quote, isn’t it?! 🤔

However, besides that great quote, pay attention to the fairly weird-looking content wrapped in the <h1> tag. (Obviously, that’s not part of the quote. I made it up just for this example.)

Let’s take a look at how this document renders in the browser:

HTML without character references Roosevelt quote browser screenshot

Hmm… A good portion of our heading has disappeared!

In fact, it didn’t disappear.

The browser was interpreting the content of the <h1> tag when it stumbled upon the < character.

“Aha!” said the browser.

“It’s a start of a new tag!”

“But what tag is that? <then? What is that? I guess I’d better keep going to see if I can find the end of this opening <then tag.”

“Found it! It’s the closing tag character >, right in front of more.”

“I have no idea how to display this <then> tag, so I guess I’ll just skip rendering it to the display.”

And that’s how we ended up with the heading Don't be afraid to be more: instead of what we actually wanted.

Let’s fix this by replacing the reserved characters <, >, and & with their HTML character entity references.

The following example (html-entities-after-1.html) shows the updated code:

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>HTML Entities</title>
</head>
<body>
<h1>
  Don't be afraid to be &lt;then a 
  100% success &amp; &gt;more:
</h1>
<p>
  "It is not the critic who counts;
  not the man who points out how
  the strong man stumbles, or where
  the doer of deeds could have done
    [... cut out to shorten ...]
  shall never be with those cold and
  timid souls who neither know victory
  nor defeat."
</p>
<p>Theodore Roosevelt 
   1910 Copyright
</p>
</body>
</html>

Here is html-entities-after-1.html rendered in the browser:

HTML with some character references Roosevelt quote browser screenshot

Much better! 👍

Chrome Developer Tools Gotcha

If you are looking at the Elements tab of the Chrome Developer Tools (CDT), you may be fooled into believing that regular reserved characters (<, >, &) are being used in the content, and not the HTML character entity references.

For example, take a look at the heading of html-entities-after-1.html displayed in the CDT:

CDT not showing entity references

Don’t be fooled by this!

That’s just Chrome Developer Tools helping make the content more readable. In the raw HTML, the character entity references are there.

To see them, right-click anywhere on the page (without selecting any content!) and choose the View Page Source menu option. The raw HTML code will be displayed as shown below:

View Page Source shows raw HTML with entity references

Not On My Keyboard!

Another reason HTML character entity references exist is to provide us with the ability to quickly output special characters not readily available on our keyboards.

One such character that is used quite often is the copyright symbol or ©.

The character entity reference code for © is &copy;.

Let’s augment our Roosevelt quote HTML document by placing the copyright symbol as part of the copyright line at the bottom of the document (html-entities-after-2.html):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>HTML Entities</title>
</head>
<body>
<h1>
  Don't be afraid to be &lt;then a 
  100% success &amp; &gt;more:
</h1>
<p>
  "It is not the critic who counts;
  not the man who points out how
  the strong man stumbles, or where
  the doer of deeds could have done
    [... cut out to shorten ...]
  shall never be with those cold and
  timid souls who neither know victory
  nor defeat."
</p>
<p>Theodore Roosevelt 
   &copy; 1910 Copyright
</p>
</body>
</html>

Now, the browser is showing the © symbol next to the year:

Copyright HTML character entity reference screenshot

Much better! 😁

Non-Breaking Space

I would be remiss if I didn’t mention another very commonly used HTML character entity reference, &nbsp;, also known as the non-breaking space.

Let me explain what &nbsp; does with an example.

Take a look at the heading h1 of the following document (non-breaking-space-before.html):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>
    Non-breaking Space Entity Reference
  </title>
</head>
<body>
<h1>
  ClearlyDecoded.com provides HTML, CSS, and 
  Javascript tutorials that are crystal clear.
  The content is just great ( amazing )!
</h1>
<p>
  The heading above is absolutely and totally
  100% objective. It represents pure, unbiased
  truth that can't be denied.
</p>
</body>
</html>

As I decrease the width of the browser window, the browser wraps the heading text onto the next line, word by word:

Animated gif, browser wrapping words onto next line

As you can see, the entire heading does not fit on one line. The browser wraps the heading, word by word, onto the next line.

Note that the browser does not wrap the content character by character, and it shouldn’t!

How weird would it be if the line of text ended with tutorials that ar and continued on the next line with e crystal clear?!

But take a look at the ugly wrapping that happens when we get to the ( amazing ) part of the heading. One of the parentheses wraps onto the next line, leaving the part enclosed in the parentheses on the previous line:

Animated gif, browser wrapping words onto next line, showing that amazing breaks up with parens

We have a dilemma. On the one hand, we’d like to keep the spaces between the word amazing and the parentheses around it. On the other hand, we need to keep the whole thing, ( amazing ), together as if it were one word, without allowing the browser to break it apart at those spaces.

Hey! It’s almost like… wait for it… we need non-breaking spaces or &nbsp;!

Genius! 🤓 😂

Let’s use the non-breaking space characters instead of the regular space characters (non-breaking-space-after.html):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>
    Non-breaking Space Entity Reference
  </title>
</head>
<body>
<h1>
  ClearlyDecoded.com provides HTML, CSS, and 
  Javascript tutorials that are crystal clear.
  The content is just great (&nbsp;amazing&nbsp;)!
</h1>
<p>
  The heading above is absolutely and totally
  100% objective. It represents pure, unbiased
  truth that can't be denied.
</p>
</body>
</html>

Note that there are no regular spaces around the word ‘amazing’: (&nbsp;amazing&nbsp;).

Let’s see how things wrap now:

Animated gif, browser wrapping words onto next line, showing that with non-breaking space amazing doesn't break up with parens

Amazing, indeed! 😁

Common Beginner Mistake - Don’t Misuse &nbsp;

I’ve mentioned several times in this series of articles (e.g., Anatomy of an HTML Tag) that HTML ignores extra spaces.

You can place 100 spaces between two words in your content and the browser will still display just one space.
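Here is a small sketch of that rule. The two paragraphs are hypothetical and assume the browser's default styling (no CSS that changes how white space is handled).

```html
<!-- Hypothetical snippet: the run of spaces in the first paragraph
     collapses into a single space when rendered -->
<p>crystal          clear</p>
<p>crystal clear</p>
```

With default styling, both paragraphs should look identical in the browser.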

It would seem that the non-breaking space character entity reference gives you the power to overcome that rule.

Just place 100 &nbsp; references one after another, right?

Wrong!

While that will somewhat work, it’s a total misuse of this entity reference. The &nbsp; is meant to do one thing: substitute regular space characters so that the browser doesn’t break up content in an undesired way. It is not meant for expressing margins within your text.

(If you wanted margins in the middle of a sentence, you would wrap that content with a <span> tag and apply margin-left: 20px; to that <span> or some such, but we haven’t covered that yet in this series.)
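For the curious, a rough sketch of that <span> idea might look like the following. The sample sentence and the use of an inline style attribute are assumptions made for illustration only; the series has not covered CSS yet, and there are other ways to apply the same style.

```html
<!-- Hypothetical snippet: extra horizontal spacing comes from CSS,
     not from repeated &nbsp; references -->
<p>
  The content is just great
  <span style="margin-left: 20px;">(amazing)</span>!
</p>
```

The 20px margin is added on top of the single regular space that the browser already displays before the <span>.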

Displaying Quotes in Attributes

The quote or &quot; is another character entity reference that is commonly used as part of the value of an attribute of an HTML element.

(If you don’t remember what an HTML attribute is, see my article Anatomy of an HTML Tag.)

For example, we can place a title attribute on the <h1> tag (html-entities-after-3.html):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>HTML Entities</title>
</head>
<body>
<h1 title="&quot;Not the critic&quot;">
  Don't be afraid to be &lt;then a 
  100% success &amp; &gt;more:
</h1>
<p>
  "It is not the critic who counts;
  not the man who points out how
  the strong man stumbles, or where
  the doer of deeds could have done
    [... cut out to shorten ...]
  shall never be with those cold and
  timid souls who neither know victory
  nor defeat."
</p>
<p>Theodore Roosevelt 
   &copy; 1910 Copyright
</p>
</body>
</html>

Pay attention to the opening <h1> tag. Since attribute values are usually enclosed in quotes, we can’t simply leave the quotes in the attribute value like so: title=""Not the critic"".

The browser would treat the second quote in title="" as the closing quote. It would then see the next set of characters, Not the critic"", as an invalid attempt at specifying another attribute.

So, one solution is to use the &quot; entity reference instead of the " character in the attribute value.

Another solution is to take advantage of the fact that, in HTML, an attribute value can be enclosed in either single quotes or double quotes. The same title attribute can then be written as title='"Not the critic"'.
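Side by side, the two options can be sketched like this. The heading text here is a placeholder, not the content of the example file.

```html
<!-- Option 1: double quotes enclose the value;
     &quot; stands in for the quotes inside it -->
<h1 title="&quot;Not the critic&quot;">Heading</h1>

<!-- Option 2: single quotes enclose the value;
     the double quotes inside need no entity reference -->
<h1 title='"Not the critic"'>Heading</h1>
```

Both headings end up with the same title value: "Not the critic", quotes included.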

With either solution, the by-product of specifying a title attribute on an element is that hovering over that element shows a tooltip with the value specified in the title attribute, as shown below.

Showing title with quotes on hover

Summary

Let’s give a quick summary of what we’ve covered in this article:

  • HTML character entity references allow us to display reserved characters as part of our content
  • Content with characters <, >, and & will be interpreted as HTML code by the browser and can break the HTML code, causing unwanted side-effects (like skipping part of the content)
  • Character entity references can be used for characters not readily found on common keyboards (e.g., © character)
  • The non-breaking space character entity reference, &nbsp;, can be used to force the browser not to break up space-separated words when wrapping content
  • Repeating &nbsp; should not be used for larger visual spacing between words. That’s a misuse of the non-breaking space character
  • HTML attribute values that need to contain quotes can either use the quote character entity reference &quot; or, if feasible, interchange between the single and double quotes
