Why attribute values should always be quoted in HTML,
or the saga of the slashed validators

Summary

There are several reasons why an HTML author should always put attribute values in quotes, although the formal rules allow the quotes to be omitted in some cases. This document briefly lists some of the reasons. Then it tells a particular horror story of what an unquoted attribute value containing a slash may cause, due to an interesting discrepancy between validators (and the HTML specifications) and most browsers. The lesson is that any attribute value which is unquoted and contains / will confuse validation: gross errors (which make browsers go wild) may pass unnoticed since they are, technically speaking, not errors at all.

Notice that the slash (solidus) character / occurs frequently in URLs used in HREF, SRC, and other attributes.

The official rules

The rules for using or omitting quotes around an attribute value have been the same in HTML since its beginning. In the HTML 4.0 specification, section Attributes, the rule is formulated as follows:

By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes authors can also use the character entity reference &quot;.

In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), and periods (ASCII decimal 46). We recommend using quotation marks even when it is possible to eliminate them.
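The character rule quoted above is simple enough to state as a one-line test. The following is a minimal sketch in Python; the names UNQUOTED_OK and may_omit_quotes are invented for this illustration, and the check covers only the character set named in the HTML 4.0 text quoted here.

```python
import re

# Characters the quoted HTML 4.0 rule allows in an unquoted value:
# letters, digits, hyphens and periods. An empty value does not qualify.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9.-]+")


def may_omit_quotes(value: str) -> bool:
    """Return True if the value consists only of the allowed characters."""
    return UNQUOTED_OK.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["right", "foo.html", "..", "../yucca.gif", "50%"]:
        print(candidate, may_omit_quotes(candidate))
```

Under this rule a relative URL such as ../yucca.gif fails the test because of the slash, while the bare .. passes, which matters in the horror story told later in this document.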

The rules actually follow from the general rules of the SGML language. More exactly, they follow from the SGML standard and the following definitions in the SGML declaration for HTML:

General reasons to always quote attribute values

There are several reasons to use quotes around attribute values always:

The only drawbacks are a minor extra effort of typing and minor extra storage and transmission time required. These are practically negligible, since the quotes constitute just a small fraction of an HTML file.

A specific reason: the horror story of the slashes

Consider the following piece of HTML:

<A HREF=../test/abug.html>
<IMG ALT="Yucca" SRC=../yucca.gif ALING=right>
<A HREF=foo.html>bar</A>

Taking a sufficiently close look at it, you will probably notice a typo, ALING instead of ALIGN, the absence of the end tag </A> for the first A element, and the lack of quotes around ../yucca.gif (which needs to be quoted, since it contains the character /).

But this actually validates on various validators, such as the W3C Validation Service. Even more surprisingly, it's not a bug in the validators!

The reason is that in HTML, as defined by the specifications, there is the following feature (due to SHORTTAG YES): as an alternative to the normal element presentation syntax
<name attributes>content</name>
the following can be used:
<name attributes/content/

To take a simple example, <EM/foo/ is equivalent to <EM>foo</EM>. Nice, isn't it? (But beware that browsers won't accept it; they'll get very confused, in varying ways, if you try that.)

This means that our code sample is, according to HTML specifications, equivalent to the following:

<A HREF=..>test</A>abug.html>
<IMG ALT="Yucca" SRC=..>yucca.gif ALING=right>
<A HREF=foo.html>bar</A>

which is completely legal (notice that .. as an attribute value needs no quotes), although most probably not what the author intended! The validator regards the string abug.html> as normal plain text between the A element and the IMG element. (Contrary to popular belief, the > character can be used as such within plain text. It might be wise to use the notation &gt; for it, in symmetry with using &lt; for <. But sometimes it is important to realize that a plain > character within text is accepted as such by validators and displayed by browsers.)

Technically speaking, the slash character / (for which the official but rarely used name is "solidus") is a concrete syntactic presentation of NET, Null End Tag (which may have different presentations in other SGML based languages). In order to be enabled for this special meaning, NET must first appear as terminating a start tag (instead of the normal > character). Thus, in <A HREF=../test/abug.html> the first slash acts as a terminator which makes <A HREF=../ a "NET-enabling start-tag". If quotes are used around "../test/abug.html", this cannot happen, since that construct is parsed as a single string.
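The parse described above can be imitated with a small sketch in Python. It is only an illustration under simplifying assumptions, not a real SGML parser: the names START_TAG, EMPTY_ELEMENTS and expand_net are invented for this example, the content of a NET-enabled element is assumed to be plain text without nested tags, and the list of EMPTY elements is the one from HTML 4.

```python
import re

# Elements declared EMPTY in HTML 4: they have no content and no end tag.
EMPTY_ELEMENTS = {
    "AREA", "BASE", "BASEFONT", "BR", "COL", "FRAME", "HR",
    "IMG", "INPUT", "ISINDEX", "LINK", "META", "PARAM",
}

# A start tag whose attribute values are either quoted or consist of name
# characters only. The last group is the delimiter that closes the tag.
START_TAG = re.compile(
    r"""<([A-Za-z][A-Za-z0-9.-]*)         # element name
        ((?:\s+[A-Za-z][A-Za-z0-9.-]*     # attribute name
            (?:\s*=\s*                    # optional equals sign and value
               (?:"[^"]*"|'[^']*'         #   quoted value, or
                  |[A-Za-z0-9.-]+))?      #   unquoted name characters
        )*)
        \s*([>/])                         # normal close or NET delimiter
    """,
    re.VERBOSE,
)


def expand_net(markup: str) -> str:
    """Rewrite NET-minimized elements in the normal <name>...</name> form."""
    out = []
    pos = 0
    while True:
        match = START_TAG.search(markup, pos)
        if match is None:
            out.append(markup[pos:])
            break
        name, attrs, delimiter = match.groups()
        out.append(markup[pos:match.start()])
        pos = match.end()
        if delimiter == ">":
            out.append(match.group(0))  # ordinary start tag, keep as is
            continue
        out.append(f"<{name}{attrs}>")  # NET-enabling start tag
        if name.upper() in EMPTY_ELEMENTS:
            continue  # no content, so nothing is left to close
        close = markup.find("/", pos)
        if close == -1:
            continue  # no null end tag follows; leave the rest alone
        out.append(markup[pos:close] + f"</{name}>")
        pos = close + 1
    return "".join(out)


if __name__ == "__main__":
    print(expand_net("<EM/foo/"))
    print(expand_net("<A HREF=../test/abug.html>"))
```

With these assumptions, expand_net("<EM/foo/") gives <EM>foo</EM>, and the start tag <A HREF=../test/abug.html> comes out as <A HREF=..>test</A>abug.html>, which is the validator's reading. When the value is quoted, the regular expression consumes the whole string as one attribute value and the slash never gets a chance to act as a delimiter.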

Browsers usually do not support the alternative syntax involved. Browsers so commonly deviate from the requirements that even the requirements themselves mention the fact. The HTML 4.0 specification has an interesting excuse-like formulation:

Some SGML SHORTTAG constructs save typing but add no expressive capability to the SGML application. Although these constructs technically introduce no ambiguity, they reduce the robustness of documents, especially when the language is enhanced to include new elements. Thus, while SHORTTAG constructs of SGML related to attributes are widely used and implemented, those related to elements are not. Documents that use them are conforming SGML documents, but are unlikely to work with many existing HTML tools.

What happens in practice (on the great majority of browsers at least) is that the code sample is treated as

<A HREF="../test/abug.html">
<IMG ALT="Yucca" SRC="../yucca.gif" ALING=right></A>
<A HREF=foo.html>bar</A>

That is, browsers imply the quotes and a missing end tag. So what is the problem, you may ask. Should you care about validators accepting the code despite its being formally incorrect when it works in (virtually all) browsers?

Does it actually work? Well, not in the intended way. The attribute ALING=right has no effect; the normal (and recommended) policy of browsers is to ignore attributes if the attribute name is unknown. Thus, the image is not aligned in the way the author wanted. A validator should of course have detected the typo (ALING instead of ALIGN), but it didn't analyze that part of the code as an attribute at all since it thought, according to SGML rules, that it was just plain text in the content of an element.
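The browser-side view can be reproduced with a tokenizer that is as lenient as typical browsers. The sketch below uses the html.parser module from the Python standard library, which accepts unquoted values containing slashes and reports attribute names in lower case. The class UnknownAttrFinder, the function unknown_img_attrs and the set KNOWN_IMG_ATTRS are assumptions made for this illustration; the set is a partial list of IMG attributes, not a complete one.

```python
from html.parser import HTMLParser

# Illustrative, incomplete subset of the attributes HTML 4 defines for IMG.
KNOWN_IMG_ATTRS = {
    "src", "alt", "align", "border", "height", "width",
    "hspace", "vspace", "usemap", "ismap", "longdesc", "name",
}


class UnknownAttrFinder(HTMLParser):
    """Collect IMG attributes whose names are not in KNOWN_IMG_ATTRS."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        if tag == "img":
            for name, value in attrs:
                if name not in KNOWN_IMG_ATTRS:
                    self.unknown.append((name, value))


def unknown_img_attrs(markup):
    finder = UnknownAttrFinder()
    finder.feed(markup)
    finder.close()
    return finder.unknown


if __name__ == "__main__":
    print(unknown_img_attrs('<IMG ALT="Yucca" SRC=../yucca.gif ALING=right>'))
```

A tokenizer of this kind sees ALING=right as an attribute and can therefore report the typo, whereas a parser following the SGML rules never reaches it as an attribute at all.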

As you may guess, problems like this can be hard to find. If typos in HTML markup cannot be detected automatically, things get pretty annoying. Notice that if you were wondering why the code does not work as intended, you might first notice the lack of the end tag </A>, check the specs to see that it really is required, and expect to have solved the problem. What actually happens when you resubmit the code to a validator is that now it says that the document is in error, complaining that the end tag shouldn't be there!

Error at line 5:
   <IMG ALT="Yucca" SRC=../yucca.gif ALING=right></A>
         end tag for element "A" which is not open ^ (explanation...)

Sorry, this document does not validate as HTML 4.0.


Source Listing

Below is the source input I used for this validation:

   1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
   2: <TITLE>Test</TITLE>
   3: <P>
   4: <A HREF=../test/abug.html>
   5: <IMG ALT="Yucca" SRC=../yucca.gif ALING=right></A>
   6: <A HREF=foo.html>bar</A>

(That has actually happened, as you perhaps deduced from the use of the words "horror story".)

Implications for validators and checkers

The observations made here have no direct implications for validators. Validators should continue processing HTML documents according to the specifications. They must not report constructs like our original example as erroneous.

Checkers, or linters, on the other hand, should issue warnings about any use of minimized syntax like <EM/foo/, by default at least. (Actually, some checkers, e.g. WebLint, seem to have parsers similar to those of typical browsers, not similar to those of typical validators. This means that without major reconstruction they have little chance of giving adequate warnings about minimized syntax.)
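As a rough idea of what such a warning could look like, here is a sketch of a checker rule in Python. It is not taken from WebLint or any existing tool; the function lint_unquoted, the regular expressions and the message texts are all hypothetical. It tokenizes start tags the way browsers do (a tag ends at the first > outside quotes) and reports every unquoted attribute value, with a stronger message when the value contains a slash.

```python
import re

# A start tag as a browser-like parser sees it: it ends at the first ">"
# that is not inside a quoted attribute value.
START_TAG = re.compile(r"""<[A-Za-z](?:"[^"]*"|'[^']*'|[^>"'])*>""")
QUOTED = re.compile(r"""(?:"[^"]*"|'[^']*')""")
# Attribute name, equals sign, then a value that does not start with a quote.
UNQUOTED_ATTR = re.compile(
    r"""\s([A-Za-z][A-Za-z0-9.-]*)\s*=\s*([^\s"'>][^\s>]*)"""
)


def lint_unquoted(markup):
    """Return (line, attribute, value, message) for each unquoted value."""
    warnings = []
    for tag in START_TAG.finditer(markup):
        line = markup.count("\n", 0, tag.start()) + 1
        # Blank out quoted values so their contents cannot look like attributes.
        text = QUOTED.sub('""', tag.group(0))
        for attr in UNQUOTED_ATTR.finditer(text):
            name, value = attr.group(1), attr.group(2)
            if "/" in value:
                message = "unquoted value contains '/' (null end tag in SGML)"
            else:
                message = "unquoted value"
            warnings.append((line, name, value, message))
    return warnings


if __name__ == "__main__":
    sample = (
        "<A HREF=../test/abug.html>\n"
        '<IMG ALT="Yucca" SRC=../yucca.gif ALING=right>\n'
        "<A HREF=foo.html>bar</A>\n"
    )
    for warning in lint_unquoted(sample):
        print(warning)
```

Because the rule works on the browser-style reading of the markup, it fits checkers whose parsers resemble those of browsers; it makes no attempt to decide whether the document is valid.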

For a program acting both as a validator and as a checker, it is important, in this issue and otherwise, to make a clear distinction between (validation) error messages and (checker) warning messages.

The conclusion

The traditional answer to people who ask "should I put an attribute value into quotes" has been "when in doubt, quote".

The discussion in this document strongly suggests appending the following to the answer: even when not in doubt, quote anyway.


1998-07-06

Jukka Korpela

I am indebted to Arjun Ray (aray@interactrx.com) for extremely valuable contributions on HTML and SGML issues, both in Usenet news (particularly in comp.infosystems.www.authoring.html) and by E-mail. The problem discussed in this document would still be a mystery to me without Arjun's advice.