There are several reasons why an HTML author should always put attribute values in quotes, although the formal rules allow the omission of the quotes in some cases.
This document briefly lists some of the reasons. Then it
tells a particular horror story of what
an unquoted attribute value containing a slash may cause,
due to an interesting discrepancy between validators
(and the HTML specifications) and most browsers.
The lesson is that any attribute value which is unquoted and contains / will confuse validation; this may mean that gross errors (which drive browsers wild) slip through since they are, technically speaking, not errors at all.
Notice that the slash (solidus) character / occurs frequently in URLs used in HREF, SRC, and other attributes.
The rules for using or omitting quotes around an attribute value have been the same in HTML since its beginning. In the HTML 4.0 specification, section Attributes, the rule is formulated as follows:
By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes authors can also use the character entity reference &quot;.
In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), and periods (ASCII decimal 46). We recommend using quotation marks even when it is possible to eliminate them.
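The prose rule quoted above can be turned into a mechanical check. The following is a minimal Python sketch, assuming only the character set named in the quotation (letters, digits, hyphens, periods) and assuming that an unquoted value must be non-empty; the name may_omit_quotes is made up for this illustration.

```python
import re

# Characters the HTML 4.0 prose allows in an unquoted attribute value:
# letters, digits, hyphens and periods.
# Assumption: an unquoted value must contain at least one character.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9.-]+")

def may_omit_quotes(value):
    """Return True if the value may legally be written without quotes."""
    return UNQUOTED_OK.fullmatch(value) is not None

for value in ["right", "foo.gif", "..", "images/foo.gif", "50%", ""]:
    verdict = "quotes optional" if may_omit_quotes(value) else "quotes required"
    print(f"{value!r}: {verdict}")
```

Even where the check says the quotes are optional, the specification's own recommendation is to use them.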
The rules actually follow from the general rules of the SGML language. More exactly, they follow from the SGML standard and the following definitions in the SGML declaration for HTML:
LCNMCHAR ".-" and UCNMCHAR ".-" which specify that in addition to letters (A - Z, a - z) and digits (0 - 9), which are the characters allowed in names by default according to the SGML standard, the characters . and - are allowed.
In fact, the SGML declaration for HTML in the HTML 4.0 specification contains ".-_:" here instead of ".-", thus allowing underscore (_) and colon (:) as name characters, too. The formal declaration should probably be interpreted as more authoritative than the prose specification quoted above, in principle. But authors should hardly rely on this new feature being universally implemented; moreover, this change has little impact on our topic: notice that, for example, the / is not among the allowed name characters.
SHORTTAG YES, which is a feature which allows some abbreviated forms and without which the omission of quotes would not be allowed at all; see section Minimize class in HTML Unleashed: SGML and HTML DTD.
There are several reasons to use quotes around attribute values always:
SHORTTAG NO
;
see
Comparison of SGML and XML).
You may use XML in the future, and in that case your life will
be simpler if you have adopted the habit of quoting attribute
values.
SRC=foo.gif is legal, but if someone changes the attribute (e.g. due to moving a file to another directory) to SRC=images/foo.gif, it becomes illegal.
The only drawbacks are the minor extra effort of typing and the minor extra storage and transmission time required. These are practically negligible, since the quotes constitute just a small fraction of an HTML file.
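When markup is generated by a program, the habit of always quoting costs nothing at all. The following Python sketch applies the delimiter rules cited earlier from the HTML 4.0 specification; the names quote_attribute and attribute are hypothetical helpers, and the sketch deliberately leaves other characters (such as &) alone.

```python
def quote_attribute(value):
    """Return value wrapped in quotes, whatever characters it contains."""
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        # Double quotes may appear as such inside single-quote delimiters.
        return "'" + value + "'"
    # Both kinds of quote occur: keep double-quote delimiters and write
    # the inner double quotes as numeric character references.
    return '"' + value.replace('"', "&#34;") + '"'

def attribute(name, value):
    """Format one attribute specification, always quoted."""
    return f"{name}={quote_attribute(value)}"

print(attribute("SRC", "images/foo.gif"))
print(attribute("ALT", 'The "Yucca" plant'))
```

With such a helper, moving a file to another directory changes the value but can never change whether the markup is legal.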
Consider the following piece of HTML:
<A HREF=../test/abug.html>
<IMG ALT="Yucca" SRC=../yucca.gif ALING=right>
<A HREF=foo.html>bar</A>
Taking a sufficiently close look at it, you will probably notice a typo, ALING instead of ALIGN, the absence of the end tag </A> for the first A element, and the lack of quotes around ../yucca.gif (which needs to be quoted, since it contains the character /).
But this actually passes validation on various validators, such as the W3C Validation Service. Even more surprisingly, it's not a bug in the validators!
The reason is that in HTML, as defined by the specifications, there is the following feature (due to SHORTTAG YES): as an alternative to the normal element presentation syntax
<name attributes>content</name>
the following can be used:
<name attributes/content/
To take a simple example, <EM/foo/ is equivalent to <EM>foo</EM>. Nice, isn't it? (But beware that browsers won't accept it; they'll get very confused, in varying ways, if you try that.)
This means that our code sample is, according to HTML specifications, equivalent to the following:
<A HREF=..>test</A>abug.html>
<IMG ALT="Yucca" SRC=..>yucca.gif ALING=right>
<A HREF=foo.html>bar</A>
which is completely legal (notice that .. as an attribute value needs no quotes), although most probably not what the author intended! The validator regards the string abug.html> as normal plain text between the A element and the IMG element.
(Contrary to popular belief, the > character can be used as such within plain text. It might be wise to use the notation &gt; for it, in symmetry with using &lt; for <. But sometimes it is important to realize that a plain > character within text is accepted as such by validators and displayed by browsers.)
Technically speaking, the slash character / (for which the official but rarely used name is "solidus") is a concrete syntactic presentation of NET, Null End Tag (which may have different presentations in other SGML based languages). In order to be enabled for this special meaning, NET must first appear as terminating a start tag (instead of the normal > character). Thus, in <A HREF=../test/abug.html> the first slash acts as a terminator which makes <A HREF=../ a "NET-enabling start-tag". If quotes are used, as in "../test/abug.html", this cannot happen, since that construct is parsed as a single string.
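The reading described above can be imitated with a few lines of Python. This is only a rough sketch under simplifying assumptions, not an SGML parser: it looks at a single start tag, treats the first slash outside quotes as the NET that ends the tag, and uses a partial, hand-written list of EMPTY elements. The name expand_net is invented for the illustration.

```python
import re

# Elements declared EMPTY in HTML 4.0 have no content and no end tag.
# Partial list, sufficient for the examples in this document.
EMPTY_ELEMENTS = {"AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT",
                  "LINK", "META", "PARAM"}

def expand_net(markup):
    """Rewrite markup beginning with a start tag into the unminimized form
    that the NET reading implies. Markup whose start tag is closed by '>'
    before any unquoted slash is returned unchanged."""
    name_match = re.match(r"<([A-Za-z][A-Za-z0-9.-]*)", markup)
    if not name_match:
        return markup
    name = name_match.group(1)
    quote = None
    for i in range(name_match.end(), len(markup)):
        ch = markup[i]
        if quote:
            if ch == quote:          # end of a quoted attribute value
                quote = None
        elif ch in "\"'":
            quote = ch               # start of a quoted attribute value
        elif ch == ">":
            return markup            # ordinary start tag, nothing to expand
        elif ch == "/":
            start_tag = markup[:i] + ">"   # NET-enabling start tag ends here
            rest = markup[i + 1:]
            if name.upper() in EMPTY_ELEMENTS:
                return start_tag + rest    # no content, so no null end tag
            content, slash, tail = rest.partition("/")
            if not slash:
                return start_tag + rest    # element is left open
            return f"{start_tag}{content}</{name}>{tail}"
    return markup

print(expand_net("<EM/foo/"))
print(expand_net("<A HREF=../test/abug.html>"))
print(expand_net('<IMG ALT="Yucca" SRC=../yucca.gif ALING=right>'))
print(expand_net('<A HREF="../test/abug.html">'))
```

The quoted variant in the last call comes back unchanged, because a slash inside quotes is never seen as a tag terminator.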
Browsers usually do not support the alternative syntax involved. Browsers so commonly deviate from the requirements that even the requirements themselves mention the fact. The HTML 4.0 specification has an interesting excuse-like formulation:
Some SGML SHORTTAG constructs save typing but add no expressive capability to the SGML application. Although these constructs technically introduce no ambiguity, they reduce the robustness of documents, especially when the language is enhanced to include new elements. Thus, while SHORTTAG constructs of SGML related to attributes are widely used and implemented, those related to elements are not. Documents that use them are conforming SGML documents, but are unlikely to work with many existing HTML tools.
What happens in practice (on the great majority of browsers at least) is that the code sample is treated as
<A HREF="../test/abug.html">
<IMG ALT="Yucca" SRC="../yucca.gif" ALING=right></A>
<A HREF=foo.html>bar</A>
That is, browsers imply the quotes and a missing end tag. So what is the problem, you may ask. Should you care about validators accepting the code despite its being formally incorrect when it works in (virtually all) browsers?
Does it actually work? Well, not in the intended way.
The attribute ALING=right has no effect; the normal (and recommended) policy of browsers is to ignore attributes if the attribute name is unknown. Thus, the image is not aligned in the way the author wanted. A validator should of course have detected the typo (ALING instead of ALIGN), but it didn't analyze that part of the code as an attribute at all since it thought, according to SGML rules, that it was just plain text in the content of an element.
As you may guess, problems like this can be hard to find.
If typos in HTML markup cannot be detected automatically,
things get pretty annoying.
Notice that if you were wondering why the code does not work as intended, you might first notice the lack of the end tag </A>, check the specs to see that it really is required, and expect to have solved the problem. What actually happens when you resubmit the code to a validator is that now it says that the document is in error, complaining that the end tag shouldn't be there!
Error at line 5:
<IMG ALT="Yucca" SRC=../yucca.gif ALING=right></A>
end tag for element "A" which is not open
Sorry, this document does not validate as HTML 4.0.
Source Listing
Below is the source input I used for this validation:
1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
2: <TITLE>Test</TITLE>
3: <P>
4: <A HREF=../test/abug.html>
5: <IMG ALT="Yucca" SRC=../yucca.gif ALING=right></A>
6: <A HREF=foo.html>bar</A>
(That has actually happened, as you perhaps deduced from the use of the words "horror story".)
The observations made here have no direct implication on validators. Validators should continue processing HTML documents according to the specifications. They must not report constructs like our original example as erroneous.
Checkers, or linters, on the other hand, should issue warnings about any use of minimized syntax like <EM/foo/, by default at least. (Actually, some checkers, e.g. WebLint, seem to have parsers similar to those of typical browsers, not similar to those of typical validators. This means that without major reconstruction they have little chance of giving adequate warnings about minimized syntax.)
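Even a checker with a browser-like parser can catch the most common route into this trouble, namely unquoted attribute values that contain characters outside the allowed set. The following Python sketch is one possible approach, not a description of any existing tool: it tokenizes tags the naive way (from < to the next >, which assumes no > inside quoted values), and the name lint_unquoted is hypothetical. It does not detect deliberately minimized markup such as <EM/foo/.

```python
import re

# Browser-like view of a start tag: '<', a letter, anything up to '>'.
TAG = re.compile(r"<[A-Za-z][^>]*>")
# A quoted attribute value, delimited by double or single quotes.
QUOTED = re.compile(r"\"[^\"]*\"|'[^']*'")
# name=value where the value does not begin with a quote.
UNQUOTED = re.compile(r"""([A-Za-z][A-Za-z0-9.-]*)\s*=\s*([^\s"'>][^\s>]*)""")
# Values that may legally be left unquoted per the HTML 4.0 prose.
SAFE = re.compile(r"[A-Za-z0-9.-]+")

def lint_unquoted(html):
    """Return (attribute name, value) pairs that need quotes but lack them."""
    problems = []
    for tag in TAG.finditer(html):
        # Blank out quoted values so their contents are not inspected.
        text = QUOTED.sub('""', tag.group(0))
        for name, value in UNQUOTED.findall(text):
            if not SAFE.fullmatch(value):
                problems.append((name, value))
    return problems

sample = """<A HREF=../test/abug.html>
<IMG ALT="Yucca" SRC=../yucca.gif ALING=right>
<A HREF=foo.html>bar</A>"""

for name, value in lint_unquoted(sample):
    print(f"warning: value of {name} should be quoted: {value}")
```

Applied to the example discussed in this document, the sketch warns about the HREF and SRC values containing slashes and stays silent about foo.html and right.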
For a program acting both as a validator and as a checker, it is important, in this issue and otherwise, to make a clear distinction between (validation) error messages and (checker) warning messages.
The traditional answer to people who ask "should I put an attribute value into quotes" has been "when in doubt, quote".
The discussion in this document strongly suggests appending the following to the answer: even when not in doubt, quote anyway.
1998-07-06
Jukka Korpela
I am indebted to Arjun Ray (aray@interactrx.com) for extremely valuable contributions on HTML and SGML issues, both in Usenet news (particularly in comp.infosystems.www.authoring.html) and by E-mail. The problem discussed in this document would still be a mystery to me without Arjun's advice.