A critical review of the HTML 4.0 draft

by Jukka Korpela

This is a review of the HTML 4.0 draft as announced July 8th, 1997, in several formats. More exactly, these comments are based on the PostScript version. On December 18th, 1997, HTML 4.0 was approved as a W3C Recommendation. There were a large number of changes from the draft discussed here, but most of the remarks presented here still apply (in the author's opinion).


Summary

The HTML 4.0 draft, as compared with HTML 3.2, contains additions and changes of varying nature. Instead of being an all-inclusive attempt to define an improved version of HTML, the hypertext markup language for the World Wide Web, it appears to be the product of evolution by mutations and vendor-driven selection. Thus, a large set of individual changes have been applied to HTML 3.2, rather independently of each other.

These changes are rather miscellaneous and partially in conflict with each other, pragmatically speaking. They include

There are very few observable attempts to improve HTML as a logical, structured hypertext language. Most notably, there is a relatively generic and powerful generalization of the old IMG element, in fashionable object-oriented clothes (OBJECT element). This, together with the multilingual support, might be important enough to justify further refinement and acceptance of the draft, despite serious drawbacks elsewhere.

The basic structuring elements are the same as in HTML 3.2. In fact, very little has happened in this crucial area since the dawn of HTML. In particular, instead of introducing logical sectioning (something like the one proposed in the ISO HTML draft), the HTML 4.0 draft goes in the opposite direction: it waters down the good old recommendation to use heading elements consistently, without skipping levels, by distancing itself from it as follows: "Some people consider skipping heading levels to be bad practice." And as regards the few new structural elements proposed, ACRONYM is badly designed, Q is introduced as a text-level counterpart of BLOCKQUOTE instead of reconsidering the idea of text level vs block level markup, and the INS and DEL elements look rather homeless without their natural companion, an element indicating changed text.
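The old recommendation about heading levels is easy to check mechanically. The following is only a minimal sketch in Python, not part of any specification: the names HeadingLevelChecker and skipped_levels are invented for illustration, and it assumes that a document is expected to start at H1 and never jump more than one level deeper at a time.

```python
from html.parser import HTMLParser


class HeadingLevelChecker(HTMLParser):
    """Records every place where a heading skips one or more levels."""

    def __init__(self):
        super().__init__()
        self.previous = 0   # level of the last heading seen (0 = none yet)
        self.skips = []     # list of (previous_level, new_level) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so H2 and h2 both arrive as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            # Going deeper by more than one level counts as a skip.
            if level > self.previous + 1:
                self.skips.append((self.previous, level))
            self.previous = level


def skipped_levels(markup):
    """Return the list of skipped heading transitions found in markup."""
    checker = HeadingLevelChecker()
    checker.feed(markup)
    checker.close()
    return checker.skips
```

For instance, skipped_levels("<H1>A</H1><H3>B</H3>") reports the jump from level 1 to level 3, whereas returning from H3 to H2 is not reported, since only jumps to a deeper level are treated as skips.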

The conclusion is that the next step should be to define a relatively small set of additions to HTML 3.2 by picking up from the HTML 4.0 draft the OBJECT element, the language support, the INS and DEL elements, the added named character entities, and possibly some more. After approval of such additions, future work on HTML development should concentrate on the creation of a better structured language, with features like fully nestable sections, rich markup for different kinds of emphasis, tables with columns specified structurally, and basic mathematical markup.

The size and format of the draft

In printed form, the HTML 4.0 draft is about 265 pages long, and it still contains empty parts to be added later. Compared with the roughly 77 pages of HTML 2.0, this exhibits an intolerable trend. Future HTML specifications should be strictly divided into a normative part and an informative part. (They should be clearly marked up using e.g. the idea of "generic colors" presented later in these comments.) The normative part may contain statements which are either clarifications or suggestions only, without being binding, but they should be flagged as such clearly and uniformly (using a uniform notation). On the other hand, the normative part should be explicit enough instead of relying on intuitive interpretation of names of elements and attributes, for example.

General comments

This part will present some general comments which apply to the draft as a whole or to phenomena occurring in several locations in the document. First some relatively technical notes on the draft are made, then some more fundamental issues are addressed.

The structure of element descriptions is not explained in the draft, although there seems to be a common pattern: heading, SGML declarations, omissibility of start and end tags (explained in prose), descriptions of attributes, short description of semantics, and possibly examples and notes. The semantic definition should be more prominent and explicit and should appear right under the heading. A heading should not be regarded as a substitute for a definition. Neither should the semantic definition rely on vague ordinary-language notions; a definition like "The THEAD element defines the head" doesn't really define anything, since the draft does not say what a table head is in HTML. And an excerpt from a DTD (if included in the specification) should be ignored when considering whether the specification is explicit enough. Notice that the draft itself says that "DTD comments for HTML do have not normative value". (Perhaps this grammatically strange formulation reflects some uncertainty? ;-)

Whether the SGML declarations are needed inside (or interspersed with) the text at all is debatable. A reader who can make use of them could consult the DTD (without taking the trouble of checking whether the declarations have been copied correctly from there!). For other readers, a prose description of the syntactic rules would be needed. (Providing a link to the SGML definition, i.e. a part of the DTD, would be quite OK, of course.)

Several examples exhibit "clickism", which is to be considered as a bad habit of referring to device and program dependent features. Less severe but still annoying is "hereism", which occurs as early as in an example in the discussion of the TITLE element and the title attribute. (If nothing better can be invented, even "There is a photo - -" would be better. It would not annoy so much eg. in a printed copy of a document containing a link to a photo.)

Some examples seriously suffer from another deficiency: they are not examples at all but sketchy schemes for writing an example. Even more seriously, this occurs especially with new features proposed. For example, "examples" on multilingualism do not actually contain several languages, and "examples" on THEAD and TFOOT do not contain actual headers and footers but metatext like "header information".

There are several fragments of illegal HTML, labelled as ILLEGAL EXAMPLE. Obviously, the purpose is to illustrate a prohibition. However, incorrect examples, no matter how clearly flagged as such, often affect people's minds in the wrong way. Nobody is perfect, and we tend to forget things; even worse, we might remember an HTML construct but forget the context, such as its being an illegal example. Therefore, illegal examples should be removed. On the other hand, prohibitions (such as "Links and anchors defined by the A element may not be nested") can and often should be explained further e.g. by rephrasing them ("That is, an A element may not contain another A element, directly or indirectly") and followed by a positive suggestion, if possible (such as "On the other hand, you can write a dual-purpose A element which has both a name and an href attribute").

The (printed version of the) draft contains phrases like "These examples illustrate the rendering of - -", without really being accompanied by an illustration (at least in the PostScript version of the draft). Moreover, illustrating renderings in an HTML specification should usually be avoided, since it tends to suggest that the given rendering is the rendering and not just one of myriads of possibilities.

Some notes in the draft are neither explanatory nor recommendations to HTML authors or user agent implementors but rather excuses. For example, the final note under the heading "Titles: the TITLE element and the title attribute" should be just a note in the files and minds of HTML specification developers. Now it looks half an excuse (without really giving any excuse), half a promise (without giving any promise which would be of some importance to HTML authors or software implementors). Moreover, it appears in a very peculiar place, since encoding phonemic and prosodic information is a much wider question than providing a new attribute to be used in conjunction with the TITLE element or title attributes. The speech synthesis issues for normal text in the document body are more important!

The draft lacks elements for mathematical expressions, although a relatively good proposal for them was made already in the HTML 3.0 working draft. Presumably the idea is to leave such things to be handled using separate XML based solutions like MathML. In practice, this seems to involve highly complicated markup which requires a few dozen elements in order to mark up a simple second degree equation. But the more fundamental problem is in the very approach.

Knowing how difficult a process the development of the HTML language has been, it seems tempting to isolate problematic areas so that they will be handled separately, in separate specifications, by separate groups of people, creating separate DTDs. This operation, if successful, will kill the patient: HTML as a common language, which can be expected to be supported by a wide range of user agents - HTML as the language for the World Wide Web. No matter how useful XML might be for specialized areas where it could be used to define additional languages which supplement the basic structuring mechanisms of HTML, it will be most harmful to the Web if it is used to explain away the need for improving HTML itself. (Analogously, would it be wise to stop developing better versions of a high-level programming language, saying that people can write their own specifications for doing in assembly language what cannot be done in the current version of the high-level language?)

Instead of everyone or every group defining their own languages, with no reasonable expectations of portability, we should - in addition to defining things like basic mathematical markup for HTML - consider adding some generic mechanisms which I will call generic colors. The idea is that an author often wishes to mark up some parts of the document as having some special role which cannot be expressed in normal HTML. (It might be a role which is common enough to be included in HTML in some later version.) For instance, an author might wish to mark up some parts as normative and some others as explanatory, or, in a description of the features of some language, flag some of them as deprecated. Specifying e.g. some specific colors for the purpose in a style sheet is far from being a generic and structured solution. And what the author, or rather the communicative process, really needs is some way of distinguishing things. If physical colors are used, it should be up to the reader to select them.

One solution would be to introduce a generic markup for "generic coloring". For example, it could be something like

<MARK KIND="Normative specification" COLOR=1>
which would indicate the start of an element which is characterized by the value of the KIND attribute and is to be presented in generic color number 1. Each implementation would be required to provide some minimum number (say, seven) of different "generic colors". They could be physically presented as background colors, text colors, text fonts, or tones of voice, in some uniform way which distinguishes texts from each other and from normal text. Binding generic colors to physical presentations should be user-configurable, preferably dynamically so that the values of KIND attributes are displayed to give a hint about the author's intentions. (In some cases, it might make sense to change the colors from the user's defaults to reflect the particular purpose for which the markup is used.)
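How a user agent might bind generic colors to physical presentations can be sketched in a few lines of Python. This is an illustration of the idea only: the table USER_PRESENTATION, the function present and the presentations listed are all hypothetical, standing in for whatever the reader has configured.

```python
# Hypothetical reader preferences: generic color number -> physical presentation.
# Nothing here is chosen by the author of the document.
USER_PRESENTATION = {
    1: "yellow background",
    2: "bold text",
    3: "lower voice pitch",
}


def present(kind, color, prefs=USER_PRESENTATION):
    """Pick the reader's presentation for a MARK element.

    The KIND value is appended as a hint about the author's intention.
    """
    style = prefs.get(color, "default presentation")
    return f"{style} ({kind})"
```

With these preferences, present("Normative specification", 1) yields "yellow background (Normative specification)". The point of the design is that the author only says that two kinds of text differ; the reader's table decides how.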

A language like HTML should provide a lot of continuity. Even if changes are generally accepted and rapidly implemented in browsers, there will be a great many copies of old browsers around. We should not expect people to install new versions of browsers every month. Continuity has been a goal in the sense that old constructs have been preserved, perhaps even too long. But there is more. First, a browser should be required to inform the user when it encounters a document in an HTML version which it does not know or when there is some gross incompatibility between the DOCTYPE declaration and the actual HTML code. Such a report could be e.g. just a small blinking text which announces that the document seems to contain markup which is not supported by the browser. But when users cannot see any such reports (even upon special request), they will not be able to realize that what they see may lack essential information or be presented in a manner which radically differs from the intended structure of the document.

Continuity also requires that authors are well aware of the changes from one (approved) version of HTML to another. This applies in particular to deletions and semantic changes which can make old documents nonconforming or semantically invalid. There is no official comparison of differences between HTML 2.0 and HTML 3.2; so it is no surprise that some unofficial comparisons are incorrect. Notice that in addition to new features, HTML 3.2 has some definite deviations from HTML 2.0. This will be an even bigger problem with HTML 4.0, unless the specification itself provides the (adequate) information. In fact, the development process should consist of defining changes to a base document rather than writing yet another HTML specification whose structure greatly differs from previous specifications. That way a specification of changes would be born naturally, and it would prevent the phenomenon that when something is written from scratch, essential information might be left out. Naturally, it would be just a matter of technical editing to create a standalone specification of the new HTML version by combining the old one and the changes.

This process should be started in the creation of HTML 4.0. This means that a proposal for additions (and eventually other changes) to HTML 3.2 should be prepared, carefully selecting the really useful new features presented in the draft. Even for the public discussion, it would be better to have a relatively short document of proposed changes than a large document (where the changes haven't even been marked). The structure of the HTML 3.2 specification might be non-optimal, but there is nothing very wrong with it, and for continuity reasons we should not deviate from it without a very good reason.

The structure of detailed comments

The following comments refer to the single-document version of the draft, made available in PostScript form, since for the purposes of presenting comments such a form is superior to hypertext.

The structure of the HTML 4.0 draft as a mixture of various material (ranging from strictly formal and normative rules to vague informal notes) and as a large collection of sections without explicit upper level structure makes it rather difficult to present comments. (The single-document version contains two tables of contents with no explicit relation to each other. Here the first table of contents is used, partly because it seems to cover the entire material.) In the following, the division into sections (as visible in the table of contents) is used, using headings in the section as headings for related comments here.

For easier reference, all section headings (excluding the tables of contents) are included here even if no substantial remarks are presented.

Although detailed, these comments try to avoid discussing minor details and errors of editorial or presentational nature. Such things should be dealt with at a later stage of the standardization process. Thus, any apparently very detailed remark here should be interpreted as a note discussing a detail of fundamental importance.

HTML 4.0 Specification

In the abstract, the characterization of HTML as "the publishing language of the World Wide Web" is technically correct, yet surprisingly strange. It deviates from the previous practice of calling it "a simple markup language used to create hypertext documents that are portable from one platform to another". (Naturally, we should not stick to old wordings. But is the original idea so widely accepted and applied now that it need not be mentioned?) Moreover, the abstract emphasizes accessibility to users with disabilities in a manner which is out of proportion to the small scale of improvements in that respect. The phrase "great strides towards the internationalization" is out of proportion to the fact that the proposed multilingual features, although mostly necessary, only provide a basic general framework for supporting various languages. (The specification does not even require support for any particular language, even English!)

The statement that HTML 4.0 replaces HTML 3.2 leaves the status of HTML 2.0 open. Although HTML 3.2 contains a statement about its replacing HTML 2.0, it in fact postulates HTML 2.0 as its base document (by leaving essential things unspecified, forcing readers to refer to HTML 2.0).

About the HTML 4.0 Specification

This section is obviously nonnormative. As regards its content and the structure of the specification, please notice that the draft lacks the kind of reference material which is probably what HTML authors would most often use: a summary of all elements in alphabetic (or thematic) order, with short syntactic and semantic descriptions, including possible attributes. Naturally, such a reference need not be part of the normative specification, but preferably it should be prepared or at least approved by the same authority.

Introduction to HTML 4.0

The section is marked as "being written". It is debatable whether such a section is needed at all in the specification. Introductory material is definitely needed, but quite different introductions are needed for different readers. Thus, they are best written by people and organizations which are involved in practical support to HTML authoring and which can adapt their material to various local and personal needs.

Design principles of HTML 4.0

The subsection on interoperability takes it for granted that interoperability requires providers to develop "different versions of documents". This is contrary to the essential design principles and part of the common strawman arguments against "purists".

Fundamentally, Web interoperability requires that documents be marked up in a consistent structural manner and that browsers and other user agents display or otherwise use the documents in a manner appropriate to the user's situation. The obvious implication should be that the HTML language should evolve to allow more consistent, structural, and universal markup - that is, to be richer in expressing structure.

Multilingual support should be labeled as such, or as language-specific support, not as "internationalization". (Typically, multilingualism is important for national documents, whereas most international documents use just English.)

There are definite improvements to the possibilities of providing wider accessibility for the visually impaired, but the results should not be exaggerated. The draft does not even make the ALT attribute of an IMG element mandatory, and it shows a lot of understanding as regards the common misuse of BLOCKQUOTE for indentation or the use of tables for layout. Most importantly, encouraging the use of style sheets (in a context where that must mean author's style sheets) implies that authors will attempt to control presentation to a much wider extent than nowadays. Far from improving accessibility, author's style sheets seduce authors into designing their documents for some very particular presentation environment.

The note on tables says that the "new" table model is "based on" RFC 1942. A comment in the DTD seems to claim that HTML 4.0 tables conform to RFC 1942. The exact relation should be made clear, and any differences should be listed carefully (in an informal annex). On the other hand, RFC 1942 is relatively strongly presentation-oriented. Instead of allowing the author to specify the logical properties and class of a table (such as being a numerical table), it requires the author to specify presentational attributes (such as alignment on some particular character). Thus, perhaps tables should be left as they are in HTML 3.2, until some essential improvement can be agreed on.

The OBJECT element being one of the most important features, it should be referred to under a more appropriate title than "compound objects".

The remark on "Ease of use" has great confidence in "powerful HTML authoring tools" which will "flourish". Currently most such tools produce grossly invalid and nonportable HTML. Suggesting that they will make e.g. the creation of forms easier implies that forms be designed in the WYSIWYG manner. Such an approach implies that forms need not be designed as structural and logical constructs but as visual products. Quite apart from these fundamental things, the proposed constructs related to forms are unnecessarily complicated.

Designing documents with HTML 4.0

This section is probably intended to give recommendations on document design, despite the fact that the first statement mentions "implementation" as well.

In a sense, the section looks like a parody. First it refers to the widely propagated idea of separating structure and presentation and to the "roots" of HTML which involve specification of structural markup, then proceeds to the idea of replacing presentational features by style sheets "as HTML matures". Ultimately, it views the separation as a technical aspect in document design, not as a fundamental separation of document design and document rendering - the jobs of the author (and languages and tools for authoring) and the user (and browsers and other tools for utilizing documents). This seems to imply the view that replacing a simple attribute like ALIGN=CENTER by a STYLE attribute containing a corresponding instruction in a language external to HTML is a great improvement.

Similarly, the notes on universal accessibility emphasize accessibility to people with disabilities in a manner which seems to present this as an obligation to add some features to documents, not as a principle of overall design. To do some injustice to the draft - quoting something slightly out of context - it says that it only recommends that designers consider alternate renderings in their design.

And finally, these design recommendations suggest that designers help user agents render tables more quickly, obviously by giving presentational attributes. It might occasionally be useful to do so with large tables, although it is debatable whether such a feature should be included in HTML. But it is certainly wrong to suggest the use of such facilities as one of the three principles under the title "Designing documents with HTML 4.0". (The natural way to pay attention to possible performance issues is to keep tables at a reasonable size, dividing them into pieces if needed, and to refrain from using tables for layout.)

A brief SGML tutorial

This tutorial appears to be very useful especially for those who only need to know SGML for the purpose of understanding HTML DTDs. It deserves careful study and comments. It is mostly skipped here since this section does not belong to the normative part. Naturally, there should be an appropriate normative statement somewhere, defining HTML 4.0 as an SGML application.

However, the tutorial contains some statements which may be regarded as recommendations on HTML usage or on terminology related to HTML. The note about elements not being tags can be misleading, since it suggests that a phrase like "the P tag" is incorrect; in fact, there is hardly any reason to think so, provided that the phrase is used (as it usually is used) to refer to the construct <P>, which actually is a tag.

Another note seems to recommend the use of lower case letters for (assumed) compression performance reasons. Such reasons can hardly be so important that they should be presented in an HTML specification, still less in any introduction to SGML.

Information about the reasons behind the notions "block level" and "inline" does not really belong in an SGML tutorial even in an HTML context. It explains the rationale behind making HTML 4.0 (and, consequently, its DTD) what it is, not how the DTD is to be read.

Definitions and Conventions

The definition of "conforming user agent" is vague. Perhaps the verb "observe" has a suitable meaning in some forms of the English language, but in international contexts it is better to say e.g. "satisfies - - conditions" than "observes - - conditions". In fact, the simple definition of the term in the HTML 2.0 specification is much better: "A user agent that conforms to this specification in its processing of the Internet Media Type 'text/html'." Notice in particular that the draft proposes that conforming user agents be required to "try to render the content of any element is does not recognize" and to ignore unrecognized attributes. These requirements are a definite change from the old policy and would encourage people to use vendor-specific or otherwise nonstandard markup.

The concept of "deprecated" features is useful in itself, and the definition probably needs no improvement. However, the way in which the concept is used in the draft is very confusing and contradicts the definition. There is a large number of features which are not in HTML 3.2 and which are now proposed for inclusion in HTML 4.0 while simultaneously being declared deprecated. The only conceivable reason for this is to tell browser vendors that they must support some features introduced by other vendors in the past and now standardized, although they are not recommended to authors. Of course, most authors will not see this fine point. Giving examples, even if indicated as deprecated, will be taken as an encouragement or at least acceptance.

And what is the point of requiring support for features which have not previously been accepted into public standards and which cannot be recommended to authors? Such a policy would just slow down browser development by requiring new browsers to implement support for old hacks if they wish to claim conformance to HTML 4.0.

As regards "obsolete" elements, HTML 3.2 declared XMP, PLAINTEXT, and LISTING as obsolete but still described them. They were obsolete even in very early HTML drafts prior to HTML 2.0. On the other hand, some features of HTML 2.0 were simply dropped from HTML 3.2 instead of being marked as obsolete. The HTML 4.0 draft is more sensible, but to avoid any confusing idea of those elements still belonging to HTML 4.0, too, in some obsolete way, it is better to remove the definition of "obsolete" from the definitions section and simply say in the change history that those elements have been entirely removed.

The remark that HTML files are usually given the extension ".html" or ".htm" looks relatively odd. What is the message? Web users have probably noticed that before they start reading any HTML documents. What may really puzzle them is which one they should use and how the recognition of HTML files really takes place. (Basically, an HTML specification is not suitable for answering such questions, but it might contain a reference to some reliable information.) And if there is a heading "Document names", one would expect it to answer important practical questions (e.g. which characters are allowed in document names).

HTML and URLs

This section is obviously intended to be informative. It is in some respects insufficient or even misleading. On the other hand, it contains some normative information (most importantly the RFC references), which should be clearly marked as such.

The term "fragment URL" is technically incorrect. Both RFC 1738 and especially RFC 1808 refer to the construct of the form #fragment (so that the wording about the URL specification not offering a mechanism to refer to a location is not quite accurate), but they specifically say that it is not part of a URL.

Consequently, the specification of HTML should make it clear that in contexts where the syntax allows a URL, it may have a fragment identifier appended. This together with the RFC references should be presented as normative information.
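The distinction between a URL proper and an appended fragment identifier is reflected in common URL libraries. As a small illustration (the address below is made up), the standard Python function urllib.parse.urldefrag separates the two parts:

```python
from urllib.parse import urldefrag

# A hypothetical reference as it might appear in an HREF attribute.
reference = "http://www.example.org/review.html#summary"

# urldefrag returns the URL without the fragment, and the fragment itself.
url, fragment = urldefrag(reference)

print(url)       # the part that is actually requested from the server
print(fragment)  # the part that the user agent interprets locally
```

Only the first part is a URL in the sense of RFC 1738; the fragment identifier is processed by the user agent after the document has been retrieved.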

The wording "In addition to HTTP URLs, authors might want to include MAILTO URLs - -" is misleading, since it might be read so that it suggests that only the schemes http: and mailto: can be used.

The statement that "User agents may support MAILTO URL extensions that are not yet Internet standards - -" is a strange way of encouraging the use of nonstandard features which may cause a lot of trouble. (User agents may support whatever they like, but mentioning something specific in this respect in an HTML specification is actually regarded as some sort of acceptance and encouragement.) Simultaneously it seems to claim that the ?Subject extension will be an Internet standard. There is little reason why it should, since forms are more suitable for initiating messages with the subject field (or other fields) predefined or prefilled.

HTML Document Character Set

This section, too, is mostly informative - and about an area for which clarification is really needed. However, the normative part should be clearly and distinctly presented so that it is easy to distinguish the exact rules from explanations.

Currently there seems to be the requirement that a user agent map characters to Unicode correctly for any encoding it recognizes but no requirement on such recognition. (ISO 8859-1 is definitely suggested to be among them, but it is hard to find a definite requirement on that.) It is not even clear what recognition means here. Does it mean that the name of an encoding is recognized in the process described in the draft (e.g. by picking up an HTTP "charset" parameter in a "Content-Type" field) or that such a name is actually recognized as being one of those supported by a user agent? Probably the latter, since otherwise user agents would be required to support all encodings, which would be somewhat unrealistic.
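The two possible meanings of "recognition" can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a description of any browser: the function name recognize_charset is invented, and "supported by the user agent" is approximated by "known to Python's codec registry".

```python
import codecs
from email.message import Message


def recognize_charset(content_type):
    """Return (name, supported) for a Content-Type field value.

    name      - the charset name extracted from the field, or None
    supported - whether this program can actually decode that encoding
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # First meaning: the name is merely picked up from the header.
    name = msg.get_content_charset()  # lower-cased, None if absent
    if name is None:
        return None, False
    # Second meaning: the name is one of the encodings we support.
    try:
        codecs.lookup(name)
    except LookupError:
        return name, False
    return name, True
```

For "text/html; charset=ISO-8859-1" both steps succeed, while for a name such as "x-unknown-demo" the first step succeeds and the second fails. Only the second step can sensibly be what the requirement on correct mapping refers to.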

There seems to be an important but quiet change. Previously ISO 8859-1 was defined as the default encoding, and HTTP 1.0 recommended that servers omit the "charset" attribute altogether when ISO 8859-1 was used. The changes might be worth making, but they should be made more explicitly and emphatically. (There is work in progress on defining ISO 8859-0, and such a character set and encoding might become standard in the European Union. So it might well be best to remove the special role of ISO 8859-1.)

Notice that there seems to be no requirement on user agents specifying in their documentation which encodings they support or on reporting that they have encountered an unrecognized encoding. The problem, mentioned in the draft, that using "heuristics" (actually, wild guesses) "may lead to an unreadable presentation" would be relieved if browsers were required to report it to the user. The same applies to the situation where an encoding is actually specified for a document and the name of the encoding is correctly extracted by a browser; in such situations, a browser should be required to report that, irrespective of whether it displays the document applying some other encoding. Notice that in many cases the result of applying an incorrect encoding is not unreadable, just wrong. For instance, if a document has been encoded in some variant of ISO 8859 and it is displayed using another variant, the result might well be that most characters look as intended while some national characters are replaced by some other characters. (The reader might not even realize that he gets some words wrong.)
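The "not unreadable, just wrong" case is easy to reproduce. The following Python sketch (the function name misdecode and the sample phrase are chosen for illustration) encodes a Hungarian phrase in ISO 8859-2 and then decodes it as if it were ISO 8859-1:

```python
def misdecode(text, actual, assumed):
    """Encode text in its actual encoding, then decode it with the assumed one."""
    return text.encode(actual).decode(assumed)


original = "sárga erdő"   # Hungarian for "yellow forest"
shown = misdecode(original, "iso8859_2", "iso8859_1")

print(original)
print(shown)
```

The letter "á" occupies the same code position in both variants and survives, whereas "ő" silently turns into "õ". Nothing in the result looks broken, which is exactly why a report from the browser would be needed.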

Referring to 'the Cyrillic letter "I"' is strange. What it probably means is the Cyrillic letter which is in some sense the counterpart of the Latin letter "I" but looks quite different. Does the HTML version of the draft refer to the Cyrillic letter meant here in the manner suggested in this part of the draft? - Moreover, the phrase 'the Chinese character meaning "water"' is hardly unambiguous; Chinese probably has several characters used to denote water.

The draft refers to RFC 2045 for information about "charset" values. However, that RFC does not seem to define those values. It only defines generic syntax involving parameter=value and mentions charset as one possible parameter, without defining its value set. The registry of charset values is kept by IANA, in accordance with RFC 1345.

The recommendation to encode any instance of " by &quot; is new, and it is unnatural for languages where that character is used as the normal quotation mark. Moreover, the question why &quot; was left out from HTML 3.2 has never been answered satisfactorily in public. Two quite different reasons have been given: simple mistake, and the reason that &quot; is somehow reserved for SGML reasons. See http://www.w3.org/MarkUp/Wilbur/ on one hand and http://d1.ph.gla.ac.uk/%7eflavell/iso8859/iso8859-pointers.html#quot on the other. Although this issue need not be discussed in the HTML 4.0 specification itself, it needs to be clarified before &quot; can be accepted. (If &quot; is really reserved, the natural solution would be to introduce &quote; for the " character. Browsers might still recognize &quot; the same way they have done, but authors should use the new mnemonic.)

Basic HTML data types

This section is obviously partly normative (eg. syntax of color values), partly informative in the sense of giving recommendations (mostly on color usage). As regards to the latter, it is illogical to say that the use of HTML elements and attributes for colors is deprecated and then proceed to giving advice on using them. (Moreover, the advice is debatable. For instance, referring to "common conventions" is vague; most people will either not understand it at all or take it as referring to color usage in some currently popular browsers with their default settings.)

Referring to CSS1 specification for normative information about the definition of a pixel deviates from the normal and natural practice of not implying any particular style sheet specification system in the specification of the HTML language. Something along the lines of the HTML 3.2 specification wording in this context should be applied.

The global structure of an HTML document

The wording that authors should include a DOCTYPE declaration "resembling" something is too indefinite. Authors often use incorrect DOCTYPE declarations, and the draft wording seems to say that it does not matter. Moreover, such declarations might be included by software, not by authors. Consequently, the specification should simply require that an HTML document contain a DOCTYPE declaration corresponding to the version of HTML used, then list (as the draft does) the variants that may be used for HTML 4.0.

This section refers to "strict" HTML 4.0, but the draft does not contain a DTD for strict HTML 4.0. Notice that "strict" is a misleading word here. It is meaningful and usual to speak about strict conformance to a DTD, although strictly speaking it simply means conformance. But strictness is not a meaningful property of a DTD. "Structured" would be better. (Or perhaps "Puristic".)

Illogically, the draft says that an HTML document, except for the DOCTYPE declaration, "should be enclosed by the HTML element". This seems to confuse tags and elements in a manner against which the draft warns elsewhere. By definition, an HTML document, except for the DOCTYPE declaration, is an HTML element. The description of "a typical HTML document" seems to imply that a document should contain explicit HTML tags. The DTD does not seem to imply this. This problem seems to be caused by the finesses of SGML as well as the introduction of frames. Hopefully the problem will disappear due to removal of frames.

In the description of the META element, there are several examples which assume some particular user agent behavior, possibly according to some specification external to the HTML specification. They should be clearly flagged as being such. Otherwise they will easily be taken as part of the HTML specification. This applies e.g. to page "refresh", search engine treatment of name="keywords", and PICS.

On the other hand, the provision of meta data should be standardized (in HTML specification) as regards to some minimal information concerning the document. Such a standard should also fix the way in which the information is given in an HTML document, instead of a multitude of different ways. It should also recommend user agent behavior on the display of the basic meta data by default or by explicit user request. Otherwise authors will have to provide the same information both as meta data and as part of the document body, which is illogical, unsystematic, and error-prone. The profile system is acceptable as an addition to such standardized basic system, not as a replacement for it.

The draft discusses the title attribute in conjunction with the TITLE element. Although semantically understandable, this breaks the structure of HTML and probably makes the specification hard to read, still harder to use as reference material. More importantly, the meaning of the title attribute is too vaguely defined. Although the definition is above the average, just saying that the attribute "offers advisory information about the element for which it is set" is not sufficient. For example, the draft contains a LINK element with title="The manual in Dutch". Does the attribute really offer advisory information about the LINK element? Shouldn't we rather say that in a LINK (or A or OBJECT) element, the title attribute provides such information about the linked resource which is relevant in the referring context? (Notice that if the attribute provided information about the linked resource per se, it would be natural to write its value in Dutch in our example.)

In the draft, the DIV element is defined as a block element as in HTML 3.2 but now with text level (inline) SPAN element as its counterpart. The draft says that "user agents generally place a line break before and after DIV elements", while HTML 3.2 specification says that DIV terminates an open P element (and the draft says the same) but emphasizes that user agents are not expected to render paragraph breaks before and after a DIV element. This sounds really confusing. The natural selection would be to introduce a new element which is a pure grouper with no implications on line or paragraph breaks or inline vs. block whatsoever. (For compatibility, DIV might be preserved as a deprecated element with the old meaning.) - As regards to the example on using SPAN, it is unclear what it is intended for. Attaching class names to names of fields in a record might perhaps be useful, but it would probably be more useful to have class names for the fields themselves.

The heading elements have been part of HTML since the early days. Very little has happened around them, although the system is logically unsatisfactory. Instead of allowing the natural way of dividing a document into sections, subsections, etc, HTML just allows the author to designate some phrases as headings of different levels. This implicitly contains structuring provided that some rules are obeyed, such as not skipping header levels. The draft actually removes the old recommendation on structured usage of headings by calling it just some people's matter of taste.

The ISO HTML draft, despite its serious shortcomings, provides a better solution in this respect: true sectioning mechanism. (Naturally, the old heading system should be supported and legal, although possibly deprecated, during a long transition period.) Although it would be important to add even better structuring facilities, such as optional introductory, summary, and conclusions elements within a section, even a simple sectioning along the principles suggested in the ISO HTML draft would be a definite improvement.

The HTML 4.0 draft now suggests the use of DIV for associating a heading with the corresponding section; this may help to achieve some presentational control, but it cannot serve the same fundamental purposes as true sectioning. (A user agent could be expected to use true sectioning information in a meaningful way, but it would be inappropriate to deal with all DIV elements as indicating sectioning.)

Structurally it would be better to have a uniform system with just one kind of nestable section element (which might be named SEC, for example) specifying a section, subsection, or corresponding lower level part so that a heading element (which might be named H) is required within it. (Perhaps even sectioning elements without headings might be allowed, and this might lead to a natural way of replacing the P element, simultaneously providing a subparagraph mechanism.) The heading element would be in a position similar to the HDL, SUM, CONCL, etc elements proposed below. That is, a heading would be just one constituent (although perhaps a required one) of a section structure, instead of being something which implicitly defines a section. Naturally, user agents would interpret the heading elements according to the level of nesting of the sectioning elements.

The uniform system outlined above would imply that it is much easier to embed an HTML document in another, no matter whether such embedding is done using editors, authoring tools, servers, or user agents. Sections could be written so that they are not dependent on the level of nesting into which they will be embedded - no need to change H2 elements to H3 elements when making a document part of another document which already uses H2 elements for higher-level headings. This would make it simpler to maintain the same information in various forms and contexts.

Admittedly, the change would complicate some operations, preventing e.g. the extraction of all second-level elements from a file using a very simple method (which picks up all H2 elements). On the other hand, by that very implicitness, HTML code to be included into various contexts could be written or generated more easily and more naturally.

Probably the original reason for the simple (simplistic) system of headings with no section markup was to make browsers simpler: a browser can process each heading element in a context-free manner, just mapping the heading level to some particular font size and style, for example. At the present stage, CSS1 requirements already make the internal functions of browsers rather complicated. Moreover, adding simple bookkeeping (of nesting levels of sections) to browsers would not slow things down in any significant amount. Browsers might, of course, apply a more complicated, two-pass strategy which first counts the maximum depth of section nesting, then selects the presentation of headings of various levels appropriately. But that's just one possible implementation.
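
The "simple bookkeeping" mentioned above can be sketched in Python as a single pass that keeps a nesting counter. SEC and H are the hypothetical element names used in this review, not part of any HTML specification; the class and function names are likewise illustrative, and the sketch assumes that every SEC element has an explicit end tag.

```python
from html import escape
from html.parser import HTMLParser


class SectionLevels(HTMLParser):
    """Rewrite hypothetical SEC/H markup into conventional H1..H6 headings."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0            # current nesting level of SEC elements
        self.heading_level = 1    # level chosen for the heading now open
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "sec":
            self.depth += 1
        elif tag == "h":
            # Heading level follows nesting depth, clamped to H1..H6.
            self.heading_level = min(max(self.depth, 1), 6)
            self.out.append(f"<H{self.heading_level}>")
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "sec":
            self.depth -= 1
        elif tag == "h":
            self.out.append(f"</H{self.heading_level}>")
        else:
            self.out.append(f"</{tag.upper()}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        self.out.append(escape(data, quote=False))


def flatten_sections(source):
    parser = SectionLevels()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Because the heading level is computed from the nesting depth, a fragment written with SEC and H can be placed inside another SEC without being edited; the same fragment then yields H2 instead of H1. The two-pass strategy mentioned above would differ only in counting the maximum depth first.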

The role of the headings of sections, subsections, etc, is different from the heading of the entire document. Currently there is neither a recommendation nor established practice on the presentation of the document heading. Some authors use H1, reserving H2 for first level sections etc, whereas others use H1 for both, possibly attaching ALIGN=CENTER to the document heading. And several authors seem to feel compelled to try to control the presentation of the document heading otherwise, to make it more distinguished. This illustrates how the quest for presentation control reflects (in addition to misunderstanding of the nature of Web publishing, of course) the lack of suitable structured modes of expression. There should be a simple and single method for marking up the document heading and a simple and single method for marking up the sectioning, subsectioning, etc, and attaching headings to such parts. Probably the best way is to introduce a new element for the former purpose. (This naturally raises the question what the TITLE element would be for. A very good question indeed. More often than not, novice authors get very annoyed when they see that their title text is not displayed. It is useful to be able to provide two different document headings, "internal" and "external", but not very wise to force authors to write both of them even if their texts are identical.)

Typically, H5 and H6 elements are abused to affect text font size, assuming the common but grossly illogical browser behavior of presenting them in a font smaller than normal text. Since a well structured document can hardly contain more than four sectioning levels - division into several files should be applied to very large documents - the H5 and H6 elements should be declared deprecated.

The ADDRESS element is homeless, as the draft implicitly admits by excusing as follows: "For lack of a better place, we include the definition of the ADDRESS here." Since document author information is part of basic metadata, it should be expressed as metadata in a standardized format (with fields for first and last name, postal address, E-mail address, home page address, telephone, etc, and free text field for any additional information). If this can be achieved so that browsers will be able to display that data in a user-specified form, either as part of the document or separately, the ADDRESS element should be declared deprecated.

Language information and text direction

The possibility of providing language information is an important improvement. However, it is only a possibility, and it takes a lot of work before it becomes really useful. It should be remarked, as a basic pragmatic note, that authors should be encouraged to provide such information but not rely on wide utilization of that information by browsers and other software in the near future. Consequently, authors should be advised to concentrate on providing language information in most crucial situations. This includes the language of the entire document, the languages of direct quotations within it (if different from the main language in the document), and some other cases. Although it would be logical to specify the language of proper names as well as terms taken from other languages (such as status quo), it would currently take too much effort to do so in general, and it is far from being clear what expressions are really foreign in this sense. (For example, should the word fiasco within English text be marked up as being Italian? Or should foreign words be marked up only if it is desirable to have them designated as foreign in some special way, such as in italics or some special tone of voice?)

In most cases, text in a language other than the overall language in a document is some kind of quotation. (There are other possibilities, such as providing the same information on a page in several languages, but it is usually more advisable to make them separate HTML documents.) The text and examples in the draft are somewhat contradictory in this respect. There is real danger that the lang attribute will be used as a replacement for the logical elements for various quotations (BLOCKQUOTE, Q, CITE), not in conjunction with them.

The statement that the default language is "unknown" needs to be reformulated. Is the string "unknown" expected to be a language code as specified by RFC 1766, as the formulation (especially the use of quotes) suggests? In that case, it would be natural to specify the default value in the DTD, too. Or should the statement simply say that in the absence of any language specification, a user agent should assume that the language is unknown? What that means is an interesting question; probably it is assumed to disable all hyphenation (except where explicitly allowed) and some vanilla (but exactly which?) presentation of things like quotation marks. Notice that the current formulation could easily be understood so that in the absence of a lang attribute, "unknown" language is assumed for an element, while in fact the language information is inherited (as explained in the draft, but later).

The wording in the subsection Inheritance of language codes seems to imply that a user agent may have a default value (which is some particular language, not "unknown"). While this might possibly be useful, it contradicts the idea of default language being unknown.
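
The difference between "unknown" as a default and inheritance from an enclosing element can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the class and function names are hypothetical, "unknown" is modelled as None rather than as a language code, and the markup is assumed to be well nested with explicit end tags for every element.

```python
from html.parser import HTMLParser


class LangTracker(HTMLParser):
    """Record the effective language of each run of text."""

    def __init__(self, default=None):
        super().__init__()
        self.stack = [default]    # effective language of each open element
        self.runs = []            # (language, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        # Without a lang attribute, the element inherits from its parent.
        self.stack.append(lang if lang is not None else self.stack[-1])

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1], text))


def text_languages(source, default=None):
    tracker = LangTracker(default)
    tracker.feed(source)
    tracker.close()
    return tracker.runs
```

With such a model, text is in the "unknown" language only when no enclosing element carries a lang attribute. Passing a value for `default` corresponds to the user agent default that the subsection on inheritance seems to allow, which shows why the two statements in the draft cannot both hold as written.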

Most authors of documents written in (normal) English will probably assume that they need not provide any language information. The HTML 4.0 specification should explicitly recommend authors to provide language code for all documents, including those written in English. The natural way to do this is to include a lang attribute for the HTML or BODY element. (It is not really the job of Web servers to provide such information.) The draft contains a recommendation in this direction but only in the annex Performance, Implementation, and Design Notes and without explicitly suggesting the use of the lang attribute for all documents.

The exclusion of computer languages is in a sense understandable due to the urgency of support to human languages, but in the future mechanisms should be developed for expressing that part of a document is written in a particular computer language. The reasons for this are partly quite similar to the situations where information about human language can be used, partly specific to computer languages. For instance, if a document contains sample programs in some language, the reader might benefit a lot from the ability to see them formatted according to his personal preferences as regards to font usage, indentation, bolding or coloring keywords, etc. (Naturally, HTML should simply provide a generic mechanism for indicating the language so that an appropriate browser function or plugin, if available, could be automatically invoked by a browser.)

A simple way to include computer languages would be to define that lang attribute values beginning with "comp" refer to various computer languages (including command languages, programming languages, hypertext markup languages, etc). First people might start using just lang="comp" to have the element interpreted as being in some generic or unknown computer language, typically rendered in a monospaced font (which might, however, be different from other monospaced fonts used in the display of documents). Later a method should be fixed how specific computer language names are defined and registered, similarly to the registration of Internet media types or charset values. In fact, even before such standardization people would probably use several values like "comp-C", "comp-Fortran-90", "comp-HTML", etc, in similar ways.

The example under the heading Inheritance of language codes is not really an example. Despite the lang attributes, all text there is in English. A real example could include, for example, a quotation from a book in French and some Latin terms within English text.

In the discussion of bidirectionality, the text refers e.g. to HEBREW2 as being Hebrew text, even referring to the "H" in it. Obviously there should be some text which is really in Hebrew. Similarly, examples on support for character directionality and joining should contain real Arabic characters and not just their names in Latin letters. One cannot really present examples of implementing multilinguality without giving honestly multilingual examples. (If this requires the use of images when working under HTML 3.2, so be it.)

Paragraphs, Lines, and Phrases

(This section seems to have two alternative names, the other being Text.)

The definition of white space is, as in previous HTML specifications, vague and implicit. Instead of giving a list of examples ("such as - -"), the specification should explicitly define the concept of white space in HTML context. In particular, it should clearly say whether no-break space is white space or not.
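
Why an explicit definition matters can be shown with a short Python sketch. The set of characters below (space, tab, line feed, carriage return, form feed) is an assumption made for illustration, not the draft's definition, and the function name is hypothetical.

```python
import re

# Assumed definition of HTML white space, for illustration only.
# No-break space (U+00A0) is deliberately left out.
HTML_WHITE_SPACE = " \t\n\r\f"
_WHITE_SPACE_RUN = re.compile(r"[ \t\n\r\f]+")


def collapse_white_space(text):
    """Collapse runs of the assumed white space characters into one space."""
    return _WHITE_SPACE_RUN.sub(" ", text).strip(HTML_WHITE_SPACE)


# An implementation that relies on its programming language's own notion of
# white space may disagree: Python's str.split() treats U+00A0 as white space.
naive = " ".join("10\u00a0km".split())
explicit = collapse_white_space("10\u00a0km")
```

Under the assumed definition the no-break space survives and keeps "10" and "km" together, whereas the naive version turns it into an ordinary, breakable space. A list of examples in a specification leaves both behaviors defensible.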

The draft mentions the PRE element as the only exception from the rules of whitespace handling. Both previous specifications and established and useful practice suggest that TEXTAREA should be treated similarly.

The set of phrasal elements is almost the same as in HTML 3.2, especially since the Q element is for some reason not classified as phrasal. Notice that the wording of the meaning of CITE is still unclear, as in previous HTML specifications, which have very often been misunderstood in this respect. Although there is less room for misunderstanding, if the Q element is included, the wording should definitely say that the CITE element is used to mark up an expression (such as a book title or code number for a standard) which constitutes a reference (citation) to an external source.

The proposed ACRONYM element is debatable. If it is intended to mean that the enclosed text is an abbreviation formed from the initials of a sequence of words, this should be defined explicitly in the specification. It should be emphasized that the pronunciation of such an expression can not be deduced from the use of ACRONYM. The statement that "acronyms are generally spoken by pronouncing the individual letters separately" is either incorrect or irrelevant, depending on the interpretation of the words "acronym" and "generally". On the other hand, this means that the usefulness of ACRONYM would be limited to spell checkers, and for such use a more general element is needed, such as an element for any proper name. (Admittedly, browsers could utilize ACRONYM markup so that they somehow allow the reader to see the value of the title attribute, giving the expansion. Notice that the draft neither requires nor even recommends that. Moreover, since a title attribute is allowed almost everywhere, authors and browsers could use such a technique irrespective of the introduction of an ACRONYM element.)

As regards to speech synthesis in general, the approach suggested in a note in the draft is unsatisfactory. Dictionaries are not sufficient. There are cases where an explicit indication of the pronunciation of a word is needed. To take a trivial example, the English word record has different pronunciations (and different word divisions) depending on its meaning. Therefore, at a later stage of HTML development, some way of specifying the pronunciation of an individual occurrence of a word should be provided. (This might be an attribute which takes a string of IPA characters as value.)

There is definite need for new phrasal (or in fact structural) markup. There are several reasons for the common use of font level markup, but one of them is that the available repertoire of phrasal markup is unsatisfactory. (The probable counterargument against adding new structural markup is that everyone should use XML. That kind of thinking would require separate discussion (see some notes above). Here it is assumed that definite, well-defined, widely known and supported markup is needed for the basic structuring of Web documents. Extensibility in particular areas of application can be useful as an addition to that, not as a replacement.)

According to HTML 2.0 and HTML 3.2 specifications, the EM element denotes emphasis, but this is defined rather vaguely: "basic emphasis typically rendered in an italic font". (Obviously, the word "basic" here is to be read as opposing "strong" emphasis expressed by STRONG.) Ignoring the fact that the HTML 4.0 draft is even more laconic ("indicates emphasis"), the fundamental question is what does it mean and imply that something is emphasized. There can be several essentially different reasons to emphasize, or rather different emphases. For example, emphasizing a key word or phrase in a paragraph is quite different from emphasizing an entire paragraph (such as a summary). Using the same kind of rendering for both (whatever that might be in each browser) is unsatisfactory. For example, using bolded font might be quite suitable for emphasizing an important word, while for an entire paragraph some special background color or a red bar in the margin might be much better.

Consequently, we need different elements for different kinds of emphasis or, more properly expressed, essential structural elements which would replace EM and STRONG. (Obviously, for compatibility reasons the generic EM and STRONG elements should be preserved for the time being, but in the future they would convey the idea of emphasizing something without really knowing why.) In fact, HTML 3.2 introduced DFN for defining occurrence - something which was most probably marked up using EM by a structuralist author in HTML 2.0. Unfortunately DFN is not supported adequately by popular browsers, but it is definitely an example of the direction to take.

Further replacements for EM might be something like the following:

OPP
indicates opposition or contrast with information given in a nearby element; would replace EM e.g. in
The OPP element is <EM>logical</EM> markup as opposite to <EM>physical</EM> markup such as the I element or the <EM>pseudo-logical</EM> EM element.
WARN
indicates warning
SUM
summarizes the essential contents of preceding text
HDL
"headline", summarizing the essential contents of text to follow
ATT
emphasizes a word or phrase, suggesting that the reader (or listener) should pay special attention to it when reading a statement; this comes closest to "plain emphasis" that one can imagine but it has the specific feature of making a common word (even a word like "a") important in a context
KEY
indicates a key word or phrase, typically one which shows what a paragraph discusses; notice that KEY is intended just to indicate visually or audibly that something is important whereas ATT can make it important, often changing the tone or even the meaning of a statement; an indexer would probably pay a lot of attention to KEY but perhaps ignore ATT!
PROP
indicates suggestion or proposal (often presented after a lengthy introduction and discussion); in one bureaucratic style of presentation, such things are indicated by a ./. mark in a margin, but Web browsers could use much more advanced methods of making proposals look prominent
CONCL
draws conclusions, such as presenting what is considered as proved (this is not the same thing as a summary, since a summary may also include short presentation of the basic line of the reasoning)
Perhaps this list could be shortened by combining some elements into one, or perhaps it should be extended a lot. In any case, the implementation might be initially very simple: a browser would just need to recognize the elements and present each of them in one of the ways it uses for EM or STRONG now. But users would be able to develop their own presentation instructions (by browser options or by user style sheets) according to their personal preferences and needs.

Notice that for several emphasizing elements outlined above, the typical renderings of EM or STRONG might not be optimal. For instance, it seldom makes sense to have a long paragraph displayed or printed in bold face just because it is an important summary. Such text is hard to read, and any method which catches the reader's attention and makes it clear that this is a summary is sufficient. It could be distinctive background color, a vertical line in the margin, or a short musical prelude. The point is that each of the emphasizing elements might have its own presentation.

Current HTML has two elements for emphasis of different strength, yet no element for deemphasis. Some people use the SMALL element or the FONT element with size attribute, but this is illogical. Still less should we use H5 and H6 elements for deemphasis, as many authors seem to do! Moreover, expressing deemphasis with font size change is just one possibility, which might even be technically not available in several situations. Even if it can be done, it might not make sense. A good browser customization sets normal text font size to something which is conveniently legible, yet not larger than required for that. This means that anything significantly smaller might be illegible to the user. Thus, deemphasis might more suitably be expressed by using (perhaps in addition to small decrease in font size) smaller vertical space between lines, a different font, a different background color, some indentation, some drop in the volume of synthesized speech, or perhaps making the deemphasized text into a footnote. Naturally, some of these presentational methods are applicable only if the deemphasized text is a whole paragraph (or more). Deemphasis within paragraphs might be expressed simply by putting the text into some kind of parentheses or brackets.

Since emphasis is actually a set of rather different things, for which different elements were proposed above, doesn't this apply to deemphasis, too? To some extent, yes. Probably the most usual kinds of deemphasis would be covered by the following new elements:

REM
a remark which does not belong to the main flow of thought, such as a note about the history of a phenomenon
DET
detailed information; typically to be used in textbook-like documents to indicate passages which may (or perhaps should) be skipped on first reading

The semantics of nesting has never been discussed in HTML specifications and drafts, except very marginally. For example, the only thing that the HTML 3.2 specification says about nested text elements is the following:

User agents should do their best to respect nested emphasis, e.g.
  This has some <B>bold and <I>italic text</I></B>.
So it does not really specify the semantics but gives a vague recommendation about rendering. It describes the B and I elements as emphasis, probably supporting the popular idea of STRONG and EM being just snobbish words for B and I. Semantically, it seems to imply that the effect of nesting is additive (cumulative). But is this really so in general, and should additiveness be interpreted with respect to the logical meaning or with respect to rendering?

For example, assume that we have a STRONG element which contains an EM element. Which one of the following interpretations should be applied?

The latter interpretation is obviously quite "presentational" and implies that in many cases elements have no effect. For example, assuming that a browser presented EM and VAR using italics (which is rather usual and normal), a VAR element within an EM element would not be distinguished from the other content of the VAR element. It's not clear what could be done according to the first interpretation, since it's not just a matter of emphasis, but that interpretation would at least give us some chances.

The draft contains the Q element, which is a text level counterpart of the BLOCKQUOTE element. An element for designating quotations is definitely needed. However, instead of providing two quotation elements, we should reconsider the idea of having text level and block level markup as a fundamental division. (Notice that style sheets have already made this division somewhat irrelevant.) Essentially, block level markup is something which involves some kind of paragraph breaks. Isn't this a presentational feature, not a structural property? Moreover, the simple idea of something being a quotation need not be mixed up with paragraph division. The quoted text itself may or may not consist of paragraphs, and this should be indicated separately, using the normal markup for that.

Generally, the division into block and text level elements provides superficial structurality only. Perhaps there are some elements which might reasonably be restricted so that they may not contain some specific "higher level" structures. But the current restrictions are excessive. There is, for example, no logical reason why an emphasis could not be applied to a sequence of paragraphs. The presentation of such emphasis might differ from emphasizing a few words, but this should not be of fundamental importance in the design of the HTML language.

Now, returning to the issue of marking up quotations, a logical solution would be the following. The Q element is introduced so that it may appear either as a text level or as block level element. By itself, it does not imply paragraph breaks. For compatibility, BLOCKQUOTE is preserved (as a deprecated element) and defined so that
<BLOCKQUOTE>quotedtext</BLOCKQUOTE>
is equivalent to
<P><Q>quotedtext</Q></P>
Deprecating BLOCKQUOTE would be wise for the additional reason that it is so commonly misused as a mechanism to affect layout (indentation). Notice that the draft encourages such misuse by describing indentation as the normal rendering (which is an oversimplification) and by recommending that user agents not insert quotation marks due to the widespread use of BLOCKQUOTE for indentation. (It is not clear how speech-based user agents should behave according to such philosophy.) When the draft then says that the usage of BLOCKQUOTE to indent text is deprecated in favor of style sheets, it will hardly be taken seriously.

The concept of quotation should be clarified. Otherwise many people will be tempted to mark up phrases which are direct borrowings from other languages (like force majeure) as quotations. (If desired, the language of such phrases can be indicated using a SPAN element with the lang attribute.) Thus, a quotation should be described as a direct (or perhaps slightly edited for technical reasons) quotation of text from some source, normally so that the source is explicitly mentioned.
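
As a small illustration of the parenthetical remark above, a borrowed phrase can carry its language without being marked up as a quotation. This is only a sketch; the surrounding sentence is invented for the example.

```html
<!-- A borrowed phrase is not a quotation: mark its language, do not use Q. -->
<P>The contract was suspended on grounds of
<SPAN lang="fr">force majeure</SPAN>.</P>
```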

There is an important phenomenon of using words in a quotation-like manner without really presenting quotations: presenting words and phrases as linguistic objects. Linguistic objects need not be anything more complicated than the word "cat" (which is definitely different from any cat). In printed texts, there are three common ways to indicate them:

(Sometimes they are left totally unmarked, relying on people knowing that real oxen have no plural, or marked verbally by using expressions like "the word ox".)

As regards HTML, such "quoting" of words deserves an element of its own. Otherwise there will be confusion, since some people will use Q, some others will use physical markup (according to their own presentational preferences), and some people will think they should use the lang attribute (which might cause different rendering when the linguistic objects are from another language than the surrounding text!). Perhaps some people will even think that the SAMP element is suitable (it was clearly suggested for this purpose in HTML 2.0!).

An important related problem is how to mark up extracts of computer code (such as commands, scripts, or program code) and man-machine interaction which takes place textually. To take a simple HTML related example, how should we mark up the word "alt" when writing about the alt attribute in HTML? The most obvious solution is to use the CODE element, regarding it as a (small) fragment of computer code. Strangely, the HTML 4.0 draft uses the SAMP element, using the class attribute and a style sheet to affect the rendering, although it clearly defines SAMP to be used for "sample output from programs, scripts, etc". (This is a typical although mild example of illogical markup to which CSS1 leads. Could we expect normal authors to behave better than W3C?) One might say that when one refers e.g. to the alt attribute in normal text, "alt" acts as a linguistic object: we are referring to a word or phrase in a language.

The natural solution would be to deprecate the CODE element in favor of using any suitable element with a suitable lang attribute referring to the particular language (or perhaps a computer language in the generic sense). This would, however, require that the language support be extended that way.

When quoting a text, one should normally cite the source; in many situations, this is even compulsory by law. But the meaning of the proposed cite attribute of BLOCKQUOTE and Q is vague. First, should it imply that the quotation was actually taken from the resource to which the URL points? Or does it suffice that the resource contains a copy of a document from which the quotation was taken? Moreover, should the URL point to the source document in general or should it try to pinpoint (using a fragment identifier, for example) the location where the quoted text appears there? Last but not least, should the cite attribute appear instead of a traditional designation of the source or in addition to it? The example suggests the former. However, there is no requirement that browsers should show the cite attribute in any visible way. And there is no obvious way they could do that in general. The BLOCKQUOTE example in the draft is a quotation from a book. The normal method - which might be regarded as legally obligatory in some cases at least - is to specify the book by its title and author and possibly some additional bibliographic data. Providing a link to a copy of the book on the Web can be very useful, but it should not be regarded as a substitute for a normal citation.

There could be a structured element, to be used within a quotation element only, for specifying the source in textual form, as a normal citation. This needs some elaboration so that a suitable format can be developed which is reasonably simple, yet rich enough to express the most basic information. It would be natural to specify the URL of a copy of the source there and not in the quotation element.

The SUB and SUP elements are currently quite controversial. The draft seems to take the position that they are to be used for presentation issues only. Whether that would imply a definite change from HTML 3.2 is debatable. In any case, the specification should explicitly state whether SUP should be used for mathematical exponents or not. (Currently it is often used that way, and there is some pressure to do so, due to the lack of mathematical notations in HTML.)

Hyphenation is an important but difficult problem. It is often suggested that authors should be able to indicate hyphenation points. This is, however, just a minor and exceptional ingredient in any reasonable solution of the hyphenation problem. Moreover, it is generally not powerful enough, since word division may involve changes in the spelling of the word.

The soft hyphen has often been assumed to be an invisible hyphenation indicator according to ISO 8859-1. However, in fact the soft hyphen is a visible character. Naturally an HTML specification may define some specific semantics for that character, and the HTML 4.0 draft is the first attempt to do so. Technically it is possible to define that it is treated as a "discretionary hyphen". (Whether it is wise to do so is debatable. There are several solutions which would be more natural and of more use, e.g. defining a special element like <WBR>, possibly with an attribute which allows the author to make some word division points preferable to others, or introducing a new attribute to be used as in the following: <SPAN hyp="rec-ord">record</SPAN>.) The following points should be observed:

  1. The current wording seems to prohibit the use of the soft hyphen as such in an HTML file, which is exceptional, since obviously any other ISO 8859-1 character may appear as such.
  2. The current wording is obscure when it says that "the plain hyphen should be interpreted by a user agent as just another character". This seems to prohibit word division in compound words where the components are separated with a normal hyphen. It is normal to regard such word division as acceptable, even as one of the most recommendable word division points.
  3. The treatment of soft hyphens within preformatted text (where it might well appear in its original use specified in ISO 8859-1) should be specified; the current wording seems to imply that it should be removed when the document is rendered!
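
To make the notations mentioned above concrete, the fragment below places them side by side. Only the first uses something defined for HTML (the &shy; entity for the soft hyphen). WBR is a vendor extension, and the hyp attribute is the hypothetical one suggested in this review, not part of any specification.

```html
<!-- 1. Soft hyphen written as an entity reference. -->
<P>rec&shy;ord</P>

<!-- 2. A word break element. WBR is a browser extension, not in the draft,
     and by itself it says nothing about inserting a hyphen. -->
<P>rec<WBR>ord</P>

<!-- 3. Hypothetical attribute suggested above; "hyp" does not exist in HTML. -->
<P><SPAN hyp="rec-ord">record</SPAN></P>
```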

Elements for marking document changes are definitely needed. The draft contains INS and DEL elements but no element to specify a change in the sense of replacing some text by a new version of that text. The explanations refer to marking changes, however. It seems that the idea is to indicate a change by having the old text within DEL and the new text within INS. This is theoretically illogical, since completely deleting something, inserting something completely new, and changing something (perhaps just a little) are three different kinds of editing text. It is also practically very inconvenient, since the normal way of indicating changed texts is to use something like a change bar in the margin, and this cannot be done without knowing what has been changed. (Recognizing adjacent INS and DEL elements would be a clumsy solution and not a correct solution, since there is no guarantee that they actually belong together indicating a change, instead of something completely deleted and something completely new inserted just accidentally appearing one after the other!)

Thus, an element indicating change (replacement) is needed, too. It could be named CHG, for example. In addition to an attribute pointing to information about the reason for the change, there should be another (optional) attribute (like OLD) pointing to a document containing the previous version of the text. This would allow user agents to provide some simple way of accessing the old text, perhaps displaying it in another window for comparison.
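
The proposal above might look roughly as follows. Everything specific here is an assumption made for illustration: CHG and its OLD attribute are the hypothetical names suggested in this review, the cite attribute is borrowed from INS and DEL, and the sentences and URLs are invented.

```html
<!-- Pure insertion and pure deletion, as in the draft. -->
<P>The program runs on Unix<INS cite="http://www.example.com/why.html"> and
Windows NT</INS>.<DEL> A Macintosh version is planned.</DEL></P>

<!-- Hypothetical CHG element: the content is the new text, and OLD points
     to a document holding the previous version of it. -->
<P>The default timeout is
<CHG cite="http://www.example.com/why.html"
     old="http://www.example.com/manual-v1.html#timeout">30 seconds</CHG>.</P>
```

With such markup a user agent could draw a change bar beside the second paragraph without having to guess whether an adjacent INS and DEL belong together.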

In addition to insertions, deletions and changes by the document author, a document or part of it can be submitted to an editorial process. The draft provides no obvious way of distinguishing them from other changes. Naturally, one could use the cite attribute to refer to a document explaining that the changes have been editorial, but this would be very clumsy for normal editorial things. A partial solution to this would be to extend the URL concept to allow "inline resources", i.e. a scheme which specifies that the rest of the URL is not an address of a resource but a literal resource, a piece of text (using some suitable literal notation and encodings). But such a solution, although desirable, would not allow a simple, uniform treatment of the most basic and usual kinds of change.

Perhaps there should be an attribute, with a set of keywords as possible values, indicating the nature of the changes in rough terms. Referring to a reason, in the sense of providing the URL of an explanation, plays a different role; it cannot form the basis of selecting different renderings for different changes. For example, in manuals it is often practically very important to distinguish between changes which try to improve the description of a program and changes which reflect changes in the program itself. Those keyword values might indicate document changes due to changes in the reality described (such as changes in a manual of a program due to changes in the program itself), improvement of the wording or structure of the document (by author), error correction (by author), editorial changes (for journalistic reasons), and changes in quoted text (such as omitting irrelevant parts or adding explanatory words) when including quotations. This would allow each user to define suitable presentation styles for those kinds of changes that are important for him to distinguish (in general or in some particular situation).

The date and time format defined in the draft is extremely theoretical. Despite being formally an ISO standard, it is rarely used in the world, except by some extravagant enthusiasts. It also deviates from well-established Internet usage defined in RFCs and referred to even in the HTML 4.0 draft in other contexts, such as the syntax of HTTP headers. User agents should be encouraged to allow readers to see the change timestamps, upon the reader's request, and perhaps in a time format defined by the reader or by the browser settings. And authors should be discouraged from writing stupid INS elements like the one now given as an example in the draft. It should be explicitly said that the contents of an INS element should be simply the inserted text. It should not contain an explanation of the insertion (the cite attribute exists for such purposes), still less consist only of such an explanation!
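
For comparison, here is one instant written in the ISO 8601 style, assuming the YYYY-MM-DDThh:mm:ss form with a time zone offset, and, in a comment, in the RFC 1123 style familiar from HTTP headers. The sentence and the URL are invented for this sketch. Note that the INS element contains only the inserted text, with the explanation left to the document named by cite.

```html
<!-- ISO 8601 style timestamp in the datetime attribute.
     The same instant in the style of HTTP headers (RFC 1123) would read:
     Sat, 05 Nov 1994 13:15:30 GMT -->
<P>The meeting starts at noon<INS datetime="1994-11-05T08:15:30-05:00"
cite="http://www.example.com/changes.html">, not at ten as announced
earlier</INS>.</P>
```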

Lists in HTML documents

The draft makes the DIR and MENU elements deprecated, even strongly ("We strongly recommend using UL instead of these elements"). Although these elements can be regarded as presentational, the real reason for deprecating them seems to be the defeatist attitude expressed as follows: "In practice, a user agent will render a DIR or MENU list exactly as a UL list." (This is against the good old recommendations in HTML 2.0, and it makes a rather strong claim about user agents in general. There are rumors that some browsers make some distinction.)

It would be quite appropriate to deprecate DIR and MENU in favor of UL, if browsers were clever enough to analyze UL elements and display them in directory list or multicolumn or some other format, selected on the basis of the list as a whole. But au contraire, most user agents are not able to use a simple directory list or multicolumn format even if explicitly instructed to do so (by the use of DIR or MENU)! Even more absurdly, both specifications and browser makers seem to be much more anxious to work on tiny details like the numbering style of list elements.

The key question should be the following: How should a list presentation format be selected? Should it depend on the nature of the list or author's or user's preferences? Exactly how? Such problems should be solved before simply throwing DIR and MENU away.

The draft lacks elements for list headers. This is serious, since it is now impossible to express lists in a structured way. There is currently no way to indicate logically that a piece of text is a list header or an explanation (or a legend) of a list. (Notice that these are two separate concepts.) Thus the reader has to spend his time in finding out the structure which should be expressed in HTML and in the visible presentation in the first place. A simple solution would be to introduce an element for a list header (as in the HTML 3.0 proposal). An even simpler solution would be to modify the syntax of list elements so that at least text elements (to be interpreted as the list header) were allowed within a list element before the elements for items in the list.
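
A sketch of the first solution, using the LH element name from the HTML 3.0 proposal. LH is not part of the HTML 4.0 draft, and the list content is invented for the example.

```html
<UL>
<!-- LH (list header) comes from the HTML 3.0 proposal, not from the draft. -->
<LH>Things to bring to the meeting</LH>
<LI>the agenda</LI>
<LI>a printed copy of the draft</LI>
<LI>your written comments</LI>
</UL>
```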

More fundamentally, defining lists as block level elements which cannot appear within a paragraph is unnatural. As David Perrell has said so clearly,

A list rarely if ever stands alone as a concept. A list is usually an appendage to a paragraph, not a separate entity. To give a list equal space above and below is not a reasonable presentation.

This is actually related more to the paragraph (P) element than the list elements. Originally there was an (empty) P element used as a paragraph separator. This idea never got into any official specifications, but it still survives in its way. The "convenient" principle that an end P tag is omissible has a very high price. It implies that paragraphs are not nestable and cannot contain block level elements. This serious restriction is often circumvented by using lists instead of paragraph structuring, since lists are nestable. That solution is neither logical nor practical; as the size of list elements grows, typical list presentation becomes inconvenient.

The optimal solution would be to simplify and unify the basic structuring elements in HTML. This means some kind of true sectioning. The uniform concept of nestable sections with optional elements for headings, summaries, etc, would make separate elements for lists unnecessary. For example, a simple section containing a set of sections containing just text could, depending on user preferences and other factors, be presented as a list of items (numbered or not) or as a sequence of paragraphs or as a table containing the subsections in a sort of multicolumn format.

Definition lists deserve special discussion. A definition list is in a sense comparable to a table with two columns. Another view, a more logical one, is to regard it as a simple list of items, each of which is an ordered pair consisting of a definiendum and a definiens. The question arises why the inner structure has no logical markup by itself. If I only wish to present one definition somewhere in my document, should I write a definition list containing just that? That would mean that I accept that it is typically rendered in a strange way, and I cannot embed my definition into a normal paragraph, for instance.

Since HTML 3.2 we have a separate element, DFN, for designating something as a defining occurrence. But there is no way to indicate what part of the document constitutes the definition. And it seems that little attention has been paid to the relationship between definition lists and DFN. It would be natural to assume that normally a DT element is displayed in a manner similar to DFN or that at least some special presentation is used for DT due to its very nature. But the HTML 4.0 draft seems to assume that this is not the case: the example uses an explicit EM element to mark up the terms within DT elements!

Designating definitions as definitions clearly and in a uniform manner is important for the purpose of suitable (and user-customizable) rendering of such crucial things in documents. But it is even more important for search engines and user-friendly tools in browsers. If a document used the same basic markup for all definitions (indicating clearly the term to be defined and its definition), search engines could extract the definitions easily. Notice that very often people wish to find definitions for terms, not just any information related to the term. Moreover, a document which is clearly marked up in this respect would allow a browser to present a list of definitions to the user. This would be much more sensible than explicit links to definitions from all occurrences of terms in a document.

An obvious solution would be that there is one recommended way of presenting a definition: a DT element followed by a DD element. This could be rendered in a more or less distinctive manner, depending on the browser and on user settings. The DFN element would be deprecated but interpreted as if there were a DT element with no associated DD element. And the DL element (containing DT and DD elements) could still be used for the purpose of grouping definitions together, perhaps with some effect on layout. Alternatively, completely new names could be introduced for elements related to defining. A practical reason for that would be the widespread misuse or incorrect use of DL, DT, and DD.
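
The two forms discussed above can be put side by side as follows. Both fragments are valid markup today; what is hypothetical is the interpretation given in the comments, which is the one proposed in this review. The wording of the definition is just an example.

```html
<!-- Recommended form under the proposal: term and definition as a pair. -->
<DL>
<DT>user agent</DT>
<DD>a program that retrieves and presents documents on behalf of a user</DD>
</DL>

<!-- Deprecated form: under the proposal, read as a DT with no DD, so the
     extent of the definition itself remains unmarked. -->
<P>A <DFN>user agent</DFN> is a program that retrieves and presents
documents on behalf of a user.</P>
```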

Tables in HTML documents

Since the design principles have been discussed earlier in these comments, it suffices to say here that the table-related constructs in the draft provide no substantial improvements over HTML 3.2. In fact, they may lead to more confusion due to their complexity, but this depends on the availability of suitable tutorial material which encourages the use of a small subset of the facilities.

It is probably impossible to define a suitable structured table model in a short period of time, still less to gain acceptance for it. Therefore, if a new version of HTML is to be defined within a few months, there are three practical options:

  1. to keep tables as in HTML 3.2 (with obvious additions due to "internationalization" etc, of course)
  2. to approve the table model in the draft, or
  3. to adopt RFC 1942.

However, for the purpose of future work on tables, the following notes are made here:

Essentially, a table differs from a nested list (a list with lists as elements) only by the sublists being structurally similar. (Astonishingly, this essential feature seems to be neglected in HTML table specifications and proposals; there is not even a requirement that rows have the same number of elements!) Thus, a logically designed table model would provide a method for specifying, in addition to the data in a table, the logical structure of each row. (Naturally, this would make useful syntax checks possible.) This may or may not involve specifying visible or audible row or column headings for a table.

The very table structure should indicate whether a column contains, say, integers, numbers with decimal point, images, words or short phrases, or text paragraphs, for instance. The repertoire of such alternatives can be difficult to define so that it meets practical needs, but such a job would be necessary for a structured table model. The current trend of adding enhancements to presentational attributes inevitably leads to unmanageable complexity, in addition to encouraging the use of tables for mere layout. It should be left to browsers and readers to define a suitable presentation by mapping each of the column types into some way of rendering (such as aligning integers to the right in a visual presentation).

As an example of unnecessary complications caused by unstructured table models, consider the COLS attribute proposed in the draft. It is proposed as an efficiency enhancement. It is questionable whether it helps a browser to know the number of columns beforehand, since it does not know anything about the space and format needed for the columns. Knowing just the number of columns isn't that much! If one wants to allow fast formatting of tables, so that rendering may begin before reading the entire table, then the beginning of a table element should specify the number of columns implicitly via the structure specifications for each column (as outlined above). For further optimization a browser would need information from which it can compute suitable widths for columns, for instance in the form of "sample elements" (written to reflect largest possible real elements). But it is debatable whether such optimization is really needed at all.
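
A rough sketch of what per-column structure specifications might look like. The datatype attribute and its values are hypothetical names invented for this illustration (the draft's COL element has nothing of the kind), and the table data are placeholders.

```html
<TABLE>
<!-- Hypothetical: "datatype" is not an attribute in the draft. The three
     COL elements also tell the browser, before any data arrive, that
     each row has three cells. -->
<COL datatype="text">
<COL datatype="integer">
<COL datatype="decimal">
<TR><TH>Name</TH><TH>Count</TH><TH>Ratio</TH></TR>
<TR><TD>Alpha</TD><TD>120</TD><TD>45.5</TD></TR>
<TR><TD>Beta</TD><TD>8</TD><TD>7.25</TD></TR>
</TABLE>
```

A browser could then choose the rendering for each column type by itself, for instance aligning the integers to the right, and a syntax checker could reject a row with a missing cell or a non-numeric entry in the second column.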

The draft, after introducing the COLS attribute (as something new when compared with HTML 3.2), then recommends that authors avoid it, using COL and COLGROUP elements instead. (This might be somewhat half-hearted, since the draft itself uses COLS in an example.)

Links in HTML documents

In current HTML, it is unclear whether an anchor (A) element defines a location or a zone (region) in a document. Given e.g. <A NAME="foo">, does it name the location of that tag or that part of the document which starts there and ends at the end tag </A>? The draft takes a step forward by fixing the latter interpretation, which is more sensible. However, it fails to consider the consequences: since anchor elements are restricted to containing text elements only, it would be suitable to deprecate A NAME=... in favor of using the id attribute (introducing a new element, such as a DIV or a SPAN element, if needed).
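
The two ways of naming a destination can be compared in a short sketch. The fragments are alternatives, not meant to appear in the same document, and the identifier and texts are invented for the example.

```html
<!-- Alternative 1: an anchor as a zone, limited to text level content. -->
<H2><A NAME="lists">Lists in HTML documents</A></H2>

<!-- Alternative 2: the id attribute on an enclosing element names the
     whole section, heading and paragraphs included. -->
<DIV id="lists">
<H2>Lists in HTML documents</H2>
<P>The text of the section goes here.</P>
</DIV>

<!-- Either destination is referred to in the same way. -->
<P>See <A HREF="#lists">the discussion of lists</A>.</P>
```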

On the other hand, the draft appears self-contradictory in this respect. Under the title Anchors with the id attribute it says that the attribute is used to position an anchor at the start tag of an element! (Moreover, the example, or rather the sketch, given there associates the name "section2" with the heading of a section, not with the section as a whole, which is normally a more meaningful association.) An example uses the wording "location of anchor one", followed by a statement which speaks of zone. The draft also mentions empty A elements, suggesting that they should be avoided, but due to the behavior of some browsers, not for the logical reason that referring to an empty zone seldom makes sense!

Since the draft contains the important type attribute for the LINK element and for the OBJECT element, the question arises why there is no such attribute for the A element. The draft says, referring to some attributes related to "internationalization", that 'armed with this additional knowledge, user agents should be able to avoid presenting "garbage" to the user'. Obviously, information about the media type would play a crucial role here. Logically, a browser needs to know the media type in order to know what to do with the resource (e.g. displaying an image in some format or launching an application which can handle resources of that type). Even if the resource is accessed via the HTTP protocol and it is served with content type information, a type attribute in the HTML construct for the link would allow the browser to check that the types match. Such type information would also be useful in a variety of other ways, such as allowing search robots to classify and characterize documents on the basis of the types of links they contain.

Notice that the draft contains a charset attribute for the A element, thus providing a tool for expressing some of the information which could be expressed within a type attribute. It would be important to replace it by the type attribute. Moreover, the use of this attribute should be strongly encouraged (in all A, LINK, and OBJECT elements).

The semantic definition of the type attribute should be clarified. Notice that the draft very vaguely says that type and media attributes for LINK are "header style information". The DTD fragment oddly says that it is an advisory Internet content type. (And notice that a reference to RFC 2045 elsewhere in the DTD is just a comment, and consequently has no normative value. Moreover, it is practically somewhat misleading, since the media types are defined in RFC 2046.) The definition of the type attribute should be something like the following: "The value of a type attribute specifies an Internet media type (content type) for the resource referred to. It consists of a top-level media type and a subtype, optionally followed by parameters which may specify properties such as character encoding (charset parameter). The Internet media type shall be specified according to RFC 2046 and related specifications. Use of the type attribute is strongly encouraged. In the absence of a type attribute, user agents will have to deduce the media type from sources external to the HTML file or even guess it."
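
Under the definition suggested above, links might be written as in the following sketch. A type attribute on the A element is the proposal of this review, not something in the draft, and the URLs and link texts are invented.

```html
<!-- Proposed: a type attribute on A giving the media type of the target. -->
<A HREF="http://www.example.com/horse.png" TYPE="image/png">a picture
of my horse</A>

<!-- The charset parameter carries what the draft's separate charset
     attribute expresses. -->
<A HREF="http://www.example.com/report.txt"
   TYPE="text/plain; charset=iso-8859-1">the report as plain text</A>
```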

The standardization of (basic) link types (REL and REV values) would be crucial for the development of the Web as a reasonably organized forum for hypertext publishing. The draft lists some values without making any requirement on or even suggestion to supporting them. Authors "may use" some "recognized" link types which are claimed to have "conventional interpretations". The content of the list could be discussed a lot, but it is sufficiently useful to be publicly approved in the following sense:

Some mechanism (which is lighter than the approval of HTML specifications) should be created for extending the registry of basic link types and for refining their definitions.

Inclusions in HTML documents

The proposed OBJECT element is very complicated. To begin with, there are 17 possible element-specific attributes (in addition to the 16 "generic attributes") with very varying meanings and uses. Otherwise, too, the description of the OBJECT element looks like what it is: a soup cooked up from different ingredients:

It sounds tempting to integrate such things into a single concept, expressed using fashionable object-oriented terminology. But as the draft shows, putting various vendor-specific hacks together leads to confusing complexity. The syntactic uniformity (the use of OBJECT elements) would just hide the true incompatibilities between, say, ActiveX based and Java based systems.

The draft does not actually define the meaning of the OBJECT element. The structure of element specifications is especially annoying here. After a long discussion of the attributes, the text just starts explaining: "Most user agents contain mechanisms for rendering common data types - -". The very meaning of the OBJECT element is left implicit.

The principle that the content of an OBJECT element is a replacement for the object specified in the OBJECT tag is not emphasized. In fact, it is expressed in a relatively indirect way, as a requirement on browser behavior. The principle itself is debatable. It does not sound natural, does it? Obviously we need something better than the simple ALT attribute of the IMG element, but the idea of nesting OBJECT elements in order to specify alternatives is weird.
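
For reference, the nesting in question looks roughly like this. It is a minimal sketch of the mechanism; the file names and the text are placeholders, not taken from the draft.

```html
<!-- The outer object is tried first. Its content, here another OBJECT,
     is used only if the outer object cannot be rendered. -->
<OBJECT TYPE="image/png" DATA="myhorse.png">
  <OBJECT TYPE="image/gif" DATA="myhorse.gif">
    My horse is a handsome black stallion.
  </OBJECT>
</OBJECT>
```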

Moreover, the philosophy behind the OBJECT element seems to be that of graceful degradation, implying the idea that a more "advanced" form of presentation (such as animation) always should have precedence over less "advanced" (such as static images, which in turn have text as substitute). Although things might be that way in some cases, the principle is far from being universal. Even if I can view animations, I may have different very good reasons not to view them in situations where there are other alternatives.

In fact, an "advanced" object seldom has good replacements. This seems to apply at least to the examples in the draft, since they provide "alternatives" like the text "An animated clock" (thus confusing the title of an animation with a replacement for it) or "The Earth as seen from space" or, even worse, "This user agent cannot render Python apps" and the nerdish "You're missing a really cool poem viewer ...".

If an object has meaningful replacements or, rather, if there are alternative presentations of some message, then the natural method would be to provide some sort of menu from which the user (or in some cases the user agent) can select. This could be done using a simple list construct with links inside. If, however, we would like to have some sort of autoselection so that the user sees a presentation of the message embedded into the document, then something like the OBJECT element could be used. But this requires that there is some structured mechanism for specifying that a set of OBJECT elements is to be treated that way. Since embedded objects can appear one after another, mere sequentiality is not enough.

The draft uses nesting of OBJECT elements to indicate alternatives. Even if there are reasons to suggest some order of preference, nesting emphasizes it too much. A natural solution would be to have a specific element, let us call it ALT, for specifying that the objects within it are alternatives to each other. (Selection among them would be affected by their textual order, which would be taken as the author's recommended order of preference, and by user settings which may specify e.g. that several image formats are accepted but in a particular order of preference.) Within such an element, anything following the OBJECT elements would be interpreted as the alternative to be taken when none of the objects is selected for presentation. (Admittedly, having that alternative, which is typically textual, in a symmetric position with respect to the others would be more natural.) This means something like the following:


<ALT>
<OBJECT TYPE="image/png" DATA="myhorse.png">
<OBJECT TYPE="image/gif" DATA="myhorse.gif">
<OBJECT TYPE="application/postscript" DATA="myhorse.ps">
My horse is a handsome black stallion.
</ALT>

Perhaps some better structure could be developed than the one above. The basic point here is that it should be a list, possibly an ordered list, not a nest.

The difference between a description of an image and a substitute for it is crucial, of course. The old controversy about the use of the alt attribute of the IMG element revolves around the problem that there are cases where it seems necessary to use that attribute for the description, while the intended use is for substitutes. This problem should be removed from HTML by providing suitable methods for both purposes. Basically, the type and title attributes of an OBJECT element, when used appropriately, mostly serve the needs of describing images and other objects. Therefore, any construct corresponding to the alt attribute (such as the content of an OBJECT element according to the draft) should be regarded strictly as a substitute for the object. It should be something which is meaningful and useful even if (and especially if!) there is no way to access the object itself. The draft does not say this clearly, and its examples seem to imply just the opposite.

In addition to the inherent problems described above, there are other serious problems in the design of the OBJECT element in the draft:

Image maps seem to be a confusing issue to HTML authors, partly due to the duality of "client-side" and "server-side" image maps. The draft does not remove the confusion. On the contrary, it confuses things further by including image maps into the object concept in an ad hoc way. The logical approach to fixing the image map issue would be the following:

There is an important quiet change in the specification of the IMG element. The draft says that the height and width attributes override the natural dimensions of the image and that user agents should scale the image. There is no such requirement in HTML 3.2; on the contrary, HTML 3.2 suggests that those attributes are for efficiency reasons only and should reflect the true size of the image. Some browsers are known to scale images, but some of them are known to do it very poorly, distorting the image. More importantly, it is extremely unstructured to specify an image size in pixels in an HTML file and to require that browsers scale an image to such a size. Whether images are scaled by a browser should depend on the browser and the browsing environment. (Good-quality browsers may allow the user to scale images dynamically.)

The draft contains an example of including an HTML file into another HTML document by making use of the OBJECT element. This sounds like a nice way of including an inclusion facility as a byproduct of a general-purpose element. However, since the semantics of the OBJECT element is vaguely defined, we must just assume that the example in the draft really means the following:

Notice that the idea of using the content of an OBJECT element for displaying an error message violates the idea of using it as a replacement for the object. Moreover, it is far from obvious from the draft whether the content is to be used when the browser does not even try to access the resource (due to having a media type not belonging to supported types, for example) or when it tries but cannot access it (due to being offline, for instance). These are quite different situations. The latter should obviously be primarily handled in the error processing of the user agent, although good error handling might involve giving the user an option to select a substitute for the object (as defined in the HTML file) or to try accessing the resource again (perhaps it was just a temporary failure in connection?).

It would seem very natural to assume that using an OBJECT element with type="text/plain" is a way to include a plain text file as if its content appeared in the document within a PRE element. The draft should either say that this is so or warn against this obvious interpretation.

The draft contains a rather complicated description under the heading How to specify alternate text. Contrary to what that heading suggests, the text contains, after some guidelines to authors, recommendations to (or requirements on?) browsers. Instead of clearing things up, the recommendations are confusing. First, why recommend anything new as regards IMG and APPLET elements which should not be used in new documents (assuming that a suitable OBJECT element is available)? Second, by suggesting that the value of a title attribute be used as alternate text one messes up the ideas of description and substitution. Recommending (or requiring?) that browsers extract text from GIF images is hopefully just a strange joke. (If an author, instead of writing a decent heading element, embeds an image containing a heading text, without providing an alt attribute, should a browser really act as a text scanner to decipher the words?)

Presentation of HTML documents

This section should be removed, with the exception of the description of the (deprecated) presentational features inherited from HTML 3.2. It is irrational to add new presentational features to HTML. This applies especially to frames. As regards style sheets, there is no reason to say anything more about them in an HTML specification than the HTML 3.2 specification says. The policy adopted in the draft strongly suggests that the criticism presented in Why style sheets are harmful is sound.

In short, style sheets are not wonderful.

<P type="text/css" style="font-size: 12 pt; color: fuschia">
is just a perverse way of writing a FONT tag, although the FONT tag is more structured in the sense of not specifying font size using physical measures. (The example also exhibits, as minor perversions, an incorrect spelling of a color name and a reference to an unregistered Internet media type.)

The HR element is present among elements for physical markup, and it is described as a presentational element. It should be described as an element for change of topic, which can be rendered as a horizontal rule. If true sectioning is introduced into HTML, the HR element should be declared deprecated.

Frames in HTML documents

The use of "frames" in HTML documents should not be encouraged by including them in the HTML specification. Typically the use of frames wastes valuable screen space, prevents saving the URL of a document for later use, and makes it unclear which documents belong to a particular site and which are just framed from elsewhere.

Frames are an expensive plastic imitation of windowing, to be used in environments where true windowing is readily in use. They are tårta på tårta (Swedish saying, 'cake on cake') and make windowing clumsier.

Graphical browsers typically allow the user to select, when following a link, whether the document is loaded into the current window or a new one. A normal link (with no target indication) works fine for that purpose. Browsers should be improved so that they support viewing two (or more) documents at a time so that links from one document to another can be followed smoothly (using an existing window for a document instead of opening a new one). For instance, one of the documents could be an index page for a site. There is no need to add any special framing features to HTML documents or the HTML language for the purpose.

Forms in HTML documents

Forms and scripts have been grouped together under the heading "Interactive HTML documents". It is not clear what the message of the grouping is, especially since there is no text under the common heading except the subsections on forms and scripts.

Forms handling has been rather primitive in HTML. For instance, there is no way to check input data before submitting it. And there is no way (within HTML) to make a form really interactive in the sense that the form changes when it is being filled (so that e.g. selecting an option causes some of the input fields to be omitted due to their being irrelevant).

Perhaps this is as things should be, leaving advanced interaction to be handled by various scripts. This causes restrictions and incompatibilities, since there is really nothing an author of a form may assume about the support for various scripting methods. Thus, if this approach is selected (or kept), it would be desirable to require support for at least one scripting language in all browsers conforming to HTML specifications. (Such requirements would be quite normal, comparable to requirements on supporting at least one specific character encoding.)

Alternatively, the HTML forms concept could be enriched by providing HTML expressions for things like requirements on input values. We already have expressions for single-line text input and multi-line text input as well as selection from a fixed set of alternatives. Adding more complicated requirements like a numerical input field (which must provide a value within a given range) would not be anything drastically new. If expressed in HTML, such requirements could be reflected in the presentation of forms. For example, a browser might be configured to display numerical input fields in a distinguished color (by user option), and the user would have a clear visual hint about a field requiring numerical input. Naturally, it would take a lot of work and time to define such a forms concept and agree on it.
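To make the idea of a declared numerical input field more concrete, the following Python sketch shows the kind of check a browser could perform by itself, before submission, if the requirement were expressed in the markup. It is an illustration only: the names NumericField and check_numeric_input, and the wording of the messages, are assumptions made here and do not come from the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumericField:
    """Hypothetical declaration of a numerical input field with a range."""
    name: str
    minimum: float
    maximum: float


def check_numeric_input(field: NumericField, raw: str) -> Optional[str]:
    """Return None if the input is acceptable, else a message for the user."""
    try:
        value = float(raw.strip())
    except ValueError:
        return f"{field.name}: a number is required"
    # A not-a-number value fails both comparisons and is rejected here too.
    if not (field.minimum <= value <= field.maximum):
        return (
            f"{field.name}: value must be between "
            f"{field.minimum} and {field.maximum}"
        )
    return None


if __name__ == "__main__":
    age = NumericField(name="age", minimum=0, maximum=150)
    for text in ("42", "abc", "200"):
        print(repr(text), "->", check_numeric_input(age, text))
```

The point of the sketch is that the range is data which the browser knows about, not code written by each author in a scripting language.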

The draft makes the forms concept more complicated, but mostly not in the useful direction outlined above. On the contrary, there are even some features which have been designed to allow authors (not readers) to have more control over the presentation of various fields. For example, the BUTTON element, by allowing "richer presentational possibilities", makes it more and more difficult for readers to recognize the various parts of forms as being what they are.

In current HTML, forms suffer from being unstructured: an author cannot indicate that some text is a title or an explanation for a text input field, for example. The draft proposes methods especially designed for forms. The question arises: shouldn't we use more generic structuring methods instead?

In particular, assuming that true sectioning were introduced to HTML, it would be natural to interpret a sectioning construct within a FORM element as constituting a field, or perhaps fieldset. The heading of such a section would be interpreted as the label for the field; browsers might indicate this structure in some suitable manner. Any other content except form field elements would be interpreted as an explanation of the field (to be displayed only upon special information request from the user, perhaps).

The draft contains several complications to the forms concept due to focusing. Perhaps one reason for this is that some browsers do not automatically focus on the first input field of a form. The proposed cure is, however, much worse than the disease. An author should simply put the fields of a form in the natural order, i.e. the order in which they are to be filled, and expect browsers to do their job accordingly. (As regards changing forms dynamically, see remarks above. In this respect, the solution proposed in the draft complicates HTML but without solving the problem: it leaves the dynamics to be handled by scripts.)

An example in the draft uses an action attribute with a mailto URL. Since a widely used browser family fails to conform to specifications in this respect, with very harmful results, it is unwise to present such an example without a warning. Moreover, using a mailto URL for sending the content of a form by E-mail isn't very advanced, especially if the encoding of the content is x-www-form-urlencoded (as in the example).

Another example (related to FIELDSET and LEGEND elements) contains a form for submitting very personal information, including medical history. Taking into account that in many countries such information is strongly regarded as confidential data to be especially protected (see e.g. EU directive 95/46/EC), it should be emphasized that such data must not be submitted unencrypted in an open network (which is the typical way of submission on the Web).

Scripts in HTML documents

As mentioned above, an HTML specification which contains a generic interface to scripting should make support for at least one specific scripting language mandatory in conforming user agents. (Naturally, this would not imply that the support would always be enabled.)

Especially for small tasks which (currently) need to be implemented with scripts, it would be crucial to know which language to use to achieve reasonable accessibility. And probably most uses of scripting are made in order to do rather small tasks.

The idea of embedding scripts into HTML documents may have some technical benefits, such as avoiding separate transmissions of small files, but it seems to have made several people confuse scripts with HTML things and scripting languages with HTML. Moreover, the text under the heading Syntax of script content (and under Commenting out scripts) eloquently shows what practical problems arise when languages are mixed this way.

The natural solution would be to associate scripts with HTML documents so that the HTML document only gives the URL for the script file (and its media type). Thus, the SCRIPT element should be simplified so that it has empty content and must refer to the script using a src attribute.

The semantics of the SCRIPT element is vaguely described. For example, exactly what is the difference between having a SCRIPT element in the head part and having it in the body part? And when the draft says that a script is executed "when the document loads, or at some other time such as when a link is activated", shouldn't it say things a bit more specifically?

It needs more justification than the draft gives to make it reasonable to include the repertoire of "intrinsic events" into HTML. The examples given are not very illustrative. For example, as regards checking the contents of a numerical input field, HTML-based constructs should be developed instead of every author writing his own code in his own favorite scripting language (which may or may not be supported by various user agents). Most of the other examples are too generic to give any idea of why an event handling is needed and whether there could be a more HTML-like solution to the problem (if there was a problem before creating one with scripting).

As regards designing documents for user agents that don't support scripting, the draft seems to be based on "all or none" thinking: either a user agent supports scripting in any language, or it does not support scripting at all. The formulation in the draft is somewhat obscure, but it seems that the NOSCRIPT element is to be rendered only by user agents which do not support client-side scripts at all. However, the meaningful processing would be that a NOSCRIPT element be rendered if the particular scripting language (or perhaps the particular script?) used in the corresponding SCRIPT element is not supported by the browser.
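The difference between the two readings can be stated as two small decision functions. The Python sketch below is illustrative only: the function names are invented here, the first function reflects my reading of the draft's obscure formulation, and the media type strings serve merely as sample labels for scripting languages.

```python
from typing import Collection


def render_noscript_all_or_none(supported_types: Collection[str]) -> bool:
    """'All or none' reading: render NOSCRIPT only if the user agent
    supports no client-side scripting language at all."""
    return len(supported_types) == 0


def render_noscript_per_language(
    script_type: str, supported_types: Collection[str]
) -> bool:
    """Suggested processing: render NOSCRIPT whenever the language of the
    corresponding SCRIPT element is not among the supported ones."""
    supported = {t.lower() for t in supported_types}
    return script_type.lower() not in supported


if __name__ == "__main__":
    # A browser supporting one language meets a script written in another.
    supported = ["text/javascript"]
    print(render_noscript_all_or_none(supported))
    print(render_noscript_per_language("text/vbscript", supported))
```

In the case shown, the first rule leaves the reader with neither a working script nor the alternative content, while the second rule presents the alternative.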

Obviously, this raises the question why such situations are not handled by regarding the scripts as objects instead of introducing a new kind of element. The object approach would allow the specification of alternative presentation using the general tools for that as well as the specification of alternative scripts in different languages to accomplish the desired task.

Other parts of the draft

In addition to the specification commented on above, the draft contains several other parts.

SGML reference information for HTML is normative of course, but its basic content, apart from some syntactic details, is (or should be) contained in the specification, so it is not commented on here. However, notice that additional character entities are a good thing to include in HTML, but they should also be specified in the specification itself, in an SGML-free compact tabular form.

The rest of the material consists of more or less technical annexes, the last of which is unwritten. The appendix which lists changes between HTML 3.2 and HTML 4.0 is far from being exhaustive. For example, it does not even mention a large number of new attributes. On the other hand, it repeats information, such as motivation for some changes, which does not belong there.

The appendix Performance, Implementation, and Design Notes is a strange collection of mixed notes. For example, it "informatively" describes search engine and robot behavior without citing anything or referring to relevant recommendations or guides. It also presents a long explanation of the adopted table model, reflecting an even more presentation-oriented way of thinking than the model itself.


Jukka Korpela
Initially written 1997-08-21/1997-09-10. Some minor modifications (e.g. added links) and corrections have been made later. Table of content was added 2002-06-03.