Summary

The distinction between required elements and required tags has received a fair amount of attention recently, but it is rarely (if ever) explained in detail.

Author: Gez Lemon


Optional Tags

In HTML, the start and end tags are optional for the html, head, and body elements (as well as some other elements). The following is a perfectly valid and conforming HTML 4.01 Strict document.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
          "http://www.w3.org/TR/html4/strict.dtd">
<title>Juicy Studio - Minimal Document</title>
<meta http-equiv="Content-Type"
      content="text/html; charset=ISO-8859-1">
<script type="text/javascript" src="script/s.js">
</script>
<h1>Minimal Document</h1>
<p>
This is a minimal 
<abbr title="HyperText Markup Language">HTML</abbr> 
document.
</p>

Although the tags are optional, the elements are not. Remember that an element is part of the document structure, whereas tags are the markup used to create that structure. Where the start tag, the end tag, or both can be unambiguously determined, HTML allows them to be omitted from the markup, but the elements still appear in the document structure. In the above example, I've included a closing tag for the paragraph, but this is optional in HTML, as the end of a paragraph can be unambiguously determined.
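To make the distinction concrete, the following Python sketch tokenises the minimal document with the standard library's html.parser and assigns each start tag to an implied head or body element, roughly as an SGML parser does when the html, head and body tags are omitted. It is a simplification under stated assumptions: the HEAD_ELEMENTS set mirrors what the HTML 4.01 Strict DTD allows in the head (title and base, plus the script, style, meta, link and object inclusions), character data is ignored, and the ImpliedStructure class is a hypothetical helper, not how any particular browser works.

```python
from html.parser import HTMLParser

# Assumption: a simplified view of HTML 4.01 Strict. TITLE and BASE plus the
# %head.misc; inclusions (SCRIPT, STYLE, META, LINK, OBJECT) may appear in HEAD.
HEAD_ELEMENTS = {"title", "base", "script", "style", "meta", "link", "object"}


class ImpliedStructure(HTMLParser):
    """Hypothetical helper: record which start tags end up in HEAD and BODY."""

    def __init__(self):
        super().__init__()
        self.head = []      # elements inside the (possibly implied) head
        self.body = []      # elements inside the body, descendants included
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "head"):
            return  # explicit tags are allowed, but nothing needs inferring
        if tag == "body":
            self.in_body = True  # an explicit <body> also ends the head
            return
        if not self.in_body and tag in HEAD_ELEMENTS:
            self.head.append(tag)
        else:
            # The first element that HEAD cannot contain implies </head><body>.
            self.in_body = True
            self.body.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_body = True  # after an explicit </head>, BODY comes next


MINIMAL = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<title>Juicy Studio - Minimal Document</title>
<meta http-equiv="Content-Type"
      content="text/html; charset=ISO-8859-1">
<script type="text/javascript" src="script/s.js">
</script>
<h1>Minimal Document</h1>
<p>
This is a minimal
<abbr title="HyperText Markup Language">HTML</abbr>
document.
</p>
"""

parser = ImpliedStructure()
parser.feed(MINIMAL)
parser.close()
print("head:", parser.head)  # head: ['title', 'meta', 'script']
print("body:", parser.body)  # body: ['h1', 'p', 'abbr']
```

Feeding the same content with explicit html, head and body tags gives the same split, which is the point: the elements are identical whether or not their tags appear in the markup.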

The markup displayed above can be examined in a file called minimal.html. The file references a script that displays which elements are in the head section, and which are in the body section. Obviously, scripting must be enabled to examine which elements are in the head and the body. You can validate the markup for yourself with the W3C's HTML validator to confirm that the markup is perfectly valid.

In XHTML, the start and end tags for the html, head, and body elements are required, as the well-formedness requirements of XML do not allow tags to be omitted from the markup. The file minimal.php is an XHTML version of the content displayed above, with a form that lets you choose whether to serve the content as application/xhtml+xml, whether to include the html root element, and whether to include the head and body tags, to demonstrate the effect each option has on the resulting document. The script is exactly the same for both the HTML document and the XHTML document to help illustrate the differences.

When the document is delivered as text/html, different browsers give different results for the script. For example, even though the document is invalid XHTML, when it is served as text/html without the html, head and body tags explicitly specified in the markup, Firefox and IE assume HTML and insert the missing elements as if it were a regular HTML document. Opera 8.51 is unable to execute the script, as it cannot find the head or body elements (they haven't been defined, and with XHTML they are not implied by the structure), even though the document is delivered as text/html.

IE doesn't understand application/xhtml+xml, but all XML-capable browsers should report a well-formedness error when the document is served as application/xhtml+xml without the html root element, due to junk following the document element; at a minimum, they require the html root element. When the html root element is provided, non-validating XML-capable browsers are able to parse the document, but cannot run the script, as the head and body elements are missing from the document structure. A validating XML parser wouldn't be able to parse the content at all unless the html, head, and body tags were explicitly in the markup and the rest of the document conformed to the DTD. At the time of writing, none of the mainstream browsers that are capable of handling XML are validating parsers.
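The XML behaviour described above can be reproduced outside a browser with a non-validating parser. The following sketch uses Python's xml.etree.ElementTree, which is built on the non-validating expat parser, on an XHTML rendering of the minimal document. The DOCTYPE is left out so that no external DTD is involved, and the parse helper and constant names are hypothetical. Without a root element, parsing fails with expat's "junk after document element" error; wrapped in an html root, the document parses, but there is no head or body element for a script to find.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NS = {"x": XHTML_NS}

# The minimal document as XHTML, with no html, head or body tags.
FRAGMENT = """<title>Juicy Studio - Minimal Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<script type="text/javascript" src="script/s.js"></script>
<h1>Minimal Document</h1>
<p>This is a minimal <abbr title="HyperText Markup Language">HTML</abbr> document.</p>"""

# The same content wrapped in an html root element, still without head or body.
WRAPPED = f'<html xmlns="{XHTML_NS}">\n{FRAGMENT}\n</html>'


def parse(markup):
    """Return (root, None) if the markup is well-formed, else (None, error)."""
    try:
        return ET.fromstring(markup), None
    except ET.ParseError as err:
        return None, str(err)


# 1. No root element: everything after </title> is junk after the document element.
root, error = parse(FRAGMENT)
print(error)  # expat reports "junk after document element"

# 2. With an html root: well-formed, but XML implies no head or body elements.
root, error = parse(WRAPPED)
print(error)  # None
print(root.find("x:head", NS), root.find("x:body", NS))  # None None
print([child.tag.split("}")[1] for child in root])
# ['title', 'meta', 'script', 'h1', 'p']
```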

Standard Generalized Markup Language

To understand why these elements are optional with HTML requires examining Standard Generalized Markup Language (SGML).

HTML 2.0 was formalised with SGML so that restrictions could be specified to determine the start and end of elements. To achieve this, SGML documents use a Document Type Definition (DTD), a set of markup declarations that includes restrictions on how elements and attributes can be used. The SGML declaration includes a FEATURES clause, which lists SGML's optional features and switches each one on or off with the value YES or NO respectively. Two of the markup minimisation features are set to YES in HTML: OMITTAG and SHORTTAG. It's the OMITTAG feature that allows elements to be implied by the structure in HTML.

When OMITTAG is set to YES in the FEATURES clause of the SGML declaration, start tags, end tags, or both may be omitted in situations where they can be unambiguously determined by the document structure. Each element declaration in the DTD must then specify minimalisation rules. These use a hyphen for a required start or closing tag, and the letter "O" for an optional start or closing tag. The two characters follow the element name, separated from it and from each other by whitespace; the first applies to the start tag and the second to the closing tag. For example, in HTML, the unordered list element requires both a start and a closing tag. The following is the DTD fragment that defines the ul element.

<!ELEMENT UL - - (LI)+>

All elements are defined in SGML with the ELEMENT keyword, followed by the name of the element; in this case, ul for the unordered list element. The next two characters define the minimalisation rules, which are depicted with hyphens indicating that both the start and end tags are required in this case. The rest of the declaration defines the content model, which in this example means that a ul element must contain at least one li element.

In HTML, a closing tag for a list item is optional, as defined by the following DTD fragment (comments from the original DTD fragment have been removed).

<!ELEMENT LI - O (%flow;)*>

In the above example, the first hyphen following the element name indicates that the start tag is required, and the "O" that follows indicates that the closing tag is not required.

The HTML 4.01 DTD defines the html, head and body elements as follows (comments from the original DTD fragments have been removed):

<!ELEMENT HTML O O (%html.content;)>

<!ELEMENT HEAD O O (%head.content;) +(%head.misc;)>

<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL)>

Because the minimalisation rules for these elements set both the start and end tags to "O", both tags are optional. As well as allowing you to leave out the head and body tags completely, this means you can provide closing tags without the corresponding start tags for those elements and still conform to the specification.
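The minimalisation rules are regular enough to read mechanically. The following Python sketch parses the five declarations quoted above into the element name, whether each tag is required, and the content model. It assumes simplified declarations: a single element name per declaration, comments already removed, and parameter entities such as %flow; left unexpanded. The real HTML 4.01 DTD also declares some elements through name groups, which this pattern deliberately rejects. The ElementDecl and parse_decl names are hypothetical.

```python
import re
from typing import NamedTuple


class ElementDecl(NamedTuple):
    name: str
    start_tag_required: bool
    end_tag_required: bool
    content_model: str


# <!ELEMENT name start end content-model>, where start and end are "-"
# (tag required) or "O" (tag may be omitted). Simplified: a single element
# name, no comments, parameter entities left unexpanded.
DECL_PATTERN = re.compile(r"<!ELEMENT\s+(\w+)\s+([-O])\s+([-O])\s+(.+)>$")


def parse_decl(text):
    """Split one simplified element declaration into its parts."""
    match = DECL_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported declaration: {text!r}")
    name, start, end, model = match.groups()
    return ElementDecl(name.lower(), start == "-", end == "-", model)


DECLARATIONS = [
    "<!ELEMENT UL - - (LI)+>",
    "<!ELEMENT LI - O (%flow;)*>",
    "<!ELEMENT HTML O O (%html.content;)>",
    "<!ELEMENT HEAD O O (%head.content;) +(%head.misc;)>",
    "<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL)>",
]

for decl in map(parse_decl, DECLARATIONS):
    start = "required" if decl.start_tag_required else "optional"
    end = "required" if decl.end_tag_required else "optional"
    print(f"{decl.name}: start tag {start}, end tag {end}")
# ul: start tag required, end tag required
# li: start tag required, end tag optional
# html: start tag optional, end tag optional
# head: start tag optional, end tag optional
# body: start tag optional, end tag optional
```

The "O" flags only say which tags may be left out of the markup; whether an element exists in the document structure is governed by the content models, which the flags do not touch.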

You could be forgiven for thinking that no respectable author is likely to omit important tags such as html, head, and body, but you would be wrong. Anne van Kesteren, one of the most respected and knowledgeable markup authors around, serves his content without explicitly specifying these tags as they are optional. The tags are optional in HTML, but the elements are required; that's the important thing to note.

Category: Markup.

Comments

  1.

    All elements are defined in SGML with the ELEMENT keyword, followed by the name of the element; in this case, a for the anchor element.

    I think you meant "in this case, ul for the unordered list element."

    Interesting article; thanks!

    Posted by Josh on

  2.

    Interesting article. Would you recommend we leave out the html, head and body tags? It seems to me it could have practical benefits in terms of bandwidth.

    Posted by Doug on

  3.

    Hi Doug,

    I wouldn't recommend leaving out the html tag, as that's where you should specify the natural language for the document. Using head and body is a matter of preference. It won't affect search engine rankings, but the bandwidth saving would be minuscule.

    Posted by Gez on

  4.

    I wouldn't recommend leaving out the html tag, as that's where you should specify the natural language for the document.

    Alternately you could place that information in the Content-Language HTTP header.

    Posted by zcorpan on

  5.

    Alternately you could place that information in the Content-Language HTTP header.

    Yes, that's a good point. The automatic language detection of screen readers leaves a lot to be desired, and unfortunately, they don't determine the language from the HTTP headers. They usually check for a lang attribute on the html tag and, should that fail, evaluate all of the words in the document against a word list for supported languages. Hopefully things will improve, but in the meantime, specifying the lang attribute on the html tag is the safest approach in terms of accessibility.

    Posted by Gez on

  6.

    Opera has a bug where it doesn't create the HEAD element when there's no <head> start tag. The BODY element is there, though.

    Posted by zcorpan on

  7.

    Nice article, but there isn't any practical use for this, is there? Or am I missing something?

    In fact, it might encourage some people to leave out these tags just to show off that they know something that isn't common knowledge.

    But I learnt something I didn't know, so thanks. *smile*

    Posted by Rohit Sinha on
