The distinction between required elements and required tags has received a fair amount of attention recently, but it is rarely (if ever) explained in detail.
Author: Gez Lemon
In HTML, the start and end tags are optional for the html, head, and body elements (as well as some other elements). The following is a perfectly valid and conforming HTML 4.01 Strict document.
Although the tags are optional, the elements are not. Remember that an element is part of the document structure, whereas tags are the markup used to create that structure. Where the start tag, the end tag, or both can be unambiguously determined, HTML allows those tags to be omitted from the markup, but the elements still appear in the document structure. In the above example, I've included a closing tag for the paragraph, but this is optional in HTML, as the end of a paragraph can be unambiguously determined.
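To make the difference between tags and elements concrete, here is a minimal Python sketch of how a parser might imply the html, head, and body elements when their tags are missing from the markup. It is a deliberate simplification rather than the HTML 4.01 parsing rules: the sets HEAD_ELEMENTS and EMPTY_ELEMENTS, and the names Node, ImpliedTagBuilder, and outline, are hypothetical, text content is ignored, and only the html, head, and body elements are ever implied.

```python
from html.parser import HTMLParser

# Assumption: a simplified subset of HTML 4.01, not the full DTD.
HEAD_ELEMENTS = {"title", "base", "meta", "link", "style", "script", "object"}
EMPTY_ELEMENTS = {"base", "meta", "link", "br", "hr", "img", "input", "param", "area", "col"}


class Node:
    """An element in the document structure (not a tag in the markup)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def append(self, name):
        child = Node(name, self)
        self.children.append(child)
        return child


class ImpliedTagBuilder(HTMLParser):
    """Builds an element tree, implying html, head, and body when their tags are omitted."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root
        self.html = self.head = self.body = None

    def _ensure_html(self):
        if self.html is None:
            self.html = self.root.append("html")
            self.current = self.html

    def _ensure_head(self):
        self._ensure_html()
        if self.head is None:
            self.head = self.html.append("head")
            self.current = self.head

    def _ensure_body(self):
        self._ensure_head()  # the head element is required, even if empty
        if self.body is None:
            self.body = self.html.append("body")
            self.current = self.body

    def _ensure(self, tag):
        {"html": self._ensure_html, "head": self._ensure_head, "body": self._ensure_body}[tag]()

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "head", "body"):
            # An explicit start tag only confirms an element that would be implied anyway.
            self._ensure(tag)
            return
        if self.body is None and tag in HEAD_ELEMENTS:
            self._ensure_head()  # e.g. <title> implies the html and head elements
        elif self.body is None:
            self._ensure_body()  # anything else implies body (and an empty head if needed)
        node = self.current.append(tag)
        if tag not in EMPTY_ELEMENTS:
            self.current = node

    def handle_endtag(self, tag):
        if tag in ("html", "head", "body"):
            # An end tag with no start tag still implies the element.
            self._ensure(tag)
            return
        # Close the nearest open element with this name, if there is one.
        node = self.current
        while node is not None and node.name != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent


def outline(node, depth=0):
    """Return element names as indented lines, one line per element."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.name)
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    markup = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
    <title>Minimal document</title>
    <p>Minimal content</p>"""
    builder = ImpliedTagBuilder()
    builder.feed(markup)
    builder.close()
    print("\n".join(outline(builder.root)))
    # html
    #   head
    #     title
    #   body
    #     p
```

Even though the markup contains no html, head, or body tags, the resulting tree contains all three elements.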
The markup displayed above can be examined in a file called minimal.html. The file references a script that displays which elements are in the head section and which are in the body section, so scripting must be enabled to see the result. You can validate the markup for yourself with the W3C's HTML validator to confirm that it is perfectly valid.
In XHTML, the start and end tags for the html, head, and body elements are required, as the well-formedness requirements of XML do not allow tags to be omitted from the markup. The file minimal.php is an XHTML version of the content displayed above, with a form that provides options to serve the content as application/xhtml+xml, with the html root node, and with the head and body tags, to demonstrate the effect each option has on the resulting document. The script is exactly the same for both the HTML document and the XHTML document to help illustrate the differences.
When delivered as text/html, different browsers give different results for the script. For example, even though the document is invalid XHTML, when it is served as text/html without the html, head, and body tags being explicitly specified in the markup, Firefox and IE will assume HTML and insert the missing elements as if it were a regular HTML document. Opera 8.51 is unable to execute the script, as it cannot find the head and body elements (they haven't been defined, and are not implied by the structure in XHTML), even though the document is delivered as text/html. IE doesn't understand
application/xhtml+xml, but all XML-capable browsers should report a well-formedness error when the document is served as application/xhtml+xml without an html root element: the first element in the markup becomes the document element, and everything that follows it is treated as junk after the document element. When the html root element is provided, non-validating XML-capable browsers are able to parse the document, but cannot run the script, as the head and body elements are missing from the document structure. A validating XML parser wouldn't be able to parse the content at all unless the head and body tags were explicitly in the markup and the rest of the document conformed to the DTD. At the time of writing, none of the mainstream browsers that are capable of handling XML are validating parsers.
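The XML behaviour can be reproduced without a browser. This Python sketch uses the standard library's xml.etree.ElementTree, which is built on the non-validating expat parser, so it is only an analogy for the browsers described above. The three sample strings and the describe helper are hypothetical illustrations, not the contents of minimal.php.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# No html root: the title element becomes the document element,
# so the p element that follows it is "junk".
WITHOUT_ROOT = (
    '<title xmlns="http://www.w3.org/1999/xhtml">Minimal document</title>'
    "<p>Minimal content</p>"
)

# html root present, but the head and body tags are omitted.
WITHOUT_HEAD_AND_BODY = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<title>Minimal document</title><p>Minimal content</p></html>"
)

# Every tag explicit, as XHTML requires.
EXPLICIT = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Minimal document</title></head>"
    "<body><p>Minimal content</p></body></html>"
)


def describe(markup):
    """Parse markup with a non-validating XML parser and report whether body exists."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        return "well-formedness error: %s" % err
    body = root.find("{%s}body" % XHTML_NS)
    return "parsed; body element " + ("found" if body is not None else "missing")


for name, markup in [("without root", WITHOUT_ROOT),
                     ("without head and body", WITHOUT_HEAD_AND_BODY),
                     ("explicit", EXPLICIT)]:
    print(name + ":", describe(markup))
```

The first document fails with expat's "junk after document element" error; the second parses, but no body element exists for a script to find, because an XML parser never implies elements.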
Standard Generalized Markup Language
Understanding why these tags are optional in HTML requires examining Standard Generalized Markup Language (SGML).
HTML 2.0 was formalised with SGML so that restrictions could be specified to determine the start and end of elements. To achieve this, SGML documents use a Document Type Definition (DTD) to define a set of markup declarations, including restrictions on how elements and attributes can be used. The SGML declaration includes an optional FEATURES clause, which lists SGML features that can be switched on or off using the values YES and NO respectively. Two of these optional features are set to YES in HTML: OMITTAG and SHORTTAG. It's the OMITTAG feature that allows elements to be implied by the structure in HTML.
When the OMITTAG entry in the FEATURES clause of the SGML declaration is set to YES, start tags, end tags, or both may be omitted wherever they can be unambiguously determined by the document structure. When OMITTAG is YES, each element declared in the DTD must also specify its minimalisation rules. These are written using a hyphen for a required start or end tag, and the letter "O" for an optional start or end tag. The two characters appear straight after the element name, separated by whitespace; the first applies to the start tag and the second to the end tag. For example, in HTML, the unordered list element requires both a start and an end tag. The following is the DTD fragment that defines the ul element:
<!ELEMENT UL - - (LI)+>
All elements are declared in SGML with the ELEMENT keyword, followed by the name of the element; in this case, ul for the unordered list element. The next two characters define the minimalisation rules; here both are hyphens, indicating that both the start and end tags are required. The rest of the declaration defines the content model, which in this example means that a ul element must contain at least one li element.
In HTML, a closing tag for a list item is optional, as defined by the following DTD fragment (comments from the original DTD fragment have been removed).
<!ELEMENT LI - O (%flow;)*>
In the above example, the first hyphen following the element name indicates that the start tag is required, and the "O" that follows indicates that the closing tag is not required.
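Because the rules are just two characters in a fixed position, they can be read mechanically. The following Python sketch extracts them from a simplified ELEMENT declaration such as the UL and LI fragments above. The regular expression and the omission_rules function are hypothetical; real DTDs contain comments, name groups, and parameter entities that this pattern deliberately ignores, and would need a proper SGML parser.

```python
import re

# Hypothetical, simplified pattern: <!ELEMENT name start end content-model>
# Comments (-- ... --) and name groups such as (SUB|SUP) are not handled.
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(?P<name>[A-Za-z][A-Za-z0-9.-]*)\s+"
    r"(?P<start>[-O])\s+(?P<end>[-O])\s+"
    r"(?P<content>.*?)\s*>",
    re.DOTALL,
)


def omission_rules(declaration):
    """Return the tag-omission rules and content model of one ELEMENT declaration."""
    match = ELEMENT_DECL.search(declaration)
    if match is None:
        raise ValueError("not a recognised ELEMENT declaration: %r" % declaration)
    # "-" means the tag is required; "O" means it may be omitted.
    meaning = {"-": "required", "O": "optional"}
    return {
        "element": match["name"].lower(),
        "start tag": meaning[match["start"]],
        "end tag": meaning[match["end"]],
        "content model": match["content"],
    }


for decl in ("<!ELEMENT UL - - (LI)+>", "<!ELEMENT LI - O (%flow;)*>"):
    rules = omission_rules(decl)
    print(rules["element"], "start:", rules["start tag"], "end:", rules["end tag"])
# ul start: required end: required
# li start: required end: optional
```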
The HTML 4.01 DTD defines the html, head, and body elements as follows (comments from the original DTD fragments have been removed):
<!ELEMENT HTML O O (%html.content;)>
<!ELEMENT HEAD O O (%head.content;) +(%head.misc;)>
<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL)>
As the minimalisation rules for both the start and end tags have been set to "O", both tags are optional. As well as allowing the html, head, and body tags to be left out completely, this means that you can provide end tags without the corresponding start tags for those elements and still conform to the specification.
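That last point, that an end tag may appear without its start tag only when the start tag is omissible, can be checked roughly with a few lines of Python. This sketch scans the tags using the standard library's html.parser and flags any end tag whose start tag is both missing from the markup and required. The RULES table is transcribed by hand from the DTD fragments quoted in this article; treating every unlisted element as having a required start tag is a simplifying assumption, and EndTagChecker and unmatched_end_tags are hypothetical names. It checks this one rule only, not full validity.

```python
from html.parser import HTMLParser

# (start, end) minimalisation rules transcribed from the DTD fragments above.
# Assumption: any element not listed here has a required start tag.
RULES = {
    "html": ("O", "O"),
    "head": ("O", "O"),
    "body": ("O", "O"),
    "ul": ("-", "-"),
    "li": ("-", "O"),
}


class EndTagChecker(HTMLParser):
    """Flags end tags whose start tag is absent from the markup but required by RULES."""

    def __init__(self):
        super().__init__()
        self.open_counts = {}
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.open_counts[tag] = self.open_counts.get(tag, 0) + 1

    def handle_endtag(self, tag):
        if self.open_counts.get(tag, 0) > 0:
            self.open_counts[tag] -= 1  # matches an explicit start tag
        elif RULES.get(tag, ("-", "-"))[0] == "-":
            self.errors.append("</%s> has no start tag, but one is required" % tag)
        # Otherwise the start tag was legitimately omitted, as with </body>.


def unmatched_end_tags(markup):
    checker = EndTagChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


print(unmatched_end_tags("<title>Test</title></head><p>Text</p></body></html>"))  # []
print(unmatched_end_tags("<li>Item</li></ul>"))
# ['</ul> has no start tag, but one is required']
```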
You could be forgiven for thinking that no respectable author is likely to omit important tags such as html, head, and body, but you would be wrong. Anne van Kesteren, one of the most respected and knowledgeable markup authors around, serves his content without explicitly specifying these tags, as they are optional. The tags are optional in HTML, but the elements are required; that's the important thing to note.