There's a lot of confusion about whether or not it is okay to serve XHTML documents as application/xhtml+xml. This article investigates the benefits of XHTML over HTML, and which is the most suitable MIME type. A MIME Type Test Suite has been added to help illustrate the differences between sending XHTML as text/html and as application/xhtml+xml.
Author: Gez Lemon
- Hype about MIME
- The W3C's take on MIME Types
- The IETF's take on MIME Types
- Validation and Semantics
- Well-Formed XML
- Benefits of XHTML
- Further Reading
Hype about MIME
There have been a lot of articles recently about web standards and, in particular, about using XHTML and serving it as text/html. Personally, I'm not that bothered whether individual authors serve XHTML as text/html, but I think it's important that they understand why this is wrong. What I don't agree with is people encouraging content developers to deliver XHTML as text/html.
The W3C's take on MIME Types
Molly recently made a great post about web standards on the Web Standards Project (WaSP), which she cross-posted on her personal website so that people could comment. Like most of the work Molly produces, the article is a great read for anyone interested in web standards. There are, however, a couple of points that I'm not entirely convinced about.
Firstly, Molly correctly points out that the W3C is not a standards organisation, but an organisation that provides specifications and recommendations. She goes on to say that these have been adopted as de facto standards rather than official standards, so referring to them as web standards is a misnomer. Describing W3C specifications as industry standards is slightly presumptuous, as the W3C is not an official standards organisation. De facto standards are standards that the relevant industries have adopted without a recognised standards body, such as the International Organization for Standardization (ISO), having issued them. Although the term de facto explicitly indicates that the standards come from an organisation that isn't officially a standards organisation, HTML and XHTML have nonetheless been adopted as standards, de facto or otherwise. For this reason, I don't believe that referring to de facto standards as standards is a misnomer as such.
The IETF's take on MIME Types
Referring to XHTML documents, Molly references a note about media types issued by the W3C. A few of the comments on Molly's post point out that this is a note rather than an official W3C recommendation; in other words, it doesn't represent a consensus of opinion within the W3C. More to the point, the Internet Engineering Task Force (IETF) is the body that produces the standards for how content is delivered over the Internet, through Requests for Comments (RFCs). The relevant RFCs for the XHTML MIME type issue are RFC 3023 (XML Media Types, proposed standard), RFC 3236 (the application/xhtml+xml media type, informational), and RFC 2854 (the text/html media type, informational).
Validation and Semantics
Finally, Molly states that validation isn't a requirement, but conformance is (a point I do agree with). Validation, in the context of Molly's post, measures how well a document conforms to a recognised Document Type Definition (DTD). A validator can only check that the markup elements are legal and carry the appropriate attributes according to the DTD; it cannot know whether the elements are semantically correct for the content. It stands to reason, then, that validation should not be a requirement, although it remains a useful yardstick for testing whether a document conforms to a particular specification.
The crux of Molly's post is to reaffirm that standards (de facto or not) play an important part in being a professional, and that we all have a duty to encourage rather than discourage best practice. Returning to the MIME type issue, it's important to understand what you're compromising by delivering XHTML as text/html. User agents treat XHTML delivered as text/html as tag soup. The term tag soup refers to poorly written web documents that browsers, through really quite incredible parsing, still manage to make sense of. For example, most browsers will correctly render a list of <li> elements that are not contained within a <ul> element. Serving an XHTML document with an incorrect MIME type does not turn well-formed markup into tag soup, but it does force the parser to treat the document as if it were tag soup: the document will contain constructs that the HTML parser doesn't understand, so the parser has to rely on its error-handling capabilities to process them.
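To make the idea of a forgiving parser concrete, here is a minimal sketch using the tokenizer in Python's standard library. It is not how any browser is implemented, and the names ListItemCollector and collect_list_items are hypothetical ones chosen for this illustration. It shows only that a lenient HTML parser can still recover list items when there is no enclosing <ul> and no closing tags.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collects the text of every <li>, whether or not a <ul> encloses it."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # A new <li> implicitly ends the previous one; no error is raised.
            self.items.append("")
            self._in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items[-1] += data


def collect_list_items(markup):
    """Hypothetical helper: return the text of each list item in the markup."""
    parser = ListItemCollector()
    parser.feed(markup)
    parser.close()  # flushes any text still buffered at the end of input
    return [item.strip() for item in parser.items]


# No <ul> and no closing tags: tag soup, yet the items are still recovered.
print(collect_list_items("<li>Apples<li>Pears<li>Plums"))
```

The parser never complains about the missing <ul> or the unclosed elements; it simply does the best it can with whatever it is given, which is the behaviour that XHTML served as text/html ends up relying on.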
XML parsers are required to stop processing a document at the first well-formedness error they encounter. As most XHTML documents are not delivered with the correct MIME type, many authors of XHTML documents will be oblivious to this. To illustrate it, I've provided a document that contains an unencoded ampersand. If you view the document using an XML-capable browser (such as Firefox or Opera), a parse error is displayed. Firefox makes no effort to render the document if it isn't well-formed, whereas the latest version of Opera is slightly friendlier in that it renders as much as it can, stopping at the first error it encounters.
Some browsers, such as Internet Explorer, are completely unable to handle XHTML delivered with the correct MIME type. To illustrate this, I've produced a document that is served as application/xhtml+xml regardless of the capabilities of the user agent. If you view this document with Internet Explorer, you will be prompted to download it, as Internet Explorer doesn't know what it is meant to do with the document. As Internet Explorer is still the most widely used browser at the moment, content developers delivering XHTML as application/xhtml+xml must use content negotiation to ensure that older browsers (such as IE) are still able to receive the content.
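As a rough illustration of content negotiation, the following sketch picks a MIME type from the Accept request header. The function names are hypothetical, and the policy is an assumption of this example rather than anything mandated by the RFCs: application/xhtml+xml is sent only when the user agent lists it explicitly with a non-zero quality value, and a bare wildcard such as */* is deliberately ignored, because a browser can send a wildcard without being able to render XHTML.

```python
def accepts_xhtml(accept_header):
    """Return True if the header explicitly lists application/xhtml+xml with q > 0."""
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if fields[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # a media range without a q parameter defaults to 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        return quality > 0
    return False


def choose_content_type(accept_header):
    """Pick the MIME type to send; fall back to text/html when in doubt."""
    if accepts_xhtml(accept_header or ""):
        return "application/xhtml+xml"
    return "text/html"


# A user agent that lists the XHTML media type explicitly.
print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
# A user agent that only offers a wildcard.
print(choose_content_type("*/*"))
```

A fuller implementation would also compare the quality values of the two media types, and a server that varies its response in this way should normally send a Vary: Accept response header so that caches keep the two variants apart.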
Benefits of XHTML
HTML 4.01 is the latest version of HTML, and would be far more appropriate for pretty much all of the content on the web at the moment, with some notable exceptions. I've heard many reasons why otherwise credible content developers use XHTML and deliver it with an incorrect MIME type, but I've yet to hear one that sounds credible. I think if people were honest, they would admit to buying into the XHTML rhetoric without fully understanding the consequences. That pretty much sums up how I've ended up with a set of XHTML documents. The only benefit of using XHTML for me has been that I can easily parse comments written in XHTML to ensure they're well-formed and conform to the XHTML 1.0 Strict DOCTYPE before committing them. In fairness, I've made much bigger mistakes than buying into the XHTML rhetoric. Anyone who has known me long enough to follow the progress of this site will probably recall that it was once a Flash website, which was then rewritten to include a Java navigation system (with none of the Java Accessibility features). As if that's not bad enough, and it's painful to admit this, I even carried a message that this site was best viewed in Internet Explorer. This is the first website that I have ever built and, like everyone else working in this industry, I'm on a perpetual learning curve.
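The kind of check on comments described above can be sketched with the XML parser in Python's standard library. This is not the code used on this site: the function name well_formedness_error and the wrapping <div> are assumptions made for the example, and it tests well-formedness only, not validity against the XHTML 1.0 Strict DTD.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(fragment):
    """Return None if the fragment is well-formed XML, else the parser's message."""
    # Wrap the fragment so that several sibling elements still form one document.
    document = "<div>" + fragment + "</div>"
    try:
        ET.fromstring(document)
    except ET.ParseError as error:
        # The parser gives up at the first well-formedness error it meets.
        return str(error)
    return None


for comment in (
    "<p>Fish &amp; chips</p>",       # well-formed
    "<p>Fish & chips</p>",           # unencoded ampersand
    "<p>Unclosed <em>emphasis</p>",  # mismatched tags
):
    print(repr(comment), "->", well_formedness_error(comment))
```

One caveat with this sketch: because no DTD is loaded, named entities other than the five predefined in XML (such as &nbsp;) would be reported as undefined, so a real comment checker would need to handle those separately.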
When I started out, I would have made far fewer fundamental errors if information about best practice had been more widely available. The situation is very different today, as there are many standards activists providing good advice on best practice. However, many are split over the XHTML MIME type issue, making it difficult for someone starting out today to make an informed decision about which markup language to use: XHTML or HTML. The rhetoric surrounding XHTML is that, because it must be well-formed, it has a much stricter syntax than HTML, resulting in user agents that are lighter because they don't need as much error-handling capability. Serving XHTML as text/html goes no way towards achieving this goal, meaning that the real benefits of using XHTML aren't going to be realised any time soon.
XHTML MIME Suite Tests
Jacques Distler suggested putting together some examples of valid XHTML documents to illustrate the difference between serving content as application/xhtml+xml and as text/html. It's a very good idea, so I've put together the following basic XHTML MIME Type Test Suite.
Further Reading
- Sending XHTML as text/html considered harmful
- MIME Types Matter
- It's just a note
- Pretending to Use XHTML
- HTML Compatibility Guidelines