Summary
XHTML should be delivered as application/xhtml+xml. Most modern browsers, with the exception of Internet Explorer 6, support the MIME type application/xhtml+xml. This article demonstrates how to use content negotiation to deliver application/xhtml+xml to user agents that support that MIME type, and text/html to the rest.
Author: Gez Lemon
Contents
- The Internet Engineering Task Force (IETF)
- MIME Types
- XHTML MIME Types
- MIME Types and User Agents
- Content Negotiation
- Setting the MIME Type with Code
- Setting the MIME Type with PHP
- Setting the MIME Type with ASP
- Setting the MIME Type with Perl
- Conclusion
The Internet Engineering Task Force (IETF)
The Internet Engineering Task Force (IETF) is a large, open, international community responsible for the evolution of the Internet architecture, including protocols such as TCP/IP. The standards it produces are published as Requests for Comments (RFCs), which are the result of committee work and review by interested parties. RFC 1521 and RFC 1522 specify the types and subtypes for Multipurpose Internet Mail Extensions (MIME); both have since been obsoleted by RFC 2045 through RFC 2049.
MIME Types
The original Internet e-mail protocol only supported ASCII text. MIME is an extension of the e-mail protocol that allows other types of data, such as video, images, and applications, to be exchanged over the Internet. When you access a document over the Internet, the HTML document, images, style sheets, and any objects all have an associated MIME type. Web servers insert the MIME information into the HTTP headers on each transmission. Web clients, such as browsers, use this information to determine how to handle the data. For example, a MIME type of image/gif informs the user agent to handle the data as an image.
The Internet Assigned Numbers Authority (IANA) maintains a list of registered MIME media types. MIME types are specified in two parts. The top-level media type declares the general type of media, and the subtype defines the specific format for that media. The two are separated by a forward slash: top-level/subtype. For example, image/gif has a top-level media type of image, and the specific format is GIF. There are five discrete top-level media types: text, image, audio, video, and application. There are also two composite media types: multipart and message. Experimental or unofficial MIME types are denoted with a subtype that starts with "x-". For example, application/x-shockwave-flash is an unofficial MIME type that instructs user agents that recognise it to use a Flash Player to handle the data.
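The two-part structure described above can be sketched in a few lines of Python. This is only an illustration: the function name split_mime_type is hypothetical, and treating any subtype that begins with "x-" as experimental is a simplification of the naming convention rather than a check against the IANA registry.

```python
def split_mime_type(mime_type):
    """Split a MIME type into (top-level type, subtype, is_experimental)."""
    # Drop any parameters such as "; charset=utf-8" before splitting.
    essence = mime_type.split(";", 1)[0].strip().lower()
    top_level, _, subtype = essence.partition("/")
    if not top_level or not subtype:
        raise ValueError("not a top-level/subtype pair: %r" % mime_type)
    # Assumption: a subtype starting with "x-" marks an unofficial type.
    return top_level, subtype, subtype.startswith("x-")


print(split_mime_type("image/gif"))
print(split_mime_type("application/x-shockwave-flash"))
```

The first call reports image and gif as an official pairing; the second flags the Flash type as experimental.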
Visual browsers have the ability to handle a range of standard MIME types, such as HTML (text/html), JPEG (image/jpeg), and GIF (image/gif). XHTML is a reformulation of HTML as an XML application. As such, XHTML documents must conform to the rules of XML and be well-formed. Well-formed means there can be no overlapping elements, every element must be closed, attribute values must be quoted, every attribute must be given a value, and element and attribute names are case-sensitive.
User agents that handle text/html do so in a forgiving manner. For example, every major user agent would display the following, despite the fact that it would be malformed if it were intended to be XHTML.
<input type="checkbox" name="option1" checked>
XHTML MIME Types
Two RFCs were published for handling XML documents: RFC 3023 (XML Media Types) and RFC 3236 (application/xhtml+xml). This resulted in four possibilities for specifying a MIME type for XHTML documents: application/xhtml+xml, application/xml, text/xml, and text/html. Care should be taken when serving as text/xml, as the character set rules for text/* are more complex than those for application/*, and you may get unexpected results. The MIME type application/xml is a generic media type for any XML document. As such, it is plausible to serve an XHTML document with this MIME type. However, generic XML processors may not recognise the document as an XHTML document, and may not render the content as you intended. The text/html MIME type (RFC 2854) is intended for HTML, and is not suitable for XHTML. When an XHTML document is served as text/html, the user agent will not process it as XML.
The preferred MIME type to use with XHTML documents is application/xhtml+xml. When a document is served with this MIME type, XHTML-compliant user agents must ensure that it is well-formed, that is, that it complies with the rules of XML. For example, if you serve the above code to Mozilla using application/xhtml+xml, the page will not display, because it isn't well-formed.
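As a rough illustration of the difference, the checkbox example from the MIME Types section can be fed to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module as a stand-in for an XML-aware user agent; the helper name is_well_formed and the corrected markup are assumptions made for the example, and a real browser does far more than this single check.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The forgiving text/html version: "checked" has no value and the
# element is never closed.
malformed = '<input type="checkbox" name="option1" checked>'

# A well-formed equivalent: the attribute has a quoted value and the
# element is closed.
well_formed = '<input type="checkbox" name="option1" checked="checked" />'

print(is_well_formed(malformed))
print(is_well_formed(well_formed))
```

The parser rejects the first string and accepts the second, which mirrors the all-or-nothing behaviour described above.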
MIME Types and User Agents
So that's that. All XHTML files should be served with a MIME type of application/xhtml+xml, and everyone's happy. Well, that's not quite the whole story. Unfortunately, some browsers do not understand the application/xhtml+xml MIME type. Internet Explorer 6, the most widely used browser at the time of writing, falls into this category. If you serve an XHTML document with a MIME type of application/xhtml+xml, Internet Explorer will prompt you to download the file, because it doesn't know how to handle it. That's quite a serious issue, and one that stops many developers from using the correct MIME type.
However, other browsers such as Netscape, Mozilla, and Opera do understand the MIME type, and are able to handle the document correctly. Compatibility issues usually improve over time, but with the announcement that Microsoft no longer intends to provide free stand-alone versions of Internet Explorer, this particular compatibility issue may be with us for a long time yet.
Content Negotiation
A solution to the compatibility issue is to use content negotiation to serve application/xhtml+xml to user agents that understand that MIME type, and text/html to other user agents. When a user agent requests a document from the server, it sends an Accept HTTP header listing the MIME types it supports, with an optional quality parameter indicating how well it understands each one. The server may be configured to reply with the version of the resource that is most suitable for the particular user agent. Whilst XHTML 1.0 may be served as text/html, it should be served as application/xhtml+xml to user agents that understand it.
Apache provides a document explaining how to configure content negotiation on an Apache HTTP server. Some user agents send incomplete Accept headers, making it difficult to determine which version to serve. To cater for this, it's sensible to lower the quality-of-source parameter (qs) a little for application/xhtml+xml, to make sure that text/html is the preferred MIME type when using the AddType directive with Apache.
AddType application/xhtml+xml;qs=0.8 .xhtml
Setting the MIME Type with Code
It is not sufficient to try to set the content type through the meta element in the head of the document. User agents receive the MIME type from HTTP headers set on the server. If for whatever reason you're unable to configure the server for content negotiation, you will have to resort to scripting to determine which MIME type to serve the document with. The principle is the same as above: you read the HTTP Accept header, and set the MIME type depending on the capabilities of the user agent.
The following is typical of what may be specified in the HTTP Accept header.
text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, video/x-mng, image/png, image/jpeg, image/gif;q=0.2, text/css, */*;q=0.1
The quality parameter (q) indicates how well the user agent handles the MIME type. A value of 1 indicates the MIME type is understood perfectly, and a value of 0 indicates the MIME type isn't understood at all. The image/gif MIME type has a quality parameter of 0.2 to indicate that PNG is preferred over GIF if the server is using content negotiation to deliver either a PNG or a GIF to user agents. Similarly, the text/html quality parameter has been lowered a little, to ensure that the XML MIME types are given preference if content negotiation is being used to serve an XHTML document.
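To make the quality values concrete, here is a minimal Python sketch that reads the example Accept header shown above and reports the quality assigned to a given MIME type. The function names parse_accept and quality_of are hypothetical, and the parsing is deliberately simple: it ignores parameters other than q and treats an unparsable q value as 0, which is an assumption rather than anything the HTTP specification mandates.

```python
def parse_accept(header):
    """Map each media range in an Accept header to its quality value."""
    qualities = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        quality = 1.0  # q defaults to 1 when it is omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: unparsable q means "not accepted"
        qualities[media_range] = quality
    return qualities


def quality_of(media_type, qualities):
    """Look up a type's quality, falling back to type/* and then */*."""
    top_level = media_type.split("/", 1)[0]
    for candidate in (media_type, top_level + "/*", "*/*"):
        if candidate in qualities:
            return qualities[candidate]
    return 0.0


ACCEPT = ("text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, "
          "text/plain;q=0.8, video/x-mng, image/png, image/jpeg, image/gif;q=0.2, "
          "text/css, */*;q=0.1")

qualities = parse_accept(ACCEPT)
print(quality_of("application/xhtml+xml", qualities))
print(quality_of("text/html", qualities))
print(quality_of("image/gif", qualities))
```

For this header, application/xhtml+xml comes out ahead of text/html, and image/png ahead of image/gif, which matches the preferences described above. A type that is not listed, such as audio/mpeg, falls back to the */* entry.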
Setting the MIME Type with PHP
In PHP, the MIME type is set through the header function. The $_SERVER array contains the server variables, allowing us to interrogate the Accept HTTP header.
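// Tell caches that the response depends on the Accept request header.
header("Vary: Accept");
// HTTP_ACCEPT is not set when the client sends no Accept header.
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
if (stristr($accept, "application/xhtml+xml")) {
    header("Content-Type: application/xhtml+xml; charset=utf-8");
} else {
    header("Content-Type: text/html; charset=utf-8");
}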
Setting the MIME Type with ASP
In ASP, the content type and the charset are specified separately through the Response object. The ServerVariables collection allows us to interrogate the Accept HTTP header.
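' Tell caches that the response depends on the Accept request header.
Response.AddHeader "Vary", "Accept"
' InStr returns 0 when the MIME type is not listed in the Accept header.
If InStr(Request.ServerVariables("HTTP_ACCEPT"), "application/xhtml+xml") > 0 Then
    Response.ContentType = "application/xhtml+xml"
Else
    Response.ContentType = "text/html"
End If
Response.Charset = "utf-8"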
Setting the MIME Type with Perl
In Perl, the MIME type is set by writing it out at the start of the page, before any other content. The %ENV hash contains the environment variables, allowing us to interrogate the Accept HTTP header.
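# Fall back to an empty string when the client sends no Accept header.
my $accept = defined $ENV{'HTTP_ACCEPT'} ? $ENV{'HTTP_ACCEPT'} : '';
# Tell caches that the response depends on the Accept request header.
print "Vary: Accept\n";
if ($accept =~ /application\/xhtml\+xml/)
{
    print "Content-Type: application/xhtml+xml; charset=utf-8\n\n";
}
else
{
    print "Content-Type: text/html; charset=utf-8\n\n";
}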
Conclusion
According to W3C Guidelines, XHTML 1.1 should not be served with a MIME type of text/html. "Should not" is not as serious as "must not", so for the time being, many content developers are overlooking this particular recommendation. It is still clear that XHTML 1.1 should be served with a MIME type of application/xhtml+xml. The techniques outlined above can easily be extended to serve text/html and a DOCTYPE of HTML 4.01 Strict to user agents that don't understand application/xhtml+xml, and application/xhtml+xml and a DOCTYPE of XHTML 1.0 Strict to those that do. This page is served as HTML to user agents that don't understand application/xhtml+xml, and XHTML to those that do. See Tommy Olsson's article on content negotiation for a detailed explanation of how to do this with PHP.
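As a rough sketch of that extension, the Python function below pairs each MIME type with its matching DOCTYPE. The name choose_response is hypothetical, and the function uses the same simple substring test on the Accept header as the scripts in the earlier sections, so it ignores quality values. It only selects the header value and the DOCTYPE; converting the markup itself between HTML and XHTML syntax is outside its scope, and it is not a description of how this page or Tommy Olsson's article does it.

```python
XHTML_TYPE = "application/xhtml+xml"

HTML_DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
                '"http://www.w3.org/TR/html4/strict.dtd">')
XHTML_DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                 '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')


def choose_response(accept_header):
    """Return (Content-Type header value, DOCTYPE) for an Accept header."""
    # A missing header (None) is treated as an empty string.
    accept = (accept_header or "").lower()
    if XHTML_TYPE in accept:
        return XHTML_TYPE + "; charset=utf-8", XHTML_DOCTYPE
    return "text/html; charset=utf-8", HTML_DOCTYPE


content_type, doctype = choose_response("text/html, application/xhtml+xml")
print(content_type)
print(doctype)
```

A user agent that does not list application/xhtml+xml, or that sends no Accept header at all, receives text/html with the HTML 4.01 Strict DOCTYPE.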
Category: Web Standards.
Googling around, I came across your site.
Impressive, as far as MIME history etc. is concerned. But I miss a discussion of the extension problem, as far as helper applications with hardcoded extensions are concerned. It is easy with Opera, lynx and Netscape up to Netscape 7, but I don't know of any solution with Mozilla and the like, including Netscape 8.
In Opera etc., I get the extension that is in mime.types (or similar menus), also in the browser cache. In Mozilla etc., I get this extension on the desktop, where I don't need it (save as..), but not in the browser cache, where applications see it. For example bla.php, which applications that come with hardcoded extensions would not understand.
Not a XML problem, but MIME,
anyway, best,
H.
Posted by Heiko Recktenwald on