Summary
There's a lot of confusion about whether or not it is okay to serve XHTML documents as application/xhtml+xml. This article investigates the benefits of XHTML over HTML, and which is the most suitable MIME type. A MIME Type Test Suite has been added to help illustrate the differences between sending XHTML as application/xhtml+xml and text/html.
Author: Gez Lemon
Contents
- Hype about MIME
- The W3C's take on MIME Types
- The IETF's take on MIME Types
- Validation and Semantics
- Well-Formed XML
- Benefits of XHTML
- Further Reading
Hype about MIME
There have been a lot of articles recently about web standards; in particular, using XHTML and serving it as text/html. Personally, I'm not that bothered whether people serve XHTML as text/html, but think it's important that authors understand why this is wrong. Although I'm not bothered about content developers serving XHTML as text/html, I don't agree with people encouraging content developers to deliver XHTML as text/html.
The W3C's take on MIME Types
Molly recently made a great post about web standards on the Web Standards Project (WaSP), which she cross-posted on her personal website so that people could comment. Like most of the work Molly produces, the article is a great read for anyone interested in web standards. There are, however, a couple of points that I'm not entirely convinced about.
Firstly, Molly correctly points out that the W3C is not a standards organisation, but an organisation that provides specifications and recommendations. Molly goes on to point out that these have been adopted as de facto standards rather than official standards, so referring to them as web standards is a misnomer. Referring to specifications from the W3C as being industry standards is slightly presumptuous, as the W3C is not an official standards organisation. De facto standards are standards that have been adopted by the relevant industries, but where a recognised standards body, such as the International Organization for Standardization (ISO), hasn't issued the standards. Although the term de facto explicitly indicates that the standards are provided by an organisation that isn't officially a standards organisation, in the case of HTML and XHTML, they've nonetheless been adopted as standards, de facto or otherwise. For this reason, I don't believe that referring to de facto standards as standards is a misnomer as such.
The IETF's take on MIME Types
Referring to XHTML documents, Molly references a note about media types issued by the W3C. A few of the comments in Molly's post point out that the note isn't an official W3C recommendation, but a note; in other words, there is no consensus of opinion by the W3C: it's just a note. More to the point, the Internet Engineering Task Force (IETF) are the body that produce standards for the transport layer of the Internet through Requests for Comments (RFCs). The relevant RFCs for the XHTML MIME type issues are RFC 3023 (XML Media Types, proposed standard), RFC 3236 (application/xhtml+xml media type, informational), and RFC 2854 (text/html media type, informational).
Validation and Semantics
Finally, Molly states that validation isn't a requirement, but conformance is (a point I do agree with). Validation (in the context of Molly's post) is a measure of how well a document conforms to a recognised Document Type Definition (DTD). A validator can only ensure that the markup elements are legal, and contain the appropriate attributes according to the DTD; a validator couldn't possibly know whether the elements are semantically correct for the content, so it stands to reason that validation should not be a requirement, but is a useful yardstick for testing whether a document conforms to a particular specification.
The crux of Molly's post is to reaffirm that standards (regardless of whether they're de facto standards) play an important part in being a professional, and that we all have a duty to encourage rather than discourage best practice. Returning to the MIME type issue, understanding what you're compromising by delivering XHTML as text/html is important. User agents treat XHTML delivered as text/html as tag soup. The term tag soup refers to poorly written web documents that browsers, through really quite incredible parsing, still manage to make sense of. For example, most browsers will manage to correctly render a list of <li> elements that are not contained within a <ul> element. Serving an XHTML document using an incorrect MIME type does not result in well-formed markup becoming tag soup, but it does result in the parser having to treat the document as if it is tag soup, as there will be things in the document that the HTML parser doesn't understand, and it will have to rely on its error-handling capabilities in order to parse them.
Well-Formed XML
XML parsers are required to stop processing a document on the first error they encounter. As most XHTML documents are not delivered using the correct MIME type, many authors of XHTML documents will be oblivious to this. To illustrate it, I've provided a document that contains an unencoded ampersand. If you view the document using an XML-capable browser (such as Firefox, or Opera), a parse error is displayed. With Firefox, no effort is made to parse the document if it isn't well-formed, whereas the latest version of Opera is slightly more friendly in that it will parse as much information as it can, and stop on the first error it encounters. Some browsers, such as Internet Explorer, are completely unable to handle XHTML delivered with the correct MIME type. To illustrate this, I've produced a document that serves content as application/xhtml+xml regardless of the capabilities of the user agent. If you view this document with Internet Explorer, you will be prompted to download the document, as Internet Explorer doesn't understand what it is meant to do with the document. As Internet Explorer is still the most widely used browser at this moment in time, content developers delivering XHTML as application/xhtml+xml must use content negotiation to ensure that older browsers (such as IE) are still able to receive the content.
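The difference between the two parsing models can be sketched outside a browser. The following is only an illustration, not what any browser actually runs: it assumes Python's standard xml.etree.ElementTree module as a stand-in for a conforming XML parser and html.parser as a stand-in for a forgiving tag soup parser. The sample markup and the helper names (is_well_formed, html_text) are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical samples: the first contains an unencoded ampersand.
BROKEN = "<p>Fish & chips</p>"
FIXED = "<p>Fish &amp; chips</p>"


def is_well_formed(markup):
    """Return True if the XML parser accepts the markup without error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # An XML parser must treat a well-formedness error as fatal.
        return False
    return True


class TextCollector(HTMLParser):
    """Forgiving HTML parser that simply gathers the text it finds."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_text(markup):
    """Return the text an error-tolerant HTML parser recovers."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(is_well_formed(BROKEN))  # the XML parser rejects the document
    print(is_well_formed(FIXED))   # the escaped version is accepted
    print(html_text(BROKEN))       # the HTML parser carries on regardless
```

The same bytes are either rejected outright or quietly accepted depending on which parser is chosen, and for a browser that choice is made by the MIME type.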
Benefits of XHTML
HTML 4.01 is the latest version of HTML, and would be far more appropriate for pretty much all of the content on the web at the moment, with some notable exceptions. I've heard many reasons as to why credible content developers use XHTML, and deliver it using an incorrect MIME type, but I've yet to hear a reason that sounds credible. I think if people were honest, they would admit to buying into the XHTML rhetoric without fully understanding the consequences. That pretty much sums up how I've ended up with a set of XHTML documents. The only benefit of using XHTML for me has been that I'm able to easily parse comments written in XHTML to ensure they're well-formed and conform to an XHTML 1.0 Strict DOCTYPE before committing the comment. In fairness, I've made much bigger mistakes than buying into the XHTML rhetoric. Anyone who has known me long enough to follow the progress of this site will probably recall that it was once a Flash website, that was re-written to include a Java navigation system (with none of the Java Accessibility features). As if that's not bad enough, and it's painful to admit this, I even carried a message that this site was best viewed in Internet Explorer. This is the first ever website that I have built, and like everyone else working in this industry, I'm on a perpetual learning curve.
When I started out, I would have made far fewer fundamental errors if information about best practice were more widely available. The situation is a lot different today, as there are many, many standards activists all providing good advice on best practice. However, many are split over the XHTML MIME type issue, making it difficult for someone starting out today to make an informed decision as to which markup language to use: XHTML or HTML. The rhetoric surrounding XHTML is that as it's well-formed, it has a much stricter syntax than HTML, resulting in user agents that are lighter as they aren't required to have so many error-handling capabilities. Serving XHTML as text/html goes no way towards achieving this goal, meaning that the real benefits of using XHTML aren't going to be realised any time soon.
XHTML MIME Suite Tests
Jacques Distler suggested putting together some examples of valid XHTML documents to illustrate the difference between serving content as application/xhtml+xml and text/html. It's a very good idea, so I've put together the following basic XHTML MIME Type Test Suite.
Further Reading
- Sending XHTML as text/html considered harmful
- MIME Types Matter
- It's just a note
- Pretending to Use XHTML
- HTML Compatibility Guidelines
Category: Web Standards.
[all-in-the-mime.php#comment1]
I don't understand why validation shouldn't be a requirement. If a document is conformant to the standard it will validate too. It shouldn't be the only requirement for a conformant document, but it certainly should be a requirement for a conformant document.
Posted by Anne on
[all-in-the-mime.php#comment2]
I'm not telling people not to use application/xhtml+xml, I was merely pointing out that since so few people serve their pages as such, there is no point in jumping through the many many many hoops involved in writing a DOM Flash embed script.
Posted by Geoff Stearns on
[all-in-the-mime.php#comment3]
Geoff, you are the kind of guy that should not use XHTML at all. You are still using HTML, writing javascript intended for HTML. There is exactly 0% benefit in using XHTML as your markup language.
And by the way, you are not conformant. Writing some (non-conforming) markup with javascript doesn't make you conform to the standard. The opposite, actually.
Posted by Anne on
[all-in-the-mime.php#comment4]
Anne, you're quite right - it's a matter of semantics. Conformance is a requirement, and validation is used to test conformance. If I don't validate this document, it doesn't necessarily mean it won't conform to the specifications. Validation merely confirms that to the best of a validator's ability, this document does conform to the appropriate DTD. If I want confirmation that this document is in conformance, then validation will help, but isn't a requirement.
Posted by Gez on
[all-in-the-mime.php#comment5]
It reads as though you're telling people not to use application/xhtml+xml to me. The relevant part of the article I'm referring to is:
You then go on to provide a list of well-respected websites that do exactly that. If, for whatever reason, XHTML is unsuitable for your content, then as Anne says, the obvious choice of markup language would be HTML. Encouraging others to deliver XHTML with the wrong MIME type is exactly the kind of advice that's going to ensure the advancement of XHTML never happens.
I'm sorry if this appears to be a personal attack on you; it was never meant to be. Your website just happened to be one that I visited recently that was advocating serving XHTML as text/html.
Posted by Gez on
[all-in-the-mime.php#comment6]
I think you skipped over quite a bit of relevant material here:
What are the advantages to using HTML over XHTML served as text/html? Are there any at all? You seem to be saying that it's basically the same thing since the browsers treat it as 'tag soup,' so why not use the one that is newer and will force more new developers to write better markup that is more accessible?
I would think that at least requiring people to close their tags and provide things like alt attributes on their images would be a nice step up from HTML. Sure, you can do that in HTML as well, but with XHTML you have the nice advantage of having tools that will point out omissions in your markup that would be ignored in HTML.
Right, however it's not possible at the moment to write conforming markup for the Flash plugin. So for those people who want the few advantages of XHTML and want to use Flash as well, we have to bend the rules a little bit, so we try to ruffle as few feathers as possible along the way.
Posted by Geoff Stearns on
[all-in-the-mime.php#comment7]
The obvious benefit is that you're using a specification as it is intended to be used. You've already stated that you're not prepared to use DOM compliant scripting techniques, so I fail to see exactly what benefit XHTML has brought you.
The term tag soup refers to markup that is unstructured. Valid HTML isn't tag soup, and is far more accessible than invalid XHTML.
An alt attribute is a required attribute for an image in HTML 4.01, so there isn't any benefit there. It would help in ensuring all elements contain a closing tag, but discovery of unclosed elements is hardly a good reason to abuse a standard when it's possible to close all non-empty elements in HTML anyway, and still have a conforming document.
Posted by Gez on
[all-in-the-mime.php#comment8]
Is it really abuse when the standard allows you to use text/html?
As for not being prepared to use DOM scripting, it's not that I don't like it or anything, it's just that in the specific case where you are using JavaScript to embed a Flash file [by appending an object tag to the DOM] in a page there are many many issues due to the way browsers handle the object tag differently. My post wasn't a blanket statement about using JavaScript + application/xhtml+xml at all and I think that's where the confusion was.
Posted by Geoff Stearns on
[all-in-the-mime.php#comment9]
Sorry, you're right; abuse is a bit strong. I just feel it's an important issue for people to understand, for more or less the problems you've encountered.
Posted by Gez on
[all-in-the-mime.php#comment10]
You're a courageous man!
You have seen what happens to people who have the audacity to point out that XHTML shouldn't be served as text/html. You will be an outcast among your peers; scorned and ignored.
But you are 100% right, of course.
Posted by Tommy Olsson on
[all-in-the-mime.php#comment11]
I think you'll convince 6 or 7 more people by providing an example of a perfectly valid XHTML page which, when served as text/html, renders just fine, but which bursts into flames when served as application/xhtml+xml.
Some ideas:
And this is all assuming that we start with well-formed XHTML. That's true (if we're being generous) of perhaps 1% of XHTML pages served as text/html.
Posted by Jacques Distler on
[all-in-the-mime.php#comment12]
Great ideas, Jacques. I'm at work at the moment, but will put something together when I get back home.
Posted by Gez on
[all-in-the-mime.php#comment13]
Oh, and don't forget my favourite issue: character encodings!
text/html and application/xhtml+xml have different precedence rules for determining the encoding of a document. So you can easily create a page containing valid XHTML, which will display correctly when served as text/html, but which will give you a "yellow screen of death" when served as application/xhtml+xml.
This one seems a little more artificial than the others. But you'd be surprised how frequently even people who understand these encoding issues get bitten by them.
Posted by Jacques Distler on
[all-in-the-mime.php#comment14]
Blimey.
The reason I started to use XHTML was because it allowed me to have smaller file sizes and conform to a common standard. Like Gez, I started off as a Flash developer and came very late to the standards community.
I have heard of this issue but to me the bottom line in this case is 'doesn't work in IE'. I couldn't possibly justify using a document that doesn't work at all in IE.
Possibly I've got hold of the wrong end of the stick here and I'm sure there's a few workarounds for this but I guess this definitely puts me in the 'buying into the XHTML rhetoric without fully understanding the consequences' camp Gez mentions.
I'm asking for a little help here. I recently got into a very long debate elsewhere about a possible standard of accreditation for designers including web standards as a criterion and now it seems there's a whole (and seemingly vitally important) piece of info that's totally blind-sided me. I can see the practical good of W3C's web standards and WCAG and I can see the practical good of having a law to ensure the accessibility of business sites but I can't see the practical good of serving a MIME type that excludes the most used browser on Earth.
Posted by Kev on
[all-in-the-mime.php#comment15]
I would be surprised if HTML worked as it's supposed to do in Explorer myself. As for the practical good (not much at the moment) well, I think if enough people keep the theoretical ball rolling... One day the major market-share browser vendor might decide to accept application/xhtml+xml, and then it may not just be a niche market, though we know that's not going to happen for a long time; we wait for the masses to play catch-up or convert.
Posted by Robert Wellock on
[all-in-the-mime.php#comment16]
Jacques - I've been toying with the idea of doing just that myself. If I could only find the time ...
I did another thing, though, just to disprove another "XHTML is our saviour" myth. I created an ill-formed document (but valid HTML) with an XHTML 1.1 doctype, then served it as text/html. No Yellow Screen of Death, of course, because browsers see it as badly written HTML.
Kev - I don't know how you arrive at a smaller file size when using XHTML over HTML. If anything, it should be slightly larger. Now, using a Strict doctype with all presentation in CSS, over a Transitional with presentational attributes, would most likely result in a significant reduction in file size, regardless of whether you use HTML 4.01 or XHTML 1.0. Empirical evidence shows reductions in the 40-60% range.
Posted by Tommy Olsson on
[all-in-the-mime.php#comment17]
Agreed. The best solution is to use content negotiation, and deliver XHTML to user agents that can handle application/xhtml+xml, and HTML 4.01 to the rest. Realistically, the world isn't ready for XHTML yet, as Internet Explorer cannot handle it when it's served with the correct MIME type. If content negotiation isn't an option, and you need to ensure that Internet Explorer is included, then at this moment in time, and probably for the foreseeable future, the best solution would be to use HTML 4.01 Strict.
I admire your integrity, Kev. I think it's true of most people, but I don't think many would admit it.
Posted by Gez on
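The content negotiation described in the comment above could be sketched roughly as follows. This is a simplified illustration in Python rather than code from any of the linked articles: the function names (parse_accept, choose_media_type) are hypothetical, and it assumes that a user agent is only sent application/xhtml+xml when its Accept header lists that type explicitly, so a bare wildcard such as */* falls back to text/html.

```python
def parse_accept(header):
    """Parse an Accept header into a {media_type: q} dictionary."""
    prefs = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_media_type(accept_header):
    """Pick the MIME type to serve, defaulting to text/html."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    # Assumption: wildcards are ignored, so only an explicit mention counts.
    if xhtml_q > 0.0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    print(choose_media_type("application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"))
    print(choose_media_type("*/*"))
```

A server using something like this would normally also send a Vary: Accept response header, so that caches don't hand the XHTML version to a user agent that never asked for it.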
[all-in-the-mime.php#comment18]
You're probably right, Tommy
Posted by Gez on
[all-in-the-mime.php#comment19]
Well, welcome to the club!
Posted by Tommy Olsson on
[all-in-the-mime.php#comment20]
I'll be in good company then
I don't think you've lost the respect of your peers, Tommy. Some people don't see the MIME type issue as being important, but there's a growing number of people that do get it.
Posted by Gez on
[all-in-the-mime.php#comment21]
Netscape's DevEdge used to have some of those tests available; the article was called
though now you are condemned to using "The Wayback Machine" to access the document. It was good for beginners, though.
"The society which scorns excellence in plumbing as a humble activity and tolerates shoddiness in philosophy because it is an exalted activity will have neither good plumbing nor good philosophy...neither its pipes nor its theories will hold water."
So what if you falsely think you have become a shambling leper, Tommy? That doesn't mean you are a lesser person. You're being too hard on yourself, and anyway, I've been a lifelong member; it does have its perks.
Posted by Robert Wellock on
[all-in-the-mime.php#comment22]
I agree, Gez - you most certainly have not lost the respect of your peers, Tommy... there are a growing number of people that do get it, recognize it as important. Having said that, I suspect a lot of people aren't prepared to actually do anything about it at this point.
I worked at implementing content negotiation over at WATS.ca and was met with a number of yellow screens of death. I didn't have time to get through all the pages and deal with it appropriately, and we have one resource on there that still has me puzzled. So, it stays as text/html for now because the reality is that it may; if it was a must, then I'd make the time to do it sooner rather than later. That doesn't mean I won't be addressing it -- just that I am comfortable putting it on the back burner.
As for why I even use XHTML, well... it boils down to this: When XHTML became an official recommendation in January of 2000, I waited for a while then converted everything over to XHTML 1.0. Back in those days, all the first version of the XHTML spec said about media type was this:
Everything I've built since about August of 2000 has been XHTML. And honestly, until about a year ago, I don't think I ever really read that part of the spec again to see the updates where they provided more detail on MIME types. In short, I know I need to change my evil ways and get on with content negotiation, but it isn't a game breaker for me at this point when taken into consideration with all of the other stuff that is happening...
Posted by Derek Featherstone on
[all-in-the-mime.php#comment23]
Your third test, the "Styling the body element" test, shows two identical pages in Opera.
The "JavaScript enclosed in SGML Comments" and "Hybrid Comment Test with JavaScript" tests - both versions worked fine in Opera, the functions working perfectly.
Nice test suite though. It's very useful to see things like this in action!
Posted by Dave Child on
[all-in-the-mime.php#comment24]
Derek, without the correct MIME type, you are not using XHTML and your document MUST NOT be treated as XHTML. You are aware of that, right?
Posted by Anne on
[all-in-the-mime.php#comment25]
Yes, I'm aware of it, and I'm ok with it. The MUST that I was referring to was the fact that I MAY serve it as text/html. If the spec told me that I MUST serve it as application/xhtml+xml then I would take the time to do it pretty much right away... It is a compromise I'm willing to make until I can get a chance to fix things up.
Posted by Derek Featherstone on
[all-in-the-mime.php#comment26]
regarding the previous comment: that is obviously a problem with Opera's handling of true application/xhtml+xml
yes, nice tests. remind me of something i remember seeing back on the netscape devedge site (before AOL ungraciously pulled it).
Posted by patrick h. lauke on
[all-in-the-mime.php#comment27]
Derek - it doesn't matter that "a lot of people aren't prepared to actually do anything about it at this point." The important thing is that they are aware that their documents are parsed and treated as HTML, and that they shouldn't keep using old HTML-specific tricks with their new code. If they are aware of that, they can simply change the media type one day and things will keep on working. If not, they're in for a nasty surprise.
If you get the Yellow Screen of Death when serving the proper media type, you really shouldn't be using an XHTML doctype. If you cannot make a document well-formed, you should at least stick to HTML where the error handling is somewhat standardised. I really think that you could fix the error, though, because I know that you're a very capable person.
I just posted a write-up on content negotiation (in PHP) on my blog, since quite a few people have been requesting that for some time. I don't mean to steal any attention from Gez's excellent article; I just think that it could be useful information for those who aren't yet aware of all the issues. The URL is http://www.autisticcuckoo.net/archive.php?id=2004/11/03/content-negotiation (I hope you don't mind, Gez, or feel free to delete it).
Posted by Tommy Olsson on
[all-in-the-mime.php#comment28]
Of course I don't mind. There's already an article on here for specifying MIME types, which shows how to do it with ASP, PHP, and Perl. I've added a link to your article at the end, as yours deals more specifically with content negotiation, and is really well done.
Posted by Gez on
[all-in-the-mime.php#comment29]
I discovered the same thing with Opera while writing the tests, Dave, and came to the same conclusion as Patrick.
If you can remember any of the tests they did, let me know and I'll add them to the tests here.
Posted by Gez on
[all-in-the-mime.php#comment30]
When I say that they aren't prepared to actually do anything about it at this point, what I'm trying to say is that the fact is that there are a group of people that haven't the time to fix something that is already working (albeit working as HTML, not XHTML). It is simply the fact that we still know that we MAY serve it as text/html that allows us to sleep at night, even if we know it is not 100% best practice.
Agreed... and for those of us that didn't ever use those old HTML specific tricks in the first place... we're much better off ;)
I'd like to think that I could fix the error as well (thanks for concurring!), but alas, I have been unable to find a solution. Patrick and I actually tried to figure out a solution to this one before, but we couldn't come up with anything. But maybe yourself, Gez, Anne, or Jacques (or anyone else for that matter) can help me with this one.
The only reason I'm not using content-negotiation (PHP-based, by the way) on WATS.ca is because of this Testing Tools Resource page. The JavaScript based bookmarklets wreak havoc when served as application/xhtml+xml. Take a peek at the source and it becomes pretty obvious why it results in the yellow screen of death.
I was trying to come up with a means to escape characters within the bookmarklet, but I couldn't come up with a solution at all. Honestly, I haven't looked at it in a few months -- worst case scenario, I suppose I can pull those bookmarklets out of that page and post them on a static page written in HTML 4.01, and remove it from the CMS. I was hoping that I wouldn't have to do that, though, but it may be my only choice...
Posted by Derek Featherstone on
[all-in-the-mime.php#comment31]
Derek - well, I did a little test and it wasn't very hard to make that page well-formed and valid. Took me about 5 minutes. Look at http://autisticcuckoo.net/temp/44.php and you'll see for yourself. I might have missed something vital, though, because this was a no-brainer. I added a header so that this page is always served as application/xhtml+xml (don't try it in IE, in other words). I don't get the Yellow Screen of Death in Mozilla, and it validates with the W3C validator.
On the other hand, a page like this with so much inline JavaScript really ought to be HTML, in my humble opinion. Besides, the favelets generate HTML (sometimes invalid HTML too; there are some unquoted attributes with non-alphanumeric characters in them).
Gez - Thanks.
Posted by Tommy Olsson on
[all-in-the-mime.php#comment32]
Egads. I'll have to test them out for functionality. I was sure that when I was working at this before, I tried escaping the < and > but there was something about it that didn't work. Or perhaps I was imagining things. Perhaps you should take back what you said about me being a capable person!!
Exit stage left, egg on face, tail between legs, questioning sanity...
Posted by Derek Featherstone on
[all-in-the-mime.php#comment33]
I had a question on Content Negotiation and Javascript. Is there a way to make JS optimised for XHTML not blow up if you run a PHP to turn a page into HTML??
Posted by Manny Fleurmond on
[all-in-the-mime.php#comment34]
That depends on how you write the script and in most cases you should use external JavaScript files with XHTML if at all possible.
Posted by Robert Wellock on
[all-in-the-mime.php#comment35]
As Robert said, it might depend on the script. Make sure that all tag-related DOM calls work with both uppercase and lowercase element names, and most of it will be fine. It's a lot easier to mess up the other way around, i.e. write JavaScript that works in HTML but not in XHTML.
It depends on how you have "optimised" the script for XHTML.
Posted by Tommy Olsson on
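The point about element-name case can be shown with any XML DOM, not only the one a browser exposes to JavaScript. The following is a rough analogy in Python, assuming the standard xml.dom.minidom module as the XML DOM; the helper elements_by_name and the sample markup are made up for the illustration. An XML DOM matches element names case-sensitively, so a lookup written with the uppercase names that HTML scripts often use finds nothing unless the comparison is normalised.

```python
from xml.dom import minidom

# Hypothetical sample document, parsed as XML (so names are case-sensitive).
doc = minidom.parseString("<div><p>one</p><p>two</p></div>")


def elements_by_name(document, name):
    """Return elements whose tag name matches, ignoring case."""
    wanted = name.lower()
    return [
        node
        for node in document.getElementsByTagName("*")
        if node.tagName.lower() == wanted
    ]


if __name__ == "__main__":
    print(len(doc.getElementsByTagName("P")))  # uppercase lookup on an XML DOM
    print(len(doc.getElementsByTagName("p")))  # lowercase lookup
    print(len(elements_by_name(doc, "P")))     # case-insensitive helper
```

Normalising the case before comparing is one way of writing a script that keeps working whichever MIME type the page is eventually served with.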
[all-in-the-mime.php#comment36]
Just a short comment - shouldn't "HTML 4.01 is the latest version of HTML" read, 'HTML 4.01 is the final version of HTML'? W3C have said that it won't be changed other than sorting bugs.
This offers developers a rock to develop on - in a sea of change.
All the best,
Jim
Posted by Jim Byrne on
[all-in-the-mime.php#comment37]
You're quite right, Jim. HTML 4.01 is the final version of HTML from the W3C, but the Web Hypertext Application Technology (WHAT) Working Group are doing some good work in progressing HTML, such as Web Forms 2.0 (first stable draft). Whether we end up with a completely new specification for HTML remains to be seen, but I believe that HTML has a future.
Posted by Gez on
[all-in-the-mime.php#comment38]
There is quite a large chance, when all copyright issues are resolved, that the WHATWG will develop HTML 5.0 and that once that specification has interoperable implementations it will be given to the W3C to become a formal specification.
Posted by Anne on
[all-in-the-mime.php#comment39]
Not exactly sure what you are testing. I thought the test suite is for testing "valid" XHTML served with different MIME types. In test 1 you are serving invalid XHTML. In test 2 you are not comparing MIME types. In test 9 you are using invalid CSS for an XHTML document. Anyone still using "document.write" (test 8) should be banned from using the Web.
Posted by Phil on
[all-in-the-mime.php#comment40]
Sorry, I should have made it clearer. The suggestion was to create a set of tests that worked with valid XHTML, but I've also included invalid examples so that people could see what sort of effect it had. The reason for writing the tests is that there are still a lot of people using XHTML as if it was HTML.
Test 1 is meant to show what happens when a document is not well-formed; test 2 is meant to show that not all browsers are able to handle application/xhtml+xml; test 9 is meant to show that CSS selectors are case sensitive when delivered as application/xhtml+xml; and test 8 is for those people that refuse to accept document.write is a problem at all, but are in no way advocating the use of serving XHTML as text/html.
Posted by Gez on
[all-in-the-mime.php#comment41]
This is a great idea for demonstrating the core differences (ie. what will break) between MIME types and will make a handy reference for people learning - nicely done.
The innerHTML property is also affected under application/xhtml+xml and is read-only (I don't think this had already been mentioned.)
I have read that innerHTML is not officially a W3C standard anyways, but I would imagine a lot of people would be wondering why it breaks under this MIME type due to its popularity (one string instead of multiple DOM calls, performance as in http://www.quirksmode.org/dom/innerhtml.html,) and resulting widespread use.
Posted by Scott Schiller on
[all-in-the-mime.php#comment42]
Hi Scott,
A test for innerHTML is a good idea, even though it isn't part of the DOM. It's a widely used JavaScript property, so worth having.
Thank you for the suggestion.
Posted by Gez on
[all-in-the-mime.php#comment43]
Since you're branching out a bit to look at general differences between XHTML served as text/html and as application/xhtml+xml, you could, for instance, include some examples which work as the latter, but break when sent as text/html. The most obvious one would be including MathML content.
Also, you should include some character-encoding tests.
Here's one test. Compose an otherwise valid XHTML page in iso-8859-1 (using characters beyond 7-bit ASCII, of course). Iso-8859-1 is the default encoding for text/* documents, so it will render correctly when sent as text/html. You can even include a <meta> declaration. If sent to the validator as text/html, this page will validate perfectly well.
What happens to this page when served as application/xhtml+xml?
The default encoding for application/*+xml is utf-8. The <meta> declaration is ignored. The validator will barf and refuse to even attempt to parse the page. Mozilla will display a page of gibberish, which is more than it should. Technically, it should display a yellow screen of death (since the page is not well-formed XHTML).
Posted by Jacques Distler on
[all-in-the-mime.php#comment44]
Thanks, Jacques; I'll add them when I get back from work this evening.
Posted by Gez on
[all-in-the-mime.php#comment45]
At the risk of redundancy, let me remind you of what the encoding tests should cover.
The precedence rules for determining the encoding of an application/xhtml+xml document are
The precedence rules for determining the encoding of a text/html document are
So there are lots of possibilities for a mismatch.
Posted by Jacques Distler on
[all-in-the-mime.php#comment46]
I'll provide a form so that people can choose the encoding, and where they want to declare it, along with the MIME type. I'll have to do it tomorrow now, as I've just got back from work and need to eat.
Posted by Gez on
[all-in-the-mime.php#comment47]
I hope it's not too late to join the discussion.
I've been reading around several such discussions, among which is the one on Tommy's site.
I agree that XHTML should be served with the appropriate recommended MIME type, even more so because, using XHTML 1.1 myself, I "SHOULD NOT" (red flag) use text/html but rather "SHOULD" (greenest flag) use application/xhtml+xml.
Then what? Well, as far as I've understood so far, the only way for me to do it is to use some server-side scripting. I don't have access to the server; the site I'm working on will be a "homepage" hosted on a shared server (if you wonder why the heck I bother with XHTML 1.1 on a homepage, suffice to say that it's my absolute right, and I do have a good reason too).
The question is how come this isn't automatic? I never had to worry about mime or content type when I used HTML 4. Was it done automatically, are the servers set to text/html by default? Or did everybody send mystery tag soup, relying on the browser's ability to parse it anyway?
This may sound like a noob question (and I won't be offended) but I think it's a good question, that may explain the reluctance of many designers to serve application/xhtml+xml.
Bottom line is nobody wants their page to be inaccessible to IE users and we should be able to serve them tag soup or whatever works for them. I mean we have all implemented some sort of "box model hack" or other CSS contortion to contend with IE, so I think we will go the extra hundred miles, until we can forget about IE.
On a side note, according to my rudimentary tests it seems that IE (from IE 5.0 to IE 6 SP2) doesn't really break (i.e. ask you to download an unrecognized MIME type) when served an application/xhtml+xml content-type until you add the XML prolog. It seems the XML prolog is really what makes IE "soil itself", as a guru once put it.
(More detail on what I'm talking about is in the comments on Tommy's article: http://www.autisticcuckoo.net/archive.php?id=2004/11/03/content-negotiation)
BTW I'm using Juicy Studio's PHP script to serve the appropriate content-type. Thanks for that and all the discussion going on here.
Very helpful!
Posted by ghola on
[all-in-the-mime.php#comment48]
Not at all; thank you for joining in.
It usually is automatic. Servers are configured so that they insert the appropriate MIME information into the HTTP headers on each transmission, which helps the browser understand what to do with the retrieved item. The problem is that by default, a server will be configured to send a MIME type of text/html for documents with a .html extension, as that is the correct MIME type for HTML.
IE asks to download the document regardless of whether or not the XML declaration is included.
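For readers without access to the server configuration, the usual workaround is a script that inspects the request's Accept header and picks the MIME type per client. The following is a minimal sketch of that idea only; it is not the Juicy Studio PHP script mentioned earlier in the thread, the function name choose_content_type is hypothetical, and it deliberately ignores finer points such as comparing q-values between competing types.

```python
def choose_content_type(accept_header: str) -> str:
    """Return application/xhtml+xml only if the client explicitly lists it
    with a non-zero q-value; otherwise fall back to text/html."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # a media range without a q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type == "application/xhtml+xml" and q > 0:
            return "application/xhtml+xml"
    # Wildcards such as */* are not taken as support for XHTML.
    return "text/html"


# Example Accept headers (illustrative values, not captured traffic).
xml_capable = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"
wildcard_only = "image/gif, image/jpeg, */*"
```

Calling choose_content_type(xml_capable) selects application/xhtml+xml, while choose_content_type(wildcard_only) falls back to text/html. Treating a bare */* as "no XHTML support" is a design choice: a client that only sends a wildcard has not said it can handle the XML MIME type, so the safe fallback avoids a download prompt.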
Posted by Gez on
[all-in-the-mime.php#comment49]
Gez,
I see the light now!
I must have screwed up in my test somewhere... I'll spare you the details.
So I can confirm now out of my own experience that IE will cry mummy if served application/xhtml+xml.
I must say to everyone who reads this article that the end of my previous comment was based on erroneous tests, and therefore you should not pay attention to it!
Your article was very helpful, as was Tommy's on the same subject. I really believe it is important that web designers and authors understand what they are doing when they send XHTML as text/html. It is all the more important with XHTML 1.1, where it is clearly WRONG to do so.
We need more articles like this one with practical examples.
Thank you so much for your help.
Posted by ghola on
[all-in-the-mime.php#comment50]
I am writing a list of differences for the Mozilla Web Author FAQ. The draft is attached to bug 271261.
Posted by Henri Sivonen on
[all-in-the-mime.php#comment51]
Thanks for posting that, Henri; the definition of an XML parser (expat) is worth a read by itself. I was particularly interested in the following:
It would be interesting to hear your take on HTML as text/html over XHTML as text/html, which is where the idea for these tests came from.
Posted by Gez on
[all-in-the-mime.php#comment52]
I think XHTML as text/html is a case of the emperor's new clothes. It is treated as tag soup by browsers and you still need a special serializer if you are producing it with XML tools. What's the point?
Posted by Henri Sivonen on
[all-in-the-mime.php#comment53]
I couldn't agree more.
Posted by Gez on
[all-in-the-mime.php#comment54]
I'd just like to say I think your article was the best one I have read on the subject of MIME types. (I have read a lot of them and it's giving me a headache!) I am new to this, so your examples really helped, and I have found what I was looking for here. Great job!
Posted by STEVE on
[all-in-the-mime.php#comment55]
I noticed an interesting behaviour (in Firefox) in the "MIME Type Test Suite - Character Encoding Test".
When setting the character encoding to utf-8 and mime type to "application/xhtml+xml", only checking off "In the HTTP Headers" will actually cause the page to be recognized as utf-8. Setting the character encoding in the XML prolog does not yield the desired effect as discussed above (http://juicystudio.com/article/all-in-the-mime.php#comment45).
Perhaps someone could also point me towards an article or reason why setting the character encoding to utf-8 makes the British currency symbol become a question mark.
Thanks!
Posted by John on
[all-in-the-mime.php#comment56]
What are the advantages to using HTML over XHTML served as text/html? Are there any at all? You seem to be saying that it's basically the same thing since the browsers treat it as 'tag soup,' so why not use the one that is newer and will force more new developers to write better markup that is more accessible?
Please tell me about this...
Posted by Damage on
[all-in-the-mime.php#comment57]
Sorry... I also want to ask: how does the W3C work?
Posted by Damage on
[all-in-the-mime.php#comment58]
The advantage of serving HTML over XHTML as text/html is that browsers understand HTML correctly, as do search engines and other automated programs. IE does not understand XHTML natively, so there is no advantage to using it whatsoever. Writing XHTML does not force developers to write better markup. Most XHTML on the web is invalid, and if it were served using the correct MIME type it wouldn't be displayed by XML-capable browsers.
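The point about XML-capable browsers can be illustrated with any strict XML parser. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for a browser's XML parser; the helper name is_well_formed and the two sample fragments are hypothetical. Note that it checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XHTML-style markup: the empty element is explicitly closed.
xhtml_style = "<p>Line one<br />Line two</p>"

# HTML-style markup: fine for a tag-soup parser, fatal for an XML parser
# because <br> is never closed.
html_style = "<p>Line one<br>Line two</p>"
```

Here is_well_formed(xhtml_style) is True and is_well_formed(html_style) is False. A tag-soup parser would render both fragments the same way, which is why such errors go unnoticed while the page is served as text/html.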
Posted by Dave Sandford on