XHTML or HTML5? Summary of XHTML’s Current State
A look at the various problems and benefits of XHTML. The article makes a case for using XHTML rather than HTML4, which some developers still advocate. It also discusses the W3C’s recommendations and how realistic they are to implement at this point.
W3 Says XHTML.
That’s a pretty half-baked reason to choose one over the other. But if everyone always followed the W3’s recommendations, the web would be a simpler place. Of course, since Internet Explorer doesn’t support XHTML, it’s a pretty irrelevant argument, all things considered. But a lot of web developers are getting sick of writing HTML4 (with a million ID selectors for IE’s poor CSS implementation!). So, there’s always the moral responsibility aspect.
W3 currently recommends use of the XHTML 1.0 Strict DTD in the following general template:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>An XHTML 1.0 Strict standard template</title>
<meta http-equiv="content-type"
content="text/html;charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
</head>
<body>
</body>
</html>
XHTML is more human interoperable.
Well, it is! Tags must be properly nested and closed, and written in lowercase. That makes for a more aesthetically pleasing, more readable document. It also makes it a lot easier to validate. XHTML is a better standard than HTML because it doesn’t permit as much nonsense.
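As a rough illustration of the "properly nested and closed" rule, here is a minimal Python sketch that checks whether a fragment is well-formed XML. The helper name is_well_formed and the two sample strings are made up for this example. It only tests well-formedness; it does not validate against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML (tags nested and closed)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample fragments, not taken from any real page.
good = "<p>Properly <em>nested</em> and closed.</p>"
bad = "<p>Improperly <em>nested</p></em>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

An HTML parser would silently repair the second fragment, which is exactly the kind of nonsense XHTML refuses to accept.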
X is a cool letter to start an acronym.
This is the main reason I and many others came to be involved with XHTML. The acronym simply looks cooler than HTML does. Furthermore, words that look cool spread as trends like wildfire. So XHTML is the latest and hippest, and everyone wants to be cool and hip - even if they don’t know what they’re doing.
It actually has new features!
But 90% of the people with XHTML declarations at the heads of their pages don’t use these features. A lot of people think, for this reason, that you should just stick with HTML4 for the average webpage. Under certain circumstances, using XHTML can even be harmful.
Validation vs. Interoperability
It is good practice to ensure that all web documents validate against the chosen DTD before being published. It should be noted, however, that a valid, well-formed XHTML document is not guaranteed to be interoperable across all user agents. (A "user agent" is what W3 calls a browser. I’ve been reading too many technical documents and the lingo is part of my vocabulary now.) Many web developers mistakenly assume that a validated document is automatically interoperable. The key issue is that different user agents have varying degrees of support for different document types.
application/xhtml+xml is the MIME type for XHTML documents. However, this MIME type is currently only supported by the Presto and Gecko rendering engines (at the heart of Opera and Firefox, respectively). XHTML files sent with the text/html MIME type are always rendered as HTML documents by all user agents. Chris Wilson of the MSIE development project has reported that IE7, like its predecessors, will not render XHTML documents served with the native application/xhtml+xml type. So this is mainly a Firefox technology right now.
The W3 has provided a list of guidelines for the HTML compatibility of XHTML documents. A valid XHTML document that fails to abide by these guidelines will likely suffer from inconsistent rendering across user agents.
Since XHTML must be rendered as HTML for all practical purposes at this point, the language is still susceptible to all of the element-specific user agent interoperability issues of HTML.
XML vs. XHTML
Keep in mind that the XHTML definitions are specifically drawn from those in HTML for backwards compatibility. For example:
<?xml version="1.0"?><myattribute bool="true">This is a valid XML document</myattribute>
The above is a well-formed XML document and, with the appropriate namespace declared, its element could even appear inside an XHTML document served as XML. But it would not validate against the XHTML DTD, because it uses elements and attributes that the DTD does not define. So how extensible is XHTML at this point, anyway?
A key feature of an XHTML 1.0 Strict document is that the html tag must use one specific namespace, http://www.w3.org/1999/xhtml. It is possible to put a different namespace URI, or several, on this element, but the result would not be a strictly conforming XHTML document. Namespaces may, however, be declared on other elements. Indeed, the few who make use of XHTML at all often declare namespaces for MathML or SVG.
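To make the namespace idea concrete, here is a small Python sketch that parses a document whose html element uses the XHTML namespace and whose math element declares the MathML namespace, then lists the namespaces actually in use. The sample document and the helper namespaces_used are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample document: XHTML on the root, MathML on one element.
doc = f"""<html xmlns="{XHTML_NS}">
  <head><title>Namespaces</title></head>
  <body>
    <p>Inline math:</p>
    <math xmlns="{MATHML_NS}"><mi>x</mi></math>
  </body>
</html>"""


def namespaces_used(markup):
    """Collect the namespace URI of every element in the document."""
    found = set()
    for element in ET.fromstring(markup).iter():
        # ElementTree writes qualified names as "{namespace-uri}localname".
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


for uri in sorted(namespaces_used(doc)):
    print(uri)
```

An XML parser keeps the two vocabularies apart by URI, whereas an HTML parser fed the same bytes as text/html would treat math and mi as unknown tags.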
XHTML isn’t always Extensible HTML?
Only under very unusual circumstances can XHTML documents be considered extended HTML documents. The requirements are that:
- An auxiliary namespace has been defined and is referenced by some element with the xmlns attribute.
- The document specifies and validates against an XHTML DTD (easy!).
- The document is served as application/xhtml+xml and rendered as such by the user agent.
Any other combination and you get tag soup that may or may not parse, let alone render the custom elements.
It should be noted that even if the above conditions have been met, the document will not validate against the XHTML DTD, as this DTD contains no mention of custom elements. These would be described in the DTDs associated with the other namespaces. As a result, there is no easy way to validate extended HTML documents.
More on MIME settings.
Just changing the meta content type tag in your document will not work. Why? Well, Appendix F of the XML 1.0 Recommendation states:
When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML.
This means, among other things, that the default MIME settings associated with a given file extension, as specified by the web server, will take precedence over those declared in an XML family document. Specifically:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
The above is a functionally useless directive for most content providers. The MIME type and charset of the document are decided before this line is even parsed. The decision is based on the server’s directives. It may remain useful as a method of providing information about the webpage, but only when it is accurate.
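The precedence rule can be modelled with a short Python sketch. It is a simplification under stated assumptions: the function names are invented here, and real user agents apply additional sniffing rules that are not modelled. It only shows that a Content-Type sent by the server is consulted first, and the meta value is a fallback at best.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type value into (media type, charset)."""
    message = Message()
    message["Content-Type"] = value
    return message.get_content_type(), message.get_content_charset()


def effective_content_type(http_header, meta_content):
    """Hypothetical model: the HTTP header wins whenever the server sent one."""
    return parse_content_type(http_header if http_header else meta_content)


# The server says text/html in ISO-8859-1; the meta element disagrees and loses.
print(effective_content_type(
    "text/html; charset=ISO-8859-1",
    "application/xhtml+xml;charset=utf-8",
))
```

In this model the meta value only matters when no header is available at all, for example when the file is opened from a local disk.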
It may be worth noting that modern Linux servers are set up out of the box to send .xhtml files as application/xhtml+xml, .xml files as text/xml and .php files as text/html.
How do we solve this?
You can use content negotiation to send HTML documents to Internet Explorer users, and XHTML to Firefox and other users. This can be accomplished with .htaccess files, PHP scripts, and presumably in several other scripting languages as well. I won’t outline these techniques here; a simple search for “application/xhtml PHP negotiation” on Google should find you more than enough to get started. Keystone Websites, specifically, offers a well-thought-out solution in PHP.
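For orientation only, the core decision behind such negotiation can be sketched in a few lines. This is written in Python rather than the PHP or .htaccess approaches mentioned above, and it is not the Keystone Websites solution. The function names and the two sample Accept headers are assumptions for illustration. The idea is to serve application/xhtml+xml only when the client lists it explicitly with a q-value at least as high as text/html.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def parse_accept(accept_header):
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    preferences = {}
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q-value as "not acceptable"
        preferences[media_type] = q
    return preferences


def choose_media_type(accept_header):
    """Return the XHTML type only for clients that explicitly ask for it."""
    preferences = parse_accept(accept_header or "")
    xhtml_q = preferences.get(XHTML, 0.0)
    html_q = preferences.get(HTML, 0.0)
    if xhtml_q > 0 and xhtml_q >= html_q:
        return XHTML
    return HTML  # a bare */* (as IE typically sends) falls through to HTML


# Illustrative headers in the style sent by Gecko and by Internet Explorer.
gecko = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"
msie = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*"
print(choose_media_type(gecko))  # application/xhtml+xml
print(choose_media_type(msie))   # text/html
```

A server that varies its response this way should also send a Vary: Accept header so that caches do not hand the XHTML version to a client that cannot render it.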
However, if you’re actually making use of XHTML and include something like MathML on your pages, just changing how the document is sent isn’t really going to cut it. Your IE users won’t be able to take advantage of the XHTML stuff, anyway. So you can script out these portions of your document, and notify users of the issue. Suffice it to say, if you need to use MathML or SVG, that doesn’t change the fact that IE won’t support it. So it’s really something of a tragedy.
Watch out for the XML Document Declaration, too!
The XHTML DTD specifies an XML application using objects known to the HTML DOM. The current XML Recommendation says that XML documents should be declared with the following directive, placed at the beginning of the document:
<?xml version="1.0" encoding="utf-8"?>
Including this directive is harmful for current web development, however, as it causes Internet Explorer 6 to render the page in Quirks Mode. This will be "fixed" in IE7; that is, IE7 will simply ignore the xml directive. But, hell, if your document is going to render in Quirks Mode, why bother reading about DTDs in the first place?
Some consider the xml directive a formality as it is not required by the XML 1.0 Specification. Naturally, it is not required by the XHTML specifications either. Practically speaking, it is only useful for specifying a character set. So contrary to the XHTML template W3 provides, it’s probably a better idea to leave the XML directive out.
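If existing documents already carry the declaration, one option is to strip it before the page is sent out as text/html. The following Python sketch assumes the declaration sits at the very start of the document; the helper name strip_xml_declaration is invented for this example. Once the declaration is gone, the character set has to come from the server’s Content-Type header instead.

```python
import re

# Matches an XML declaration only at the very start of the document.
# "<?xml-stylesheet ...?>" is not matched, since whitespace must follow "xml".
XML_DECLARATION = re.compile(r"\A<\?xml\s[^?]*\?>\s*")


def strip_xml_declaration(markup):
    """Remove a leading XML declaration so the DOCTYPE comes first."""
    return XML_DECLARATION.sub("", markup, count=1)


page = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"></html>\n'
)

# The first line of the result is now the DOCTYPE.
print(strip_xml_declaration(page).splitlines()[0])
```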
So what’s your point?
HTML is too relaxed a spec. We’re all tired of it anyway. You should use XHTML in your documents because it’s the latest standard, offers more flexibility, and the code looks better. But don’t be a sucker! There are a lot of things to watch out for, and if you are going to send XHTML documents you need to address them all. Happy coding!
References
- W3C Quality Assurance. (2005, November 28). Recommended DTDs to use in your Web document. Accessed June 8th, 2006.
- W3C Technical Reports and Publications. (2002, August 1). XHTML 1.0 The Extensible HyperText Markup Language (Second Edition). Accessed June 8th, 2006.
- W3C Technical Reports and Publications. (2004, February 4). Extensible Markup Language (XML) 1.0 (Third Edition). Accessed June 8th, 2006.
- W3C Internationalization Activity. (2006, January 1). Serving XHTML 1.0. Accessed June 8th, 2006.
- Wikipedia. (2006, June 5). Comparison of layout engines (XHTML). Accessed June 8th, 2006.
- The Microsoft Internet Explorer Weblog. (2005, September 15). The <?xml> prolog, strict mode, and XHTML in IE. Accessed June 8th, 2006.
- Mark Pilgrim, XML.com. (2004, July 21). XML on the Web Has Failed. Accessed June 8th, 2006.
- W3C Technical Reports and Publications: Note. (2002, August 1). XHTML Media Types.
- Wikipedia. (2006, June 4). Comparison of layout engines (HTML). Accessed June 8th, 2006.
- W3C. (1999, January 14). Namespaces in XML. Accessed June 8th, 2006.
- Ian Hickson. (2004). Sending XHTML as text/html Considered Harmful. Accessed June 10th, 2006.
- Keystone Websites. (Unknown). Serving up XHTML with the correct MIME type. Accessed June 10th, 2006.