XHTML Content Negotiation


Introduction

To serve XHTML properly, you should send it with a MIME type (also known as “content type” or “media type”) of “application/xhtml+xml”. This isn’t particularly difficult to arrange, but it does present a problem: not all user agents support XHTML served in this way (notably, Internet Explorer ≤ version 7). Serving XHTML with the correct MIME type to every user agent is therefore not practical, as it would break your site for a large percentage of potential users.

The solution is to send the MIME type “application/xhtml+xml” to user agents which support it, and fall back to “text/html” for those which don’t. This is done by checking the HTTP Accept header, which user agents send to specify which content types they prefer. If the user agent explicitly states that it can handle “application/xhtml+xml” (modern Mozilla browsers do this, but Konqueror doesn’t), then the content is served with that MIME type. If not, the server falls back to serving pages as “text/html”.
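The decision described above can be sketched in a few lines. This is only an illustration, written in Python rather than in one of the languages used later on this page, and the function name "choose_mime_type" is made up for the example. It assumes the raw Accept header value is passed in as a string (or None if the user agent sent no such header), and it uses the same plain substring test as the scripts below, so quality values such as "q=0" are not interpreted.

```python
XHTML_MIME = "application/xhtml+xml"
HTML_MIME = "text/html"


def choose_mime_type(accept_header):
    """Return the MIME type to serve for a given Accept header value."""
    # Treat a missing header (None) the same as an empty one.
    accept = (accept_header or "").lower()
    # Only an explicit mention of the XHTML type counts, so a bare "*/*"
    # still falls back to text/html.
    if XHTML_MIME in accept:
        return XHTML_MIME
    return HTML_MIME


if __name__ == "__main__":
    # A header that lists the XHTML type explicitly.
    print(choose_mime_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    # A header that only offers a wildcard.
    print(choose_mime_type("*/*"))
```

The first call selects “application/xhtml+xml” because the type is named in the header; the second falls back to “text/html” because a wildcard is not an explicit statement of support.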

This of course requires some server‑side scripting, and the solutions used on this page include Apache’s mod_rewrite, ASP.NET (with C#), PHP, and Perl. I wrote all of them myself, and have tested them thoroughly to make sure they are reliable. In fact, the techniques I am about to describe are used to serve this site!

Potential Pitfalls

This article assumes that you are aware of the differences in the way in which HTML and XHTML MIME types are treated by web browsers (see the links at the end of this article). This is important, because there are some subtle differences in the way in which CSS, the DOM, and language attributes are handled. This can result in scripts which work when served with one MIME type, and break when served with the other. If you don’t have experience with these differences, then you should stick to using these methods in a test environment only.

Where a “charset” parameter has been appended to the “Content-Type” header in these examples, I have used UTF-8 encoding. This should be changed to match the encoding of your documents. Also note that I have omitted this parameter when serving pages with the “application/xhtml+xml” MIME type. This is because the character encoding is specified in the XML declaration (“<?xml version="1.0" encoding="UTF-8" ?>”).

Content Negotiation With Static XHTML Pages

If you have static XHTML pages (those which aren’t pulled from a database, and which don’t differ depending on some kind of user input), there is a simple solution—use a “.xhtml” file extension for your files. You can then assign the media type “application/xhtml+xml” to this file extension, and use content negotiation to serve these pages as “text/html” to user agents which lack the support.

For example, in Apache’s httpd.conf or .htaccess:


# Send .xhtml files with correct content type

AddType application/xhtml+xml .xhtml


# Add "index.xhtml" to list of files to serve when a directory is requested
# (adjust to suit the file names you use)

DirectoryIndex index.xhtml index.html index.php


# If the user agent requests a ".xhtml" file, but doesn't advertise
# "application/xhtml+xml" in its accept header, then send the content as
# "text/html".

<Files ~ "\.xhtml$">
  # Send header to indicate content negotiation depends on the "accept" header
  Header append Vary Accept

  RewriteEngine on
  RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml
  RewriteRule .* - "[T=text/html; charset=UTF-8,L]"
</Files>

This solution requires mod_headers in order to send the “Vary: Accept” header (important to avoid caching issues), and mod_rewrite to perform the content negotiation.

Content Negotiation With ASP.NET & C#

Content negotiation with ASP.NET is particularly easy to do, because the controls provided with ASP.NET versions ≥ 2.0 produce compliant XHTML output by default. This example uses C# (C Sharp), but you could just as easily use Visual Basic .NET by substituting in the appropriate language syntax.


public class MyPage : System.Web.UI.Page
{
  private const string XHTMLMIME = "application/xhtml+xml";

  protected void Page_Load(object sender, System.EventArgs e)
  {
    // Send a "Vary: Accept" header.
    Response.AppendHeader("Vary", "Accept");

    // Check the array of accepted types for the string "application/xhtml+xml".
    if (Request.AcceptTypes != null)
    {
      foreach (string acceptType in Request.AcceptTypes)
        if (acceptType.Contains(XHTMLMIME))
        {
          Response.ContentType = XHTMLMIME;
          Response.Charset = null;
          break;
        }
    }
  }
}

The content negotiation code is placed inside the “Page_Load” method, so that it is run whenever a page is requested. It simply checks the array of accepted MIME types, and if the type “application/xhtml+xml” is found, then the content type of the response is set to match this. If it isn’t, then the default type (presumably “text/html”) will be sent.

Content Negotiation With PHP

If you are using PHP to send your pages, then content negotiation can be simply achieved with the following few lines of code:


<?php

// Send header to indicate content negotiation depends on the "accept" header
header('Vary: Accept');

/* If the user agent advertises "application/xhtml+xml" in its accept header,
   then send the content as "application/xhtml+xml". */
if (isset($_SERVER['HTTP_ACCEPT']) && stristr($_SERVER['HTTP_ACCEPT'], 'application/xhtml+xml'))
{
  header('Content-Type: application/xhtml+xml');
}
else
{
  header('Content-Type: text/html; charset=UTF-8');
}

?>

The main point to be aware of when using this method is that the code must run before any content is sent.

Content Negotiation With Perl

The Perl approach is similar to that of the PHP method, although obviously with different syntax.


# Send header to indicate content negotiation depends on the "accept" header
print "Vary: Accept\r\n";

# If the user agent advertises "application/xhtml+xml" in its accept header,
# then send the content as "application/xhtml+xml".
if (defined $ENV{'HTTP_ACCEPT'} && $ENV{'HTTP_ACCEPT'} =~ m/application\/xhtml\+xml/)
{
  print 'Content-Type: application/xhtml+xml' . "\r\n";
}
else
{
  print 'Content-Type: text/html; charset=UTF-8' . "\r\n";
}

Once again, this must be called before any content is sent. Also, if you copy and paste this code directly, don’t forget to include a shebang on the first line, and a second newline after the final header is sent.

Other Programming Languages

I have included solutions for most of the common programming languages used to serve dynamic web pages, but it shouldn’t be difficult to adapt them for use with other languages. For example, a C program could use the “getenv()” function to read the HTTP Accept header, use standard library string functions to check it for accepted types, and then simply “printf()” an appropriate MIME type to the response stream.
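As an illustration of how little changes from one language to another, here is a minimal sketch of the same logic as a Python CGI script. It is not one of the tested solutions from this page. It assumes the server passes the Accept header in the “HTTP_ACCEPT” environment variable, as CGI does, and the helper name “build_headers” is hypothetical.

```python
import os
import sys


def build_headers(environ):
    """Build the CGI response header block as a single string."""
    # The variable is absent when the user agent sends no Accept header.
    accept = environ.get("HTTP_ACCEPT", "")
    # Tell caches that the response depends on the Accept header.
    lines = ["Vary: Accept"]
    if "application/xhtml+xml" in accept.lower():
        # Charset omitted: the XML declaration specifies the encoding.
        lines.append("Content-Type: application/xhtml+xml")
    else:
        lines.append("Content-Type: text/html; charset=UTF-8")
    # The extra blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Headers must be written before any page content.
    sys.stdout.buffer.write(build_headers(os.environ).encode("ascii"))
    sys.stdout.buffer.flush()
    # ... the XHTML document itself would be written here ...
```

Keeping the header logic in a function that takes the environment as an argument makes it easy to check with a plain dictionary, without running it under a web server.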