Negotiated Content and Caching

Preface

This was a discussion paper proposing a temporary workaround for content negotiation and caching under HTTP/1.0. Time has now essentially passed it by, given the richer negotiation options of HTTP/1.1 and later additions.

It addressed what I believed to be a real practical issue at the time (mid-1998), but is now here primarily for historical interest.


There are numerous situations where a client sends a request to a server, the server performs some magic to determine the correct response, and then returns one out of a small number of static documents in order to fulfil that request. This paper relates to any situation where that scenario arises. One example would be the provision of a language variant from one of a short list of available languages, determined by the client's Accept-Language header. Another example would be to send a document in compressed form when the browser indicates via its Accept-Encoding header that it can handle it, and otherwise to send the uncompressed document. Yet another example is where an image is available in PNG format for those browsers whose Accept headers say they support it, with a fallback to (probably) GIF for those browsers that don't[a].

Some server-side imagemaps are also like this: they cause the server to return one document from a relatively small repertoire of static documents.

The issue is the effect that this kind of activity has on caching. We're assuming that, in general, documents are cached not only in the browser, but also in a proxy cache server that is network-near to a community of clients (and maybe cached in a chain of such proxies on the way from the distant server).

What's supposed to happen

According to the HTTP/1.1 protocol, the intended procedure is for the document to be returned (i.e. status 200 OK) along with a Vary: header that tells which request headers were used to select the document that was sent. See RFC 2616, section 13.6, Caching Negotiated Responses. A cache that supports HTTP/1.1 would be able to cache the document, tagged with this information: if it later received another request with the same "or equivalent" request headers, it could use the same cached document to fulfil it. (It's not entirely clear to me how a proxy could judge, on its own initiative, whether two requests are "equivalent", except in some special cases. In general the "magic" that resolves a request into a specific document is some algorithm hidden in the server, and not known in detail to the proxy. Only if the algorithm is known, e.g. for negotiating preferred language or content type, would the proxy stand a chance of deciding whether a request is "equivalent"; and even then there would be situations where the proxy cannot determine the best fit, because it doesn't know the full repertoire of document variants available to the server. I return to this theme in the next section.)
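To make the "same request headers" rule concrete, here is a minimal sketch of a cache that stores each variant together with the values the original request had for the headers named in Vary. The class VaryCache and its methods are hypothetical, and the matching is deliberately strict (exact string comparison after trimming), which is the safe rule for a proxy that does not know the server's selection algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class VaryCache:
    """Toy cache holding several negotiated variants per URL (hypothetical)."""

    # url -> list of (header names from Vary, request values for them, body)
    entries: dict = field(default_factory=dict)

    @staticmethod
    def _values(headers, names):
        # Header names are case-insensitive; values are compared after trimming.
        lowered = {k.lower(): v.strip() for k, v in headers.items()}
        return tuple(lowered.get(name, "") for name in names)

    def store(self, url, request_headers, vary, body):
        names = tuple(sorted(n.strip().lower() for n in vary.split(",") if n.strip()))
        if "*" in names:
            return  # "Vary: *" means no later request can be judged equivalent
        values = self._values(request_headers, names)
        self.entries.setdefault(url, []).append((names, values, body))

    def lookup(self, url, request_headers):
        for names, values, body in self.entries.get(url, []):
            # Strict rule: every Vary-listed header must match exactly.
            if self._values(request_headers, names) == values:
                return body
        return None  # miss: forward the request to the origin server
```

Under this rule a reader sending fr-CA,fr,en,de misses the entry stored for a reader who sent fr,de,en, even though the server would have chosen French for both.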

This is the scenario supported by, for example, the Apache mod_negotiation module, and there's nothing wrong with that as far as it goes.

The full HTTP/1.1 protocol includes the ability for the proxy to tell the distant server which variants it currently holds cached (by listing their entity tags in an If-None-Match header), and the server can then respond with status 304, naming the appropriate one in its ETag header, so that the cache can return it. I'm not well informed about how widely this is implemented. So, where that mechanism isn't in use and the cache server can't determine for itself that a cached variant is suitable, it seems the distant server needs to send the document afresh. What is the likelihood of someone else making a matching request during the lifetime of the cached copy? It seems to me that this likelihood can differ a lot, depending on which header(s) are named in the Vary header.
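The HTTP/1.1 revalidation exchange for cached variants can be sketched as three small functions: the proxy lists the entity tags it holds, the origin (whose own negotiation is reduced here to an already-chosen tag) answers 304 if one of them is right, and the proxy then serves or stores accordingly. All function names and the tag values such as "fr-1" are hypothetical; only the If-None-Match/304/ETag pattern comes from RFC 2616.

```python
def conditional_headers(cached_variants):
    """Proxy side: list the entity tags of every variant it already holds."""
    return {"If-None-Match": ", ".join(cached_variants)}


def origin_reply(if_none_match, chosen_etag, chosen_body):
    """Origin side, after its own negotiation has picked chosen_etag (hypothetical)."""
    # Naive split: assumes no entity tag contains a comma.
    offered = [tag.strip() for tag in if_none_match.split(",") if tag.strip()]
    if chosen_etag in offered:
        # The proxy already has the right variant: name it, send no body.
        return 304, {"ETag": chosen_etag}, b""
    return 200, {"ETag": chosen_etag}, chosen_body


def proxy_finish(cached_variants, status, headers, body):
    """Proxy side: serve the variant the origin named, or store the new one."""
    if status == 304:
        return cached_variants[headers["ETag"]]
    cached_variants[headers["ETag"]] = body
    return body
```

The round trip to the origin still happens every time; what the 304 saves is the transfer of the document body.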

For example, if the Vary header said that the content had been negotiated on the basis of the Accept header, then it can be expected that the same browser/version will often send the identical Accept header, and the chances of a match are probably quite reasonable. On the other hand, if the content was negotiated on the basis of the User-Agent header, then the chance of getting a match again seems rather small: there are thousands of variations of the User-Agent header, many differences of detail are found even among copies of the same release of the same browser, and the tiniest difference must stop the cache server from declaring a match. Language preferences would also vary widely between users. A proxy could be programmed to show some intelligence (for example, if the request has a first preference for English and an English version is available, it can be assumed to fulfil the request), but in general it isn't possible to determine whether the best fit to a request is one of the cached variants without knowing what other variants the server has available, and to find that out the proxy has to pass the request to the server anyway.

A temporary compromise for HTTP/1.0

The solution is described here in protocol terms. Some discussion of server support comes later.

The suggested compromise is this. Instead of sending the selected document back from the server with status 200 OK, the server would send back a relocation (a 302 response whose Location header gives the real URL of the static document). The document itself, now fetched under its own ordinary URL, would then be eligible for caching.

When, subsequently, another request is made for the content-negotiated URL, the cache again has to refer the request to the distant server, since the relocation itself is not normally cached. Again, the response will be a relocation to the static document. But when the client follows the redirection, it will be able to retrieve the cached copy, instead of having to retrieve the whole document from the distant server.
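The flow just described can be simulated with a toy proxy that never stores the 302 but does store the static documents it fetches. Everything here is hypothetical: the file names, the 20,000-byte sizes, and the crude language choice (primary subtag of the first listed language, else English) stand in for whatever the real server does. The recursive call models the client following the Location header back through the same proxy.

```python
# Hypothetical static variants held by the distant server.
ORIGIN_FILES = {
    "/foo.html.en": b"E" * 20_000,
    "/foo.html.fr": b"F" * 20_000,
    "/foo.html.de": b"D" * 20_000,
}


def origin(path, accept_language):
    """Distant server: the negotiated URL only ever answers with a 302."""
    if path == "/foo.html":
        # Crude choice for brevity: primary subtag of the first listed language.
        first = accept_language.split(",")[0].split(";")[0].strip().lower()
        lang = first.split("-")[0]
        if f"/foo.html.{lang}" not in ORIGIN_FILES:
            lang = "en"
        return 302, f"/foo.html.{lang}"
    return 200, ORIGIN_FILES[path]


class Proxy:
    def __init__(self):
        self.cache = {}            # static URL -> body (302s are never stored)
        self.origin_requests = 0   # round trips to the distant server
        self.origin_bytes = 0      # document bytes carried from the distant server

    def get(self, path, accept_language):
        if path in self.cache:
            return self.cache[path]
        self.origin_requests += 1
        status, payload = origin(path, accept_language)
        if status == 302:
            # Stands in for the client following the Location through the proxy.
            return self.get(payload, accept_language)
        self.origin_bytes += len(payload)
        self.cache[path] = payload
        return payload
```

After a fr,de,en reader and then a fr-CA,fr,en,de reader, the proxy has made three round trips to the origin (two redirects and one document fetch) but carried the French document across the long-haul link only once.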

To illustrate this with a concrete example, let's consider language negotiation. The server has available, e.g., three static documents: foo.html.en in English, foo.html.fr in French, and foo.html.de in German. The client requests the URL foo.html from the server; the server matches the user's Accept-Language header (let's say fr,de,en) and concludes that the best match is French. So the server sends back a relocation to foo.html.fr, and the client initiates retrieval of this static document, which then gets cached.
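The server's side of this example might look like the following sketch, which turns an Accept-Language header into a 302 pointing at one of the static variants. It uses the HTTP/1.1 language-range rule ("fr" matches "fr" and "fr-ca"; "*" is simplified to match anything) and, as an assumption, breaks equal q-values by position in the header, since a browser sending fr,de,en plainly means French first. The function names and the fall-back to the first listed variant are hypothetical.

```python
def parse_accept_language(header):
    """'fr,de;q=0.5' -> [('fr', 1.0), ('de', 0.5)], keeping header order."""
    ranges = []
    for item in header.split(","):
        name, _, params = item.partition(";")
        name = name.strip().lower()
        if not name:
            continue
        q = 1.0
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        ranges.append((name, q))
    return ranges


def range_matches(lang_range, tag):
    # Prefix rule: "fr" matches "fr" and "fr-ca"; "*" simplified to match anything.
    tag = tag.lower()
    return lang_range == "*" or tag == lang_range or tag.startswith(lang_range + "-")


def negotiate(base, accept_language, variants):
    """Return (status, headers) redirecting to base + '.' + best variant tag."""
    ranges = parse_accept_language(accept_language)
    best_tag, best_key = None, None
    for tag in variants:
        for index, (lang_range, q) in enumerate(ranges):
            if q > 0 and range_matches(lang_range, tag):
                key = (q, -index)  # higher q wins; on equal q, earlier in header wins
                if best_key is None or key > best_key:
                    best_tag, best_key = tag, key
    if best_tag is None:
        best_tag = variants[0]  # hypothetical fallback: the first listed variant
    return 302, {"Location": f"{base}.{best_tag}"}
```

For example, `negotiate("/foo.html", "fr,de,en", ["en", "fr", "de"])` yields `(302, {"Location": "/foo.html.fr"})`, and so does the header fr-CA,fr,en,de, because the fr-CA range does not match the plain "fr" tag but the following fr range does.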

Another customer requests the same document, and, despite their different Accept-Language header (let's say fr-CA,fr,en,de), the server concludes that French is their best match again, and again redirects to foo.html.fr. This time, when the client follows this redirection, the cached copy of foo.html.fr is found in the proxy server, and they get it without having to retrieve it again from the distant server.

Some have argued that the proxy could often draw its own conclusions without referring to the distant server, for example if it is holding an English cached version and a request is made whose first preference is English then it can return the English version immediately. However, it's clear that in more-complex cases the proxy can't determine merely on its own initiative which version the server would have sent.
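The shortcut those people have in mind can be sketched as follows, assuming the server simply honours the client's top preference. The proxy answers locally only when that top preference (highest q, earliest on a tie) is exactly a language it already holds; anything subtler is forwarded. The function names are hypothetical.

```python
def top_preference(accept_language):
    """Return (range, q) for the client's first choice: highest q, ties to the earliest."""
    best = None
    for index, item in enumerate(accept_language.split(",")):
        name, _, params = item.partition(";")
        name = name.strip().lower()
        if not name:
            continue
        q = 1.0
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if best is None or (q, -index) > best[0]:
            best = ((q, -index), name)
    return (None, 0.0) if best is None else (best[1], best[0][0])


def proxy_shortcut(accept_language, cached_tags):
    """Cached tag the proxy may serve without asking the origin, else None."""
    top, q = top_preference(accept_language)
    if top is None or top == "*" or q <= 0:
        return None
    if top in {tag.lower() for tag in cached_tags}:
        return top  # e.g. first choice "en" and an English copy is on hand
    # "fr-ca" with only "fr" cached, or "de" with no German copy: the server
    # may hold a better variant than anything cached, so forward the request.
    return None
```

The fr-CA,fr,en,de reader illustrates the limit: even with a French copy cached, the proxy cannot know whether the server has a Canadian French variant, so it must still ask.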

Trade-offs

Clearly, the referral to the distant server, the return of the redirection, and the subsequent transaction startup to retrieve the target of the redirection are an overhead. The trade-off is between the additional cost of performing the redirection, on the one hand, and, on the other, the potential benefit of being able to retrieve the contents from a local cache rather than having to fetch them afresh from the distant server.

Obviously, the larger the target document, the larger any benefit will be. The technique may not be worth considering if the result documents are small (i.e. of a size comparable, in network resource terms, to a redirection transaction).
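The trade-off can be put into rough numbers with a simple expected-cost model, which is an assumption rather than a measurement: with a plain 200 plus Vary, the document crosses from the distant server unless the Vary-keyed copy happens to match; with the redirect, every request pays the redirect cost, and the document crosses only when the static URL is not yet cached. All parameter names and the figures used below are illustrative.

```python
def expected_saving(doc_bytes, redirect_cost, static_hit_rate, vary_hit_rate=0.0):
    """Bytes-equivalent saved per request by redirecting instead of 200 + Vary.

    Assumed model:
      200 + Vary : fetch doc_bytes from the origin unless the Vary-keyed copy matches
      302        : always pay redirect_cost, and fetch doc_bytes only when the
                   static URL is not already in the proxy cache
    A positive result means the redirect comes out ahead.
    """
    plain = (1 - vary_hit_rate) * doc_bytes
    redirected = redirect_cost + (1 - static_hit_rate) * doc_bytes
    return plain - redirected
```

Setting the saving to zero gives the break-even size, doc_bytes = redirect_cost / (static_hit_rate - vary_hit_rate). With a redirect costing the equivalent of 1,000 bytes, a 60% hit rate on the static URL and a 10% hit rate on the Vary-keyed copy, that is 2,000 bytes: a 20,000-byte document saves about 9,000 bytes per request, while a 1,500-byte one loses about 250.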

Bookmarks

If a reader navigates to such a negotiated page and bookmarks it, they'll bookmark the URL of the negotiated instance. In other words, the bookmark won't be portable to a different browsing situation, one in which a different negotiated document version would be needed.

Server support

Well, I looked in Apache. mod_negotiation looks promising, but the result after negotiation has to be an actual path on the server; it doesn't look as though it's possible to have it issue a relocation. I tried configuring .htaccess with either a Redirect or a RewriteRule against the target document's relative URL, but neither of them seemed to be actioned. Indeed, the presence in .htaccess of a Redirect against a target document (the German language variant) seemed to cause mod_negotiation to behave as if the target document didn't exist: it switched to returning the French variant instead, which, in the absence of a German variant, would indeed have been the best match in the situation I was testing.

Obviously it could be done with a CGI (but it would be good to avoid the overhead) or with a custom module. Am I perhaps overlooking something that can be made to work with the existing mechanisms (RewriteRule with SetEnvIf, perhaps)?
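A CGI version might look like the following sketch. The script name, the variant list and TARGET_PATH are hypothetical, and the language choice is deliberately crude (first listed language not refused with q=0 whose primary subtag is available). One detail matters: the CGI interface, as later written up in RFC 3875, treats a Location header holding only a local path as a server-internal redirect, which would send the document back as a 200 and defeat the purpose, so the sketch emits an absolute URL with an explicit 302 status.

```python
#!/usr/bin/env python3
"""Hypothetical negotiate.cgi: answer the negotiated URL with a redirect."""
import os
import sys

VARIANTS = ["en", "fr", "de"]     # hypothetical: foo.html.en, .fr and .de exist
TARGET_PATH = "/docs/foo.html"    # hypothetical public path of those variants


def choose(accept_language):
    # Crude rule for brevity: the first listed language (not refused with q=0)
    # whose primary subtag we hold; q-values are otherwise ignored.
    for item in accept_language.split(","):
        name, _, params = item.partition(";")
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                if float(params[2:]) == 0:
                    continue
            except ValueError:
                continue
        lang = name.strip().lower().split("-")[0]
        if lang in VARIANTS:
            return lang
    return VARIANTS[0]


def redirect_response(environ):
    lang = choose(environ.get("HTTP_ACCEPT_LANGUAGE", ""))
    host = environ.get("SERVER_NAME", "localhost")
    port = environ.get("SERVER_PORT", "80")
    authority = host if port == "80" else f"{host}:{port}"
    # An absolute URL makes this a client redirect; a bare "/path" Location
    # would let the server fetch the file internally and send it back as 200.
    location = f"http://{authority}{TARGET_PATH}.{lang}"
    return (
        "Status: 302 Moved Temporarily\n"
        f"Location: {location}\n"
        "Content-Type: text/plain\n"
        "\n"
        f"See {location}\n"
    )


if __name__ == "__main__":
    sys.stdout.write(redirect_response(os.environ))
```

The CGI start-up cost the paragraph worries about is paid on every request for the negotiated URL, but the response is tiny; the large static variants are still served directly and remain cacheable.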

Other Views

After I had posted to usenet about this discussion document, Nick Kew suggested that I contact Andrew Daviel (of TRIUMF and Vancouver Webpages, justly famous for the Cache Now! initiative), and indeed he had already considered and briefly discussed the issue, at least in relation to HTTP/1.0, and had come to a very similar conclusion. I apologise to him for having overlooked that page on my visits to his excellent "Cache Now!" resource.

Anyhow, I see that Andrew also decided to roll his own CGI script to implement the "negotiated redirect" as he aptly calls it.

Epilogue - 2000

But let me repeat that HTTP/1.1 contains a better solution to this problem, and by now it would surely be a bad idea to base a new implementation on the shortcomings of HTTP/1.0. Thus, this page should be read at most for its historical interest, not as a practical agenda for a fresh deployment on the WWW.


[a] Unfortunately, while this idea of negotiating image formats is fine in theory, MSIE browsers (various versions) retrieve images by sending an Accept header with "*/*", which is neither very useful, nor even true!

NN4.* and other browsers have been seen indicating equal acceptance for PNG and other image formats, so it would only "work" (in the intended sense of sending PNG in preference to competing format(s), specifically GIF) if the server assigned a higher source quality factor to PNG.
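Why equal acceptance defeats the idea can be seen from a sketch that scores each image variant as the client's q multiplied by a server-assigned source quality qs, roughly the way Apache's negotiation combines them (Apache type maps express qs as a parameter on the Content-type line). The file names and the qs value of 0.5 for GIF are hypothetical. With "*/*", or with PNG and GIF accepted equally, the client's q cannot separate the variants, so only the server's qs decides.

```python
def parse_accept(header):
    """Map media range -> q, e.g. 'image/png;q=0.9, */*' -> {'image/png': 0.9, '*/*': 1.0}."""
    result = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        result[parts[0].lower()] = q
    return result


def client_q(accept, media_type):
    """q for media_type: an exact type beats type/*, which beats */*."""
    major = media_type.split("/")[0]
    for key in (media_type, f"{major}/*", "*/*"):
        if key in accept:
            return accept[key]
    return 0.0


def pick_image(accept_header, variants):
    """variants: {filename: (media_type, qs)}; score is client q times source qs."""
    accept = parse_accept(accept_header)
    scored = {name: client_q(accept, mt) * qs for name, (mt, qs) in variants.items()}
    return max(scored, key=scored.get)  # on a tie, the first listed variant wins
```

With both qs values at 1.0 and an Accept of "*/*", the scores tie and the choice is arbitrary (here the first listed, GIF); giving GIF qs=0.5 makes PNG win for "*/*", while a browser that lists only image/gif and image/jpeg still gets the GIF.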

