This page attempts to demystify the HTTP Content-type header
and its relationship to browser configuration.
This page started life as a usenet posting in 1996. The topic seems just as relevant today to World Wide Web applications, and more important than ever in reference to MSIE, which in this regard is anything but a properly behaved participant in the World Wide Web.
Experienced webnauts have indeed lost count of the number
of times that newcomers report "I know that I've done it
right because it works with MSIE, but it's failing with
[some other browser]". In so many cases they are in fact doing it wrong:
MSIE - contrary to a mandatory requirement of
the specification - is doing what they wanted, rather than
what they asked for, whereas the [other browser] is behaving
correctly.
Now read on...
When a WWW client (e.g browser) accesses a remote document using the current HTTP protocols (1.0 or 1.1), then the server is expected to inform the client what kind of content is being returned, by means of the HTTP Content-type header. According to HTTP standards, this information is authoritative. If the server is sending the wrong type, then fiddling with the browser configuration is the wrong way to try to solve the problem.
Many servers (although this is by no means the only possibility in theory) are configured to decide what Content-type header to send, on the basis of the "filename extension", i.e whatever comes after a period "." in the file name.
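To make the extension-based approach concrete, here is a minimal Python sketch of how such a server might pick the header it sends. The table, the fallback type and the function name are all hypothetical, chosen for illustration; a real server takes its defaults from its own configuration (Apache, for instance, from its mime.types file).

```python
import posixpath

# Hypothetical extension-to-type table, standing in for a server's
# configured mapping (e.g. a mime.types file plus local additions).
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".css": "text/css",
    ".txt": "text/plain",
    ".png": "image/png",
    ".gif": "image/gif",
}

# Assumed fallback when no mapping matches; real servers differ here.
FALLBACK_TYPE = "application/octet-stream"


def content_type_for(path, local_overrides=None):
    """Return the Content-type this hypothetical server would send for
    `path`, judged only from the file name's extension."""
    _, ext = posixpath.splitext(path)
    ext = ext.lower()
    # Site-specific entries (the equivalent of adding a directive to the
    # server configuration) take precedence over the defaults.
    if local_overrides and ext in local_overrides:
        return local_overrides[ext]
    return EXTENSION_TYPES.get(ext, FALLBACK_TYPE)
```

Note that the decision is made from the name alone: the server never looks inside the file, so a misnamed file goes out with the wrong type, and (per the rules discussed below) a conforming client is then obliged to believe it.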
If you use such a server, it is vital that at the server end you
either use the filename extension which
your server already expects for that content
type (something that you should
be able to find out from your server admin or from
your server provider's documentation), or
that you take whatever action is necessary to configure your server to
serve out the file names that you use with the Content-type that you need
(in Apache this might take the form of an AddType directive
in the appropriate configuration file).
When, on the other hand, a browser accesses remote
documents by some other
protocol, e.g a URL using the FTP scheme,
then there is no mechanism
for the server to tell the browser the content type of the
file, and in this case the browser has to
guess the type of the file for itself.
Browsers often do this by reference to the filename extension.
(This was also the case for the now-obsolete HTTP/0.9 protocol.)
This is a source of frequent confusion. Browsers typically have configuration entries with three pieces of information, something like this:
|(a) MIME type||(b) filename extension/s||(c) what to do|
|text/html||html, htm||view in browser|
|application/msword||doc||open using MS Word|
|application/octet-stream||bin, exe||download-to-disk dialog|
|application/x-zip||zip||ask user whether to download to disk, or open with WinZip|
and so on (this table only shows some typical settings:
your preferences might be set differently, of course).
What you have to understand clearly is that,
in a WWW-conforming implementation, item (a) is used for
URLs of the
http: variety - these URLs make
no use of item (b) for determining the content-type
when a content-type has been sent by the server.
The HTTP protocol specifications (1.0 and 1.1) effectively
forbid a browser that has
received a valid Content-type header from the server from making
its own unilateral determination of the content-type - see RFC2616 section 7.2.1 (my emphasis):
If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource.
The consequences of this when a server is misconfigured aren't
always immediately evident; for example, consider an HTML page
(sent out correctly as text/html)
which references a stylesheet and a number of in-lined images:
if the server sends these out with a wrong Content-type,
then the browser might be displaying the HTML page's main content,
but the browser has every right to ignore the offending
stylesheet, or to omit the offending image(s) from the display:
indeed a strict interpretation of the rules would say that it
must behave that way.
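The strict behaviour just described can be sketched as follows. This is an illustrative Python fragment, not taken from any browser; the table of expected types for each kind of subresource is an assumption made for the example.

```python
# Hypothetical table of the media types a strict client expects for
# each kind of subresource referenced from an HTML page.
EXPECTED_TYPES = {
    "stylesheet": {"text/css"},
    "image": {"image/gif", "image/jpeg", "image/png"},
}


def media_type(header_value):
    """Drop parameters such as '; charset=...' and normalise case."""
    return header_value.split(";", 1)[0].strip().lower()


def may_use_subresource(kind, header_value):
    """A strict client uses the subresource only if the server declared
    one of the types expected for that kind; otherwise it is ignored."""
    return media_type(header_value) in EXPECTED_TYPES[kind]
```

So a stylesheet served as text/plain is simply skipped, however CSS-like its contents may be.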
Faking the wrong Content-type from the server is potentially
a way of compromising security, so there's a genuine reason
for this rule being the way that it is.
Item (b) is used for URLs of the ftp: etc. variety, and for viewing local files - these URLs can make no use of item (a), since it is not available in this kind of context.
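The division of labour between items (a) and (b) can be sketched in Python as below, using a subset of the example table above. The function name and the "ask the user" default for unlisted types are assumptions; the point is only that a declared Content-type settles the matter, and the extension is consulted only when no type was declared, which is what RFC2616 section 7.2.1 allows.

```python
import posixpath
from urllib.parse import urlsplit

# Subset of the example preferences table:
# (a) MIME type, (b) filename extensions, (c) what to do.
PREFERENCES = [
    ("text/html", {"html", "htm"}, "view in browser"),
    ("application/msword", {"doc"}, "open using MS Word"),
    ("application/octet-stream", {"bin", "exe"}, "download-to-disk dialog"),
]

DEFAULT_ACTION = "ask the user"  # assumed behaviour for unlisted types


def action_for(url, content_type=None):
    """Choose column (c) for a retrieved resource."""
    if content_type is not None:
        # A Content-type was sent (the normal http: case): look it up
        # by column (a) and never consult the extension.
        mtype = content_type.split(";", 1)[0].strip().lower()
        for mime, _exts, action in PREFERENCES:
            if mime == mtype:
                return action
        return DEFAULT_ACTION
    # No declared type (ftp:, local files): guess from column (b).
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    for _mime, exts, action in PREFERENCES:
        if ext in exts:
            return action
    return DEFAULT_ACTION
```

For example, an http: URL ending in .doc but served as text/html is viewed in the browser, whereas the same name fetched by ftp: would be opened in MS Word.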
Users who are confused about this are typically sending themselves demented trying to reconfigure the wrong parameter in their browser's preferences dialogue, and can't work out why it seems to be having no effect.
The above presentation applies to properly-behaved World Wide Web software - which Win-MSIE is not, at least in all the versions seen so far, up to and including version 6.0. For quite a number of important openly-defined content types, MSIE does what it thinks best, which - for those who are incompetent to configure a web server - can often seem beneficial, but in other practical situations can be utterly frustrating. Worse, it has security implications.
MS have a page which documents their MIME type handling in IE, but at no point do they admit that this behaviour is contrary to published interworking specifications (see RFC2616, particularly the "if and only if" clause previously cited). Nor are there any clear statements about the security implications of this violation, although a few vague hints were added later. Active content, e.g Jscript, could be slipped past a customer's protection disguised as a harmless-looking content type, and yet still be interpreted as active content by MSIE.
Their "moniker" hit-list contained 26 different types, and they imply that they are likely to add other - as yet unspecified - types to the list in the future.
In a later update, they mention in relation to XP SP2 that they will no longer "upgrade" a document served as text/plain. However, the page then links to yet another complex rigmarole which again introduces new security issues - while claiming to "enhance" security.
I think it's only fair to stress that this particular change, which
deals specifically with
text/plain, does not
affect the other types which are targeted by their action "moniker",
where they still continue to violate this HTTP protocol requirement.
A site might, for example, have decided for security
reasons to filter text/html content before it
reached the browser, to protect it from unauthorised
scripting etc.; but MSIE could still be fed with
something resembling HTML by disguising it as another content-type,
for example image/jpeg,
and IE could still respond to active content,
for example scripting, which had been concealed within.
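The difference between the two behaviours can be shown with a toy Python sketch. The sniffer below is purely illustrative: its list of markers and its 256-byte window are assumptions, not a description of MSIE's actual algorithm. It shows how a payload declared as image/jpeg but containing script ends up treated as HTML by a sniffing client, while a conforming client simply believes the declared type.

```python
# Hypothetical markers a content-sniffing client might look for.
HTML_MARKERS = (b"<html", b"<head", b"<body", b"<script")


def conforming_type(declared_type, body):
    """RFC2616 behaviour: the declared Content-type is authoritative.
    The body is deliberately not inspected."""
    return declared_type


def sniffing_type(declared_type, body, window=256):
    """Illustrative sniffer: if HTML-like markup appears near the start,
    treat the body as text/html whatever the server declared."""
    head = body[:window].lower()
    if any(marker in head for marker in HTML_MARKERS):
        return "text/html"
    return declared_type
```

A filter that blocks text/html but passes image/jpeg would let the disguised payload through, and only the sniffing client would then go on to run its script.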
I quote here my earlier reports (which have now been overtaken
by the above XP SP2 fix, at least for users who are keeping up
with MS updates), showing IE stumbling from its original
uncritical behaviour of, basically, "text/plain
means IE will just guess", towards more restrictive positions.
It must be stressed, though, that the change in the way that this
content type, text/plain, is handled is
only one feature amongst the morass of special cases described
in their above-cited XP-SP2 MIME handling page.
There is a KB article Q239750 describing a fix for
text/plain handling in IE5.0, and implying that later versions would be corrected. But I tried the documented registry setting against IE5.5 (whose Urlmon.dll version and date were much later than the KB article) and it didn't make any difference to this misbehaviour: it still treated my
spoof.txt test case document as text/html, evidently because it thought it had spotted HTML-like syntax in it, and it actioned the meta-refresh. When tested in IE6 in mid 2005, it behaved differently to this extent: it displayed an empty window, indicating that it was still treating this text file as HTML, but it did not action the meta-refresh (not because they'd started conforming to the RFC, but rather because they'd introduced yet another special case behaviour without addressing the fundamental issue).
I tried these tests with and without the registry settings mentioned in Q239750, i.e the IsTextPlainHonored key, and found no difference: however, some other informants reported different results, which I admit I couldn't explain.
A correspondent calls my attention to a newer KB article, Q329661 relating specifically to IE6; he tells me that in his test with IE6, it did produce the result that was claimed for
text/plain; but the article seems to be saying that in the presence of XML content, the fix will be ineffective and the browser will still behave incorrectly.
Conversely, it has been observed that when XHTML is served out with a content type of
text/html in accordance with the provisions of XHTML 1.0 Appendix C, MSIE will disbelieve the HTML content type if the
html opening tag is not encountered sufficiently close to the start of the document: this can easily happen if, for instance, the document contains a block of comments before the
html tag appears. MSIE then processes the document using its skeleton XML-handling code, producing something that's unusable to the intended reader.
Does text/plain mean "recipient can guess"?
If anyone should doubt what
text/plain content-type means
in Internet terms, they could refer to RFC2046 section 4.1.3:
This indicates plain text that does not contain any formatting commands or directives. Plain text is intended to be displayed "as-is", that is, no interpretation of embedded formatting commands, font attribute specifications, processing instructions, interpretation directives, or content markup should be necessary for proper display.
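One way for a client to honour "as-is" when it renders text/plain inside an HTML-based viewer is to escape every markup character before display, so that something like the meta-refresh in a spoof.txt-style test file is shown as text and never acted upon. A minimal Python sketch, in which wrapping the text in a pre element is an assumption about how such a viewer presents it:

```python
import html


def render_text_plain(text):
    """Present text/plain as-is: escape all markup so that it is
    displayed literally and never interpreted."""
    return "<pre>" + html.escape(text) + "</pre>"
```

Fed a line such as `<meta http-equiv="refresh" ...>`, this produces `&lt;meta ...` in the output, i.e. the reader sees the tag, and nothing refreshes.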
As of September 2005,
a correspondent called my attention to the fact that, if one
tried to look up the URL of the previously cited spoof.txt test document
at Google, it became clear that they must have treated that
text/plain document as HTML, since they had
actually indexed, not the
spoof.txt page itself, but the page which
was the target of the meta-refresh that
appears on that plain-text page!
Other search engines (alltheweb, altavista, yahoo search) had not made this mistake.
The moral of this story: wouldn't it be so much easier all around for them to produce a piece of software that simply conforms to the mandatory interworking specifications, and to leave their 'clever' fiddly bits to their proprietary extensions???
[I'm not saying that other browsers don't occasionally show up security problems, I know that they do and that they hit the headlines from time to time. But at least they are following published interworking rules, so it's pretty clear if they're doing something wrong. Whereas IE seems to make up rules as it goes along, inventing security traps at each point of the way and with no way to know what they are, nor even, in many cases, whether their (mis)behaviour is intentional.]
I have some introductory notes on negotiation in my earlier page
about language negotiation.
The underlying principles are much the same; just that here we are
concerned with the HTTP
Accept request header, whereas
there we were dealing with the Accept-Language header.
A fine introduction to the concepts and terminology of negotiation mechanisms (as well as to Apache's support for them) can be found as Apache Content Negotiation at the Apache web site.
Here we are only going to consider server-driven negotiation; we do not cover the other, fundamentally different, mechanism, transparent negotiation as defined in RFC2295-6.
When a client agent (browser etc.) requests a resource from a server,
it may include in the request an HTTP Accept
header, indicating something about its acceptance of different types
of content. The entries may or may not include
q= quality factors expressing
the relative acceptability of the various types.
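As a rough illustration of what the server has to extract from such a header, here is a small Python parser that turns an Accept value into (media range, q) pairs. It is a simplified sketch: a missing q= is taken as 1.0 (as RFC2616 specifies), an unparseable q is treated as 0 (an assumption of this sketch), and any other parameters are ignored.

```python
def parse_accept(header):
    """Parse an Accept header value into a list of (media_range, q)."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue  # tolerate empty items such as a trailing comma
        q = 1.0  # default when no q= parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat garbage as unacceptable
        ranges.append((media_range, q))
    return ranges
```

For instance, `text/html;q=0.9, application/xhtml+xml, */*;q=0.5` yields three pairs, with XHTML at the implied 1.0.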
When the server receives such a request (provided that different variants
of the resource are available and content negotiation is enabled), the
server considers these acceptance criteria, in conjunction with the
available variants and with any associated
source-quality factors, and computes the most acceptable variant to
send in its response.
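The computation can be sketched in Python, taking the client's Accept entries as already-parsed (media range, q) pairs and the server's variants as a mapping from media type to qs. This follows the simple rule of multiplying q by qs, with the most specific matching range supplying q. Apache's real algorithm adds further rules (notably for wildcards sent without explicit q values), so treat this as a simplified model rather than Apache's implementation.

```python
def q_for(media_type, accept):
    """Client's q for media_type: the most specific matching range wins
    (type/subtype over type/*, and type/* over */*)."""
    main = media_type.split("/")[0]
    best = None  # (specificity, q)
    for media_range, q in accept:
        if media_range == media_type:
            spec = 2
        elif media_range == main + "/*":
            spec = 1
        elif media_range == "*/*":
            spec = 0
        else:
            continue
        if best is None or spec > best[0]:
            best = (spec, q)
    return 0.0 if best is None else best[1]


def negotiate(accept, variants):
    """Return the variant with the highest q * qs, or None if nothing
    is acceptable. An empty `accept` list means no Accept header, which
    is treated as */*. Ties go to the variant listed first."""
    if not accept:
        accept = [("*/*", 1.0)]
    best_type, best_score = None, 0.0
    for media_type, qs in variants.items():
        score = q_for(media_type, accept) * qs
        if score > best_score:
            best_type, best_score = media_type, score
    return best_type
```

Returning None corresponds to the case where no variant is acceptable; a real server would then typically answer 406 Not Acceptable or fall back to a default.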
In practice, client agents may or may not give the user the ability
to express their personal wishes for different content types via
the Accept header. For this and other reasons,
it would be a bad idea to use content negotiation as the
sole means of choosing between different content formats: it's
pretty much essential to offer an alternative route via a menu of
explicit links to the individual variants.
as we see in the examples.
If the client agent indicates
image/png in their
Accept headers, then they can be sent PNG format in
preference to GIFs. Probably, clients which do not explicitly accept
PNG should normally be sent GIFs.
The relative desirability, from the point of view of the provider,
of the two formats can be adjusted by
qs= source quality factors.
Note however that if the client indicates acceptance of
image/* (or of */*)
without explicitly mentioning PNG, it would
probably be unwise to send them PNG in preference to GIF.
(This of course becomes less of an issue with time, as more and more
browsers include competent support for PNG, whether or not they
advertise it in their accept header.)
I would rate this as an example which can be deployed successfully in practice.
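The conservative rule described above, that wildcards alone should not earn PNG, can be expressed as an explicit-mention check. A minimal Python sketch, with the Accept entries given as (media range, q) pairs; the function names are hypothetical:

```python
def explicitly_accepts(media_type, accept):
    """True only if the Accept entries name media_type itself with
    q > 0; wildcard ranges such as image/* or */* do not count."""
    return any(rng == media_type and q > 0 for rng, q in accept)


def choose_image(accept):
    """Send PNG only to clients that ask for it by name, else GIF."""
    if explicitly_accepts("image/png", accept):
        return "image/png"
    return "image/gif"
```

The same check, applied to application/xhtml+xml, would implement the XHTML-versus-HTML choice discussed next.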
Please don't read this section as a practical recommendation: it's no more than an exploration of what's technically feasible.
According to W3C guidelines, XHTML should normally be sent with a
content-type of application/xhtml+xml; however, for compatibility
reasons, it is also permissible to send out XHTML/1.0 - provided that
it meets the requirements of Appendix C - with a content-type of
text/html, with a view to it being handled by
user agents that had been designed to handle HTML.
For safe deployment of XHTML (beyond the limited concession of
Appendix-C-compatible XHTML/1.0), user agents capable of handling
XHTML are expected to include the content-type
application/xhtml+xml in their Accept request-header.
Server negotiation can then arrange to send the XHTML version,
with its proper content-type, to such client agents.
With other client agents, the server should send out a document which
can properly be described by a content-type of text/html:
that could be either HTML/4.01, or (using the compatibility
provisions of XHTML/1.0) it could be Appendix-C-compatible XHTML/1.0.
However, HTML/4.01 and XHTML/1.0 are functionally equivalent,
and can be transformed by rote into each other.
There are some anomalies (which we won't go into here in detail) in
the handling of XHTML/1.0 "as if" it were genuine HTML, meaning that
anyone looking for best compatibility would be advised to use HTML/4.01
for this purpose, in preference to "compatible" XHTML/1.0.
But for most practical purposes, admittedly the difference is
likely to be unnoticed, provided that the guidelines of
Appendix C are scrupulously followed.
It may be worth stressing a point here, since many authors seem to confuse the distinction between "Strict" and "Transitional" (and "Frameset") DTDs with the distinction between HTML and XHTML. But "Strict", "Transitional" (and "Frameset") variants exist in XHTML/1.0 as well as in HTML/4.01: if you want to move from "Transitional" to "Strict", there's no compulsion also to move from HTML to XHTML, nor vice versa. Each choice should be taken on its own merits.
Although views differ, quite a number of respected commentators warn against moving to XHTML merely because it seems to be "sexy": for practical production use on the WWW, they recommend moving only when a clear and substantive reason for the move has been identified, and its implications have been clearly grasped. One strongly-held opinion can be seen in Hixie's advocacy paper. This would also be my recommendation, generally speaking. On the other hand, what you do in your own page-generation process is your own affair, and could well be XML-based if you so wish; that wouldn't prevent the production of valid HTML/4.01 as the end product for sending to the web, if there is no substantive reason for doing otherwise. What the future may bring is another matter, for sure, but staying with HTML/4.01 for now as a final format does not in any way impair future options. Whereas a move from HTML-flavoured tag soup to nothing better than XHTML-flavoured tag soup seems to be no kind of forward progress, and rates only to re-implement in XHTML the very disadvantages which had developed in HTML, and from which XML-based markups such as XHTML had been intended to save us!
We considered two typical examples, from many: GIF versus PNG, and HTML versus XHTML. These notes can be extended to other equivalent cases.
Consider the case where a client makes no mention of the
competing content-types on an Accept header (e.g they send no
Accept header, or they send an Accept header which includes
*/* but makes no explicit mention of the content
types in question).
You'd want to make the more-conservative choice, wouldn't you?
i.e to send GIF rather than PNG, HTML rather than XHTML.
If the server is using the standard negotiation algorithm, then
the way to achieve that is to set GIF, HTML resp. with marginally
higher qs= source quality factors on the server side.
Now consider what happens if a client indicates equal acceptability for GIF as for PNG (resp. HTML as for XHTML): the server would inevitably send them the more-conservative choice, even though they have explicitly indicated willingness to accept the more-advanced format.
A client such as Mozilla solves this by expressing a lower acceptability for HTML (e.g 0.9) than for XHTML, and a lower acceptability for GIF (e.g 0.2) than for PNG. This is strong enough to dominate the small differences in source quality factors at the server, and the dilemma is solved.
As a different example, however, Opera 7.01 indicates equal acceptability for GIF as for PNG.
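To put some numbers on this, take the simplified rule that the server maximises s = q × qs, with hypothetical source quality factors qs = 1.00 for GIF and HTML and qs = 0.99 for PNG and XHTML, and with the client q values quoted above. A client making no explicit mention of either type is treated here as giving both q = 1, which glosses over the finer points of wildcard handling:

```latex
\begin{aligned}
&\text{no explicit mention: } && s_{\mathrm{GIF}} = 1 \times 1.00 = 1.00 \;>\; s_{\mathrm{PNG}} = 1 \times 0.99 = 0.99 && \Rightarrow \text{GIF}\\
&\text{Opera 7.01 (equal } q\text{): } && s_{\mathrm{GIF}} = 1 \times 1.00 = 1.00 \;>\; s_{\mathrm{PNG}} = 1 \times 0.99 = 0.99 && \Rightarrow \text{GIF}\\
&\text{Mozilla (} q_{\mathrm{GIF}} = 0.2\text{): } && s_{\mathrm{GIF}} = 0.2 \times 1.00 = 0.20 \;<\; s_{\mathrm{PNG}} = 1 \times 0.99 = 0.99 && \Rightarrow \text{PNG}\\
&\text{Mozilla (} q_{\mathrm{HTML}} = 0.9\text{): } && s_{\mathrm{HTML}} = 0.9 \times 1.00 = 0.90 \;<\; s_{\mathrm{XHTML}} = 1 \times 0.99 = 0.99 && \Rightarrow \text{XHTML}
\end{aligned}
```

The marginal qs difference decides the first two cases in favour of the conservative format, while Mozilla's larger q differences outweigh it.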
There are other ways of getting the desired effect, such as using slightly tweaked negotiation algorithms at the server (see documentation for Apache 2.0 for example), or by providing a copy of the conservative variant as a generic content-type (it has to be negotiated on some other axis, such as language or charset, in order to keep the negotiation machinery happy). But this page wasn't intended to be an exhaustive tutorial in server-side negotiation, so I'll stop there.
Win MSIE (versions 5 and 6 have been tried;
MS Office was also installed on the systems which were tested)
sent an Accept header which includes various specific image formats,
followed by MS Word, PowerPoint and Excel, followed by */*.
Note particularly that this does not include text/html.
This means, amongst other things, that the server does not get any indication of the client's relative preferences for HTML versus XHTML.
There's no advertised user interface for users to set a preference; so, as authors we have to work on the basis of most users having no influence over this.
Conclusion: in this kind of situation, even if you decide to implement content-type negotiation as an available option, it's essential to provide an alternative way of accessing the variants explicitly, to give the users a full range of choices.
MSIE sent a different Accept header on "Refresh" and similar
operations than it sent on initial retrieval.
In fact, on refresh/reload it sent only */*.
To take an example: if HTML and JPEG variants were available,
with the HTML variant assigned a higher source quality, then
on initial retrieval, Apache's handling of content negotiation
would send the JPEG
variant in response to IE's request, but on reloading, the
content negotiation would return HTML instead.
A rather surprising result!
Another consequence is that since such negotiated content would be sent with a Vary header showing that it had been negotiated on the basis of the Accept header, then by presenting a different Accept header on the subsequent refresh, any cache server en route is informed not to re-send the previously cached variant. So this also defeats cacheability.
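The caching consequence can be illustrated with a toy Python model of a shared cache honouring Vary: Accept. This is a deliberate simplification (real caches follow more detailed rules); the class name and the Accept strings used below are hypothetical.

```python
class VaryingCache:
    """Toy shared cache for responses sent with "Vary: Accept": a stored
    response is reused only for a later request whose Accept value
    matches the one it was stored under."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(url, accept):
        # Assumption: compare the header value verbatim; real caches
        # may normalise it first.
        return (url, accept)

    def get(self, url, accept):
        """Return the cached response, or None on a miss."""
        return self._store.get(self._key(url, accept))

    def put(self, url, accept, response):
        self._store[self._key(url, accept)] = response
```

A response stored under IE's long initial Accept value is therefore not found when the refresh arrives with a different Accept value, and the request has to go back to the origin server.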
(When the above was tried with document variants of MS Word and HTML, rather than JPEG and HTML, the behaviour was different: I'm assuming that this was because MS Word documents are not really opened in the browser, but rather, via some kind of plugin mechanism, after which the browser's Refresh function behaves differently.)
More "refresh" misbehaviour in MSIE is detailed later.
We've recently been seeing some browsers (e.g Mozilla)
tightening-up on their
content-type handling, and for example refusing to display in-line
images if delivered with the wrong content-type, or refusing to
implement a CSS stylesheet if sent with the wrong type.
This behaviour is entirely correct, and my respect to the Mozilla
developers for withstanding short-sighted user demands to take it out again.
This (almost incredibly?) revived an issue which I had first met
around 1996, before the CSS content-type was registered at IANA, of
web servers delivering .css files with a content-type of
Apparently some servers out there are still doing this in 2005!
As late as 2003, a web-site owner who had complained that his
stylesheets were being served-out as
reported his web service provider's reply
that they "don't support CSS and have no plans to in the future".
Talk about clueless?
Meantime there is a W3C note, Common
User Agent Problems from Feb 2001.
In section 3.2 under Protocols
the note explicitly describes the (mis)handling of text/plain
as anything other than plain text as wrong.
If you have some special kind of content, which is apt to be opened in an appropriate application - a viewer, say - then it makes a great deal of sense to send it with a characteristic content type. Regular users of such content can then set their browser options permanently to open this kind of content with that specific viewer, with or without the initial user dialog (the details vary from browser to browser, but most browsers offer this facility in one form or other).
I can't recommend trying to twist existing IANA-registered content
types to some other purpose, as I've seen some folks doing:
in my humble opinion it would be better all around if they
just invented their own (for safety, an invented content-type
should probably have an x- prefix).
For a while, versions of the Opera browser had a pair of radio buttons on the Preferences → File-types dialog:
The first choice is the one prescribed by RFC2616; the second one behaved somewhat like (but not the same as) MSIE, and is not compatible with RFC2616.
However, see this discussion from 2002, criticising the second option as a security vulnerability.
When I belatedly reviewed this issue again in reference to Opera 8.5, there was no sign of such a user choice any more.
Relevant to both of the subsequent sections, there exists a
Content-disposition header for the purpose of
proposing client behaviour in regard to a particular resource.
The header isn't officially part of the HTTP protocol, and
RFC2616 warns that the use of this header has "very serious"
security implications: for a long time it was customary
for HTTP user agents to disregard any such header.
However, the header has been increasingly
implemented in client applications (such as MSIE and web
browsers such as Mozilla and Netscape), and information providers
might give consideration (I'm not saying
which way their decision should fall!) to using it either in
conjunction with techniques mentioned below, or instead.
The use of a
Content-disposition value such as
attachment;filename=myfile.ext represents a proposal
for the client to download the file, even if the client would
normally display this content-type.
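For illustration, here is a minimal Python WSGI application that sends such a header. The body, the file name myfile.ext (taken from the example above) and the choice of text/plain are placeholders; the sketch just shows the honest Content-type accompanied by the proposal to download.

```python
def download_app(environ, start_response):
    """Hypothetical WSGI application proposing that the client save the
    body to disk as myfile.ext rather than display it."""
    body = b"example file contents\n"
    start_response("200 OK", [
        # The content is still described honestly...
        ("Content-Type", "text/plain"),
        # ...while Content-disposition proposes a download.
        ("Content-Disposition", "attachment;filename=myfile.ext"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

It remains a proposal: the client may still decide for itself what to do with the content.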
Indeed: for any of the content-types on the MSIE hit-list (see the notes above), when MSIE has recognised the Content-type which is coming from the server it will disbelieve it on principle (contrary to RFC2616), try to work out what it thinks the content really is, and then do whatever it thinks is proper with such content, regardless of the user's wishes.
When using a normal link, this is no big deal, since one can request a download explicitly (shift/click instead of normal click). If the resource is being returned as the result of a form submission, however, the user is presented with a bit of a puzzle.
One way to get around this is to send a content-type that MSIE doesn't know. Then MSIE (at least, that is what its own documentation, in effect, claims, and this does seem to be consistent with observations) will conform with RFC2616, and ask the user what they want to do with this content, i.e download to file or open with some application, just as a normal world-wide-web-compatible browser would do. For example, I defined a private content-type for a file that contained XML, and told MSIE to use Mozilla as a helper application for opening that content-type. It seemed to work fine, but it was only a brief test. However, a correspondent writes:
If you have a file of a well-known type (e.g. .pdf) and send it with a freely invented MIME-type (say application/xyz), MSIE still detects the type and opens it with the registered viewer (here: Acrobat Reader). The autodetection is only omitted if (a) the MIME-type is not one of the 26 known types AND (b) the MIME-type is in the registry at "
HKEY_CLASSES_ROOT\MIME\Database\Content Type". In this case the behavior depends on the registry entry for that type. So it still seems to be impossible to force a file to download (instead of display).
Also, beware that you might choose a content-type that turns up on a later edition of their hit-list. (See below for an idea for fooling MSIE about the filename extension of content generated by CGI or similar server-side processing). So if you really want to exploit this, choose something way-out as a content type.
I should admit that it's not really proper, in WWW terms, to use private content-types for describing well-known types of content: so it's a two-edged sword. The original idea was that the server would describe honestly the nature of the content they were making available, and leave it to the user to determine what they wanted to do with that content. All this talk of "forcing downloads" and generally choosing perverse options in order to fool MSIE into doing what we want rather than what MS want us to want, goes quite against the original intention of the WWW.
You may find some additional advice on downloading WWW files to MSIE in this rather amusing piece of unsolicited AOL mail which landed in my inbox in July 2002. I find it rather touching that an AOLer considers the great Microsoft to be in need of this kind of support. You'll notice the writer's obsession with making things happen in the way the programmer requires, rather than (as the WWW was designed) in the way that the recipient chooses. The implications for security shouldn't really need to be spelled out.
Nevertheless, as I remarked in the previous section: nowadays the use
of the Content-disposition header seems to be quite an
effective way of guiding the action of the browser towards the action
which the author wants to promote.
This again is a fairly common question asked in WWW forums: in spite
of the CGI script having supplied an appropriate Content-type such as
text/plain, MSIE decides to download
it instead of displaying it according to that content-type.
It's another consequence of MSIE's disbelief of server-provided
content-types, contrary to RFC2616: IE has, it seems, fooled itself by
trying to make sense of the filename extension on the CGI script.
In this kind of situation, there is a workaround which brings relief in
a number of cases, although there's no guarantee of it always
working reliably: at least it's better than nothing, unless/until MSIE
is fixed to follow the published rules; the technique
doesn't use anything that is otherwise invalid or improper.
This server-based workaround will be applicable in Apache or any other
CGI-conforming web server.
The trick is to supply a URL whose PATH_INFO component looks like a
file of the appropriate type. For example, if you were using a
CGI script called
foo.cgi to return a plain-text file, then
you might supply a PATH_INFO component like
producing a URL along the following lines:
(This assumes, of course, that you didn't need the PATH_INFO component for some other purpose. And it should be noted that some web servers - not, of course, Apache - have an incomplete implementation of the CGI specification which makes it impossible for the server to handle this usage of the PATH_INFO component.)
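The server-side half of the trick relies on the standard CGI split of the request path into SCRIPT_NAME and PATH_INFO. The Python sketch below mimics that split; recognising the script by a .cgi suffix, and the name report.txt, are assumptions for illustration only. The script itself can simply ignore PATH_INFO: its only job is to make the URL end in something that looks like a .txt file.

```python
def split_cgi_path(request_path, script_suffix=".cgi"):
    """Split a request path as a CGI-conforming server would: everything
    up to and including the script becomes SCRIPT_NAME, and the rest
    becomes PATH_INFO. (Assumes scripts are recognised by suffix.)"""
    segments = request_path.split("/")
    for i, segment in enumerate(segments):
        if segment.endswith(script_suffix):
            script_name = "/".join(segments[: i + 1])
            path_info = "/".join(segments[i + 1:])
            return script_name, ("/" + path_info if path_info else "")
    raise ValueError("no CGI script in path: " + request_path)
```

So a link to /cgi-bin/foo.cgi/report.txt still runs foo.cgi, while the last path segment that the browser sees is report.txt.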
An email correspondent writes to say that he had file downloading to MSIE working fine over http, but for some reason it stopped working when he tried to do it securely (https). After some discussion and searching, he located a blog entry which addressed exactly this issue, using PHP: here it is Joseph Scott's Blog: Making IE Accept File Downloads. Presumably the corresponding HTTP header could be generated by other server-side processing (CGI etc.) and produce equivalent results. Hope this may help someone.
(the answer turns out to be 'no': the problem was not specific to Big5, but was just an instance of a more general problem caused by IE's refusal to conform with interworking specifications.)
I encountered a Big5 (Taiwanese) page which when first loaded
was rendered as HTML, but when IE's Refresh button was pressed, was
unexpectedly displayed in the browser window as source.
After I described the observation here without knowing its
cause, a reader of this page emailed me to point out that
MSIE sends a different
Accept header on initial retrieval
of a URL than when it reloads it (that's true: it does),
and his hunch was that this may be the cause of the above effect.
That is actually an interesting theory, and indeed could be made the
basis of a crafty puzzle.
However, the mystery turned out to be
caused by something different.
On inspecting the source code of the original page, it
began with a
<!DOCTYPE...> and then there were
a couple of lines containing HTML comments.
After that came the
<HTML> which started the
main body of the document.
The consequence of this was that the HTML tag itself was just over
200 bytes from the start of the document.
Under these conditions, reloading the document caused WinIE 5.5
to treat the document as plain text, even though it had treated it
as HTML on initial loading.
Snipping-out the comments so that the HTML tag was located nearer to the start of the file, caused IE to treat it as HTML both on initial loading and on reloading.
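The effect can be modelled with a small Python sketch. The idea that IE looks for the HTML tag only within a fixed-size initial window is an inference from the observations above, and the 200-byte default is an assumption chosen to match them, not a documented value. The same model would account for the XHTML Appendix C case described earlier, where a block of leading comments pushes the html tag too far from the start.

```python
def html_tag_within(body, window):
    """Does an <html> start tag (any case) begin within the first
    `window` bytes of the document?"""
    pos = body.lower().find(b"<html")
    return 0 <= pos < window


def guessed_type_on_reload(body, window=200):
    """Hypothetical model of the observed reload behaviour: HTML only if
    the tag is found early enough, otherwise plain text."""
    return "text/html" if html_tag_within(body, window) else "text/plain"
```

With a DOCTYPE followed by a few lines of comments, the tag lands beyond the window and the model, like the observed browser, falls back to plain text; removing the comments brings it back within range.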
This section refers, except where otherwise stated, to the reloading button which MSIE denotes "Refresh", or its View> Refresh menu item (F5). Note that in the Jscript it's denoted by its rather more usual term, "reload".
In Oct. 2005 an email correspondent who had found this page wrote to report problems of the following kind. He had a server-side process which would, in due course, produce a PDF file, but he wanted to offer the reader some indication of progress while they waited for it. So at first he returned an HTML page, with some Jscript requesting a delayed refresh:
In due course, the server-side process completed, and the server-side script then sent the PDF file (with its proper Content-type header). This worked fine with any JS-enabled WWW-compatible browser; but not with MSIE, which displayed the data from the PDF file in the browser window, instead of displaying the rendered version of the PDF. On subsequent investigation, my informant reported that even a manual use of MSIE's Reload button would trigger this misbehaviour.
I confirmed this myself: when the first loading of a URL returned
an HTML page, and use of MSIE's Refresh
function returned a PDF file (with its correct Content-type header),
MSIE seemed to pay no attention to the new type of the
object, neither by the mandatory HTTP procedure nor by its own
misbegotten guesswork, but merely displayed the raw PDF data in the
browser window.
The same incorrect behaviour was also seen when the
document type returned on reloading was MS Word,
for example, or Excel.
On the other hand, the use of a delayed
in the HTML did not fall foul of this incorrect behaviour, and my
informant decided to use that instead.
Original materials © Copyright 1994 - 2006 by A.J.Flavell