In a post on the IE blog last week, Marc Silbey shares the structure of IE9’s new User-Agent string. In it he writes:
An important change for site developers to know is that IE9 will send the short UA string by default. This change improves overall performance, interoperability and compatibility. IE9 will no longer send additions to the UA string made by other software installed on the machine such as .NET and many others.
Specifically, IE9’s the User-Agent string will look like this: Mozilla/5.0 (compatible, MSIE 9.0; Windows NT 6.1; Trident/5.0. Contrast that my IE8 User-Agent which is: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.21022; .NET CLR 3.5.30729; MDDC; InfoPath.2; .NET CLR 1.1.4322; .NET CLR 3.0.30729). (I have no idea why there is a “Media Center PC 5.0″ identifier in my User-Agent string. I have a Dell laptop using Vista Home). The obvious benefit of the IE9’s User Agent string is that it is shorter. In fact, it’s over 70% smaller (63 bytes opposed to 216 bytes). Why the difference?
The difference is all the superfluous junk on the end of the User-Agent string. From IE4 until IE8 other programs installed on your machine could append an identifier string about themselves to the User Agent string IE would send. Junk such as different .NET runtimes and version numbers, toolbars and browser add-ons, media center strings, tablet string, Office plug-ins, and more would all appear in the User-Agent. Yes, moving to a shorter User Agent string will reduce the amount of bytes IE9 must send with each requests will improve performance. We’ve known about the benefits of reducing an HTTP request to try and make it fit in a single packet. In fact Yahoo has a performance rule around excessively sized cookies.
Of course, there is no way we would write an entire blog post about how the User-Agent string for IE9 is only 63 bytes and try to claim with a straight face that the reduced request size is the most impactful web performance improvement that IE9 brings to the table. You will see that the real benefit of IE9’s shorter User-Agent string, and perhaps the biggest web performance optimization in all of IE9, has to do with HTTP compression and caching.
Caching Compressed Responses
HTTP compression is one of the most impactful performance improvements you can deploy for a web application resulting in bandwidth savings of 50-80% for text resources. However compression makes things more complicated for shared caches like caching proxies. Consider what happens if a browser that supports compression requests a URL and the caching proxy receives a compressed version of the response. Then the caching proxy receives a request for the same URL from a different user. Can it send the compressed version? To solve this, HTTP/1.1 added the Vary header. The Vary header allows web servers to essentially say “To generate this response, we used the URL and the value of the following HTTP request headers.” Caching proxies could then store a copy of a response and know that “This copy is for this URL when requested with this HTTP request header value.” You can think of a caching proxy as having an internal table of what resources are cached and under what conditions to serve the resource. If the incoming request URL matches the URL for a cached resource, and if the value of each incoming HTTP header that was specified by the Vary response header match the values of the HTTP headers used in the original request for the cached resource, then the cache can service the new request locally by returning the cached resource.
What was the solution? Web servers would first look for the presence of an Accept-Encoding header to determine if the browser making the request thinks it can accepted compressed resources. If so the web server would next examine the User-Agent header to determine if see the requesting browser is known to have HTTP compression bugs. Only if the browser says it can accept compressed responses and it is not one of the browsers that has compression bugs will the web server serve a compressed version of the resource. (In retrospect, modifying a perfectly working web server to work around a buggy web browser seems silly. However in the age before automatic updating software this was a reasonable approach).
Unfortunately, this seemingly small change has enormous ramifications.
Different Strokes For Different Folks
Since the web server is varying the response based on both the Accept-Encoding header and the User-Agent header, it needs to indicate this to any downstream caching proxies by using a Vary: Accept-Encoding, User-Agent header. Since the web server could potentially serve a different response based on the User-Agent string, the caching proxy cannot serve the same cached response to web browser’s with different user agent strings. So instead of matching the URL and the value of the Accept-Encoding header on incoming request, now the caching proxy must match the URL, the value of the Accept-Encoding header, and the value of the User-Agent header to be able to return a cached response. To understand the impact of change, consider the following scenario.
Tom and Nick work at the same company and their web traffic passes through a shared caching proxy. Tom is using Apple’s Safari web browser and Nick is using Mozilla’s Firefox web browser. Both of these browsers support HTTP compression and neither have any known compression bugs. Tom visits http://example.com/index.html. The web server returns a compressed response with Cache-Control and Expires headers to enable caching of the resource, and a Vary: Accept-Encoding, User-Agent header to let downstream caches know what values were used to determine the response. The caching proxy stores a copy of the response using the URL http://example.com/index.html and the Accept-Encoding request header value of gzip as the key.
Nick then visits http://examples.com/index.html. The caching proxy sees the request and attempts to service it. The caching proxy does have a cached copy for the URL Nick is requesting and Nick’s request also has the same Accept-Encoding header value as the cached copy. However Nick’s request was made with Safari’s User-Agent string and the cached copy was stored for a request with Firefox’s User-Agent string (due to the Vary: User-Agent value from the originating web server). So even though both web browsers support compression, neither have any caching or compression bugs, and the caching proxy already has a cached response that is perfectly usable by any modern web browser, the caching proxy cannot serve Nick the cached response. Instead Nick’s request is passed on to the web server and that response is also cached. The shared cache now contains two separately cached yet identical responses.
The problem gets worse. Tim, who works at the same company as Tom and Nick, uses Google’s Chrome web browser to request http://example.com/index.html the caching proxy again is not able to locally service the cache. This is because Chrome has a different User-Agent string from Firefox or Safari. Thus the shared cache ends up with 3 separately cached yet identical responses. Caching Proxies went from have to story 2 copies of each distinct URL (one compressed, one uncompressed) based on the Accept-Encoding header, to storing 2 * X copies of a distinct URL, where X is the number of distinct User-Agents.
500 Different Ways To Say The Same Thing
It’s clear from this example that the addition of Vary: User-Agent to a response can significantly reduce the performance of shared caching. How much is dependent on how many distinct User agent strings. Obviously different web browsers will have different user agent strings and there is nothing we can do about that. But what about different User-Agent strings for the same version of the same browser? Sadly, there can be multiple different User-Agent strings for the same version of the same browser. Often this occurs when the operating system is included in the User-Agent string. This means the same version of the same browser can often have half a dozen different user agent strings. While this exasperates the Vary: User-Agent shared caching problem, it is still manageable.
Unfortunately Internet Explorer proceeds to destroy any chance of managing the problem. This is because IE does not have a half dozen or so different User-Agent strings. It has hundreds! Here is a list of over 480 different IE8 User-Agent strings. All of those browsers are Internet Explorer 8 but each has a different User-Agent string due to of all those external programs, toolbars, and browser add-ons that append on junk.
In short, due to the hundreds of different User-Agent variations for the same fundamental versions of Internet Explorer, and the (shrinking) majority market share of IE, currently the use of a Vary: User-Agent HTTP header effectively nullifies shared caching.
Starting To Fix The Problem
IE9’s adoption of a shorter User-Agent string without any inclusion of 3rd party identification banners will significant help matters. This change brings IE9’s User-Agent string in line with other web browser User-Agent strings which only have a few variable components. These variables are:
- Computer Architecture of the computer
- Version number of the web browser
- Operating System of the computer
- Language of Operating System
Computer Architecture is largely stabilizing on a few distinct values for each browser vendor such as i686 or x64. A changing version number of the web browser, include minor build numbers or patch numbers, is only included by a few web browsers (most notably Google Chrome). This can make for a large number of different User-Agent strings for the same major browser version. However automatic updates having largely marginalized this problem: the majority of user’s receive and use an updated version of a browser with 3 weeks of its release. The operating system of the computer also has only a few distinct values, except in the Linux world with different distributions tend to insert their name into the User-Agent string (Ubuntu and Debian being the worst offenders). Finally is the OS language. Ideally this should not be included in the User-Agent string at all, but in the Accept-Language header. The impact of including the language in the User-Agent string is largely a non issue as users behind a shared caching proxy (either instead an ISP or a corporation) tend to speak the same language.
The end result is IE9, and other major browsers, tend to only have 5 or 10 distinct User-Agent strings for each major version.
Why Even Use Vary: User-Agent?
This is perhaps the best question. There are no modern browsers that have HTTP compression issues. The biggest problem browser, IE6, had its HTTP compression issues solved nearly 8 years ago with the release of IE6 SP1 in spring of 2002. All major web browsers that send an Accept-Encoding header do legitimately support compression. There is simply no reason to ever include a Vary: User-Agent header when dealing with a cachable resource. However its easy to see why we have this problem. There are 10 years worth of web servers out there that are configured to inspect User-Agent strings. Just look at Apache’s documentation and you can see why. Below is the sample configuration for Apache’s compression module. It uses regular expressions on the User-Agent to detect bad browsers and then includes a Vary: User-Agent header in the recommended configuration for their compression module. This nullifies any caching of compressible and cachable resources.
<Location / > # Insert filter SetOutputFilter DEFLATE # Netscape 4.x has some problems... BrowserMatch ^Mozilla/4 gzip-only-text/html # Netscape 4.06-4.08 have some more problems BrowserMatch ^Mozilla/4.0 no-gzip # MSIE masquerades as Netscape, but it is fine # BrowserMatch bMSIE !no-gzip !gzip-only-text/html # Make sure proxies don't deliver the wrong content Header append Vary User-Agent env=!dont-vary </Location>
There is one situation where it is appropriate to have a Vary: User-Agent response header. Consider a dynamic HTML page where the web server might return different content to a mobile browser than it would for a desktop browser when serving the same URL. For those pages you would want to include a Vary: User-Agent header. However these dynamic pages are, by definition, not cachable and thus don’t harm shared caches like Caching Proxies. The rule is any resource that is cachable should not have a Vary: User-Agent header.
The Biggest Performance Improvement for IE9?
With widespread adoption of IE9, we move from a world where the major browsers (IE, Firefox, Chrome, Safari, Opera) are represented by several thousand distinct User-Agent strings to a world of roughly 50-100 distinct User-Agent Strings. That is a 10 or 30 times improvement or over an order of magnitude in improvement. We expect that shared caches like caching proxies should get roughly a 10-20 times improvement in hit rate for resources from web servers returning a Vary: User-Agent header.
Want to see what performance problems your website has? Cachable Resource with Vary:User-Agent is just one of the 300+ web performance issues Zoompf can detect when scanning your web applications. Get your instant free web performance assessment at Zoompf.com today or try our free performance scanning bookmarklet.