Poor Choices are Ruining the Web

Posted: February 21, 2012 at 8:18 pm

A recent article by John Naughton has sparked a debate inside the web design and web development community. Are designers, with their image-heavy designs, ruining the web?

The answer is yes, but it’s not why you think. It’s not because designers use big images or even that they use a lot of images. It’s because they are creating and using images poorly.

I get it. Your name is Chris and you spell it Criss. You have a Mac. You wear trendy clothes. You are an awesome web designer. I’m happy to have you on the team.

The problem is you suck at it. Not your art or your design. You are probably awesome at that. Believe me, I’m hardly qualified to judge art. What you suck at is taking your beautiful design and delivering it to your website’s visitors in an efficient way which creates an excellent experience. And that’s what’s ruining the web.

Now I want to see your awesome design. I truly do. You have a gift that I don’t: you can take ideas and visualize how they should look. Even better, you have the skill to express those design ideas in ways that I can only imagine. And often, all I can do is imagine, because when I visit your webpage, all I see is a white screen as my browser downloads megabytes of content.

As Aaron Gustafson says:

"Graphic designers are not ruining the web, but a lack of web professionalism is. Without proper training and an appreciation of the ramifications of each decision that goes into building a website, you more than likely won’t make the right decision regarding optimising the user experience. This isn’t print and it’s not television – bandwidth is a factor."

I couldn’t agree with Aaron more. What we need here is professionalism and ownership. Performance is not someone else’s responsibility. Performance is your responsibility. Your job doesn’t stop when you create that PSD file. You are creating a User Experience, which is far more than the visual characteristics of your design. You are responsible for the experience of actually engaging with the design.

This is why I’m so glad this discussion started in the middle of our Lose the Wait series. As we have seen, you can lose webpage wait time by shedding page weight. And images are contributing the largest portion of total page weight.

We at Zoompf have made posts and given presentations all about this in the past. There is a lot that can be done to make sure the experience you give to your visitors reflects all the effort and skill that went into the design.

But let’s apply some focus. Forget all the things that can be done to optimize images. Let’s focus on a single thing. It’s easy to do. It’s obvious to check if it has been done. It has serious and immediate effects. It’s even something you can do right now.

Removing Image Bloat

Image files can contain all sorts of data inside of them that has nothing to do with the rendering of the image. This should not be news. In fact, I even wrote about it last Friday. While the types of non-graphical data present vary with each file format, a few examples are:

  • Unused palette entries
  • Embedded thumbnails
  • Metadata
  • Comments
  • Application settings
  • Camera information

Take a photo of something? Edit it in Photoshop? Well, now it has an embedded thumbnail hitching along and taking up space. Be careful now! You might accidentally post naked pictures of yourself to the Internet if you aren’t careful. Talk about a user experience.

So how much can this help? Is this a tempest in a teapot? No. Research shows the average savings from losslessly optimizing an image is 15-20%. That means somewhere between 1 in 7 and 1 in 5 bytes of an image is wasted bloat.

All of this sounds kind of amateurish, doesn’t it? Is this something only novices do, or are professional designers at real websites doing this too?

Let’s try an experiment. Go check out Best Buy’s website. See those images? They are bloated by 32% with this non-graphical gunk. Nine months ago, Twitter had the same problem.

Fixing the Problem

There are two things we need to think about to fix this problem: finding a way to get rid of the bloat now, and finding a way to make sure we get rid of it consistently in the future.

The first part is easy. All the tools to optimize images are free. Even better, Chris Sullo of Nikto fame wrote Site Crunch, a script that lets you automatically run image optimization tools over your entire website.

The second part is more challenging. Bloated images get on a website, even when the designer knows better, because of your processes, or lack thereof. Marketing needs this image now, so it goes out the door fast and it doesn’t get optimized. Designers optimize their images, but the product catalog images they get from a 3rd party don’t get optimized. Organizations need clear policies and procedures about how images and other assets are placed on a production site. Optimization should be incorporated into that process. That is how to fix the problem long term. Ideally, it becomes an automated step in the publish-to-production process. How to do that is another post in and of itself.

Summary

I love designers. You do things I could never do and make the web a better place. But when you create your design without thinking about the other half of the equation, the actual experience of getting that content, you sell yourself and your design short. And that is what’s ruining the web.

Want to see what performance problems your website has? Unoptimized GIFs, PNGs, and JPEGs are just 3 of the nearly 400 performance issues Zoompf detects when testing your web applications. You can get a free performance scan of your website now. Need more performance goodness? Try our Zoompf WPO product.

Lose The Wait: Optimizing GIF Images

Posted: February 17, 2012 at 8:46 pm

Our Lose the Wait series is all about improving the performance of your web applications. As we have mentioned, a great way to lose the wait is to lose the weight, as in the weight of your page content. In our last post, we talked about using HTTP compression to reduce the size of the data that needs to be sent to the client, as well as the challenges that are involved. Now let’s shift our attention from text content to images.

Images make up the majority of content on the web. According to the HTTP Archive, the total size of an average web page and its supporting content is 968kB. Of that, 601kB, or 62% of the total amount, is image content. Understanding images and knowing how to optimize them are essential skills for any frontend performance advocate. To build them, we start by exploring the GIF image format and how to optimize it.

GIF is a lossless image format created by CompuServe in 1987. This makes GIF the oldest image format in common use on the web today. GIF is a palette-based image format, allowing an image to contain up to 256 distinct colors chosen from a possible 16,777,216 colors (2^24). GIF images also support binary transparency. This means a pixel of the image can be completely transparent, showing the background behind the image, or completely opaque, showing the color value for that pixel. GIFs can also contain multiple graphic images inside of a single file.

Due to their age, simple nature, and widespread support, GIFs are widely present on the Internet. According to the HTTP Archive, 33% of all images on the web use the GIF format. Given that prevalence, understanding how to optimize GIF images is an important part of understanding front-end performance.

The Structure of a GIF Image

To understand how to optimize GIF images, we need to first explore the structure of the image format to identify which areas can be streamlined or optimized.

GIF File Format Structure

A GIF file begins with a six-byte header identifying the file format and version. A seven-byte Logical Screen Descriptor follows, specifying the dimensions of the image, the number of colors used, and other flags. The next data structure in a GIF file is the Global Color Table. This defines the color values for each of the up to 256 distinct colors which can be referenced by the graphics data.
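
As a rough illustration of that layout, here is a minimal Python sketch that decodes the header and the Logical Screen Descriptor. The function name and the hand-built sample bytes are made up for this example, and it assumes the seven-byte descriptor layout from the GIF89a specification.

```python
import struct

def parse_gif_header(data: bytes) -> dict:
    """Decode the 6-byte header and the 7-byte Logical Screen Descriptor."""
    if len(data) < 13:
        raise ValueError("too short to be a GIF")
    signature, version = data[0:3], data[3:6]
    if signature != b"GIF":
        raise ValueError("not a GIF file")
    # Little-endian: width, height, packed flags, background index, aspect ratio
    width, height, packed, bg_index, aspect = struct.unpack("<HHBBB", data[6:13])
    has_gct = bool(packed & 0x80)
    # The low three bits store the table size as 2^(n+1) entries
    gct_entries = 2 ** ((packed & 0x07) + 1) if has_gct else 0
    return {
        "version": version.decode("ascii"),
        "width": width,
        "height": height,
        "has_global_color_table": has_gct,
        "global_color_table_entries": gct_entries,
        "global_color_table_bytes": gct_entries * 3,  # 3 bytes (R, G, B) per entry
    }

# Hypothetical sample: a 1x1 image announcing a 2-entry Global Color Table
sample = b"GIF89a" + struct.pack("<HHBBB", 1, 1, 0x80, 0, 0)
info = parse_gif_header(sample)
print(info)
```

Note that the table size is always a power of two, so the palette stored in the file can be larger than the number of colors the image actually uses.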

Now we get to how the content of an image is saved inside the GIF format. This happens in the Graphic Image Data sections. If the GIF is an animated GIF it will contain multiple Graphic Image Data sections for the different frames of animation. If it is a static, non-animated GIF, there is only one Graphic Image Data section. Each Graphic Image Data section is composed of a few other pieces of data. Obviously, it contains the graphics data that represent the image or a frame of animation. It can also optionally contain a local color palette that is specific to that piece of image data. This means that frame 1 of an animation can be composed of colors from a palette of 256 colors, and frame 2 can be composed of a different palette of colors. Local color palette data override the palette in the Global Color Table. In fact, the Global Color Table is actually optional, and each Graphic Image Data section can define its own palette. When you are dealing with a static image this is not really necessary, as it doesn’t matter whether the palette information is stored in the Global Color Table or in the single Graphic Image Data section. As we will see later, this can be used to optimize animation.

The graphics data inside of a Graphic Image Data section is a bitmap of pixels, where each pixel value is a reference into the color palette which defines the actual red, green, and blue values that represent the color. This pixel data can be sorted and stored in different ways to enable features like interlacing. All of this pixel data is compressed using the Lempel-Ziv-Welch (LZW) lossless data compression algorithm.

GIFs also have a number of other, optional data sections that can be present. Comments are a common section, as are so-called "Application Extension" sections. These sections can be used to store a variety of application-specific information. The most commonly implemented is the Netscape 2.0 Application Extension, which enables the looping of GIF animations. Adobe products store XMP image metadata inside of Application Extensions. Embedded thumbnails can be stored in Application Extensions as well. There are also lesser used GIF features which use additional data sections. For example, the Graphic Image Data section can contain other data sections such as a Plain Text Data section, which allows text to be rendered on top of an image, similar to a closed captioning or subtitling system for video files.
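
To make the block structure concrete, here is a small Python sketch that walks a GIF, counts its Graphic Image Data sections, and totals the bytes spent on each kind of extension. The function names and the tiny hand-built sample are invented for this example, it assumes the block markers defined in the GIF89a specification, and it does no error recovery on truncated files.

```python
import struct

EXTENSION_NAMES = {0x01: "plain text", 0xF9: "graphic control",
                   0xFE: "comment", 0xFF: "application"}

def skip_sub_blocks(data, pos):
    """Skip length-prefixed sub-blocks; return the offset after the 0x00 terminator."""
    while data[pos] != 0:
        pos += 1 + data[pos]
    return pos + 1

def color_table_bytes(packed):
    """Size of the color table announced by a packed-flags byte (0 if absent)."""
    return 3 * 2 ** ((packed & 0x07) + 1) if packed & 0x80 else 0

def summarize_gif(data):
    if data[:3] != b"GIF":
        raise ValueError("not a GIF file")
    pos = 13 + color_table_bytes(data[10])  # header + descriptor + global table
    frames, extension_bytes = 0, {}
    while pos < len(data) and data[pos] != 0x3B:  # 0x3B marks the end of the file
        if data[pos] == 0x21:  # extension introducer
            name = EXTENSION_NAMES.get(data[pos + 1], hex(data[pos + 1]))
            end = skip_sub_blocks(data, pos + 2)
            extension_bytes[name] = extension_bytes.get(name, 0) + end - pos
            pos = end
        elif data[pos] == 0x2C:  # image descriptor: one frame of graphics
            frames += 1
            pos += 10 + color_table_bytes(data[pos + 9])  # optional local palette
            pos = skip_sub_blocks(data, pos + 1)  # +1 skips the LZW code size byte
        else:
            raise ValueError("unexpected block at offset %d" % pos)
    return {"frames": frames, "extension_bytes": extension_bytes}

# Hypothetical sample: a 1x1 GIF carrying a comment left behind by an editor
comment = b"Created with ExampleTool"
sample = (
    b"GIF89a" + struct.pack("<HHBBB", 1, 1, 0x80, 0, 0)
    + b"\x00\x00\x00\xff\xff\xff"                              # 2-entry global color table
    + b"\x21\xfe" + bytes([len(comment)]) + comment + b"\x00"  # comment extension
    + b"\x2c" + struct.pack("<HHHHB", 0, 0, 1, 1, 0)           # image descriptor
    + b"\x02\x02\x44\x01\x00"                                  # LZW code size + pixel data
    + b"\x3b"                                                  # trailer
)
summary = summarize_gif(sample)
print(summary)
```

A frame count above one indicates an animation, and the per-extension totals show how many bytes in the file are not graphics data at all.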

GIF Optimization opportunities

Now that I’ve told you way more than you ever wanted to know about the internals of a GIF image, we can think about optimizing. There are several aspects of the GIF image format that create opportunities to reduce file size while retaining image quality:

  • The LZW compression algorithm has its roots in the 1970s. While impressive for its time, its performance and compression ratios have been eclipsed by more modern lossless compression algorithms. Using an image format with better compression produces smaller files.
  • Palettes can contain more color definitions than are actually used by the image.
  • The size of palette entries can be inefficient when the image contains fewer than 128 colors.
  • GIF comments, metadata, and (most) Application Extension sections don’t contribute to the rendering of the graphic data. This data can be removed.

Funnily enough, the GIF specification even suggests avoiding Application Extensions, saying:

[Not using Application Extensions] is recommended in favor of using Application Extensions, which become overhead for all other applications that do not process them.

“Overhead” is just a fancy way of saying wasted bytes!

GIF vs. PNG

PNG is also a lossless image format, and it very closely mirrors GIF. The PNG format was defined nearly 10 years after the GIF format, allowing PNG images to address many of the shortcomings of GIF images, such as:

  • PNG’s DEFLATE algorithm achieves better compression than GIF’s LZW algorithm
  • PNG images support a precompression filter step, which rearranges graphic data before compression to maximize redundancy and thus improve the efficiency of DEFLATE compression. GIF does not have this feature.
  • The size of PNG palette entries can be smaller than the corresponding GIF palette entries on images with fewer than 256 colors.
  • PNG’s ancillary data sections support compression allowing their overall size to be reduced.
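
The effect of the precompression filter step can be sketched in a few lines of Python. This is only an illustration, not how any particular encoder works: it builds a synthetic 8-bit grayscale gradient (the image dimensions and the gradient formula are arbitrary choices for this example), applies PNG's "Sub" filter to each row by hand, and compares the DEFLATE output sizes using the standard zlib module.

```python
import zlib

WIDTH, HEIGHT = 256, 64  # arbitrary size for a synthetic 8-bit grayscale image

def gradient_row(y):
    """One row of a smooth diagonal gradient."""
    return bytes((3 * x + 5 * y) % 256 for x in range(WIDTH))

def no_filter(row):
    """PNG filter type 0: store the row unchanged."""
    return bytes([0]) + row

def sub_filter(row):
    """PNG filter type 1 (Sub): store each byte as the difference from its left neighbor."""
    out = bytearray([1])
    previous = 0
    for value in row:
        out.append((value - previous) % 256)
        previous = value
    return bytes(out)

rows = [gradient_row(y) for y in range(HEIGHT)]
unfiltered_size = len(zlib.compress(b"".join(no_filter(r) for r in rows), 9))
filtered_size = len(zlib.compress(b"".join(sub_filter(r) for r in rows), 9))

print("unfiltered:", unfiltered_size, "bytes")
print("Sub filter:", filtered_size, "bytes")
```

After filtering, almost every byte in a row is the same small difference value, which is exactly the kind of redundancy DEFLATE exploits.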

Additionally, programs that convert from a GIF image to a PNG image, such as gif2png, focus only on converting graphical data. Comments, metadata, embedded thumbnails, Application Extensions, and other non-graphical information present in a GIF are not transferred over into the resulting PNG. In other words, converting from a GIF to a PNG sheds all this "excess baggage" in addition to better compressing the graphical data. All of these factors mean that converting a GIF image to a PNG almost always results in a smaller image.

Bigger as PNG?

PNG images can sometimes be larger than the source GIF image. There are a few reasons for this, some of which can be fixed. Ensure the PNG you convert to has an 8-bit color depth. PNG supports millions of distinct colors in an image, whereas GIF only supports 256 distinct colors. Since you are converting from a source with a maximum of 256 colors, there is no need to support a larger color depth. If you do, it can needlessly result in a larger file.
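
One way to check this after a conversion is to read the PNG's IHDR chunk, which records the bit depth and color type. The sketch below is a minimal Python illustration; the function name and the truncated sample bytes are made up for this example, and it relies only on the IHDR layout from the PNG specification, where color type 3 means a palette-based image.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_color_info(data: bytes) -> dict:
    """Read bit depth and color type from the IHDR chunk at the start of a PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("IHDR must be the first chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "is_palette": color_type == 3,  # 3 = indexed color, like a GIF
    }

# Hypothetical sample: signature plus the IHDR of a 1x1, 8-bit palette image
# (truncated after IHDR's fields, so it is not a complete, viewable PNG)
sample = PNG_SIGNATURE + struct.pack(">I4sIIBBBBB", 13, b"IHDR", 1, 1, 8, 3, 0, 0, 0)
info = png_color_info(sample)
print(info)
```

A PNG converted from a GIF that reports `is_palette` as false, or a bit depth above 8, is a candidate for re-conversion.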

It is still possible for GIF images to be smaller than PNG images. This usually only occurs for extremely small images, where the actual graphical data is quite small. For these images, the overhead of various headers and sections inside of the GIF or PNG image contribute to the overall size more so than the graphics data. In this case, very simple GIFs can be smaller than PNGs. In my experience, these extremely small GIF images are used by websites as spacer images or as the response for web services beacons. You shouldn’t be using spacer GIFs at all, and you should be returning HTTP 204 No Content responses instead of using tiny images. In short, if you find any GIFs on your website that are smaller as GIFs than PNGs, you are probably doing something else that is wrong.
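
For the beacon case, returning a 204 takes very little code. Here is a minimal sketch as a Python WSGI application; the function name is made up, the logging step is left as a comment, and the Cache-Control header is an optional assumption of this example rather than something the approach requires.

```python
def beacon_app(environ, start_response):
    """Answer a tracking beacon with an empty 204 instead of a tiny image."""
    # A real beacon would record environ["QUERY_STRING"] here (hypothetical step).
    # Assumption: beacon hits should not be cached, so every hit reaches the server.
    start_response("204 No Content", [("Cache-Control", "no-store")])
    return [b""]
```

For local experimentation this can be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, beacon_app).serve_forever()`.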

The Savings of Converting to PNG

To determine the average savings of converting to PNG, I extracted the 2547 GIF images that were recently downloaded by Zoompf’s scanner on behalf of users of our free performance scanning service. I then converted these from GIF to PNG using OptiPNG, which conveniently converts to a PNG image and then attempts to losslessly optimize the PNG image in a single step. The median size of the GIF images was 6900 bytes. The median size of the resulting PNG images was 5546 bytes. The median savings from converting all GIFs to PNGs was 21.07%. That’s a pretty awesome result, and it aligns nicely with Stoyan’s analysis from 2009. You can download my test results here.

Animated GIFs

So far we have been talking about static GIFs, but what about animated GIFs? Luckily, the use of annoying, eye-bleeding under construction GIFs has passed us by (a fact for which I, and all of you, should thank whatever God or gods you pray to, every single day). However, animated GIFs are still with us, thanks to the ever-present status thumper!

While converting to PNG images provides a simple, easy way to bulk optimize GIFs, what can be done for animated GIFs? PNG does not support animation, so that is not an option.

Before you can optimize a GIF you need to know whether it’s animated or not. Ideally we want an easy way to do this from the command line so we can sort and optimize our GIF images and animations in bulk automatically as part of a script. Luckily the identify command of ImageMagick can help us out. The following command does the job:

identify [file] | head -n 1 | grep "\] GIF"

This runs identify on the file and looks for the text which indicates that the GIF contains multiple Graphic Image Data sections and thus is an animation. Now that we can sort the wheat from the chaff, we can start optimizing animated GIFs.

The list of optimizations that can be applied to animated GIFs is, in many ways, a superset of the optimizations for a static GIF. While you can’t gain the advantage of PNG’s DEFLATE compression algorithm, you can:

  • Remove metadata, or unused palette entries from a GIF and write a better optimized GIF.
  • Combine or generalize local palette information in individual Graphic Image Data sections into the Global Color Table.
  • Reuse existing animation frames.
  • Minimize what is changing between animation frames, reducing the size of the different Graphic Image Data sections.

Gifsicle is a great command line tool which can, among other things, perform several of these optimizations automatically. It’s free, open source, and easy to use. Because it’s a command line tool, Gifsicle can be scripted for bulk optimization operations or bundled into build scripts. To optimize a GIF, this is all it takes:

gifsicle -O2 orig-animation.gif -o new-animation.gif

While Gifsicle can automatically optimize GIFs, it can only do some of the optimizations discussed above with varying degrees of success. For example, Gifsicle does a great job detecting and combining duplicate local palette information into the Global Color Table. It does a good job removing GIF comment sections, but does a poor job detecting and removing Application Extensions, embedded thumbnails, and XMP metadata. It does a reasonable job trying to optimize differences between animation frames. Better optimization can be achieved by manually reviewing the animation to detect frames which can be reused as well as reduce the total changes applied to an image between animation frames.

I don’t want to sound like I’m criticizing Gifsicle. Detecting and applying the optimizations discussed above is not easy to do, let alone automate. Gifsicle is an awesome tool and everyone should use it. I just want to be clear that optimizing animated GIFs is not as easy or straightforward as stripping metadata from a static GIF. Manual examination or optimization may be required to get the results you expect.

Conclusions

There are numerous aspects of the GIF image format which allow for lossless optimizations to reduce file size while maintaining image quality. Converting to PNG is a nice, universal way to do this. Animated GIFs, however, cannot be converted. Instead you can use Gifsicle to automate some of the optimization and apply hand optimization techniques for the rest.

Want to see what performance problems your website has? Unoptimized GIF Image and Unoptimized Animated GIF Image are just 2 of the nearly 400 performance issues Zoompf detects when testing your web applications. You can get a free performance scan of your website now and take a look at our Zoompf WPO product at Zoompf.com today!

How Fast is… Target.com

Posted: February 17, 2012 at 6:59 pm

Our regular video series How Fast Is…? examines real world websites and details the cause of their performance issues as well as what should be done to solve them. After all, the best way to learn about front-end web performance is to see what other people are doing right and doing wrong. In this edition of How Fast Is…? we analyze online retailer Target.

Why Target.com?

Because they are the perfect target, obviously! In all seriousness, I wanted to look at Target for a few reasons. The first is that they are a huge and extremely successful retailer with both an online store and traditional brick and mortar locations. When anyone has significant resources to invest in web performance, and would see a huge financial return by doing so, it’s always interesting to see if they do.

The other reason to look at Target is that Ron Johnson helped to revitalize their stores and brand in the 1990s. Ron Johnson is the guy who helped launch Apple’s incredibly successful retail stores. This is a man who understands the importance of a positive user experience, and I wanted to see if the ideals he instilled at Target remained.

Implementation Issues

In this video, I mainly focus on implementation issues that Target has. They are doing a lot of web performance optimizations. Target’s problem is that they aren’t implementing these consistently and uniformly across the site. Specifically we discuss some implementation problems they have around domain sharding, consistent URL naming, caching, and combining files.

Best Quote from Video: “Because 2 + 2 + 2+ 2 always adds up to…. whatever it adds up to. I forgot how many 2′s I said…”

Know a site we should make a video about? Contact us and you may see a future episode about it.

Lose the Wait: HTTP Compression

Posted: February 10, 2012 at 4:25 pm

One of the ways you can improve website performance is to reduce the amount of data that needs to get delivered to the client. An easy way to reduce the amount of data sent to a client is to compress the content and then transfer it to the client. This can be done with HTTP compression. Despite being a surprisingly simple feature of HTTP, there are numerous challenges which must be addressed to properly use HTTP compression. These challenges are:

  1. Ensuring you are only compressing compressible content.
  2. Ensuring you are not wasting resources trying to compress uncompressible content.
  3. Selecting the correct compression scheme for your visitors.
  4. Configuring the web server properly so compressed content is sent to capable clients.

In this post, part of our Lose the Wait performance series, I will discuss each of these issues and demonstrate how to configure your web server to implement HTTP compression properly.

Compressing Compressible Things

Let’s start out easy. What should HTTP compression get applied to? The answer is simple: Any content which is not already natively compressed.

Notice I didn’t say "text resources." Text resources, like HTML, CSS, and JavaScript certainly should be compressed because they are not natively compressed file formats. Unfortunately, most people seem to focus on these 3 types of files. In fact, a quick web search shows that most of the top results for ".htaccess compress" include instructions only on compressing HTML, CSS, and JavaScript files. This just reinforces what I’ve said before; you have to be careful where your advice comes from.

Here is a list of common text resource types on the web which should be served with HTTP compression:

  • XML. XML is structured text used in standalone files (like Flash’s crossdomain.xml or Google’s sitemap.xml) or as a data format wrapper for API calls.
  • JSON. JSON is a subset of JavaScript used as a data format wrapper for API calls.
  • News feeds. Both RSS and Atom feeds are XML documents.
  • HTML Components (HTC). HTC files are a proprietary Internet Explorer feature which package markup, style, and code information used for CSS behaviors. HTC files are often used by polyfills such as Pie or iepngfix.htc to fix various problems with IE or to back port modern functionality.
  • Plain Text. Plain text files can come in many forms, from README and LICENSE files, to Markdown files. All should be compressed.
  • Robots.txt. Robots.txt is a specific text file used to tell search engines what parts of the website to crawl. Robots.txt is often forgotten since it is not usually accessed by humans and does not appear in JavaScript-based web analytics logs. Since robots.txt is repeatedly accessed by search engine crawlers and can be quite large, it can consume large amounts of bandwidth without your knowledge.

ICO

As I said, HTTP compression isn’t just for text resources and should be applied to all non-natively compressed file formats. What do I mean by this?

As an example, let’s look at ICO files. ICO is an image format originally used for icon images on Windows. The format, as it is in use today, was created over 20 years ago for Windows 3.0. Today, ICO files are used on the web as Favicons for a website, usually displayed in the address bar or browser tab. While modern browsers allow other file formats besides ICO, support is not universal. Many sites continue to use ICO files as Favicons for compatibility reasons.

Despite being an image, ICO files are not natively compressed. ICO images are actually a primitive version of a BMP image. Neither ICO nor BMP image formats are natively compressed. While you can (and should) avoid using BMP images on your website, you can’t do this with ICO files. Be sure to configure your web server to serve ICO images with HTTP compression.

SVG

SVG images are another example of an image format which is not natively compressed. SVG images are just XML documents, but they have a different MIME type and file extension. This means that while someone might remember to compress XML documents, they can forget to compress SVG documents.

You might be using SVG images on your website and not even know it. This is because of a feature of SVG images, SVG fonts, which allows SVG files to contain font glyphs used to render text. These SVG image-that-is-really-a-font files can be referenced in CSS using the @font-face syntax, much like an OTF or WOFF font file. Divya Manian has written a comprehensive post about the pros and cons of SVG fonts. For the purposes of this discussion the main take-away from her post is that, until iOS 5, SVG fonts were the only type of custom font supported by iPhone, iPad, and iPod Touch.

Font support is, to put it nicely, a giant mess. Font libraries abstract this away from the web developer and serve the correct format, including SVG fonts, to the correct browser. This means your website can be using SVG without you even knowing it. Remember to serve your SVG files using HTTP compression.

Compressing already compressed content

Another mistake developers make with HTTP compression is using it on content that is already natively compressed. Applying compression to something that is already compressed doesn’t help improve performance. In fact, it can hurt performance in two ways.

First, HTTP compression has a cost. The web server has to take the content, compress it, and then send it to the client. If the content cannot be compressed further, you are just wasting CPU doing a meaningless task.

Secondly, applying HTTP compression to something that’s already compressed doesn’t make it smaller. In fact, the overhead of adding headers, compression dictionaries, and checksums to the response body actually makes it bigger, as shown in the figure below:

Do websites actually do this? Yes, and it’s more common than you would think. I used Zoompf WPO to examine Fox News. Fox News is the 40th most visited website in the United States. As you can see, Fox News is mistakenly applying HTTP compression to PNG images.

This not only wastes CPU, but also increases the size of the PNG images delivered to Fox News visitors by a few dozen bytes:

Zoompf actually has two different checks for this issue. The first check "Compressed Content served with HTTP compression" alerts you that you are wasting CPU time compressing something that is already compressed. The second check, "Bigger with HTTP Compression" identifies content that is actually larger when served using HTTP compression.
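
The "bigger with compression" effect is easy to reproduce locally. The following Python sketch is only an illustration: it uses random bytes as a stand-in for an already-compressed payload such as a PNG or JPEG, and the repeated HTML snippet is made up for this example.

```python
import gzip
import os

# Repetitive markup compresses very well
text = b'<li class="item">Hello, world</li>\n' * 200
# Random bytes stand in for content that is already compressed
already_compressed = os.urandom(len(text))

text_out = gzip.compress(text)
random_out = gzip.compress(already_compressed)

print("text:              ", len(text), "->", len(text_out), "bytes")
print("already compressed:", len(already_compressed), "->", len(random_out), "bytes")
```

The second payload comes out slightly larger than it went in, because the GZIP header, block overhead, and checksum are added to data that could not be shrunk.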

Both of these problems usually are the result of a configuration problem with the web server or an inline network device. Something in your environment is applying HTTP compression to all outbound content instead of only content that should be compressed.
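
The rule a server or network device should apply can be sketched as a simple decision function. This Python example is a rough illustration rather than any real server's logic: the function name is made up, and the MIME type list is an assumption drawn from the formats discussed in this post, not an exhaustive registry.

```python
# Non-text formats that are NOT natively compressed (illustrative list)
COMPRESSIBLE_TYPES = {
    "application/javascript",
    "application/json",
    "application/xml",
    "application/rss+xml",
    "application/atom+xml",
    "image/svg+xml",             # SVG is just XML
    "image/x-icon",              # ICO favicons
    "image/vnd.microsoft.icon",
    "image/bmp",
}

def should_compress(content_type: str) -> bool:
    """Decide whether a response with this Content-Type deserves HTTP compression."""
    # Drop parameters such as "; charset=utf-8" and normalize case
    mime = content_type.split(";")[0].strip().lower()
    # HTML, CSS, plain text, HTC (text/x-component) and friends are all text/*
    return mime.startswith("text/") or mime in COMPRESSIBLE_TYPES

for sample in ("text/html; charset=utf-8", "image/svg+xml", "image/x-icon",
               "image/png", "image/jpeg"):
    print(sample, "->", should_compress(sample))
```

Using an explicit allow list means that PNG, JPEG, GIF, and other natively compressed formats are skipped by default instead of needing to be excluded one by one.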

GZIP Vs. DEFLATE

So far, we have talked about HTTP compression as if it is an opaque or atomic feature. But that is not the case. HTTP simply defines a mechanism for a web client and web server to agree on a compression scheme that can be used to transmit content. This is accomplished using the Accept-Encoding and Content-Encoding headers. There are two commonly used HTTP compression schemes on the web today: DEFLATE and GZIP.

DEFLATE is a patent-free compression algorithm for lossless data compression. There are numerous open source implementations of the algorithm. The standard implementation library most people use is zlib. The zlib library provides functions for compressing and decompressing data using DEFLATE/INFLATE. The zlib library also provides a data format, confusingly named zlib, which wraps DEFLATE compressed data with a header and a checksum.

GZIP is another compression library which compresses data using DEFLATE. In fact, most implementations of GZIP actually use the zlib library internally to conduct DEFLATE/INFLATE compression operations. GZIP produces its own data format, confusingly named GZIP, which wraps DEFLATE compressed data with a header and a checksum.

Unfortunately, the HTTP/1.1 RFC does a poor job when describing the allowable compression schemes for the Accept-Encoding and Content-Encoding headers. It defines Content-Encoding: gzip to mean that the response body is composed of the GZIP data format (GZIP headers, deflated data, and a checksum). It also defines Content-Encoding: deflate but, despite its name, this does not mean the response body is a raw block of DEFLATE compressed data. According to RFC-2616, Content-Encoding: deflate means the response body is:

[the] "zlib" format defined in RFC 1950 [31] in combination with the "deflate" compression mechanism described in RFC 1951 [29].

So, DEFLATE, and Content-Encoding: deflate, actually means the response body is composed of the zlib format (zlib header, deflated data, and a checksum).
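
The three formats can be compared side by side with Python's zlib module, which selects the wrapper through its wbits parameter (negative for raw DEFLATE, 15 for the zlib format, 31 for the GZIP format). This is just a sketch to show the wrappers; the helper name and the sample body are made up for this example.

```python
import zlib

def deflate(data: bytes, wbits: int) -> bytes:
    """Compress with DEFLATE, letting wbits choose the wrapper format."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()

body = b"Hello, HTTP compression! " * 40

raw = deflate(body, -15)          # raw DEFLATE stream (RFC 1951), no wrapper
zlib_wrapped = deflate(body, 15)  # zlib format (RFC 1950): header + Adler-32 checksum
gzip_wrapped = deflate(body, 31)  # GZIP format (RFC 1952): header + CRC-32 and length

print("raw DEFLATE:", len(raw), "bytes")
print("zlib format:", len(zlib_wrapped), "bytes, starts with", zlib_wrapped[:2].hex())
print("GZIP format:", len(gzip_wrapped), "bytes, starts with", gzip_wrapped[:2].hex())
```

All three outputs carry the same compressed payload. Only the bytes wrapped around it differ, which is why a client has to know which of the three it has been handed.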

This idea that the "deflate" identifier doesn’t mean raw DEFLATE compressed data was rather confusing. Early versions of Microsoft’s IIS web server were programmed to return raw DEFLATE compressed data for Accept-Encoding: deflate requests instead of a zlib formatted response. And naturally, versions of Internet Explorer at the time expected responses with a Content-Encoding: deflate header to have raw DEFLATE response bodies.

As Mark Adler, one of the authors of zlib, explains in this Stack Overflow thread:

However early Microsoft servers would incorrectly deliver raw deflate for "Deflate" (i.e. just RFC 1951 data without the zlib RFC 1950 wrapper). This caused problems, browsers had to try it both ways, and in the end it was simply more reliable to only use GZIP.

As Mark says, browsers receiving Content-Encoding: deflate had to handle two possible situations: the response body is raw DEFLATE data, or the response body is zlib wrapped DEFLATE. So, how well do modern browsers handle raw DEFLATE or zlib wrapped DEFLATE responses? Verve Studios put together a test suite and tested a huge number of browsers. The results are not good.
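
"Trying it both ways" looks roughly like the following Python sketch. This is not how any particular browser implements it; it is a minimal illustration using the zlib module, with a made-up function name, that first attempts the zlib format the RFC calls for and falls back to raw DEFLATE if the header check fails.

```python
import zlib

def decode_deflate_body(body: bytes) -> bytes:
    """Decode a Content-Encoding: deflate body that may or may not have a zlib wrapper."""
    try:
        # What RFC 2616 actually specifies: zlib header, DEFLATE data, checksum
        return zlib.decompress(body)
    except zlib.error:
        # What early servers sent instead: a raw DEFLATE stream with no wrapper
        return zlib.decompress(body, -zlib.MAX_WBITS)

original = b"Hello, HTTP compression! " * 40

wrapped = zlib.compress(original)
raw_compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = raw_compressor.compress(original) + raw_compressor.flush()

print(decode_deflate_body(wrapped) == original)
print(decode_deflate_body(raw) == original)
```

Every client has to carry this kind of guesswork to be safe, which is a large part of why GZIP became the more dependable choice.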

All those fractional results in the table mean the browser handled raw-DEFLATE or zlib-wrapped-DEFLATE inconsistently, which is really another way of saying "It’s broken and doesn’t work reliably." This seems to be a tricky bug that browser creators keep re-introducing into their products. Safari 5.0.2? No problem. Safari 5.0.3? Complete failure. Safari 5.0.4? No problem. Safari 5.0.5? Inconsistent and broken.

Sending raw DEFLATE data is just not a good idea. As Mark says "[it's] simply more reliable to only use GZIP."

It should also be noted that all browsers that support DEFLATE also support GZIP, but not all browsers that support GZIP support DEFLATE. Some browsers, such as Android’s, don’t include deflate in their Accept-Encoding request header. Since you are going to have to configure your web server to use GZIP anyway, you might as well avoid the whole mess with Content-Encoding: deflate.
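
On the server side, a GZIP-only policy amounts to a small piece of negotiation logic. The Python sketch below is an assumption-laden illustration, not the behavior of any specific web server: the function name is made up, and it simply answers "gzip" when the client accepts it and otherwise falls back to an uncompressed response, never choosing deflate.

```python
def choose_encoding(accept_encoding: str):
    """Return "gzip" if the client accepts it, else None (send uncompressed)."""
    for item in accept_encoding.split(","):
        parts = item.strip().split(";")
        coding = parts[0].strip().lower()
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        # "x-gzip" is the older alias for gzip; q=0 means "do not send this"
        if coding in ("gzip", "x-gzip") and quality > 0:
            return "gzip"
    return None

for header in ("gzip, deflate", "deflate", "gzip;q=0, deflate", ""):
    print(repr(header), "->", choose_encoding(header))
```

A client that only advertises deflate gets an uncompressed response under this policy, which is slower but never broken.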

Luckily, avoiding DEFLATE isn’t all that difficult.

The Apache module which handles all HTTP compression is mod_deflate. Despite its name, mod_deflate does not support deflate at all. It’s impossible to get a stock version of Apache 2 to send either raw DEFLATE or zlib wrapped DEFLATE. Nginx, like Apache, does not support deflate at all. It will only send GZIP compressed responses. Sending an Accept-Encoding: deflate request header will result in an uncompressed response.

Microsoft’s IIS web server can send both gzip and deflate responses and you can enabled or disable each scheme individually. For IIS6, you can , you can edit the metabase to disable DEFLATE support. For IIS7, you can disable DEFLATE support by editing the DEFLATE compression scheme section in the <schemes> element of the <httpCompression> element of the various IIS7 .config files.

Both Zoompf’s free and commercial products have a check built-in, “Obsolete Compression Format”, which will detect if your web server is sending content compressed with DEFLATE.

Netscape 4 and Internet Explorer 6 Are Screwing You. Again.

So by now you should have your web server configured to:

  1. Properly compress what needs to be compressed.
  2. Avoid compressing already compressed content.
  3. Only use GZIP.

Now you need to ensure that your configuration is not actually excluding perfectly capable browsers.

While HTTP compression is a mature feature today, there were some problems early on. Netscape 4 only supported HTTP compression for HTML documents even though it sent an Accept-Encoding: deflate, gzip for all requests. Serving it HTTP compressed CSS or JS documents would make it crash. For reasons that aren’t quite clear, the developers of Apache decided to address this client-side bug with a server-side fix. They added the following seemingly harmless line into the Apache configuration file:

BrowserMatch ^Mozilla/4 gzip-only-text/html

Any browser calling itself Mozilla/4 would only receive HTTP compressed HTML files. Since Apache was and is the most popular web server on the Internet, this caused enormous problems which still affect us today.

First of all, this was the middle of the browser wars and Internet Explorer 4, Internet Explorer 5 and even Internet Explorer 6 all identified themselves as Mozilla/4 in their User-Agent strings. But these browsers could accept HTTP compression for non-HTML responses. Trying to patch around one buggy browser caused another to be slow! Since IE6 would ultimately achieve over 95% market share, it was a problem that IE6 would download webpages more slowly from Apache than from other web servers. To resolve this, the Apache developers were forced to add another configuration directive:

BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

This line means: if the User-Agent has MSIE in it, then turn off the no-gzip and gzip-only-text/html options, thereby instructing Apache to use HTTP compression for all responses if IE asked for it. And all was good, until it wasn’t.

You see, IE6 on Windows XP also had multiple problems with HTTP compression. Most of these issues dealt with compressed CSS or JavaScript files being cached as compressed items, which were then read from the cache as if they were not HTTP compressed. So again another Mozilla/4 browser had problems with compression, and so again the Apache developers had to "fix" the issue with another configuration directive:

BrowserMatch \bMSIE\s6 gzip-only-text/html

This directive instructed the web server to only send compressed content for HTML responses if the browser was IE6. While this dealt with the majority of the issues, some of these bugs caused so many extreme edge-case problems that, for reliability reasons, larger sites would disable HTTP compression for IE6 entirely:

BrowserMatch \bMSIE\s6 no-gzip

Eventually Microsoft fixed these issues with hot fixes and, comprehensively, with Windows XP Service Pack 2. But this created a fragmentation problem, where some IE6 browsers could handle HTTP compression for all content, and some could not. Another rule was added in an attempt to serve compressed content to IE6 browsers that had SP2 installed. This was done by looking for the poorly named SV1 identifier in IE6’s User-Agent string:

BrowserMatch "^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1; SV1" !no-gzip !gzip-only-text/html

This chain of "deny this, but not this, unless it’s this, but not if it is also this" directives made configuring a web server to properly serve compressed documents to the appropriate browsers difficult and prone to error. Since these bug/solution cycles happened numerous times over several years, these configuration directives mutated. Blog posts from 2004 would tell you to do one thing and blog posts from 2006 would say another. Much like a child’s game of telephone, shortcomings, errors, and missing edge and corner cases were magnified as people reused old configuration files and shared the "correct" advice. Even today, many of the top Google search results for configuring HTTP compression for Apache using mod_deflate contain different and incorrect directives.

As I wrote in Advice on Trusting Advice, it all comes down to where you get your advice from. Follow the advice on this top search result and IE9+ gets no compression at all. Follow the advice on this top search result and IE6 gets no compression at all. Follow the advice from this search result and no version of IE will get anything using HTTP compression, except for IE7. Follow the advice from IBM and no version of IE will ever get a non-HTML file using HTTP compression.

Depending on which directives were used, and how the match criteria are configured, you end up with one of several possible scenarios:

  • HTTP compression is completely disabled for all Mozilla/4 browsers.
  • HTTP compression is completely disabled for IE6.
  • HTTP compression is completely disabled for IE6 except SV1.
  • HTTP compression is completely disabled for all versions of IE.
  • HTTP compression is completely disabled for all versions of IE except IE6 (so no compression for IE > 6).
  • HTTP compression for non-HTML files is disabled for all Mozilla/4 browsers.
  • HTTP compression for non-HTML files is disabled for IE6.
  • HTTP compression for non-HTML files is disabled for IE6 except SV1.
  • HTTP compression for non-HTML files is disabled for all versions of IE.
  • HTTP compression for non-HTML files is disabled for all versions of IE except IE6 (so no compression for IE > 6).
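To see how easily these outcomes arise, here is a rough Python simulation of how a chain of BrowserMatch directives accumulates. It assumes each matching directive sets or unsets variables in order, uses Python's re module as a stand-in for Apache's regex engine, and uses representative User-Agent strings that I picked for illustration. The rule list uses the no-gzip variant of the IE6 rule.

```python
import re

# Each rule: (pattern, variables to set, variables to unset), applied in order.
RULES = [
    (r"^Mozilla/4", {"gzip-only-text/html"}, set()),
    (r"\bMSI[E]", set(), {"no-gzip", "gzip-only-text/html"}),
    (r"\bMSIE\s6", {"no-gzip"}, set()),
    (r"^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1; SV1",
     set(), {"no-gzip", "gzip-only-text/html"}),
]

# A plausible copy-and-paste mistake: the two "undo" rules were left out.
BROKEN_RULES = [RULES[0], RULES[2]]

def resolve(user_agent, rules=RULES):
    """Return the set of compression variables left after all rules have run."""
    env = set()
    for pattern, to_set, to_unset in rules:
        if re.search(pattern, user_agent):
            env |= to_set
            env -= to_unset
    return env

def compression_for(user_agent, rules=RULES):
    """Summarize what the resulting variables mean for this browser."""
    env = resolve(user_agent, rules)
    if "no-gzip" in env:
        return "none"
    if "gzip-only-text/html" in env:
        return "html-only"
    return "all"

# Representative User-Agent strings (illustrative, not exhaustive).
NETSCAPE_4 = "Mozilla/4.08 [en] (WinNT; U)"
IE6_PRE_SP2 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
IE6_SP2 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
IE7 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
IE9 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"

for label, ua in [("Netscape 4", NETSCAPE_4), ("IE6", IE6_PRE_SP2),
                  ("IE6 SV1", IE6_SP2), ("IE7", IE7), ("IE9", IE9)]:
    print(label, compression_for(ua), compression_for(ua, BROKEN_RULES))
```

With the full chain, IE7 gets compression for everything. Drop the two "undo" rules, as an abbreviated blog post might, and IE7 silently falls back to HTML-only compression because it still calls itself Mozilla/4.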

Apache makes it quite easy to mess this up. Nginx is much easier. It completely ignores the old Netscape 4 browsers and does not attempt to work around them. It also has a very simple mechanism to avoid sending compressed content to bad versions of IE6. You don’t need to manually define "this is good" and "this is bad" regexes, which helps you avoid making a mistake.

In practice, you should not even try to work around these problematic browsers. The problem browsers have all been updated or patched. Even the most recent of the affected browsers, IE6, was fixed nearly a decade ago. Even on platforms that are no longer supported, this issue has been fixed. You should review your configuration file and remove any browser filtering code used for HTTP compression.

Hopefully this section has also taught you that fixing a client-side bug with a server-side fix is rarely a good or sustainable idea. As I discussed in The Big Performance Improvement in IE9 No One is Talking About, this approach of using the User-Agent as a factor in content generation forced the widespread use of the Vary: User-Agent header. The Vary header used in this manner effectively nullifies shared caching, which reduces the overall performance of the web.
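The caching cost of Vary: User-Agent can be sketched with a toy model of a shared cache. This assumes a simplified cache that stores one copy of a response per distinct combination of the request headers named in Vary; real caches differ in the details, and the function names and sample requests are hypothetical.

```python
def cache_key(url, vary, request_headers):
    """Build a cache key from the URL plus the request headers listed in Vary."""
    varying = tuple(request_headers.get(name.lower(), "") for name in vary)
    return (url, varying)

def count_variants(url, vary, requests):
    """Count how many separate copies a shared cache would end up storing."""
    return len({cache_key(url, vary, headers) for headers in requests})

# Four visitors behind the same proxy, each with a different browser build.
requests = [
    {"user-agent": "Browser/1.0 (Windows)", "accept-encoding": "gzip, deflate"},
    {"user-agent": "Browser/1.0 (Mac)", "accept-encoding": "gzip, deflate"},
    {"user-agent": "Browser/1.1 (Windows)", "accept-encoding": "gzip, deflate"},
    {"user-agent": "OldBrowser/0.9", "accept-encoding": "identity"},
]

url = "http://example.com/site.css"
print(count_variants(url, ["User-Agent"], requests))
print(count_variants(url, ["Accept-Encoding"], requests))
```

Because nearly every browser build sends a distinct User-Agent string, varying on it means almost every visitor misses the shared cache. Varying on Accept-Encoding alone produces only a handful of variants.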

Extension Vs. MIME Type

It is important to review how your web server is configured to compress content. Most web servers allow you to specify either a list of file extensions to compress, or a list of MIME types to compress, or both. Be careful to review this list.

Let’s say you have configured your application to serve text/javascript responses using compression. Are you sure that’s the only MIME type your application uses when serving JavaScript files? What about text/x-javascript or application/x-javascript or application/javascript? What MIME type does your API serve for JSON responses? text/json? application/json? Something else? How about HTML? Are all of your HTML files using text/html? Do you have some sections from the XHTML days which use other MIME types like application/xhtml+xml or text/xhtml or application/xhtml? Is all of the markup generated by your application served using a single and consistent MIME type? And let’s not forget about the code you didn’t write. What MIME type does that opaque charting library use to send data to the client? Or that auto-completing textbox widget you got from GitHub?

If you are configuring the web server to use compression using file extensions, did you get all of them? .htm or .html or is it something else? What about your 404 handler? A request happens for the non-existent file /foo/bar.jpg. Since the file extension is not explicitly defined as something that should be compressed (or, being an image, is explicitly defined not to be compressed), the 404 response isn’t sent with compression.

Care must be taken when configuring your web server to ensure that uncompressed content is not slipping through due to a missing file extension or MIME type declaration.
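One way to catch these gaps is to compare the Content-Type values your application actually emits against the list your server is configured to compress. The sketch below assumes you have already collected the observed Content-Type headers (from access logs or a crawl). The looks_compressible heuristic and the sample lists are my own assumptions, not a definitive catalog.

```python
# Text-like types that do not start with "text/" (an illustrative, partial list).
EXTRA_TEXT_TYPES = {
    "application/javascript",
    "application/x-javascript",
    "application/json",
    "application/xml",
}

def normalize_mime(content_type: str) -> str:
    """Strip parameters such as '; charset=utf-8' and lowercase the type."""
    return content_type.split(";", 1)[0].strip().lower()

def looks_compressible(mime: str) -> bool:
    """Heuristic: text formats benefit from HTTP compression."""
    return (
        mime.startswith("text/")
        or mime.endswith(("+xml", "+json"))
        or mime in EXTRA_TEXT_TYPES
    )

def missing_from_config(observed_content_types, configured):
    """Return compressible MIME types the application serves but the config omits."""
    observed = {normalize_mime(value) for value in observed_content_types}
    return sorted(m for m in observed if looks_compressible(m) and m not in configured)

# Hypothetical server configuration and observed responses.
configured = {"text/html", "text/css", "text/javascript", "application/json"}
observed = [
    "text/html; charset=UTF-8",
    "text/javascript",
    "application/x-javascript",
    "application/xhtml+xml",
    "Application/JSON",
    "image/jpeg",
]

for mime in missing_from_config(observed, configured):
    print("not configured for compression:", mime)
```

The same comparison works for file extensions: swap the MIME types for the extensions found in your request logs.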

Properly Configuring HTTP Compression

So, given all these challenges, how should you go about configuring HTTP compression properly?

To see where you might have made a mistake configuring your server, you need something to compare it to. I am a big fan of the .htaccess file from the HTML5 Boilerplate Project. This is an Apache configuration file specifically crafted for web performance optimizations. It provides a great starting point for implementing HTTP compression properly. It also serves as a nice guide to compare to an existing web server configuration to verify you are following best practices. At the very least, the HTML5 Boilerplate .htaccess file provides a comprehensive list of common web content which should or should not get served using HTTP compression.

Getting a good starting point is only half the battle. The configuration for HTTP compression on a web server only works when it matches the application running on that server. Even the HTML5 Boilerplate configuration file can fail you if there is a discrepancy between the file extensions and MIME types in the configuration file and those used by your application. It’s easy to forget or overlook a MIME type or a file extension that your application uses. To ensure your application matches your configuration, the best thing to do is carefully review:

  1. How is your web server configured to map MIME types to content or file extensions?
  2. How is your web server configured to compress content relative to those MIME types or extensions?
  3. How are your application’s filenames and extensions structured?
  4. How does your application change or override a response’s MIME type?
  5. What third party libraries use MIME types?

Once you think you have properly configured the web server, you need to validate it. Web Sniffer is a great, free, web-based tool that lets you make individual HTTP requests and see the responses. Web Sniffer gives you some control over the User-Agent and Accept-Encoding headers to ensure that compressed content is delivered properly. Hurl is another web-based HTTP tool you can use. It allows for more control than Web Sniffer, but requires you to manually enter more information to get the same results.

Hurl and Web Sniffer only test a single page at a time. Zoompf’s free scan and Zoompf WPO can be used to scan multiple pages to verify that no uncompressed content is slipping through.
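If you prefer a scripted spot check alongside these tools, the same idea can be sketched with Python's standard library. The function names, the default User-Agent value, and the example URL in the comment are placeholders of my own, and this only inspects response headers; it does not replace a full scan.

```python
import urllib.error
import urllib.request

def fetch_headers(url, accept_encoding="gzip", user_agent="compression-check/0.1"):
    """Make one GET request and return the response headers with lowercased names."""
    request = urllib.request.Request(
        url,
        headers={"Accept-Encoding": accept_encoding, "User-Agent": user_agent},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            headers = response.headers
    except urllib.error.HTTPError as error:
        # Error pages such as 404s should be compressed too, so keep their headers.
        headers = error.headers
    return {name.lower(): value for name, value in headers.items()}

def classify(headers):
    """Describe how a response was encoded, based on its Content-Encoding header."""
    encoding = headers.get("content-encoding", "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return "gzip"
    if encoding == "deflate":
        return "deflate (avoid)"
    if encoding in ("", "identity"):
        return "uncompressed"
    return "other: " + encoding

# Example usage against a placeholder URL (requires network access):
# print(classify(fetch_headers("http://example.com/missing-page.jpg")))
```

Run it against a mix of HTML, CSS, JavaScript, JSON, and a deliberately missing file, and vary the User-Agent argument to confirm that no leftover browser filtering rules are excluding capable browsers.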

Conclusions

As this post shows, there are many challenges which must be overcome to properly configure HTTP compression. Make sure all non-natively compressed content is served using HTTP compression. Don’t waste load time, CPU cycles, and bandwidth compressing content that is already compressed. Only use GZIP compression to ensure compatibility. Don’t try to work around old browsers, since it is easy to make a mistake and end up not delivering compressed content to a capable browser. Review your application code and server configuration to make sure the application’s content and structure match your HTTP compression settings. Don’t forget about compressing 404s. Finally, don’t just assume your configuration works. Use a tool to validate that it works.

Want to see what performance problems your website has? Content Served Without Compression, Compressed Content Served with Compression, Bigger With Compression, and Obsolete Compression Format are just 4 of the nearly 400 performance issues Zoompf detects when testing your web applications. You can get a free performance scan of your website now and take a look at our Zoompf WPO product at Zoompf.com today!