February 4, 2010

Choosing PNG8 Candidate Images

Have you heard about PNG8 yet? No? Well, PNG8 is a PNG image that uses an indexed palette of at most 256 colors, unlike a true color PNG, which can support millions of different colors. A number of excellent articles and blog posts have already been written about PNG8. They cover its benefits, how to create PNG8 images, and how to use them across different browsers.

But this article isn’t about any of that.

This article is about how to choose PNG images that are good candidates for converting to PNG8. Historically this has not been an easy or straightforward decision.

We already know that PNG images can be optimized. Crunching PNGs (using pngcrush, OptiPNG, etc.) to reduce their size is a lossless operation. This process removes unnecessary data chunks like comments and recompresses the DEFLATE streams using tweaked settings. When you optimize an image by crunching it, none of the image data changes. The number of colors in the image stays the same, and every color value is exactly the same.
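To make “lossless” concrete, here is a minimal sketch, assuming Python’s standard `zlib` module as a stand-in for a PNG’s DEFLATE stream. Real crunchers such as pngcrush and OptiPNG also try different PNG filter strategies and other zlib settings; this sketch only varies the compression level. The helper name `recompress` and the sample bytes are hypothetical.

```python
import zlib


def recompress(deflate_stream: bytes, level: int = 9) -> bytes:
    """Re-encode a zlib stream with a different compression level.

    Real crunchers (pngcrush, OptiPNG) also try PNG filter strategies and
    other zlib settings; varying the level is a deliberately tiny stand-in.
    """
    payload = zlib.decompress(deflate_stream)
    candidate = zlib.compress(payload, level)
    # The decoded bytes are identical: only their compressed encoding changed.
    assert zlib.decompress(candidate) == payload
    # Keep whichever encoding is smaller, so crunching never makes things worse.
    return candidate if len(candidate) < len(deflate_stream) else deflate_stream


# Hypothetical stand-in for decoded scanline data (not a real image).
scanlines = bytes(range(256)) * 64
original = zlib.compress(scanlines, 1)
crunched = recompress(original)
print(len(original), "->", len(crunched), "bytes")
```

Whatever the byte counts turn out to be, `zlib.decompress(crunched)` returns exactly the same scanlines. Converting to PNG8 offers no such guarantee.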

But converting a true color PNG to a PNG8 image is a lossy operation. You lose data. PNG8 images can have at most 256 distinct colors, while true color PNGs can have millions. Because of this, not all PNG images should be converted to PNG8. Some images look absolutely horrible when converted. Working with our clients, we have created two guidelines to evaluate whether a PNG image can be converted to PNG8 without any noticeable loss in quality. This is an easier and more scalable approach than converting every PNG image on your website and then manually verifying that each resulting PNG8 image is acceptable.

Guideline #1: Number of Colors

To understand the impact of limiting a PNG image to only 256 distinct colors, we must understand how many colors PNG images typically have. Converting to PNG8 can significantly reduce the number of colors in the image. A PNG8 version of a true color PNG image with 1,000 distinct colors has about 25% as many colors as the original image. A PNG8 version of a true color PNG image with 10,000 colors has only about 2.5% as many colors as the original! So the number of distinct colors in the image (and thus the number of distinct colors you are destroying when converting to PNG8) has a huge impact on how acceptable the resulting PNG8 will be. In plain terms, the PNG8 version of a 10,000-color PNG image will look worse than the PNG8 version of a 1,000-color PNG image.
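The percentages above are simply 256 divided by the original color count: 256/1,000 is 25.6% and 256/10,000 is 2.56%. Here is a small sketch of that arithmetic. The function name is illustrative, not from any tool.

```python
PNG8_MAX_COLORS = 256


def palette_fraction(distinct_colors: int) -> float:
    """Fraction of an image's distinct colors that can survive in a PNG8 palette."""
    if distinct_colors < 1:
        raise ValueError("an image has at least one color")
    return min(PNG8_MAX_COLORS, distinct_colors) / distinct_colors


for n in (1_000, 10_000):
    print(f"{n:,} colors -> {palette_fraction(n):.2%} kept")
```

An image that already has 256 colors or fewer keeps all of them, so the fraction is capped at 1.0.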

While true color PNGs can have millions of distinct colors, they rarely do. (If a PNG image has even a few tens of thousands of colors, it should probably be saved as a JPEG instead.) Examining a sample set of PNG images, we found that PNGs tend to have several thousand distinct colors. We have also found that images containing only a few thousand colors easily convert to PNG8 without any noticeable loss in quality. Consider the Zoompf logo:

The Zoompf logo is a true color PNG image consisting of 1,999 distinct colors. That sounds like a lot. Let’s convert it to a PNG8 image using pngquant. Here are the original logo and the PNG8 version side by side for comparison. The original logo is on the top and the new PNG8 version is on the bottom.

Wow, they look virtually identical, even though the PNG8 version’s file size is over 50% smaller and it uses 87% fewer colors than the original. Only if you zoom in very close do you start to see some differences. The greens are slightly lighter and the gray in the swoosh lines is a little different.

We have found that images with fewer than 2,500-3,000 distinct colors tend to provide the best trade-off: the maximum reduction in file size without any noticeable difference in quality. This is purely subjective. Some true color PNG images with 6,000 colors or more look just fine when converted to PNG8. You should experiment and see what works best for you.

Guideline #2: Image Dimensions

Another factor is the dimensions of the image. Small images, even if they have more than 7,500 distinct colors, convert to PNG8 with no visible loss of quality. This is because your brain has trouble distinguishing so many similar colors in such a small area. Consider this true color PNG image of a cat.

This picture consists of 8,853 distinct colors. That’s an enormous amount when you realize this image has only 9,900 pixels total! Almost every pixel is a unique color. Again we use pngquant to convert this cat image into a PNG8 image and compare it to the original. The original image is on the top and the PNG8 version is on the bottom.

Again, they look virtually identical, even though the PNG8 version’s file size is over 50% smaller and it uses 97.2% fewer colors than the original. As with the logo, you only start to see the differences between the original image and the PNG8 version if you zoom in very close.

We have found that images smaller than 100 pixels by 100 pixels, or any image whose area is less than 10,000 pixels, can easily be converted to PNG8 without any noticeable difference in quality. This is a purely subjective guideline. Some larger true color PNG images look just fine when converted to PNG8.

Using the Guidelines

These guidelines can be used separately. A PNG image does not have to be both small and low in colors to be a good candidate for converting to PNG8. As an example, Graphviz, a program that generates node-and-edge style graphs, regularly produces images that are thousands of pixels wide by thousands of pixels tall. This violates our image area guideline. However, these images usually contain only a few hundred colors, which satisfies our color count guideline. Sure enough, converting the output of Graphviz to PNG8 saves a lot of space with no perceivable loss of quality.
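Because meeting either guideline is enough, a screening check combines them with OR, not AND. Here is a minimal sketch, assuming you have already measured each image’s dimensions and distinct color count. The thresholds are the subjective numbers from this post, and the `ImageStats` type and sample measurements are hypothetical.

```python
from dataclasses import dataclass

# Subjective thresholds from the two guidelines; tune them for your own images.
MAX_COLORS = 3_000  # guideline 1: fewer than roughly 2,500-3,000 distinct colors
MAX_AREA = 10_000   # guideline 2: smaller than about 100 x 100 pixels


@dataclass
class ImageStats:
    width: int
    height: int
    distinct_colors: int


def is_png8_candidate(img: ImageStats, max_colors: int = MAX_COLORS,
                      max_area: int = MAX_AREA) -> bool:
    """An image qualifies if it meets EITHER guideline; it need not meet both."""
    few_colors = img.distinct_colors < max_colors
    small = img.width * img.height < max_area
    return few_colors or small


# Hypothetical measurements, loosely modeled on the examples in the text.
graph = ImageStats(width=3000, height=2000, distinct_colors=300)  # big, few colors
icon = ImageStats(width=99, height=100, distinct_colors=8853)     # small, many colors
photo = ImageStats(width=800, height=600, distinct_colors=40000)  # neither
print([is_png8_candidate(i) for i in (graph, icon, photo)])       # [True, True, False]
```

A check like this only narrows the list. You should still eyeball the converted candidates, since both thresholds are judgment calls.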

PNG8 and CSS Sprites

A lot of times people want to combine PNG8’s advantage of a very small file size with CSS Sprites’ advantage of reducing the number of HTTP requests. At first glance this makes a lot of sense. Individual CSS background images inside the sprite are often small, fitting our image dimensions guideline. CSS background images are also often icon-style images used on buttons, toolbars, etc. This means each individual image tends to have only several dozen distinct colors, fitting our color count guideline.

Unfortunately, this is looking at the trees instead of the forest. A CSS Sprite saved as a PNG8 image can use only 256 distinct colors for all the sub-images inside the sprite. Each sub-image that makes up the CSS Sprite might look fine as an individual PNG8 image with its own 256-color palette. But all the sub-images together in a single PNG8 CSS Sprite, sharing one common 256-color palette, might not. This is especially true with gradients and other graphics that use different shades for color transitions. In fact, Stoyan pointed out a recent article describing how converting a CSS Sprite to PNG8 caused a very noticeable loss in quality. The solution was to hand-edit the PNG8’s 256-color palette to preserve as many shades of the gradient as possible.
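The sprite problem comes down to a set union. Each sub-image’s colors may fit in 256, but the shared palette must hold all of them at once. Here is a small sketch, assuming each sub-image is represented as a set of RGB tuples. The two gradients are hypothetical.

```python
from typing import Iterable, Set, Tuple

Color = Tuple[int, int, int]
PNG8_MAX_COLORS = 256


def sprite_palette_size(sub_images: Iterable[Set[Color]]) -> int:
    """Distinct colors needed once all sub-images share one sprite palette."""
    combined: Set[Color] = set()
    for colors in sub_images:
        combined |= colors
    return len(combined)


def fits_one_png8_palette(sub_images: Iterable[Set[Color]]) -> bool:
    return sprite_palette_size(sub_images) <= PNG8_MAX_COLORS


# Hypothetical sub-images: two 200-shade gradients that share no colors.
red_gradient = {(r, 0, 0) for r in range(200)}
blue_gradient = {(0, 0, b) for b in range(56, 256)}

print(fits_one_png8_palette([red_gradient]))               # True: 200 colors
print(fits_one_png8_palette([blue_gradient]))              # True: 200 colors
print(sprite_palette_size([red_gradient, blue_gradient]))  # 400: too many for PNG8
```

Each gradient converts cleanly on its own, yet the combined sprite needs 400 palette entries. The quantizer has to merge shades, which is exactly where gradients start to band.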

Zoompf Color Counter

Since converting from PNG24 to PNG8 can save so much where appropriate, developers face the challenge of quickly finding candidate images. Zoompf’s free web performance scanning service will detect candidate images. However, developers will also want to test images that are not yet uploaded, or images on a website that is not yet in production. To help developers, Zoompf has released the Zoompf Color Counter.

Screen shot of Zoompf's Color Counter program

Zoompf Color Counter is a Windows program that analyzes an image and tells you how many distinct colors it has. Simply open an image in Zoompf Color Counter, or drag and drop an image onto it, to learn the number of distinct colors. Download Zoompf Color Counter.
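For non-Windows machines or build scripts, the same count can be approximated in a few dozen lines of standard-library Python. This is a hypothetical sketch, not how Zoompf Color Counter works. It assumes an 8-bit, non-interlaced truecolor PNG (RGB or RGBA), counts distinct RGBA values as distinct colors, and rejects everything else. It walks the chunks, joins the IDAT data, inflates it with `zlib`, and undoes the five PNG scanline filters.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _paeth(a: int, b: int, c: int) -> int:
    """Paeth predictor, as defined by the PNG specification."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def count_distinct_colors(png: bytes) -> int:
    """Count distinct RGB/RGBA values in an 8-bit, non-interlaced truecolor PNG."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, idat, header = len(PNG_SIGNATURE), bytearray(), None
    while pos + 8 <= len(png):
        length, kind = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if kind == b"IHDR":
            header = struct.unpack(">IIBBBBB", data)
        elif kind == b"IDAT":
            idat += data  # image data may be split across several IDAT chunks
        elif kind == b"IEND":
            break
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color_type, _, _, interlace = header
    if depth != 8 or color_type not in (2, 6) or interlace != 0:
        raise ValueError("sketch only handles 8-bit, non-interlaced RGB or RGBA")
    bpp = 3 if color_type == 2 else 4  # bytes per pixel
    stride = width * bpp
    raw = zlib.decompress(bytes(idat))
    prev = bytearray(stride)  # the row "above" the first row is all zeros
    colors = set()
    for y in range(height):
        start = y * (stride + 1)  # each row is prefixed by one filter-type byte
        ftype = raw[start]
        if ftype > 4:
            raise ValueError(f"unknown filter type {ftype}")
        row = bytearray(raw[start + 1:start + 1 + stride])
        for i in range(stride):
            left = row[i - bpp] if i >= bpp else 0
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if ftype == 1:    # Sub
                row[i] = (row[i] + left) & 0xFF
            elif ftype == 2:  # Up
                row[i] = (row[i] + up) & 0xFF
            elif ftype == 3:  # Average
                row[i] = (row[i] + (left + up) // 2) & 0xFF
            elif ftype == 4:  # Paeth
                row[i] = (row[i] + _paeth(left, up, up_left)) & 0xFF
        for x in range(0, stride, bpp):
            colors.add(bytes(row[x:x + bpp]))
        prev = row
    return len(colors)
```

Usage is `count_distinct_colors(open("logo.png", "rb").read())`, where the file name is a placeholder. Palette-based, 16-bit, and interlaced PNGs would need extra handling that this sketch deliberately leaves out.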

Summary

There are a lot of blog posts and articles on the Internet about how converting true color PNG images into PNG8 images is an excellent optimization technique. However, knowing how to choose true color PNG images that are good candidates to convert to PNG8 can be difficult and time consuming. In this post we provide two guidelines to help:

  1. Images with fewer than 2,500 to 3,000 distinct colors can usually be converted to PNG8 without any noticeable differences.
  2. Images smaller than 100 pixels by 100 pixels (or any image whose area is less than 10,000 pixels) can usually be converted to PNG8 without any noticeable differences.

In addition, we have released Zoompf Color Counter to help developers find candidate images. Remember that converting to PNG8 is a lossy process. How much loss is tolerable will vary from person to person. You should use our guidelines but also experiment to see whether you will tolerate more. Finally, be careful converting your CSS Sprite images to PNG8, especially if they contain gradients.

Want to see what performance problems your website has? Finding candidate PNG8 images based on color count or image dimensions covers just two of the 300+ performance issues Zoompf detects when checking your web applications. You can sign up for a free mini web performance assessment at Zoompf.com today!

January 15, 2010

Should You Use JavaScript Library CDNs?

The concept is simple. Hundreds of thousands of websites use JavaScript libraries like jQuery or Prototype. Different websites you visit each download another identical copy of these libraries. You probably have a few dozen copies of jQuery in your browser’s cache right now. That’s silly. We should fix that.

How? Well, if there were a third-party repository of common JavaScript libraries, websites could simply load their JavaScript files from it. Now imagine the repository implemented caching. SiteA, SiteB, and SiteC all have <SCRIPT SRC> tags that reference http://some-code-respo.com/javascript/jquery.js. When someone visits any one of these sites, the jQuery library is downloaded and cached. If that same person visits one of the other sites, they will not have to download jQuery again. The idea is that sites will load faster because these libraries should rarely need to be re-downloaded. Of course, this only works if a lot of sites use the common repository. If only a few sites use it, then virtually no one benefits, because a previous website will not have already downloaded and cached the library, so it has to be downloaded anyway.

This is an example of the network effect: the more people that use a system, the more valuable the system becomes.

Implementations of this idea of a central shared repository of common JavaScript libraries are called several different things. Google calls their implementation Google AJAX Library API. Yahoo doesn’t have a clear name for their implementation. I’ve seen “Free YUI hosting” or “YUI Dependencies”, or even Yahoo YUI CDN. Microsoft calls their implementation the Microsoft AJAX CDN. To keep things simple, I will collectively refer to these repositories of common JavaScript libraries as JavaScript Library CDNs.

JavaScript Library CDNs seem like a performance no-brainer. Use the service, and your site loads faster and consumes less bandwidth. This post will explore whether, and under what conditions, a JavaScript Library CDN actually improves web performance.

The Choice

Consider this situation. You are a speed-conscious web developer. You have a website that uses jQuery 1.3.2 as well as some additional site-specific JavaScript. Because you value web performance, you know you should concatenate all your JavaScript files into as few files as possible, minify them, and serve them using gzip compression. You have two choices:

  1. Serve all your JavaScript locally. You will have a single <SCRIPT SRC> tag that points to a JavaScript file containing jQuery 1.3.2 and your site specific JavaScript.
  2. Serve some of the JavaScript using a JavaScript Library CDN. You will have two <SCRIPT SRC> tags. The first tag will point to a single file on your website containing your site-specific JavaScript. The second tag will point to the copy of jQuery 1.3.2 on Google AJAX Library API.

What’s the difference? Well, a minified, gzipped copy of jQuery 1.3.2 is 19,763 bytes long. If you choose option 1, all your users will have to download these 19,763 bytes regardless of what other sites they may have already visited. That’s the cost: downloading 19,763 bytes. Notice there is no cost of an additional HTTP request and response or other overhead. Those bytes of jQuery are included inside the response for the site-specific JavaScript, which the visitor already has to request. This is important, so I will repeat it: the cost of not using a JavaScript Library CDN is only the downloading of JavaScript content, not any additional HTTP requests or overhead.

In the second option, you are going to gamble with a JavaScript Library CDN. You are hoping a visitor has already browsed another website that also uses Google to serve jQuery 1.3.2. If you are right, then that visitor does not need to download 19,763 bytes. If you are wrong, the visitor needs to download 19,763 bytes from Google. That’s the prize in a nutshell. And downloading 19,763 bytes doesn’t sound bad! Who cares where it comes from?

The Price of Missing

Unfortunately, an HTTP request to Google’s JavaScript Library CDN is more expensive than an HTTP request to your own website! This is because a visitor’s browser has to perform a DNS lookup for ajax.googleapis.com and establish a new TCP connection with Google’s systems. If the additional request went to your own site instead, the visitor’s browser would not need another DNS lookup, and the HTTP request would be sent over an existing HTTP connection.

Unfortunately, this is a stubborn process. DNS lookups and establishing TCP connections involve a small number of very small packets, so a faster Internet connection will not significantly speed up these operations. Two different runs on WebPageTest showed that it takes 1/3 of a second for a web browser to connect to Google’s JavaScript Library CDN and start downloading the library. (And remember, these are CDNs, so where I make the request from should not matter. The CDN makes sure I’m downloading the content from a web server that is geographically near me.)

Let me repeat that: using Google’s JavaScript Library CDN comes with a 1/3 of a second tax on missing. (Note that a tax like this applies to opening connections to any new host: JavaScript Library CDNs, advertisers, analytics and visitor tracking, etc. This is why you should try to reduce the number of different hostnames you serve content from.) Even if this number is smaller for other users, say 100 milliseconds, it is still a tax that is paid for using a JavaScript Library CDN and missing.

It gets worse because downloading a file over a new TCP connection with Google is slower than downloading a file over an existing TCP connection with your website! This is due to TCP’s slow start and congestion control. Newly created connections transmit data slower than existing connections do. (This is why persistent connections are so important!)

The Odds of Winning

Since JavaScript Library CDNs rely on the network effect, they are only valuable if a large number of websites use them. After all, the only way your visitors can “win” the JavaScript Library CDN gamble is if they have already been to a site that uses the same CDN. So, how many sites actually use Google?

Well, according to the great folks at BuiltWith, only 13% of all websites use some kind of third-party CDN. Of those websites using a CDN, 25.56% are using Google’s AJAX Library API. So only about 3.3% (13% × 25.56%) of all websites surveyed are using Google’s AJAX Library API.

I wanted to gather more data than BuiltWith provided. I also didn’t like the way they grouped traditional CDNs (like Akamai) with JavaScript Library CDNs (like Google) and private site-specific CDNs (like Turner’s CDN). So I performed my own survey. I visited the top 2000 sites on Alexa and analyzed each one to see which of them use Google’s JavaScript Library CDN. The result? Only 69 sites out of 2000, or 3.45%, are using Google’s JavaScript Library CDN. My data is in line with BuiltWith’s, which is good.

Unfortunately, you do not vaguely or abstractly “use a JavaScript Library CDN.” You reference a specific URL for a specific JavaScript library and version number. You only benefit from the CDN if you reference the same specific URL that other websites reference. So we have to dig deeper and see which versions of which JavaScript libraries are in use. Below is a table of the JavaScript libraries that Alexa Top 2000 sites serve from Google’s AJAX Library API.

JavaScript Library     Number of Alexa Top 2000 sites serving the library from Google’s CDN
jQuery                 48
Prototype              6
SWFObject              6
YUI                    6
jQuery UI              4
Script.aculo.us        3
MooTools               3
Dojo                   1

We see that 48 sites are using Google’s JavaScript Library CDN to serve jQuery, and of those, 36 sites are using jQuery 1.3.2. That means jQuery 1.3.2 from Google is used by 1.8% of the Alexa Top 2000 websites. Prototype, SWFObject, and YUI came in next at 6 sites each, or 0.3% of the sites. When you factor in version numbers, their penetration drops to around 0.10%.

So what is the best case here? What are the odds that someone would have jQuery 1.3.2 served from Google’s JavaScript Library CDN sitting in their browser cache? If I start with a clear browser cache, visit 35 randomly selected websites from the Alexa Top 2000, and then visit your site, there is only a 47% chance that I will have a cached copy of jQuery 1.3.2 ready for you to use. You calculate this by first determining the probability of randomly picking 35 websites that don’t use jQuery 1.3.2, and then subtracting that from 1. The formula is: 1 - (1 - 0.018)^35.
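Here is the same calculation as a small sketch, plus the inverse question: how many random visits does it take before a cache hit becomes more likely than not? Like the formula above, it assumes each visit is an independent random draw from the Alexa Top 2000 list. Real browsing is far less uniform. The function names are illustrative.

```python
import math


def chance_cached(share_of_sites: float, sites_visited: int) -> float:
    """P(at least one visited site loaded the same library URL).

    Assumes each visited site is an independent random draw from the list,
    the same simplification the formula in the text makes.
    """
    return 1 - (1 - share_of_sites) ** sites_visited


def sites_needed(share_of_sites: float, target: float) -> int:
    """Smallest number of random site visits that pushes the odds to `target`."""
    return math.ceil(math.log(1 - target) / math.log(1 - share_of_sites))


print(f"{chance_cached(0.018, 35):.0%}")  # 47%, matching the text
print(sites_needed(0.018, 0.5))           # 39 visits for even odds
```

Even under these generous assumptions, a visitor would need 39 random visits to the list before the odds tip in your favor.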

Those are not very good odds. And they only apply if you are using jQuery 1.3.2. For any other library, the odds are too low to be practical. You should also consider the makeup of the sites on the list. I have probably visited only 30 or so of the websites on the Alexa Top 2000 list ever, and I probably visit only 5-10 with any regularity. We have determined that the odds of “winning” the CDN gamble are fairly small. How small will depend on your site content and your visitors. However, I think it is safe to say that, as of January 2010, the majority of your users will not have visited a site that uses a JavaScript Library CDN for the JavaScript library you use.

Getting More Data

So maybe the odds aren’t good. But is it still worth it to potentially help some people?

Let’s go back to our hypothetical situation where we are deciding whether or not to use a JavaScript Library CDN. Consider someone with a 768 kilobit per second (kbps) Internet connection, where 768 * 1024 = 786,432 bits are downloaded per second. Let’s say it is operating at only 80% efficiency to account for overhead like IP, TCP, congestion, packet loss, etc. That gives 629,145 bits per second, which is 78,643 bytes per second, or 26,214 bytes in 1/3 of a second. A minified and gzipped copy of jQuery 1.3.2 is 19,763 bytes long. This means anyone using a 768 kbps Internet connection can download the contents of jQuery 1.3.2 in under 1/3 of a second (about a quarter of a second). In other words, downloading jQuery 1.3.2 on that connection takes about as long as simply connecting to Google’s JavaScript Library CDN.
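The bandwidth arithmetic can be packaged as a helper so you can try other connection speeds. This sketch reuses the post’s conventions: 1 kilobit = 1024 bits and 80% usable throughput. Both are simplifying assumptions, and the helper ignores TCP slow start, which would make a cold connection slower still.

```python
def download_seconds(size_bytes: int, link_kbps: float, efficiency: float = 0.8) -> float:
    """Seconds to transfer size_bytes over a link rated at link_kbps.

    Follows the text's conventions: 1 kilobit = 1024 bits, and only `efficiency`
    of the raw rate is usable after IP/TCP overhead, congestion, and packet loss.
    """
    usable_bits_per_second = link_kbps * 1024 * efficiency
    return size_bytes * 8 / usable_bits_per_second


JQUERY_132_MIN_GZ = 19_763  # bytes, from the text
CONNECT_TAX = 1 / 3         # seconds, the measured cost of reaching a new host

t = download_seconds(JQUERY_132_MIN_GZ, 768)
print(f"jQuery 1.3.2 over 768 kbps: {t:.2f} s (connection tax: {CONNECT_TAX:.2f} s)")
```

On the 768 kbps example this comes out to roughly 0.25 seconds. On faster links the download time shrinks, but the connection tax does not.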

This simplifies the decision in our hypothetical situation about where to host jQuery. In the locally hosted option, we are asking our visitors to download some amount of content X. X is all our HTML, images, and site-specific JavaScript, and it includes the 19,763 bytes of jQuery 1.3.2. In the “use a CDN” option, we still have X amount of content. The only difference is that the CDN serves the 19,763 bytes of jQuery and our site serves X - 19,763 bytes. A visitor without a cached copy of the JavaScript library still downloads a total of X, served partly from our website and partly from Google. Under these conditions we are led to the following points:

  1. If you are using a CDN and the visitor does not have a cached copy, they download the site 1/3 of a second slower than if they had downloaded all the content from your web server.
  2. If you are using a CDN and the visitor does have a cached copy, they download all of the content 1/3 of a second faster than if they had downloaded all the content from your web server.

Or, more simply: if we use Google’s JavaScript Library CDN, we are asking the majority of our website visitors (who don’t have jQuery already cached) to take a 1/3 of a second penalty (the time to connect to Google’s CDN) to potentially save a minority of our website visitors (those who do have a cached copy of jQuery) 1/3 of a second (the time to download jQuery 1.3.2 over a 768 kbps connection).
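Those two outcomes can be folded into one expected value: misses pay the connection tax, and hits save the download time. Here is a minimal sketch of that model. It keeps the post’s simplifications (a flat tax and no slow-start effects), and the function names are hypothetical.

```python
def expected_cdn_delta(p_cached: float, connect_tax: float, saved_download: float) -> float:
    """Average change in load time (seconds) from moving a library to a shared CDN.

    Positive means visitors are slower on average; negative means faster.
    Visitors without a cached copy (probability 1 - p_cached) pay the connection
    tax; visitors with one skip downloading the library.
    """
    return (1 - p_cached) * connect_tax - p_cached * saved_download


def break_even_probability(connect_tax: float, saved_download: float) -> float:
    """Cache-hit probability at which the CDN neither helps nor hurts on average."""
    return connect_tax / (connect_tax + saved_download)


# The text's simplified figures: 1/3 s to connect, 1/3 s saved on a hit.
print(break_even_probability(1 / 3, 1 / 3))  # 0.5
print(expected_cdn_delta(0.47, 1 / 3, 1 / 3))  # roughly 0.02 s slower on average
```

With equal costs, the CDN only pays off once more than half of your visitors already have the file cached. Even the best case computed earlier (47%) falls short of that. Faster connections shrink `saved_download`, which pushes the break-even point higher still.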

That does not make sense. It makes even less sense as your visitors’ download speeds increase. Using a third party just to avoid serving 20 or 30 kilobytes of content doesn’t make sense.

Conclusions

JavaScript Library CDNs rely on the network effect. Our survey of the Alexa Top 2000 shows that right now there are too few sites in the network to provide much value. Only Google’s AJAX Library API has anywhere near the penetration needed to provide any benefit, and only if you are using a specific version of a single JavaScript library. Even in that remote case, serving jQuery 1.3.2 from Google will slow down the majority of your users for the benefit of a possibly nonexistent minority. Zoompf recommends that the vast majority of websites avoid JavaScript Library CDNs until they gain more market penetration.

I will discuss the very select group of sites that should use CDNs, as well as some other interesting data discovered while surveying the Alexa 2000 in posts early next week.

Want to see what performance problems you have? Using JavaScript Library CDNs appropriately is just one of the 200+ performance issues Zoompf detects while assessing your web applications for performance. You can sign up for a free mini web performance assessment at Zoompf.com today!

December 14, 2009

Performance Questions to Ask Hosting Providers: Secure Website Access

(This is the third article in a series of articles about performance questions you should ask when choosing a hosting provider. The first article, “What control do I have over the web server?” and the second article “What access do you provide to web server logs?” are also available.)

So far in this series we have talked a lot about questions to ask hosting providers to make sure you can configure your website for performance and access its raw traffic logs to spot performance problems. All of this is moot, of course, if you cannot get content onto your website. That’s why this post in the “Questions to ask a hosting provider” series is all about:

“Can I Securely Communicate With My Website?”


It has happened to everyone. You are out at a coffee shop, a client site, or a conference and you need to make changes to your website. Perhaps you need to upload a few new PHP files or some images. Perhaps you need to update your web server configuration to set up a new email address for an event. Perhaps you simply saw something cool and want to write a WordPress post. But can you do any of these things securely using a public network? This question is best answered with an analogy.

Imagine you are at a formal cocktail party. You drift from room to room through a sea of lavishly dressed partygoers and dine on mouth-watering morsels served on silver trays by waiters in white gloves. As you approach a side table of crystal champagne glasses, you overhear bits and pieces of the conversations around you.

  • “We cannot wait. It should be a lovely vacation and it’s the perfect time for us to get away for a week.”
  • “That’s right, with the nanny! Walked right in on them! And he tried to say that she was only choking!”
  • “Chris starts there next spring, just like his father.”

Well-attended cocktail parties are loud and noisy. It’s almost impossible not to hear what everyone else is saying! Of course, we are taught that to be polite we should ignore other people’s conversations unless we are involved. You are on the honor system not to eavesdrop.

Public networks such as wireless networks are just like cocktail parties. Your wireless card is like a party guest. It broadcasts to the room when it “speaks” and “listens” to everyone within range to hear a response. Like a real party guest, a wireless card is supposed to ignore any conversations it overhears that are not meant for it. It does this by dropping the data instead of passing it up to the computer. However, nothing forces network devices to ignore data they receive that is not meant for them. In fact, all networking devices (not just wireless devices) can be placed into “promiscuous mode.” In that mode, any data they receive, even data not addressed to them, is passed up to the computer to process. This allows any networking device to become a giant listening device that hears and records all the information on the network! Promiscuous mode is not some evil hacker trick. It’s a fully intended feature of networking devices that has many legitimate uses.

Diagram showing how clients in a wireless network hear each others' traffic

But wait! I use Encryption!

“The conference wireless network or the coffee shop’s wireless network is encrypted. They tell me they use something called WPA2 with a key of a million bits! I’m secure, right?”

No, you are not secure.

Let’s go back to the cocktail party analogy. The hosts don’t want just anyone coming into their party and drinking all their fine wines. So they place a bouncer at the door of the party. Only people that know the password are allowed into the party. If you know the password you get into the party and can listen to all the other guests. If you do not know the password you remain outside the building and cannot hear anything that is going on inside.

Encrypted wireless networks are just cocktail parties with bouncers. You need the “password” to join the wireless network. Once you are connected, you can listen to everyone else’s traffic just like before, because everyone on the network is using the same password to transmit and receive their data. (This is the only scalable solution. Otherwise the wireless network administrator would have to create a new, unique password for every person who joins the network.) In other words, an encrypted network uses the password solely to protect and restrict “access” to the network. It does nothing to protect the users of the network from themselves or from each other.

The Danger of Sniffing (packets)

So what? Who cares if someone can listen to my network traffic? It’s not a big deal. After all, they will just see the blog content I was about to post anyway. Unfortunately, this is not true. Are you using any system that requires a username and password on a wireless network? You may have shouted that username and password to the entire cocktail party. And chances are you use that same username and password somewhere else on the Internet, like your bank or an online store. Are you already logged into a system like Gmail or your WordPress administration panel? You are shouting your HTTP cookies to the entire cocktail party. Someone can steal your HTTP session cookies and use session hijacking to access Gmail or WordPress as you, without needing your username and password. Next thing you know, you are on the Wall of Sheep!

Secure Communications With Your Website

Remember: network encryption protects networks and application encryption protects applications! You need to make sure you are using encrypted application protocols to properly protect yourself. What protocols you use and how you use them will vary with different use cases.

Uploading Content

How do you upload content to your website? If the answer is FTP, you are in trouble. FTP sends usernames and passwords in the clear. You need an encrypted file transfer mechanism like SFTP or SCP. If you have shell access to your web server using SSH, you can also use SFTP or SCP, since they are simply subsets of SSH’s functionality. By default, most hosting companies provide an insecure file transfer system like FTP. Ask whether they provide (for free) a secure file transfer system like SFTP or SCP. Make sure they understand you don’t need full SSH functionality and are only interested in secure file transfer. If this is not available, you might need to upgrade your account or purchase an add-on to get SSH access for your website.

Writing Content

Do you use a web interface to write content for your blog platform or CMS? Does it use SSL? Check the address bar: does it start with https? If not, you are not using SSL. Do you write your content using other software? Does that software publish the content directly to your blog using a web API like RSD or XML-RPC? Does that use SSL? Check the settings and see whether you are using “https” to access the API. If you are not using SSL to communicate with these web resources, then anyone on the network can capture your username and password, or your cookies (which are just as good as your username and password).
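If you publish from several tools, it can help to audit every configured endpoint at once instead of checking address bars one at a time. Here is a minimal sketch, assuming you can list the URLs your browser bookmarks and blogging tools use. It only checks the URL scheme and does not verify certificates. The endpoints shown are hypothetical.

```python
from urllib.parse import urlparse


def insecure_endpoints(urls):
    """Return the configured URLs that would send passwords or cookies without SSL."""
    return [url for url in urls if urlparse(url).scheme.lower() != "https"]


# Hypothetical endpoints a blogger might have configured.
configured = [
    "http://blog.example.com/wp-admin/",
    "https://blog.example.com/xmlrpc.php",
]
for url in insecure_endpoints(configured):
    print("Not using SSL:", url)
```

Anything this flags is sending credentials or session cookies that everyone else at the “cocktail party” can hear.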

Website Administration

How do you administer your website? Do you use a web interface like cPanel? These web administration interfaces are most common in shared hosting environments and typically run on a different hostname or an odd port number. Ask the hosting provider if they offer SSL access to the interface. Hosting providers often get confused and think you want to create an SSL certificate for your website. While this would secure a CMS you configure like WordPress (see previous use case) it does not help you secure the web administration interface because that is often running on a separate system. Make sure they understand you want secure access to their interface, not your website. This discussion may take several emails back and forth but most hosting providers are willing to supply SSL access to cPanel or other administration interfaces.

Summary

In conclusion, the questions about secure communications you should ask your hosting provider are:

  • “Do you provide a secure file transfer mechanism like SFTP or SCP? Is it provided for free or is it extra? If you don’t, do you offer SSH access to the web server? Is it free?”
  • “If you provide a web-based website administration interface like cPanel do you provide access to it using SSL?”
  • “Do you provide an SSL certificate for my CMS? What is the cost?”

How to judge their answers will vary from person to person based on need. Personally, I consider a secure file transfer mechanism a requirement. Too many times I have needed to upload a presentation, PDF, or other file to my website from a public network at a conference or client site. If you are a heavy blogger, secure access to your content management system is going to be critical. After all, it is difficult to write a blog post about an event from the event if you cannot securely access your blog to write the post!

December 3, 2009

Web Performance Book Recommendations

Stoyan has a good blog post today, as part of his Performance Advent series, about required reading for web developers. He covered some great books. All three of the books he recommended are currently sitting on my bookshelf, and you should buy them immediately if you don’t already own them. I thought I’d share a few more books I have read that contain web performance tips and tricks I have not seen in the books he recommended. Some are helpful; some are not. Having written a book on Ajax security myself, I know exactly how difficult it is to create a meaningful and lasting book of substance. All of these authors deserve respect, even if a book is no longer beneficial today. For each of these 4 books I have included my overview and opinion of the book, the key performance tips and ideas it contains, and my recommendation.

Web Caching. By Duane Wessels (O’Reilly, 2001)

Duane Wessels, the creator of the Squid caching proxy, was the perfect choice to write what is the definitive guide to web caching. While this book is targeted more at IT operations folks (specifically people who install, configure, monitor, and maintain web proxies), it provides excellent background on how caching proxies work, how they are deployed, and what they will and will not cache. It also has, without a doubt, the best explanation of Cache-Control directives I have ever read. It explains what the directives mean, how they interact with each other, and how caching proxies and the browser cache act on those directives. Think you know what “no-cache” does? You are wrong.
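The “no-cache” trap is easy to illustrate. Under HTTP/1.1 (RFC 2616, later restated in RFC 7234), “no-cache” does not forbid storing a response; it forbids reusing a stored response without first revalidating it with the origin server. “no-store” is the directive that forbids storing. The minimal Python sketch below models that decision for a single stored response. It is a simplified illustration, not Squid’s or any browser’s actual logic: it ignores Expires, heuristic freshness, and field-qualified no-cache="..." values, and the function names are hypothetical.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: value or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def cache_decision(header: str, age_seconds: int, shared: bool = False) -> str:
    """Decide what a cache may do with a stored response of a given age.

    Simplified model (an assumption, not a full implementation): Expires,
    heuristic freshness and field-qualified no-cache="..." are ignored.
    """
    d = parse_cache_control(header)
    # no-store: never keep a copy. private: shared caches (proxies) must not store it.
    if "no-store" in d or (shared and "private" in d):
        return "do-not-store"
    # Bare no-cache: the response MAY be stored, but must be revalidated
    # with the origin server before every reuse.
    if "no-cache" in d and d["no-cache"] is None:
        return "revalidate"
    # Shared caches prefer s-maxage over max-age when both are present.
    lifetime = d.get("s-maxage") if shared and "s-maxage" in d else d.get("max-age")
    if lifetime is not None and lifetime.isdigit() and age_seconds < int(lifetime):
        return "serve-from-cache"
    # No usable freshness lifetime in this model: ask the origin server.
    return "revalidate"
```

For example, cache_decision("no-cache, max-age=3600", 10) returns "revalidate" even though the response is well within its max-age, while cache_decision("no-store", 10) returns "do-not-store". That difference is exactly what catches people who assume “no-cache” means “don’t cache.”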

Key Performance Information

This book has tidbits here and there that will help front-end performance, such as: using Cache-Control correctly, adding support for stale resources, what proxies will not cache even when allowed (URLs with query strings, cgi-bin directories, etc.), when caching is used but pointless (varying on cookies, host, etc.), and how to improve your hit/miss ratio.
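The point about query strings has a practical consequence for front-end developers. Some proxies of that era were configured to skip caching any URL containing “cgi-bin” or a “?” regardless of its Cache-Control headers (older Squid default configurations included a rule along these lines). The hypothetical Python helper below models that kind of rule so you can flag static assets whose URLs a proxy might bypass. It approximates a proxy heuristic; it is not Squid’s actual matching code.

```python
from urllib.parse import urlsplit


def proxy_likely_to_cache(url: str) -> bool:
    """Return False if a proxy using a "cgi-bin or ?" rule would skip this URL.

    Hypothetical rule modeled loosely on older proxy defaults; real proxies
    are configurable and may behave differently.
    """
    # The fragment is never sent to the server, so ignore it.
    without_fragment = url.split("#", 1)[0]
    # urlsplit() reports an empty query for "page?", so test for "?" directly.
    if "?" in without_fragment:
        return False
    return "cgi-bin" not in urlsplit(without_fragment).path
```

Under this rule, /js/app.js?v=2 would not be cached while a versioned filename such as /js/app.v2.js would, which is one reason some developers put version numbers in the path rather than in the query string.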

Verdict

BUY! Good background, worth the cost of the book for the exhaustive explanation of caching directives alone. A dozen or so front-end performance tidbits scattered throughout. Find a cheap used copy.

JavaScript: The Good Parts. By Douglas Crockford (O’Reilly, 2008)

Written by JSON creator Douglas Crockford, JavaScript: The Good Parts provides a detailed analysis of JavaScript as a programming language and explores which features of the language aid, and which hinder, the creation of beautiful code, and why. While the book is targeted at JavaScript developers, Chapter 10 and Appendixes A, B, and C provide a wealth of performance advice.

Key Performance Information

Half a dozen JavaScript performance tips are mixed in throughout, such as: controlling the scope chains of variables, dynamic compilation of code at runtime, avoiding type coercion, loop construction, and regular expression performance.

Verdict

BUY! It will open your eyes to the elegance of JavaScript. If you like computer science and algorithms, you will love this book. If you are only interested in the performance tips, you’ll be disappointed if you pay full price for such a small book. Buy it used in that case.

Web Performance Tuning. By Patrick Killelea (O’Reilly, 2002)

Originally written in 1998, the book was released in a 2nd edition, with seemingly minimal updating, in 2002. I really wanted to like this book. It is well written, with tons of data tables, charts, and graphs. Unfortunately, nearly the entire book serves better as a reference manual, and it contains little performance advice for web developers; what advice it does contain is hard to act on. For example, the chapter on “Security” is about SSL. (I will punch the next person in the face who equates web security with SSL and firewalls.) This chapter contains some nice graphs of the performance of an obsolete Netscape web server. After all of that, the “advice” is to “consider buying an SSL accelerator card.” What about the performance of different algorithms? Or how to optimize SSL negotiation? Or the importance of keeping SSL connections open? Nothing (though I’ll be writing a blog post about optimizing SSL performance soon). The book also contains very outdated filler chapters on choosing a modem, choosing a client and server OS, choosing client and server hardware, and an overview of non-HTTP network protocols.

That is not to say this is a bad book. There are some very enjoyable parts. I found the information about the chain of syscalls Apache makes to process an HTTP request and serve the response utterly fascinating. Chapter 19 is the only chapter truly applicable to front-end performance. You should know everything in the chapter already, but it is interesting largely because the advice it contains predates the current front-end performance movement by a good 7 years.

Key Performance Information

With very few exceptions, the performance advice is obsolete and focused on the back-end. The main nuggets were things like: use short filenames to save space, minimize the use of symbolic links on the server, turn off reverse DNS lookups for log files, turn off mod_status, and set height and width HTML attributes to avoid repainting/re-rendering.

Verdict

Do Not Buy. This is no longer a useful book about web performance, and based on the number of filler chapters, I doubt it was very valuable even when it was published. It is an enjoyable book if you are interested in learning more about how back-end web hardware functions. If so, I suggest you find a used copy, as the information this book contains is so out of date it’s not worth anywhere near its cover price. I purchased it for $2.77 from Amazon and was happy.

Building Scalable Web Sites. By Cal Henderson (O’Reilly, 2006)

Cal is the lead developer of Flickr, so he knows a thing or three about building complex web applications that have to perform for millions of users. Don’t pigeonhole this book as a back-end hardware book. It is a holistic book that covers a lot of ground in just 320 pages. It is a guide to the development processes and practices, as well as the architecture and back-end design, of web sites that can be maintained and scaled to immense levels of traffic. Yes, there is information about load balancers and database clustering. But there is also information about coding practices: using source control, branching, supporting international characters, abstracting away translations, modularizing your code for easy updating, A/B testing of new features, and failover. Think of it as a modern version of Web Performance Tuning with current and proper information and no filler.

Key Performance Information

No specific advice per se. Instead, this book is about designing and building web applications that are easy to maintain, expand, extend, and quickly replace as your user base grows. It will change the way you build web applications.

Verdict

BUY! An excellent survey of the processes needed to build and grow truly scalable applications. Its information on building asynchronous remote systems alone is worth the price. I am using this as my bible as I design the web front-end to Zoompf’s scanning engine. I highly recommend this book to both web developers and IT operations.

Conclusions

There are some obvious must-have web performance books available today. However, there are additional books that provide insight into tricks, tips, and processes for building high performance web applications that are not published elsewhere. Hopefully this post will help you build out your library of web performance books.

Did I miss one? Please comment below and tell me what other books you recommend that contain good advice for improving website performance.