html5shiv and Serving Content From Code Repositories

Posted: May 11, 2012 at 4:32 pm

There are a lot of interesting findings that came out of my analysis of how the Alexa Top 1000 is using HTTP compression. One finding was that JavaScript is the most common type of content served without compression. I hypothesize that this is due to websites linking to 3rd party JavaScript libraries and widgets. Normally a developer links to a 3rd party file to enable advertising, web analytics, user feedback and chat systems, or even social sharing widgets. But you cannot control these systems, so those resources might not be served with compression the way the files on your own website are.

Today I want to focus on a 3rd party library which wasn’t using compression because it leads to an important and often unmentioned performance lesson: Don’t link to resources inside of source code repositories.

html5shiv is a JavaScript library that enables Internet Explorer 8 and earlier to understand and properly render new HTML5 tags like <article> or <aside>. This is incredibly helpful because it allows websites to use these new semantic elements while retaining backwards compatibility. While html5shiv is open source and free to be copied and used anywhere, websites often link to a specific URL: http://html5shiv.googlecode.com/svn/trunk/html5.js.

In fact, the html5shiv article on Wikipedia includes the following source code snippet, which references this URL.

<!--[if lt IE 9]>
     <script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->

I would estimate that 95% of the time I witness a site using html5shiv, they are loading it from http://html5shiv.googlecode.com/svn/trunk/html5.js. Unfortunately, this is bad for performance for several reasons.

It’s Not Served Using HTTP Compression.

The reason I noticed this in the first place is that html5shiv.googlecode.com is not properly configured and does not serve JavaScript files using HTTP compression. Reducing the size of content is a well known category of performance optimization and HTTP compression is an easy way to accomplish this. (Well, easy in theory at least).
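To get a feel for what compression buys you, here is a minimal Python sketch that gzips a chunk of JavaScript-like text and compares the sizes. The sample text and the function name `gzip_sizes` are made up for illustration. It is not the real html5shiv source, so the ratio will differ from the actual file.

```python
import gzip


def gzip_sizes(data: bytes) -> tuple[int, int]:
    """Return (original size, gzip-compressed size) in bytes."""
    return len(data), len(gzip.compress(data))


# Hypothetical sample: repetitive, JavaScript-like text (not the real html5shiv).
sample = b"document.createElement('article');document.createElement('aside');" * 50

original, compressed = gzip_sizes(sample)
print(f"original: {original} bytes, gzipped: {compressed} bytes")
```

Real-world JavaScript is less repetitive than this sample, so expect a smaller (but usually still substantial) saving on actual files.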

It’s Not Cached.

Google’s web server instructs the browser (and all shared caches) to only cache this file for 180 seconds. 180 seconds! By the time you are done reading this article, a copy of html5shiv sitting in your cache would have expired.

The whole point of linking to a commonly used JavaScript file on a 3rd party instead of hosting it yourself is to use network effects and have that resource already be cached. That’s the entire reason that JavaScript CDNs like Google Libraries API and Microsoft’s (poorly named) Ajax Content Delivery Network exist. Of course it doesn’t always work. Regardless, 180 seconds is way too short to matter even if JavaScript CDNs were useful.
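If you want to check how long a response may be cached, one place to look is the max-age directive of its Cache-Control header. The sketch below pulls that value out of a header string. The function name `max_age_seconds` and the header values are examples I made up, not a capture of what googlecode.com actually sent.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract max-age (in seconds) from a Cache-Control header value, if present."""
    match = re.search(r"(?:^|[,\s])max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


# Hypothetical header values: a 180 second lifetime versus a one year lifetime.
short_lived = max_age_seconds("public, max-age=180")
far_future = max_age_seconds("public, max-age=31536000")
print(short_lived, far_future)
```

A response can also express freshness through an Expires header, which this sketch deliberately ignores.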

It’s Not the Most Recent Version.

This branch of html5shiv is not actively developed. As Paul Irish mentions, development for html5shiv has shifted to github, under the care of Alexander Farkas and others. You can find the most recent version here: https://raw.github.com/aFarkas/html5shiv/master/src/html5shiv.js.

(And no, don’t link to that either! It’s not minified, and the Content-Type header is wrong so, ironically, Internet Explorer will not execute the JavaScript that’s supposed to help it!)

This new version of the html5shiv has had numerous bug fixes and performance improvements. In addition, if you actually minified this new version of the html5shiv, you’d find it’s also smaller than the older one on Google Code. In short, the code at html5shiv.googlecode.com is obsolete.

It Says It Supports Range Requests When It Really Doesn’t.

Since we are on the subject, I might as well bring out all the issues. When you request html5shiv.js, the web server responds with an Accept-Ranges header, indicating that it supports HTTP byte ranging (also known as partial responses) as shown in the screen shot below:

This means the server claims to support range requests, allowing you to download just a small portion of the file. This is helpful when the client has already downloaded some of the response, such as when recovering from a network error. As I wrote in a previous post, supporting partial responses is good for performance. Only googlecode.com doesn’t actually support partial responses. I used REDBot (which is awesome) to detect this issue, and you can see the results here. If you send a request and specify a byte range using the Range header, the web server still gives you the full response.
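The check itself is simple to express. A server that honors a single-range request answers with status 206 and a Content-Range header, while a server that ignores the Range header answers with a plain 200 and the whole body. Here is a small Python sketch of that test. The function name `honors_range` and both sample responses are hypothetical, and this is not how REDBot is implemented.

```python
def honors_range(status: int, headers: dict[str, str]) -> bool:
    """True if a response to a single-range request is a genuine partial response."""
    # Header names are case-insensitive, so normalize before looking them up.
    names = {name.lower() for name in headers}
    return status == 206 and "content-range" in names


# Hypothetical responses to a request that sent "Range: bytes=0-99".
partial = honors_range(206, {"Content-Range": "bytes 0-99/3854", "Content-Length": "100"})
full = honors_range(200, {"Accept-Ranges": "bytes", "Content-Length": "3854"})
print(partial, full)
```

The second response is the situation described here: the server advertises Accept-Ranges but still returns everything.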

The Biggest Problem of All

While these are important issues, they are really just a symptom of a much bigger problem: You’re linking to a source code repository.

Think about that for a second. You are directly linking to a resource in an external source code repository, which you do not control! Giant sirens and flashing lights should be going off in your head.

First of all, because this is a source code repository, the contents could be updated at any time. This is why there is not a far-future expires header. To function properly, this URL can never have a far-future expires header. This is most likely the reason that byte range requests fail as well. The content may change, and, even though Last-Modified and ETags could be used, the web server may not be able to determine whether it can return a subset of the (potentially new) response body. So, by the very nature of being a code repo, you have two performance optimizations that simply can never happen.

Secondly, this is a source code repository! How many times have you accidentally checked in code to a repository that broke the build? I’ve done it lots of times. In fact, it’s so common that many departments have funny forms of punishment when a developer breaks the build, like forcing them to wear an embarrassing hat or shooting them with automatic NERF machine guns (Skynet is nigh!)

(And the real reason this is an embarrassing hat? Comic Sans MS.)

Sure, this problem exists whenever you link to an external resource beyond your control. But it’s not like we are talking about linking to the Google Analytics JavaScript file here. A simple mistake or typo when using SVN and BOOM! Someone at Google didn’t just break the build, they broke the Internet.

Marco Arment, the creator of Instapaper, has a great podcast, Build and Analyze, over at 5by5. Sometimes he and Dan even talk about development, so I highly recommend it. One of the things he has advocated repeatedly on the show (including the most recent episode) is to always avoid external dependencies you cannot control. Directly linking to a resource in someone else’s code repository is a perfect example of an external dependency that can easily be, and should be, avoided.

Finally, at the end of the day, we are dealing with a library that is 3854 bytes in length. The new version is 2337 bytes. When served with compression it’s only 1166 bytes. Why on earth are you linking to a 3rd party to serve 1166 bytes? In the time it takes your browser to do the DNS lookup on the html5shiv.googlecode.com hostname, you could have already transmitted the file! Creating an HTTP connection to a 3rd party to download 1166 bytes is just silly and wasteful.

The Moral of the Story

The moral of the story is: “never never never link to a resource inside a source code repository”. This includes your own source code repository and 3rd party ones like googlecode or github. The characteristics inherent in a source code repository are fundamentally at odds with frontend performance best practices. Breaking code in a source code repository or version control system is much easier due to its dynamic nature. Additionally, you should not be exposing internal infrastructure systems like code repositories, version control systems, or continuous integration systems to the public internet. This is highly dangerous and increases your attack surface.

The Moral of the Story (part 2)

The other moral of the story is one I’ve written about before: “Trust, but verify”.

Top search results will tell you to link to the copy of html5shiv.js hosted on Google’s servers. Wikipedia includes a code snippet which links to this file. And you might not think that’s a bad idea. After all, this is Google we are talking about! “Let’s make the Web Faster” Google! The creator of awesome performance technologies like PageSpeed and mod_pagespeed and SPDY. The employer of frontend performance greats like Steve Souders, Patrick Meenan, and Mike Belshe. That Google should know what they are talking about.

Well, that Google still got 4 things wrong trying to serve you a JavaScript file that’s less than 4 kilobytes in size. Now I’m not trying to belittle or make fun of Google. My point is that “just because” it’s from Google doesn’t mean it’s right. For that matter, “just because” it’s from Zoompf doesn’t make it right. People make mistakes. As the Buddha once said:

Believe nothing, no matter where you read it, or who has said it, not even if I have said it, unless it agrees with your own reason and your own common sense.

You can trust, but verify.

Conclusions

All of this was discovered because of a JavaScript file that was served without HTTP compression. You never know what even a minor performance issue will lead you to discover. This is where Zoompf can help you. Our performance scanner tests your web application for nearly 400 issues affecting web performance. You can get a free performance scan of your website now and take a look at our Zoompf WPO product.

Should You Use JavaScript Library CDNs?

Posted: January 15, 2010 at 1:40 pm

The concept is simple. Hundreds of thousands of websites use JavaScript libraries like jQuery or Prototype. Different websites you visit each download another identical copy of these libraries. You probably have a few dozen copies of jQuery in your browser’s cache right now. That’s silly. We should fix that.

How? Well, if there were a 3rd party repository of common JavaScript libraries, websites could simply load their JavaScript files from it. Now imagine the repository implemented caching. SiteA, SiteB, and SiteC all have <SCRIPT SRC> tags that reference http://some-code-respo.com/javascript/jquery.js. When someone visits any one of these sites, the JavaScript library jQuery is downloaded and cached. If that same person visits one of the other sites, that person will not have to download jQuery again. The idea is that sites will load faster because these libraries should not have to be re-downloaded very often at all. Of course, this only works if a lot of people all use the common repository. If only a few people use the common repository, then virtually no one benefits because the library will not have been downloaded and cached by a previous website and has to be downloaded again.

This is an example of the Network effect. The more people that use a system the more valuable the system becomes.

Implementations of this idea of a central shared repository of common JavaScript libraries are called several different things. Google calls their implementation Google AJAX Library API. Yahoo doesn’t have a clear name for their implementation. I’ve seen “Free YUI hosting” or “YUI Dependencies”, or even Yahoo YUI CDN. Microsoft calls their implementation the Microsoft AJAX CDN. To keep things simple, I will collectively refer to these repositories of common JavaScript libraries as JavaScript Library CDNs.

JavaScript Library CDNs seem like a performance no-brainer. Use the service, and your site loads faster and consumes less bandwidth. This post will explore whether, and under what conditions, a JavaScript Library CDN actually improves web performance.

The Choice

Consider this situation. You are a speed-conscious web developer. You have a website that uses jQuery 1.3.2 as well as some additional site specific JavaScript. Because you value web performance, you know you should concatenate all your JavaScript files into as few files as possible, minify them, and serve them using gzip compression. You have 2 choices:

  1. Serve all your JavaScript locally. You will have a single <SCRIPT SRC> tag that points to a JavaScript file containing jQuery 1.3.2 and your site specific JavaScript.
  2. Serve some of the JavaScript using a JavaScript Library CDN. You will have 2 <SCRIPT SRC> tags. The first tag will point to a single file on your website containing your site specific JavaScript files. The second tag will point to the copy of jQuery 1.3.2 on Google AJAX Library API.

What’s the difference? Well, a minified, gzipped copy of jQuery 1.3.2 is 19,763 bytes in length. If you choose option 1, all your users will have to download these 19,763 bytes regardless of what other sites they may have already visited. That’s the cost: downloading 19,763 bytes. Notice there is no cost of an additional HTTP request and response or other overhead, because those bytes of jQuery content are included inside the response for the site specific JavaScript content, a request which the visitor already has to make. This is important, so I will repeat: The cost of not using a JavaScript Library CDN is only the downloading of JavaScript content and not any additional HTTP requests or overhead.

In the second option, you are going to gamble with a JavaScript Library CDN. You are hoping a visitor has already browsed another website which also uses Google to serve jQuery 1.3.2. If you are right, then that visitor does not need to download 19,763 bytes. If you are wrong, the visitor needs to download 19,763 bytes from Google. That’s the prize in a nutshell. And downloading 19,763 bytes doesn’t sound bad! Who cares where it comes from?

The Price of Missing

Unfortunately, an HTTP request to Google’s JavaScript Library CDN is more expensive than an HTTP request to your own website! This is because a visitor’s browser has to perform a DNS lookup for ajax.googleapis.com and establish a new TCP connection with Google’s systems. If the additional request were to your site instead, the visitor’s browser would not need to make another DNS lookup and the HTTP request would be sent over an existing HTTP connection.

Unfortunately this is a stubborn process. DNS lookups and establishing TCP connections involve a small number of very small packets, so they are bound by latency rather than bandwidth. Having a faster Internet connection will not significantly impact the speed of these operations. Two different runs on WebPageTest showed that it takes 1/3 of a second for a web browser to make a connection to Google’s JavaScript Library CDN and start downloading the file. (And remember, these are CDNs, so where I make the request from should not matter, as the CDN makes sure I’m downloading the content from a web server that is geographically near me.)

Let me repeat that: Using Google’s JavaScript Library CDN comes with a 1/3 of a second tax on missing. (Note that a tax like this applies to opening connections to any new host: JavaScript Library CDNs, advertisers, analytics and visitor tracking, etc. This is why you should try to reduce the number of different hostnames you serve content from.) Even if this number is smaller for other users, say, 100 milliseconds, it is still a tax that is paid for using a JavaScript Library CDN and missing.

It gets worse because downloading a file over a new TCP connection with Google is slower than downloading a file over an existing TCP connection with your website! This is due to TCP’s slow start and congestion control. Newly created connections transmit data slower than existing connections do. (This is why persistent connections are so important!)
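To make the slow start point concrete, here is a deliberately simplified Python model of how many round trips a fresh connection needs to deliver a file. Everything in it is an assumption on my part: a 1460 byte segment size, an initial window of 3 segments that doubles every round trip, and no packet loss, delayed ACKs, or receiver limits. Real TCP stacks behave differently, so treat this as a sketch rather than a measurement.

```python
import math


def slow_start_round_trips(n_bytes: int, mss: int = 1460, initial_window: int = 3) -> int:
    """Round trips needed to deliver n_bytes if the window doubles every round trip."""
    segments = math.ceil(n_bytes / mss)
    window, sent, round_trips = initial_window, 0, 0
    while sent < segments:
        sent += window   # send one full window of segments this round trip
        window *= 2      # idealized slow start: the window doubles each round trip
        round_trips += 1
    return round_trips


# 19,763 bytes is the minified, gzipped size of jQuery 1.3.2 quoted in this post.
print(slow_start_round_trips(19_763))
```

Under these assumptions a brand new connection needs 3 round trips for jQuery, while a connection whose window has already grown to 16 segments (pass `initial_window=16`) could deliver it in 1.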

The Odds of Winning

Since JavaScript Library CDNs utilize the Network Effect, they are only valuable if a large number of websites use them. After all, the only way your visitors can “win” in the JavaScript Library CDN gamble is if they have already been to a site that also uses the same CDN. So, how many people actually use Google?

Well, according to the great folks at BuiltWith, only 13% of all websites use some kind of 3rd party CDN. Of those websites using a CDN, 25.56% of them are using Google’s Ajax Library API. So only 3.89% of all websites surveyed are using Google’s AJAX Library API.

I wanted to gather more data than BuiltWith. I also didn’t like the way they grouped Traditional CDNs (like Akamai) with JavaScript Library CDNs (like Google) with private site-specific CDNs (like Turner’s CDN). So I performed my own survey. I visited the top 2000 sites on Alexa and analyzed each one to see who is using Google’s JavaScript Library CDN. The result? Only 69 sites out of 2000, or 3.45%, are using Google’s JavaScript Library CDN. My data is on track with BuiltWith’s data, which is good.

Unfortunately you do not vaguely or abstractly “use a JavaScript Library CDN.” You reference a specific URL for the specific JavaScript Library and version number. You only get a benefit from the CDN if you are referencing the specific URL that other websites are referencing. So we have to dig deeper and see what versions of what JavaScript libraries are in use. Below is a table of the JavaScript libraries that Alexa Top 2000 sites serve from Google’s AJAX Library API.

JavaScript Library    Number of Alexa Top 2000 sites serving the library from Google’s CDN
jQuery                48
Prototype             6
SWFObject             6
YUI                   6
jQuery UI             4
Script.aculo.us       3
MooTools              3
Dojo                  1

We see that 48 sites are using Google’s JavaScript Library CDN to serve jQuery, and of those 36 sites are using jQuery 1.3.2. That means jQuery 1.3.2 is used by 1.8% of the Alexa 2000 websites. SWFObject and Prototype came in next at 6 sites each, or less than 0.334% of the sites. When you factor in version numbers, their penetration drops to around 0.10%.
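The penetration figures are simple division over the 2000 sites surveyed. Here is a small Python sketch using the counts from the table. The names `SITES_SURVEYED` and `penetration_percent` are mine, and the count of 36 sites for jQuery 1.3.2 comes from the paragraph rather than the table.

```python
SITES_SURVEYED = 2000

# Site counts copied from the survey table of Google-hosted libraries.
sites_per_library = {
    "jQuery": 48,
    "Prototype": 6,
    "SWFObject": 6,
    "YUI": 6,
    "jQuery UI": 4,
    "Script.aculo.us": 3,
    "MooTools": 3,
    "Dojo": 1,
}


def penetration_percent(site_count: int, total: int = SITES_SURVEYED) -> float:
    """Share of surveyed sites, as a percentage."""
    return 100 * site_count / total


for library, count in sites_per_library.items():
    print(f"{library}: {penetration_percent(count):.2f}%")

# jQuery 1.3.2 specifically: 36 of the 48 jQuery sites.
print(f"jQuery 1.3.2: {penetration_percent(36):.2f}%")
```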

So what is the best case here? What are the odds that someone would have jQuery 1.3.2 served from Google’s JavaScript Library CDN sitting in their browser cache? If I have a clear browser cache, and I visit 35 randomly selected websites from the Alexa top 2000, and then I visit your site, there is only a 47% chance that I will have a cached copy of jQuery 1.3.2 ready for you to use. You calculate this by first determining the probability of randomly picking 35 websites that don’t have jQuery 1.3.2 and then subtracting that from 1. The formula is: 1 – ( (1 – .018) ^ 35 ).
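The formula is easy to play with in Python. This sketch assumes, as the calculation above does, that every visited site is an independent random pick from the Alexa top 2000. The function name `chance_of_cache_hit` is just for illustration.

```python
def chance_of_cache_hit(share_of_sites: float, sites_visited: int) -> float:
    """Probability that at least one of the visited sites served the library."""
    chance_every_site_misses = (1 - share_of_sites) ** sites_visited
    return 1 - chance_every_site_misses


# 1.8% of sites serve jQuery 1.3.2 from Google; the visitor browses 35 random sites.
print(round(chance_of_cache_hit(0.018, 35) * 100))  # prints 47
```

Swapping in a library with 0.3% penetration, or a visitor who browses fewer sites, makes the odds drop quickly.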

Those are not very good odds. And they are only applicable if you are using jQuery 1.3.2. Anything else is not practical. You should also consider the makeup of the sites on the list. I have probably only visited 30 or so of the websites listed in the Alexa top 2000 list ever, and I probably only visit 5-10 with any regularity. We have determined that the odds of “winning” in the CDN gamble are fairly small. How small the odds are will depend on your site content and your visitors. However, I think it is safe to say that, as of January 2010, the majority of your users will not have visited a site that uses a JavaScript Library CDN for the JavaScript library that you use.

Getting More Data

So maybe the odds aren’t good. But is it still worth it to potentially help some people?

Let’s go back to our hypothetical situation where we are deciding if we should use a JavaScript CDN or not. Consider someone with a 768 kilobit per second Internet connection, where 768 * 1024 = 786,432 bits are downloaded per second. Let’s say it is operating at only 80% efficiency to account for overhead like IP, TCP, congestion, packet loss, etc. That is 629,145 bits downloaded per second, which gives us 78,643 bytes downloaded per second or 26,214 bytes downloaded in 1/3 of a second. A minified and gzipped copy of jQuery 1.3.2 is 19,763 bytes long. This means anyone using a 768 kbps Internet connection can download the contents of jQuery 1.3.2 within 1/3 of a second. In other words, downloading jQuery 1.3.2 on that connection takes about the same amount of time as simply connecting to Google’s JavaScript Library CDN.
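Here is the same arithmetic as a short Python sketch, so you can plug in other connection speeds. The 80% efficiency figure and the 1024 bits per kilobit convention are the assumptions used above, and the function name is mine.

```python
JQUERY_BYTES = 19_763  # minified, gzipped jQuery 1.3.2


def effective_bytes_per_second(link_kbps: float, efficiency: float = 0.80) -> float:
    """Usable download rate in bytes per second after protocol overhead."""
    bits_per_second = link_kbps * 1024 * efficiency
    return bits_per_second / 8


rate = effective_bytes_per_second(768)
print(f"bytes per second: {rate:,.0f}")
print(f"bytes in 1/3 of a second: {rate / 3:,.0f}")
print(f"seconds to download jQuery: {JQUERY_BYTES / rate:.2f}")
```

On this model the download itself takes roughly a quarter of a second, which fits inside the 1/3 of a second it takes just to connect to the CDN.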

This simplifies the decision in our hypothetical situation on where to host jQuery. In the locally hosted option, we are asking our visitors to download some amount of content X. X is all our HTML, images, site specific JavaScript, and includes the 19,763 bytes of jQuery 1.3.2. In the “use a CDN” option, we still have X amount of content. The only difference is the CDN has the 19,763 bytes of jQuery and our site has X – 19,763 bytes of content. If a visitor does not have a cached copy of the JavaScript library, they still download a total of X amount of content. It is served from our website and from Google. Under these conditions we are led to the following points:

  1. If you are using a CDN and the visitor does not have a cached copy, they download the site 1/3 of a second slower than if they had downloaded all the content from your web server.
  2. If you are using a CDN and the visitor does have a cached copy, they download all of the content 1/3 of a second faster than if they had downloaded all the content from your web server.

Or, more simply: If we use Google’s JavaScript Library CDN, we are asking the majority of our website visitors (who don’t have jQuery already cached) to take a 1/3 of a second penalty (the time to connect to Google’s CDN) to potentially save a minority of our website visitors (those who do have a cached copy of jQuery) 1/3 of a second (the length of time to download jQuery 1.3.2 over a 768 kbps connection).
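One way to weigh that trade is as an expected value across all visitors. The Python sketch below does that using the rounded 1/3 of a second figures from this post for both the connection penalty and the download saving. The function name and the sample hit rates are hypothetical, and a real site would need its own measurements.

```python
def expected_change_seconds(
    hit_probability: float,
    connect_penalty: float = 1 / 3,
    download_saving: float = 1 / 3,
) -> float:
    """Average change in load time from using the CDN (positive means slower)."""
    miss_probability = 1 - hit_probability
    return miss_probability * connect_penalty - hit_probability * download_saving


# 47% is the best-case hit rate computed earlier; 10% is an arbitrary lower guess.
for hit_rate in (0.47, 0.10):
    print(f"hit rate {hit_rate:.0%}: {expected_change_seconds(hit_rate):+.3f} s on average")
```

With equal penalty and saving, the CDN only breaks even once more than half of your visitors arrive with the file already cached.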

That does not make sense. It makes even less sense as the download speed of your visitors increases. Trying to avoid serving 20 or 30 kilobytes of content at the cost of using a 3rd party just doesn’t make sense.

Conclusions

JavaScript Library CDNs use the network effect. Our survey of the Alexa 2000 shows that right now there are too few people in the network to get any value. Only Google’s AJAX Library API has anywhere near the penetration to provide any benefit, and only if you are using a specific version of a single JavaScript library. Even in that remote case, serving jQuery 1.3.2 using Google will slow down the majority of your users for the benefit of a possibly nonexistent minority. Zoompf recommends the vast majority of websites avoid using JavaScript Library CDNs until they gain more market penetration.

I will discuss the very select group of sites that should use CDNs, as well as some other interesting data discovered while surveying the Alexa 2000 in posts early next week.

Want to see what performance problems you have? Using JavaScript Library CDNs appropriately is just one of the 200+ performance issues Zoompf detects while assessing your web applications for performance. You can sign up for a free mini web performance assessment at Zoompf.com today!