
Zoompf's Web Performance Blog

html5shiv and Serving Content From Code Repositories

 Billy Hoffman on May 11, 2012. Category: Optimization

There are a lot of interesting findings that came out of my analysis of how the Alexa Top 1000 uses HTTP compression. One finding was that JavaScript is the most common type of content served without compression. I hypothesize that this is due to websites linking to all these 3rd party JavaScript libraries and widgets. Normally a developer links to a 3rd party file to enable advertising, web analytics, user feedback and chat systems, or even social sharing widgets. But you cannot control these systems, so those resources might not be served with compression the way the files on your own website are.

Today I want to focus on a 3rd party library that wasn't being served with compression, because it leads to an important and often unmentioned performance lesson: Don't link to resources inside of source code repositories.

html5shiv is a JavaScript library that enables Internet Explorer 8 and earlier to understand and properly render new HTML5 tags like <article> or <aside>. This is incredibly helpful because it allows websites to use these new semantic elements while retaining backwards compatibility. While html5shiv is open source and free to be copied and used anywhere, websites often link to a specific URL: http://html5shiv.googlecode.com/svn/trunk/html5.js.

In fact, the html5shiv article on Wikipedia includes the following source code snippet, which references this URL.

<!--[if lt IE 9]>
     <script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->

I would estimate that 95% of the time I witness a site using html5shiv, it is loading it from http://html5shiv.googlecode.com/svn/trunk/html5.js. Unfortunately, this is bad for performance for several reasons.

It’s Not Served Using HTTP Compression.

The reason I noticed this in the first place is that html5shiv.googlecode.com is not properly configured and does not serve JavaScript files using HTTP compression. Reducing the size of content is a well-known category of performance optimization, and HTTP compression is an easy way to accomplish this. (Well, easy in theory at least.)
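You can check this yourself. Here is a minimal sketch using Python's standard library: it requests the file the way a browser would, advertising gzip support, and reports whether the response actually came back compressed. (The URL is the Google Code one discussed in this post, so it may not resolve forever.)

import urllib.request

# Advertise gzip support, the same way a browser would.
req = urllib.request.Request(
    "http://html5shiv.googlecode.com/svn/trunk/html5.js",
    headers={"Accept-Encoding": "gzip"},
)
with urllib.request.urlopen(req) as resp:
    encoding = resp.headers.get("Content-Encoding")
    body = resp.read()

# A properly configured server answers with "Content-Encoding: gzip".
print("Content-Encoding:", encoding or "(none -- served uncompressed)")
print("Bytes on the wire:", len(body))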

It’s Not Cached.

Google’s web server instructs the browser (and all shared caches) to only cache this file for 180 seconds. 180 seconds! By the time you are done reading this article, a copy of html5shiv sitting in your cache would have expired.
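You can verify the cache lifetime with a quick header check. The sketch below, again using Python's standard library, prints the relevant caching headers; the 180-second value is what I observed when I looked.

import urllib.request

# Fetch only the headers and see how long caches may keep the file.
req = urllib.request.Request(
    "http://html5shiv.googlecode.com/svn/trunk/html5.js",
    method="HEAD",
)
with urllib.request.urlopen(req) as resp:
    print("Cache-Control:", resp.headers.get("Cache-Control"))
    print("Expires:", resp.headers.get("Expires"))

# At the time of writing, the Cache-Control header contains max-age=180:
# the browser may reuse this file for only 3 minutes.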

The whole point of linking to a common JavaScript file on a 3rd party server, instead of hosting it yourself, is to leverage network effects: the resource may already be sitting in the visitor's cache. That's the entire reason that JavaScript CDNs like Google Libraries API and Microsoft's (poorly named) Ajax Content Delivery Network exist. Of course, this doesn't always work in practice. Regardless, 180 seconds is way too short to matter even if JavaScript CDNs were useful.

It’s Not the Most Recent Version.

This branch of html5shiv is not actively developed. As Paul Irish mentions, development of html5shiv has shifted to GitHub, under the care of Alexander Farkas and others. You can find the most recent version here: https://raw.github.com/aFarkas/html5shiv/master/src/html5shiv.js.

(And no, don’t link to that either! It’s not minified, and the Content-Type header is wrong, so, ironically, Internet Explorer will not execute the JavaScript that’s supposed to help it!)

This new version of the html5shiv has had numerous bug fixes and performance improvements. In addition, if you actually minified this new version of the html5shiv, you’d find it’s also smaller than the older one on Google Code. In short, the code at html5shiv.googlecode.com is obsolete.

It Says It Supports Range Requests When It Really Doesn’t.

Since we are on the subject, I might as well bring out all the issues. When you request html5shiv.js, the web server responds with an Accept-Ranges header, indicating that it supports HTTP byte ranging (also known as partial responses).

This means the server claims to support range requests, allowing you to download just a small portion of the file. This is helpful when the client has already downloaded part of the response, such as when recovering from a network error. As I wrote in a previous post, supporting partial responses is good for performance. Except googlecode.com doesn’t actually support partial responses. I used REDbot (which is awesome) to detect this issue, and you can see the results here. If you send a request and specify a byte range using the Range header, the web server still gives you the full response.
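REDbot aside, you can demonstrate the problem yourself. A server that honors byte ranges answers a Range request with "206 Partial Content" and only the bytes you asked for; a quick sketch:

import urllib.request

# Ask for just the first 100 bytes of the file.
req = urllib.request.Request(
    "http://html5shiv.googlecode.com/svn/trunk/html5.js",
    headers={"Range": "bytes=0-99"},
)
with urllib.request.urlopen(req) as resp:
    # A server honoring the Range header replies "206 Partial Content"
    # with 100 bytes; googlecode.com replies "200 OK" with the full body.
    print("Status:", resp.status)
    print("Bytes received:", len(resp.read()))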

The Biggest Problem of All

While these are important issues, they are really just a symptom of a much bigger problem: You’re linking to a source code repository.

Think about that for a second. You are directly linking to a resource in an external source code repository, which you do not control! Giant sirens and flashing lights should be going off in your head.

First of all, because this is a source code repository, the contents could be updated at any time. That is why there is no far-future Expires header: to function properly, this URL can never have one. It is most likely the reason that byte range requests fail as well. The content may change, and, even though Last-Modified and ETag validators could be used, the web server may not be able to determine whether it can return a subset of the (potentially new) response body. So, by the very nature of being a code repo, two performance optimizations simply can never happen.
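To make the validator point concrete: a cache holding an expired copy revalidates it with a conditional request, and an unchanged resource comes back as a body-less "304 Not Modified". A minimal sketch, assuming the server sends Last-Modified or ETag headers at all:

import urllib.error
import urllib.request

url = "http://html5shiv.googlecode.com/svn/trunk/html5.js"

# First fetch: record the validators the server hands back, if any.
with urllib.request.urlopen(url) as resp:
    etag = resp.headers.get("ETag")
    last_modified = resp.headers.get("Last-Modified")

# Revalidation: a cache with an expired copy asks "has this changed?"
headers = {}
if etag:
    headers["If-None-Match"] = etag
if last_modified:
    headers["If-Modified-Since"] = last_modified

try:
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        print("Status:", resp.status)   # 200: the full body, all over again
except urllib.error.HTTPError as err:
    print("Status:", err.code)          # 304: the cached copy is still good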

Secondly, this is a source code repository! How many times have you accidentally checked in code to a repository that broke the build? I’ve done it lots of times. In fact, it’s so common that many departments have funny forms of punishment for a developer who breaks the build, like forcing them to wear an embarrassing hat or shooting them with automatic NERF machine guns (Skynet is nigh!).


Sure, this problem exists whenever you link to an external resource beyond your control. But it’s not like we are talking about linking to the Google Analytics JavaScript file here. A simple mistake or typo when using SVN and BOOM! Someone at Google didn’t just break the build, they broke the Internet.

Marco Arment, the creator of Instapaper, has a great podcast, Build and Analyze, over at 5by5. Sometimes he and Dan even talk about development, so I highly recommend it. One of the things he has advocated repeatedly on the show (including the most recent episode) is to always avoid external dependencies you cannot control. Directly linking to a resource in someone else’s code repository is a perfect example of an external dependency that can easily be, and should be, avoided.

Finally, at the end of the day, we are dealing with a library that is 3854 bytes in length. The new version is 2337 bytes. When served with compression it’s only 1166 bytes. Why on earth are you linking to a 3rd party to serve 1166 bytes? In the time it takes your browser to do the DNS lookup on the html5shiv.googlecode.com hostname, you could have already transmitted the file! Creating an HTTP connection to a 3rd party to download 1166 bytes is just silly and wasteful.
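Those sizes are easy to reproduce. This sketch downloads the maintained GitHub copy and gzips it locally; the exact byte counts will drift as the library evolves and depend on the compression level, and it assumes the raw.github.com URL above still resolves.

import gzip
import urllib.request

# The maintained (unminified) copy on GitHub, as linked above.
url = "https://raw.github.com/aFarkas/html5shiv/master/src/html5shiv.js"
with urllib.request.urlopen(url) as resp:
    raw = resp.read()

compressed = gzip.compress(raw)
print("Uncompressed:", len(raw), "bytes")
print("Gzipped:", len(compressed), "bytes")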

The Moral of the Story

The moral of the story is: “never never never link to a resource inside a source code repository”. This includes your own source code repository as well as 3rd party ones like Google Code or GitHub. The characteristics inherent in a source code repository are fundamentally at odds with frontend performance best practices, and because a repository’s contents change constantly, it is also far easier for broken code to slip into what you are serving. Additionally, you should not be exposing internal infrastructure like code repositories, version control systems, or continuous integration systems to the public internet at all. Doing so is highly dangerous and increases your attack surface.
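What should you do instead? Vendor the file: grab a copy once, at build or deploy time, and serve it from your own host with minification, gzip, and a far-future Expires header. A minimal sketch (the static/js/ destination path here is just an example):

import os
import urllib.request

# Fetch the library once, at build or deploy time -- not on every page view.
url = "https://raw.github.com/aFarkas/html5shiv/master/src/html5shiv.js"
with urllib.request.urlopen(url) as resp:
    source = resp.read()

# Save it alongside your own static assets. From here your normal build
# pipeline minifies it, and your web server serves it with gzip and a
# far-future Expires header, just like any other first-party file.
os.makedirs("static/js", exist_ok=True)
with open("static/js/html5shiv.js", "wb") as f:
    f.write(source)

Your page then uses the same conditional comment shown earlier, with src pointed at your own hostname instead of googlecode.com.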

The Moral of the Story (part 2)

The other moral of the story is one I’ve written about before: “Trust, but verify”.

Top search results will tell you to link to the copy of html5shiv.js hosted on Google’s servers. Wikipedia includes a code snippet which links to this file. And you might not think that’s a bad idea. After all, this is Google we are talking about! “Let’s make the Web Faster” Google! The creator of awesome performance technologies like PageSpeed, mod_pagespeed, and SPDY. The employer of frontend performance greats like Steve Souders, Patrick Meenan, and Mike Belshe. That Google should know what they are talking about.

Well, that Google still got 4 things wrong trying to serve you a JavaScript file that’s less than 4 kilobytes in size. Now I’m not trying to belittle or make fun of Google. My point is that “just because” it’s from Google doesn’t mean it’s right. For that matter, “just because” it’s from Zoompf doesn’t make it right. People make mistakes. As the Buddha once said:

Believe nothing, no matter where you read it, or who has said it, not even if I have said it, unless it agrees with your own reason and your own common sense.

You can trust, but verify.

Conclusions

All of this was discovered because of a JavaScript file that was served without HTTP compression. You never know what even a minor performance issue will lead you to discover. This is where Zoompf can help you. Our performance scanner tests your web application for nearly 400 issues affecting web performance. You can get a free performance scan of your website now and take a look at our Zoompf WPO product.

Comments

Have some thoughts, a comment, or some feedback? Talk to us on Twitter @zoompf or use our contact us form.