Zoompf's Web Performance Blog

html5shiv and Serving Content From Code Repositories

 Billy Hoffman on May 11, 2012. Category: Optimization

A lot of interesting findings came out of my analysis of how the Alexa Top 1000 is using HTTP compression. One finding was that JavaScript is the most common type of content served without compression. I hypothesize that this is due to websites linking to 3rd party JavaScript libraries and widgets. Normally a developer links to a 3rd party file to enable advertising, web analytics, user feedback and chat systems, or even social sharing widgets. But you don't control these systems, so those resources might not have compression enabled the way the files on your own website do.

Today I want to focus on a 3rd party library which wasn’t using compression because it leads to an important and often unmentioned performance lesson: Don’t link to resources inside of source code repositories.

html5shiv is a JavaScript library that enables Internet Explorer 8 and earlier to understand and properly render new HTML5 tags like <article> or <aside>. This is incredibly helpful because it allows websites to use these new semantic elements while retaining backwards compatibility. While html5shiv is open source and free to be copied and used anywhere, websites often link to a specific URL: http://html5shiv.googlecode.com/svn/trunk/html5.js.

In fact, the html5shiv article on Wikipedia includes the following source code snippet, which references this URL.

<!--[if lt IE 9]>
     <script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->

I would estimate that 95% of the time I see a site using html5shiv, it is loading the library from http://html5shiv.googlecode.com/svn/trunk/html5.js. Unfortunately, this is bad for performance for several reasons.

It’s Not Served Using HTTP Compression.

The reason I noticed this in the first place is that html5shiv.googlecode.com is not properly configured and does not serve JavaScript files using HTTP compression. Reducing the size of content is a well-known category of performance optimization, and HTTP compression is an easy way to accomplish it. (Well, easy in theory at least.)
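If you are curious how much compression would save on a file like this, it's easy to measure yourself. Here is a minimal sketch, assuming Node.js and a local copy of the minified file (the file name below is just a placeholder):

// Minimal sketch: measure what gzip saves on a local copy of html5shiv.
// Assumes Node.js; "html5shiv.min.js" is a placeholder path.
var fs = require('fs');
var zlib = require('zlib');

var original = fs.readFileSync('html5shiv.min.js');

zlib.gzip(original, function (err, compressed) {
  if (err) throw err;
  console.log('original: ' + original.length + ' bytes');
  console.log('gzipped:  ' + compressed.length + ' bytes');
  console.log('savings:  ' + Math.round(100 - 100 * compressed.length / original.length) + '%');
});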

It’s Not Cached.

Google’s web server instructs the browser (and all shared caches) to only cache this file for 180 seconds. 180 seconds! By the time you are done reading this article, a copy of html5shiv sitting in your cache would have expired.
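You can check the cache lifetime yourself by looking at the response headers. Here is another minimal sketch, again assuming Node.js, that simply prints whatever caching headers the server sends back:

// Minimal sketch: print the caching headers the googlecode server returns for html5shiv.
// Assumes Node.js. At the time of writing, the advertised lifetime was only 180 seconds.
var http = require('http');

http.get('http://html5shiv.googlecode.com/svn/trunk/html5.js', function (res) {
  console.log('Cache-Control: ' + res.headers['cache-control']);
  console.log('Expires:       ' + res.headers['expires']);
  console.log('Last-Modified: ' + res.headers['last-modified']);
  res.resume(); // drain the response body
}).on('error', function (err) { console.error(err); });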

The whole point of linking to a commonly used JavaScript file on a 3rd party host instead of serving it yourself is to leverage the network effect and have that resource already be in the visitor's cache. That's the entire reason JavaScript CDNs like the Google Libraries API and Microsoft's (poorly named) Ajax Content Delivery Network exist. Of course, it doesn't always work. Regardless, 180 seconds is far too short to matter even if JavaScript CDNs were useful.

It’s Not the Most Recent Version.

This branch of html5shiv is not actively developed. As Paul Irish mentions, development for html5shiv has shifted to github, under the care of Alexander Farkas and others. You can find the most recent version here: https://raw.github.com/aFarkas/html5shiv/master/src/html5shiv.js.

(And no, don’t link to that either! It’s not minified, and the Content-Type header is wrong so, ironically, Internet Explorer will not execute the JavaScript that’s supposed to help it!)

This new version of the html5shiv has had numerous bug fixes and performance improvements. In addition, if you actually minified this new version, you'd find it's also smaller than the older one on Google Code. In short, the code at html5shiv.googlecode.com is obsolete.

It Says It Supports Range Requests When It Really Doesn’t.

Since we are on the subject, I might as well bring up all the issues. When you request html5shiv.js, the web server responds with an Accept-Ranges header, indicating that it supports HTTP byte ranges (also known as partial responses).

This means the server claims to support range requests, allowing a client to download just a small portion of the file. This is helpful when the client has already downloaded some of the response, such as when recovering from a network error. As I wrote in a previous post, supporting partial responses is good for performance. Except googlecode.com doesn't actually support partial responses. I used REDBot (which is awesome) to detect this issue, and you can see the results here. If you send a request and specify a byte range using the Range header, the web server still gives you the full response.
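You can reproduce this yourself by asking for just the first few hundred bytes. A server that honors the Range header answers with a 206 Partial Content status and a short body; one that ignores it answers 200 with the full file. Another minimal Node.js sketch:

// Minimal sketch: send a Range request and see whether the server honors it.
// A 206 status means partial content; a 200 means the Range header was ignored.
var http = require('http');

http.get({
  host: 'html5shiv.googlecode.com',
  path: '/svn/trunk/html5.js',
  headers: { 'Range': 'bytes=0-499' }
}, function (res) {
  console.log('Status: ' + res.statusCode);
  console.log('Content-Length: ' + res.headers['content-length']);
  res.resume(); // drain the response body
}).on('error', function (err) { console.error(err); });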

The Biggest Problem of All

While these are important issues, they are really just a symptom of a much bigger problem: You’re linking to a source code repository.

Think about that for a second. You are directly linking to a resource in an external source code repository, which you do not control! Giant sirens and flashing lights should be going off in your head.

First of all, because this is a source code repository, the contents could be updated at any time. This is why there is no far-future Expires header: to function properly, this URL can never have one. This is most likely the reason that byte range requests fail as well. The content may change, and, even though Last-Modified and ETags could be used, the web server may not be able to determine whether it can return a subset of the (potentially new) response body. So, by the very nature of being a code repo, two performance optimizations simply can never happen.

Secondly, this is a source code repository! How many times have you accidentally checked in code to a repository that broke the build? I've done it lots of times. In fact, it's so common that many departments have funny forms of punishment for when a developer breaks the build, like forcing them to wear an embarrassing hat or shooting them with automatic NERF machine guns (Skynet is nigh!)

Sure, this problem exists whenever you link to an external resource beyond your control. But it’s not like we are talking about linking to the Google Analytics JavaScript file here. A simple mistake or typo when using SVN and BOOM! Someone at Google didn’t just break the build, they broke the Internet.

Marco Arment, the creator of Instapaper, has a great podcast, Build and Analyze, over at 5by5. Sometimes he and Dan even talk about development, so I highly recommend it. One of the things he has advocated repeatedly on the show (including in the most recent episode) is to always avoid external dependencies you cannot control. Directly linking to a resource in someone else's code repository is a perfect example of an external dependency that can easily be, and should be, avoided.

Finally, at the end of the day, we are dealing with a library that is 3854 bytes in length. The new version is 2337 bytes. When served with compression it’s only 1166 bytes. Why on earth are you linking to a 3rd party to serve 1166 bytes? In the time it takes your browser to do the DNS lookup on the html5shiv.googlecode.com hostname, you could have already transmitted the file! Creating an HTTP connection to a 3rd party to download 1166 bytes is just silly and wasteful.
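If you do self-host it, the markup is the same conditional comment from earlier, just pointed at your own copy of the minified file. The path below is only a placeholder; at roughly 1200 bytes gzipped you could even inline the script directly into the page instead.

<!--[if lt IE 9]>
     <script src="/js/html5shiv.min.js"></script>
<![endif]-->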

The Moral of the Story

The moral of the story is: "never, never, never link to a resource inside a source code repository". This includes your own source code repositories as well as 3rd party ones like googlecode or github. The characteristics inherent in a source code repository are fundamentally at odds with frontend performance best practices. Because of its dynamic nature, it is also much easier for the code in a repository or version control system to break. Additionally, you should not be exposing internal infrastructure systems like code repositories, version control systems, or continuous integration systems to the public internet. That is highly dangerous and increases your attack surface.

The Moral of the Story (part 2)

The other moral of the story is one I’ve written about before: “Trust, but verify”.

Top search results will tell you to link to the copy of html5shiv.js hosted on Google's servers. Wikipedia includes a code snippet which links to this file. And you might not think that's a bad idea. After all, this is Google we are talking about! "Let's make the Web Faster" Google! The creator of awesome performance technologies like PageSpeed, mod_pagespeed, and SPDY. The employer of frontend performance greats like Steve Souders, Patrick Meenan, and Mike Belshe. That Google should know what they are talking about.

Well, that Google still got 4 things wrong trying to serve you a JavaScript file that's less than 4 kilobytes in size. Now, I'm not trying to belittle or make fun of Google. My point is that "just because" it's from Google doesn't mean it's right. For that matter, "just because" it's from Zoompf doesn't make it right. People make mistakes. As the Buddha once said:

Believe nothing, no matter where you read it, or who has said it, not even if I have said it, unless it agrees with your own reason and your own common sense.

You can trust, but verify.

Conclusions

All of this was discovered because of a JavaScript file that was served without HTTP compression. You never know what even a minor performance issue will lead you to discover. This is where Zoompf can help you. Our performance scanner tests your web application for nearly 400 issues affecting web performance. You can get a free performance scan of your website now and take a look at our Zoompf WPO product.

Comments

    May 11, 2012 at 5:55 pm

    So the google code repos were out there early and have now become a distribution point managed by Remy. The code is developed on Github, but when we do a release of html5shiv, the new version goes into both Modernizr (baked into core) and the google code projects. So they shouldn’t be out of date. (But pulling from the github project will certainly be the latest version).

    Right now the google code projects (and Modernizr) need html5shiv v3.5 final. Alexander Farkas, the html5shiv maintainer, has been doing great work battling many issues. One thing to point out is that html5shiv now includes the innerShiv technique that previously required handwork when innerHTML'ing into the DOM. Now it's all just handled automatically. And it works across the entire web. Pretty impressive. :)

    Anyway,

    I 100% agree about not linking to Google Code. I mean, I’m not even convinced a proper CDN would be good either.

    I’ll talk to Remy about changing the text on the html5shiv project page to recommend self-hosting.

    May 11, 2012 at 6:25 pm

    Great to hear! I agree with your concerns over using a CDN. At the end of the day, this is < 1200 bytes when minified+gzipped. The overhead of a new HTTP connection (DNS lookup, TCP handshake, possible SSL negotiation, TCP slow start) all to download 1200 bytes seems like a waste.

    May 12, 2012 at 8:49 pm

    Paul, if the request volume is high enough, we could actually look into adding html5shiv to our hosted libraries.

    Billy, to your point about size... that's exactly why mod_pagespeed defaults to inlining CSS + JS resources under 2000 bytes (and you can raise that limit). This works for local and remote resources.

    May 12, 2012 at 9:37 pm

    Ilya,

    By "remote resources" do you mean mod_pagespeed will inline JS/CSS files even from websites you don't control? As in, I use mod_pagespeed on an Apache box at example.com, and link to a JS analytics library on foobar.com using a standard <SCRIPT SRC>. Are you saying mod_pagespeed would inline that? When does it do this analysis of 3rd parties to determine what to inline?

    I am not convinced of the performance value of Google-hosted JS libraries. When we looked back in 2010, the network effect just wasn't there yet. And more recent research (http://statichtml.com/2011/google-ajax-libraries-caching.html) shows this is still the case. Serving jQuery from your own site seems to be the best option right now.

    May 13, 2012 at 5:17 am

    Billy, it looks like part of your message got cut off, but I think the answer is yes. Assuming the script or CSS you're referring to provides correct caching headers, and is below your configured inline bytesize limit, then we will inline it directly into the page.

    You can configure the bytesize limits for both CSS and JS independently.

    September 25, 2012 at 10:21 am

    I’m a bit late to this party, but cdnjs.com hosts html5shiv at //cdnjs.cloudflare.com/ajax/libs/html5shiv/3.6/html5shiv.min.js

    September 25, 2012 at 2:58 pm

    Thanks for the feedback. With this article I was trying to point out the specific problems of linking to a file in a source code repository. But there is a larger question: should you be using a JavaScript library CDN at all? (Note I'm not talking about general CDNs like Akamai, but the specific CDNs of shared, common JavaScript libraries for everyone to link to.)

    The answer to that question, at least right now, is no. JavaScript CDNs do not improve performance, because they rely on the network effect, and right now there is not enough critical mass. See these excellent articles for specifics about why JavaScript CDNs don't work:

    http://statichtml.com/2011/google-ajax-libraries-caching.html

    http://zoompf.com/2010/01/should-you-use-javascript-library-cdns

    f055
    May 12, 2012 at 2:23 pm

    IMHO using the Modernizr shiv + print shiv is the easiest solution and should be widely advertised. The Modernizr build script forces you to download it instead of using some repo link. Kind of scary that a lot of frameworks (Boilerplate, Bootstrap, Foundation) are/were using the repo link…

    May 12, 2012 at 3:07 pm

    Sorry if I missed something, but are you saying never to link to the raw.github link because it's an active source code repository, whereas linking to "//ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js" might be more acceptable because there's an expectation that a version is not going to change?

    May 12, 2012 at 9:44 pm

    Linking to a code repo is bad because, as I mentioned, it doesn't have (among other things) a far-future caching header.

    Shared libraries on Google's ajax.googleapis.com domain *do* have far-future caching headers, but you still should not link to them. Our research back in 2010 and more recent research from late 2011 shows that a visitor coming to your site rarely has a library from ajax.googleapis.com in their cache.

    This means that downloading JS libraries from ajax.googleapis.com actually hurts performance. This is because it takes longer to do a DNS lookup, make a TCP connection, and then make an HTTP request and get a response than simply serving the file from your own website. The only advantage is bandwidth savings, but in most cases pages will take longer to load.

    Orlando
    May 31, 2012 at 12:14 pm

    Which is why we should *all* be using the same CDN, so in the future everyone already has it cached. Using a CDN makes the Internet faster (still not a repo, though).

    May 12, 2012 at 3:59 pm

    I prefer Modernizr myself, too…I’m curious though — I’ve always heard it recommended to serve jQuery from Google precisely because it *would* improve load times if the user already had it cached from another site…do these same problems apply to the Google jQuery library or is that a different setup?

    May 12, 2012 at 9:47 pm

    Rosalind,

    Web interfaces to code repositories and version control systems have their own performance problems, just based on how they function. So definitely don't link directly to anything in something like github or googlecode.

    Now, serving JavaScript libraries like jQuery from so-called JavaScript CDNs like ajax.googleapis.com has its own set of problems. The performance advantage of using a JavaScript CDN is that some other site will have also referenced the library you are using, and so it will already be in a visitor's cache.

    Our research back in 2010 and more recent research from late 2011 shows that a visitor coming to your site rarely has a library from ajax.googleapis.com in their cache. This means that downloading JS libraries from ajax.googleapis.com actually hurts performance, because it takes longer to do a DNS lookup, make a TCP connection, and then make an HTTP request and get a response than simply serving the file from your own website. The only advantage is bandwidth savings, but in most cases pages will take longer to load.

    May 13, 2012 at 4:23 am

    Good to know. :) Thanks for the response!

    May 15, 2012 at 3:28 pm

    Like Rosalind, I too have worked on the assumption that CDNs are the best option for items such as JS libraries. Your research is interesting and has made me rethink my approach.

    Is it possible to check if the user has a CDN hosted library (eg. jQuery v. 1.7.2) cached and to then serve that library or the locally hosted version if the library isn’t already cached?

    May 15, 2012 at 3:37 pm

    No. There is no mechanism for a website or web page to detect if a resource is already stored in the browser's cache. This is by design and very important. If websites could detect items in the cache, it would be a massive privacy violation, as they could determine what other websites you have visited.

    CRS
    May 13, 2012 at 11:03 pm

    Also, don’t make your bold text the same colour as your hyperlinks!

    May 14, 2012 at 7:48 am

    Great read about a topic which hasn’t been explored yet. Thanks!

    I just wrote a follow-up article on the same subject, but from a security standpoint. An expiration period of 180 seconds means your users are only three minutes away from picking up any vulnerability or malicious code that sneaks into a third-party codebase.

    Here’s the article – http://frontend.co.il/articles/hotlinking-to-source-code-repos-is-dangerous

    August 6, 2012 at 5:30 pm

    Actually, compression has become a separate line of study in JS and other programming languages, since its importance has increased.
