April 14, 2010

Smashing Roaches

Since releasing our 100% online web performance scanning service 3 days ago, we have checked over 1600 websites for performance defects. This is awesome beyond words! In the course of scanning those 1600 websites, we found some bugs in our own technology as well. That is not so awesome, but it is kinda funny. In this post I’ll share with you some of the issues that came up over the last few days.

Cartoon picture of a software bug

Hosting Provider Failure

When you submit a website to Zoompf for scanning, our Zoompf.com website contacts our scanning server and adds the job to the queue. When the scanning server is done with a job, it uploads the results to Zoompf.com. While you wait, looking at a spinning progress bar, the web page is using Ajax behind the scenes to see if the report has been uploaded to the website. Unfortunately, our scanning server was uploading so much report data so quickly that it exceeded some limits our hosting provider had set for Zoompf.com. And so our hosting provider had the brilliant idea to just blacklist the IP address of our scanning box. When the scanning server tried to upload the results, our hosting provider kept terminating the connection, so we had no way of uploading the reports onto the website. Worse, this was only a one-way block! Job requests could get sent from Zoompf.com to the scanning server, but reports coming back could not be uploaded! This was the main cause of the outage we had yesterday afternoon. Needless to say, we had a very stern conversation with our hosting provider, so this shouldn’t happen again.

Far Future means FAR!

Sometimes people take performance advice a little too far. Take, for example, esb-alumni.de. When we assessed the site, our performance scanner would crash. It turns out this website sets a far future expiration using the max-age directive in the Cache-Control header. The only problem is that the website tells browsers to cache resources for 316,224,000,000 seconds, or around 10,000 years! (Here is an example). This threw an exception while we were trying to calculate the date the resource would expire on! While the HTTP spec doesn’t provide a maximum value for max-age, 10,000 years is a little excessive. Also, as Eric Lawrence blogged, IE and Opera can’t handle values larger than 2^31. Luckily, Zoompf already had a “malformed max-age” performance check, so we made sure it would flag max-age values larger than 2^31 seconds. We fixed this bug as of 11:00pm on Tuesday April 11.
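To make the failure mode concrete, here is a minimal Python sketch of this kind of check. It is not Zoompf's actual code: the function names and the MAX_SAFE_MAX_AGE constant are made up for illustration, and the only facts taken from the post are the 2^31 threshold and the 316,224,000,000 second value. It shows a naive expiry calculation overflowing, and a guard that flags the value instead of crashing.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical name; the 2**31 threshold is the one mentioned in the post.
MAX_SAFE_MAX_AGE = 2**31


def parse_max_age(cache_control):
    """Return the max-age value in seconds, or None if the directive is absent."""
    match = re.search(r"\bmax-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def check_max_age(cache_control, now=None):
    """Return (expiry, malformed); expiry is None when it cannot be computed."""
    now = now or datetime.now(timezone.utc)
    max_age = parse_max_age(cache_control)
    if max_age is None:
        return None, False
    malformed = max_age > MAX_SAFE_MAX_AGE
    try:
        expiry = now + timedelta(seconds=max_age)
    except OverflowError:
        # The expiry date lies beyond the largest year datetime supports (9999).
        expiry = None
    return expiry, malformed


if __name__ == "__main__":
    print(check_max_age("public, max-age=316224000000"))
    print(check_max_age("max-age=3600"))
```

Catching the overflow and flagging the value separately means one silly header produces a finding in the report rather than taking down the whole scan.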

XML Nodes of a 3rd kind

Another controlled crash occurred when processing this Atom feed. Zoompf follows feeds and analyzes them for several performance issues. In this case, our code that minifies RSS and Atom feeds did not understand how to process XML Entity nodes. We fixed this bug as of 11:00pm on Tuesday April 11.

Runaway Crawl

At its core, Zoompf uses a web crawler to find and fetch web resources to analyze for performance problems. Most people don’t know this because we artificially limit how much gets crawled for our free service. However, as with any web crawler, there is a danger of the crawler getting stuck in loops and endlessly requesting pages. While we have built several safeguards into the crawler, a few scans on Sunday and Monday showed a new problem. The problem occurs on websites that are missing a resource. Zoompf tries to fetch a CSS file, say http://example.com/foo/style.css, and we get a 404 error page. However, this error page contains a relative URL to another CSS file. The relative URL is foo/style.css, which corresponds to the full URL http://example.com/foo/foo/style.css. Of course, this CSS file doesn’t exist and returns a 404 when we request it, and that 404 response contains another relative URL to a CSS file that resolves to http://example.com/foo/foo/foo/style.css. You see where this is going (Here is an example of that behavior). Our scanner was crashing when the URL for a request grew to such a ridiculous length that it exceeded the column size in our scan database. We fixed this bug as of 11:00pm on Tuesday April 11.
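The URL growth is easy to reproduce with Python's standard urljoin. The sketch below is only an illustration of one possible safeguard, not a description of how Zoompf's crawler works: the MAX_URL_LENGTH and MAX_SEGMENT_REPEATS limits and both function names are assumptions. It stops following links once a URL gets too long or repeats the same path segment too many times in a row.

```python
from urllib.parse import urljoin, urlsplit

MAX_URL_LENGTH = 2048     # hypothetical limit, e.g. a database column size
MAX_SEGMENT_REPEATS = 3   # hypothetical cap on consecutive identical path segments


def looks_like_runaway(url):
    """Return True if the URL is too long or repeats one path segment too often."""
    if len(url) > MAX_URL_LENGTH:
        return True
    segments = [s for s in urlsplit(url).path.split("/") if s]
    run = 1
    for prev, cur in zip(segments, segments[1:]):
        run = run + 1 if cur == prev else 1
        if run > MAX_SEGMENT_REPEATS:
            return True
    return False


def simulate_crawl(start_url, relative_link, max_steps=20):
    """Follow the same relative link from each 404 page until the guard trips."""
    visited = []
    url = start_url
    for _ in range(max_steps):
        if looks_like_runaway(url):
            break
        visited.append(url)
        # Each error page links to the same relative URL, so the path keeps growing.
        url = urljoin(url, relative_link)
    return visited


if __name__ == "__main__":
    for u in simulate_crawl("http://example.com/foo/style.css", "foo/style.css"):
        print(u)
```

A repeated-segment check catches the loop after a handful of requests, while the length check is a backstop that keeps anything oversized from ever reaching the database.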

Closing

The Internet can be a fairly dirty place that’s full of surprises, both in terms of its structure and its content. The Zoompf team will continue to fix these issues as they come up and keep you informed. Thanks for all the feedback and support and enjoy our free performance scanning service.

March 8, 2010

META Refresh Nullifies Caching for IE6 and IE7

There has been some interesting discussion recently on the mailing list for Google’s Page Speed performance tool. Brian Brophy rediscovered a critical performance bug in Internet Explorer that Joseph Smarr had found nearly 3 years ago. Both Internet Explorer 6 and 7 are affected by this bug. IE8 is not affected.

To summarize, the bug is this: When a site uses a <META> refresh tag to send the visitor to a URL, IE6 and IE7 treat that as if the user had clicked the “Refresh” or “Reload” button on the browser. This means IE does not use any items that are in the cache and instead re-requests everything on that page. In short, for IE6 and IE7, a <META> refresh will nullify any HTTP caching.

The word "META" written on a luggage tag

It’s best to see an example. Let’s say we have a page, start.html, which contains a <META> refresh tag that redirects to main.html. The <META> refresh tag looks like this: <META http-equiv="Refresh" content="0;main.html">. Let’s say main.html has 3 images on it. All of those images are served with a far future Expires header. This means repeat visitors should have all 3 images referenced by main.html cached. Here is what happens:

  • The visitor clicks a link to start.html.
  • start.html uses a <META> refresh to send the visitor to main.html.
  • Visitor’s IE browser fetches main.html.
  • Visitor’s IE browser does not use the cached images. Instead, it sends 3 conditional GET requests with If-Modified-Since headers to the web server, one for each image.

There were already several reasons not to use a <META> tag to perform a refresh. Zoompf Check #99 (one of the first checks we wrote) flags web pages that use a <META> tag for redirects. Originally we flagged <META> refreshes because they are a bloated and oversized solution, and because of all the problems they cause with web crawlers and accessibility. Zoompf’s remediation advice was to use an HTTP redirect, and we flagged this as a low severity issue. In light of these IE performance problems, we have changed the severity to high (which is the same severity as not using caching at all).
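If you want to look for this pattern in your own pages, a detector can be sketched in a few lines of Python using the standard html.parser module. This is a rough illustration under our own assumptions, not the code behind Check #99: the MetaRefreshFinder class and find_meta_refresh_redirects function are hypothetical names, and the parsing of the content attribute only handles the common "delay;target" and "delay;url=target" forms.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects (delay_seconds, target) pairs from <META http-equiv="Refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").strip().lower() != "refresh":
            return
        delay, _, target = attr.get("content", "").partition(";")
        target = target.strip()
        if target.lower().startswith("url="):
            target = target[4:].strip()
        try:
            delay_seconds = float(delay.strip() or 0)
        except ValueError:
            delay_seconds = 0.0
        self.refreshes.append((delay_seconds, target or None))


def find_meta_refresh_redirects(html):
    """Return the refreshes that name a target URL, i.e. those acting as redirects."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return [(delay, target) for delay, target in finder.refreshes if target]


if __name__ == "__main__":
    page = '<html><head><META http-equiv="Refresh" content="0;main.html"></head></html>'
    print(find_meta_refresh_redirects(page))
```

Any page this reports is a candidate for replacing the <META> refresh with an HTTP redirect to the same target, which keeps IE6 and IE7 using their cache.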

Want to see what performance problems your website has? META Refresh Tag Used As Redirect is just one of the 300+ web performance issues Zoompf detects when scanning your web applications. Get your instant free web performance assessment at Zoompf.com today!