Since releasing our 100% online web performance scanning service 3 days ago, we have checked over 1,600 websites for performance defects. This is awesome beyond words! In the course of scanning those 1,600 websites, we found some bugs in our own technology as well. That is not so awesome, but it is kinda funny. In this post I’ll share with you some of the issues that came up over the last few days.

Hosting Provider Failure
When you submit a website for scanning to Zoompf, our Zoompf.com website contacts our scanning server and adds the job to the queue. When the scanning server is done with a job, it uploads the results to Zoompf.com. While you wait, looking at a spinning progress bar, the web page is using Ajax behind the scenes to check whether the report has been uploaded to the website. Unfortunately, our scanning server was uploading so much report data so quickly that it exceeded some limits our hosting provider had set for Zoompf.com. And so our hosting provider had the brilliant idea to just blacklist the IP address of our scanning box. When the scanning server tried to upload the results, our hosting provider kept terminating the connection, so we had no way of getting the reports onto the website. Worse, this was only a one-way block! Job requests could get sent from Zoompf.com to the scanning server, but reports coming back could not be uploaded! This was the main cause of the outage we had yesterday afternoon. Needless to say, we had a very stern conversation with our hosting provider, so this shouldn’t happen again.
Far Future means FAR!
Sometimes people take performance advice a little too far. Take for example esb-alumni.de. When we assessed the site, our performance scanner would crash. It turns out this website sets a far-future expiration using the max-age directive in the Cache-Control header. The only problem: the website tells browsers to cache resources for 316,224,000,000 seconds, or around 10,000 years! (Here is an example). This threw an exception while trying to calculate the date the resource would expire on! While the HTTP spec doesn’t provide a maximum value for max-age, 10,000 years is a little excessive. Also, as Eric Lawrence blogged, IE and Opera can’t handle values larger than 2^31. Luckily Zoompf already had a “malformed max-age” performance check, so we made sure it would flag max-age values larger than 2^31 seconds. We fixed this bug as of 11:00pm on Tuesday April 11.
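To make the failure mode concrete, here is a minimal Python sketch of this kind of check. It is not Zoompf's actual code: the function names, the returned dictionary, and the regular expression are all assumptions made for illustration. The idea is simply to compare max-age against the 2^31 threshold before doing any date arithmetic, so an absurd value gets flagged instead of blowing up the expiry calculation.

```python
import re
from datetime import datetime, timedelta, timezone

# Threshold taken from the post; every name below is hypothetical.
MAX_SAFE_MAX_AGE = 2 ** 31


def parse_max_age(cache_control):
    """Return the max-age value in seconds, or None if the directive is absent."""
    match = re.search(r'(?:^|[\s,])max-age\s*=\s*"?(\d+)', cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def check_max_age(cache_control, now=None):
    """Flag oversized max-age values before attempting any date arithmetic."""
    now = now or datetime.now(timezone.utc)
    seconds = parse_max_age(cache_control)
    if seconds is None:
        return {"max_age": None, "malformed": False, "expires": None}
    if seconds > MAX_SAFE_MAX_AGE:
        # Do not compute an expiry date: in Python, adding roughly 10,000
        # years to "now" raises OverflowError because datetime stops at
        # the year 9999.
        return {"max_age": seconds, "malformed": True, "expires": None}
    return {
        "max_age": seconds,
        "malformed": False,
        "expires": now + timedelta(seconds=seconds),
    }


if __name__ == "__main__":
    print(check_max_age("public, max-age=316224000000"))
    print(check_max_age("public, max-age=3600"))
```

The ordering is the point of the sketch: the range check runs first, and the expiry date is only computed for values that passed it.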
XML Nodes of a 3rd kind
Another controlled crash occurred while processing this Atom feed. Zoompf follows feeds and analyzes them for several performance issues. In this case, our code that minifies RSS and Atom feeds did not understand how to process XML Entity nodes. We fixed this bug as of 11:00pm on Tuesday April 11.
Runaway Crawl
At its core, Zoompf uses a web crawler to find and fetch web resources to analyze for performance problems. Most people don’t know this because we artificially limit how much gets crawled for our free service. However, as with any web crawler, there is a danger of the crawler getting stuck in loops and endlessly requesting pages. While we have built several safeguards into the crawler, a few scans on Sunday and Monday showed a new problem. The problem occurs on websites that are missing a resource. Zoompf tries to fetch a CSS file, say http://example.com/foo/style.css, and we get a 404 error page. However, this error page contains a relative URL to another CSS file. The relative URL is foo/style.css, which resolves to the full URL http://example.com/foo/foo/style.css. Of course, this CSS file doesn’t exist either and returns a 404 when we request it, and that 404 response contains another relative URL to a CSS file that resolves to http://example.com/foo/foo/foo/style.css. You see where this is going (Here is an example of that behavior). Our scanner was crashing when the URL for a request grew to such a ridiculous length that it exceeded the column size in our scan database. We fixed this bug as of 11:00pm on Tuesday April 11.
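The loop is easy to reproduce with standard relative-URL resolution. The Python sketch below is an illustration only, not our crawler: the two limits, the helper names, and the choice of guard (cap the URL length and the number of times a path segment repeats back to back) are assumptions. It resolves the same relative link against each successive 404 URL and stops as soon as a guard trips.

```python
from urllib.parse import urljoin, urlsplit

# Both limits are hypothetical values chosen for this sketch.
MAX_URL_LENGTH = 2048
MAX_SEGMENT_REPEATS = 3


def longest_run(segments):
    """Length of the longest run of identical consecutive path segments."""
    best = run = 0
    prev = None
    for seg in segments:
        run = run + 1 if seg == prev else 1
        best = max(best, run)
        prev = seg
    return best


def should_fetch(url):
    """Reject URLs that are too long or that keep repeating one path segment."""
    if len(url) > MAX_URL_LENGTH:
        return False
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return longest_run(segments) <= MAX_SEGMENT_REPEATS


def simulate_crawl(start_url, relative_link, max_steps=50):
    """Follow the same relative link from each 404 page until a guard stops us."""
    visited = []
    url = start_url
    for _ in range(max_steps):
        if not should_fetch(url):
            break
        visited.append(url)
        # Each 404 page links to the same relative path, resolved against
        # the URL that produced the 404.
        url = urljoin(url, relative_link)
    return visited


if __name__ == "__main__":
    for u in simulate_crawl("http://example.com/foo/style.css", "foo/style.css"):
        print(u)
```

With these limits the simulated crawl requests three URLs and then refuses the fourth, where the foo segment would repeat four times in a row. A real crawler would likely want a less blunt heuristic, since some legitimate sites do repeat path segments.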
Closing
The Internet can be a fairly dirty place that’s full of surprises, both in terms of its structure and its content. The Zoompf team will continue to fix these issues as they come up and keep you informed. Thanks for all the feedback and support, and enjoy our free performance scanning service.


Hello,
Is the web crawler you’re using closed source or not?
When I want to get all of a web page’s content from the command line, I run wget -px url, which acts very much like a web browser (a recent version is needed).
Thanks for sharing experiences !
Vincent.
Our technology is not open source.