From Performance Issue to Bug to 3 New Features
“It has to be a bug,” I thought. “This makes absolutely no sense!” We were about to go live with a major new product, and I had just found what looked like a show-stopping bug. But it wasn’t a bug. In fact, it was something so odd and cool that I ended up creating 3 new features for the Zoompf Alerts Beta and Zoompf WPO. This is the story of how a performance defect on the BBC’s website became a bug and then became 3 new features.
An Odd Finding
When we launched our Zoompf Alerts Beta, we created some demo accounts so that people could use Zoompf Alerts without signing up. For our demo accounts, we chose the BBC, NewEgg, and The Weather Channel. I didn’t look at these demo accounts very closely. I mainly watch our own internal test accounts, which run scans against known sites with specific known issues. This way we can detect regressions or bugs in our performance scanning logic. So, beyond checking that they at least had some issues to explore when I initially created the demo accounts, I didn’t really investigate any of the findings.
That is, of course, until the day before we went live. We were fixing a bunch of little presentation bugs and just kind of grinding through the Zoompf Alerts interface, looking for any sharp corners or rough edges that needed to be cleaned up before going live. That’s when I took a closer look at the BBC’s demo account (it’s a public account, so you can look too) and saw this:
Two things are very strange about the top issue. First, it is a response that is missing HTTP compression and which could be over 99% smaller. HTTP compression usually saves around 60-75%. This is because, while text contains a lot of redundancy that can be losslessly compressed, some parts of the content are unique and diverse. Compression savings over 80% are very unusual, and would imply that the entire response is highly repetitive, like an HTML document consisting of nothing but 100 KB of the letter ‘A’.
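To get a feel for what savings above 99% look like, here is a minimal Python sketch that gzips the degenerate case described above, 100 KB of the letter ‘A’, and reports the fraction of bytes saved. The helper name savings is just an illustration, and Python's gzip module stands in for whatever compression a web server would apply.

```python
import gzip

def savings(data: bytes) -> float:
    """Return the fraction of bytes saved by gzip-compressing data."""
    compressed = gzip.compress(data)
    return 1 - len(compressed) / len(data)

# 100 KB of the letter 'A': almost pure redundancy.
repetitive = b"A" * (100 * 1024)
print(f"savings on repetitive input: {savings(repetitive):.2%}")
```

Ordinary HTML, CSS, or JavaScript run through the same function would normally land far below this figure, which is why a 99% result stands out.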
The second strange thing is the URL. Specifically, the file has a .jpg file extension. JPEG image files are natively compressed. You don’t want to serve them with HTTP compression; you are just wasting CPU time compressing something that is not compressible, and you can actually make the response larger due to the overhead of GZIP. In fact, our free report and Zoompf WPO will specifically flag an issue, Compressed Content Served with HTTP Compression, if you try to do this.
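The "larger after GZIP" effect is easy to reproduce. The sketch below is an approximation rather than a test on a real JPEG: it uses seeded pseudo-random bytes as a stand-in for already-compressed image data (an assumption, since both are essentially incompressible) and shows that the gzipped output comes out bigger than the input.

```python
import gzip
import random

# Assumption: seeded pseudo-random bytes approximate the already-compressed
# payload of a JPEG, which has almost no redundancy left for gzip to exploit.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(50_000))

gzipped = gzip.compress(payload)
overhead = len(gzipped) - len(payload)
print(f"original: {len(payload)} bytes, gzipped: {len(gzipped)} bytes")
print(f"gzip added {overhead} bytes")
```

The extra bytes come from the gzip header and trailer plus the bookkeeping needed to store data that could not be shrunk.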
Clicking into the response, we see more confusing info:
The image is served with an image/jpeg MIME type. The size is also 484 KB, which is huge given the dimensions of the image. The browser was rendering the image just fine as well.
At this point I was worried. Why were we flagging a performance defect on something that couldn’t possibly have that specific performance defect? Was this just a bug in the UI? Did we have a deep bug in our flagging logic or in how we stored relationships between items and defects in the database? We were about to go live, and I had found what looked like a show-stopping bug.
A bloated file in JPEG clothing
I decided to look more closely at this image. So I downloaded it, and opened it in a hex editor, and this is what I saw:
This is a BMP image! You can tell by the BM prefix that acts as the magic number. The BMP image format dates from the late 80’s and is not natively compressed. See all those repeating sequences of 00 00 99 FF? That’s the raw, uncompressed pixel data! It is stored in the form BB GG RR AA, which holds the blue, green, and red values for the pixel, followed by the alpha channel. In this case, the red color of the BBC logo has the HTML notation of #990000, and the color is completely solid, so the alpha is set to 0xFF for fully opaque.
This is why the file compresses so well! Besides a short header, the vast majority of this file is nothing but repeated sequences of 00 00 99 FF for all the red in the logo!
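For readers who want to check the byte order themselves, here is a small Python sketch that takes one 4-byte pixel in the BB GG RR AA layout described above and turns it into HTML color notation. The function name decode_bgra is made up for this example.

```python
def decode_bgra(pixel: bytes) -> tuple:
    """Split a 4-byte BB GG RR AA pixel into (HTML color string, alpha value)."""
    if len(pixel) != 4:
        raise ValueError("expected exactly 4 bytes")
    blue, green, red, alpha = pixel  # iterating bytes yields ints 0-255
    return f"#{red:02X}{green:02X}{blue:02X}", alpha

# The sequence that repeats throughout the hex view.
color, alpha = decode_bgra(bytes.fromhex("00 00 99 FF"))
print(color, alpha)
```

Feeding it the repeating sequence from the hex view gives the logo's red with an alpha of 255, matching the fully opaque #990000 described above.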
Well, I solved the mystery. Zoompf Alerts Beta is correctly identifying a response that is not natively compressed and which should be served with HTTP compression. Clearly the BBC should not be using a BMP image on their website (they are using it as their OpenGraph image, so if you share BBC stories on social networks this is the image that gets used). However, most importantly, this issue exposes a problem with our user interface: there was no easy way to understand that this is a BMP image. With the existing UI, it looks like this is a JPEG and that Zoompf has a bug. I had to download the image and load it in a hex editor to figure out that this wasn’t a real JPEG and that this wasn’t a bug. I needed to expose this information somehow.
Three New Features
Ultimately, the BMP-as-a-JPEG image on the BBC’s website led me to add 3 new features to both Zoompf Alerts and Zoompf WPO.
Displaying Detected File Format
First, we now display the detected file format for a given response when displaying the response details. This tells the user exactly what kind of file they are dealing with.
Zoompf, just like a web browser, uses a magic number database to detect file formats. In fact, we detected that this image was a BMP, and our internal data files have a flag that lets the Zoompf scanner know that the BMP file format is not natively compressed. This is why our scanner flagged the missing HTTP compression issue in the first place. For all responses that are not natively compressed, we check to see if the response was sent with HTTP compression. If not, we flag the issue.
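The flow described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not Zoompf's actual implementation: the MAGIC_NUMBERS table, the detect_format and missing_http_compression names, and the list of accepted Content-Encoding values are all hypothetical, and a real magic number database covers far more formats and checks more than a short prefix.

```python
# Hypothetical, tiny stand-in for a magic number database:
# (prefix bytes, format name, natively compressed?)
MAGIC_NUMBERS = [
    (b"BM", "BMP", False),
    (b"\xff\xd8\xff", "JPEG", True),
    (b"\x89PNG\r\n\x1a\n", "PNG", True),
    (b"GIF87a", "GIF", True),
    (b"GIF89a", "GIF", True),
]

def detect_format(body: bytes):
    """Return (format name, natively compressed) or None if unrecognized."""
    for prefix, name, natively_compressed in MAGIC_NUMBERS:
        if body.startswith(prefix):
            return name, natively_compressed
    return None

def missing_http_compression(body: bytes, headers: dict) -> bool:
    """Flag responses that are not natively compressed and were sent uncompressed."""
    detected = detect_format(body)
    if detected is None:
        return False  # unknown format: this sketch stays silent
    _, natively_compressed = detected
    encoding = headers.get("Content-Encoding", "").lower()
    return not natively_compressed and encoding not in ("gzip", "deflate", "br")

# A fake body that merely starts like a BMP, served the way the BBC image was.
body = b"BM" + b"\x00\x00\x99\xff" * 100
print(detect_format(body))
print(missing_http_compression(body, {"Content-Type": "image/jpeg"}))
```

Note that the decision never consults the Content-Type header or the URL; it relies only on the bytes of the response, which is why the file was flagged even though it was labeled as a JPEG.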
Before, the only indication of file format a user saw was a Content-Type: image/jpeg response header, and the file extension in the URL. Internally, we know this is a BMP image. It was simply a matter of exposing this detected file type to the user.
Warn on File Format Mismatch
Second, we will display a warning when the detected file format is different than the file extension or the MIME type.
In the case of the BBC, the problem was that the file format was a BMP image, but the file extension and the MIME type said it was a JPEG. While our first feature tells the user the detected file format, we wanted to include a separate visual warning to let the user know the file format is different than what they might expect. It looks like this:
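The comparison behind such a warning is simple enough to sketch in code. The Python below is an illustration only, not the logic Zoompf ships: the EXTENSIONS and MIME_TYPES tables and the format_mismatches function are hypothetical, the URL is a placeholder, and a format missing from the tables would be reported as a mismatch for any extension or MIME type.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical lookup tables; a real scanner would cover many more formats.
EXTENSIONS = {
    "BMP": {".bmp"},
    "JPEG": {".jpg", ".jpeg"},
    "PNG": {".png"},
    "GIF": {".gif"},
}
MIME_TYPES = {
    "BMP": {"image/bmp", "image/x-ms-bmp"},
    "JPEG": {"image/jpeg"},
    "PNG": {"image/png"},
    "GIF": {"image/gif"},
}

def format_mismatches(detected: str, url: str, content_type: str) -> list:
    """List warnings where the URL extension or MIME type disagrees with the detected format."""
    warnings = []
    # Use only the path so query strings do not confuse the extension check.
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext and ext not in EXTENSIONS.get(detected, set()):
        warnings.append(f"detected {detected}, but file extension is {ext}")
    # Drop parameters such as "; charset=..." before comparing.
    mime = content_type.split(";")[0].strip().lower()
    if mime and mime not in MIME_TYPES.get(detected, set()):
        warnings.append(f"detected {detected}, but MIME type is {mime}")
    return warnings

for warning in format_mismatches("BMP", "https://example.com/logo.jpg", "image/jpeg"):
    print(warning)
```

For a BMP served the way the BBC image was, both checks fire, one for the extension and one for the MIME type.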
Finally, I added the ability to view a response as a hex dump. I did this for 2 reasons. First, I wanted to preempt any support requests from users telling us we got the file format wrong: “This is a JPEG, because it ends with .jpg!” This way we could, from inside the product, show them the response bytes and definitively prove what type of file it is.
Second, a hex view is a pretty cool way to visualize bloat inside of an image. As seen above, the massive amount of redundancy in the raw pixel data is immediately apparent when looking at the hex view. Embedded metadata is also visible. I showcased this feature quite a bit in an earlier blog post, Visualizing image optimizations with hex editors and strings.
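If you don't have a hex editor handy, a basic hex view is only a few lines of Python. This is a generic sketch, not the viewer built into Zoompf: the hex_dump function and its layout (offset, hex bytes, printable ASCII) are my own choices here, and the sample bytes are a made-up fragment that merely starts with BM, not a valid BMP file.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as rows of: offset, hex values, printable ASCII."""
    pad = width * 3 - 1  # width of a full hex column, used to align short rows
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08X}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(rows)

# Made-up fragment: the BM magic number followed by repeated opaque red pixels.
sample = b"BM" + bytes.fromhex("00 00 99 FF") * 7
print(hex_dump(sample))
```

Even on this tiny fragment, the repeating 00 00 99 FF pattern is obvious at a glance, which is exactly the kind of bloat a hex view makes visible.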
It’s great to have a job where you see things that still surprise you. I started with what I thought was a bug. It turned into a legitimate performance problem with the BBC’s website. And that caused me to add 3 new features to our product to better showcase what was happening.
Remember, just because an image has a .jpg file extension, that doesn’t mean it’s actually a JPEG. There are a number of image formats that are unsuitable for use on the web, so make sure you are using the proper formats. And remember, a hex editor can be a great way to inspect an image, both to determine its file type and to see unnecessary data that is bloating the file size.
Do you care about finding things like unoptimized or unsuitable images on your website? Then you’ll love our new Zoompf Alerts beta. Zoompf Alerts continuously scans your website throughout the day, looking for specific front-end performance issues, and alerts you when new problems are introduced. We just launched the public beta of Zoompf Alerts and you can join Zoompf Alerts now for free!