Zoompf's Web Performance Blog

Note: Archived Content

This is the archived version of the Zoompf blog. Since our acquisition by Rigor, all our new research and posts on web performance are being published on The Rigor Blog.

Detecting and Optimizing Mismatched Image Formats

 Billy Hoffman on February 3, 2015. Category: Uncategorized

Serving the wrong image type can really hurt web performance. An improperly saved image can waste bandwidth and delay the page load time. In this blog post, I’ll show how you can use the file and grep commands to quickly, easily, and automatically find mismatched images that look like one image type but are actually another format.

[Image: pink zebras. Image by: crash the rocks]

A few months back I wrote about an image on the BBC’s website whose MIME type and file extension said it was a JPEG image but in fact was a 484 KB BMP image. This is a performance issue as BMP images are unsuitable for use on the web.

I got a lot of questions about how the browser could even render this kind of mismatched image, so I wrote a follow-up piece about content detection. In short, most binary files like images contain so-called magic numbers: a sequence of bytes that is unique to a specific file format. Browsers largely ignore the file extension and the MIME type of a response and instead look for different magic numbers to determine what kind of file it is and how it should be rendered.
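If you want to see a magic number for yourself without opening a hex editor, a couple of standard tools will do. The snippet below is a minimal sketch: magic_bytes is a hypothetical helper name, and it assumes head -c and od are available, which is the case on most Linux and OS X systems. It prints the first few bytes of a file as hex.

```shell
# magic_bytes FILE [COUNT]
# Hypothetical helper: print the first COUNT bytes of FILE (default 8) as
# space-separated hex. Assumes head -c and od are available.
magic_bytes() {
  head -c "${2:-8}" "$1" |
    od -An -tx1 |                 # hex dump, one byte per column, no offsets
    tr -s '[:space:]' ' ' |       # collapse od's padding and newlines
    sed 's/^ //; s/ $//'          # trim leading and trailing space
}
```

A BMP starts with 42 4d (the ASCII letters BM), a JPEG with ff d8 ff, a PNG with 89 50 4e 47, and a GIF with 47 49 46 38 (GIF8). So running magic_bytes bbc.jpg 2 on a file like the BBC's should print 42 4d rather than ff d8.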

This status quo is great for the end user. If a designer misnames the file or if the IT team misconfigures the MIME type, a visitor’s browser will still be able to display everything correctly. But unfortunately the browser’s behavior masks the fundamental problem: the BBC wants to use a JPEG but is serving a BMP by mistake. Everything works, except that the image is about 10x bigger than it needs to be, wasting bandwidth and slowing down page load times.

In my original article I demonstrated using a hex editor to see inside the file to show how it was indeed the wrong image format. While using a hex editor works well for a small number of files, that process is rather technical and doesn’t scale well for dozens or hundreds of images. Luckily there is a solution.

File to the rescue

Most Unix-like systems, including OS X, Linux, and Cygwin on Windows, include the file command. file uses a database of magic numbers to figure out the actual format of a file. You can see the output in the screenshot below when I run file on the contents of a directory.

[Screenshot: output of file run on the contents of a directory]

file has revealed a lot of great data. We see Windows binaries, an HTML document, and various images. We also see additional metadata about the files, like the dimensions of the images. If you look closely, you will also see the bbc.jpg image is identified as PC Bitmap, which is another way to say a Windows Bitmap (BMP) image. So file can help us identify files that were saved incorrectly. But right now, that information is buried. What we want is a way to call attention to files whose file extension does not match the detected file type.

Enter grep the sidekick

To pull out this information, let’s use the grep command. grep is a super handy program that lets you match or filter any input and display the results. For example, you can use grep to filter a text file and only display lines of text that contain the word “awesome”. We can use grep to highlight when the file extension doesn’t match the file type using the following code snippet:

file *.jpg *.jpeg | grep -v JPEG

This runs file on all files with a .jpg or .jpeg file extension and passes the output to grep. We then use grep with the -v option which inverts the matching. Basically we are telling grep to display any line of text which doesn’t contain the word “JPEG”. Since we told file to only look at files that have a common JPEG extension, every line of output should be identified as a JPEG image and contain the word “JPEG”. Our filter means that the only output will be files which have a .jpg or .jpeg file extension, but which are not actually JPEG images. You can see this below:

[Screenshot: output of the file and grep command]
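One caveat with this one-liner: grep looks at the whole line, including the file name, so a BMP saved as something like my-JPEG-export.jpg would contain the word JPEG and slip through. Here is a minimal sketch of a stricter check. It relies on file's -b (brief) option, which leaves the file name out of the output, and not_really is a made-up helper name, not a standard command.

```shell
# not_really KEYWORD FILE...
# Hypothetical helper: print each FILE whose detected type does not mention
# KEYWORD. The body runs in a subshell, ( ... ), so its variables stay local.
not_really() (
  keyword=$1
  shift
  for f in "$@"; do
    [ -f "$f" ] || continue            # skip unmatched globs such as a literal *.jpeg
    desc=$(file -b "$f")               # -b omits the file name from the output
    case $desc in
      *"$keyword"*) ;;                 # detected type agrees with the extension
      *) printf '%s: %s\n' "$f" "$desc" ;;
    esac
  done
)
```

You call it much like the one-liner, for example not_really JPEG *.jpg *.jpeg, and it stays silent when every file checks out.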

Awesome! That shows us JPEG imposters, but what about other image formats? We can do the same thing. file *.png | grep -v PNG will find .png files that are not really PNG images. file *.gif | grep -v GIF will find .gif files that are not really GIF images. We can execute these commands all at once by separating them with semicolons, like this:

file *.jpg *.jpeg | grep -v JPEG; file *.gif | grep -v GIF; file *.png | grep -v PNG

Using this approach, I was able to detect 2 mismatched image files, as shown below:

[Screenshot: output of the chained commands, showing 2 mismatched images]

This mismatched image format detector is a great way to quickly find problem images on your website. Just run it against the different image directories for your website. It is also a great script to add to a build process like Grunt, so you can verify that all of your images are in the correct format.
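For a build step you usually want two more things: a recursive scan and a non-zero exit status when something is wrong. Here is a rough sketch under some assumptions. list_mismatches and check_images are hypothetical function names, the check uses file's --mime-type option instead of grepping the description, and it assumes a find that supports -iname (GNU and BSD find both do). File names containing newlines are not handled.

```shell
# list_mismatches DIR
# Hypothetical helper: print one line per image under DIR whose extension
# disagrees with the MIME type detected by file.
list_mismatches() {
  find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \
      -o -iname '*.png' -o -iname '*.gif' \) |
  while IFS= read -r path; do
    # Lower-case the extension so IMG.JPG is treated like img.jpg.
    ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
    case $ext in
      jpg|jpeg) expected=image/jpeg ;;
      png)      expected=image/png ;;
      gif)      expected=image/gif ;;
      *)        continue ;;
    esac
    actual=$(file -b --mime-type "$path")
    [ "$actual" = "$expected" ] ||
      printf '%s: expected %s, detected %s\n' "$path" "$expected" "$actual"
  done
}

# check_images [DIR]
# Print any mismatches under DIR (default: current directory) and
# return 1 if there were any, 0 otherwise.
check_images() (
  report=$(list_mismatches "${1:-.}")
  [ -z "$report" ] && return 0
  printf '%s\n' "$report"
  return 1
)
```

Any build tool that can run a shell command and check its exit status, Grunt included, could call something like check_images images/ (images/ being a placeholder directory) and fail the build when a mismatched image sneaks in.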

Conclusions

Mismatched image files are very tricky to detect. This is because the browser does what it is supposed to do and renders the image, even if it has the wrong file extension or MIME type. Since the image renders, it is not obvious that there is a performance problem. However, like the BBC, you could be wasting bandwidth and increasing page load times. Using file and grep, we can detect files which were saved in the wrong format.

If you want to make sure your images stay optimized, consider signing up for Zoompf Alerts. Zoompf Alerts monitors your site throughout the day, notifying you if your CSS, JavaScript, HTML or Images ever change in a manner that hurts your website performance. It’s free and you can opt-out at any time!
