Lickity Split
Web Performance and Optimization Guidance from Zoompf

Rezipping Web Resources for Fun and Profit

November 30, 2009

One large area of web performance optimization is reducing the size of your content. Most people know about obvious techniques like HTTP compression, minifying, or removing extra data from images. However, there is one size-reduction technique that does not seem to be common knowledge among web performance junkies: Rezipping.


Let us start with a little background. Zip archives consist of multiple compressed files that are packaged together into a single file. Zip archives are compressed using the DEFLATE compression algorithm. Deflate supports compression levels from 1 to 9. These compression levels provide a trade-off between the CPU and memory resources used to create the Zip file and the size of the resulting Zip file. Using a higher compression level consumes more resources, but you end up with a smaller file. Most Zip programs tend to create Zip archives using a compression level of 5 or 7. While this can be a good trade-off, as the file is created quickly and is reasonably compressed, it will not produce the smallest file possible.
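To make the level trade-off concrete, here is a minimal sketch that compresses the same data at three DEFLATE levels and reports the sizes. It uses Python's standard zlib module rather than a Zip program, and the sample data and the deflate_sizes helper are made up for illustration, so the exact numbers will differ from what any particular Zip tool produces.

```python
import zlib

# Hypothetical sample: XML-like text, similar in spirit to what many Zip-based formats store.
sample = "".join(
    f"<row id='{i}'><cell>{i * i % 977}</cell></row>\n" for i in range(5000)
).encode("utf-8")


def deflate_sizes(data, levels=(1, 5, 9)):
    """Return {level: compressed size in bytes} for each DEFLATE level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


sizes = deflate_sizes(sample)
print(f"uncompressed: {len(sample)} bytes")
for level in sorted(sizes):
    print(f"level {level}: {sizes[level]} bytes")
```

On text-heavy input like this, level 9 should come out no larger than level 1. How big the gap is depends entirely on the data.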

Now all that is well and good. But why should frontend web developers care about Zip file optimization? Simple: Many of the most common files on the Internet are actually Zip files. By creating methods to make smaller Zip files we are actually optimizing multiple different types of web files. Optimizing these files will reduce bandwidth consumption and server load while improving page load times.

These “Files that don’t end in .zip but really are Zip Files” use the Zip file format as a kind of wrapper to collect all the bits and pieces that really make up the file and store them in a single compressed unit. For example, Silverlight applications have a XAP file extension. However, a Silverlight application is just a Zip file containing compiled byte code, resources like images and sounds, and other configuration. Java Applets contained in JAR files are Zip files. All of Microsoft Office’s OOXML documents (DOCX, XLSX, PPTX, etc.) are Zip files. All of OpenOffice.org’s ODF documents (ODT, ODP, ODS, etc.) are Zip files. You can rename any of these types of files to “.zip” and open them with any Zip program.
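You can also check this without renaming anything. The sketch below uses Python's standard zipfile module to test whether a file is a Zip archive and to list what is inside. The list_zip_members helper and the tiny in-memory stand-in for a DOCX are hypothetical and exist only to keep the example self-contained. With a real file you would pass its path instead.

```python
import io
import zipfile


def list_zip_members(source):
    """Return member names if `source` (a path or file object) is a Zip archive, else None."""
    if not zipfile.is_zipfile(source):
        return None
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


# Build a tiny stand-in for a DOCX in memory (hypothetical contents, not a valid Word file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document/>")

print(list_zip_members(buffer))
print(list_zip_members(io.BytesIO(b"this is plain text, not a zip archive at all")))
```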

Since all of these common web files are simply Zip files, we can optimize them to improve web performance and reduce operational costs. This is where Rezipping comes in. Rezipping is the process of recompressing a Zip file to create a smaller file. The process is simple: you take any Zip file, unzip the contents, and then rezip the contents at a higher compression level. To accomplish this, I am using the command line version of 7-Zip. 7-Zip’s implementation of the DEFLATE compressor is generally considered to compress files 5% to 10% better than other Zip programs. The process looks like this:


rem unzip the contents of the original zip into a temporary directory
7za.exe x original.zip -o"c:\tmp"
rem rezip using maximum compression (-tzip forces the Zip format, -mx9 is the highest level)
7za.exe a -tzip -mx9 new.zip "c:\tmp\*"
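The same unzip-then-rezip loop can be sketched in a script. This version uses Python's standard zipfile module, which relies on zlib rather than 7-Zip's DEFLATE encoder, so treat it as an illustration of the process and not a reproduction of the results in this article. The rezip helper, the member name, and the sample payload are all assumptions made for the example.

```python
import io
import zipfile


def rezip(source, destination, level=9):
    """Copy every member of `source` into `destination`, recompressed with DEFLATE at `level`."""
    with zipfile.ZipFile(source) as old, zipfile.ZipFile(destination, "w") as new:
        for info in old.infolist():
            data = old.read(info)  # decompress the member
            new.writestr(
                info, data, compress_type=zipfile.ZIP_DEFLATED, compresslevel=level
            )


# Hypothetical input: an archive written at the fastest level, as a quick tool might do.
payload = "".join(
    f"<row id='{i}'><cell>{i * i % 977}</cell></row>\n" for i in range(5000)
)
original = io.BytesIO()
with zipfile.ZipFile(original, "w", zipfile.ZIP_DEFLATED, compresslevel=1) as archive:
    archive.writestr("xl/worksheets/sheet1.xml", payload)

rezipped = io.BytesIO()
rezip(original, rezipped)
print(len(original.getvalue()), "->", len(rezipped.getvalue()), "bytes")
```

Reusing each member's existing entry keeps names and timestamps intact, so only the compressed bytes change. For real files, pass file paths instead of the in-memory buffers.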

To see how much this could help web performance, I downloaded several samples of different types of Zip files from the Internet.

Silverlight

| Name | Original Size (kb) | ReZipped Size (kb) | % improvement |
|---|---|---|---|
| cached – SilverlightApplication1.xap | 3,972 | 3,899 | 1.84% |
| Everything-SilverlightApplication1.xap | 825,801 | 782,594 | 5.23% |
| Examples.CS.xap | 4,752,262 | 3,376,411 | 28.95% |
| GeoReference.xap | 388,898 | 288,977 | 25.69% |
| HoldemSimulatorUI.xap | 1,280,714 | 1,243,967 | 2.87% |
| ImageGallery_v25_9458063489vC.xap | 18,226 | 17,538 | 3.77% |
| SilverlightControl.xap | 678,995 | 557,791 | 17.85% |

On average, rezipping reduces a Silverlight application by 12.32%. This is quite good given that XAP files can contain many binary files like images or sounds that will not be recompressed. Some files created from Visual Studio saw an improvement of more than 25%! Also notice that “ImageGallery_v25” is the Silverlight application used by Bing to change Bing’s background image. This heavily served file could be slimmed by nearly 4% simply by rezipping the XAP file!
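For anyone who wants to check the arithmetic, the improvement column is the size reduction divided by the original size, and the average quoted above is the plain mean of the per-file percentages. A short sketch, with the sizes copied from the Silverlight table and a hypothetical percent_improvement helper:

```python
def percent_improvement(original_size, rezipped_size):
    """Size reduction as a percentage of the original size."""
    return (original_size - rezipped_size) / original_size * 100


# (original, rezipped) sizes copied from the Silverlight table above.
silverlight = [
    (3972, 3899), (825801, 782594), (4752262, 3376411), (388898, 288977),
    (1280714, 1243967), (18226, 17538), (678995, 557791),
]

per_file = [percent_improvement(old, new) for old, new in silverlight]
mean_of_files = sum(per_file) / len(per_file)

# Alternative view: total bytes saved across all files, which weights large files more.
overall = percent_improvement(
    sum(old for old, _ in silverlight), sum(new for _, new in silverlight)
)

print(f"mean of per-file savings: {mean_of_files:.2f}%")
print(f"savings over all bytes:   {overall:.2f}%")
```

The two figures answer different questions: the first treats every file equally, while the second is closer to what a bandwidth bill would see if each file were served equally often.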

Microsoft Excel Documents

| Name | Original Size (kb) | ReZipped Size (kb) | % improvement |
|---|---|---|---|
| Listedescourselearning.xlsx | 55,618 | 40,753 | 26.73% |
| ParticipatingMembers.xlsx | 170,382 | 123,275 | 27.65% |
| PartnerReadinessAndTrainingFY09.xlsx | 26,673 | 21,349 | 19.96% |
| PermissionTemplate.xlsx | 22,570 | 15,969 | 29.25% |
| Presentation_Skills_Providers.xlsx | 33,092 | 27,144 | 17.97% |

On average, rezipping Excel files saves about 25%. This makes sense, as most Excel spreadsheets contain predominantly text and not incompressible binary data.

Microsoft PowerPoint Documents

| Name | Original Size (kb) | ReZipped Size (kb) | % improvement |
|---|---|---|---|
| AMP 8.0 Project Kickoff Template v1.2 07102009.pptx | 112,637 | 96,753 | 14.10% |
| CL01.pptx | 1,918,440 | 1,692,785 | 11.76% |
| CL02.pptx | 5,872,228 | 5,448,818 | 7.21% |
| EC2.pptx | 123,137 | 100,013 | 18.78% |
| MSDN_Admin_08.pptx | 2,006,091 | 1,862,496 | 7.16% |
| SharePoint_Buzz.pptx | 2,123,778 | 2,040,234 | 3.93% |
| speedgeeks-20091026.pptx | 3,408,365 | 3,271,384 | 4.02% |
| SupportingDistributedTeamwork.pptx | 2,454,360 | 2,387,257 | 2.73% |

On average, rezipping PowerPoint files saves about 9%. This can vary widely depending on the number of images contained inside the PPTX file, as images are not recompressed (more on that in another article).

Microsoft Word Documents

| Name | Original Size (kb) | ReZipped Size (kb) | % improvement |
|---|---|---|---|
| ASC_3.0_Demo_Image_Release_Notes.docx | 431,220 | 412,034 | 4.45% |
| implementationchecklist.docx | 126,981 | 120,075 | 5.44% |
| MSCOM_Virtualizes_MSDN_TechNet_on_Hyper-V.docx | 115,230 | 89,572 | 22.27% |
| CompProposal.docx | 25,548 | 21,395 | 16.26% |
| Web content redline 2009-10-28.docx | 201,304 | 180,868 | 10.15% |
| WindowsSharePointServicesDatasheet.docx | 198,837 | 172,082 | 13.46% |

On average, rezipping Word documents saves about 12%.

Conclusions

Always use rezipping! Stop sending bytes down the pipe that you don’t have to! The savings you get from rezipping are driven by the contents of the Zip file. Files with a large number of binary objects that will not be compressed (like images) will see a smaller improvement. Also note that higher compression levels increase the time and memory needed to compress data, but they do not increase the time it takes to decompress it. This is because all the work is in finding out what can be reduced during compression, not in recreating the original data during decompression. There is no reason not to use rezipping.
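If you want to see the compress-versus-decompress asymmetry on your own machine, a rough measurement is easy to sketch. The example below uses Python's zlib module, and the time_levels helper and sample data are hypothetical. Timings vary by machine and by run, so no particular numbers are claimed here.

```python
import time
import zlib


def time_levels(data, levels=(1, 9)):
    """Return {level: (compress_seconds, decompress_seconds)} for `data`."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        middle = time.perf_counter()
        unpacked = zlib.decompress(packed)
        end = time.perf_counter()
        assert unpacked == data  # the level never changes the decoded bytes
        results[level] = (middle - start, end - middle)
    return results


# Hypothetical sample data: XML-like text, large enough to give measurable timings.
sample = "".join(
    f"<row id='{i}'><cell>{i * i % 977}</cell></row>\n" for i in range(50000)
).encode("utf-8")

for level, (pack, unpack) in time_levels(sample).items():
    print(f"level {level}: compress {pack:.4f}s, decompress {unpack:.4f}s")
```

Note that the decompression call is identical for both levels: the decoder never needs to know which level produced the stream.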

By rezipping your files you can reduce the size of your content. This reduces bandwidth consumption and server load while improving page load times! There is more work to be done. There are a number of web files that contain Deflate-compressed data, like Flash files, WOFF font files, SVGZ, and more. All of these could be redeflated using a compression level of 9 to make smaller, faster files. Stay tuned as we investigate this more.
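As a taste of what redeflating might look like, here is a minimal sketch for SVGZ, which is an SVG file wrapped in gzip (itself a Deflate stream with a small header). It uses Python's standard gzip module. The regzip helper and the sample SVG are made up for illustration, and Flash and WOFF files have their own containers that this sketch does not handle.

```python
import gzip


def regzip(blob, level=9):
    """Decompress a gzip blob (for example an SVGZ file) and recompress it at `level`."""
    return gzip.compress(gzip.decompress(blob), compresslevel=level, mtime=0)


# Hypothetical SVG content, first compressed at the fastest level.
svg = (
    "<svg xmlns='http://www.w3.org/2000/svg'>"
    + "".join(f"<circle cx='{i % 97}' cy='{i * 7 % 89}' r='{i % 5 + 1}'/>" for i in range(3000))
    + "</svg>"
).encode("utf-8")
fast = gzip.compress(svg, compresslevel=1, mtime=0)

small = regzip(fast)
print(len(fast), "->", len(small), "bytes")
```

Passing mtime=0 leaves the timestamp field in the gzip header empty, so recompressing the same input always yields the same bytes. That is a choice made for this example, not a requirement of the technique.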
