Rezipping Web Resources for Fun and Profit

Posted: November 30, 2009 at 4:48 pm

One large area of web performance optimization is reducing the size of your content. Most people know about obvious techniques like HTTP compression, minifying, or removing extra data from images. However, there is one size-reduction technique that does not seem to be common knowledge among web performance junkies: rezipping.

Let us start with a little background. Zip archives consist of multiple compressed files that are packaged together into a single file. Zip archives are compressed using the DEFLATE compression algorithm. Deflate supports compression levels from 1 to 9. These compression levels provide a trade-off between the CPU and memory resources used to create the Zip file and the size of the resulting Zip file. Using a higher compression level consumes more resources, but you end up with a smaller file. Most Zip programs tend to create Zip archives using a compression level of 5 or 7. While this can be a good trade-off, as the file is created quickly and is reasonably compressed, it will not produce the smallest file possible.
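To get a feel for this trade-off, here is a minimal Python sketch that compresses the same data at several levels with the standard `zlib` module. The sample text and the helper names are made up for illustration, and zlib's level scale is only assumed to be comparable to the levels a given Zip program exposes.

```python
import random
import zlib


def sample_text(n_words: int = 20000, seed: int = 0) -> bytes:
    """Build repeatable, text-like sample data (purely illustrative)."""
    rng = random.Random(seed)
    vocab = ["zip", "deflate", "archive", "silverlight", "docx",
             "compress", "level", "header", "stream", "resource"]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")


def deflate_sizes(data: bytes, levels=(1, 5, 9)) -> dict:
    """Map each compression level to the size of the compressed output."""
    return {level: len(zlib.compress(data, level)) for level in levels}


if __name__ == "__main__":
    data = sample_text()
    print("original:", len(data), "bytes")
    for level, size in deflate_sizes(data).items():
        print(f"level {level}: {size} bytes")
```

The exact numbers depend on the input, so treat the output as a rough indication rather than a benchmark.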

Now all that is well and good. But why should frontend web developers care about Zip file optimization? Simple: Many of the most common files on the Internet are actually Zip files. By creating methods to make smaller Zip files we are actually optimizing multiple different types of web files. Optimizing these files will reduce bandwidth consumption and server load while improving page load times.

These “files that don’t end in .zip but really are Zip files” use the Zip file format as a kind of wrapper to collect all the bits and pieces that really make up the file and store them in a single compressed unit. For example, Silverlight applications have an XAP file extension. However, a Silverlight application is just a Zip file containing compiled byte code, resources like images and sounds, and other configuration. Java applets contained in JAR files are Zip files. All of Microsoft Office’s OOXML documents (DOCX, XLSX, PPTX, etc.) are Zip files. All of OpenOffice.org’s ODF documents (ODT, ODP, ODS, etc.) are Zip files. You can rename any of these types of files to “.zip” and open them with any Zip program.
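One way to check this without renaming anything is to ask a Zip library whether the bytes parse as an archive. The sketch below uses Python's standard `zipfile` module. `fake_docx` is a hypothetical stand-in that only mimics the layout of a DOCX package; it is not a valid Word document.

```python
import io
import zipfile


def describe_container(data: bytes) -> list:
    """Return the entry names if `data` is a Zip archive, else an empty list."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return []
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()


def fake_docx() -> bytes:
    """Build a tiny stand-in for a DOCX package (not a valid Word document)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
    return out.getvalue()


if __name__ == "__main__":
    print(describe_container(fake_docx()))
    print(describe_container(b"plain text, not a Zip archive"))
```

With a real file you would read its bytes from disk and pass them to `describe_container` in the same way.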

Since all of these common web files are simply Zip files, we can optimize them to improve web performance and reduce operational costs. This is where rezipping comes in. Rezipping is the process of recompressing a Zip file to create a smaller file. The process is simple: you take any Zip file, unzip the contents, and then rezip the contents at a higher compression level. To accomplish this, I am using the command line version of 7zip. 7zip’s implementation of the DEFLATE compressor is generally considered to compress files 5% to 10% better than other Zip programs. The process looks like this:

    rem unzip the contents of the original zip into a temporary directory
    7za.exe X original.zip -o"c:\tmp\"

    rem rezip using maximum compression
    7za.exe A -mx9 new.zip "c:\tmp\*"

To see how much this could help web performance, I downloaded several samples of different types of Zip files from the Internet.
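The same unzip-then-rezip idea can be sketched in Python with the standard `zipfile` module. This is not the method used for the measurements in this post: it relies on zlib's Deflate at level 9 rather than 7zip's encoder, so the savings will likely differ. The `rezip` helper is a hypothetical name, and the `compresslevel` argument assumes Python 3.7 or later.

```python
import io
import zipfile


def rezip(data: bytes, level: int = 9) -> bytes:
    """Rebuild a Zip archive in memory, recompressing each entry with Deflate."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for old in src.infolist():  # keeps the original entry order
            new = zipfile.ZipInfo(old.filename, date_time=old.date_time)
            new.external_attr = old.external_attr
            # Entries that were stored uncompressed stay uncompressed
            stored = old.compress_type == zipfile.ZIP_STORED
            method = zipfile.ZIP_STORED if stored else zipfile.ZIP_DEFLATED
            dst.writestr(new, src.read(old), compress_type=method,
                         compresslevel=level)
    return out.getvalue()
```

Keeping the entry order and leaving stored entries alone is a deliberate design choice here, since some container formats expect particular entries to come first or to stay uncompressed.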

Silverlight

Name | Original Size (kb) | ReZipped Size (kb) | % Improvement
--- | --- | --- | ---
cached – SilverlightApplication1.xap | 3,972 | 3,899 | 1.84%
Everything-SilverlightApplication1.xap | 825,801 | 782,594 | 5.23%
Examples.CS.xap | 4,752,262 | 3,376,411 | 28.95%
GeoReference.xap | 388,898 | 288,977 | 25.69%
HoldemSimulatorUI.xap | 1,280,714 | 1,243,967 | 2.87%
ImageGallery_v25_9458063489vC.xap | 18,226 | 17,538 | 3.77%
SilverlightControl.xap | 678,995 | 557,791 | 17.85%

On average, rezipping reduces a Silverlight application by 12.32%. This is quite good given that XAP files can contain many binary files like images or sounds that will not be recompressed. Some files created from Visual Studio saw an improvement of more than 25%! Also notice that “ImageGallery_v25” is the Silverlight application used by Bing to change Bing’s background image. This heavily served file could be slimmed by nearly 4% simply by rezipping the XAP file!
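For readers who want to check the arithmetic, the sketch below recomputes the percentages from the Silverlight sizes listed above. The 12.32% figure appears to be the unweighted mean of the per-file percentages. Weighting by file size instead gives roughly 21%, because the largest file also happens to shrink the most. The function names are illustrative.

```python
# (original size, rezipped size) pairs taken from the Silverlight table
SILVERLIGHT = [
    (3_972, 3_899),
    (825_801, 782_594),
    (4_752_262, 3_376_411),
    (388_898, 288_977),
    (1_280_714, 1_243_967),
    (18_226, 17_538),
    (678_995, 557_791),
]


def improvement(original: int, rezipped: int) -> float:
    """Percentage of the original size removed by rezipping."""
    return (original - rezipped) / original * 100


def mean_improvement(sizes) -> float:
    """Unweighted mean of the per-file percentages."""
    return sum(improvement(o, r) for o, r in sizes) / len(sizes)


def overall_improvement(sizes) -> float:
    """Size-weighted saving: total size removed over total original size."""
    total_original = sum(o for o, _ in sizes)
    total_rezipped = sum(r for _, r in sizes)
    return improvement(total_original, total_rezipped)


if __name__ == "__main__":
    print(f"mean of per-file savings: {mean_improvement(SILVERLIGHT):.2f}%")
    print(f"size-weighted saving:     {overall_improvement(SILVERLIGHT):.2f}%")
```

Which figure matters more depends on your traffic: the size-weighted number is closer to the bandwidth saved if every file is requested equally often.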

Microsoft Excel Documents

Name | Original Size (kb) | ReZipped Size (kb) | % Improvement
--- | --- | --- | ---
Listedescourselearning.xlsx | 55,618 | 40,753 | 26.73%
ParticipatingMembers.xlsx | 170,382 | 123,275 | 27.65%
PartnerReadinessAndTrainingFY09.xlsx | 26,673 | 21,349 | 19.96%
PermissionTemplate.xlsx | 22,570 | 15,969 | 29.25%
Presentation_Skills_Providers.xlsx | 33,092 | 27,144 | 17.97%

On average, rezipping Excel files saves about 25%. This makes sense, as most Excel spreadsheets contain predominantly text and not incompressible binary data.

Microsoft PowerPoint Documents

Name | Original Size (kb) | ReZipped Size (kb) | % Improvement
--- | --- | --- | ---
AMP 8.0 Project Kickoff Template v1.2 07102009.pptx | 112,637 | 96,753 | 14.10%
CL01.pptx | 1,918,440 | 1,692,785 | 11.76%
CL02.pptx | 5,872,228 | 5,448,818 | 7.21%
EC2.pptx | 123,137 | 100,013 | 18.78%
MSDN_Admin_08.pptx | 2,006,091 | 1,862,496 | 7.16%
SharePoint_Buzz.pptx | 2,123,778 | 2,040,234 | 3.93%
speedgeeks-20091026.pptx | 3,408,365 | 3,271,384 | 4.02%
SupportingDistributedTeamwork.pptx | 2,454,360 | 2,387,257 | 2.73%

On average rezipping PowerPoint files saves about 9%. This can vary widely depending on the number of images that are contained inside the PPTX file as images are not recompressed (more on that in another article).
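The effect of embedded images can be illustrated with a small Python sketch. Random bytes stand in for an already-compressed JPEG or PNG, and repetitive markup stands in for the XML parts of a PPTX. Both stand-ins and the function names are assumptions made for the demonstration, not data from the files measured above.

```python
import random
import zlib


def deflate_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


def pseudo_image(size: int = 100_000, seed: int = 0) -> bytes:
    """Random bytes standing in for an already-compressed image."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


def pseudo_xml(repeats: int = 4_000) -> bytes:
    """Repetitive markup standing in for the XML parts of a presentation."""
    return b"<a:t>slide text</a:t>\n" * repeats


if __name__ == "__main__":
    print(f"image-like data: {deflate_ratio(pseudo_image()):.3f}")
    print(f"XML-like data:   {deflate_ratio(pseudo_xml()):.3f}")
```

The more of an archive's bytes look like the first case, the less there is for a higher Deflate level to win back.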

Microsoft Word Documents

Name | Original Size (kb) | ReZipped Size (kb) | % Improvement
--- | --- | --- | ---
ASC_3.0_Demo_Image_Release_Notes.docx | 431,220 | 412,034 | 4.45%
implementationchecklist.docx | 126,981 | 120,075 | 5.44%
MSCOM_Virtualizes_MSDN_TechNet_on_Hyper-V.docx | 115,230 | 89,572 | 22.27%
CompProposal.docx | 25,548 | 21,395 | 16.26%
Web content redline 2009-10-28.docx | 201,304 | 180,868 | 10.15%
WindowsSharePointServicesDatasheet.docx | 198,837 | 172,082 | 13.46%

On average rezipping Word documents saves about 12%.

Conclusions

Always use rezipping! Stop sending bytes down the pipe that you don’t have to! The savings you receive from rezipping are driven by the contents of the Zip file. Files with a large number of binary objects that will not be compressed (like images) will see a smaller improvement. Also note that higher compression levels increase the time and memory needed to compress data, but they do not increase the time it takes to decompress data. This is because all the work is in finding out what can be reduced during compression, not in recreating the original data during decompression. There is no reason not to use rezipping.
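If you want to check the decompression claim on your own data, a rough timing sketch like the following may help. It uses Python's `zlib` rather than a Zip program, the helper names and sample rows are invented for illustration, and wall-clock timings on a busy machine are noisy, so read the results as indicative only.

```python
import time
import zlib


def time_decompress(blob: bytes, repeats: int = 20) -> float:
    """Best wall-clock time (seconds) to inflate `blob` once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


def compare_levels(data: bytes, levels=(1, 9)) -> dict:
    """Map level -> (compressed size, decompression time in seconds)."""
    result = {}
    for level in levels:
        blob = zlib.compress(data, level)
        result[level] = (len(blob), time_decompress(blob))
    return result


if __name__ == "__main__":
    rows = (f"row {i % 97}, value {i * i}\n".encode() for i in range(50_000))
    for level, (size, seconds) in compare_levels(b"".join(rows)).items():
        print(f"level {level}: {size} bytes, inflate {seconds * 1000:.3f} ms")
```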

By rezipping your files you can reduce the size of your content. This reduces bandwidth consumption and server load while improving page load times! There is more work to be done. There are a number of web files that contain raw Deflate streams, like Flash files, WOFF font files, SVGZ, and more. All of these could be redeflated using a compression level of 9 to make smaller, faster files. Stay tuned as we investigate this more.
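As a small preview of that idea, here is a hedged Python sketch for the SVGZ case only. An SVGZ file is a gzip-compressed SVG, so it can be inflated and deflated again with the standard `gzip` module. Flash and WOFF wrap their compressed data differently and are not covered here. `regzip` is a hypothetical helper name, and the `mtime` argument assumes Python 3.8 or later.

```python
import gzip


def regzip(blob: bytes, level: int = 9) -> bytes:
    """Inflate a gzip stream and deflate it again at the given level."""
    raw = gzip.decompress(blob)
    # mtime=0 keeps the output byte-for-byte reproducible between runs
    return gzip.compress(raw, compresslevel=level, mtime=0)


if __name__ == "__main__":
    # Illustrative SVG content, first compressed at the fastest level
    shapes = b"".join(b'<circle cx="%d" cy="%d" r="3"/>' % (i, i * 7 % 101)
                      for i in range(3000))
    original = gzip.compress(b"<svg>" + shapes + b"</svg>", compresslevel=1)
    print(len(original), "->", len(regzip(original)), "bytes")
```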
