This post is a follow-up to the previous post “Rezipping Web Resources for Fun and Profit.” In that article, we showed that many common web files, such as MS Office documents, Silverlight applications, Java Applets, and more, are really just Zip files with a special structure of files inside. By rezipping a file (unzipping the contents and rezipping those contents using a higher compression level), web developers can reduce the size of those files by 5-30%!
An obvious, but less useful, extension of rezipping is to apply it to other compression types, namely GZip-compressed or BZip2-compressed files. We can use 7za, the command-line version of 7-zip, to accomplish this. It looks something like this:
//gunzip the file into a temporary directory
7za x -tgzip original.gz -o"c:\tmp"
//regzip using maximum compression
7za a -tgzip -mx9 new.gz "c:\tmp\original"
This approach can be extended to BZip2 using the “-tbzip2” switch. I collected a few samples of GZip archives and, by rezipping them, was able to reduce their size by an average of 5.03%, as shown in the table below.
| Archive | Original Size (bytes) | Rezipped Size (bytes) | % Savings |
|---|---|---|---|
| bochs-2.4.2.tar.gz | 4,035,010 | 3,879,123 | 3.863% |
| dojo-release-1.3.2.tar.gz | 2,618,493 | 2,471,078 | 5.630% |
| expsummarytalk.ps.gz | 130,247 | 121,528 | 6.694% |
| httpd-2.2.14.tar.gz | 6,684,081 | 6,420,948 | 3.937% |
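
For readers who want to script the same experiment, here is a minimal Python sketch of the rezip step and of the savings arithmetic used in the table. It is an illustration under assumptions, not the method used to produce the numbers above: it relies on Python's standard `gzip` module rather than 7-zip, and 7-zip's DEFLATE encoder generally compresses tighter than zlib at level 9, so the savings will likely differ. The function names `regzip` and `percent_savings` are hypothetical.

```python
import gzip
import io


def regzip(data: bytes, level: int = 9) -> bytes:
    """Decompress a gzip payload and recompress it at the given level.

    Assumption: stdlib zlib stands in for 7-zip's encoder here.
    """
    raw = gzip.decompress(data)
    buf = io.BytesIO()
    # mtime=0 keeps the output deterministic between runs.
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level, mtime=0) as f:
        f.write(raw)
    return buf.getvalue()


def percent_savings(original_size: int, new_size: int) -> float:
    """Size reduction as a percentage of the original size."""
    return (original_size - new_size) / original_size * 100.0


if __name__ == "__main__":
    # Sizes copied from the table above.
    rows = [
        ("bochs-2.4.2.tar.gz", 4035010, 3879123),
        ("dojo-release-1.3.2.tar.gz", 2618493, 2471078),
        ("expsummarytalk.ps.gz", 130247, 121528),
        ("httpd-2.2.14.tar.gz", 6684081, 6420948),
    ]
    savings = [percent_savings(orig, new) for _, orig, new in rows]
    for (name, _, _), pct in zip(rows, savings):
        print(f"{name}: {pct:.3f}%")
    print(f"average: {sum(savings) / len(savings):.2f}%")
```

Note that the average is the plain mean of the four per-file percentages, not a size-weighted figure.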
Rezipping GZip or BZip2 archives is unfortunately less useful than rezipping Zip files. This is because so many files that are served or downloaded on the web use Zip as a wrapper format. Finding ways to optimize Zip files lets you optimize a dozen other file types on the web. These files are either directly loaded and executed by the browser (like Silverlight applications or Java Applets) or are very common downloadable content like documents or presentations. However, I know of no web content that uses a GZip or BZip2 file as a wrapper. While downloadable programs, source code, or other archives might use GZip or BZip2, you will not find any widely deployed document or content format that uses either as its wrapper. This limits the usefulness of rezipping GZip or BZip2 archives.
As mentioned in the last post, one positive note is that while no widely deployed web files use GZip as a wrapper, many files contain raw GZip or DEFLATE streams. Flash files use GZip to compress the contents of the SWF tags. PDFs use DEFLATE to compress text streams. This means that, with a little parsing and some glue code, proven tools like 7-zip should be usable to reduce the size of other files that are very common on the web today!
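
To make the "glue code" idea concrete, here is a small Python sketch of the recompression step for an embedded stream. It assumes the stream has already been located and extracted from the container file, which is the format-specific parsing part and is not shown. It also uses Python's standard `zlib` module as a stand-in for 7-zip's encoder. Both function names are hypothetical. Each function keeps the original bytes whenever recompression does not make the stream smaller.

```python
import zlib


def recompress_zlib_stream(stream: bytes, level: int = 9) -> bytes:
    """Recompress a zlib-wrapped DEFLATE stream (header and checksum included)."""
    raw = zlib.decompress(stream)
    candidate = zlib.compress(raw, level)
    # Never make things worse: fall back to the original stream.
    return candidate if len(candidate) < len(stream) else stream


def recompress_raw_deflate(stream: bytes, level: int = 9) -> bytes:
    """Recompress a raw DEFLATE stream (no zlib or gzip wrapper)."""
    # Negative wbits tells zlib to expect and emit headerless DEFLATE data.
    raw = zlib.decompress(stream, wbits=-15)
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    candidate = compressor.compress(raw) + compressor.flush()
    return candidate if len(candidate) < len(stream) else stream
```

After swapping a recompressed stream back into its container, any length fields or offsets that the format stores for that stream would also need to be updated, which is where most of the real work lies.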

