James Psota




CS222 Assignment 2

11/6/2003

1.

Overview

For this problem, I investigated the performance of some common compression utilities available today. Specifically, I evaluated the following programs on three measures (compression ratio, encoding time, and decoding time):

• bzip2

• gzip (default parameters)

• gzip --fast (fastest encoding, i.e. compression)

• gzip --best (best compression ratio)

• compress

Note that I essentially evaluated three programs, but tried gzip, probably the most popular compression utility today, in three configurations. gzip --fast sets the parameters to favor encoding speed over compression ratio, while gzip --best favors compression ratio over speed. I felt these variations would be interesting to compare, and the results of the experiments agree with intuition.

Methodology

I ran these programs on a Pentium III Linux server that already had all of the above programs installed. I wrote a Perl script (see below) that performed the basic commands for such an experiment: it called each compression program on each file in a given directory. I ran the script on the following files; file sizes are given in bytes.

|Size (bytes) |File |
|81920 |AMStorage.dll |
|12769128 |binary.exe |
|437600 |bitmap.bmp |
|4966 |eps.ai |
|318671 |jpeg.jpg |
|83267 |largetext.txt |
|7369 |png.png |
|9491968 |powerpoint.ppt |
|9989028 |schnellbahn.mp3 |
|1521674 |tif.tif |
|46040458 |wav.wav |

I felt these files gave a good cross section of different file types for this experiment, as I included both already-compressed and highly redundant file types. After collecting the data and post-processing it with some other Perl scripts, I plotted the data using Excel.
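The three measurements used in this experiment (compression ratio, encoding time, decoding time) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the experiment: the standard-library zlib and bz2 modules stand in for the gzip and bzip2 programs, the helper name measure and the sample data are hypothetical, and the ratio is assumed to mean original size divided by compressed size.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, data):
    """Return compression ratio and wall-clock encode/decode times for one codec."""
    start = time.perf_counter()
    packed = compress(data)
    encode_s = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = decompress(packed)
    decode_s = time.perf_counter() - start

    # A lossless codec must reproduce the input exactly.
    if unpacked != data:
        raise ValueError(f"{name} did not round-trip the data")

    return {
        "codec": name,
        "ratio": len(data) / len(packed),  # assumed definition: original / compressed
        "encode_s": encode_s,
        "decode_s": decode_s,
    }


# Hypothetical sample input; the report used real files instead.
sample = b"the quick brown fox jumps over the lazy dog\n" * 2000

results = [
    measure("zlib (DEFLATE, as in gzip)", zlib.compress, zlib.decompress, sample),
    measure("bz2 (as in bzip2)", bz2.compress, bz2.decompress, sample),
]
for r in results:
    print(f"{r['codec']}: ratio={r['ratio']:.1f} "
          f"encode={r['encode_s']:.4f}s decode={r['decode_s']:.4f}s")
```

In-process timings like these exclude process startup and file I/O, so they are not directly comparable to times reported by the shell's time command.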

Results

I graphed the data in a few ways. First and foremost, I was interested in how well the different algorithms were able to actually compress the data. The following figure shows the compression ratios for each file, for each compression program I used.

[pic]

Figure 1: Compression Ratios

First of all, notice that some of the files achieved significant compression, while others achieved hardly any. Of course, the files that were already compressed (e.g., jpeg, mp3) were barely compressed at all, while other files that were not yet compressed (e.g., tif, txt) achieved a significant compression ratio.
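The effect of feeding already-compressed data to a compressor can be reproduced on a small scale. The sketch below is an assumption-laden illustration rather than part of the experiment: it uses Python's zlib module, a made-up vocabulary, and a fixed random seed, and it compresses the same data twice to mimic running a general-purpose compressor on a format such as jpeg or mp3 that has already had its redundancy removed.

```python
import random
import zlib

# Hypothetical vocabulary; any redundant, non-periodic text would do.
WORDS = [b"compression", b"algorithm", b"dictionary", b"redundancy",
         b"encoding", b"decoding", b"entropy", b"utility",
         b"performance", b"experiment", b"bitstream", b"threshold"]

rng = random.Random(222)  # fixed seed so the run is repeatable
text = b" ".join(rng.choice(WORDS) for _ in range(20000))

once = zlib.compress(text)   # first pass: plenty of redundancy to remove
twice = zlib.compress(once)  # second pass: the input is already compressed

first_ratio = len(text) / len(once)    # original size / compressed size
second_ratio = len(once) / len(twice)
print(f"first pass ratio:  {first_ratio:.2f}")
print(f"second pass ratio: {second_ratio:.2f}")
```

The first pass shrinks the text substantially, while the second pass leaves the size essentially unchanged, which mirrors the gap between the txt/tif and jpeg/mp3 results.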

Also, we see that bzip2 performs the best in almost all cases, and does especially well when the data is highly compressible (which is probably the case most users care about anyway). On the other hand, compress generally performs the worst.

Also note that the three flavors of gzip perform reasonably well, and intuition is upheld in that gzip --best has the best compression ratio of the three variants. Interestingly, gzip and gzip --best are very close in compression ratio, which suggests that the implementers already tuned the default settings carefully to achieve quite good compression.
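In gzip, --fast and --best are aliases for compression levels 1 and 9, and the default level is 6. The following sketch compares those three levels using Python's gzip module as a stand-in for the command-line program. The sample data and the labels are hypothetical, and the sizes it prints apply only to this synthetic input, not to the files in the experiment.

```python
import gzip
import random

rng = random.Random(2003)  # fixed seed so the run is repeatable
# Hypothetical log-like input with some structure and some noise.
lines = [f"sample {i % 100} reading {rng.randint(0, 999)}\n" for i in range(20000)]
data = "".join(lines).encode("ascii")

# gzip --fast is level 1, the default is level 6, gzip --best is level 9.
LEVELS = {"gzip --fast": 1, "gzip (default)": 6, "gzip --best": 9}

sizes = {}
for label, level in LEVELS.items():
    packed = gzip.compress(data, compresslevel=level)
    # Every level must decompress back to the same bytes.
    assert gzip.decompress(packed) == data
    sizes[label] = len(packed)
    print(f"{label:15s} level={level} size={len(packed)} "
          f"ratio={len(data) / len(packed):.2f}")
```

On typical inputs the level 9 output is the smallest and the level 1 output the largest, although the format does not guarantee that ordering for every file.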

Finally, note that compress seemingly gives up when a file is not compressible enough: it simply does not produce a compressed (*.Z) file when called on such files (e.g., jpeg). This is consistent with compress's documented behavior of leaving a file unchanged when compression would not make it smaller.
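A "keep the output only if it is smaller" policy of this kind is easy to sketch. The example below is a hypothetical illustration, not how compress is implemented: Python's standard library has no LZW codec matching compress, so zlib is swapped in, and the function name compress_if_smaller and both sample inputs are made up for the example.

```python
import random
import zlib


def compress_if_smaller(data):
    """Return compressed bytes, or None when compression would not shrink the input."""
    packed = zlib.compress(data)
    return packed if len(packed) < len(data) else None


rng = random.Random(0)  # fixed seed so the run is repeatable
# Pseudo-random bytes stand in for already-compressed content such as jpeg or mp3.
noise = bytes(rng.getrandbits(8) for _ in range(50_000))
# Highly repetitive text stands in for a compressible file.
text = b"to be or not to be, that is the question\n" * 1000

noise_result = compress_if_smaller(noise)
text_result = compress_if_smaller(text)
print("noise:", "skipped" if noise_result is None else len(noise_result))
print("text: ", "skipped" if text_result is None else len(text_result))
```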

Now, let’s take a look at the encoding times. In Figures 2 and 3, I plot both the compressed size and the encoding time for each file/compression program pair. I split the data into two plots, one for small files and one for large files, in order to maintain precision on the vertical axes.

[pic]

Figure 2

[pic]

Figure 3

These plots again show that bzip2 usually produces the smallest file. However, they also show that bzip2 takes the longest to encode. compress and the three variants of gzip took about the same amount of time, as the bars are roughly level across those four programs.

Finally, I looked at the decompression times for the above files. Figure 4 portrays these times.

[pic]

Figure 4

Perhaps this data is better viewed in the following table:

|File and compression type |Decompression time (s) |
|AMStorage.dll-bzip2.bz2 |0.026 |
|AMStorage.dll-compress.Z |0.012 |
|AMStorage.dll-gzip.gz |0.011 |
|binary.exe-bzip2.bz2 |6.319 |
|binary.exe-compress |0.003 |
|binary.exe-gzip.gz |0.664 |
|bitmap.bmp-bzip2.bz2 |0.103 |
|bitmap.bmp-compress.Z |0.032 |
|bitmap.bmp-gzip.gz |0.027 |
|eps.ai-bzip2.bz2 |0.009 |
|eps.ai-compress.Z |0.007 |
|eps.ai-gzip.gz |0.008 |
|jpeg.jpg-bzip2.bz2 |0.16 |
|jpeg.jpg-compress |0.003 |
|jpeg.jpg-gzip.gz |0.026 |
|largetext.txt-bzip2.bz2 |0.026 |
|largetext.txt-compress.Z |0.012 |
|largetext.txt-gzip.gz |0.012 |
|png.png-bzip2.bz2 |0.011 |
|png.png-compress.Z |0.008 |
|png.png-gzip.gz |0.008 |
|powerpoint.ppt-bzip2.bz2 |4.484 |
|powerpoint.ppt-compress |0.004 |
|powerpoint.ppt-gzip.gz |0.513 |
|schnellbahn.mp3-bzip2.bz2 |4.816 |
|schnellbahn.mp3-compress |0.003 |
|schnellbahn.mp3-gzip.gz |0.599 |
|tif.tif-bzip2.bz2 |0.31 |
|tif.tif-compress.Z |0.057 |
|tif.tif-gzip.gz |0.062 |
|wav.wav-bzip2.bz2 |19.863 |
|wav.wav-compress.Z |4.081 |
|wav.wav-gzip.gz |5.011 |

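Once again, we see that bzip2 takes the longest to decompress, while compress and gzip are considerably faster. Except on the smallest files, bzip2 decompression takes roughly 2 to 10 times as long as gunzip, approaching an order of magnitude on binary.exe, powerpoint.ppt, and schnellbahn.mp3. Note that the compress rows without a .Z suffix (binary.exe, jpeg.jpg, powerpoint.ppt, schnellbahn.mp3) are presumably the files for which compress produced no compressed output, so their near-zero times do not reflect real decompression work.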
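To make the comparison concrete, the slowdown of bzip2 relative to gzip can be computed directly from the decompression times in the table above. The snippet below is a small worked calculation; the variable names are arbitrary, and the numbers are copied from the table rather than re-measured.

```python
# Decompression times in seconds, copied from the table above.
times = {
    "AMStorage.dll":   {"bzip2": 0.026,  "gzip": 0.011},
    "binary.exe":      {"bzip2": 6.319,  "gzip": 0.664},
    "bitmap.bmp":      {"bzip2": 0.103,  "gzip": 0.027},
    "eps.ai":          {"bzip2": 0.009,  "gzip": 0.008},
    "jpeg.jpg":        {"bzip2": 0.16,   "gzip": 0.026},
    "largetext.txt":   {"bzip2": 0.026,  "gzip": 0.012},
    "png.png":         {"bzip2": 0.011,  "gzip": 0.008},
    "powerpoint.ppt":  {"bzip2": 4.484,  "gzip": 0.513},
    "schnellbahn.mp3": {"bzip2": 4.816,  "gzip": 0.599},
    "tif.tif":         {"bzip2": 0.31,   "gzip": 0.062},
    "wav.wav":         {"bzip2": 19.863, "gzip": 5.011},
}

# How many times longer bzip2 takes than gzip to decompress each file.
slowdown = {name: t["bzip2"] / t["gzip"] for name, t in times.items()}

for name, factor in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{name:16s} {factor:5.2f}x")
```

The factors range from just above 1 for the smallest files to about 9.5 for binary.exe.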

Overall, I found bzip2 to be the clear winner in terms of compression ability, but found it to be somewhat slower than the older algorithms.

#!/usr/bin/perl
# Runs each compression command on a copy of every file in $origdir and
# appends the timing output to $outfile.
use strict;
use warnings;

my $origdir = "./files";
my $compdir = "./compressed";
my $outfile = "output";
my @comp_cmds = ("compress", "gzip", "gzip --fast", "gzip --best", "bzip2");

opendir(my $dir, $origdir) or die "can't opendir $origdir: $!";

while (defined(my $file = readdir($dir))) {
    # Skip the directory entries for the current and parent directories.
    next if $file eq "." || $file eq "..";

    log_text("\n\nFILE: $file\n\n");

    foreach my $com (@comp_cmds) {
        my $thisFile = "$origdir/$file";

        # Give each command its own copy, e.g. "wav.wav-gzip_--best".
        (my $comns = $com) =~ s/ /_/g;
        my $destFile = "$compdir/$file-$comns";
        system("cp", $thisFile, $destFile);

        my $command = "time $com $destFile";
        print "** $command\n";
        log_text("\n\n$command\n\n");

        # time reports on stderr, so group the command and redirect both
        # streams into the log file.
        system("{ $command ; } >> $outfile 2>&1");
    }
}

closedir($dir);

# Append a string to the output log.
sub log_text {
    my ($text) = @_;
    open(my $out, ">>", $outfile) or die "can't open file $outfile: $!";
    print $out $text;
    close $out;
}
