Applications 23 - Compression with gzip

Jarret W. Buse


On many systems, drive space and upload and download times are important. Compressing files makes them smaller, which helps with both.

Smaller files take up less drive space. They also take less time to upload and download over the Internet. It really does seem that small files are better.

So, why aren't all files compressed?

With certain file systems, nearly all files on the drive can be compressed transparently; this is disk compression. Examples are Btrfs, ZFS, NTFS and some others. Not only does on-the-fly compression save space, it can make reading files faster. On most systems, the drive is the slow part. The drive reads the smaller compressed data and the Central Processing Unit (CPU) then decompresses it, making the reads seem faster. When writing, the CPU compresses the data and the drive writes less of it. The system can seem faster if you have a fast CPU and enough Random Access Memory (RAM) to manage the disk compression.

What about compressed files that stay compressed when they leave the hard drive, such as on a thumb drive? Now, we get to file compression.

There are numerous file compression algorithms. An algorithm is the method used to compress the file: a mathematical and logical set of instructions performed on the file contents. Certain sequences of data are repeated within a file; the repeats are replaced with a shorter code. The code can be mapped back to the full set of information it represents.
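The effect of repeated data can be seen with a small experiment. The following is only a sketch, assuming GNU gzip 1.6 or later (for “-k”) and a system that provides /dev/urandom; the file names are made up for the illustration. It compresses one file made of a single repeated line and one file of random bytes of the same size.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# 100,000 bytes of one repeated line: full of recurring sequences.
yes 'the same line again' | head -c 100000 > repetitive.txt
# 100,000 random bytes: almost nothing recurs, so little can be replaced.
head -c 100000 /dev/urandom > random.bin

gzip -k repetitive.txt random.bin    # -k keeps the originals

repetitive_size=$(wc -c < repetitive.txt.gz | tr -d ' ')
random_size=$(wc -c < random.bin.gz | tr -d ' ')
echo "repetitive: 100000 -> $repetitive_size bytes"
echo "random:     100000 -> $random_size bytes"
```

The repetitive file should shrink to a tiny fraction of its size, while the random file stays about the same size or grows slightly.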

Two such compression tools are “gzip” and “zip”. Even though both names contain “zip”, their file formats are not compatible.
The “gzip” and “zip” programs are found on most systems by default. For this reason, they are used more commonly than other compression types that may actually work better.

NOTE: Even though I mention “zip” it is not covered in this article.

The gzip program has the syntax of:

gzip [OPTION]... [FILE]...

The options are as follows:

  • -c (--stdout) - write on standard output, keep original files unchanged
  • -d (--decompress) - decompress
  • -f (--force) - force overwrite of output file and compress links
  • -h (--help) - give this help
  • -k (--keep) - keep (don't delete) input files
  • -l (--list) - list compressed file contents
  • -L (--license) - display software license
  • -n (--no-name) - do not save or restore the original name and time stamp
  • -N (--name) - save or restore the original name and time stamp
  • -q (--quiet) - suppress all warnings
  • -r (--recursive) - operate recursively on directories
  • -S (--suffix=SUF) - use suffix SUF on compressed files
  • -t (--test) - test compressed file integrity
  • -v (--verbose) - verbose mode
  • -V (--version) - display version number
  • -1 (--fast) - compress faster
  • -9 (--best) - compress better
  • --rsyncable - make rsync-friendly archive

NOTE: With no FILE, or when FILE is “-”, gzip reads standard input.
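Reading from standard input is what makes gzip usable in a pipeline. A minimal sketch, with a made-up message, that compresses text from a pipe and immediately uncompresses it again:

```shell
set -eu
message='hello from a pipe'

# First gzip: no FILE given, so it reads standard input.
# Second gzip: FILE is "-", which also means standard input.
roundtrip=$(printf '%s\n' "$message" | gzip | gzip -d -c -)

echo "$roundtrip"    # prints: hello from a pipe
```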

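The “-c” option writes the compressed data to standard output, which is by default the screen, and leaves the original file unchanged. Because compressed data is binary, gzip refuses to write it to a terminal unless the “-f” option is also given, so the output is normally redirected to a file or piped to another program. If I take a file called “gzip.txt” and compress it to a new file, the command would be “gzip -c gzip.txt > gzip.txt.gz”. To see the compressed bytes on the screen and how much they have changed from the original, the command would be “gzip -c -f gzip.txt”.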
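As a sketch of the usual way to use “-c”, the example below redirects the output into a file and then reads it back with “-d -c”, which leaves both files in place. The temporary folder and the file contents are assumptions made for the demonstration.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'some text to compress\n' > gzip.txt

# -c sends the compressed bytes to standard output; redirect them to a file.
gzip -c gzip.txt > gzip.txt.gz
ls -l gzip.txt gzip.txt.gz           # both files exist

# -d -c uncompresses to standard output without touching either file.
gzip -d -c gzip.txt.gz               # prints: some text to compress
```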

When you want to “uncompress” the file and get the original file back, you use the “-d” option. To extract the original from the compressed file “gzip.txt.gz”, the command would be “gzip -d gzip.txt.gz”.

NOTE: If the extracted file already exists, you will be prompted to overwrite the existing file.

If you want to overwrite files without being prompted, use the “-f” option. The overwrite can occur when compressing or uncompressing files. If “gzip.txt.gz” is being uncompressed and an existing “gzip.txt” must be overwritten, the command would be “gzip -d -f gzip.txt.gz”.
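The overwrite case can be reproduced with a short sketch. It assumes GNU gzip 1.6 or later (for “-k”), and the file contents are invented for the example.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'version 1\n' > gzip.txt
gzip -k gzip.txt                     # creates gzip.txt.gz, keeps gzip.txt
printf 'local edits\n' > gzip.txt    # the uncompressed copy now differs

# Without -f, gzip would ask before replacing gzip.txt (or refuse when it
# is not run from a terminal). With -f it overwrites without asking.
gzip -d -f gzip.txt.gz

cat gzip.txt                         # prints: version 1
```

After the last command, “gzip.txt.gz” is gone and “gzip.txt” holds the contents that were stored in it.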

To get help for using gzip, use the “-h” option as in “gzip -h”.

By default, input files are deleted once they have been processed. This occurs when compressing or uncompressing. When compressing, each file is replaced by its own “.gz” file. When uncompressing, the “.gz” file is replaced by the uncompressed file. To keep the original files, use the “-k” option. For example, to compress the “gzip.txt” file and keep it when done, the command would be “gzip -k gzip.txt”. Both “gzip.txt” and “gzip.txt.gz” will exist after the compression is complete.
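A side-by-side sketch of the default behavior and “-k”, assuming GNU gzip 1.6 or later; the two file names are hypothetical.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'first\n' > a.txt
printf 'second\n' > b.txt

gzip a.txt                           # default: a.txt is replaced by a.txt.gz
gzip -k b.txt                        # -k: b.txt stays next to b.txt.gz

ls                                   # lists a.txt.gz, b.txt and b.txt.gz
```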

To list the contents of a compressed file, use the “-l” option. To see the contents of the “gzip.txt.gz” file, the command would be “gzip -l gzip.txt.gz”. The output shows not only the file name, but also the compression ratio as follows:

Code:
compressed  uncompressed  ratio  uncompressed_name
       430           829  51.4%  gzip.txt

Here, the ratio column is the percentage of space saved, so gzip reports “gzip.txt” as about 51.4% smaller than the original.
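To check what the ratio column means, the saving can also be worked out by hand from the two file sizes. This is a rough sketch, assuming GNU gzip 1.6 or later and the “seq” and “awk” utilities; the file “numbers.txt” is made up for the example. The figure gzip prints may differ slightly from the hand calculation, since gzip can account for its own header bytes.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 2000 > numbers.txt             # sample text: the numbers 1 to 2000
gzip -k numbers.txt

gzip -l numbers.txt.gz               # gzip's own listing

original=$(wc -c < numbers.txt | tr -d ' ')
compressed=$(wc -c < numbers.txt.gz | tr -d ' ')

# Space saved as a percentage of the original size.
saved=$(awk -v o="$original" -v c="$compressed" \
    'BEGIN { printf "%.1f", (o - c) * 100 / o }')
echo "hand calculation: ${saved}% saved ($original -> $compressed bytes)"
```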

To see the software license for gzip, use the command “gzip -L”.

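When a compressed file is made, the original name and timestamp of the file are stored inside it by default. To leave them out when compressing, use the “-n” option. For example, “gzip -n gzip.txt”. When uncompressing, “-n” is the default: the stored name and timestamp are ignored, the output name is the compressed file's name with the extension removed, and the timestamp is copied from the compressed file.

If you want the original name and timestamp saved, use the “-N” option; this is already the default when compressing. When uncompressing, “-N” restores the name and timestamp stored in the compressed file, which helps if the “.gz” file was renamed. For example, “gzip -d -N gzip.txt.gz”.

If a file is compressed or uncompressed, warnings can be generated. To suppress the warning messages, use the “-q” option. For example, “gzip -q -d gzip.txt.gz” would not show any warnings if any are generated.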
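The difference between “-n” and “-N” is easiest to see when a compressed file has been renamed. The sketch below assumes GNU gzip behavior, and all of the file names are invented for the demonstration.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'payload\n' > original.txt
gzip original.txt                    # stores the name "original.txt" inside
mv original.txt.gz renamed.gz        # rename the compressed file afterwards
cp renamed.gz copy.gz

gzip -d copy.gz                      # default (-n): output is named "copy"
gzip -d -N renamed.gz                # -N: output is named "original.txt"

ls                                   # lists copy and original.txt
```

Both output files hold the same contents; only the way gzip chose their names differs.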

If you wish to compress all the files in a folder and in the folders within it, use the “-r” option. For example, to compress all files in the folder “test” and its sub-folders while keeping the original files, use the command “gzip -k -r test”. Here, if there is a folder called “Test1” and a file called “Test-1.txt” in the test folder, a “Test-1.txt.gz” file is made next to “Test-1.txt”, and any files inside “Test1” are compressed the same way. Only files are compressed, each into its own “.gz” file; the folders themselves are not.
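A sketch of the recursive case, assuming GNU gzip 1.6 or later. The folder layout follows the example in the text, with a hypothetical “nested.txt” added inside “Test1”.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p test/Test1
printf 'top level\n' > test/Test-1.txt
printf 'nested\n' > test/Test1/nested.txt

gzip -k -r test                      # walk the tree, keep the originals

find test -type f | sort             # every file now has a .gz next to it
```

No single archive of the folder is created. To bundle a whole folder into one file, a tool such as tar is normally used together with gzip.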

By default, the compressed file takes the name of the original file with a new extension of “.gz” added on to the existing name. If you wish to add a different extension, use the “-S EXT” option, where “EXT” is replaced with the new extension. Note that the option is an uppercase “S”. For example, to add the extension “.jwb” the command would be “gzip -S .jwb gzip.txt”. The compressed filename would be “gzip.txt.jwb”. The extension can be more than three letters. The same “-S .jwb” option is needed when uncompressing, because gzip does not recognize the extension otherwise.
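A round trip with a custom extension might look like the following sketch, which assumes GNU gzip 1.6 or later and uses made-up file contents.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'custom suffix demo\n' > gzip.txt

gzip -k -S .jwb gzip.txt             # writes gzip.txt.jwb, not gzip.txt.gz
ls                                   # lists gzip.txt and gzip.txt.jwb
rm gzip.txt

gzip -d -S .jwb gzip.txt.jwb         # same suffix needed to uncompress
cat gzip.txt                         # prints: custom suffix demo
```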

Sometimes, a compressed file can have issues when it is uncompressed. The first step is to test the integrity of the compressed file to verify the file is complete. To test the file, use the “-t” option. For example, to test the file “gzip.txt.gz” the command would be “gzip -t gzip.txt.gz”. If no message is generated, the file integrity is fine.
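In a script, the exit status of “gzip -t” is more convenient than the missing message. The sketch below assumes GNU gzip 1.6 or later and the “seq” utility; it simulates an interrupted download by cutting a compressed file short, and “truncated.gz” is a hypothetical name.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 5000 > gzip.txt
gzip -k gzip.txt

# An intact file: gzip -t prints nothing and exits with status 0.
if gzip -t gzip.txt.gz; then
    echo "gzip.txt.gz: OK"
fi

# Keep only the first 100 bytes to imitate an incomplete transfer.
head -c 100 gzip.txt.gz > truncated.gz

if gzip -t truncated.gz 2>/dev/null; then
    truncated_status=ok
else
    truncated_status=damaged         # non-zero exit status from gzip -t
fi
echo "truncated.gz: $truncated_status"
```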

To get more details from a command, use the verbose option “-v”. When used in creating a compressed file, gzip reports the name of each file and the percentage by which it was reduced. The command to use to see these details when compressing is “gzip -v -k gzip.txt”. Remember, the “-k” is used to keep the files being compressed.
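One detail worth knowing when scripting with “-v”: GNU gzip prints the verbose report on standard error, not standard output. A small sketch, assuming GNU gzip 1.6 or later and the “seq” utility, that captures the report in a variable:

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 5000 > gzip.txt

# 2>&1 redirects standard error so the report can be captured.
report=$(gzip -v -k gzip.txt 2>&1)

echo "$report"                       # file name and percentage reduction
```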

To get the gzip version number use the “-V” option. The command is “gzip -V”.

For gzip to compress faster or better, use the “-1” to “-9” options. The “-1” option compresses fastest, but the result is not as small. The “-9” option compresses smallest, but can take longer. The difference is most noticeable on very large files. The options in between are a mix of the two, and if no level is given, gzip uses “-6”. Use “-1” when speed matters most and “-9” when size matters most.

When using “rsync” to transfer files, gzip can create a file that works better with “rsync”. If a file has to be re-transferred because of a change, “rsync” will send only the parts which have changed to help increase transfer speed. Normally, a small change in the original file changes most of the compressed file; the “--rsyncable” option limits this, at the cost of a slightly larger compressed file. To create the “rsync” friendly file, use the option “--rsyncable”. For example, to make “gzip.txt” friendly to “rsync” the command would be “gzip --rsyncable -k gzip.txt”.
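Not every gzip build includes “--rsyncable”, so a script may want to check before using it. The sketch below is one possible way to do that, assuming the option is listed in the output of “gzip --help” when it is supported, and assuming GNU gzip 1.6 or later for “-k”.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch folder, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 50000 > gzip.txt               # sample input

# Use --rsyncable only when this gzip build advertises it.
if gzip --help 2>&1 | grep -q -- '--rsyncable'; then
    gzip --rsyncable -k gzip.txt
    rsyncable=yes
else
    gzip -k gzip.txt
    rsyncable=no
fi
echo "used --rsyncable: $rsyncable"

# Either way the result is an ordinary .gz file.
gzip -t gzip.txt.gz && echo "gzip.txt.gz: OK"
```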

Try using gzip and its options to become familiar with it.
 
