DevynCJohnson
Many file compression and archive formats exist, each with its own advantages and disadvantages. GNU/Linux and most Unix systems support the majority of them, so it helps to know the commands used to manipulate the different formats. This article shows the basic method of creating and extracting each format, plus various other tricks.
Archives:
Files that hold other files together as one file and lack compression. Example: Tar
ar - ARchiver (ar) is the predecessor to tar. Debian packages (*.deb) still use ar.
Archive:
Code:
ar rcs NAME_ARCHIVE_FILE FILES
Extract:
Code:
ar -x ARCHIVE.a
CPIO - "CoPy In and Out" was once used on magnetic tape drives. Today CPIO is used for simple backups (other uses exist, such as the payload inside RPM packages). The tar utility has largely replaced CPIO. Creating CPIO files is more involved than with other archive formats: cpio reads the list of files to archive from standard input rather than from its arguments, so the command is lengthier than the others in this article.
This may be a little off topic, but RPM files can be converted to CPIO files using the “rpm2cpio” command like this – “rpm2cpio FILE.rpm”. Not all systems have rpm2cpio.
Archive:
Code:
FILE-LIST | cpio -o > ./FILE.cpio
"FILE-LIST" may be the echo command with the file paths or a "find" command that will output a list of files.
Extract:
Code:
cpio -id < FILE.cpio
List Contained Files:
Code:
cpio -it < FILE.cpio
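The three cpio commands above can be tried together in a scratch directory. This is only a sketch: it assumes GNU cpio is installed, and the directory and file names (project, backup.cpio, restore) are made up for illustration.

```shell
#!/bin/sh
# Sketch: build a cpio archive from a find-generated file list, then list and extract it.
set -eu
command -v cpio >/dev/null 2>&1 || { echo "cpio is not installed"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample files to archive.
mkdir -p project/docs
printf 'notes\n' > project/docs/readme.txt
printf 'data\n'  > project/data.txt

# FILE-LIST is a find command here; cpio -o reads the paths from standard input.
find project -type f | cpio -o > backup.cpio 2>/dev/null

# -t lists the stored paths without extracting anything.
listing=$(cpio -it < backup.cpio 2>/dev/null | sort)
echo "$listing"

# -i extracts, -d creates missing parent directories.
mkdir restore
cd restore
cpio -id < ../backup.cpio 2>/dev/null
restored=$(cat project/docs/readme.txt)
echo "$restored"
```

The "2>/dev/null" only hides the block count that cpio prints to standard error; drop it to see that summary.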
Shell Archive (shar) - A shar file is a self-extracting archive that only depends on the sh utility. This file is a plain-text shell script that recreates the files on execution. This format is not secure. If the user desires an archive format to hide the contained files, do not use shar files. Obviously, since this is self-extracting, the file needs to be executed like any other application/script.
Archive:
Code:
shar FILES > ARCHIVE.shar #shar writes the script to standard output, so redirect it to a file
ISO - An ISO file is an optical disc image, or a file that mimics an optical disc (think of virtual machines).
Archive:
Code:
dd if=/dev/cdrom of=/tmp/cdimg1.iso
Extract:
Code:
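mkdir /mnt/iso #make a directory
mount -o loop FILE.iso /mnt/iso #Mount your iso file (FILE.iso) to the new directory
mkdir -p ~/ISO #make the destination folder in $HOME
cp -r /mnt/iso/* ~/ISO/ #copy all files in the ISO to that folder
umount /mnt/iso #unmount the image when finished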
tar - The tar archive is the most popular archive format for Unix and Unix-like systems. A tarball is a tar file. Most tarballs are compressed with some other compression format (the new file is still called a tarball). The reason is that most compression formats can only compress a single file, and Tar by itself does not compress, so combining Tar with a compression format solves both issues. Tarballs have various file extensions as seen below.
Short (Long) – Compression
.taz (.tar.Z) - compress
.tbz, .tb2, .tbz2 (.tar.bz2) - bzip2
.tgz (.tar.gz) - gzip
.tlz (.tar.lzma) - lzma (note: .tar.lz normally denotes lzip, a different format)
.txz (.tar.xz) - xz
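As a rough illustration of the table above, the extension can be mapped to the tar option that selects the matching compressor. This is a sketch that assumes GNU tar option names; "tar_flag_for" is a hypothetical helper name, not part of tar.

```shell
#!/bin/sh
# Sketch: map a tarball file name to the GNU tar option for its compressor.
# tar_flag_for is a made-up helper for illustration only.
tar_flag_for() {
    case "$1" in
        *.tar.gz|*.tgz)                 printf '%s\n' "-z" ;;      # gzip
        *.tar.bz2|*.tbz|*.tb2|*.tbz2)   printf '%s\n' "-j" ;;      # bzip2
        *.tar.xz|*.txz)                 printf '%s\n' "-J" ;;      # xz
        *.tar.lzma|*.tlz)               printf '%s\n' "--lzma" ;;  # lzma
        *.tar.Z|*.taz)                  printf '%s\n' "-Z" ;;      # compress
        *.tar)                          printf '\n' ;;             # plain tar, no compression
        *)                              return 1 ;;                # not a recognised tarball name
    esac
}

tar_flag_for backup.tar.gz    # prints -z
tar_flag_for photos.tbz2      # prints -j
tar_flag_for logs.txz         # prints -J
```

Recent versions of GNU tar detect the compression on their own when extracting, so a helper like this matters mostly when creating archives or when working with older tar versions.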
Archive:
Code:
tar -cf NEW_ARCHIVE.tar FILE
Extract:
Code:
tar -xf ARCHIVE.tar #regular tar
tar -xzf ARCHIVE.tar.gz #compressed -> uncompress -> untar ("f" must come last so it takes the file name)
List Contents:
Code:
tar -tvf ARCHIVE.tar
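Since most tarballs are compressed, it may help to see archiving and compression done in a single tar call. The following is a minimal sketch that assumes tar with gzip support; the "site" directory and its files are invented sample data.

```shell
#!/bin/sh
# Sketch: create, list, and extract a gzip-compressed tarball with one command each.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample tree to archive.
mkdir -p site/css
printf '<p>hi</p>\n' > site/index.html
printf 'p { }\n'     > site/css/main.css

# -c create, -z filter through gzip, -f archive name.
# Swap -z for -j (bzip2) or -J (xz) to get a .tar.bz2 or .tar.xz instead.
tar -czf site.tar.gz site

# -t lists the members; -z is kept for tar versions that do not auto-detect compression.
listing=$(tar -tzf site.tar.gz)
echo "$listing"

# -C extracts into another directory instead of the current one.
mkdir restore
tar -xzf site.tar.gz -C restore
restored=$(cat restore/site/index.html)
echo "$restored"
```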
Compression Files:
Files that compress data so the resulting file consumes less storage (bytes).
bz2 – Bzip2 offers more efficient compression than zip or gzip. Bzip2 can only compress one file, so compressing multiple files with bzip2 produces one bzip2 file per input. To get a single file, Tar first archives the files into one (the *.tar file) and Bzip2 then compresses it. That is why a tar extension is on most bzip2 files. The “bzcat” command can be used on Bzip2 files the same way “cat” is used on regular files.
Compress:
Code:
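bzip2 -z FILE #the new file will be FILE.bz2 and FILE is removed
bzip2 -c FILE > NEW_ARCHIVE.bz2 #choose the output name and keep FILE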
Extract:
Code:
bzip2 -d FILE.bz2 #bunzip2 = "bzip2 -d"
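By default bzip2 replaces the input file, which can surprise new users. The sketch below, which assumes bzip2 and bzcat are installed and uses a made-up notes.txt, shows how to keep the original, pick the output name, and peek at the contents.

```shell
#!/bin/sh
# Sketch: keep the original while compressing, view with bzcat, then decompress.
set -eu
command -v bzip2 >/dev/null 2>&1 || { echo "bzip2 is not installed"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'line one\nline two\n' > notes.txt

bzip2 -k notes.txt                # -k keeps notes.txt and adds notes.txt.bz2
bzip2 -c notes.txt > copy.bz2     # -c writes to stdout, so the output name is our choice

viewed=$(bzcat notes.txt.bz2)     # like cat, but decompresses on the fly
echo "$viewed"

rm notes.txt
bzip2 -d notes.txt.bz2            # recreates notes.txt and removes notes.txt.bz2
restored=$(cat notes.txt)
```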
Freeze (F) – Freeze is a compression format that can only compress one file (just like Bzip2). To view the contents of a file (like a text file) compressed as Freeze, use the “fcat” command - “fcat FILE.F | less”
Compress:
Code:
freeze -x FILE #the new file will be FILE.F
NOTE: The "-x" makes the compression better at the cost of speed. This parameter can be removed if desired.
Extract:
Code:
freeze -d FILE.F #melt = unfreeze = "freeze -d"
gz – GNU Zip (Gzip) is a popular compression format that is faster but less efficient than Bzip2. Like Bzip2, Gzip can only compress one file, which is why Tar is often used with Gzip. Gzip was made to replace the “compress” utility/command in earlier Unix systems. To view the contents of a Gzip file the way “cat” shows a regular file, use the “zcat” command (installed as “gzcat” on some systems, such as macOS). Gzip is the most popular compression utility in GNU/Linux, so it is highly recommended that users familiarize themselves with Gzip's commands.
Compress:
Code:
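gzip FILE #the new file will be FILE.gz and FILE is removed
gzip -c FILE > NEW_ARCHIVE.gz #choose the output name and keep FILE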
Extract:
Code:
gunzip FILE.gz
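Gzip trades speed for size through its numbered levels. Here is a small sketch, using an invented sample.log, that compresses the same file at the fastest and the strongest level and then checks that the data survives a round trip. The exact sizes depend on the gzip version, so none are shown.

```shell
#!/bin/sh
# Sketch: compare gzip levels -1 and -9 on the same file and verify a round trip.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a repetitive sample file (the content is made up).
i=0
while [ "$i" -lt 2000 ]; do
    echo "entry $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.log

gzip -1 -c sample.log > fast.gz     # -1 = fastest, weakest compression
gzip -9 -c sample.log > small.gz    # -9 = slowest, strongest compression

echo "original: $(wc -c < sample.log) bytes"
echo "gzip -1:  $(wc -c < fast.gz) bytes"
echo "gzip -9:  $(wc -c < small.gz) bytes"

# -d decompresses, -c sends the result to stdout instead of replacing the file.
gzip -dc small.gz > roundtrip.log
cmp -s sample.log roundtrip.log && echo "round trip OK"
```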
lzma - The Lempel–Ziv–Markov chain algorithm (LZMA) compresses data better than Bzip2 and Gzip, but is slower than those two algorithms. Like Bzip2 and Gzip, LZMA can only compress one file which is why Tars are used when compressing many files (the *.tar counts as one file). The command “lzcat” acts like “cat” by showing users the contents of the compressed files.
Compress:
Code:
xz --format=lzma FILE #the new file will be FILE.lzma
Extract:
Code:
unlzma FILE.lzma
TIP: Make an alias for lzma to make it easier to use - “alias lzma='xz --format=lzma '”. With this alias, users can type “lzma” instead of “xz --format=lzma”.
NOTE: The cat-like commands (zcat, bzcat, lzcat, xzcat) also work on compressed tarballs (like *.tar.lzma). The output is then the raw tar stream, so pipe it to tar to make sense of it.
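To make the note concrete, here is a sketch that inspects a compressed tarball without unpacking it to disk. It assumes zcat (shipped with gzip) and a tar that supports -O; the "pkg" directory is made-up sample data.

```shell
#!/bin/sh
# Sketch: feed the output of a cat-like command into tar.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir pkg
printf 'hello\n' > pkg/hello.txt
tar -czf pkg.tar.gz pkg

# zcat writes the decompressed data, which is a raw tar stream, to stdout.
# "-f -" tells tar to read that stream from standard input.
listing=$(zcat < pkg.tar.gz | tar -tf -)
echo "$listing"

# -O prints the extracted member to stdout instead of writing it to disk.
content=$(zcat < pkg.tar.gz | tar -xOf - pkg/hello.txt)
echo "$content"
```

The same pattern should work with bzcat, lzcat, and xzcat on the matching tarball types.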
xz - XZ uses the LZMA2 algorithm, an updated version of LZMA, and the .xz format has largely replaced the older .lzma format. Again, like other compression formats, to compress multiple files, use Tar and then XZ. The “xzcat” command is a cat-like command that allows users to view the contents of the compressed files.
Compress:
Code:
xz FILE #the new file will be FILE.xz
Extract:
Code:
unxz FILE.xz
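The claims about which compressor is more efficient are easy to check on your own data. The sketch below runs gzip, bzip2, and xz over the same made-up sample file and prints the resulting sizes; tools that are not installed are skipped. Results vary with the input, so no sizes are quoted here.

```shell
#!/bin/sh
# Sketch: compress one file with gzip, bzip2, and xz and compare the sizes.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a repetitive sample file (the content is made up).
i=0
while [ "$i" -lt 2000 ]; do
    echo "record $i: some moderately repetitive sample text"
    i=$((i + 1))
done > sample.txt
echo "original: $(wc -c < sample.txt) bytes"

tested=0
for tool in gzip bzip2 xz; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not installed, skipping"
        continue
    fi
    # -c writes to stdout, so sample.txt is left in place.
    "$tool" -c sample.txt > "sample.$tool"
    # Decompress from stdin and compare with the original to verify the round trip.
    "$tool" -dc < "sample.$tool" | cmp -s - sample.txt
    echo "$tool: $(wc -c < "sample.$tool") bytes"
    tested=$((tested + 1))
done
```

All three tools accept the same -c and -d options, which is what lets a single loop drive them.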
Compression + Storage Files:
Files that store and compress files. These formats do not require tar when compressing multiple files into one file.
7z – The 7z compression format was first used in an application named “7-Zip”, hence the filetype's name. 7z can compress multiple files together without the need of tar. Unlike tar, 7z does not store the owner and group of files, so it is a poor choice for backups on Unix-like systems.
Compress:
Code:
7za a NEW_ARCHIVE.7z FILES
Extract:
Code:
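7za x FILE.7z #extract with full paths
7za e FILE.7z #extract everything into the current directory, ignoring paths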
apk - Android installation packages are apk files. Android uses *.apk files just as Debian uses *.deb and Fedora uses *.rpm. APK files are created in the Android developer's IDE.
arc – Arc does not require multiple files to be merged together (tar) before they are compressed.
Compress:
Code:
arc a NEW_ARCHIVE.arc FILES
Extract:
Code:
arc x FILE.arc
jar - Java classes (compiled Java-code files) are held together using a Jar file. Jar is not intended to be used as storage. It is meant to be an application file containing executable code. Jar files can also be used as addons/plugins for various applications.
Compress:
Code:
jar cf NEW_ARCHIVE.jar FILES
Extract:
Code:
jar xf FILE.jar
kgb - This compression archive uses the PAQ6 algorithm to gain high compression ratios. Self-extracting archives and AES-256 encryption are supported.
Compress:
Code:
kgb ARCHIVE.kgb FILES
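rar – The Roshal Archive (RAR) is a commonly seen format, especially among Windows users. RAR is proprietary rather than open-source: the “rar” utility is distributed as trialware, while the separate “unrar” extractor is free of charge. RAR supports password protection.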
Compress:
Code:
rar a NEW_ARCHIVE.rar FILES
Extract:
Code:
rar x FILE.rar
xar - The eXtensible ARchive format is an open-source archiving format. RPM5 uses xar.
Compress:
Code:
xar -cf FILE.xar FILES
Extract:
Code:
xar -xf FILE.xar
zip - ZIP is a widely used compression format. ZIP supports password-protected archives.
Compress:
Code:
zip ARCHIVE_NAME FILES
Extract:
Code:
unzip ARCHIVE_NAME
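Because ZIP stores and compresses in one step, whole directories can be archived directly. The following sketch assumes the Info-ZIP "zip" and "unzip" tools are installed; the "photos" tree is invented sample data.

```shell
#!/bin/sh
# Sketch: zip a directory tree, list the archive, and extract it elsewhere.
set -eu
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip and unzip are required for this example"
    exit 0
fi

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p photos/2024
printf 'caption\n' > photos/2024/a.txt
printf 'index\n'   > photos/list.txt

zip -r -q photos.zip photos        # -r recurses into directories, -q keeps the output quiet
unzip -l photos.zip                # -l lists the contents without extracting

unzip -q photos.zip -d restore     # -d chooses the directory to extract into
restored=$(cat restore/photos/2024/a.txt)
echo "$restored"
```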
zpaq - ZPAQ is a special type of compressor. When a newer version of a file is added to a ZPAQ file that contains the older version, ZPAQ performs an incremental update. This means the older version of the file still exists in the archive alongside the new one.
Compress:
Code:
zpaq add ARCHIVE_NAME.zpaq FILES #zpaq 7.x syntax; older versions used different commands
zz - Zzip is a self-extracting archive. This means the file is “executed” and it will uncompress itself.
Compress:
Code:
zzip ARCHIVE_NAME FILES