If you have thousands of ZIP or TAR.GZ files to process and need to extract one file from each archive, where you can't pin down exactly which file to extract and each archive has multiple directories inside, you would do well to do it in Python. IJS.
I say this because I just came out of using Debian MATE, where I was stuck using Engrampa as the file-archiver "middleman", which is a PITA. When I shell out to it from a QB64 program, Engrampa cooperates, but the exact same command run from a "bash" script doesn't: it comes back with a lame-ass dialog box because it doesn't like the archive's file extension. Using 7-Zip on Linux instead is no fun either. I am a registered WinRAR user, but sadly only the command-line "rar" program is multi-platform, and that deals with the RAR format only.
A case in point is the XRNI and XRNS files created by the Renoise music-creation application. Each is a ZIP file that holds, at the very least, an XML file and, optionally, a "SampleData" folder containing FLAC or WAV files. Getting at those audio files is exactly the one-file-per-archive job I described at the start of this post.
But this is just one case, and it would require marriage to a module and extensive knowledge of string parsing in Python. Then again, that could be said of any sane programming language. (I'm sorry, but "bash" scripting is not sane in my opinion.)
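For the general case at the top of this post, here's a minimal sketch that only leans on the standard library's zipfile and tarfile modules. It assumes you can describe the wanted file by a shell-style pattern on its base name (e.g. "*.wav"), takes the first match no matter how deeply it's nested, and only looks at .zip and .tar.gz files. `extract_first_match` and `process_folder` are names I made up for illustration.

```python
from __future__ import annotations

import fnmatch
import shutil
import tarfile
import zipfile
from pathlib import Path, PurePosixPath


def extract_first_match(archive: Path, pattern: str, out_dir: Path) -> Path | None:
    """Extract the first file whose base name matches `pattern` (e.g. "*.wav").

    Handles ZIP and (compressed) tar archives. Folder structure inside the
    archive is ignored: the file lands directly in `out_dir`, which also keeps
    odd "../" member paths from writing outside it. Returns the written path,
    or None if nothing matched.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                name = PurePosixPath(info.filename).name
                if not info.is_dir() and fnmatch.fnmatch(name, pattern):
                    target = out_dir / name
                    with zf.open(info) as src, open(target, "wb") as dst:
                        shutil.copyfileobj(src, dst)
                    return target
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive, "r:*") as tf:
            for member in tf:
                name = PurePosixPath(member.name).name
                if member.isfile() and fnmatch.fnmatch(name, pattern):
                    target = out_dir / name
                    with tf.extractfile(member) as src, open(target, "wb") as dst:
                        shutil.copyfileobj(src, dst)
                    return target
    return None


def process_folder(folder: Path, pattern: str, out_dir: Path) -> list[Path]:
    """Run extract_first_match on every .zip and .tar.gz file in `folder`."""
    hits = []
    for archive in sorted(folder.iterdir()):
        if archive.name.endswith((".zip", ".tar.gz")):
            hit = extract_first_match(archive, pattern, out_dir)
            if hit is not None:
                hits.append(hit)
    return hits
```

Usage would be something like `process_folder(Path("archives"), "*.wav", Path("extracted"))`. If two archives contain files with the same name, the later one overwrites the earlier, so you may want to prefix the archive name if that matters.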
Code:
[/media/xxxxx/xxxxx/xxxxx]$ 7za l series-fun001.xrni
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel(R) Pentium(R) CPU B980 @ 2.40GHz (206A7),ASM)
Scanning the drive for archives:
1 file, 3448170 bytes (3368 KiB)
Listing archive: series-fun001.xrni
--
Path = series-fun001.xrni
Type = zip
Physical Size = 3448170
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2022-01-26 21:48:50 ....A        10712         1099  Instrument.xml
2022-01-26 21:48:51 D....            0            0  SampleData
2022-01-26 20:18:39 ....A      1411244       734505  SampleData/Sample00 (series-fun307-1).wav
2022-01-26 20:18:45 ....A      1411244       883494  SampleData/Sample01 (series-fun307-2).wav
2022-01-26 20:18:52 ....A      1411244       957150  SampleData/Sample02 (series-fun307-3).wav
2022-01-26 20:19:00 ....A      1411244       870850  SampleData/Sample03 (series-fun307-4).wav
------------------- ----- ------------ ------------  ------------------------
2022-01-26 21:48:51            5655688      3447098  5 files, 1 folders
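Since an XRNI/XRNS file is just a ZIP, the listing above maps straight onto Python's zipfile module. A minimal sketch, assuming the samples sit somewhere under a folder named SampleData (at the top level for an .xrni, as the listing shows) and that only .wav and .flac are wanted. I haven't checked every XRNS layout, so the code keeps any subfolders below SampleData rather than flattening them. `extract_samples` is a hypothetical name.

```python
from __future__ import annotations

import shutil
import zipfile
from pathlib import Path, PurePosixPath

AUDIO_SUFFIXES = {".wav", ".flac"}


def extract_samples(instrument: Path, out_dir: Path) -> list[Path]:
    """Copy the WAV/FLAC files under SampleData/ out of a .xrni or .xrns file.

    Files go to out_dir/<archive name without extension>/, keeping any
    subfolders found below SampleData so same-named samples can't collide.
    """
    written = []
    dest = out_dir / instrument.stem
    with zipfile.ZipFile(instrument) as zf:  # XRNI/XRNS are plain ZIP files
        for info in zf.infolist():
            member = PurePosixPath(info.filename)
            parts = member.parts
            if info.is_dir() or "SampleData" not in parts[:-1]:
                continue
            if member.suffix.lower() not in AUDIO_SUFFIXES:
                continue
            rel = parts[parts.index("SampleData") + 1:]
            if ".." in rel:  # refuse paths that would climb out of dest
                continue
            target = dest.joinpath(*rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Run against series-fun001.xrni, this would be expected to write the four Sample0x WAVs into a series-fun001 folder and skip Instrument.xml.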
If you have a lot of zipped archive files to decompress/extract, especially if they’re different types of archives, then I’d use dtrx in bash.
dtrx (Do The Right eXtraction) is a Python script that can decompress/extract pretty much any archive format.
dtrx is available as a package in the software repositories of some Linux distributions. If your distribution does not have a native package for it, you can also install it as a Python module via pip.
Once you have dtrx installed, the only other thing you have to ensure is that you have the appropriate command-line tools installed for each archive format you want to be able to extract (gunzip, 7z, tar, unrar, xz, bzip2, etc.).
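If you'd rather check for those tools before kicking off a big batch, a few lines of Python with shutil.which will do it. The command names below are my assumption of the usual ones; adjust the list to the formats you actually deal with.

```python
import shutil

# Assumed command names for common formats; edit to match what you extract.
# (On some systems 7-Zip is installed as "7za" or "7zz" rather than "7z".)
DEFAULT_TOOLS = ("tar", "gunzip", "xz", "bzip2", "unrar", "7z")


def missing_tools(tools=DEFAULT_TOOLS):
    """Return the commands from `tools` that can't be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed extraction tools were found.")
```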
Then it’s a case of using:
Bash:
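cd /path/to/destination
shopt -s nullglob  # leave out extensions that match nothing, rather than passing e.g. "*.rar" to dtrx literally
dtrx /path/to/archives/*.{zip,tar.gz,tar.xz,tar.bz2,rar,7z}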
Here, /path/to/destination is the directory you want to extract the archives into, and /path/to/archives is the directory containing the compressed archives you want to extract.
Add any other archive extensions to the comma-separated list between the { and the }.
Bash then expands that pattern into a list of the matching archive file names in the specified directory and passes the list to dtrx.
dtrx then uses the file type of each archive in the list to determine which tool to extract it with.
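To make those two steps concrete, here's a rough Python analogue: one function collects the files the way the brace-plus-wildcard pattern does, and another guesses each file's type from its contents rather than its name. This is not dtrx's code, just a sketch of the same idea. The stdlib recognises ZIP and tar itself; for RAR and 7z it falls back to their well-known leading "magic" bytes. `collect_archives` and `detect_archive_type` are hypothetical names.

```python
from __future__ import annotations

import tarfile
import zipfile
from pathlib import Path

# Same extensions as the {zip,tar.gz,...} list in the bash example.
EXTENSIONS = (".zip", ".tar.gz", ".tar.xz", ".tar.bz2", ".rar", ".7z")

# Leading "magic" bytes for formats the standard library can't open itself.
MAGIC = {b"Rar!\x1a\x07": "rar", b"7z\xbc\xaf\x27\x1c": "7z"}


def collect_archives(folder: Path, extensions=EXTENSIONS) -> list[Path]:
    """Like bash's *.{zip,tar.gz,...}: non-hidden files ending in one of `extensions`.

    str.endswith is used because Path.suffix only sees ".gz" in "x.tar.gz".
    Unlike bash without nullglob, an extension with no matches is just skipped.
    """
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and not p.name.startswith(".") and p.name.endswith(extensions)
    )


def detect_archive_type(path: Path) -> str | None:
    """Guess the format from the file's contents, ignoring its name."""
    if zipfile.is_zipfile(path):
        return "zip"  # also catches .xrni/.xrns, which are ZIPs under another name
    if tarfile.is_tarfile(path):
        return "tar"  # plain, gzip-, bzip2- or xz-compressed tar
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, kind in MAGIC.items():
        if head.startswith(magic):
            return kind
    return None
```

To hand the collected list to dtrx itself, something like `subprocess.run(["dtrx", *map(str, collect_archives(Path("/path/to/archives")))], cwd="/path/to/destination", check=True)` should mirror the bash one-liner.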
If you use dtrx, there’s no need to remember all of the fiddly command-line parameters/switches for each individual extraction/decompression tool.
Unfortunately I don’t think dtrx will extract single files from an archive. It only extracts the entire thing.
And as dtrx is Python-based, you should also be able to import it as a module in your own Python scripts, so you aren’t limited to using it from Bash.
In the past, when I’ve had archives in different formats and only wanted selected contents, I’ve written bash scripts that use dtrx to fully extract the archives, then used standard bash tools/commands to remove any unwanted files/subdirectories from each extracted archive, leaving only the files/subdirectories I’m interested in.
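The same extract-everything-then-prune approach can be done in Python with shutil.unpack_archive, as long as the archive is a format the standard library can unpack (ZIP or tar variants). A minimal sketch, assuming "wanted" means matching one of a few glob patterns; `extract_and_prune` and its parameters are hypothetical. Note that unpack_archive picks the format from the file extension unless told otherwise, so an .xrni needs `fmt="zip"`.

```python
from __future__ import annotations

import fnmatch
import os
import shutil
from pathlib import Path


def extract_and_prune(archive: Path, dest: Path, keep: list[str],
                      fmt: str | None = None) -> list[Path]:
    """Fully extract `archive` into `dest`, then delete every file whose name
    matches none of the `keep` patterns, along with any folders left empty.

    `dest` should be a fresh folder: anything already in it is pruned too.
    """
    # unpack_archive guesses the format from the extension unless fmt is given.
    shutil.unpack_archive(archive, dest, fmt)
    kept = []
    # Walk bottom-up so each folder has been emptied before we test it.
    for root, dirs, files in os.walk(dest, topdown=False):
        for name in files:
            path = Path(root) / name
            if any(fnmatch.fnmatch(name, pat) for pat in keep):
                kept.append(path)
            else:
                path.unlink()
        for name in dirs:
            sub = Path(root) / name
            if sub.is_dir() and not any(sub.iterdir()):
                sub.rmdir()
    return sorted(kept)
```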
Other times, when I’ve had archives that are all of the same type, and the tool for that format supports extracting individual files/directories, I’ve written bash scripts/long one-liners that use that tool to determine whether each archive contains a particular file or files and, if it does, extract them.
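In Python, the check-then-extract idea maps onto the tarfile module: getmember() raises KeyError when a path isn't in the archive, which doubles as the "does it contain it?" test. A sketch assuming tar/tar.gz archives and an exact member path you know in advance; `extract_if_present` and the "--" naming scheme for output files are made up for illustration.

```python
import tarfile
from pathlib import Path, PurePosixPath


def extract_if_present(archive: Path, member_path: str, out_dir: Path) -> bool:
    """Extract `member_path` from a tar/tar.gz if the archive has it.

    Returns True if the file was extracted, False if it wasn't there.
    """
    with tarfile.open(archive, "r:*") as tf:
        try:
            member = tf.getmember(member_path)  # raises KeyError if absent
        except KeyError:
            return False
        if not member.isfile():
            return False
        out_dir.mkdir(parents=True, exist_ok=True)
        # Prefix with the archive name so same-named files from different
        # archives don't overwrite each other.
        target = out_dir / f"{archive.name}--{PurePosixPath(member_path).name}"
        with tf.extractfile(member) as src:
            target.write_bytes(src.read())
        return True
```

Looping it over `sorted(Path("/path/to/archives").glob("*.tar.gz"))` gives the same per-archive behaviour as the one-liners described above.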
Also, if an archive contains nested compressed archives, or the name of the archive doesn’t match the name of the base directory it contains, dtrx will interactively ask you what it should do. So you may have to answer questions as dtrx extracts certain archives.
There is a way to specify the action to take up front, which fully automates the extraction process so dtrx doesn’t stop to ask each time it hits one of these situations. But without looking it up, I can’t remember the exact way to do it.
For extracting a large number of compressed archives, especially if they are in multiple formats, I highly recommend dtrx.