Binary file, wc and pipes

Violett

New Member
I have a binary file (backup).
My task is to list the number of names in the file using wc and piping.
I have researched various ways of doing so, but none of them provide the desired output.
After the count, I am to list the first 5 and last 5 names.

ls -l of file:
-rw-r--r--. 1 root root 3252373504 Jan 21 22:04 /mnt/tape/backup

Currently I get the count with:
strings /mnt/tape/backup | wc -w
10123456

The example for the first 5 names is:
lib
lib64
usr/lib64/libgcc_s-4.8.5-20150702.so.1
usr/lib64/libgcc_s.so.1

I can see that the result is a list of files that are backed up. However, I have not found the proper command to display only these path names from the backup.



Thank you.
 


atanere

Well-Known Member
Hi @Violett, and welcome to the site!

I'm not a programmer, so I can't help too much... but I don't think you've explained the problem well enough. What kind of "binary file" are you using? For example, a JPG image is a binary file... and maybe the image shows 20 lines of text... but I don't think command line tools can parse the image to extract that text. But again, I'm not a programmer so maybe I am wrong about this, and I'm sure someone will correct me.

I also don't know what you mean by "names".... when you say to "list the number of names" and to "list the first 5 and last 5 of the names." Do you mean "lines" instead? Your example shows the wc -l command which does indeed accurately count lines in a text file, but it seems to give unpredictable results in a binary LibreOffice document (it outputs a number, but it does not match the number of actual lines).
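A small sketch of why wc -l gives odd numbers on binary files: it simply counts newline bytes (0x0A). In a binary file those bytes occur wherever the data happens to contain them, so the result says nothing about readable "lines". The byte strings below are made up purely for illustration.

```shell
#!/bin/sh
# wc -l counts newline bytes (0x0A); it has no notion of text "lines".

# Made-up binary-ish input: a NUL byte and exactly one newline byte.
printf 'ab\000c\nd' | wc -l    # count is 1

# A final line without a trailing newline is not counted either.
printf 'one\ntwo' | wc -l      # count is 1, although two lines are visible
```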

So maybe someone will be better able to help if you can give us a bit more info on your project. Thanks!

Cheers
 

Violett

New Member
Thank you.
Your answer pointed me to a syntax error.
As a result, I found that the wc -w command gives the required result.

I have edited the question to be more clear.
 

atanere

Well-Known Member
Hi again! That does help to clarify the task, but since this isn't my thing, it still takes me a bit to register everything. Sorry. Maybe others will join in soon with better understanding... but until then, I'll just try to muddle along and see if I can learn something here too.

So, you say that wc -w is the correct syntax, but I am only having success with your code on a text file (not on any binary or .tar files that I have tried). The code below gives me an accurate word count:

Code:
strings mytextfile.txt | wc -w
But this output is a number (word count)... and in your case above you show over 10 million "words" in your large backup file. But this code doesn't list the words inside like your example shows the filenames inside the backup. Are you using the strings command alone? (That works for me, but only on a text file.)

Your example list of filenames also shows one filename per line... which makes me think that your first syntax of wc -l might count the files more accurately. For example, the list below (in a text file) shows 3 filenames, but it has 10 words. I verified these with the wc -l and wc -w commands, respectively. Your example did not include any filenames with spaces, which is what creates this difference in counting.

My homework assignment.txt
More-homework.odt
How to get the best count.pdf
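The difference can be reproduced in a few lines of shell. This sketch feeds the three example names above through both counters; sample_names is a hypothetical helper introduced only for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: print the three example names, one per line.
sample_names() {
  printf '%s\n' \
    'My homework assignment.txt' \
    'More-homework.odt' \
    'How to get the best count.pdf'
}

sample_names | wc -l   # count is 3: one line per file name
sample_names | wc -w   # count is 10: names containing spaces are split into words
```

So wc -l matches "one name per line" input, while wc -w only agrees with it when no name contains whitespace.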


OK, so if the first step of your task is to number the output, you will choose which of these numbering methods is most appropriate for you.

And now the second step of your task... to list the first 5 and last 5 names (filenames) in the large backup file. Again, I'm back to questioning how you showed the list of the first 5 in the example above? The strings command will list the contents of a text file for me, but I can't get it to cleanly show the contents of a binary or .tar archive.... and maybe this is really where I'm goofing up. But also, at this point, I can't see how to separate the first 5 and last 5 with only the commands that you have offered.... are you permitted to use other commands? Without digging online, this part of your task seems like a job for sed or awk to help you out.
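If the backup happens to be a tar archive, tar itself can list the member names, one per line, without extracting anything. This is an assumption: the thread never states the format, and the relative names in the question (lib, lib64, usr/lib64/...) only hint at tar. strings, by contrast, prints every run of printable characters it finds, so on an archive its output may mix names with header fields and file contents. head and tail can then pick out the first and last five without needing sed or awk. list_members is a hypothetical helper name.

```shell
#!/bin/sh
# ASSUMPTION: the backup is a tar archive; the thread does not confirm this.
# Hypothetical helper: print each member name on its own line, extracting nothing.
list_members() {
  tar -tf "$1"
}

# Usage against the file from the question (only if it really is a tar archive):
#   list_members /mnt/tape/backup | wc -l       # how many names
#   list_members /mnt/tape/backup | head -n 5   # first five names
#   list_members /mnt/tape/backup | tail -n 5   # last five names
```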

OK, I've rambled on enough... much of this has just been me thinking out loud, but trying to edit for clarity. I'm sure I am not all that clear though. :confused::confused:
 

JasKinasis

Well-Known Member
Assuming strings gives you a line for each entry in your binary file, and that each entry is a filename, then the command:
Code:
strings /path/to/binfile | wc -l
Should tell you how many file-names are in there. Disclaimer: I'm not familiar with the strings command. I'm currently on holiday, so I don't have access to a Linux terminal... However, if the file in question is a text file, I'd just use cat instead of strings... But for now, I'll just go with strings.
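For the case where the input is already a plain text file, cat is not strictly needed: wc can read the file directly, either through input redirection or as an argument. The sketch uses a throwaway temporary file holding three of the names from the question.

```shell
#!/bin/sh
# Throwaway text file with one name per line (three names from the question).
names=$(mktemp) || exit 1
printf '%s\n' lib lib64 usr/lib64/libgcc_s.so.1 > "$names"

cat "$names" | wc -l   # count is 3
wc -l < "$names"       # same count, without the extra cat process
wc -l "$names"         # same count, followed by the file name

rm -f "$names"
```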

To see only the first or last five entries, you would need to use the head or tail commands.
Here's how you'd use head to see the first five file-names:
Code:
strings /path/to/file | head -n 5
Tail works in exactly the same way and will show the last five file-names. I'll leave you to work that out...
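Putting the count, head and tail together: a minimal sketch of a hypothetical summarize function that accepts any one-name-per-line stream on standard input. It saves the stream to a temporary file first, so a slow producer such as strings on a 3 GB backup only has to run once; running the full pipeline three times also works, at the cost of reading the backup three times.

```shell
#!/bin/sh
# Hypothetical helper: report how many lines (names) arrive on stdin,
# then show the first five and the last five of them.
summarize() {
  _summary_tmp=$(mktemp) || return 1
  cat > "$_summary_tmp"              # buffer stdin so it is read only once
  echo "count: $(wc -l < "$_summary_tmp" | tr -d ' ')"
  echo "first 5:"
  head -n 5 "$_summary_tmp"
  echo "last 5:"
  tail -n 5 "$_summary_tmp"
  rm -f "$_summary_tmp"
}
```

For example, strings /mnt/tape/backup | summarize on the real file, or seq 12 | summarize as a 12-line stand-in, which reports a count of 12, then the lines 1 to 5 and the lines 8 to 12.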
 