Why are linux man pages often so bad?

CrazedNerd

Guest
I'm not discounting them as a resource altogether. Sometimes they are your only friend when you're using command-line tools. But I'm going to use a specific example to show you that I'm not just being judgmental and unfair.

So here are the important parts of the man page for the "cut" command. Don't worry, it's a very short man page compared to some others:

Code:
NAME
       cut - remove sections from each line of files

SYNOPSIS
       cut OPTION... [FILE]...

DESCRIPTION
       Print selected parts of lines from each FILE to standard output.

       With no FILE, or when FILE is -, read standard input.

       Mandatory arguments to long options are mandatory for short options too.

       -b, --bytes=LIST
              select only these bytes

       -c, --characters=LIST
              select only these characters

       -d, --delimiter=DELIM
              use DELIM instead of TAB for field delimiter

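       -f, --fields=LIST
              select only these fields;  also print any line that contains no delimiter character, unless the -s option is specified

       -n     (ignored)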

       --complement
              complement the set of selected bytes, characters or fields

       -s, --only-delimited
              do not print lines not containing delimiters

       --output-delimiter=STRING
              use STRING as the output delimiter the default is to use the input delimiter

       -z, --zero-terminated
              line delimiter is NUL, not newline

       --help display this help and exit

       --version
              output version information and exit

       Use one, and only one of -b, -c or -f.  Each LIST is made up of one range, or many ranges
       separated by commas.  Selected input is written in the same order that it is read, and is
       written exactly once.  Each range is one of:

       N      N'th byte, character or field, counted from 1

       N-     from N'th byte, character or field, to end of line

       N-M    from N'th to M'th (included) byte, character or field

       -M     from first to M'th (included) byte, character or field

So reading this in a straightforward and trusting way, you would think that the options "-b" and "-c" actually do something different, but they both just cause cut to print character columns to standard output, using either a text file or something else as input. Here is an example. I'm running both as "cut <whatever> | cat -n" just so you can see the line numbers. Here is the output for "cut -b 1":

Code:
     1  k
     2  e
     3  k
     4  e
     5  k
     6  e
     7  k
     8  e
     9  k
    10  e
    11  k
    12  e

Now, here is the output for "cut -c 1":

Code:
     1  k
     2  e
     3  k
     4  e
     5  k
     6  e
     7  k
     8  e
     9  k
    10  e
    11  k
    12  e

My point being that the "-b" option has NOTHING to do with bytes. The file I'm using has 72 bytes because there are exactly 72 ASCII characters in the file.

Also, consider this from the point of view of someone who has not tested the command:

Code:
  -c, --characters=LIST

              select only these characters

How could anyone understand what that means just by reading it? The "list" here could mean an HTML list, or something like "bread, butter, oatmeal, bananas...". Such a horrible and cryptic man page! I feel like Linux has had more than enough time to clean up its act on this.
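For what it's worth, one quick way to see what LIST means is to feed cut a known string and try each range form from the "Each range is one of" part of the man page quoted above. This is only a sketch: the sample string "abcdef" and the variable names are made up for illustration, and it assumes a cut that behaves like the GNU one whose man page is quoted.

```shell
# Hypothetical sample input: six single-byte characters.
line='abcdef'

single=$(printf '%s\n' "$line" | cut -c 3)           # N    : 3rd character only
to_end=$(printf '%s\n' "$line" | cut -c 3-)          # N-   : 3rd character to end of line
span=$(printf '%s\n' "$line" | cut -c 2-4)           # N-M  : 2nd through 4th character
from_start=$(printf '%s\n' "$line" | cut -c -2)      # -M   : first through 2nd character
combined=$(printf '%s\n' "$line" | cut -c 1,3,5-6)   # several ranges separated by commas
reordered=$(printf '%s\n' "$line" | cut -c 5,1)      # list order does not change output order

echo "N    -> $single"       # c
echo "N-   -> $to_end"       # cdef
echo "N-M  -> $span"         # bcd
echo "-M   -> $from_start"   # ab
echo "list -> $combined"     # acef
echo "5,1  -> $reordered"    # ae
```

The last line illustrates the man page sentence "Selected input is written in the same order that it is read": asking for 5,1 still prints the first character before the fifth.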
 


I use cut quite a bit. Sometimes I use awk to do the same thing. But I prefer cut for legacy reasons, I guess.

Consider these lines in a file named "list":

Computer33: 10.92.87.4, 255.255.255.0, 10.92.87.1
Computer86: 10.92.87.12, 255.255.255.0, 10.92.87.1

Now there could be a hundred lines or more formatted similarly, but all I want is the IP address of one of these computers.

grep Computer86 list | cut -f2 -d: | cut -f1 -d,

gives me what I want.

Let's say I just wanted the default gateway for one of these computers.

grep Computer33 list | cut -f3 -d,

I tend to use "f" for fields and "d" for delimiter, more than I use b or c.

If I had a computer named Comp1 that would mess up my character count.
Also if I had an IP address like 1.1.1.1 that might mess up my character count.

Of course you could also do this with awk.

awk '{print $2}' list
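
The pipelines above can be tried without a real inventory file. The sketch below writes the two sample lines to a temporary file and shows what each command returns, including the leading space that cut keeps after the delimiter and the trailing comma that awk's default whitespace splitting leaves on $2. The mktemp file, the variable names, the tr clean-up step and the /Computer86/ filter on the awk command are assumptions added for illustration; they are not part of the commands as posted.

```shell
# Recreate the two sample lines in a scratch file so a real "list" is not overwritten.
list=$(mktemp)
printf '%s\n' \
  'Computer33: 10.92.87.4, 255.255.255.0, 10.92.87.1' \
  'Computer86: 10.92.87.12, 255.255.255.0, 10.92.87.1' > "$list"

# Field 2 of the ":" split, then field 1 of the "," split.
# The space that follows the colon is part of the field, so it is kept.
ip=$(grep Computer86 "$list" | cut -f2 -d: | cut -f1 -d,)
printf '[%s]\n' "$ip"            # prints "[ 10.92.87.12]"

# One way to drop that space if it is unwanted.
ip_trimmed=$(printf '%s' "$ip" | tr -d ' ')
printf '[%s]\n' "$ip_trimmed"    # prints "[10.92.87.12]"

# Third comma-separated field: the default gateway (again with a leading space).
gateway=$(grep Computer33 "$list" | cut -f3 -d,)
printf '[%s]\n' "$gateway"       # prints "[ 10.92.87.1]"

# awk splits on runs of whitespace by default, so $2 keeps the trailing comma.
awk_ip=$(awk '/Computer86/ {print $2}' "$list")
printf '[%s]\n' "$awk_ip"        # prints "[10.92.87.12,]"

rm -f "$list"
```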
 
It's a great command and I'm not disputing that. I completely understand how to use it, especially with fields; it does pretty much the same thing as standard awk.
 
More info:
Code:
info cut
or
Code:
info '(coreutils) cut invocation'
 
@CrazedNerd :
The fact that one byte is the size of a character is purely down to the file encoding and the character set used.

If your file was encoded using a multi-byte character encoding, one character could span two or more bytes.

In that case, reading a single byte will yield only part of the character code, whereas reading one character will read a whole character from the file.

Likewise, if you're parsing through a binary data file, which could potentially contain data in any kind of crazy format, you may want to read a certain number of bytes at a time. In that case, you'd use the -b option rather than the -c option.

So if you're reading a double-precision float value as binary data on a 64-bit architecture, you'll probably end up needing to read 8 bytes for each individual floating-point value.

So -b and -c do NOT do the exact same thing. The only reason it appears that way is because of the encoding of your file and the fact that each character just happens to be 1 byte!
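
To make the byte/character distinction concrete, here is a small sketch using a character that is not in the original post: "é" (U+00E9), which UTF-8 stores as the two bytes 0xC3 0xA9. It only demonstrates the -b side, because that result is the same everywhere. Depending on the cut implementation and version, -c may still be handled byte by byte, so its output is not shown. The variable names are made up for illustration.

```shell
# "é" in UTF-8 is the two bytes 0xC3 0xA9, written here as octal escapes
# so the example does not depend on how this text itself is encoded.
e_acute='\303\251'

# One character plus the newline is three bytes on disk.
total_bytes=$(printf "${e_acute}\n" | wc -c | tr -d ' ')

# cut -b 1 keeps only the first byte of the line, i.e. half of the character.
# od prints the bytes that came out in hex; tr strips od's spacing.
first_byte_hex=$(printf "${e_acute}\n" | cut -b 1 | od -An -tx1 | tr -d ' \n')

echo "bytes in the line:      $total_bytes"      # 3
echo "cut -b 1 output (hex):  $first_byte_hex"   # c30a
```

The output c3 0a is the first byte of "é" followed by the newline, which is not a valid character on its own. With a pure ASCII file every character is exactly one byte, so this difference never shows up.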
 
I thought these text editing tools were just meant for text files. Can you give me an example of when someone would need to use this on a binary program?
 
CrazedNerd wrote:
I thought these text editing tools were just meant for text files
On the info pages, referred to in post #4, where there is more information on the cut command than appears on the man page, one finds the following:
‘--bytes=BYTE-LIST’
Select for printing only the bytes in positions listed in
BYTE-LIST. Tabs and backspaces are treated like any other
character; they take up 1 byte. If an output delimiter is
specified, (see the description of ‘--output-delimiter’), then
output that string between ranges of selected bytes.

‘-c CHARACTER-LIST’
‘--characters=CHARACTER-LIST’
Select for printing only the characters in positions listed in
CHARACTER-LIST. The same as ‘-b’ for now, but internationalization
will change that. Tabs and backspaces are treated like any other
character; they take up 1 character. If an output delimiter is
specified, (see the description of ‘--output-delimiter’), then
output that string between ranges of selected bytes.
Here one can see that -c and -b will be differentiated by internationalization, which refers to the fact that characters in character sets other than Latin-based ones will not necessarily be single-byte characters.

It's worth noting that GNU free software utilities have "info" pages to describe them. To stay consistent with the traditional use of man pages, a man page is also written for each GNU utility, but these are typically more concise and less explanatory.
 
In relation to using cut on binary files, here's a trivial example for an unknown file:
Code:
[flip@flop ~]$ cut -b 2-5 testfiledunno 
PNG
The file is a PNG file according to this output. Running "file testfiledunno" would have been the more usual approach, but you'll get the idea.
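
A self-contained variation on the same idea, for anyone who wants to try it without a PNG at hand. It is a sketch under a few assumptions: the temporary file stands in for a real image and contains only the standard 8-byte PNG signature, head -c is available (it is on GNU and BSD systems), and the range is 2-4 rather than 2-5 so the carriage return that follows "PNG" is left out. Because cut applies the byte range to every newline-terminated chunk of its input, the input is first limited to the 8 signature bytes.

```shell
# Stand-in file holding only the PNG signature:
# 0x89 'P' 'N' 'G' CR LF 0x1A LF (octal escapes for printf).
sample=$(mktemp)
printf '\211PNG\r\n\032\n' > "$sample"

# Keep the first 8 bytes, then select bytes 2-4 of each line in that chunk.
# The second "line" (the single 0x1A byte) has no bytes 2-4, so it yields an empty line,
# which the command substitution drops along with the trailing newline.
signature=$(head -c 8 "$sample" | cut -b 2-4)
echo "$signature"    # PNG

rm -f "$sample"
```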
 
I appreciate the attempt to help, but what exactly is that supposed to accomplish? Knowing that would answer the question; it's still kinda cryptic to me.
 
This may be helpful.
That is a really nice website. Unfortunately, I already understand most of the top part. I'm currently going through William Shotts' "The Linux Command Line" a second time because the notes I took when going through most of the book were so bad. This time, I'm making everything ultra clear for myself in the notes (even if it feels kinda unnecessary), and I'm also going to make sure I have sed and awk down completely, because those both really make the command line powerful.

I will certainly be looking at the "packages" tutorial there; that's perhaps the most important part of running Linux. Then all the stuff in the "journeyman" section, as those are things I don't clearly understand. Such a great forum, I said it before and I'll say it again!
 
There wasn't meant to be anything cryptic about the example. It's just an example to illustrate that cut can access bytes throughout a binary file, which may or may not be of use to a user.
 
I wasn't criticizing your intention, I was just commenting on the fact that I didn't understand. I've realized that with all computing stuff, understanding the end goal, or use value, is the key to totally understanding it.

When I say something is cryptic, it's not necessarily an insult; I wouldn't be here if I didn't admire cryptography to a degree. I realize I will never understand everything (even just Linux or computer related), yet there's pleasure for me in trying to figure things out.
 
CrazedNerd wrote:
I thought these text editing tools were just meant for text files, can you give me an example of when someone would need to use this on a binary program?

I wasn’t talking about using it on a binary program, I was talking about using it on a binary data file.

If you know the file format, you may want to extract information from certain byte positions/ranges.

As reading byte ranges is supported, cut should work on binary data files as well as text files, as per the PNG example posted above by @osprey.

It is a highly specialised use case, though. I doubt too many people would use it particularly often, but programmers/hackers/pen-testers/infosec/forensics types may occasionally need to parse out/extract information from binary data files.
 