What is the difference between the option -w and \b with egrep?

eawedat

Is there a difference between:

egrep -w "word" filename
egrep "\bword\b" filename

or are both equivalent?
 


I bet Google will help. Copy and paste the 2nd line into your favorite search engine and read some of the results.

'Cause, I agree, this looks like homework. We don't do homework, as a general rule. (And we're pretty good at detecting homework, but not always completely accurate.)

Anyhow, this is something you can both search for and test to verify without a whole lot of effort.
 
Even if this is homework - I'll explain it as it may benefit the community here too.

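For a simple word like this, both give the same results.
-w matches only whole words: the match must be preceded and followed by a non-word character, or by the start or end of the line. Word characters are letters, digits and the underscore.
\b matches any word boundary, so that could be a start or an end boundary.

So surrounding a word with \b boundaries, e.g. "\bword\b", is equivalent to the -w (whole word) option. The two can only differ when the pattern itself starts or ends with a non-word character, such as a hyphen.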
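
A quick way to check where -w and \b part ways is to use a pattern that begins with a non-word character, such as -end. This is only a sketch, assuming GNU grep; the temp file and variable names are made up for the test, and grep -E is used as the equivalent of egrep.

```shell
# Throwaway sample file (hypothetical, just for this check).
sample=$(mktemp)
printf '%s\n' 'lands-end' > "$sample"

# -w: the match "-end" is preceded by "s", a word character, so nothing is selected.
with_w=$(grep -E -w -e '-end' "$sample" || true)

# \b: there is a boundary between "s" and "-", and another after "d", so the line matches.
with_b=$(grep -E '\b-end\b' "$sample" || true)

printf '%s\n' "with -w: [$with_w]" "with boundaries: [$with_b]"
rm -f "$sample"
```

For ordinary words made of letters, digits and underscores, this difference never shows up.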

GNU grep also supports \<, which matches only at the start of a word, and \>, which matches only at the end of a word. That means you can search for words that begin or end with a particular pattern.

So if you consider a file with the following lines:
Code:
endanger
endearing
endurance
lands-end
legend
Pete Townsend
the end
weekend

Here are some egrep commands using the various word boundary options above, along with the output we'll see:

Using egrep with the -w (whole word) option:
Bash:
egrep -w "end" ./file
would output:
lands-end
the end
The hyphen - is not a word character, so there is a word boundary between it and end. The line lands-end is therefore matched, because it contains the whole word end.

Likewise, using the \b operators:
Bash:
egrep "\bend\b" ./file
would yield:
lands-end
the end
Exactly the same as the -w option.

Now let's take a look at the start/end boundary operators:
If we use both of them together, we'll get the same as the -w and \b operators:
Bash:
egrep "\<end\>" ./file
As expected, yields:
lands-end
the end
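
If you'd rather verify that than take my word for it, the three variants can be compared directly. A minimal sketch, assuming GNU grep (grep -E is the same as egrep); the temp file and variable names are just placeholders.

```shell
# Recreate the sample word list in a temp file.
words=$(mktemp)
printf '%s\n' endanger endearing endurance lands-end legend \
  'Pete Townsend' 'the end' weekend > "$words"

out_w=$(grep -E -w 'end' "$words")      # whole-word option
out_b=$(grep -E '\bend\b' "$words")     # generic boundaries
out_lr=$(grep -E '\<end\>' "$words")    # explicit start/end boundaries

# Compare the three results as plain strings.
if [ "$out_w" = "$out_b" ] && [ "$out_b" = "$out_lr" ]; then
  same=yes
else
  same=no
fi
printf '%s\n' "identical output: $same" "$out_w"
rm -f "$words"
```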

So what is the point of the boundary operators?
Well, if you use them carefully, you can build more powerful search patterns.
For example, if we use only the start boundary operator, we can find all lines containing words that start with "end":
Bash:
egrep "\<end" ./file
This yields the following:
endanger
endearing
endurance
lands-end
the end
So now we've got any lines containing words that start with "end".

Likewise, the end boundary operator:
Bash:
egrep "end\>" ./file
Would yield lines containing words that end with "end":
lands-end
legend
Pete Townsend
the end
weekend
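
To summarise how much each operator narrows the search, you can count matching lines with -c instead of printing them. Again just a sketch, assuming GNU grep for the boundary operators, with a throwaway copy of the word list.

```shell
words=$(mktemp)
printf '%s\n' endanger endearing endurance lands-end legend \
  'Pete Townsend' 'the end' weekend > "$words"

# -c prints the number of matching lines rather than the lines themselves.
n_any=$(grep -E -c 'end' "$words")        # "end" anywhere in the line
n_start=$(grep -E -c '\<end' "$words")    # words starting with "end"
n_end=$(grep -E -c 'end\>' "$words")      # words ending with "end"
n_whole=$(grep -E -c '\<end\>' "$words")  # the whole word "end"

printf '%s\n' "anywhere: $n_any" "word start: $n_start" \
  "word end: $n_end" "whole word: $n_whole"
rm -f "$words"
```

On the list above that gives 8, 5, 5 and 2 lines respectively.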

OK, now let's take things a bit further.
Consider this text file:
Code:
The endurance of olympic athletes.....
He finally reached Lands-End.
He was a legend.
The frontman of The Who was Pete Townsend.
Therein, he realised how he could end his problems.
The end......
..... of the weekend!

Now let's find all lines that have "the" at the start of a word, followed later on the same line by "end" at the end of a word, with any number of characters in between. We'll do this with a case-insensitive search:
Bash:
egrep -i "\<the.*end\b" ./file2

That will output the following lines:
The frontman of The Who was Pete Townsend.
Therein, he realised how he could end his problems.
The end......
..... of the weekend!

In that previous egrep I could have explicitly used the end-of-word-boundary operator \> at the end, but \b was slightly less effort to type! Ha ha!

Also if we only wanted to see the matching parts of the line instead of the whole line, we could have added the -o option.
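
Here's roughly what that looks like with -o added. A sketch assuming GNU grep; the temp file stands in for ./file2. Because .* is greedy, each match runs from the first "the" to the last word-final "end" on the line.

```shell
text=$(mktemp)
cat > "$text" <<'EOF'
The endurance of olympic athletes.....
He finally reached Lands-End.
He was a legend.
The frontman of The Who was Pete Townsend.
Therein, he realised how he could end his problems.
The end......
..... of the weekend!
EOF

# -o prints only the matched part of each line; -i ignores case.
matches=$(grep -E -i -o '\<the.*end\b' "$text")
printf '%s\n' "$matches"
rm -f "$text"
```

That prints:

```
The frontman of The Who was Pete Townsend
Therein, he realised how he could end
The end
the weekend
```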

Like I say, all of those boundary operator variants have their uses. And when combined with other extended regex operators, you can build extremely powerful regexes.

Sometimes you might want/need to be explicit in whether a boundary is a start-of-word, or end-of-word boundary, so you'd use \< and/or \>.
Other times, you might not care and just use \b.
And if you're just looking for lines containing a simple, whole word, you'd probably just use the -w option.
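
One portability note, as far as I'm aware: -w, \b, \< and \> are widely supported, but none of them is required by the POSIX grep specification. If you ever need to avoid them, a whole-word search can be approximated with bracket expressions and anchors. This is a sketch of that idea rather than an exact replacement, and the file and variable names are placeholders.

```shell
words=$(mktemp)
printf '%s\n' endanger endearing endurance lands-end legend \
  'Pete Townsend' 'the end' weekend > "$words"

# "end" must be preceded by the start of the line or a non-word character,
# and followed by a non-word character or the end of the line.
portable=$(grep -E '(^|[^[:alnum:]_])end([^[:alnum:]_]|$)' "$words")

# For comparison, the -w version from earlier in this post.
with_w=$(grep -E -w 'end' "$words")

printf '%s\n' "$portable"
rm -f "$words"
```

It selects the same lines as -w here, but with -o it would also print the neighbouring non-word character, so it's only a stand-in for line selection.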
 

JasKinasis

Thank you very much for this detailed explanation. You are really contributing to the community.

And to the members who commented that this is homework: no, it simply is not.
It came from personal experience and testing. I tested both commands and saw that they gave the same results, so I wanted to understand the philosophy and logic behind providing both the -w option and the boundary operators.
 
