BASH 03 – Command-line Processing

Jarret B

It is important to understand how command-line processing occurs. The procedure Bash uses to examine a command is not as straightforward as you might think.

Going over a command to see how it is processed can show you where the command can fail. It can help you determine why it failed and how to fix it.

The processor has seven steps that it goes through for each command you give it.

The Seven-Step Process

I broke some steps down into more steps, but let’s look at these individually.
  1. Reads command
  2. Tokenization
  3. Command Identification
  4. Command Expansion
  5. Quote Removal
  6. Redirections
  7. Command Execution
We need to look at the seven steps in more detail. Be sure you understand each step as you go along.

Step 1: Reads Command

The command is input from a file or string.

A file would be a script file, which is mostly made up of a set of individual commands or strings. A command string is a string typed in the terminal or a single line of a script.

Step 2: Tokenization

The command issued to Bash is a string of characters, which must be split into parts called tokens. Tokens are separated by ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, ‘>’, space, tab or newline. These separation characters are known as meta-characters. A token that contains no unquoted meta-characters is a ‘word’. A token made up of one or more unquoted meta-characters is an ‘operator’.

There are two types of operators: control and redirection.

Control operators are:

newline, |, ||, &, &&, ;, ;;, ;&, ;;&, |&, (, )

Redirection operators are:

<, >, <<, >>, <&, >|, <<-, <>, >&

NOTE: An operator is only treated as an operator if it is not in quotes.

Take the following command as an example:

echo $USER > output.txt

There are three words: ‘echo’, ‘$USER’ and ‘output.txt’. None of them contain an unquoted meta-character. There are four meta-characters: three spaces and ‘>’. There is one operator: ‘>’.
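
To see the NOTE about quotes in action, here is a minimal sketch. It assumes a scratch folder made with ‘mktemp -d’, and the variable name ‘quoted’ is only for illustration.

```shell
#!/usr/bin/env bash
# Work in a scratch folder so nothing in your own folders is touched.
cd "$(mktemp -d)" || exit 1

# Unquoted: '>' is an operator, so the word 'hello' goes into output.txt.
echo hello > output.txt

# Quoted: '>' is just another character inside one word, so nothing is redirected.
quoted=$(echo "hello > output.txt")

cat output.txt    # hello
echo "$quoted"    # hello > output.txt
```

The same ‘>’ character is an operator in the first command and plain text in the second.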

Step 3: Command Identification

There are two types of commands: simple and compound. A simple command comprises a single command with its arguments. Several simple commands can also be joined together on one line with control operators such as ‘|’ or ‘&&’.

Compound commands comprise programming constructs such as ‘if’ statements or loops. We will cover these types of statements in a later article.

The command is typically the first word in a string. The words after the command are usually its arguments. For example:

echo $USER

The command is ‘echo’ and the argument is ‘$USER’.
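
One rough way to peek at this step is the ‘type’ builtin, which the article has not used so far, so treat this as a side sketch. With the ‘-t’ option it prints a single word describing how Bash would treat a name. The variable names are only for illustration.

```shell
#!/usr/bin/env bash
# 'type -t' prints one word describing how Bash would interpret a name.
echo_kind=$(type -t echo)   # builtin: used as a simple command
if_kind=$(type -t if)       # keyword: starts a compound command
for_kind=$(type -t for)     # keyword: starts a compound command

echo "echo: $echo_kind, if: $if_kind, for: $for_kind"
```

Names reported as ‘keyword’ are the ones that begin compound commands.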

Step 4: Command Expansion

The Command Expansion is split up into four sections.

  1. Brace
  2. Parameter, Arithmetic, Command Substitution, Tilde
  3. Word Splitting
  4. Globbing
The items are done in order from 1 to 4. Because Brace expansion comes first, arithmetic placed inside braces does not work. For example, ‘echo {3..5+4}’ prints ‘{3..5+4}’. The braces do not hold a valid sequence, so Bash leaves them untouched.
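
The ordering can be checked with a short sketch. The variable names here are my own, and ‘n’ is just a sample value.

```shell
#!/usr/bin/env bash
n=5

valid=$(echo {3..5})          # 3 4 5      a proper brace sequence
invalid=$(echo {3..5+4})      # {3..5+4}   not a valid sequence, left as typed
too_late=$(echo {1..$n})      # {1..5}     braces are checked before $n is expanded
arithmetic=$(echo $((5+4)))   # 9          arithmetic needs $(( ))

printf '%s\n' "$valid" "$invalid" "$too_late" "$arithmetic"
```

The third line is the one that catches most people: by the time ‘$n’ becomes ‘5’, Brace expansion has already finished.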

Let’s look at examples of these four items.

We covered the first two sections in a previous article, except for tilde expansion. In Linux, the tilde (~) is replaced with the full path to the Home Folder of the current user.

Word Splitting

Word Splitting takes the result of an expansion and splits it into separate arguments, so a single variable can supply a whole list of arguments to a command. For example, let’s say we had a variable named ‘numberset’ and set its value to “1 2 3 4 5 6 7 8” with the command:

numberset="1 2 3 4 5 6 7 8"

Be aware that the numbers are all separated by spaces, the quotes are necessary, and there must be no spaces around the equals sign. We can create eight folders with the names 1, 2, 3, 4, 5, 6, 7 and 8. The command to make the folders is:

mkdir $numberset

Each number between the spaces is treated as a separate argument. After performing the command, execute ‘ls’ and see that the eight folders have been created. To easily remove the folders, use the command:

rmdir $numberset
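
If you want to see the splitting without creating any folders, here is a small sketch. The function ‘count_args’ is a hypothetical helper of mine that only reports how many arguments it was given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

numberset="1 2 3 4 5 6 7 8"

unquoted=$(count_args $numberset)     # 8  split into eight arguments
quoted=$(count_args "$numberset")     # 1  quotes stop the splitting

echo "unquoted: $unquoted, quoted: $quoted"
```

Putting the variable in double quotes turns Word Splitting off, which is why ‘mkdir "$numberset"’ would try to make one folder with spaces in its name.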

Word Splitting also works with letters. Try:

months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"

You can create folders or files with this technique:

mkdir $months #Folders
touch $months #Files


To remove the folders and files, use:

rmdir $months #Folders
rm $months #Files


The space, tab and newline are used to separate values when word splitting.
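
These three characters are the default contents of the ‘IFS’ variable. As a quick sketch, the value here mixes all three separators. The variable names are only for illustration.

```shell
#!/usr/bin/env bash
# $'...' lets us write a tab (\t) and a newline (\n) inside the value.
mixed=$'Jan\tFeb\nMar Apr'

# Unquoted on purpose: the split words become $1, $2, $3 and $4.
set -- $mixed

count=$#
first=$1
last=$4

echo "$count words, from $first to $last"    # 4 words, from Jan to Apr
```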

Globbing

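Globbing is the ability to perform a command on multiple files by matching their names against a pattern.

There are three special characters or character sets you need to know:

  • * matches everything, any number of characters
  • ? matches everything, but for a single character only
  • [] matches one of the specified characters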

Let’s look at the previous example of Word Splitting. Perform the following commands:

months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
touch $months


Open the file ‘Apr’ and add the line ‘April’. Edit the file ‘Aug’ and put the word ‘August’ in it. Choose one other file and place the month's name into the file. Now, execute the command:

cat A*

The result should be two lines. They should be ‘April’ and ‘August’. The command is specifying to print the contents of the files starting with ‘A’ (and it is case-sensitive).

If you edit the files ‘Jan’, ‘Jun’ and ‘Jul’, placing the full month name in each file, we can test more Globbing characters.

If you use the command ‘cat J?’ you get an error. The question mark (?) represents exactly one character. All the file names contain three characters, so there isn’t a match. Now try ‘cat J??’ and you should get three responses: ‘January’, ‘July’ and ‘June’. The next command to try is ‘cat Ju?’. The response should only be ‘July’ and ‘June’ since the file names start with ‘Ju’ and comprise three letters. If we made a file named ‘Ju’ it would not be listed since it only has two letters and not three.

The last item is the brackets. Any one of the characters placed in the brackets can match at that position. For example:

cat [AJ]*

The command would return the contents of any file that started with an ‘A’ or ‘J’ and had any characters after them. You can use multiple brackets, such as:

cat [AJ][pu]*

The result would be ‘April’, ‘August’, ‘July’ and ‘June’. The file names all start with either ‘A’ or ‘J’ and have a second letter of either ‘p’ or ‘u’.

You can use as many bracket sets as you wish. You are not limited to two, as in the example.
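
Here is a sketch that runs the patterns above in a scratch folder. I use ‘echo’ instead of ‘cat’ so that it shows the matching file names rather than their contents. The variable names are my own.

```shell
#!/usr/bin/env bash
# Work in a scratch folder so nothing in your own folders is touched.
cd "$(mktemp -d)" || exit 1

months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
touch $months                  # unquoted on purpose: twelve empty files

star=$(echo A*)                # Apr Aug
one_char=$(echo J?)            # J?   no match, so the pattern is left as typed
two_chars=$(echo J??)          # Jan Jul Jun
ju=$(echo Ju?)                 # Jul Jun
sets=$(echo [AJ][pu]*)         # Apr Aug Jul Jun

printf '%s\n' "$star" "$one_char" "$two_chars" "$ju" "$sets"
```

The unmatched ‘J?’ explains the error from ‘cat J?’: with the default settings, Bash passes the pattern itself to the command, and there is no file with that name.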

You can use any valid command with Globbing or any other examples I give. I am sticking to simple commands that are easy to follow. These commands can perform any task you wish and make your scripts perform an infinite number of tasks. The sky is the limit. We will build on these things and hopefully create some scripts at the end of the Bash series that will be useful to you.

Step 5: Quote Removal

In this step, any quotes that are not part of an expansion are removed.

There are three types of quotes that can be removed:

  1. Single Quote (‘)
  2. Double Quote (“)
  3. Backslash (\)

Single Quotes will remove all the meaning of the special characters between the quotes. For example, the command ‘echo '$HOME'’ will result in ‘$HOME’. The single quotes remove the meaning of the dollar sign ($) so expansion is not performed.

Double Quotes will remove all special meaning of the string, except for dollar signs ($), back-ticks (`) and, in some cases, backslashes (\).

NOTE: Back-ticks are used to enclose a command that is executed.

We use backslashes to ‘escape’ a special character so that it is treated as plain text. For example, to print a dollar sign, you must ‘escape’ it. The command would be ‘echo You made \$15.00’. If you removed the backslash, Bash would expand ‘$1’ as a parameter. It is normally empty, so the dollar sign and the ‘1’ would be removed from the output. If you want a backslash to print out, you need to escape it as well.
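
The three kinds of quoting can be compared side by side. This is only a sketch: the variable names are mine, and ‘set --’ is there to make sure ‘$1’ is empty when the script runs.

```shell
#!/usr/bin/env bash
set --    # clear the positional parameters so $1 is empty

single=$(echo '$HOME')               # $HOME             no expansion at all
double=$(echo "$HOME")               # your Home Folder  $ still works
escaped=$(echo You made \$15.00)     # You made $15.00
unescaped=$(echo You made $15.00)    # You made 5.00     $1 expanded to nothing
backslash=$(echo One backslash: \\)  # One backslash: \

printf '%s\n' "$single" "$double" "$escaped" "$unescaped" "$backslash"
```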

Step 6: Redirection

If you remember from Step 2, the operators were separated out from the words. The redirection operators are now dealt with in this step.

Most of this might seem familiar to those who have used Linux, but I’ll cover some of the various redirection operators.

To start, there are three streams of data:

  1. Standard Input (stdin) (0<) – input method to receive data (keyboard, files, etc)
  2. Standard Output (stdout) (1>) – output method to place output data for use (screen, file, etc)
  3. Standard Error (stderr) (2>) – output method to place error messages (screen, file, etc)
An example of an input redirector is the less than symbol (<). Let’s say we have a file named ‘test.txt’ that contains a few lines of text.

We can use the ‘cat’ command to show the content of the file with ‘cat test.txt’. We can also redirect the file into the command with ‘cat < test.txt’, which gives the same result.

We can place the output of a command into a file. Placing the output into a file allows us to view it or use it as input for another command. If the data is redirected to a file, the data will not appear on the screen as normal. Let’s look at saving the help text of a command to use like a man page. Let’s look at the command ‘wc’. We can perform the command ‘wc --help > man-wc.txt’. You can now look at the output using an input redirector: ‘cat < man-wc.txt’.

If you want to perform multiple commands and place the output into a single file, you need to append the data using the double greater-than symbols (>>). A single greater than symbol (>) overwrites the file.
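
The difference between the two symbols is easy to check in a scratch folder. This is a minimal sketch, and the file and variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Work in a scratch folder so nothing in your own folders is touched.
cd "$(mktemp -d)" || exit 1

echo first  > log.txt          # creates log.txt
echo second > log.txt          # overwrites it: 'first' is gone
overwritten=$(cat < log.txt)   # second

echo third >> log.txt          # appends to the end of the file
appended=$(cat < log.txt)      # second, then third on the next line

echo "$appended"
```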

Error messages can also be sent somewhere other than the screen. To do this, the redirector is the number 2 with a greater than symbol (2>). If we were to issue the command ‘echohello’, you can see that there is a space missing between ‘echo’ and ‘hello’. An error is generated similar to ‘echohello: command not found’. We can issue the command ‘echohello 2> error.txt’. No error will appear on the screen; instead it is placed into the file ‘error.txt’.

NOTE: You can see that the stderr redirector is ‘2>’. The numbered form can be used for the other standard streams as well: stdin (0<) and stdout (1>).

You are also able to send all output, both Standard Output and Standard Error, to a single place by using the redirector (&>). If you want to use the redirector for multiple commands and have the output appended, then use ‘&>>’.
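
Here is a sketch of the error redirectors using the same ‘echohello’ mistake. The file names are my own, and the ‘|| true’ at the end of two lines is only there so the script keeps going if it is run with ‘set -e’.

```shell
#!/usr/bin/env bash
# Work in a scratch folder so nothing in your own folders is touched.
cd "$(mktemp -d)" || exit 1

# 'echohello' does not exist, so Bash writes an error to Standard Error.
echohello 2> error.txt || true

# Standard Output and Standard Error of both commands go to one file.
{ echo hello; echohello; } &> all.txt || true

# '&>>' appends instead of overwriting.
echo again &>> all.txt

cat all.txt
```

After it runs, ‘error.txt’ holds only the error message, while ‘all.txt’ holds ‘hello’, the error message and ‘again’.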

NOTE: Multiple redirectors can be used in one command.

Step 7: Command Execution

At this point, the command is executed. What runs is the string that remains after all the previous steps have been completed.

The output is then shown or redirected.

If you want to see the actual command as it would appear in Step 7, you can enable tracing with the command ‘set -x’. Tracing can be disabled with the command ‘set +x’. With tracing on, Bash prints each expanded command just before it is executed. After the expanded command is displayed, the output is printed.

If the output is not what you expect, you can go through the steps, one at a time, and hopefully find the fault.
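
As a small sketch of tracing, the trace lines can be collected in a file, since Bash writes them to Standard Error. The variable ‘name’ and the file ‘trace.txt’ are only for illustration.

```shell
#!/usr/bin/env bash
# Work in a scratch folder so nothing in your own folders is touched.
cd "$(mktemp -d)" || exit 1

name="world"

# Trace lines go to Standard Error, so '2>' collects them in a file.
{
  set -x
  echo "hello $name" > /dev/null
  set +x
} 2> trace.txt

cat trace.txt
```

The file should contain a line showing ‘echo 'hello world'’, which is the command after the variable was expanded and the double quotes were removed.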

Example Command

Create a file in your Home Folder named ‘test.txt’ and place some text into the file so you easily know when the file has been displayed. Change folders to somewhere other than your Home Folder.

From this new folder, issue the command ‘cat "~/test.txt"’. You should get back an error similar to ‘cat: ~/test.txt: No such file or directory’. You may expect the contents of the ‘test.txt’ file, but it didn’t happen.

NOTE: The command will work with just ‘cat ~/test.txt’, but we need to see how the command processing works.

You can turn on tracing with ‘set -x’ and reissue the command.

The result of the processing steps leaves the command ‘+ cat '~/test.txt'’. Here, you can see there was no tilde expansion performed in Step 4. If you look over Step 5, we look at how Bash manages quotes. The thing about Step 4 is that anything inside double quotes is not expanded unless it starts with a dollar sign ($) or a back-tick (`). In this case, the tilde is ignored in single or double quotes.

To fix the command, we need to remove the quotes around the tilde and the forward slash (/).

Let’s try the command as ‘cat ~/"test.txt"’. The tilde is not in quotes so it should be expanded.

The command should now work properly.
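
You can compare the two forms without needing the file at all. This sketch uses ‘echo’ in place of ‘cat’ so that it only shows what each argument becomes. The variable names are mine.

```shell
#!/usr/bin/env bash
quoted=$(echo "~/test.txt")     # ~/test.txt   the tilde is in quotes, so no expansion
partial=$(echo ~/"test.txt")    # the path to your Home Folder followed by /test.txt

printf '%s\n' "$quoted" "$partial"
```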

Conclusion

I hope you learned something from this article and realized that Bash does not simply take your command and execute it. There are a lot of things occurring in the background after a command is issued and before it is executed.

Think back about any commands you issued that resulted in an error and you could not determine a cause. Now, you might better understand and determine the cause of the error.
 
