A brief introduction to the Unix command-line¶
Introduction¶
The purpose of this introductory guide is to provide a gentle explanation of concepts fundamental to using the Unix command-line, in a manner that will hopefully be accessible to beginners. It relies heavily on small examples to illustrate these ideas with hopefully relevant situations. It does not provide a comprehensive description of everything that can be done with the Unix command-line, and does not even mention most of the many useful commands available on any Unix system. The hope is that by explaining the concepts behind how things actually work, readers will quickly be able to figure out the rest by themselves, and find out anything else they might need from the enormous amount of excellent documentation available online. It’s only a Google search away – but hopefully this document will help you to figure out what to ask in the first place.
What is the command-line?¶
The command line is simply a way of passing instructions to the computer. It is a text-only interface, and commands are supplied to the shell by typing them in and pressing the return key. Any output produced by these commands is printed out on the same text interface. Although daunting at first, the command line provides a very rich interface to the available functionality, and it is worth learning how to use it.
The command line interface will print out a prompt, after which commands can be typed. After the command has completed, the prompt will be printed out again, indicating that the shell is ready to accept new commands.
An important point to note is that Unix is case-sensitive: a file called Data is not the same as a file called data.
The command line prompt can vary from system to system, but usually consists of
some text followed by the $
character. In many systems, it might look like
this:
username@machine:path $
where username
corresponds to your own username, machine
corresponds to
the name of the system you are currently logged onto, and path
indicates your
current working directory. Note that this can be configured differently, so do not be surprised or alarmed if it looks different on your system.
Warning
If the prompt ends with a #
, this most likely means that you are logged
in as root (the super-user or Administrator). If this is the case, please log
out before doing anything irreversible - unless you know what you are doing.
Note
A session will typically involve two separate programs:
- the shell: the program that actually interprets the commands that are typed in, and takes the appropriate action. There are a number of different shells available, each with their own syntax rules. The default shell in most Linux distributions is the Bourne Again Shell (bash). This is also the default on macOS and MSYS2 (used in Windows installations), and for this reason is the default assumed in this guide.
- the terminal: the program responsible for displaying the output of the shell, and what the user types in. There are a number of different terminals available, each with different levels of sophistication.
This leads to a bewildering array of combinations between the different shells and terminal programs available. Nonetheless, the general principles are fairly universal.
How to access the command line¶
There are many ways to start a command line session, depending on your OS and desktop environment.
GNU/Linux
Different Linux distributions include different desktop environments. For instance, Ubuntu installations will typically be running GNOME (older releases shipped with Unity), Red Hat and derivatives will typically be running GNOME, while SuSE would typically come with KDE.
Given the many different possible configurations, it is impossible to give specific instructions for each case. Nonetheless, in general you ought to be able to identify an application called ‘Terminal’ or similar in the desktop’s main menu.
macOS
You will find the ‘Terminal’ application within the Utilities folder in the Applications folder in the Finder.
Windows (MSYS2)
You will need to use the MSYS2 MinGW-w64 Win64 shell, either by double-clicking on the shortcut (on the Desktop or in the Start menu), or by searching for it (hit the Windows key and start typing ‘MSYS2’ – it should show up in the list of applications displayed).
Basic Structure of a command¶
In the simplest form, a command consists of a line of text, which is first broken up by splitting it using spaces as boundaries. The first ‘word’ must be the command name, corresponding to a real program to be executed. All other ‘words’ will eventually be passed to the program as arguments that should provide it with enough information to perform what is intended.
Command line arguments¶
To perform whatever action is required, the program that is being run may need some additional information. This information can be supplied to the program via the arguments that follow it in the command. Essentially, arguments are a series of ‘words’ specified in the right order so that the program can understand what is required of it.
For example, you need to supply two arguments to copy a file: the original file name, and the name of the file to be created. The program to do this, cp, expects the first argument to be the name of the file to be copied, and the second argument to be the name of the duplicate file to be created. If it does not find two arguments, it will produce an error message.
Note that some programs may accept a number of arguments, but use default values if they are omitted (examples of these are cd and ls). Other programs may accept variable numbers of arguments, and process each argument in turn.
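For instance, cp invoked with a single argument will complain, whereas ls with no arguments simply falls back on its default and lists the current working directory. A small illustration (report.txt is a hypothetical file, and the exact error wording varies between systems):
$ cp report.txt
cp: missing destination file operand after 'report.txt'
$ ls
notes.txt  report.txt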
Command line options¶
There is a special type of argument that you might encounter, often referred to
as a command line option or switch. The purpose of these optional arguments is
to modify the behaviour of the program in some way. Command line options always
start with a minus symbol to distinguish them from normal arguments. For
example, passing the appropriate option (-l
) to the ls command
when listing the files in the current folder will produce a longer listing,
including information such as file size and modification time as well as the
file names normally output.
Command line options can also require additional arguments. In this case, these additional arguments should be entered immediately after the option itself – see the examples below.
Examples¶
Below are some typical command examples (the $ symbol indicates the prompt):
To list the contents of the current working directory:
$ ls
To list the contents of the current working directory, along with the file permissions, owner, size and modification date:
$ ls -l
To copy the file source, creating the file dest:
$ cp source dest
To convert image source.mif (MRtrix format) into image dest.nii (NIfTI format):
$ mrconvert source.mif dest.nii
To convert image source.mif into image dest.nii, changing the voxel size to 1.25 x 1 x 1 mm and changing the datatype to 32-bit floating-point:
$ mrconvert source.mif -vox 1.25,1,1 -datatype float32 dest.nii
Dealing with spaces in arguments¶
As previously mentioned, the command actually typed in will first be split up into tokens using spaces as delimiters. In certain cases, it may be necessary to provide arguments that contain spaces within them. A common example of this is when file names contain spaces (note that this should be avoided, especially since other programs and scripts often have issues dealing with these). This is obviously a problem, since an argument with a space in it will be interpreted as two separate arguments. To supply an argument with a space in it, use the following syntax.
As an example, if we need to supply the argument “argument with spaces” to some command, we can use any of the following:
argument\ with\ spaces
'argument with spaces'
"argument with spaces"
In the first example, the backslash character tells the shell to ignore the subsequent space character and treat it as a normal character.
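For example, assuming a (hypothetical) file named my data.txt exists in the current directory, each of the following refers to it as a single argument, rather than as two separate files:
$ ls my\ data.txt
$ ls 'my data.txt'
$ ls "my data.txt"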
Escaping special characters¶
We have already seen that spaces are treated differently from other characters
and need to be encapsulated by quotes ', "
or escaped by a preceding \,
to prevent them being interpreted by the shell as token delimiters. You will
most likely also encounter other special characters such as
!#$^&*?[](){}<>~;|
in more advanced usages; these come in
handy for instance for processing multiple files using
wildcard characters.
One can influence the way the shell interprets these special characters by
quoting and escaping the input. For instance, the string 'argument with
spaces'
uses single-quotes (strong quoting): this means that everything
between the two '
symbols is treated as literal characters without
special meaning. In the command below, the series of special characters is treated as a simple string and printed to the terminal via the command echo.
$ echo look how ordinary these characters are: '!#$^&*?[](){}<>~;|\'
Unless encapsulated in single quotes, individual special characters can also be marked to lose their special meaning using the backslash. For instance, \'argument with spaces\' would expand to three arguments: 'argument, with, and spaces'. The only exception to this rule is the newline character: escaping a newline with a backslash does not produce a literal newline, but instead allows commands to span across multiple lines:
$ echo look how ordinary these characters are: '!#$^&*?[](){}<>~;|\' \
and \'
Double quotes " are used for weak quoting, which escapes all characters except for \, $ and the double quote itself. This disables some of the shell's interpretations (spaces, single-quotes, pattern matching, pathname expansions) while others remain active (such as parameter expansion using $).
$ ls -l "$HOME/folder with spaces"
Note that special characters' meaning can be shell- and context-dependent. For example, in the Bourne Again Shell (bash), the string filename[].mif is not interpreted, but in the Z shell (zsh, the default shell for new user accounts since macOS version 10.15), the opening [ needs to be quoted ("filename[].mif") or escaped using a backslash (filename\[].mif). For more information, consult your shell's man page or this overview post.
Specifying filenames: paths¶
Filenames are often supplied to programs as arguments. For this reason, it is essential to have a good understanding of how files are specified on the command line. In Unix, a path is a term commonly used almost interchangeably with filename, for reasons that will hopefully become clear in this section.
Files and folders¶
Files and folders are stored on computers using a folder or directory
structure. For example, on a Windows computer, you might find a folder called
MyDocuments
, within which there might be a data
folder, and within that
some more folders, etc. Specifying a file or folder is simply a matter of
providing enough information to uniquely identify it.
The easiest way to visualise the directory structure is to think of it as a
tree. If you listed the contents of the root folder (the root of the tree),
you would find a number of other folders (the main branches). For example, one
of these folders would be the home
folder, where user accounts are kept.
These folders might contain more folders (smaller branches) and/or files
(leaves), as illustrated below:
/
├── bin/
├── boot/
├── home/
│ └── donald/
│ ├── data/
│ │ ├── pilot/
│ │ │ ├── subject1/
│ │ │ │ └── dwi.mif
│ │ │ ├── subject2/
│ │ │ │ └── dwi.mif
│ │ │ └── subject3/
│ │ │ └── dwi.mif
│ │ └── project/
│ │ ├── analysis_script.sh
│ │ ├── control1/
│ │ │ ├── anat.nii
│ │ │ └── dwi.mif
│ │ ├── control2/
│ │ │ ├── anat.nii
│ │ │ └── dwi.mif
│ │ ├── control3/
│ │ │ ├── anat.nii
│ │ │ └── dwi.mif
│ │ ├── patient1/
│ │ │ ├── anat.nii
│ │ │ └── dwi.mif
│ │ └── patient2/
│ │ ├── anat.nii
│ │ └── dwi.mif
│ ├── Desktop/
│ └── Documents/
├── usr/
└── var/
Here, the folder called project
can be uniquely identified by starting from
the root folder, going into home
, then donald
, data
, and finally
project
. This process can be thought of as specifying the path to the
file or folder of interest. In fact, this is the exact term used in Unix
jargon, essentially meaning ‘an unambiguous file name’. Thus, specifying a
filename boils down to providing a unique, unambiguous path to the file.
Note
In this context, directory and folder are synonymous.
Absolute paths¶
On Unix, the root of the tree is always referred to using a simple forward
slash. Folders are referred to using their names, and are delimited using a
forward slash. For example, the full, absolute path to the project
folder
in the figure above is:
/home/donald/data/project/
This simply means: starting from the root folder (/
), go into folder
home
, then donald
, then data
, to find project
. This is an
example of an absolute path, because the start point of the path (the root
folder) has been specified within the filename. Thus, an absolute path must
start with a forward slash – if it does not, it becomes a relative path,
explained below.
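For example, using the directory tree shown above, the contents of the project folder can be listed from anywhere on the system by supplying its absolute path:
$ ls /home/donald/data/project
analysis_script.sh  control1  control2  control3  patient1  patient2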
The working directory¶
When using the command line, you will often find that many of the files you are manipulating reside in the same folder. It therefore makes sense to specify this folder as your working directory, and then simply refer to each file by its name, rather than its absolute path. This is exactly what the working directory is, and it can save you a lot of unnecessary typing.
You can also think of it as your current location on the directory tree. For
example, if your current working directory is /home/donald/data/project
,
you can imagine that you have climbed up branch home
, then up branch
donald
, then up branch data
, then up branch project
, and since
you’re sitting there, you have direct access to all the files and folders that
spring from that branch.
Your working directory can be specified with the command cd, and queried using the command pwd (both described in Basic commands).
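For example, using the directory tree shown earlier:
$ cd /home/donald/data/project
$ pwd
/home/donald/data/project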
Relative paths¶
Relative paths are so called because their starting point is the current working directory, rather than the root folder. Thus, they are relative to the current working directory, and only make sense if the working directory is also known.
For example, the working directory might currently be
/home/donald/data/project/
. In this folder there may be a number of other
files and folders. Since the file analysis_script.sh
is in the current
working directory, it can be referred to unambiguously using the relative path
analysis_script.sh
, rather than its full absolute path
/home/donald/data/project/analysis_script.sh
– that’s a lot less typing.
When you specify a relative path, it will actually be converted to an absolute
path, simply by taking the current working directory (an absolute path),
appending a forward slash, and appending the relative path you supplied after
that. For example, if you supply the relative path analysis_script.sh
, the
system will (internally) add up the current working directory
/home/donald/data/project
+ /
+ analysis_script.sh
to give the
absolute path.
Since the system simply adds the relative path to the working directory, you
can see that files and folders further along the directory tree can also be
accessed easily. For example, the project
folder contains other folders,
patient1
, patient2
, etc. The file anat.nii
within one of these
folders can be specified using the relative path patient1/anat.nii
(assuming your current working directory is /home/donald/data/project
).
Of course, if you changed your current working directory, the relative path
would need to change accordingly. Using the same example as previously, if
/home/donald/data/project/patient1
was now your current working directory,
you could use the simpler relative path anat.nii
to refer to the same file.
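To illustrate, the same file can be reached via different relative paths depending on the working directory (again using the directory tree shown earlier):
$ cd /home/donald/data/project
$ ls patient1/anat.nii
patient1/anat.nii
$ cd patient1
$ ls anat.nii
anat.nii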
Special filenames¶
A few shortcuts have special significance, and you should learn to use them, or at least know of them. These are:
- ~ (tilde): shorthand for your home folder. For example, I could refer to the project folder as ~/data/project, since my home folder is /home/donald.
- . (single full stop): the current directory. For example, if my current working directory is /home/donald, I can refer to the project folder by specifying ./data/project, or even data/./project. Although this may not look very useful, there are occasions when it becomes important (see examples below).
- .. (double full stop): the parent folder of the current directory. For example, if my current working directory is /home/donald/Desktop, I can still refer to the data folder using the relative path ../data. This shortcut essentially means “drop the previous folder name from the path”, or “go back down to the previous branch”. Here are some alternative, less useful ways of referring to that same data folder, just to illustrate the idea:
../../donald/data
../Documents/../data
~/Desktop/../data
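A brief illustration of these shortcuts in combination, assuming the home folder is /home/donald as in the directory tree shown earlier:
$ cd ~/data/project    # same as: cd /home/donald/data/project
$ cd ..                # back down one branch
$ pwd
/home/donald/data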
Using wildcards¶
There are a number of characters that have special meaning to the shell. Some
of these characters are referred to as wildcards, and their purpose is to ask
the shell to find all filenames that match the wildcard, and expand them on the
command line. Although there are a number of wildcards, the only one that will
be detailed here is the *
character.
The * character essentially means ‘any number of any characters’. When the
shell encounters this character in an argument, it will look for any files that
match that pattern, and append them one after the other where the original
pattern used to be. This can be better understood using some examples.
Imagine that within the current working directory, we have the files
file1.txt
, file2.txt
, file3.txt
, info.txt
, image1.dat
, and
image2.dat
. If we simply list the files (using the ls command),
we would see:
$ ls
file1.txt file2.txt file3.txt
image1.dat image2.dat info.txt
If we only wanted to list the text files, we could use a wildcard, and specify
that we are only interested in files that end with .txt
:
$ ls *.txt
file1.txt file2.txt file3.txt info.txt
We might only be interested in those text files that start with file
. In
this case, we could type:
$ ls file*.txt
file1.txt file2.txt file3.txt
This use of wildcards becomes very useful when dealing with folders containing large numbers of similar files, and only a subgroup of them is of interest.
Note
It will be important later on to understand exactly what is going on here. Typing a command such as:
$ ls *.txt
does not instruct the ls command to find all files that match the wildcard. The wildcard matching is actually performed by the shell, before the ls command is itself invoked. What this means is that the shell takes the command you typed, modifies it by expanding the arguments, and invokes the corresponding command on your behalf. In the case above, this means that the command actually invoked will be:
$ ls file1.txt file2.txt file3.txt info.txt
In other words, your single argument containing a wildcard is expanded into multiple matching arguments by the shell.
As another example, a command like:
$ cp *.dat
will be expanded to:
$ cp image1.dat image2.dat
which will cause image2.dat to be overwritten with the contents of image1.dat – presumably causing irretrievable loss of data. In other words: think carefully about what you’re typing…
Basic commands¶
Below is a list of commands that are very commonly used. Of these, cd and ls are essential to using the command line, and you should be familiar with them.
Each command described below has a syntax line, which gives a brief description of how it should be used. The first item on the line is always the name of the command, followed by a number of arguments. Arguments given in square brackets are optional and can be omitted. If an argument is followed by three dots (‘…’), it means that any number of that type of argument can be provided, and each will be processed in turn. In addition, any options that are of particular interest are listed in the corresponding section for each command.
cd
: change working directory¶
$ cd [folder]
Change the current working directory. If no folder is specified, the current
working directory will be set to the home folder. Otherwise, it will be set to
folder
, assuming that it is a valid path.
Note that in this command, the token -
has special meaning when passed
instead of folder
: it refers to the previous working directory. This can be
useful to rapidly change back and forth between folders, e.g.:
# our current working directory is /home/donald/Documents
$ cd ../data/project/patient2
... # current working directory is now /home/donald/data/project/patient2
... # do something useful
$ cd - # switch back to /home/donald/Documents when you're done
ls
: list files¶
$ ls [option]... [target]...
List files. If no target is specified, the contents of the current working directory are listed. Otherwise, the files specified are listed in the order provided, assuming that they are valid paths. If a target is a folder, its contents will be listed.
Options:
-a
list all files, including hidden files (hidden files are those that start with a full stop ‘.’).
-l
provide a long listing, including file size, ownership, permissions, and modification date.
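For example, a long listing of the project folder from the earlier directory tree might look something like this (the sizes, dates and exact column layout will differ on your system):
$ ls -l
-rwxr-xr-x 1 donald users 1204 Jan 10 09:15 analysis_script.sh
drwxr-xr-x 2 donald users 4096 Jan 10 09:20 control1
drwxr-xr-x 2 donald users 4096 Jan 10 09:20 control2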
cp
: copy files or folders¶
$ cp [option]... source target
$ cp [option]... source... folder
In its first form, copy the file source
to create the file target
, assuming
both are valid paths. You should be aware that in general, if the file
target
already exists, it will be overwritten (although this behaviour can
be modified through an alias).
The second form is used to copy one or more source files into the target folder. In this case, target must be a pre-existing folder. Each newly created file will have the same base name as the original.
Options:
-i
ask for confirmation before overwriting any files.
-r
recursive copy: use this option to copy an entire folder and its contents.
mv
: move/rename files or folders¶
$ mv [option]... source target
$ mv [option]... source... folder
In its first form, move or rename the file (or folder) source
to target
,
assuming both are valid paths. Note that renaming is essentially equivalent to
moving the file to a different location, if source
and target
reside in
different folders.
The second form is used to move one or more source
files into the
target
folder. In this case, target
must be a pre-existing folder.
Options:
-i
ask for confirmation before overwriting any files.
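For example (using hypothetical file names):
$ mv draft.txt report.txt            # rename a file within the current folder
$ mv report.txt ~/data/project/      # move it into another folder, keeping its name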
echo
: output strings to the terminal¶
$ echo [option]... [string]...
The echo command writes any specified arguments, separated by single space characters and followed by a newline (\n) character, to the standard output.
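For example, note that the extra spaces typed between the arguments are not preserved, since the shell splits the input into separate arguments and echo then prints them separated by single spaces:
$ echo hello     world
hello world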
Examples of typical command use¶
Below are some examples of commands in typical use, illustrating some of the concepts explained in this document. To fully understand the examples, you may need to refer back to the sections on Specifying filenames: paths, using special filenames, or using wildcards.
To change your current working directory to its parent folder (move one branch down the directory tree):
$ cd ..
To change your current working directory from whatever it was to the data folder in your home directory:
$ cd ~/data
To list all images (with the .png suffix) whose filenames start with ns from the controls folder:
$ ls controls/ns*.png
To move the file data.mat, residing in the current working directory, into the parent folder of that directory:
$ mv data.mat ..
To copy the file info.txt from the folder important into the current working directory:
$ cp important/info.txt .
To copy all shell script files from the data folder in your home directory into the scripts folder in the current working directory:
$ cp ~/data/*.sh scripts/
To copy all images for study 3 of patient Joe Bloggs from the /data folder into the current working directory (assuming the images are named according to the convention ID-study-slice.ima):
$ cp /data/bloggsj_010203_123/*-3-*.ima .
Useful tips¶
The commands that you might type in can get quite long, and using the command line can quickly become laborious. You are strongly encouraged to read the tips below, which can not only save you a lot of unnecessary typing, but also help you avoid making mistakes and typos. Some of these features may not be immediately obvious, so you are advised to spend a little time familiarising yourself with them.
Previous command history - the up arrow key¶
You will often find yourself typing a whole series of similar commands that differ only by a few characters each time. It would be tedious if you had to type each one of them in individually. Similarly, you may find that the long command you just typed in has failed because of a single little typo. To save yourself the time and the hassle, make use of the command history.
The shell remembers all the commands that you recently typed in. At any time, you can set the text of your command to any of these previous commands, as if you had just typed it in. It is then trivial to go back and edit those sections that need to be amended, using the left and right arrow keys, etc.
To access previous commands, simply press the up arrow. This will give you the last command you typed in. Pressing it again will produce the command before that, and so on. You can also press the down arrow key to find more recent commands if you have gone too far back in the command history.
Command completion - the tab key¶
You will often need to supply filenames as arguments to commands. Some of these may be long and hard to remember, for example, if they contain patient IDs and dates. However, the computer knows everything about which files are in which folder, and can therefore help the user to type in the right filename. You will find that often, it is only necessary to type in the first few letters of the file and the shell will fill in the rest for you.
You can ask the computer to attempt to complete the filename by pressing the
TAB
key. At this point, the shell will look at the fragment that you have
already typed in, and compare it to the list of files in the corresponding
folder. For example, if you type:
$ ls /data/3T_scanner/b
and press the TAB
key, the shell will look at the contents of the
/data/3T_scanner
folder, and see if any of these files or folders begin
with a b
. At this point, one of three things can happen:
- there is only one file that begins with the fragment you supplied. In this case, the shell will complete the filename for you. Using the example above, if the only file or folder in /data/3T_scanner that started with a b was bloggsj_010203_123, then the command would be changed to:
$ ls /data/3T_scanner/bloggsj_010203_123/
- there is more than one file that begins with the fragment you supplied. In this case, the shell will emit a beep or some other feedback (unless configured not to) to notify the user. Pressing the TAB key a second time will cause the shell to output a list of all the potential candidates, so that the user can select the right one. You should then just type in more of the filename and press the TAB key again once you think that fragment is unambiguous. Using the example above, there might have been another folder in /data/3T_scanner that started with a b, say brownj_030201_789. In this case, the first TAB key press would have produced a beep, and the second would have produced a list like the following:
$ ls /data/3T_scanner/b
bloggsj_010203_123 brownj_030201_789
You then simply need to type in an extra l and press TAB again for the shell to change the command as above.
- there are no files that start with the fragment you supplied. In this case, the shell will also emit a beep to notify the user. This time, pressing the TAB key again will only cause repeated beeping. You should amend what you have typed in, and make sure that it is correct up to that point.
What’s going on under the hood?¶
While the description given so far is sufficient to get started, you will rapidly encounter situations where things don’t work as you might expect. In these cases, it really helps to understand how what you type translates into action, so that you can usefully reason about what’s going on. The main thing to get to grips with is what the various components are and what their specific roles are.
The terminal¶
The terminal’s role is purely one of user input and output: it displays the output produced by the shell or the command currently executing within it, and receives input from the user via the keyboard (and potentially the mouse too) to relay it to the shell or the command currently executing. In both cases, the input and output consist merely of a stream of bytes. Some of these characters correspond to printable characters, others to non-printable characters (e.g. carriage return, newline, …), and yet others to sequences that might carry special meaning (e.g. the VT100 Terminal Control sequences). You might see some of these garbled sequences of characters when output intended for the terminal is instead written to file.
The shell¶
The role of the shell is to interpret the input provided by the user, and provide output in response to be displayed by the terminal. The most obvious output produced by the shell is the prompt, which typically looks like:
username@machine:path $
This is simply a string of characters produced by the shell, instructing the terminal to display this to the user, in order to indicate to the user that it is ready to receive input. The convention on modern systems to display the username, machine and path is entirely optional, but you’ll quickly find it very useful as you navigate around the folders on your system, or if you start using the terminal to remotely access other systems (e.g. a high performance computing cluster).
The input that you type at this point is then passed to the shell, which
decides what to do with it. In most cases, it will simply echo the characters
you’ve entered so you can see what you’ve typed on the terminal. But some of
your input might already start to be interpreted, for example Ctrl-A
will
bring the cursor to the start of the line, Ctrl-E
back to the end (along
with the Home & End keys, on a well-configured system). The Left & Right
arrow keys will allow you to move the cursor along to edit your command. The
Up & Down arrow keys will bring up previous commands that you’ve typed. These
are all examples of the shell interpreting your input and responding
accordingly.
Note
The shell’s handling of your input is typically managed by the readline library. It is highly configurable, and the actions tied to each of these keys can be modified in many ways. This tutorial describes the actions you’d expect on most Unix systems by default, but don’t be surprised if things behave slightly differently on other systems.
Once you’ve typed in your command, you will typically just hit the Return key to execute it. In the simplest case, when the shell receives this instruction, your input will simply be broken up into a list of distinct ‘entries’, each separated by spaces (although it is entirely possible to have entries with spaces, as detailed earlier). In other cases, it will also perform any additional interpretation required (e.g. wildcard expansion, variable substitution, etc). These entries are typically called the command-line arguments.
The first (and possibly only) entry in this list is expected to be the command
name, and the shell will try to locate the corresponding executable (unless
it’s a special shell built-in command, such as cd
, pwd
, echo
,
export
, …). Executables are just regular files that happen to have a
special flag set to signal the fact that they can be executed (you can see this
using ls -l
– look for the x
in the permissions field). Some of these executables might
be human-readable text, in which case they tend to be called scripts. Other
executables consist of machine code: the raw instructions executed by the CPU
– these are often referred to as binaries. Regardless, the shell will need
to locate this file so that it can execute the code it contains.
To locate the executable, the system will typically look through the list of
folders contained in the PATH
environment variable. You can query this list
yourself by typing echo $PATH
(it’s a colon-separated list of folder
paths). This is why it is often necessary to modify the PATH
when installing
non-standard software: this is how the shell ‘knows’ where to find these new
commands. It is also possible to provide the exact location of the command, by
invoking the command via the absolute path to its executable
(e.g. /usr/bin/whoami
, rather than just whoami
), or a relative
path to it (e.g. ./bin/myexecutable
); this can come in handy if
you just need to execute your script from a location not in your PATH
,
without having to modify the PATH
explicitly.
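For example (the actual list will differ on your system, and ~/myapp/bin is merely a hypothetical install location):
$ echo $PATH
/usr/local/bin:/usr/bin:/bin
$ export PATH="$PATH:$HOME/myapp/bin"    # executables in ~/myapp/bin can now be invoked by name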
The command¶
At this point, the shell is now ready to actually run your command. It will place your list of command-line arguments in a special location, and launch the executable itself. The executable now takes over the terminal: any input you provide will be sent to it, and any output it produces will be displayed in the terminal. It also has access to your list of command-line arguments, and will perform its actions based on what it finds there.
It is important to note that how the command interprets the arguments it’s been
given is entirely up to the command itself – or rather, up to the command’s
developer. This means that different commands will adopt slightly different
conventions as to how their arguments are expected to be provided. Some
commands expect all command-line options to be provided before
any of the main arguments, and others don’t. Some commands expect options to be
provided in short form (e.g. ls -l
), others in long form (e.g. git
--help
), others will accept a mixture of both (some short, some long), and
yet others may accept both forms for the same option (e.g. cp -f
is
equivalent to cp --force
). There are accepted conventions, but this by no
means implies that all developers will abide by them.
Another potential source of information that the executable will have access to
is via environment variables (the PATH
is one such variable). This allows
the user to set a variable, which the command can query as it is executing.
This might be to specify the location of important configuration files, or the
number of threads that the application is expected to use, etc. In general,
this is reserved for information that is unlikely to change very often,
allowing the user to set this information once within the shell’s startup file
(typically ~/.bashrc
), and no longer have to worry about it. For example,
adding this line in ~/.bashrc
means that applications can query this
variable at runtime:
export MYAPP_DIR=/usr/local/myapp/configfiles
Implications¶
Once you appreciate the way these components fit together, various aspects of the system may start to make more sense. For instance, there are many terminal programs available, from raw VT100 terminals, to various graphical ones, all with various levels of functionality (e.g. multiple tabs, split display, transparent background, unlimited scrollback, etc.). But within these various terminals, you will generally be running the same shell, and it’ll behave the same way no matter which terminal it is running in. However, if you log into a different system, you may find subtle differences in the way it behaves (different prompts, some keyboard shortcuts that work differently, etc.). You may also find that the default shell you are logging in with differs between systems: some HPC systems are configured with the C shell as the default, and its syntax can be quite different. But at heart, the concepts outlined here will be the same.
It’s also important to understand what the shell will do to your command as
it’s interpreting it, and what arguments this will translate to once passed to
the executable itself. A useful trick here is to prefix your intended command
with echo
: the shell will perform all the variable substitution and wildcard
expansion that it normally would, but because echo
is now the command, all
this will now simply be printed on the terminal. For example:
$ echo cp file*.txt destination/
cp file.txt file_1.txt file_2.txt file_3.txt files.txt destination/
Note that the command itself is not executed; it is merely displayed as it
would have been interpreted by the shell. This might come in handy to see that
the files.txt
file will also be copied, when that might not have been
intended. This trick is particularly useful when performing more advanced
substitutions, as you’ll see in the Advanced usage page.
Advanced usage¶
So far, we have only covered the simplest aspects of what the shell can do. But it can be used for far more than this, and can even be used as a full-blown scripting language capable of running complex applications. While this level of mastery is probably unnecessary for most users, there are a few advanced topics that are very useful and worth covering in more detail, even in an introductory document such as this. The interested reader is referred to more complete guides, such as this one.
Redirection¶
The standard output of commands, normally intended to be displayed on the
terminal, can be redirected to a file if needed, using the >
symbol.
For example, assuming that:
$ ls
docs html LICENSE README.md
The file listing can be redirected to a file, called listing.txt in the example below:
$ ls > listing.txt
This creates the file specified, and the output normally shown by ls
is not
visible on the terminal. It has however been stored in the listing.txt
file, as we can verify with cat
:
$ cat listing.txt
docs
html
LICENSE
README.md
This can also work in append mode, where the output of the command is appended to the file, rather than overwriting its entire contents.
For example:
$ app1 input output -options > log.txt
$ app2 arg1 arg2 >> log.txt
will create the log.txt
file in the first line, and record any output from
the app1
command. The second line will then append its output to the log
file.
Likewise, we can redirect the standard input to feed in the contents of a
file as input, rather than typing it in, using the <
symbol.
For example:
$ sort < myfile.txt
will feed the contents of myfile.txt
to the sort
command’s standard
input.
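For example, given a (hypothetical) file names.txt containing one name per line:
$ cat names.txt
charlie
alice
bob
$ sort < names.txt
alice
bob
charlie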
Pipes¶
This is a special type of redirection, where the standard output of one
command can be fed directly into the standard input of another, using the
|
symbol. Both commands run concurrently, with the second command able to
process the output of the first as soon as it is provided. This can be
incredibly useful to build compound commands.
For example:
$ grep ERROR log.txt | sort | uniq
ERROR: error type one
ERROR: input file not found
ERROR: something bad happened
uses the grep
command to find all lines in log.txt
that contain the
character string ERROR
, then feeds those lines (which would normally be
displayed on the terminal) via the pipe as input for the sort
command. This
sorts the lines in alphabetical order, and feeds its output to the uniq
command, which removes duplicates. The outcome of the full pipeline is a list of
all unique error messages logged in the log.txt
file.
Another particularly useful example is to capture the output from a command
expected to produce a lot of output, and browse through it at a more suitable
pace rather than seeing it fly past on the terminal. This can be done using the
command less
(a paginator):
$ complex_process -verbose | less
This ability to quickly implement otherwise non-trivial functionality is one of
the great strengths of the command-line. Unix is full of little tools like
grep
, sort
and uniq
that are designed to operate on text and to be
daisy-chained in this manner.
Conditional execution¶
While BASH provides its own if
statement for more complex situations, it
also offers a simple construct to allow execution of one command based on the
success or failure of another, using the &&
and ||
operators
respectively.
For example:
$ myapp args -options || echo "myapp failed to run!" >> log.txt
will record the fact that the myapp
command has failed to the log.txt
file.
On the other hand:
$ stage1 -options inputdata/ tmpdata/ && stage2 tmpdata/ outputdata/
will only run the stage2
command if the stage1
executable has completed
successfully (useful if the data produced by the first command is to be
processed by the next one).
Variables¶
It is often useful to store information in variables. For instance, you might
want to use a long and complicated filename often, and rather than typing it in
every time you need it, you could use a variable. Variables are assigned using
the =
symbol (beware: no spaces around it), and retrieved (dereferenced)
using the $
symbol.
For example:
$ logfile=/some/complicated/location/myapp/logs/run1.txt
$ myapp input intermediate > $logfile
$ otherapp intermediate output >> $logfile
...
The variable logfile
is set to the filename of the logfile, and the output
of all subsequent commands is then redirected to that file (see above).
Iterating with for loops¶
It is often required to perform the same command for a number of files. This
can be achieved simply and effectively with a for
loop, like this:
$ for item in logs/run*.txt; do grep OUTPUT $item; done
This will find all lines that contain the token OUTPUT
in the logfiles
stored in the logs/
folder that match the filename run*.txt
, and print
them on the terminal.
What actually happens here is that a variable item
is used to store each
token listed after the in
keyword (until the end of line or ;
symbol),
and the command(s) between the do
and done
keywords are then executed
for each token. The current value of the token can then be retrieved within the
loop by dereferencing it like any other variable, using the $
symbol.
Note that the above does not need to be all on the same line. In practice, lines
can be broken wherever the ;
was used in the example above:
$ for item in logs/run*.txt
> do
> grep OUTPUT $item
> done
Parameter substitution¶
There are certain operations that can be performed on variables at the point
where they are being dereferenced. Of these, the most useful is probably the
ability to strip a suffix or prefix. This is done using a syntax like
${var#prefix}
or ${var%suffix}
. This is most useful in scripts and when
combined with for
loops.
For example:
$ for data in *.dat
> do
> process $data ${data%.dat}.out > ${data%.dat}.log
> done
will run the process
command on all files in the current folder that end
with the .dat
suffix, and pass as second argument the same filename with
the .dat
suffix stripped and replaced with the .out
suffix. The output
of each command will individually be stored in log files, each with the
.log
suffix. If the current folder contained the files:
$ ls
backup/ final.dat original.dat parameters.txt trial2.dat
Then the commands actually run will be:
$ process final.dat final.out > final.log
$ process original.dat original.out > original.log
$ process trial2.dat trial2.out > trial2.log
There are many other types of parameter substitutions possible, see the relevant documentation for details.
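For instance, the effect of these two substitutions can be checked directly with echo (using a hypothetical filename):
$ data=logs/run1.txt
$ echo ${data%.txt}
logs/run1
$ echo ${data#logs/}
run1.txt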
Customising the shell¶
As noted in previous sections, the shell can be configured in various ways to make it easier to use or more powerful. Many of the more useful modifications can be made via the readline library, since this is the part of the shell that allows commands to be edited. There are also various environment variables that can be set, for instance to modify the prompt, or store more commands in the history.
Startup files¶
All of these modifications are typically implemented by adding the relevant commands to the shell’s various startup files. Part of the difficulty in getting these modifications to work lies in figuring out which file gets executed under what circumstances, and what settings this file affects.
When the shell starts, it will typically be invoked as a login or non-login
shell, depending on whether you have just logged into the system directly into
this shell. Generally, when you log into a desktop session, the session will be
started from the shell, and this shell is the login shell. This means any
terminal sessions you start subsequently from this graphical session are
non-login shells (although not on macOS, by default). The reason this matters
is because different startup files are used depending on whether it’s a login
shell or not. This means that settings placed in one startup file (e.g.
~/.bashrc
) may have no effect when logging in via a remote SSH connection,
for instance, since this session would start a login shell. On top of that,
different distributions have come up with subtly different conventions as to
what files are used, making it very difficult to make recommendations
guaranteed to work in all cases.
The information below relates to the Bourne Again shell (BASH), and will hopefully be representative of most systems (see the official BASH documentation for full details). The relevant files would be different for different shells:
- login shells will typically read the system-wide /etc/profile if it exists, then the user-specific ~/.profile or ~/.bash_profile if it exists.
- non-login shells will typically read the ~/.bashrc file if it exists.
- on many distributions, the system-wide /etc/profile will contain instructions to read the system-wide /etc/bash.bashrc file if it exists.
- likewise, on many distributions, the user ~/.profile will contain instructions to run the user ~/.bashrc file if found.
- Settings that affect the readline library will normally go in the ~/.inputrc file (see below).
As you can appreciate, things can rapidly become complicated. In most cases,
you should be able to add your settings to the ~/.bashrc
file, and on a
properly configured system, that should be sufficient.
Useful ~/.bashrc customisations¶
These suggestions are things that I’ve found useful to add to ~/.bashrc; you may find some of them to your taste:
Append to the history, do not overwrite it:
shopt -s histappend
Keep a lot more commands in the history than the default:
export HISTSIZE=10000 HISTFILESIZE=100000
Don’t keep duplicate entries in the command history:
export HISTCONTROL=ignoreboth
On colour-capable terminals, use colours in file listings:
alias ls='ls --color=auto'
Make common operations prompt you if they are about to overwrite files:
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
Useful readline customisations¶
Most distributions will come with the following already set, but just in case,
the following is useful to put into your ~/.inputrc
if your Home, End, or
Del keys don’t work, and if you want to be able to skip over words with
Ctrl+Left/Right:
# mappings for Home/End:
"\e[1~": beginning-of-line
"\e[4~": end-of-line
"\e[7~": beginning-of-line
# Del key:
"\e[3~": delete-char
# Ctrl+arrows to skip words:
"\e[5C": forward-word
"\e[5D": backward-word
"\e\e[C": forward-word
"\e\e[D": backward-word
"\e[1;5C": forward-word
"\e[1;5D": backward-word
Generally, most distributions set up the Up & Down arrows to go through the history, with PgUp & PgDn going to the oldest and most recent entry respectively. I find that a more useful behaviour for the Up & Down arrows is to perform a search through the history. If nothing has been typed yet, this just goes through the history as is normally the case. But as soon as a few characters have been entered, only those commands in the history that start with the same fragment will come up when you press Up. This allows you to quickly retrieve a command you might have typed quite some time ago, as long as you know how it started:
# alternate mappings for "up" and "down" to search the history
"\e[A": history-search-backward
"\e[B": history-search-forward
For example, if you set a complicated environment variable at the beginning of
your session, but now need to modify its value slightly, you could just type
exp
followed by the Up arrow key, and the chances are the first match will
be the export COMPLICATED_VARIABLE=some_other_complicated_value
line that
you wanted to edit – no need to type it all in again…