The UNIX Shell

This lesson is adapted from the one of the same name from Software Carpentry.


This lesson can be done interactively with the students and this notebook distributed for reference.


The Unix shell is older than most of the people who use it. It has survived so long because it is one of the most productive programming environments ever created—maybe even the most productive. Its syntax may be cryptic, but people who have mastered it can experiment with different commands interactively, then use what they have learned to automate their work.

  • The shell is an interactive interpreter: it reads commands, finds the corresponding programs, runs them, and displays output.

  • Input and output can be redirected using < and >.

  • Commands can be combined using pipelines.

  • The history command can be used to view and repeat previous operations, while tab completion can be used to save re-typing.

  • Directories (or folders) are nested to organize information hierarchically.

  • Use grep to find things in files, and find to find files themselves.

  • Programs can be paused, run in the background, or run on remote machines.

  • The shell has variables like any other program, and these can be used to control how it behaves.


Introduction

  • The shell has been an important interface for more than 40 years (the first Unix shell appeared in 1971).
  • Today, shells are used to interact with servers or with programs that have a command line interface.
  • Shells also allow users to quickly combine several tools to minimize programming effort.

At a high level, computers really do four things:

  • Run programs
  • Store data
  • Communicate with each other
  • Interact with us

How we interact with the computer is varied and evolving (mouse, keyboard, touchscreen, etc.). The oldest method of interaction is called CLUI, or command-line user interface, to distinguish it from the GUIs, or graphical user interfaces, that most of us are now used to.

Workflow at the command line:

  • User logs in
  • User types a command
  • Computer executes the command
    • and prints its output
  • User types another command
  • Computer executes the command
    • and prints its output
  • …and so on until the user logs off

In between the user and the computer is another program called the command shell.

What the user types goes into the shell, which figures out what commands to run and orders the computer to execute them. The computer then sends the output of those programs back to the shell, which takes care of displaying things to the user.

A shell is just a program like any other; the only thing that's different about it is that its job is to run other programs, rather than to do calculations itself.

The most popular Unix shell is bash, the Bourne again shell. (It's called that because it's derived from a shell written by Stephen Bourne. This is what passes for wit among programmers.) Bash is the default shell on most modern implementations of Unix…

Using it, or any other shell, feels a lot more like programming than like using windows and mice.

Commands are terse—often only a couple of characters long—and their names are often cryptic.

So why should you use it? There are two good reasons.

  1. First, many tools only have command-line interfaces, or are easiest to use—particularly on remote machines—through the command line.
  2. Second, the shell allows you to combine existing tools in powerful ways to create new tools of your own with little or no programming. This lets you do a lot of work with just a few keystrokes—once you have paid the up-front cost of learning how the shell works and what its basic commands are.

Files and Directories

  • Logging in to the shell and running basic utilities.
  • Investigating the file and directory structure from the command line using the commands pwd, ls, and cd.

Some of the commands we will use most often are ones related to storing data on disk.

The subsystem responsible for this is called the file system.

It organizes our data into files, which hold information, and directories, which hold files or other directories.

How can we use the shell to run other programs that will show us what's in the file system?

  1. Start by logging in to the computer.
  2. On the dock near the bottom of your screen you should see the icon for the terminal application.
  3. Click on that and it should bring up a terminal window.
  4. You can customize its look by setting the preferences.

Once we have logged in, we'll see a shell prompt, which is usually just a dollar sign (but which may show extra information, like our user ID).

The shell prompt signals that the shell is waiting for us to type something in.

Type "whoami", followed by "enter". This command prints out the ID of the current user, i.e., shows us who the shell thinks we are.

When we enter it, the shell finds a program called whoami, runs it, displays its output, and then displays a new prompt, telling us that it's ready for more commands.
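The steps the shell takes here can be made visible with a short sketch. It assumes a POSIX-style shell such as bash; command -v is a standard way to ask the shell which program a name refers to, and the variable names are only for illustration.

```shell
# Ask the shell where it finds the program named whoami.
whoami_path=$(command -v whoami)
echo "The shell will run: $whoami_path"

# Run the program and keep its output so we can display it.
current_user=$(whoami)
echo "whoami printed: $current_user"
```

The path printed on the first line differs between systems; the point is only that whoami is an ordinary program the shell looks up and runs for us.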

Now that we know who we are, we can find out where we are using pwd, which stands for "print working directory".

This is our current default directory, i.e., the directory the computer assumes we want to use unless we specify something else explicitly.

The computer's response is /Users/jklay. To understand what this means, let's have a look at how the file system as a whole is organized.

Tour of filesystem:

At the very top of the file system is a directory called the root directory that holds everything else the computer is storing.

/

Inside that directory (or underneath it, if you're drawing a tree) are several other directories, such as bin (which is where some built-in programs are stored), data, Users, tmp, and so on.

We know that our current working directory, /Users/jklay, is stored inside /Users because /Users is the first part of its name. Similarly, we know that /Users is stored inside the root directory / because its name begins with /.
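The same reasoning can be checked from the shell. This is a minimal sketch that assumes the standard dirname utility, which strips the last part of a path; the variable names are made up for the example.

```shell
# The working directory, as an absolute path.
here=$(pwd)

# Dropping the last component gives the directory that contains it.
parent=$(dirname "$here")

echo "working directory: $here"
echo "stored inside:     $parent"
```

If the working directory were /Users/jklay, the second line would show /Users.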

Underneath /Users, we find one directory for each user with an account on this machine.

Notice, by the way, that there are two meanings for the / character. When it appears at the front of a file or directory name, it refers to the root directory. When it appears inside a name, it's just a separator.

$ ls

ls stands for "listing". It prints the names of the files and directories in the current directory in alphabetical order, arranged neatly into columns. (Names that begin with . are hidden by default.)

To make its output more comprehensible, we can give it the argument, or flag, -F:

$ ls -F

This tells ls to add a trailing / to the names of directories.
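A small sketch shows the difference. It works in a throwaway directory so that no real files are touched; mktemp -d (assumed to be available, as it is on most Linux and macOS systems) creates that directory, touch creates an empty file, and the names notes.txt and results are made up.

```shell
# Build a small throwaway directory: one file and one subdirectory.
scratch=$(mktemp -d)
touch "$scratch/notes.txt"
mkdir "$scratch/results"

# Plain ls shows both names with no hint about which is which.
ls "$scratch"

# With -F, the directory gets a trailing slash.
listing=$(ls -F "$scratch")
echo "$listing"

# Clean up: rm -r removes a directory and everything in it, so use it with care.
rm -r "$scratch"
```

In the second listing, results appears as results/ while notes.txt is unchanged.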

By convention, the second part of a filename, called the filename extension, indicates what type of data the file holds.

.txt signals a plain text file, .pdf indicates a PDF document, .cfg is a configuration file full of parameters for some program or other, and so on. However, this is only a convention, and not a guarantee. Files contain bytes, nothing more; it's up to us and our programs to interpret those bytes according to the rules for PDF documents, images, and so on.
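To see that an extension is only a label, the sketch below stores plain text in a file whose name ends in .pdf. The file name is made up, mktemp -d is assumed to be available, and > is used to send the output of printf into a file.

```shell
scratch=$(mktemp -d)

# Store ordinary text under a name that claims to be a PDF.
printf 'just some plain text\n' > "$scratch/report.pdf"

# cat prints the bytes in the file; the .pdf name did not change them.
contents=$(cat "$scratch/report.pdf")
echo "$contents"

# Clean up the throwaway directory.
rm -r "$scratch"
```

A PDF viewer would reject this file, because the bytes inside do not follow the rules for PDF documents, whatever the name says.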

Relative paths vs Absolute paths

We can use cd followed by a directory name to change our working directory.

$ cd

cd stands for "change directory", which is a bit misleading: the command doesn't change the directory itself; it changes the shell's idea of what directory we are in.

cd doesn't print anything, but if we run pwd after it, we can see that we are now "in" a new directory.

If we run ls again it will show us the listing of files and directories where we are now.

To go up in the directory tree, use

$ cd ..

.. is a special directory name meaning "the directory containing this one", or more succinctly, the parent of the current directory.

The special directory .. doesn't usually show up when we run ls.

$ ls -a

The -a flag stands for "show all". It makes ls include names that begin with ., such as .. and another special directory that's just called ., which is the directory we're currently in. It may seem redundant to have a name for where we are, but we'll see some uses for it later.
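The commands in this section can be tried together in a throwaway directory. This is a sketch under a few assumptions: mktemp -d is available, the directory name data is made up, and the variables exist only to record what pwd and ls report at each step.

```shell
start=$(pwd)            # remember where we began
scratch=$(mktemp -d)
mkdir "$scratch/data"

cd "$scratch"
before=$(pwd)

cd data                 # relative path: data is inside the current directory
inside=$(pwd)
echo "now in: $inside"

hidden=$(ls -a)         # even an empty directory lists . and ..
echo "$hidden"

cd ..                   # back up to the parent
after=$(pwd)
echo "now in: $after"

cd "$start"             # return to the starting directory
rm -r "$scratch"        # remove the throwaway directory and its contents
```

After cd .. the working directory is the same one we were in before cd data.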


Creating and Deleting

  • Creating directories and files using mkdir and a simple editor called nano.
  • Moving and copying files.
  • Deleting files and directories.

When we type in commands like ls or pwd, the shell finds the corresponding programs, runs them on our behalf, and shows us their output. But how do we create files and directories for it to show us?

Let's create a new directory called tmp with the command mkdir tmp.

$ mkdir tmp

As you might guess from its name, mkdir means "make directory".

Since tmp is a relative path (without a leading slash), the new directory is made below the current one.

If you run ls again you will see it in the list. However, there's nothing below it yet: tmp is empty, which we can tell if we try to list its contents:

$ ls tmp

The command doesn't print any output, indicating that tmp is empty. If we use ls -a tmp to show directories whose names start with '.', though, we see that . and .. are there, as they always are.
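The same steps can be run as a short script. The sketch below works inside a throwaway directory created with mktemp -d (assumed to be available) so that it does not depend on what is in your own working directory; the variable names are illustrative.

```shell
start=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

mkdir tmp                 # relative path: created below the current directory
ls -F                     # shows tmp/

contents=$(ls tmp)        # nothing inside yet, so this is empty
all_entries=$(ls -a tmp)  # . and .. are always present
echo "$all_entries"

cd "$start"
rm -r "$scratch"          # remove the throwaway directory and its contents
```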

Let's change our working directory to tmp using cd, then run the command nano junk.

$ cd tmp
$ nano junk

nano is a simple text editor. It can only work with plain character data, not tables, images, or any other human-friendly media.

You can start typing and your text will appear in the window starting at the cursor location.

When you are done typing, you can save the file to disk by holding down the Control key and pressing the O key. nano asks you to confirm the file name; press Enter to accept it. By convention, Unix uses the caret ^ followed by a letter to mean "control plus that letter", so nano's on-screen help shows this command as ^O.

Once our text is saved, we can use Control-X to quit the editor and return to the shell.

nano doesn't leave any output on the screen after it exits, but ls now shows that we have created a file called junk.

Let's tidy up by running rm junk. "rm" stands for "remove"—this command deletes files.

It's important to remember that there is no undelete. Unix doesn't move things to a trash bin: it unhooks them from the file system so that their storage space on disk can be recycled. Tools for finding and recovering deleted files do exist, but there's no guarantee they'll work in any particular situation, since the computer may reclaim the file's disk space right away.

If we now run ls, its output is empty once again, which tells us that our file is gone.

Now recreate the file by opening nano again with nano junk, then cd up one directory.

$ nano junk
$ cd ../

If we try to remove the tmp directory using rm tmp, we get an error message: by default, rm only works on files, not directories.

$ rm tmp
rm: cannot remove `tmp': Is a directory

The right command is rmdir, which stands for "remove directory".

$ rmdir tmp
rmdir: failed to remove `tmp': Directory not empty

It doesn't work yet either, though, because the directory we're trying to remove isn't empty.

If we want to get rid of tmp we must first delete the file junk.

$ rm tmp/junk
$ rmdir tmp

The directory is now empty, so rmdir deletes it.
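The whole sequence can be replayed in a throwaway directory. In this sketch touch stands in for nano, since it creates an empty file without opening an editor; mktemp -d is assumed to be available, and first_try is a made-up variable that records whether the first rmdir succeeded.

```shell
start=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

mkdir tmp
touch tmp/junk            # stand-in for creating the file in nano

# rmdir refuses to delete a directory that still has something in it.
if rmdir tmp 2>/dev/null; then
    first_try="removed"
else
    first_try="refused"
fi
echo "first rmdir: $first_try"

rm tmp/junk               # delete the file first
rmdir tmp                 # the directory is empty now, so this succeeds

cd "$start"
rmdir "$scratch"          # the scratch directory is empty too
```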

Create that directory and file one more time.

$ mkdir tmp
$ nano tmp/junk
$ ls tmp
junk
$

junk isn't a particularly informative name, so let's change the file's name using mv.

$ mv tmp/junk tmp/quotes.txt

mv is short for "move": we use it to move a file from one place to another.

It also works on directories: there is no separate mvdir command.

The first argument tells mv what we're moving. The second tells it where to move it to.

The general form of the command is

$ mv <source> <destination>

where both the source and the destination can be either a file or a directory.

In this case, we're moving tmp/junk to tmp/quotes.txt, which has the same effect as renaming the file.

Sure enough, ls shows us that tmp now contains one file called quotes.txt.

$ ls tmp
quotes.txt

We can bring that file into the current working directory by using the mv command, but this time the second argument is a directory.

$ mv tmp/quotes.txt .

The effect is to move the file from the directory it was in to a different directory. You can use ls to see that tmp is now empty and the file has been moved to the current working directory.
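Both uses of mv, renaming and moving, can be seen side by side in a throwaway directory. As before, this is only a sketch: mktemp -d is assumed to be available, touch stands in for nano, and the variables simply capture what ls reports after each step.

```shell
start=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"
mkdir tmp
touch tmp/junk                 # stand-in for a file made in nano

mv tmp/junk tmp/quotes.txt     # same directory, new name: a rename
after_rename=$(ls tmp)

mv tmp/quotes.txt .            # destination is a directory: a move
after_move=$(ls)
left_in_tmp=$(ls tmp)

echo "tmp after rename: $after_rename"
echo "here after move:"
echo "$after_move"

cd "$start"
rm -r "$scratch"               # remove the throwaway directory and its contents
```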

The cp command works very much like mv, except it copies a file instead of moving it.

$ cp <source> <destination>

where, without extra options, the source must be a file but the destination can be either a file or a directory.
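A short sketch shows both kinds of destination. The names quotes-copy.txt and backup are made up for the example, mktemp -d is assumed to be available, and printf with > is used to create a small file without opening an editor.

```shell
start=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"
mkdir backup
printf 'first line\n' > quotes.txt

cp quotes.txt quotes-copy.txt   # destination is a file name
cp quotes.txt backup            # destination is a directory

listing=$(ls)
backup_listing=$(ls backup)
echo "$listing"
echo "backup contains: $backup_listing"

cd "$start"
rm -r "$scratch"                # remove the throwaway directory and its contents
```

Unlike mv, the original quotes.txt is still there after both copies are made.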

To summarize, here are the commands we've seen so far, along with the two special directory names.

`pwd` print working directory
`cd` change working directory
`ls` listing
`.` current directory
`..` parent directory
`mkdir` make a directory
`nano` text editor
`rm` remove (delete) a file
`rmdir` remove (delete) a directory
`mv` move (rename) a file or directory
`cp` copy a file

With this information you will be able to do most of the things we will need to do at the command line in the terminal. As you get more comfortable working with the command line, we'll learn new commands to expand your ability to work with the filesystem.


All content is under a modified MIT License, and can be freely used and adapted. See the full license text here.