The Command Line Interface (CLI)

The first paragraph of this introduction serves as our motivation:

The command-line interface, sometimes referred to as the CLI, is a tool into which you can type text commands to perform specific tasks—in contrast to the mouse's pointing and clicking on menus and buttons. Since you can directly control the computer by typing, many tasks can be performed more quickly, and some tasks can be automated with special commands that loop through and perform the same action on many files—saving you, potentially, loads of time in the process.

Windows-users should move to the Linux virtual machine provided. Mac-users can choose to remain on their 'native' platform and simply start the Terminal.app found in Applications/Utilities.

Resources

In addition to the links in this document, you can get inspiration from a provided excerpt of the book "Data Science at the Command Line" by Jeroen Janssens (Published by O’Reilly Media).

If you find yourself in need of a thorough introduction to the CLI, the freely-available book The Linux Command Line is highly recommended.

Commands

There is an equivalent explicit command to every operation you can perform using the mouse. Note that this is not to say that being explicit and literal would generally be easier than using the mouse! Use the right tool for the job.

As detailed on this page, the basic syntax of the CLI is:


In [ ]:
UTILITY -optional_flags ARGUMENT1 ARGUMENT2 ...

where the 'utility' performs the task you want, 'flags' modify its default behaviour, and 'arguments' determine the scope of the operation (e.g. which files or directories to operate on). Depending on the task, utilities take from zero to many arguments.
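As a small illustration of this pattern, here are a few invocations with a growing number of flags and arguments. The directories / and /usr are chosen only because they exist on both Linux and macOS; any other directory would do.

```shell
pwd             # utility only: no flags, no arguments
ls /            # utility + one argument (the directory to list)
ls -a /         # utility + one flag + one argument
ls -a / /usr    # utility + one flag + two arguments
```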

Terminal

The word 'terminal' stems from the days of building-sized calculators (the 1950s): it was the single access point to the computer. In modern computers we can have any number of 'virtual' terminals open, as applications (for Mac and Linux users; Windows doesn't have a useful analogue). We issue our commands at the command prompt, which typically is the dollar-sign ($). Any messages or command outputs from the computer to us will appear after the prompt.

A note on the terminal prompt

Like most things in Linux, the command prompt can be customised. This is the case for both the VM and Mac users. For example, the 'working directory' (see below) is often shown by default.

The shell: a command interpreter

We'll talk about computer programs and languages a little later. For now, suffice it to say that the specific set of commands available for use are determined by the shell: a program that interprets commands.

The shell communicates with the OS, which translates instructions to the kernel, which is yet another program layer responsible for hardware-interactions (CPU, memory, disk, screen, etc.). But we digress: the shell is the 'deepest' you are likely ever to have the need to go!

Bourne Again SHell (bash)

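We will be using bash to interact with the computer. This is the default shell on most Linux-based OSs. It was also the default on macOS until version 10.15 (Catalina), which switched to zsh; bash is still installed there, and everything in this section works the same way in both shells.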
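If you are unsure which shell you are talking to, the following sketch may help. It assumes the SHELL environment variable is set, which is usual but not guaranteed; the variable name login_shell is just a label chosen for this example.

```shell
echo "$0"                          # name of the shell (or script) that is running right now
login_shell=${SHELL:-not set}      # your login shell, if the variable is defined
echo "$login_shell"
```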

Aside: Linux vs. macOS

'Linux' is in fact the name of the kernel. Many OSs (sometimes called 'distributions') are built on top of it, such as Red Hat/Fedora, SUSE, Mint, etc. The OS X-line of Mac operating systems (renamed macOS) shares a common 'parent' with Linux: UNIX. The kernel underneath macOS (XNU, part of Darwin) is like Linux's sibling (they took very different routes in their lives though...). For this reason, the experience of interacting with Linux and macOS computers at the shell-level is very similar, and academic software developed on Linux can usually be ported to macOS.

Directory navigation

The commands we meet in this section are for casually browsing through the file system. None of them do anything to files; an analogy might be just browsing around using Windows Explorer or Finder.

  • ls
  • pwd
  • cd
  • file
  • less
  • head and tail
  • wc

Listing contents and moving around

The command ls (short for 'list') shows the contents of the current directory:


In [ ]:
ls

But where am I? Use 'print working directory' to find out:


In [ ]:
pwd

To move around, use cd to 'change directory'. The utility takes a single argument, the $TARGET directory:


In [ ]:
cd $TARGET

When the Terminal-application starts, it will by default place you in your home folder.

What directory (a.k.a. filesystem location) are you in now? Where is 'home'?

What are the contents of your home folder?

Change directory into the Documents-folder and display the contents (if any...)

To 'go up' one level in the directory hierarchy, use the 'dot-dot'-syntax (NB the space between cd and ..!)


In [ ]:
cd ..

The path-separator: /

Based on the output of the pwd-command, you can determine that the symbol separating levels of the directory hierarchy on both Linux and Mac is the forward slash: /. In contrast, Windows uses the backslash (\).

The term 'path' refers to the 'route' you would have to travel to reach any file/folder in the directory hierarchy.

Special notations

There are three special locations that have their own notations:

  • . (dot)
  • .. (dot dot)
  • ~/ (tilde slash)

'Dot' is short for 'current working directory', i.e., the output of pwd. 'Dot dot' is short for 'the parent of the current directory'. 'Tilde slash' is short for 'my home directory'.
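The three notations can be tried out safely in a throwaway directory. This is only a sketch: the folder names parent and child and the variable names are made up for the example, and mktemp -d is used to create a temporary directory that will not interfere with your own files.

```shell
start=$(pwd)                          # remember where we began
playground=$(mktemp -d)               # temporary directory to experiment in
mkdir -p "$playground/parent/child"

cd "$playground/parent/child"
cd .                                  # '.' is the current directory: nothing changes
after_dot=$(pwd)
cd ..                                 # '..' is the parent: we move up one level
after_dotdot=$(pwd)

echo "after 'cd .':  $after_dot"
echo "after 'cd ..': $after_dotdot"
echo ~/                               # the shell expands '~/' to your home directory

cd "$start"                           # go back to where we started
```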

Where are you now?

Relative vs. absolute paths as arguments

Each file system has a root: the highest level of the hierarchy of files and folders. This is denoted simply as /. (The Windows-equivalent would be 'C:\')

Utilities like ls and cd can take relative or absolute paths as arguments:

  • an absolute path explicitly lists the location of a TARGET, from root-level down
  • a relative path only lists the location relative to the present working directory
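To see the difference in practice, the sketch below reaches the same folder twice, once with each kind of path. The folder names projects and notes are invented for the example, and it assumes that mktemp -d returns an absolute path (it normally does).

```shell
start=$(pwd)
playground=$(mktemp -d)
mkdir -p "$playground/projects/notes"

# Absolute path: starts at the root (/), works from any working directory
cd "$playground/projects/notes"
via_absolute=$(pwd)

# Relative path: interpreted from wherever we currently are
cd "$playground"
cd projects/notes
via_relative=$(pwd)

echo "$via_absolute"
echo "$via_relative"      # the same location, reached in two ways

cd "$start"
```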

Change directory to the root of the filesystem & list the contents

Change directory to the folder local in the folder usr at the root of the filesystem, using only a single command

Change to your home directory using its absolute path in a single command

Hint: cd has a default value if you give it no arguments: it takes you home.


In [ ]:
cd

Permissions

Linux/Mac strongly enforce access permissions: some directories and files are off-limits to you as a 'normal' user! For example, you cannot enter other people's home folders, and there are lots of files you can read, but not write to.
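You can inspect permissions without trying to break anything. In this sketch, ls -ld lists a directory itself (rather than its contents) together with its permission string, owner and group. The [ -w ... ] check is a standard shell test that is not otherwise covered in this section, and root_writable is just a variable name chosen for the example.

```shell
# -d lists the directory itself, -l adds permissions, owner and group
ls -ld /

# Ask whether the current user is allowed to write into /
if [ -w / ]; then
    root_writable=yes
else
    root_writable=no
fi
echo "Can I write to /? $root_writable"
```

As a 'normal' user the answer will typically be no.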

Try to change directory to: /root

In addition to users, Linux and macOS operate with a notion of groups. Execute the groups-command to find out which ones you belong to.

Tab autocompletion

This is the most important feature shell-newcomers must learn! You never (ever) type full paths into the terminal. Instead, you let the shell 'guess' what you want by typing in the beginning of the path, and hitting the Tab-key!

Whenever you need to type out a location in an argument (for example, in the cd command), you don't have to type out the whole thing: the first few letters will do. Once you've typed three or four letters, press the tab key, and the command line will fill in the rest for you! For example, if you are in your home directory, and you type cd Desk and then press the tab key, the command line will automatically complete the command to read cd Desktop! You can also use this if you find yourself mistyping folder names: tab autocompletion will always fill it in correctly.

Do as the quote above says!

Command history

All the commands you type into the terminal (including the ones that didn't actually work) are saved into the command history. When at the prompt, you can hit the arrow up-key to go back to the previous command. Hit it again to get to the one before that, etc. Arrow down moves you forward again, towards the most recent command.

Scroll through your command history until you find one using the ls-command. Change the argument to something else and run it.

Go to (change directory to) your course materials-folder.

The following assumes that you placed the notebooks-folder of the course materials into the folder you shared with the VM. If you did something else, please use your host OS (Windows or Mac) to move notebooks so that it can be accessed by the guest OS (Linux) too.

But where can I find my shared-folder in Linux (the VM)?!

VirtualBox 'mounts' shared folders into the /media-directory and prepends sf_ (for 'shared folder') to the name of the share. If you created the share exactly as instructed, you'll find it in: /media/sf_shared.

File types: textual or binary?

Note that the suffix of a file does not necessarily carry any information on its contents! Some suffixes, such as .txt are considered "standardised", but the operating system does not enforce any rules on the naming of files.

Use the command file to determine the file types of:

  • short_textual_file.xyz
  • long_textual_file.log
  • some_unknown_file.fmt

Remember to use tab-completion, folks!


In [ ]:
file short_textual_file.xyz

ASCII and Unicode: character encoding

American Standard Code for Information Interchange (ASCII) is a mapping between a numeric value (0 to 127, so it always fits in a single byte) and a character such as 'a' or '@'.

How many distinct values can a byte of information represent?

What about all the French "e, è, é, ê, ë"s?!? Not to mention Cyrillic, Arabic, etc. characters? Enter Unicode, and specifically its most commonly used encoding: UTF-8. UTF-8 encodes each character in one to four bytes; the four-byte form carries 21 payload bits, enough for 2^21 = 2,097,152 distinct values, although Unicode itself defines only 1,114,112 code points. 89.7% of websites are UTF-8 encoded.
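The variable length of UTF-8 is easy to observe from the shell. In this sketch the letter é is written as its two UTF-8 bytes in octal (\303\251) so that the example does not depend on how your terminal or editor is configured; wc -c counts bytes, and tr -d ' ' only strips the padding some systems add to the number.

```shell
# 'e' is plain ASCII; 'é' is given as its two UTF-8 bytes in octal notation
ascii_bytes=$(printf 'e' | wc -c | tr -d ' ')
utf8_bytes=$(printf '\303\251' | wc -c | tr -d ' ')
echo "e takes $ascii_bytes byte, é takes $utf8_bytes bytes"

# Number of distinct values that fit in 21 bits
echo $((1 << 21))
```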

Displaying the contents of a textual file in bash

It is often most efficient to view the contents of small files in the terminal, instead of a graphical text editor.


In [ ]:
cat short_textual_file.xyz

For longer files, the less command is very handy.


In [ ]:
less long_textual_file.log

Some navigation commands include:

  • [arrow keys]: move up/down a line at a time
  • [space or f]: move down a page (forward)
  • [b]: move up (back) a page
  • [g]: move to top of file
  • [G] (Shift-G): move to bottom of file
  • [q]: quit viewing the file

First/last n lines in a file

  • display the first 10 lines of long_textual_file.log using the head-command with flag -n 10
  • display the last 10 lines of long_textual_file.log using the tail-command with flag -n 10
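If you want to check your understanding of -n on a file whose contents you know in advance, you can generate one. This sketch assumes the seq utility is available (it is on the course VM and on macOS); the variable name numbers_file is arbitrary.

```shell
numbers_file=$(mktemp)
seq 1 100 > "$numbers_file"     # one number per line, from 1 to 100

head -n 3 "$numbers_file"       # prints 1, 2 and 3
tail -n 3 "$numbers_file"       # prints 98, 99 and 100
```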

Counting lines, words and characters: wc

Sometimes it can be useful to count the number of elements in a text file; use the utility wc ('word count') for this:


In [ ]:
wc short_textual_file.xyz

The output is read as

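    lines    words    bytes

By default the third number counts bytes, which equals the number of characters for plain ASCII text. You can also use the -l, -w and -m flags to show only the line, word or character count.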
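A tiny file with known contents makes the three numbers easy to verify by hand. This is only a sketch with made-up contents; the file holds plain ASCII, so its byte count and character count coincide. Reading the file through < instead of naming it as an argument simply keeps the file name out of the output.

```shell
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

wc "$sample"         # 2 lines, 3 words, 14 bytes, followed by the file name
wc -l < "$sample"    # 2
wc -w < "$sample"    # 3
wc -m < "$sample"    # 14
```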

What happens if you try to display the contents of a binary file?


In [ ]:
less binary_image_file

To 'See it anyway', hit 'y' for 'yes'.

A closer look at the list of files in a folder

List the contents of the notebooks-folder using the flag: -l ("minus el")

This page should help in figuring out the result. Note that command flags can be combined, e.g., -la.

Append a to the l-flag; which new items do you see?

Append h to the l-flag; what happens to the file size?
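That combined flags really are just shorthand can be checked by comparing the two spellings. The sketch below uses /usr only because it exists on both Linux and macOS; the variable names are arbitrary.

```shell
separate=$(ls -l -a /usr)     # flags given one by one
combined=$(ls -la /usr)       # the same flags, combined

if [ "$separate" = "$combined" ]; then
    echo "identical output"
fi
```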

Getting help: read the man-ual

Most command-line utilities have a so-called man-page:


In [ ]:
man ls
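Not every command is a separate program, though: some, like cd, are built into the shell itself and may not have a man-page of their own. The type command, sketched below, tells you which kind you are dealing with. The exact wording of its output varies between shells and systems.

```shell
type ls    # reports where ls comes from, typically a program file such as /bin/ls
type cd    # reports that cd is a shell builtin
# In bash, built-ins are documented by the shell itself: try 'help cd'
```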