The first paragraph of this introduction serves as our motivation:
The command-line interface, sometimes referred to as the CLI, is a tool into which you can type text commands to perform specific tasks—in contrast to the mouse's pointing and clicking on menus and buttons. Since you can directly control the computer by typing, many tasks can be performed more quickly, and some tasks can be automated with special commands that loop through and perform the same action on many files—saving you, potentially, loads of time in the process.
Windows users should move to the Linux virtual machine provided. Mac users can choose to remain on their 'native' platform and simply start Terminal.app, found in Applications/Utilities.
In addition to the links in this document, you can get inspiration from a provided excerpt of the book "Data Science at the Command Line" by Jeroen Janssens (Published by O’Reilly Media).
If you find yourself in need of a thorough introduction to the CLI, the freely-available book The Linux Command Line is highly recommended.
There is an equivalent explicit command to every operation you can perform using the mouse. Note that this is not to say that being explicit and literal would generally be easier than using the mouse! Use the right tool for the job.
As detailed on this page, the basic syntax of the CLI is:
In [ ]:
UTILITY -optional_flags ARGUMENT1 ARGUMENT2 ...
where the 'utility' performs the task you want, 'flags' modify its default behaviour, and 'arguments' determine the scope of the operation (e.g. which files or directories to operate on). Depending on the task, utilities take from zero to many arguments.
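As a minimal sketch of this pattern, the lines below call the ls utility (introduced further down) with zero, one and two arguments, with and without a flag. The scratch directory and the file names in it are hypothetical and exist only for illustration.

```shell
# Create a throwaway directory with two files (names are illustrative only).
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/.hidden"

# Utility only: no flags, no arguments (lists the current directory).
ls

# Utility + one argument: list the scratch directory instead.
ls "$demo"

# Utility + flag + argument: -a also shows names starting with a dot.
ls -a "$demo"

# Utility + flag + two arguments: each directory is listed in turn.
ls -a "$demo" /

# Clean up the scratch directory.
rm -r "$demo"
```

Without the -a flag, the file whose name starts with a dot is not listed.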
The word 'terminal' stems from the days of building-sized calculators (the 1950s): it was the single access point to the computer. In modern computers we can have any number of 'virtual' terminals open, as applications (for Mac and Linux users; Windows doesn't have a useful analogue). We issue our commands at the command prompt, which typically is the dollar sign ($). Any messages or command outputs from the computer to us will appear after the prompt.
Like most things in Linux, the command prompt can be customised. This is the case for both the VM and Mac users. For example, the 'working directory' (see below) is often shown by default.
We'll talk about computer programs and languages a little later. For now, suffice it to say that the specific set of commands available for use are determined by the shell: a program that interprets commands.
The shell communicates with the OS, which translates instructions to the kernel, which is yet another program layer responsible for hardware-interactions (CPU, memory, disk, screen, etc.). But we digress: the shell is the 'deepest' you are likely ever to have the need to go!
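We will be using bash to interact with the computer. This is the default shell on most Linux-based OSs, and it was the default on macOS until version 10.15 (Catalina), which switched to zsh. The commands used in this document work the same way in both shells.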
'Linux' is in fact the name of the kernel; many OSs (sometimes called 'distributions') are built on top of it, such as RedHat/Fedora, SuSe, Mint, etc. The OS X-line of Mac operating systems (renamed macOS) shares a common 'parent' with Linux: UNIX. The Darwin-kernel underneath macOS is like Linux's sibling (they took very different routes in their lives though...). For this reason, the experience of interacting with Linux and macOS computers at the shell-level is very similar, and academic software developed on Linux can usually be translated to macOS.
The command ls (short for 'list') shows the contents of the current directory.
In [ ]:
ls
But where am I? Use 'print working directory' to find out:
In [ ]:
pwd
To move around, use cd to 'change directory'. The utility takes a single argument, the $TARGET directory:
In [ ]:
cd $TARGET
When the Terminal-application starts, it will by default place you in your home folder.
To 'go up' one level in the directory hierarchy, use the 'dot-dot'-syntax (NB the space between cd and ..!)
In [ ]:
cd ..
/
Based on the output of the pwd-command, you can determine that the symbol separating levels of the directory hierarchy on both Linux and Mac is the forward slash: /. In contrast, Windows uses the backslash (\).
The term 'path' refers to the 'route' you would have to travel to reach any file/folder in the directory hierarchy.
There are three special locations that have their own notations:
. (dot)
.. (dot dot)
~/ (tilde slash)
'Dot' is short for 'current working directory', i.e., the output of pwd. 'Dot dot' is short for 'the parent of the current directory'. 'Tilde slash' is short for 'my home directory'.
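The three notations can be tried out as in the sketch below. It assumes a scratch directory created with mktemp; the directory name child is hypothetical.

```shell
# Start from a scratch directory so the example is repeatable.
demo=$(mktemp -d)
mkdir "$demo/child"
cd "$demo/child"

pwd        # the current working directory, ending in /child
cd .       # 'dot' is the current directory, so nothing changes
pwd
cd ..      # 'dot dot' moves up to the parent directory
pwd
echo ~/    # the shell expands 'tilde slash' to your home directory
```

The first two pwd calls print the same path; the third prints that path with the trailing /child removed.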
Each file system has a root: the highest level of the hierarchy of files and folders. This is denoted simply as /. (The Windows-equivalent would be 'C:\')
Utilities like ls and cd can take relative or absolute paths as arguments:
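A minimal sketch of the difference, assuming a hypothetical scratch hierarchy project/data created just for this purpose: a relative path is interpreted starting from the current working directory, whereas an absolute path starts at the root (/) and means the same thing from anywhere.

```shell
# Build a small scratch hierarchy: <scratch>/project/data
demo=$(mktemp -d)
mkdir -p "$demo/project/data"
cd "$demo/project"

# Relative path: 'data' is looked up inside the current directory.
cd data
pwd

# Absolute path: begins with / and does not depend on where we are.
cd "$demo/project"
pwd

# The same relative path fails from a directory that has no 'data' in it.
cd "$demo"
cd data || echo "no 'data' directory here"
```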
Hint: cd has a default value if you give it no arguments: it takes you home.
In [ ]:
cd
Linux/Mac strongly enforce access permissions: some directories and files are off-limits to you as a 'normal' user! For example, you cannot enter other people's home folders, and there are lots of files you can read, but not write to.
/root
In addition to users, Linux/Mac operates with a notion of groups. Execute the groups-command to find out which you belong to.
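How permissions, owners and groups show up in practice can be sketched with a scratch file. The example below assumes nothing beyond the standard mktemp, chmod and ls utilities; the file is created only for illustration.

```shell
# Create a scratch file and make it read-only for everyone (mode 444).
f=$(mktemp)
chmod 444 "$f"

# The first column shows the permissions: r = read, w = write, x = execute,
# given for the owner, the group, and everyone else, in that order.
# Columns three and four name the owning user and the owning group.
ls -l "$f"

# Only the permission string: no 'w' appears, so nobody may write.
ls -l "$f" | cut -c1-10

# Remove the scratch file (-f skips the prompt for write-protected files).
rm -f "$f"
```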
This is the most important feature shell-newcomers must learn! You never (ever) type full paths into the terminal. Instead, you let the shell 'guess' what you want by typing in the beginning of the path, and hitting the Tab-key!
Whenever you need to type out a location in an argument (for example, in the cd command), you don't have to type out the whole thing: the first few letters will do. Once you've typed three or four letters, press the tab key, and the command line will fill in the rest for you! For example, if you are in your home directory, and you type cd Desk and then press the tab key, the command line will automatically complete the command to read cd Desktop! You can also use this if you find yourself mistyping folder names: tab autocompletion will always fill it in correctly.
All the commands you type into the terminal (including the ones that didn't actually work) are saved into the command history. When at the prompt, you can hit the arrow up-key to go back to the previous command. Hit it again to get to the one before that, etc. Arrow down will take you forward again in the history.
Use the arrow up-key to recall a previous ls-command. Change the argument to something else and run it.
The following assumes that you placed the notebooks-folder of the course materials into the folder you shared with the VM. If you did something else, please use your host OS (Windows or Mac) to move notebooks so that it can be accessed by the guest OS (Linux) too.
VirtualBox 'mounts' shared folders into the /media-directory and prepends sf_ (for 'shared folder') to the name of the share. If you created the share exactly as instructed, you'll find it in: /media/sf_shared.
In [ ]:
file short_textual_file.xyz
American Standard Code for Information Interchange (ASCII) is a mapping between a byte value and a character such as 'a' or '@'.
What about all the French "e, è, é, ê, ë"s?!? Not to mention Cyrillic, Arabic, etc. characters? Enter Unicode, and specifically its most commonly used encoding: UTF-8. UTF-8 stores each character in one to four bytes, of which up to 21 bits carry the character's number. This allows 2,097,152 (2^21) distinct values, although Unicode itself is limited to 1,114,112 code points. 89.7% of websites were UTF-8 encoded at the time of writing.
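The difference between an ASCII character and a non-ASCII one can be made visible by counting bytes. In this sketch the two bytes of 'é' in UTF-8 are written as octal escapes so that the result does not depend on how the terminal or file itself is encoded.

```shell
# 'a' is an ASCII character: a single byte in UTF-8.
printf 'a' | wc -c

# 'é' (U+00E9) takes two bytes in UTF-8: 0xC3 0xA9, written here in octal.
printf '\303\251' | wc -c

# Show the raw bytes in hexadecimal.
printf '\303\251' | od -An -tx1
```

The first count is 1 and the second is 2; od prints the two bytes c3 and a9.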
In [ ]:
cat short_textual_file.xyz
For longer files, the less command is very handy.
In [ ]:
less long_textual_file.log
Some navigation commands include: Space (one page forward), b (one page back), /pattern (search forward for pattern) and q (quit).
In [ ]:
wc short_textual_file.xyz
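The output is read as
lines words bytes
You can also use the -l, -w and -c flags to show only one of the three outputs. The -m flag counts characters instead of bytes; the two counts can differ when the file contains multi-byte (e.g. non-ASCII UTF-8) characters.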
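A small sketch with known input makes the columns easy to check. The text is fed to wc through a pipe instead of a file, so no particular file is assumed to exist.

```shell
# Two lines, three words, 14 bytes of plain ASCII text.
printf 'one two\nthree\n' | wc

# One number at a time:
printf 'one two\nthree\n' | wc -l   # lines
printf 'one two\nthree\n' | wc -w   # words
printf 'one two\nthree\n' | wc -m   # characters
```

Because the input is pure ASCII, the character count equals the byte count here.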
In [ ]:
less binary_image_file
To 'See it anyway', hit 'y' for 'yes'.
List the contents of the notebooks-folder using the flag: -l ("minus el"). This page should help in figuring out the result. Note that command flags can be combined, e.g., -la.
Add a to the l-flag; which new items do you see?
Add h to the l-flag; what happens to the file size?
In [ ]:
man ls