In order to understand how to program a computer, it helps to learn how to think like a computer. At least a little bit. A lot of what we do in programming strips back the veneer of point-and-click friendliness of OSX or Windows so that we can interact with the computer in ways that require much less human input. The world will seem a bit less friendly in the short term, but you will gain the ability to work much more quickly and powerfully with the computer through code.
There are a huge number of videos on YouTube and resources accessible from Google that delve into more detail than we possibly can here about what is happening inside your computer.
If you've never really got to grips with what is happening inside a computer, then this TedED video would be a good way to get started because it helps to explain the basics of things like I/O and what actually happens when you click with the mouse on a button. In fact, you will see that we've used code to import the YouTube video in a way that requires me to do very little work and this is one of the strengths of programming: that someone else created the code to embed a YouTube video into a notebook (which is what this web page is) and all I need to do is know how to ask that code to find the video on the YouTube web site. Everything else happens automatically.
This next video is a little more technical and we don't really expect you to remember it, but it touches on a lot of really important concepts: binary numbers, Boolean logic, and how these basic building blocks are assembled into much more complex processes like adding numbers or, ultimately, manipulating data.
The really important thing to get from this last video is that computers are chaining together long sets of simple operations which always basically work out to 1 or 0, which is the same as True or False. This is Boolean logic and it is integral to computation and to data processing, but you should always keep in mind that a huge set of calculations are going on in your computer in an order specified by a set of rules: do 'A', then do 'B', then... When these rules become sufficiently complex (they become like long recipes!) they are called algorithms. And when they get so complicated that they are not easy to write down as a set of logical outputs, it's often easier to express in a more human-readable form... which is why we have programming languages.
But remember: finding the average of a set of numbers involves an algorithm (which, in a digital comupter, is based on lots of logical operations involving 1s and 0s). And calculating the probability that the lecturer won't show up to the first lecture also involves an algorithm, it's just that it's a much more complicated one unless you take matters into your own hands and arrange for an accident...
At some point in your exploration of programming you will need to learn how to read/write data from elsewhere on the computer or the Internet. At that point, for anyone who learned how to use a computer after about 1990, computers can quickly come to seem very unfriendly indeed! That's because when you are programming you will need to give the computer a lot more information about how to find, read, and write data.
When you use an app on your mobile phone or on your computer it will often save your files somewhere convenient so that you can find them again, but this isn't magic: a programmer made a choice about how to do this for you, and now that you're learning to program you have to make that choice.
Have you ever thought about how and where files are stored on your computer? Probably not, unless you were very, very bored. Unfortunately, when you start programming you do need to learn a bit more about this -- enough, at least, to tell Python where to find the file that you want it to read... though you can do much more than that!
A few starting principles:
Directories (a.k.a. folders) and files all have a unique location somewhere on your hard drive.
A directory (or folder) can contain directories and files. A simple file cannot contain a directory. Only special types of file such as a Zip archive can contain a folder.
The directory that sits at the bottom of the hierarchy (i.e. the one directory that is not in another directory) is known as the root directory.
The directory in which your settings and documents are saved (i.e. the stuff associated with your username) is known as the home directory.
A file must be stored in a directory (there are no root files).
We'll get to a video in a second, but on a Mac you can 'view' the path in the Finder simply by clicking on any finder window and typing Alt + Command + P
(or selecting View > Path
from the menu bar). This will show the current path in a strip along the bottom of the finder window. Pay attention to the Path view in the movie below.
Now, sooner or later you are going to have to learn how to use the Terminal (a.k.a. Shell) because it will make certain tedious tasks go much, much faster. It is also by far the best way to install Python libraries or other 'libraries' that you need to develop your code. Learning to the use the Terminal is also going to help a lot in learning to fix subtle problems with your program (e.g. checking that you are reading the right data).
So here's the same process of navigating from a directory called KCL Modules
down to the 2017-18 Geocomputation Teaching directory as you just saw above, but using the Terminal:
You'll notice that there were several seemingly cryptic commands -- we'll examine them in more detail below, but the important ones to note in this video are:
ls
lists the contents of the current directorycd
changes directoryWe always say that programmers are lazy and this is a good example of that: why write list
when we can write ls
or change directory
when we can write cd
? You obviously need to learn what those bits of laziness mean, but they can help a lot in speeding up your code!
You need to think of the terminal 'prompt' as having a 'location in' your computer: this works the same as clicking around through the Finder (or Windows Explorer) in that you have to move 'down' into sub-folders or 'up' into parent directories to see what's available at that 'level' of the drive.
So in the movie you've just watched:
KCL Teaching
that (you see at the end) sits under /Users/my.blurred.username/Documents/
.KCL Teaching
using ls
.cd
) into the directory called KCL Modules
(and note that, because there's a space in the directory name I had to write this as KCL\ Modules
... that's because the Terminal couldn't otherwise tell if we meant to move to a directory called KCL
and then run a command called Modules
; it's a long story).Undergraduate
-- if you're keeping track this now means that we're 'in' /Users/my.blurred.username/Documents/KCL\ Teaching/KCL\ Modules/Undergraduate/
For people who are used to just saving a document pretty much anywhere on their computer (or iPad or iPhone) and having it accessible via Spotlight/Search functions this can take a lot of getting used to, but remember that in learning to program we're gradually stripping away the bells and whistles that sit between you and what the computer's actually doing.
In and of itself the advantages of the Terminal might not be obvious, but we can also chain commands together to do several things in one go. You do not need to run the example below because it's for illustrative purposes:
curl http://bit.ly/2iIK9bA | head -1 > Header.csv
Note: If you do want to run the code above then you will probably need to do some prep-work:
xcode-select --install
. This installs the curl
utility along with a bunch of other useful applications.conda install posix
but unless you want to donate a Windows machine I can't test this.This command might seem really cryptic but we can break this problem down into steps (as I did when trying to remember how to do it!):
curl
-- this is a tool (hopefully installed on your computer!) that allows you to download a file from the internet using only the Terminal.head
-- this utility works with the top part of a file (tail
starts working with the end of a file)curl
together with head
using |
(known as a 'pipe') -- this tells the computer to pass the output of curl
to the input of head
(so head sees this as a file).head -1
means 'take only the first line of the file (so head -10
would take the first ten lines of the file).head
to a file called Header.csv
using the redirection command >
.So this one line of code allows you to download a file, extract the first line of the file (so that we can see what the variables are!), and then write it to a file on the local computer. Assuming that you didn't see any errors, you should now have a file called Header.csv
and can see this for yourself!
The point of this is that we are gaining in power at the cost of 'ease of use'. Doing this same task without using code would require: navigating to the web page, clicking the link to download the file, opening the file in a text editor or Excel, selecting and copying the first line, and then opening a new file, pasting in the copied data, and saving the file with a name and location! Easier the first time round, but much more work in the long run!
As you will experience, learning to program involves a lot of delayed gratification: you will need to invest quite a bit of time in learning to crawl before you can walk, and even more time in learning to walk before you can run. A well-designed programming language like Python can make it a little bit easier to learn each step, but it still won't make it easy.
We've said this before and we'll say it again (and again, and again) that you must think of this as learning a language: at first very little will make sense and you won't be able to say much more than 'hello, my name is X', but if you practice and really think about what you're doing it will get easier and easier, and faster and faster, to program.
Learning to navigate around your hard drive using the Terminal is a good case in point: why learn to do this when you could just dump all of your data into one directory and then never need to think about it again? Or navigate around by clicking on folders and files using the mouse?
Two reasons:
Because you will need to think logically and in an organised way about how you manage data when you start doing data analysis: you will almost never receive raw data that doesn't require some work; so if you save your raw data and your processed data in the same directory how can you be sure you've loaded the right file? It's much easier to have two folders -- raw
or source
, and clean
or output
-- and use Python to read in raw data from one directory and then write the processed data out to a different directory than it would be using Excel's File > Save As
. That's because the path is just text and you can easily work with it that way!
Because once you've learned how to use the path on your own computer, you can use the same techniques to navigate a web server or computer on the other side of the planet using URLs (which are just paths with some special 'sauce' that tell Python that the file isn't on your computer).
Here's an example that does the same thing (and then some) as the Terminal code above, but using Python code instead:
In [1]:
import csv
import requests as r
url = 'http://bit.ly/2iIK9bA'
data = r.get(url)
content = data.content.decode('utf-8').splitlines()
cr = csv.reader(content)
for row in cr:
if row[3] != 'Population':
print(row[1] + " has a population of " + "{:,}".format(int(row[3])))
Just to highlight what we've just done:
requests
,row
became something from which we could extract variables thanks to the magic of the csv
library that we import
,You can see what the source looked like here. And you'll note that the URL looks exactly like the path that we saw when we used the Terminal to navigate between directories.
So if you grasp the concept in one context, you can apply it in others. That is scalability!
Finally, there are a few 'special' paths that you need to know about when using the Terminal or Python:
/
is the root directory (so cd /
would take you to the 'root' of the hard drive)~
is the home directory (this is a shortcut: it's the same as typing cd /Users/myusername/
).
means the current directory (this is avoid ambiguity: cd ./myusername/Documents
would assume that in the current directory there is a sub-directory called Documents
under myusername
)..
means the next directory up (this is so that you don't have to type out cd /Users/myusername/Settings
just to go from Documents
'over' to Settings
; instead, you can use cd ../Settings/
.In terms of terminology: any path that starts with a /
is an absolute path because we are starting from the root directory; any path that starts with either ..
or .
is a relative path because it is starting from the current directory and is relative to where the Terminal or program 'is' now.
For historical reasons, Windows has long been just a bit different from Unix/Linux/Mac in how paths are specified: on Windows it has long been the case that the path separator was \
, not /
(:
also isn't allowed in file name on a Mac for historical reasons)!
So the following Unix/Mac path: /Users/myusername/Documents/
On a Windows machine would usually be: \Users\myusername\Documents\
If you remember that on a Unix/Mac we use \
to manage things like spaces in a directory name then you can see how it gets pretty confusing and complicated pretty quickly when you start to program... But in Python we can cope with both of these situations and others using the os
library (short for 'operating system'); see the examples below!
Remember: to turn code you need click in the code block below and then either type Ctrl+Enter
or hit the 'Run' button on the menu bar above.
In [2]:
import os
# Create a path by joining together the input arguments
print(os.path.join('Users','myusername','Documents'))
print(" ")
# Create an absolute path (starts with '/') and file names that contain spaces (we don't need to worry about spaces)
print(os.path.join('/','Users','myusername','Documents','My Work','Code Camp'))
print(" ")
# Create a relative path (starts with '..') and file names that contain spaces (we don't need to worry about spaces)
print(os.path.join('..','Documents','My Work','Code Camp'))
print(" ")
# 'Expand' the current user's home directory (~) into an absolute path
print(os.path.expanduser("~"))
print(" ")
# 'Expand' the current directory path (.) into an absolute path
print(os.path.abspath('.'))
print(" ")
That's quite a lot of new content to get your head around, so you might want to spend some time exploring your own computer using the Terminal (Shell/Power Shell on Windows) to get the hang of how paths work. All you'll need are: cd
, ls
, and pwd
(or their Windows equivalents and here for more) to get started!
You might recall that we previously said that all programmers are lazy, and understanding this laziness is the key to thinking that most Terminal (and Python) commands look like the result of monkeys hitting keys at random to being able to write code without constantly having to check Google. Let's revisit some of the commeands we've just seen:
Command | Short For... | What It Does |
---|---|---|
ls | list | List the contents of the directory I'm about to give you ('ls ~ ' lists the contents of your home directory; 'ls .. ' lists the contents of the parent directory; etc.) |
cd | change directory | Move to the path I'm about to give you ('cd ~ ' moves you to your home directory; 'cd .. ' moves you to the parent directory; etc.) |
pwd | print working directory | Show the path of the current current directory (i.e. where am I now?) |
head | Get the head of a file | Print the first few lines of the specified file (e.g. head somefile.csv ) |
tail | Get the tail of a file | Print the last few lines of the specified file (e.g. tail -n 2 somefile.csv ) |
clear | Clear the Terminal | When the Terminal looks really busy with junk you don't need just type clear |
exit | Exit the Terminal | To close an open Terminal window |
mv | Move a directory or file | Like dragging things in the Finder window (e.g. mv somefile.csv ../another_dir/ to move the file somefile.csv over to a directory named another_dir that is accessible from the parent directory of this one) |
cp | Copy a directory or file | Like dragging things in the Finder window (e.g. cp somefile.csv ../another_dir/ to make a copy of the file somefile.csv in a directory named another_dir that is accessible from the parent directory of this one) |
You can see how all of these commands are basically abbreviations/mnemonics that make it easy for programmers to be lazy, but productive: less time typing == more time thinking/doing.
The following individuals have contributed to these teaching materials:
The content and structure of this teaching project itself is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, and the contributing source code is licensed under The MIT License.
Supported by the Royal Geographical Society (with the Institute of British Geographers) with a Ray Y Gildea Jr Award.
This notebook may depend on the following libraries: None
In [ ]: