Notebook-2: Thinking Like a Computer

In order to understand how to program a computer, it helps to learn how to think like a computer. At least a little bit. A lot of what we do in programming strips back the veneer of point-and-click friendliness of OSX or Windows so that we can interact with the computer in ways that require much less human input. The world will seem a bit less friendly in the short term, but you will gain the ability to work much more quickly and powerfully with the computer through code.

What is a computer?

At it's most basic, a computer is a programmable device for performing calculations.

This is a kind of computer.

As is this.

There are a huge number of videos on YouTube and resources accessible from Google that delve into more detail than we possibly can here about what is happening inside your computer.

What's Going on Inside Your Computer?

If you've never really got to grips with what is happening inside a computer, then this TedED video would be a good way to get started because it helps to explain the basics of things like I/O and what actually happens when you click with the mouse on a button. In fact, you will see that we've used code to import the YouTube video in a way that requires me to do very little work and this is one of the strengths of programming: that someone else created the code to embed a YouTube video into a notebook (which is what this web page is) and all I need to do is know how to ask that code to find the video on the YouTube web site. Everything else happens automatically.

How a Computer Adds Numbers

This next video is a little more technical and we don't really expect you to remember it, but it touches on a lot of really important concepts: binary numbers, Boolean logic, and how these basic building blocks are assembled into much more complex processes like adding numbers or, ultimately, manipulating data.

The really important thing to get from this last video is that computers are chaining together long sets of simple operations which always basically work out to 1 or 0, which is the same as True or False. This is Boolean logic and it is integral to computation and to data processing, but you should always keep in mind that a huge set of calculations are going on in your computer in an order specified by a set of rules: do 'A', then do 'B', then... When these rules become sufficiently complex (they become like long recipes!) they are called algorithms. And when they get so complicated that they are not easy to write down as a set of logical outputs, it's often easier to express in a more human-readable form... which is why we have programming languages.

But remember: finding the average of a set of numbers involves an algorithm (which, in a digital comupter, is based on lots of logical operations involving 1s and 0s). And calculating the probability that the lecturer won't show up to the first lecture also involves an algorithm, it's just that it's a much more complicated one unless you take matters into your own hands and arrange for an accident...

Working Without Buttons

At some point in your exploration of programming you will need to learn how to read/write data from elsewhere on the computer or the Internet. At that point, for anyone who learned how to use a computer after about 1990, computers can quickly come to seem very unfriendly indeed! That's because when you are programming you will need to give the computer a lot more information about how to find, read, and write data.

When you use an app on your mobile phone or on your computer it will often save your files somewhere convenient so that you can find them again, but this isn't magic: a programmer made a choice about how to do this for you, and now that you're learning to program you have to make that choice.

Paths

Have you ever thought about how and where files are stored on your computer? Probably not, unless you were very, very bored. Unfortunately, when you start programming you do need to learn a bit more about this -- enough, at least, to tell Python where to find the file that you want it to read... though you can do much more than that!

A few starting principles:

  1. Directories (a.k.a. folders) and files all have a unique location somewhere on your hard drive.

  2. A directory (or folder) can contain directories and files. A simple file cannot contain a directory. Only special types of file such as a Zip archive can contain a folder.

  3. The directory that sits at the bottom of the hierarchy (i.e. the one directory that is not in another directory) is known as the root directory.

  4. The directory in which your settings and documents are saved (i.e. the stuff associated with your username) is known as the home directory.

  5. A file must be stored in a directory (there are no root files).

Paths in the Finder

We'll get to a video in a second, but on a Mac you can 'view' the path in the Finder simply by clicking on any finder window and typing Alt + Command + P (or selecting View > Path from the menu bar). This will show the current path in a strip along the bottom of the finder window. Pay attention to the Path view in the movie below.

Paths in a Terminal

Now, sooner or later you are going to have to learn how to use the Terminal (a.k.a. Shell) because it will make certain tedious tasks go much, much faster. It is also by far the best way to install Python libraries or other 'libraries' that you need to develop your code. Learning to the use the Terminal is also going to help a lot in learning to fix subtle problems with your program (e.g. checking that you are reading the right data).

So here's the same process of navigating from a directory called KCL Modules down to the 2017-18 Geocomputation Teaching directory as you just saw above, but using the Terminal:

You'll notice that there were several seemingly cryptic commands -- we'll examine them in more detail below, but the important ones to note in this video are:

  • ls lists the contents of the current directory
  • cd changes directory

We always say that programmers are lazy and this is a good example of that: why write list when we can write ls or change directory when we can write cd? You obviously need to learn what those bits of laziness mean, but they can help a lot in speeding up your code!

You need to think of the terminal 'prompt' as having a 'location in' your computer: this works the same as clicking around through the Finder (or Windows Explorer) in that you have to move 'down' into sub-folders or 'up' into parent directories to see what's available at that 'level' of the drive.

So in the movie you've just watched:

  1. I started the movie while in a directory called KCL Teaching that (you see at the end) sits under /Users/my.blurred.username/Documents/.
  2. I listed the files and directories in KCL Teaching using ls.
  3. I then changed directory (using cd) into the directory called KCL Modules (and note that, because there's a space in the directory name I had to write this as KCL\ Modules... that's because the Terminal couldn't otherwise tell if we meant to move to a directory called KCL and then run a command called Modules; it's a long story).
  4. I then changed directory again into a directory called Undergraduate -- if you're keeping track this now means that we're 'in' /Users/my.blurred.username/Documents/KCL\ Teaching/KCL\ Modules/Undergraduate/
  5. We carry on navigating down the hierarchy until we reach the 2017-18 teaching folder, at which point I print the working directory to show you where we've ended up and that it's the same location as we reached using the Finder in the other video.

For people who are used to just saving a document pretty much anywhere on their computer (or iPad or iPhone) and having it accessible via Spotlight/Search functions this can take a lot of getting used to, but remember that in learning to program we're gradually stripping away the bells and whistles that sit between you and what the computer's actually doing.

Chaining

In and of itself the advantages of the Terminal might not be obvious, but we can also chain commands together to do several things in one go. You do not need to run the example below because it's for illustrative purposes:

curl http://bit.ly/2iIK9bA | head -1 > Header.csv

Note: If you do want to run the code above then you will probably need to do some prep-work:

  • On a Mac you will need to run (from the Terminal!) the following command: xcode-select --install. This installs the curl utility along with a bunch of other useful applications.
  • On Windows you will probably need to run something like conda install posix but unless you want to donate a Windows machine I can't test this.
  • Once that's done you should type all of the above (or copy+paste) on to one new line and you should see the first line of this file (http://bit.ly/2iIK9bA) printed to the Terminal.

This command might seem really cryptic but we can break this problem down into steps (as I did when trying to remember how to do it!):

  1. curl -- this is a tool (hopefully installed on your computer!) that allows you to download a file from the internet using only the Terminal.
  2. head -- this utility works with the top part of a file (tail starts working with the end of a file)
  3. We 'glue' the output of curl together with head using | (known as a 'pipe') -- this tells the computer to pass the output of curl to the input of head (so head sees this as a file).
  4. head -1 means 'take only the first line of the file (so head -10 would take the first ten lines of the file).
  5. We then direct the output of head to a file called Header.csv using the redirection command >.

So this one line of code allows you to download a file, extract the first line of the file (so that we can see what the variables are!), and then write it to a file on the local computer. Assuming that you didn't see any errors, you should now have a file called Header.csv and can see this for yourself!

The point of this is that we are gaining in power at the cost of 'ease of use'. Doing this same task without using code would require: navigating to the web page, clicking the link to download the file, opening the file in a text editor or Excel, selecting and copying the first line, and then opening a new file, pasting in the copied data, and saving the file with a name and location! Easier the first time round, but much more work in the long run!

Delayed Gratification

As you will experience, learning to program involves a lot of delayed gratification: you will need to invest quite a bit of time in learning to crawl before you can walk, and even more time in learning to walk before you can run. A well-designed programming language like Python can make it a little bit easier to learn each step, but it still won't make it easy.

We've said this before and we'll say it again (and again, and again) that you must think of this as learning a language: at first very little will make sense and you won't be able to say much more than 'hello, my name is X', but if you practice and really think about what you're doing it will get easier and easier, and faster and faster, to program.

Learning to navigate around your hard drive using the Terminal is a good case in point: why learn to do this when you could just dump all of your data into one directory and then never need to think about it again? Or navigate around by clicking on folders and files using the mouse?

Two reasons:

  1. Because you will need to think logically and in an organised way about how you manage data when you start doing data analysis: you will almost never receive raw data that doesn't require some work; so if you save your raw data and your processed data in the same directory how can you be sure you've loaded the right file? It's much easier to have two folders -- raw or source, and clean or output -- and use Python to read in raw data from one directory and then write the processed data out to a different directory than it would be using Excel's File > Save As. That's because the path is just text and you can easily work with it that way!

  2. Because once you've learned how to use the path on your own computer, you can use the same techniques to navigate a web server or computer on the other side of the planet using URLs (which are just paths with some special 'sauce' that tell Python that the file isn't on your computer).

Here's an example that does the same thing (and then some) as the Terminal code above, but using Python code instead:


In [1]:
import csv
import requests as r

url = 'http://bit.ly/2iIK9bA'
data = r.get(url)
content = data.content.decode('utf-8').splitlines()
cr = csv.reader(content)

for row in cr:
    if row[3] != 'Population':
        print(row[1] + " has a population of " + "{:,}".format(int(row[3])))


Greater London has a population of 9,787,426
Greater Manchester has a population of 2,553,379
West Midlands has a population of 2,440,986
West Yorkshire has a population of 1,777,934
Glasgow has a population of 1,209,143
Liverpool has a population of 864,122
South Hampshire has a population of 855,569
Tyneside has a population of 774,891
Nottingham has a population of 729,977
Sheffield has a population of 685,368
Bristol has a population of 617,280
Belfast has a population of 579,127
Leicester has a population of 508,916
Edinburgh has a population of 482,005
Brighton and Hove has a population of 474,485
Bournemouth/Poole has a population of 466,266
Cardiff has a population of 447,287
Teesside has a population of 376,633
Stoke-on-Trent has a population of 372,775
Coventry has a population of 359,262
Sunderland has a population of 335,415
Birkenhead has a population of 325,264
Reading has a population of 318,014
Kingston upon Hull has a population of 314,018
Preston has a population of 313,322
Newport has a population of 306,844
Swansea has a population of 300,352
Southend-on-Sea has a population of 295,310
Derby has a population of 270,468
Plymouth has a population of 260,203
Luton has a population of 258,018
Farnborough/Aldershot has a population of 252,397
Medway Towns has a population of 243,931
Blackpool has a population of 239,409
Milton Keynes has a population of 229,941
Northampton has a population of 215,963
Barnsley/Dearne Valley has a population of 223,281
Norwich has a population of 213,166
Aberdeen has a population of 207,932
Swindon has a population of 185,609
Crawley has a population of 180,508
Ipswich has a population of 178,835
Wigan has a population of 175,405
Mansfield has a population of 171,958
Oxford has a population of 171,380
Warrington has a population of 165,456
Slough has a population of 163,777
Peterborough has a population of 163,379
Cambridge has a population of 158,434
Doncaster has a population of 158,141
Dundee has a population of 157,444
York has a population of 153,717
Gloucester has a population of 150,053
Burnley has a population of 149,422
Basildon has a population of 144,859
Grimsby has a population of 134,160
Hastings has a population of 133,422
High Wycombe has a population of 133,204
Thanet has a population of 125,370
Accrington/Rossendale has a population of 125,059
Burton-upon-Trent has a population of 122,199
Colchester has a population of 121,859
Eastbourne has a population of 118,219
Exeter has a population of 117,763
Cheltenham has a population of 116,447
Paignton/Torquay has a population of 115,410
Lincoln has a population of 114,879
Chesterfield has a population of 113,057
Chelmsford has a population of 111,511
Basingstoke has a population of 107,642
Maidstone has a population of 107,627
Bedford has a population of 106,940

Just to highlight what we've just done:

  1. We retrieved a CSV file from somewhere on this planet using requests,
  2. We 'parsed' it so that each row became something from which we could extract variables thanks to the magic of the csv library that we import,
  3. We skip the first row because it's a 'header' row and doesn't contain data,
  4. We print out the 2nd and 4th columns of the CSV file.

You can see what the source looked like here. And you'll note that the URL looks exactly like the path that we saw when we used the Terminal to navigate between directories.

So if you grasp the concept in one context, you can apply it in others. That is scalability!

Special Paths

Finally, there are a few 'special' paths that you need to know about when using the Terminal or Python:

  • / is the root directory (so cd / would take you to the 'root' of the hard drive)
  • ~ is the home directory (this is a shortcut: it's the same as typing cd /Users/myusername/)
  • . means the current directory (this is avoid ambiguity: cd ./myusername/Documents would assume that in the current directory there is a sub-directory called Documents under myusername)
  • .. means the next directory up (this is so that you don't have to type out cd /Users/myusername/Settings just to go from Documents 'over' to Settings; instead, you can use cd ../Settings/.

In terms of terminology: any path that starts with a / is an absolute path because we are starting from the root directory; any path that starts with either .. or . is a relative path because it is starting from the current directory and is relative to where the Terminal or program 'is' now.

Windows vs Everyone Else

For historical reasons, Windows has long been just a bit different from Unix/Linux/Mac in how paths are specified: on Windows it has long been the case that the path separator was \, not / (: also isn't allowed in file name on a Mac for historical reasons)!

So the following Unix/Mac path: /Users/myusername/Documents/

On a Windows machine would usually be: \Users\myusername\Documents\

If you remember that on a Unix/Mac we use \ to manage things like spaces in a directory name then you can see how it gets pretty confusing and complicated pretty quickly when you start to program... But in Python we can cope with both of these situations and others using the os library (short for 'operating system'); see the examples below!

Remember: to turn code you need click in the code block below and then either type Ctrl+Enter or hit the 'Run' button on the menu bar above.


In [2]:
import os
# Create a path by joining together the input arguments
print(os.path.join('Users','myusername','Documents'))
print(" ")
# Create an absolute path (starts with '/') and file names that contain spaces (we don't need to worry about spaces)
print(os.path.join('/','Users','myusername','Documents','My Work','Code Camp'))
print(" ")
# Create a relative path (starts with '..') and file names that contain spaces (we don't need to worry about spaces)
print(os.path.join('..','Documents','My Work','Code Camp'))
print(" ")
# 'Expand' the current user's home directory (~) into an absolute path
print(os.path.expanduser("~"))
print(" ")
# 'Expand' the current directory path (.) into an absolute path
print(os.path.abspath('.'))
print(" ")


Users/myusername/Documents
 
/Users/myusername/Documents/My Work/Code Camp
 
../Documents/My Work/Code Camp
 
/home/jovyan
 
/home/jovyan
 

That's quite a lot of new content to get your head around, so you might want to spend some time exploring your own computer using the Terminal (Shell/Power Shell on Windows) to get the hang of how paths work. All you'll need are: cd, ls, and pwd (or their Windows equivalents and here for more) to get started!

Power Terminal Features

You might recall that we previously said that all programmers are lazy, and understanding this laziness is the key to thinking that most Terminal (and Python) commands look like the result of monkeys hitting keys at random to being able to write code without constantly having to check Google. Let's revisit some of the commeands we've just seen:

Command Short For... What It Does
ls list List the contents of the directory I'm about to give you ('ls ~' lists the contents of your home directory; 'ls ..' lists the contents of the parent directory; etc.)
cd change directory Move to the path I'm about to give you ('cd ~' moves you to your home directory; 'cd ..' moves you to the parent directory; etc.)
pwd print working directory Show the path of the current current directory (i.e. where am I now?)
head Get the head of a file Print the first few lines of the specified file (e.g. head somefile.csv)
tail Get the tail of a file Print the last few lines of the specified file (e.g. tail -n 2 somefile.csv)
clear Clear the Terminal When the Terminal looks really busy with junk you don't need just type clear
exit Exit the Terminal To close an open Terminal window
mv Move a directory or file Like dragging things in the Finder window (e.g. mv somefile.csv ../another_dir/ to move the file somefile.csv over to a directory named another_dir that is accessible from the parent directory of this one)
cp Copy a directory or file Like dragging things in the Finder window (e.g. cp somefile.csv ../another_dir/ to make a copy of the file somefile.csv in a directory named another_dir that is accessible from the parent directory of this one)

You can see how all of these commands are basically abbreviations/mnemonics that make it easy for programmers to be lazy, but productive: less time typing == more time thinking/doing.

Credits!

Contributors:

The following individuals have contributed to these teaching materials:

License

The content and structure of this teaching project itself is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, and the contributing source code is licensed under The MIT License.

Acknowledgements:

Supported by the Royal Geographical Society (with the Institute of British Geographers) with a Ray Y Gildea Jr Award.

Potential Dependencies:

This notebook may depend on the following libraries: None


In [ ]: