There's no reason you'd know this yet, but the web page you're looking at is also known as a 'Jupyter notebook' -- it's why you can 'run' code as part of the web page. Check out this example by clicking in the box (next to the In [ ]) and hitting the 'Run' button in the toolbar above, or by pressing Ctrl+Return (the Control and Return keys at the same time).
In [ ]:
print('Hello world')
If all has gone well you should have seen Hello world appear on a line all on its own. That was Python code running in a notebook. Here's some more code:
In [ ]:
print( (1+2+3+4)/4 )
You can check that this gave you the right answer at the bottom of this page.
Anyway, because of their history, some people will call these "IPython notebooks", others will call them "Jupyter notebooks", and some will just stick with "notebooks". They are all the same thing.
Here's some more code to run as proof that this is actual code:
In [ ]:
import sys
print(sys.version)
You don't need to understand all of that output; the point is that this is Python, and we can do anything in a notebook that we could in a program.
Rather than throw you in at the deep end with examples taken from computer science classes, for Code Camp we've tried to give you geographical examples whenever possible in the hopes that the early examples will seem a little less abstract and a little more relevant to your needs. Of course, the early examples are also very basic so the payoff might not be obvious right away, but trust us: if you stick with it you will start to change your thinking about geography as a discipline and about the power of computers to transform everything.
So, before we do any more coding, let's think about why we might want to use this technology in geography.
We live in a world transformed by big (geo)data: from Facebook likes and satellites, to travel cards and drones, the process of collecting and analysing data about the world around us is becoming very, very cheap. Twenty years ago, gathering data about the human and physical environment was expensive, but now a lot of it is generated as the ‘exhaust’ of day-to-day activity: tapping on to the bus or train, taking photos (whether from a satellite, drone, or disposable camera), making phone calls, using our credit cards, and surfing the web. And that's before you start looking at the Terabytes of data being generated by satellites, air quality and river flow sensors, and other Earth Observation Systems!
As the costs of capturing, curating, and processing these data sets falls, the discipline of geography is changing. You face a world in which many of the defining career options for geographers with basic quantitative skills will either no longer exist, or will have been seriously de-skilled. So much can now be done through a web browser (e.g. CartoDB) that specifying ‘Knowledge of ArcGIS’ is becoming superfluous; not because geo-analysis jobs are no longer in demand or no longer done -- in fact, they are more vital than ever -- but because the market for these skills has split in two: expensive, specialist software is being superseded by simple, non-specialist web-based tools on the ‘basic’ side, and by customised code on the 'advanced' side.
It is for these reasons that terms like 'geocomputation', 'computational geography' and 'geographic data science' are back in vogue. After a period in which GIS was front-and-centre for many geographers with an interest in spatial data (as well as for many geographers who objected to the shortcomings of quantitative approaches), the availability of data and advanced computational techniques (including Machine Learning), together with the 'discovery' by other disciplines of the role of geography in 'big data' processes, has created a need for a 'new' (or old, depending on your view) type of geographer: one able to reason much more directly through code while remaining rooted in a critical geographic tradition that is aware of the shortcomings (and opportunities) of data.
Computational approaches -- which is to say, approaches to geography using computers executing commands written in programming code -- differ in important ways from the quantitative skills taught in traditional geography ‘methods’ classes: computational geography is underpinned by algorithms that employ concepts such as iteration and recursion, and we use these to tackle everything from a data processing problem to an entire research question.
For example, Alex Singleton’s OpenAtlas (available for free from the Consumer Data Research Centre) contains 134,567 maps. Alex designed and wrote a script to iterate over the Census areas (i.e. to ‘visit’ each area in turn when creating a map), and to recurse into smaller sub-regions from larger regions (i.e. to keep drilling down into smaller and smaller geographies) in order to generate maps at, literally, every conceivable scale. Then he let the computer do the ‘boring bit’ of actually creating each and every map.
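In code, 'iterate' and 'recurse' look something like the sketch below. This is not Alex's actual script, just an illustration with made-up region names: the function 'makes' a map of a region and then calls itself on each sub-region, drilling down until there are no smaller geographies left.
In [ ]:
# A nested dictionary standing in for regions and their sub-regions (made up!)
regions = {
    'South East': {
        'London': {'Camden': {}, 'Hackney': {}},
        'Kent': {}
    }
}

def map_all(name, sub_regions):
    print('Made a map of ' + name)   # the 'boring bit', done by the computer
    for sub_name, subs in sub_regions.items():
        map_all(sub_name, subs)      # recurse into each sub-region

for name, subs in regions.items():   # iterate over the top-level regions
    map_all(name, subs)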
Thinking algorithmically requires students and professionals to deal with abstraction: we don’t want to define how each analysis should work – or how each map should look – rather, we want to specify a set of rules about how to select and display data on a map, and then let the computer make them all for us. In this way of working it’s not really any more work to create 500 or 5,000 maps than it is to create 5 because we’ve already told the computer how to make useful maps.
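Here's a minimal sketch of that idea. Note that make_map is a hypothetical function invented purely for illustration (we haven't shown you how to draw a real map yet); the point is that the loop neither knows nor cares whether it is given 5 areas or 5,000.
In [ ]:
# A hypothetical map-making function: the 'rules' live here, written once
def make_map(area_name):
    # ...select this area's data, pick colours, add a legend, save the file...
    print('Made a map of area ' + str(area_name))

# Five maps or five thousand: the code is the same, only the list changes
for area in range(1, 6):   # try changing 6 to 5001!
    make_map(area)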
Here's another way to think about it:
An algorithm is like a recipe. It takes "inputs" (the ingredients), performs a set of simple and (hopefully) well-defined steps, and then terminates after producing an "output" (the meal).
This article also goes on to make some interesting points about AI and deep learning that are well worth a read, but for our purposes the recipe is the important bit: how would you break your problem down into steps like the ones you'd see in a recipe?
Learning to think this way is hard work: the first time I try a new recipe I really don't know how things are going to taste. Similarly, the first time I use an algorithm to make a map or solve a problem I usually don't actually know exactly how my maps are going to look until after I've made them. The difference from the 'normal', non-computational way of working is that I make a few changes to my code and then just run it again. And again... as many times as I need to in order to get what I want. I can keep changing the recipe until I get it just right.
However, trying the same recipe again and again and again also sounds like hard work! Wouldn't it be faster to just click and choose what you want the computer to do in SPSS or ArcMap? Well, yes and no. The two advantages of doing this with code over pointing-and-clicking are: 1) your solution is transferable; and 2) thinking 'like a programmer' is also about problem-solving, and that also transfers very nicely to the 'real world' of employment.
Why do we say this?
Programming solutions are transferable because you aren't just solving one problem, you are solving classes of problems. In the same way that many recipes build on the same basic ingredients (sometimes adding something new for added 'spice'), many applications use the same basic ingredients: it's how they're put together in new ways that leads to new outputs. It's a lot like Lego.
Thinking like a programmer also translates well because you are learning to deal with abstraction. Yes, the details of a problem matter (just as ignoring cultural differences between two countries can matter), but it's important to be able to break a really big, messy, complex problem down into smaller, tidier, more comprehensible bits that you can tackle. Programmers deal with this every day, so they tend to develop important skills in understanding and dealing with the kinds of practical challenges that you'll face throughout your career.
Here's another useful bit of insight:
The best way [of solving problems] involves a) having a framework and b) practising it.
Problem-solving skills are almost unanimously the most important qualification that employers look for... more than programming languages proficiency, debugging, and system design...
— Hacker Rank (2018 Developer Skills Report)
You really should read the article (it's not very long) but here are the key points:
If you don't take our word for it, how about taking Richard Feynman's word on it?
If you can’t explain something in simple terms, you don’t understand it.
Once I've got a solution to my current problem, I can take that code and apply it to a new problem. Or a new case study. Or, I can post it online and let others build off of my work to tackle problems that I've not even considered! Giving away my code might seem like a bad idea, but think about this: in a world of exciting research questions, are you going to be able to tackle every single one? Your own work already builds off of code that other people gave away (the Mac OS, Linux, QGIS, Python, etc.)... perhaps you should give something back to the community? Not just because it's a nice thing to do, but because people will find out about you through your code. And those people might be in a position to offer you a job, or they might approach you as a collaborator, or they might point someone else with an interesting opportunity in your direction because you have built a reputation as a 'contributor'.
A big gap is opening up between the stuff that can be done by pushing buttons (which no longer even really requires geographical training) and the 'cutting edge'. There are many pieces that argue this case, but here are a few to start with:
There are many good reasons for geographers to learn to code, but let's start with some good general reasons why you should learn to program a computer even if you never use it to make a map or complete a bit of spatial analysis.
You should learn how to program a computer because it teaches you how to think.
And here is a useful perspective on whether or not learning to code is hard:
If you don't have a reason for learning to code outside of trying to make lots of money, that's not a very long-term passion... but when you have an idea or a problem that you're passionate about solving, then that's why we keep on going... but if you're asking whether you need to have an understanding of complex math or logic skills, the answer is no.
So while 'making money' is (often) a nice outcome of learning to code, having a passion for what you want to do with your code is what's going to get you through the worst parts of the learning curve. You also need to be realistic: becoming a professional programmer is something that happens over many years; you probably won't just take a couple of classes and then go out into the world saying "I'm a programmer."
And, no, you do not need to know advanced maths in order to learn how to code: you need to be able to think logically and to reframe your problems in ways that align with the computer.
In a practical context we think that the benefits of learning to code fall into three categories:
Often, the payoff for coding the answer to a problem instead of just clicking through the options in SPSS or Arc can seem a long way away. It's like learning a new language: you spend a lot of time asking for directions to the train station, or asking whether someone had a nice breakfast, before you can start work on the novel or the business case. But the payoff is there if you stick with it!
Another useful idea comes from Larry Wall (the man with the strong 'tache game below!), who created a programming language called Perl. Larry said that programmers had three virtues: Laziness, Hubris, and Impatience.
Some of the reasons that these are virtues in programming (but not in your studies!) are as follows:
Hint: you'll see a lot of laziness when you start trying to write code. Programmers don't like writing remove when they could just write rm, nor do they like writing define when they could just write def. Keep an eye out for these abbreviations as they can be pretty daunting at first.
Larry also pointed out that these virtues had three mirror-image false virtues:
There's a lot more thinking on this here: http://blog.teamtreehouse.com/the-programmers-virtues
In the early days of computing, programs weren't even written in English (or any other human language), they were written in Assembly/Machine Code. One of the people who thought that was crazy was this rather impressive Rear Admiral:
Grace Hopper felt that applications should be written in a way that more people could understand; this would be good for the military, but it would also be good for business and society in general. For her efforts, she is now known as the Mother of COBOL (the COmmon Business Oriented Language), a language that is still in (some) use today.
The best way to be a 'good' programmer is to know when the computer can help you and when it will just get in the way. A computer cannot 'solve' a problem for you, but it can help you to find the answer when you've told it what to look for and what rules to use in that search. A computer can only do exactly what you tell it to do, so if you don't know what to do then the computer won't either.
One of the founders of computing, Charles Babbage, had this to say:
On two occasions I have been asked, — "Pray, Mr Babbage, if you put into the machine wrong figures, will the right answers come out?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
— Passages from the Life of a Philosopher (1864), ch. 5, "Difference Engine No. 1"
Modern programmers call this 'garbage in, garbage out', or GIGO for short.
The single most important thing that you can learn is how to frame your problem in a way that you can communicate to a computer. Crucially, the real power of the computer isn't figuring out how to tell it to add 1, 2, 3, 4 together and calculate the mean (average); it's figuring out how to tell it to add any possible set of numbers together and work out the mean. It's when you start to see ways to do this for whole sets of problems that are hard for you that you know you've started to understand the language.
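Here's a taste of what that looks like in practice: a short function that works out the mean of any list of numbers, not just 1, 2, 3, 4. Don't worry if the def syntax means nothing to you yet; the point is that the 'recipe' is written once and then works for any set of ingredients.
In [ ]:
def mean(numbers):
    # Add up all the numbers and divide by how many there are
    return sum(numbers) / len(numbers)

print(mean([1, 2, 3, 4]))           # 2.5
print(mean([10, 20, 30, 40, 50]))   # works for any set of numbers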
So these notebooks are intended to help you start learning a new language -- a programming language -- but you should always remember this: you're not stupid if you don't know how to explain things to the computer so that it can help you find the answer. You're learning the basics of how to communicate with computers; there are only two things that are silly: the first is expecting to be able to run before you can walk; the second is copying and pasting answers without trying to understand why they are answers.
Actually, there is a third silly thing: not asking for help. In the same way that learning a new human language is easier if you immerse yourself in the culture and make friends with some native speakers, learning a new computer language is easier if you immerse yourself in the culture of computing and make friends with some more proficient speakers. But just as your French- or Chinese-speaking friends will get tired of answering your questions if you don't make it obvious that you're working hard to learn, most programmers won't be very impressed if you just ask for 'the answer' and then come back two days later with the same question.
There are obviously many ways that you can calculate the mean (also known as the average if your maths is a little rusty): in your head, using pencil and paper, on a calculator, in Excel... and, of course, using code! For a small set of simple numbers, using your brain is going to be a lot faster than typing it into a calculator or computer.
In the area immediately below this paragraph you should see something like "In [ ]". On the right of this is an empty box into which you can type computer code (which we will call a 'code cell'). We can use Python like a calculator, typing on a keyboard instead of a calculator's keypad.
For example, to calculate 2 plus 2 using Python, type 2 + 2 in the code cell and run the code.
HINT: To run, ensure your cursor is in the code cell and click the 'Run' button in the toolbar above (the sideways-pointing triangle). Or, again with the cursor in the code cell, you can also type Ctrl+Return on the keyboard (that's the Control button and the Return button simultaneously).
In [ ]:
2 + 2
If everything has gone well then you should see something like "Out [ ]" appear with the answer to the formula 2 + 2. Hopefully it is 4!
That was an easy one; faster to calculate in our head than use Python.
But quick, what's the average (mean) of 1, 2, 3, and 4?
A little harder, huh? So type your 'formula' to calculate the mean of 1, 2, 3, 4, then click the 'play' button on the toolbar at the top of the window to run your next piece of Python code!
In [ ]:
HINT: Your equation should include the four numbers above with some + symbols, the brackets ( and ), a / symbol, and another number.
Did you get 2 or 2.5?
HINT: If you are totally at a loss for what to type in the code cell, just click the "show solution" button.
Even if you typed the equation correctly, you may not have obtained the 'correct' answer (in Python 2 the result of (1+2+3+4)/4 is 2, whereas in Python 3 the result is 2.5). If you got the 'right' answer (2.5) then pat yourself on the back, but be warned that that was mostly luck because Python 3 is trying to be helpful. If you got the 'wrong' answer, that does not mean that you have done something wrong: computers do exactly what you tell them to do, and many of the problems you'll encounter in Code Camp and beyond are a result of the computer mindlessly and stupidly doing exactly what you told it to do, and not what you meant for it to do!
Or as some wag once put it:
Computers are tools for making very fast, accurate mistakes.
In [ ]:
(1 + 2 + 3 + 4) / 4
If you are using Python version 3 (which you should be), you should have got the result 2.5. If you got the result 2, you are using Python version 2 and will need to check (and change) your installation of Python (go back to the Python Setup page, or try posting a query on Slack for help).
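If you're curious about what changed between the two versions: in Python 3 the / operator always performs 'true' division and gives you a decimal answer, while a separate // operator performs the whole-number division that Python 2's / used to do. You can see both in action:
In [ ]:
print((1 + 2 + 3 + 4) / 4)    # true division: 2.5
print((1 + 2 + 3 + 4) // 4)   # whole-number (floor) division: 2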
This is really important: just because you told the computer to do something that you eventually realise was 'wrong' (you may even feel silly) does not mean that you are stupid.
Did you ever try to learn a foreign language? Did you expect to be fluent after a couple of classes? Did you accidentally tell your host that you were 'pregnant' when you were just 'embarrassed'? Assuming that you had a realistic expectation of how far you'd get with French, Chinese, or English in your first couple of years, then you probably figured it'd be a while before you could hold a conversation with someone else.
So why would you expect to sit down at a computer and be able to hold a conversation with it (which is another way of thinking about what coding is) after reading a few pages of text and watching a YouTube video or two? You will need to give it time, you will need to get used to looking at the documentation, you will need to ask for help (this seems like a good time to introduce Stack Overflow), and you will need to persevere.
Your language class (assuming that you took one) probably had a 'lab' where you practised your new language, and you probably made a lot of mistakes when you were getting started. It's the same for programming: the reason you got a 'silly' answer is that we haven't taught you how to ask the right question yet! For a language like Python, "4" is not always the same as "4.0"... and sometimes the way ahead is unclear. But don't worry if you can't get the right answer yet: how to 'talk numbers' is the main topic of the next notebook, and we'll show you the answer at the end of this one.
So, we want you to remember that there are no stupid questions when it comes to programming. We have all been lost at one point or another! Your lecturers and instructors frequently still ask for help (often by asking Google!), it's just that we do it for harder questions. And that's only because we have had a lot more practice in the language of programming. So the only silly thing you can do in this course is to assume that you can speed through the questions and don't have to practice or think through the technical aspects of coding.
We'll say it again: there's one other silly thing that you can do, and that's not asking for help at all. If you've been banging your head against the computer for five or ten minutes trying to get something to work and it just isn't working then try explaining it to a friend or the instructor! Depending on the question we might: give you the answer and explain why it's the answer; give you a really annoying partial answer because it's something that we want you to figure out for yourself; or say "That's a really tough question, we'll have to look up the answer". You never know until you ask.
There are web sites that will give you answers (in fact, we're going to point you to some of them) but if you don't expend any effort in trying to understand how the code works, or if you just copy the answer off of your friend, that's the same as assuming you'll learn a foreign language just because you're sitting next to a friend who is taking the same language course! That is also silly.
You will need to practise in order to progress. You don't learn French or Chinese by practising in the language lab just once a week, and you won't learn to program a computer by only practising during the practicals once a week.
What makes a computer potentially better than a calculator (or your brain) is that a computer isn't daunted by having to count lots of numbers and it doesn't need you to input each number individually! The computer can also do things like:
And it can do all of this in a matter of milliseconds! It can also do the same for 3,000 cities just as easily; sure, it'll take a little bit longer, but it's the same basic code.
In other words, code is scalable in a way that brains and calculators are not and that is a crucial difference.
Here's a trivial example of when computers start to get better and faster than brains:
In [ ]:
(23495.23 + 9238832.657 + 2 + 12921)/4
Remember: click in the cell and then hit Ctrl+Return to 'run' the cell and get the answer.
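And here's the scalability point in miniature: the same two lines compute the mean of four thousand numbers just as happily as four. The range function below is simply a quick way to generate a long list of stand-in numbers; real data would work the same way.
In [ ]:
numbers = list(range(1, 4001))       # the numbers 1, 2, 3, ... 4000
print(sum(numbers) / len(numbers))   # their mean, computed in milliseconds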
In these notebooks we will be using the Python programming language. As with human languages, there are many programming languages in the world, each with their own advantages and disadvantages, and each with their own vocabulary (allowed words) and grammar (syntax). We use Python.
Alongside Python, the other language often mentioned by people doing data-led research is R: it's the one that many of your lecturers and many other scientists use in their work. There's a great deal of debate about the relative merits of Python and R but, for our purposes, both languages can help us to undertake geographical analysis. That is, in fact, the premise of this entire course!
So why have we chosen to use Python here? Of the two languages, we think that Python has some specific advantages:
However, if you have been told R is the way to go then don't worry, the concepts covered here still translate. And many of the contributors to these notebooks use both languages... it just depends on the problem.
Python was invented by Guido van Rossum in the late 1980s and he continues in the role of 'benevolent dictator' to this day, which means that he (and some other very smart people) try to ensure that the language continues to meet the basic goals of:
So while Python is not a language that enables the computer to make calculations the fastest (C and C++ are faster), nor is it the safest (you wouldn't use it to fly a rocket to Mars), it is a very readable, learnable and maintainable language.
So if you want to learn to code, to do 'data science', or build a business, Python is a great choice.
The points above are also made in Python In A Nutshell by Martin Brochhaus which you may find interesting and useful to accompany your learning of Python.
What are computers good at?
What are computers currently still bad at?
There is a long-standing contest, called the Turing Test in honour of the famous computer pioneer, that demonstrates this difference rather nicely: a computer passes the Turing test if it can fool a person into thinking that they're talking to another person. Some people have claimed that if a computer can really pass the Turing Test by keeping up a conversation of indefinite length on any range of topics then we'll have to declare that machines have become full AIs (Artificial Intelligences). To put it another way: if it sounds like a human and responds like a human... then is it a human?
Perhaps fortunately for us, although computers are getting a lot better at holding up their end of the conversation, they still seem to have a hard time fooling anyone for very long. In contrast, bigger and better computers have now beaten the best humans at Chess and Go, and are being used to help us understand earthquakes and climate change on a huge scale. Here, computers can do billions -- or trillions -- of calculations a second to work out that if 'A' happens then 'B' is the next most likely thing to happen, and so on and so on.
The difference is that games like Go and Chess have well-understood rules, as (ultimately) do natural processes like climate change and earthquakes. Chess is 'easier' for a computer than Go because a big enough computer can work out every possible chess move and pick the best one, whereas it can't do that for Go and so has to make 'choices' based on incomplete information. Earthquakes have even more complex 'rules', but as far as we know they still follow some set of rules dictated by physics and chemistry.
People, however, don't use the same unchanging rules in conversation. Yes, conversations have norms, unless you're using an online comment forum where it's normal to start a conversation by asking someone if they're an idiot, but people don't just 'play games' within the rules, they actually play with the rules themselves in a way that computers find very, very hard to follow. Think of sarcasm: you say one thing but it means exactly the opposite. And if it's delivered deadpan then sometimes even people have trouble knowing if you're being sincere!
That's why AI of the sort you might have seen in 2001 or Blade Runner has been twenty years away for the last sixty years! Recently, computers have been getting better and better at doing really difficult things, but it's usually still in a narrow area where we understand the rules and we normally need to spend a lot of time training the computer.
Turing, A. (1950), 'Computing Machinery and Intelligence', Mind LIX(236): 433–460. doi:10.1093/mind/LIX.236.433. ISSN 0026-4423.
The following individuals have contributed to these teaching materials:
The content and structure of this teaching project itself is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, and the contributing source code is licensed under The MIT License.
Supported by the Royal Geographical Society (with the Institute of British Geographers) with a Ray Y Gildea Jr Award.
This notebook may depend on the following libraries: None