Like R and MATLAB, Python is currently very popular for data science work, but unlike those languages it is also a powerful general-purpose programming language, so it is well suited to very different activities such as creating dynamic websites and bioinformatics. It is named after Monty Python, a surreal comedy group (see this clip for a classic Monty Python sketch).
Here are some reasons why Python is extensively used by data scientists:
TODO: INTRO SENTENCE
TODO: FINISH THE ABOVE USING PREVIOUS SYLLABUS BELOW
lists (e.g. a sequence of values from a sensor); lists; list; dictionaries; sets, tuples (brief); ranges, skipping steps and leaving early; lists and ndarrays; ndarrays (ones, zeros, empty, arange, linspace); DataFrames and Series (row and column indexes); INTEGRATE THROUGHOUT
os.path; datetime; json; sys for command-line arguments; itertools, functools
TODO: EXPLAIN THAT TOP DOWN, NOT BOTTOM UP, TO ENSURE GET TO DOING SOMETHING INTERESTING/USEFUL QUICKLY; MEANS THAT MAY NOT ALWAYS UNDERSTAND EVERY PRESENTED LINE OF CODE AT FIRST
Each of the lessons in this course (including this introduction) is a Jupyter Notebook. These Notebooks contain explanatory text but also provide an environment for interactively viewing, editing and running code, so you can learn by doing and experimenting.
You are invited to work through these lessons independently and at your own pace.
If you are at a Code Cafe event (an informal workshop hosted on University of Sheffield premises), instructors will be on hand to help you. We also have a shared online discussion page (https://v.etherpad.org/p/code_cafe) where you can ask questions and make comments. TODO: CREATE NEW ETHERPAD
If you are working remotely by yourself then ... TODO: FEEDBACK MECHANISM FOR DISTANCE LEARNERS (SMC, ETHERPAD, MAILING LIST, GOOGLE GROUP, GH ISSUES?). EXPLAIN IN SIMPLE TERMS WHAT PROVIDED.
The first step is to make sure that you are viewing these Notebooks in a way that lets you interact with them (as opposed to viewing a non-editable, static snapshot of a Notebook on a site such as GitHub or nbviewer). You have several options for viewing, editing and running Notebooks:
Please open this Notebook using SMC or Jupyter running on your own machine (if you have not done so already) before continuing.
TODO: CLEAR ENOUGH?
Each Jupyter Notebook is a document made up of a sequence of cells. A cell can contain formatted text (as this one does) or some lines of runnable code (like the cell below). Code cells can generate output: here it is the single value produced by the last line of code, but output could also be a table or a graph. Try it: click on the following cell (or use the cursor keys) to highlight it, then press Shift-Enter to run it:
In [ ]:
pi = 3.141593
radius = 0.5
area = pi * radius * radius
area
Ignore the details of how this Python code produced a result for now; this is simply a demonstration of how Jupyter Notebooks work.
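When you are ready to look a little closer: a code cell only displays the value of its last line automatically. A minimal sketch of how you might also see a value computed earlier in the cell is to pass it to Python's built-in print() function. The circumference variable is an extra quantity added here purely for illustration.

```python
pi = 3.141593
radius = 0.5

# Hypothetical extra quantity, added only to show print() in action.
circumference = 2 * pi * radius
print(circumference)  # displayed because we asked for it explicitly

area = pi * radius * radius
area  # displayed automatically in a Notebook, because it is the last line
```

In a plain Python script (outside a Notebook) the bare `area` on the last line would display nothing; the automatic display is a Notebook feature.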
You can create cells and run code cells in any order you like, and the values you create in one cell remain available to any cell you run afterwards. This lets you explore code and data interactively over time. Run the following cell to see that we still have access to the value associated with area:
In [ ]:
area * 2
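One way to picture this is a simplified model, not a description of Jupyter's actual internals: the kernel keeps a single shared namespace, and running a cell executes that cell's code inside it. The sketch below imitates this with Python's built-in exec() and a dictionary; the cell strings and the doubled variable are hypothetical.

```python
# Simplified model of a kernel: one shared namespace (a dict) in which
# each cell's code is executed when you run that cell.
namespace = {}

cell_1 = "pi = 3.141593\nradius = 0.5\narea = pi * radius * radius"
cell_2 = "doubled = area * 2"

# Running cell_2 before cell_1 fails, because 'area' does not exist yet.
try:
    exec(cell_2, namespace)
except NameError as err:
    print("Cell 2 failed:", err)

# Running cell_1 and then cell_2 works: the 'area' created by cell_1
# is still available to cell_2 because both use the same namespace.
exec(cell_1, namespace)
exec(cell_2, namespace)
print(namespace["doubled"])
```

This is also why the order in which you run cells matters more than the order in which they appear on the page.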
You can think of a Notebook as being a little like an Excel spreadsheet containing just a single column, the key differences being that:
To edit a code cell, click inside it, or press Enter when it is selected (surrounded by a blue border). Try it: edit the code cell above, replace 2 with another number, then re-run the cell.
You can edit text cells in the same way. Text is formatted by writing [Markdown](https://daringfireball.net/projects/markdown/) in these cells rather than just plain text. Try it: double-click within this cell to see how Markdown was used to denote headings, bold text, links and so on.
Then run the cell to render it as formatted HTML again.
See the 'Edit', 'Insert' and 'Cell' menus directly above the Notebook for further ways of manipulating cells. Also, note that there are keyboard shortcuts for most Notebook actions (see the 'Help' menu).
You may be wondering what is happening behind the scenes and how Jupyter relates to Python. In brief:
Knowing more about Jupyter and IPython is not important at this stage; however, it is useful to have a little understanding of what these pieces of software are and how they fit together.
A brief note on restarting the kernel associated with a particular Notebook: if you want the kernel to forget all the values you have created in memory since you first ran a code cell in the Notebook, click 'Kernel' -> 'Restart'. This only erases the kernel's working memory; it does not change the code or text in the Notebook's cells.
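A simplified way to picture a restart (an illustration, not Jupyter's actual implementation): treat the kernel's working memory as a Python dictionary and each cell's code as a string executed with exec(). Restarting replaces the dictionary with an empty one while leaving the cells' code untouched, so running the cells again in order recreates the values. The cells list and the run_all helper are hypothetical names used only for this sketch.

```python
# Simplified model: the cells' code is stored as text,
# and the kernel's working memory is a dict.
cells = [
    "pi = 3.141593\nradius = 0.5\narea = pi * radius * radius",
    "doubled = area * 2",
]

def run_all(cells, namespace):
    """Execute every cell, top to bottom, in the given namespace."""
    for code in cells:
        exec(code, namespace)

namespace = {}
run_all(cells, namespace)
print("Before restart:", "area" in namespace)  # True

# 'Restart': start again with an empty namespace; the cells are unchanged.
namespace = {}
print("After restart:", "area" in namespace)   # False

# Re-running the cells in order recreates the forgotten values.
run_all(cells, namespace)
print("After re-running:", namespace["doubled"])
```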
Most lessons contain a 'setup' code cell. You need to run it before any other code cell; you should not edit it, and you do not need to understand its contents. Please run the following setup cell.
In [ ]:
____ = 0
from numpy.testing import assert_equal
from codecs import decode
There are exercises throughout the lessons to help confirm your understanding of different concepts. These are not externally assessed and are entirely for your own benefit. The exercises typically require you to write or alter some code so that a statement is true.
For example, replace '____' in the following cell with a number so that the mathematical expressions before and after the comma (within the brackets) are exactly equal.
In [ ]:
assert_equal(____, 6 + 18)
Next, edit that cell, change the number you entered to a different number, then re-run the cell to see what happens when the expressions on either side of the comma are not equal.
Certain exercises come with hints to help you. However, these have been encoded using a simple cipher called ROT13, so you have to decode them explicitly if you think you need them.
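To see both outcomes side by side, here is a minimal sketch using the same assert_equal function (from numpy.testing) that the setup cell imports. When the two values are equal it does nothing; when they differ it raises an AssertionError, which is the error message you see in the Notebook. The numbers used here are arbitrary examples, not the answer to the exercise above.

```python
from numpy.testing import assert_equal

# Equal values: assert_equal does nothing (no output, no error).
assert_equal(10, 4 + 6)

# Different values: assert_equal raises an AssertionError describing
# the mismatch. Catching it here lets us print the message.
try:
    assert_equal(11, 4 + 6)
except AssertionError as err:
    print("Not equal!")
    print(err)
```

The try/except is only there so the sketch runs to completion; in the exercises you can let the error appear so that you know the statement is not yet true.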
For example, say that your hint is the following ROT13-encoded string of characters:
'Gel zhygvcylvat ol guerr orsber lbh qb nalguvat ryfr'
You can decode this hint in a code cell like this:
In [ ]:
decode('Gel zhygvcylvat ol guerr orsber lbh qb nalguvat ryfr', 'rot13')
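ROT13 replaces each letter with the letter 13 places further along the alphabet, wrapping around from z back to a. Because the alphabet has 26 letters, applying ROT13 twice gives back the original text, so the same operation both encodes and decodes. A minimal sketch of that rule in plain Python follows; the rot13 function is our own illustration, not part of the codecs module, and it is checked against the codecs.decode call used above.

```python
import codecs
import string

def rot13(text):
    """Shift each ASCII letter 13 places, wrapping around; leave other characters alone."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # Map a->n, b->o, ..., n->a, ... for both lower- and upper-case letters.
    table = str.maketrans(
        lower + upper,
        lower[13:] + lower[:13] + upper[13:] + upper[:13],
    )
    return text.translate(table)

hint = 'Gel zhygvcylvat ol guerr orsber lbh qb nalguvat ryfr'
decoded = rot13(hint)
print(decoded)

# Our version agrees with the 'rot13' codec used in the lesson...
assert decoded == codecs.decode(hint, 'rot13')
# ...and applying ROT13 twice returns the original text.
assert rot13(decoded) == hint
```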