# Introduction to Python for Data Science: 01 - First steps
This is designed to be a self-directed study session where you work through the material at your own pace. If you are at a Code Cafe event, instructors will be on hand to help you.
If you haven't done so already please read through the **[Introduction](./00-Introduction.ipynb)** to this course, which covers:
1. **What Python is** and **why it is of interest**;
1. **Learning outcomes** for the course;
1. The course **structure** and **support facilities**;
1. An introduction to **Jupyter Notebooks**;
1. Information on course **exercises**.
This lesson covers:
<!-- * [Lesson setup code](#Lesson-setup-code) -->
* [Simple commands and calculations](#Simple-commands-and-calculations)
* [Using maths functions from the numpy Python package](#Using-maths-functions-from-the-numpy-Python-package)
* [Functions with named and optional arguments](#Functions-with-named-and-optional-arguments)
* [Getting help](#Getting-help)
* [Variables](#Variables)
It will be useful to keep the **[Introduction](./00-Introduction.ipynb)** material open in a separate tab whilst working on this session.
In [ ]:
____ = 0
import os
import numpy as np
from codecs import decode
from numpy.testing import assert_almost_equal, assert_array_equal
Python is a command-based system, which means that you (usually) interact with it by entering commands rather than using a Graphical User Interface (GUI). Some of these commands are rather straightforward! For example, Python can be used to do arithmetic. Run each cell in turn to evaluate the following expressions:
In [ ]:
1 + 1
In [ ]:
3 * 9
In [ ]:
377 / 120
Power terms can be expressed using the ** operator e.g. for $2^8$:
In [ ]:
2 ** 8
For more complex expressions, remember that some mathematical operations are evaluated before others (for example, multiplication and division before addition and subtraction).
Parentheses can be used to override that order of evaluation:
In [ ]:
31 * (365 - 30) / 10
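As a small illustrative sketch (the variable names here are chosen purely for illustration), compare the same numbers with and without parentheses:

```python
# Multiplication and division are evaluated before subtraction,
# so without parentheses Python computes (31 * 365) - (30 / 10).
without_parentheses = 31 * 365 - 30 / 10

# With parentheses, the subtraction is carried out first.
with_parentheses = 31 * (365 - 30) / 10

print(without_parentheses)  # 11312.0
print(with_parentheses)     # 1038.5
```

The two results differ by roughly a factor of ten, so it is worth adding parentheses whenever the intended order is not obvious.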
## Using maths functions from the numpy Python package
As shown, support for basic arithmetic is built into Python itself, but for other operations such as sin, cos and log we need to import functions from a Python package. A package is a collection of useful bits of code that we can reuse in many different programs. Some packages come with Python itself, whilst others must be installed separately.
Here we import the numpy package and use a function provided by the package to calculate the square root of 2.
In [ ]:
import numpy as np
np.sqrt(2)
(Keen-eyed readers will notice that we previously executed the import... line above back in the first code cell in this Notebook. It doesn't do any harm to repeat this here.)
This is the first time we've entered a function in Python so let's discuss some details. In the above, we called (or evaluated) the function sqrt, which is provided by the numpy package (here aliased as np for convenience), and passed it the number 2 as its argument.
The terms call, evaluate and argument are oft-used by programmers.
Python is case sensitive. For example, a valid function call is np.sqrt(2) with everything in lower case. Variations such as np.Sqrt(2) or np.SQRT(2) won't work (try it in a new code cell below).
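If you would like to see the error without stopping the rest of a cell from running, one option is to catch it. The sketch below assumes that numpy reports an unknown name with an AttributeError, which is standard Python behaviour for a missing module attribute; the variable name error_message is illustrative only.

```python
import numpy as np

print(np.sqrt(2))  # lower case: this works

try:
    np.Sqrt(2)  # capital S: numpy provides nothing with this name
except AttributeError as err:
    error_message = str(err)
    print("AttributeError:", error_message)
```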
The numpy package also provides the standard trigonometry functions such as sin, cos and tan. These take their arguments in radians rather than degrees. As such, a right angle is pi/2 rather than 90.
In [ ]:
np.sin(np.pi / 2)
Note that we didn't need to import numpy again; we only need to do so once per interactive IPython session.
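If your angles are in degrees, they need converting to radians first. A minimal sketch using numpy's deg2rad function (the angle of 90 degrees is just an example value):

```python
import numpy as np

angle_degrees = 90  # example value for illustration
angle_radians = np.deg2rad(angle_degrees)

print(angle_radians)          # approximately 1.5708, i.e. pi / 2
print(np.sin(angle_radians))  # approximately 1.0
```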
numpy's log function takes the natural logarithm by default:
In [ ]:
np.log(10)
If you want to calculate a logarithm to base 10, you need to use the np.log10 function.
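The difference between the two functions can be checked with a few values whose logarithms are easy to predict (the inputs below are examples only):

```python
import numpy as np

print(np.log(np.e))    # natural logarithm of e: approximately 1.0
print(np.log10(1000))  # base-10 logarithm of 1000: approximately 3.0

# A base-10 logarithm can also be built from natural logarithms
print(np.log(1000) / np.log(10))  # approximately 3.0
```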
As water flows through a drinking water pipe under pressure it loses energy (pressure) due to friction at the pipe wall. This can be quantified using the following equation (the Swamee–Jain approximation of the Colebrook–White formula, but you don't need to know anything about this):
$$f = 0.25 \left(\log_{10} \left(\frac{k_s}{3.7D} + \frac{5.74}{\mathrm{Re}^{0.9}}\right)\right)^{-2}$$
Replace '____' below with some Python that calculates a value for $f$ (the 'friction factor') using
See the course Introduction section for more information on asserting that you've got the correct answer to an exercise. Also, if you get stuck then follow the link above regarding Python's order of operations to check that you know which operations will be done first and where you might need parentheses.
In [ ]:
assert_almost_equal(____, 0.08948259, decimal=6)
In [ ]:
round(1.23456)
Alternatively you can round to a different number of decimal places by supplying a second argument when calling the round function. Arguments are separated by commas.
In [ ]:
round(1.23456, ndigits=3)
This shows another feature of Python functions: named arguments.
Here, the function is called using two arguments, with the second being associated with the ndigits parameter. A parameter is (in simple terms) the name of a function input (e.g. the name ndigits), whereas an argument is the value sent to the function (e.g. the number 3).
Since the second argument to round is, by design, always the number of decimals, you could have simply executed
In [ ]:
round(1.23456, 3)
but the named argument version is more readable. Also, note that a value does not always need to be given for ndigits: it is optional and, if it is omitted, round returns the nearest integer.
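The three ways of calling round can be compared side by side. This is only a sketch; the variable name value is chosen for illustration.

```python
value = 1.23456

rounded_default = round(value)               # ndigits omitted
rounded_named = round(value, ndigits=3)      # second argument given by name
rounded_positional = round(value, 3)         # second argument given by position

print(rounded_default)     # 1
print(rounded_named)       # 1.235
print(rounded_positional)  # 1.235
```

Note that omitting ndigits gives back a whole number (an int) rather than a decimal number.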
Built in to Python itself and the packages you import is a large amount of documentation that you can call on any time.
Firstly, if you forget the names and order of the round function's arguments you can ask Jupyter+IPython to display information about the function in a separate pane by calling the function with ? instead of parentheses:
In [ ]:
round?
Secondly, you can see a terser pop-up summary of a function by placing the cursor within round and pressing < Shift >< Tab >. Try this.
Thirdly, if you forget the name of a function or how to spell it you can type part of the name then press < tab > to autocomplete it (functionality sometimes known as tab completion).
For example, if you can't remember whether the numpy function for randomly generating a number is called random or rand, try typing np.ran then press < tab >.
In [ ]:
np.ran
If the characters entered are ambiguous (that is, more than one name starts with ran), the name is not immediately autocompleted; instead, select the one you want from the drop-down menu that appears after pressing < tab >. If only one name matches, it is completed straight away.
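Tab completion only works interactively, but a similar list can be produced in code. The sketch below uses Python's built-in dir function to list the names that numpy provides and keeps those starting with ran; the exact list depends on the version of numpy installed, and the variable name matches is illustrative.

```python
import numpy as np

# dir() returns the names defined by a module as a list of strings
matches = [name for name in dir(np) if name.startswith("ran")]
print(matches)
```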
Fourthly, see the Help menu at the top of the Jupyter interface for links to the full reference documentation for Python, numpy and several other popular Python data science packages.
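The ? syntax is specific to Jupyter and IPython. In plain Python, the same documentation text can be reached through the built-in help function (e.g. help(round)) or, as sketched below, by printing the function's __doc__ attribute; the variable name summary is illustrative.

```python
# Documented functions store their help text in the __doc__ attribute
summary = round.__doc__
print(summary)
```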
We'll rarely want to perform a calculation and throw away the result. It is much more likely that we'll want to store the result in Python's memory for later use, either as part of future calculations or ready for export to external files.
We do this by assigning the results of calculations to variables. For example:
In [ ]:
a = np.sin(1)
b = 10
c = a + b
c = c / 2
c
In the above, we created three variables called a, b and c. With each line (starting with the first), the expression on the right-hand side of the = sign is evaluated (by calling functions and/or performing arithmetic to generate a single value), then the result is assigned to the variable on the left-hand side.
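Printing c after each assignment makes the order of events visible. This is the same calculation as above with two print calls added:

```python
import numpy as np

a = np.sin(1)
b = 10

c = a + b
print(c)  # the sum of a and b

# The right-hand side is evaluated first using the current value of c,
# then the result replaces that value.
c = c / 2
print(c)  # half of the previous value
```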
After assigning a value to a variable, the variable can be used in later expressions in place of that value, as a and b are in the line c = a + b.
You can list the names of the variables that currently exist in this interactive IPython session using:
In [ ]:
%who
Some of these are variables you have created; others are variables that were created in the Notebook setup cell above.
To see the value of any given variable, just execute a Notebook cell where the last line is just the variable name:
In [ ]:
c = c + 5
c
You can also view a summary of the names and values of all variables currently defined in your interactive IPython session using:
In [ ]:
%whos
The middle column above shows the data type of each variable. This denotes (and limits) the operations that can be applied to a given variable (such as c) or a literal value (such as the number 1.23456). The data types you will most frequently encounter include:
* int: integers a.k.a. whole numbers e.g. 6
* float: decimal numbers e.g. 6.3. You will also encounter float64 and possibly float32 (both provided by the numpy package)
* str: strings of characters e.g. "Subject A" or 'Sheffield, S1 3JD'
* bool: a boolean value i.e. True or False

We will revisit the idea of data types at a later stage.
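Python's built-in type function reports the data type of any value or variable. The sketch below applies it to the example values listed above, and then shows one way in which a data type limits what can be done: adding a number to a string is an error. The variable name type_error_message is illustrative.

```python
import numpy as np

print(type(6))            # <class 'int'>
print(type(6.3))          # <class 'float'>
print(type("Subject A"))  # <class 'str'>
print(type(True))         # <class 'bool'>
print(type(np.sin(1)))    # <class 'numpy.float64'>

# The data type limits which operations are allowed
try:
    "Subject A" + 1
except TypeError as err:
    type_error_message = str(err)
    print("TypeError:", type_error_message)
```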
Now, suppose that after defining a variable we want to remove it from Python's memory. We need to execute a del statement e.g.
In [ ]:
del c
Action: list all currently-defined variables to prove that c no longer exists.
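Another way to confirm the deletion is to try using the variable: Python raises a NameError for a name that does not exist. The sketch below is self-contained, so it creates its own c with an arbitrary example value before deleting it; the name c_exists is illustrative.

```python
c = 5  # example value, so that this sketch can be run on its own
del c

try:
    print(c)
    c_exists = True
except NameError as err:
    # Reached because c was deleted above
    c_exists = False
    print("NameError:", err)
```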
If we want to delete all currently-defined variables in an IPython session we can select Kernel -> Restart from the Jupyter Notebook menu bar or alternatively run a cell that contains:
In [ ]:
%reset
Please now move on to Lesson 02, where we'll be looking at how we can load, manipulate and analyse some tabular data (from a weather station in Sheffield).