You can follow along with the notebooks that we will be working through by going to the GitHub repository where we manage our content.
You can practice and play with the code in the Pangeo Binder environment:
Capabilities to streamline and automate routine processes through scripting are ubiquitous
Repeatability with documentation
The primary downside that is mentioned when discussing the choice of Python as a programming language is that as an interpreted language it can execute more slowly than traditional compiled languages such as C or C++.
There are a variety of ways to run Python on your computer:
You can check whether Python is already installed by typing python at the Command Prompt (Windows) or in the Terminal (Mac OS) and seeing what response you get. If Python is installed you will typically see information about the currently installed version and then be taken to the Python command prompt where you can start typing commands.
Once Python is installed on your computer you have a number of options for how you start up an environment where you can execute Python commands/code.
The simplest method is to type python at the Command Prompt (Windows) or Terminal (Mac OS and Linux). If your installation was successful you will be taken to the interactive prompt. For example:
UL0100MAC:~ kbene$ python
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>>
If you would like to run the IDLE IDE you should be able to find the executable file in the folder where the Python executable is installed on your system.
If you installed the Anaconda release of Python you can type ipython at the Command Prompt (Windows) or Terminal (Mac OS and Linux). If your installation was successful you will be taken to an enhanced (compared with the basic Python prompt) interactive prompt. For example:
UL0100MAC:~ kbene$ ipython
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42)
Type "copyright", "credits" or "license" for more information.
IPython 3.2.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]:
If you installed the Anaconda release of Python you can type jupyter notebook at the Command Prompt (Windows) or Terminal (Mac OS and Linux). If your installation was successful you should see some startup messages in the terminal window and your browser should open up and display the Jupyter Notebook interface, from which you can navigate through your system's folder structure (starting in the folder that you ran the jupyter notebook command from) and load existing notebooks or create new ones in which you can enter and execute Python commands. You can also start a local Jupyter Notebook instance through the Anaconda Navigator application that is included with recent releases of the Anaconda Python distribution. **This is the interface that we are using for today's workshop.**
There are a number of strategies that you can use for getting help with specific Python commands and syntax. First and foremost you can access the Python documentation which will default to the most recent Python 3.x version that is in production, but from which (in the upper left corner of the page) you can select other Python versions if you are not using the version referenced by the page. Looking at and working through some of the materials in the Python tutorial is also a great way to see the core Python capabilities in action.
In some cases you can find quite a few useful and interesting resources through a reasonably crafted Google search: e.g. for python create list.
You can also get targeted help on specific commands or objects from the command prompt by using the help() function, placing the name of the command or object between the parentheses ().
For example:
>>> help(print)
and
>>> help(str)
and
>>> myVar = [1,2,3,4,5]
>>> help(myVar)
Type in the help command in a code box in Jupyter Notebook for a few of the following commands/objects and take a look at the information you get:
dict
- e.g. help(dict)
print
sorted
float
For some commands/functions you need to import the module that that command belongs to. For example:
import os
help(os.path)
Try this pair of commands in a code window in your Jupyter Notebook or interactive terminal.
In [1]:
# type your help commands in the box and
# execute the code in the box by typing shift-enter
# (hold down the shift key while hitting the enter/return key)
At the core of Python (and any programming language) there are some key characteristics of how a program is structured that enable the proper execution of that program. These characteristics include the structure of the code itself, the core data types from which others are built, and core operators that modify objects or create new ones. From these raw materials more complex commands, functions, and modules are built. For guidance on recommended Python structure refer to the Python Style Guide.
In [1]:
# The interpreter can be used as a calculator, and can also echo or concatenate strings.
3 + 3
Out[1]:
In [2]:
3 * 3
Out[2]:
In [3]:
3 ** 3
Out[3]:
In [4]:
3 / 2 # true division - output is a floating point number
Out[4]:
In [5]:
# Use quotes around strings
'dogs'
Out[5]:
In [6]:
# + operator can be used to concatenate strings
'dogs' + "cats"
Out[6]:
In [7]:
print('Hello World!')
Go to the section 4.4. Numeric Types in the Python 3 documentation at https://docs.python.org/3.4/library/stdtypes.html. The table in that section describes different operators - try some!
What is the difference between the different division operators (/, //, and %)?
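One way to explore the question above is to apply all three operators to the same pair of numbers. The sketch below is only an illustration; the variable names are made up for this example and are not part of the workshop materials.

```python
# Compare the three division-related operators on the same operands.
numerator = 7
denominator = 2

true_division = numerator / denominator    # always returns a float
floor_division = numerator // denominator  # rounds down to a whole number
remainder = numerator % denominator        # what is left over after floor division

print(true_division)   # 3.5
print(floor_division)  # 3
print(remainder)       # 1

# The three results are linked:
# floor_division * denominator + remainder gives back the numerator
print(floor_division * denominator + remainder == numerator)  # True
```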
Variables allow us to store values for later use.
In [9]:
a = 5
b = 10
a + b
Out[9]:
Variables can be reassigned:
In [10]:
b = 38764289.1097
a + b
Out[10]:
The ability to reassign variable values becomes important when iterating through groups of objects for batch processing or other purposes. In the example below, the value of b is dynamically updated every time the while loop is executed:
In [11]:
a = 5
b = 10
while b > a:
    print("b="+str(b))
    b = b-1
Variable data types can be inferred, so Python does not require us to declare the data type of a variable on assignment.
In [12]:
a = 5
type(a)
Out[12]:
is equivalent to
In [13]:
a = int(5)
type(a)
Out[13]:
In [14]:
c = 'dogs'
print(type(c))
c = str('dogs')
print(type(c))
There are cases when we may want to declare the data type, for example to assign a different data type from the default that will be inferred. Concatenating strings provides a good example.
In [15]:
customer = 'Carol'
pizzas = 2
print(customer + ' ordered ' + pizzas + ' pizzas.')
Above, Python has inferred the type of the variable pizzas to be an integer. Since strings can only be concatenated with other strings, our print statement generates an error. There are two ways we can resolve the error:
Declare the pizzas variable as type string (str) on assignment, or
Cast the pizzas variable as a string within the print statement.
In [16]:
customer = 'Carol'
pizzas = str(2)
print(customer + ' ordered ' + pizzas + ' pizzas.')
In [17]:
customer = 'Carol'
pizzas = 2
print(customer + ' ordered ' + str(pizzas) + ' pizzas.')
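Beyond the two fixes shown above, string formatting offers another way to mix numbers into text without calling str() yourself. This is a small sketch rather than part of the original exercise; note that f-strings assume Python 3.6 or later, while str.format() also works on older versions.

```python
customer = 'Carol'
pizzas = 2

# An f-string converts the integer to text for us inside the braces.
message = f'{customer} ordered {pizzas} pizzas.'
print(message)  # Carol ordered 2 pizzas.

# str.format() produces the same text and also works before Python 3.6.
message_format = '{} ordered {} pizzas.'.format(customer, pizzas)
print(message_format)  # Carol ordered 2 pizzas.
```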
Given the following variable assignments:
x = 12
y = str(14)
z = 'donuts'
Predict the output of the following:
y + z
x + y
x + int(y)
str(x) + y
Check your answers in the interpreter.
Variable names are case sensitive and:
We further recommend using variable names that are meaningful within the context of the script and the research.
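As a brief illustration of case sensitivity and meaningful names, consider the sketch below. All of the variable names and values are invented for this example.

```python
# 'count' and 'Count' are two different variables because names are case sensitive.
count = 3
Count = 30
print(count)  # 3
print(Count)  # 30

# A descriptive name documents what the value means.
eggs_per_nest = 4
nests_surveyed = 12
total_eggs = eggs_per_nest * nests_surveyed
print(total_eggs)  # 48
```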
The structure of a Python program is pretty simple: Blocks of code are defined using indentation. Code that is at a lower level of indentation is not considered part of a block. Indentation can be defined using spaces or tabs (spaces are recommended by the style guide), but be consistent (and prepared to defend your choice). As we will see, code blocks define the boundaries of sets of commands that fit within a given section of code. This indentation model for defining blocks of code significantly increases the readability of Python code.
For example:
>>> a = 5
>>> b = 10
>>> while b > a:
...     print("b="+str(b))
...     b = b-1
...
>>> print("I'm outside the block")
You can (and should) also include documentation and comments in the code you write - both for yourself, and potential future users (including yourself). Comments are pretty much any content on a line that follows a # symbol (unless it is between quotation marks). For example:
>>> # we're going to do some math now
>>> yae = 5 # the number of votes in favor
>>> nay = 10 # the number of votes against
>>> proportion = yae / nay # the proportion of votes in favor
>>> print(proportion)
When you are creating functions or classes (a bit more on what these are in a bit) you can also create what are called doc strings that provide a defined location for content that is used to generate the help() information highlighted above and is also used by other systems for the automatic generation of documentation for packages that contain these doc strings. Creating a doc string is simple - just create a single or multi-line text string (more on this soon) that starts on the first indented line following the start of the definition of the function or class. For example:
>>> # we're going to create a documented function and then access the information about the function
>>> def doc_demo(some_text="I'll skewer yer gizzard, ye salty sea bass"):
...     """This function takes the provided text and prints it out in Pirate
...
...     If a string is not provided for `some_text` a default message will be displayed
...     """
...     out_string = "Ahoy Matey. " + some_text
...     print(out_string)
...
>>> help(doc_demo)
>>> doc_demo()
>>> doc_demo("Sail ho!")
Any programming language has at its foundation a collection of types or in Python's terminology objects. The standard objects of Python consist of the following:
Lists - ordered collections of objects bounded by square brackets - []. Elements in lists are extracted or referenced by their position in the list. For example, my_list[0] refers to the first item in the list, my_list[5] the sixth, and my_list[-1] to the last item in the list.
Dictionaries - collections of objects that are referenced by keys rather than by position (insertion order is preserved as of Python 3.7). Dictionaries are bounded by curly brackets - {} with each element of the dictionary consisting of a key (often a string) and a value (any object) separated by a colon :. Elements of a dictionary are extracted or referenced using their keys. For example:
my_dict = {"key1":"value1", "key2":36, "key3":[1,2,3]}
my_dict['key1'] returns "value1"
my_dict['key3'] returns [1,2,3]
Tuples - immutable lists that are bounded by parentheses - (). Referencing elements in a tuple is the same as referencing elements in a list above.
Sets - collections of unique objects, created by calling the set function on a sequence of objects. A specialized list of operators on sets allows for identifying union, intersection, and difference (among others) between sets.
None
These objects have their own sets of related methods (as we saw in the help() examples above) that enable their creation, and operations upon them.
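The set operators and tuple immutability mentioned above can be sketched as follows. The bird names and variable names are invented for illustration only.

```python
# Sets keep only unique items and support union, intersection, and difference.
field_a = set(['robin', 'wren', 'robin', 'finch'])
field_b = set(['finch', 'heron'])

print(sorted(field_a))            # ['finch', 'robin', 'wren']
print(sorted(field_a | field_b))  # union: ['finch', 'heron', 'robin', 'wren']
print(sorted(field_a & field_b))  # intersection: ['finch']
print(sorted(field_a - field_b))  # difference: ['robin', 'wren']

# Tuples are indexed like lists but cannot be changed after creation.
point = (4, 5)
print(point[0])  # 4
try:
    point[0] = 10
except TypeError as error:
    print("Tuples are immutable:", error)

# None is Python's placeholder for "no value".
nothing = None
print(nothing is None)  # True
```

The calls to sorted() are there only to make the printed order predictable, since sets themselves have no defined order.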
In [45]:
# Fun with types
this = 12
that = 15
the_other = "27"
my_stuff = [this,that,the_other,["a","b","c",4]]
more_stuff = {
"item1": this,
"item2": that,
"item3": the_other,
"item4": my_stuff
}
this + that
# this won't work ...
# this + that + the_other
# ... but this will ...
this + that + int(the_other)
# ...and this too
str(this) + str(that) + the_other
Out[45]:
https://docs.python.org/3/library/stdtypes.html?highlight=lists#list
Lists are a type of collection in Python. Lists allow us to store sequences of items that are typically but not always similar. All of the following lists are legal in Python:
In [46]:
# Separate list items with commas!
number_list = [1, 2, 3, 4, 5]
string_list = ['apples', 'oranges', 'pears', 'grapes', 'pineapples']
combined_list = [1, 2, 'oranges', 3.14, 'peaches', 'grapes', 99.19876]
# Nested lists - lists of lists - are allowed.
list_of_lists = [[1, 2, 3], ['oranges', 'grapes', 8], [['small list'], ['bigger', 'list', 55], ['url_1', 'url_2']]]
There are multiple ways to create a list:
In [47]:
# Create an empty list
empty_list = []
# As we did above, by using square brackets around a comma-separated sequence of items
new_list = [1, 2, 3]
# Using the type constructor
constructed_list = list('purple')
# Using a list comprehension
result_list = [i for i in range(1, 20)]
We can inspect our lists:
In [48]:
empty_list
Out[48]:
In [49]:
new_list
Out[49]:
In [50]:
result_list
Out[50]:
In [51]:
constructed_list
Out[51]:
The above output for constructed_list may seem odd. Referring to the documentation, we see that the argument to the type constructor is an iterable, which according to the documentation is "An object capable of returning its members one at a time." In our constructor statement above
# Using the type constructor
constructed_list = list('purple')
the word 'purple' is the object - in this case a word - that when used to construct a list returns its members (individual letters) one at a time.
Compare the outputs below:
In [52]:
constructed_list_int = list(123)
In [53]:
constructed_list_str = list('123')
constructed_list_str
Out[53]:
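The comparison above comes down to whether the argument is an iterable. The following sketch catches the error so that the rest of the cell can still run; the extra range and tuple examples are additions for illustration.

```python
# list() needs an iterable; an integer is a single value, not a collection of members.
try:
    constructed_list_int = list(123)
except TypeError as error:
    constructed_list_int = None
    print("list(123) failed:", error)

# Strings, ranges, and tuples are all iterables, so list() accepts them.
print(list('123'))       # ['1', '2', '3']
print(list(range(3)))    # [0, 1, 2]
print(list((1.5, 'a')))  # [1.5, 'a']
```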
Lists in Python are:
Ordered here does not mean sorted. The list below is printed with the numbers in the order we added them to the list, not in numeric order:
In [54]:
ordered = [3, 2, 7, 1, 19, 0]
ordered
Out[54]:
In [55]:
# There is a 'sort' method for sorting list items as needed:
ordered.sort()
ordered
Out[55]:
Info on additional list methods is available at https://docs.python.org/3/library/stdtypes.html?highlight=lists#mutable-sequence-types
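A few of the list methods described in that documentation are sketched below. Each one changes the list in place; the fruits list is a made-up example.

```python
fruits = ['apples', 'oranges', 'pears']

fruits.append('grapes')    # add one item to the end
fruits.insert(1, 'kiwis')  # add an item at position 1
fruits.remove('oranges')   # remove the first matching item
last_item = fruits.pop()   # remove and return the final item
fruits[0] = 'apricots'     # replace an item in place
fruits.reverse()           # reverse the order in place

print(fruits)     # ['pears', 'kiwis', 'apricots']
print(last_item)  # grapes
```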
Because lists are ordered, it is possible to access list items by referencing their positions. Note that the position of the first item in a list is 0 (zero), not 1!
In [56]:
string_list = ['apples', 'oranges', 'pears', 'grapes', 'pineapples']
In [57]:
string_list[0]
Out[57]:
In [58]:
# We can use positions to 'slice' or select sections of a list:
string_list[3:]
Out[58]:
In [59]:
string_list[:3]
Out[59]:
In [60]:
string_list[1:4]
Out[60]:
In [61]:
# If we don't know the position of a list item, we can use the 'index()' method to find out.
# Note that in the case of duplicate list items, this only returns the position of the first one:
string_list.index('pears')
Out[61]:
In [62]:
string_list.append('oranges')
In [63]:
string_list
Out[63]:
In [64]:
string_list.index('oranges')
Out[64]:
In [65]:
# one more time with lists and dictionaries
list_ex1 = my_stuff[0] + my_stuff[1] + int(my_stuff[2])
print(list_ex1)
list_ex2 = (
str(my_stuff[0])
+ str(my_stuff[1])
+ my_stuff[2]
+ my_stuff[3][0]
)
print(list_ex2)
dict_ex1 = (
more_stuff['item1']
+ more_stuff['item2']
+ int(more_stuff['item3'])
)
print(dict_ex1)
dict_ex2 = (
str(more_stuff['item1'])
+ str(more_stuff['item2'])
+ more_stuff['item3']
)
print(dict_ex2)
In [66]:
# Now try it yourself ...
# print out the phrase "The answer: 42" using the following
# variables and one or more of your own and the 'print()' function
# (remember spaces are characters as well)
start = "The"
answer = 42
If objects are the nouns, operators are the verbs of a programming language. We've already seen examples of some operators: assignment with the = operator, arithmetic addition and string concatenation with the + operator, arithmetic division and subtraction with the / and - operators, and comparison with the > operator. Different object types have different operators that may be used with them. The Python Documentation provides detailed information about the operators and their functions as they relate to the standard object types described above.
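To make the idea of operators as verbs more concrete, here is a short sketch that groups a few common operators by what they do. The votes list is invented for this example, and the selection is far from complete.

```python
votes = [3, 8, 5]

# Arithmetic operators
print(8 - 3)  # 5
print(8 * 3)  # 24

# Comparison operators return True or False
print(8 > 3)   # True
print(8 == 3)  # False
print(8 != 3)  # True

# Logical operators combine tests
print(8 > 3 and 3 > 5)  # False
print(8 > 3 or 3 > 5)   # True

# The membership operator 'in' works on lists and strings (among others)
print(5 in votes)      # True
print('og' in 'dogs')  # True

# The same operator can mean different things for different types
print([1, 2] + [3])  # [1, 2, 3]
print('ab' * 2)      # abab
```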
Flow control commands allow for the dynamic execution of parts of the program based upon logical conditions, or processing of objects within an iterable object (like a list or dictionary). Some key flow control commands in Python include:
while-else loops that continue to run until the termination test is False or a break command is issued within the loop:
done = False
i = 0
while not done:
    i = i+1
    if i > 5: done = True
if-elif-else statements define alternative blocks of code that are executed if a test condition is met:
do_something = "what?"
if do_something == "what?":
    print(do_something)
elif do_something == "where?":
    print("Where are we going?")
else:
    print("I guess nothing is going to happen")
for loops allow for repeated execution of a block of code for each item in a Python iterable such as a list or dictionary. For example:
my_stuff = ['a', 'b', 'c']
for item in my_stuff:
    print(item)
a
b
c
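The flow control examples in this section do not exercise the else clause of a while loop, the break command, or looping over a dictionary. The sketch below fills in those cases; the targets list and the egg_counts dictionary are made-up values for illustration.

```python
# while-else: the else block runs only if the loop ends without a break.
targets = [4, 9]
found = []
for target in targets:
    i = 0
    while i < 6:
        if i == target:
            found.append(target)
            print("Found " + str(target) + ", leaving the loop early")
            break
        i = i + 1
    else:
        print("Counted to " + str(i) + " without finding " + str(target))

# Looping over a dictionary with items() gives key-value pairs.
egg_counts = {'robin': 4, 'wren': 6}
for species, count in egg_counts.items():
    print(species + ": " + str(count))
```

The first target is found, so break skips the else block; the second is never reached, so the loop runs to completion and the else block reports it.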
Functions represent reusable blocks of code that you can reference by name and pass information into to customize the execution of the function, and receive a response representing the outcome of the defined code in the function. When would you want to define a function? You should consider defining a function when you find yourself entering very similar code to execute variations of the same process. The dataset used for the following example is part of the supplementary materials (Data S1 - Egg Shape by Species) for Stoddard et al. (2017).
Mary Caswell Stoddard, Ee Hou Yong, Derya Akkaynak, Catherine Sheard, Joseph A. Tobias, L. Mahadevan. 2017. "Avian egg shape: Form, function, and evolution". Science. 23 June 2017. Vol. 356, Issue 6344. pp. 1249-1254. DOI: 10.1126/science.aaj1945. https://science.sciencemag.org/content/356/6344/1249
A sample workflow without functions:
In [67]:
# read data into a list of dictionaries
import csv
# create an empty list that will be filled with the rows of data from the CSV as dictionaries
csv_content = []
# open and loop through each line of the csv file to populate our data file
with open('aaj1945_DataS1_Egg_shape_by_species_v2.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader: # process each row of the csv file
        csv_content.append(row)
print(csv_content[0].keys())
#print()
#print(csv_content[0])
In [68]:
# extract content of each "column" individually
order = []
for item in csv_content:
    try:
        order.append(item['Order'])
    except KeyError:
        order.append(None)
family = []
for item in csv_content:
    try:
        family.append(item['Family'])
    except KeyError:
        family.append(None)
species = []
for item in csv_content:
    try:
        species.append(item['Species'])
    except KeyError:
        species.append(None)
asymmetry = []
for item in csv_content:
    try:
        asymmetry.append(item['Asymmetry'])
    except KeyError:
        asymmetry.append(None)
ellipticity = []
for item in csv_content:
    try:
        ellipticity.append(item['Ellipticity'])
    except KeyError:
        ellipticity.append(None)
avgLength = []
for item in csv_content:
    try:
        avgLength.append(item['AvgLength (cm)'])
    except KeyError:
        avgLength.append(None)
noImages = []
for item in csv_content:
    try:
        noImages.append(item['Number of images'])
    except KeyError:
        noImages.append(None)
noEggs = []
for item in csv_content:
    try:
        noEggs.append(item['Number of eggs'])
    except KeyError:
        noEggs.append(None)
print(order[0:3])
print(family[0:3])
print(species[0:3])
print(asymmetry[0:3])
print(ellipticity[0:3])
print(avgLength[0:3])
print(noImages[0:3])
print(noEggs[0:3])
In [69]:
# define a function that can extract a named column from a named list of dictionaries
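def extract_column(source_list, source_column):
    new_list = []
    for item in source_list:
        try:
            new_list.append(item[source_column])
        except KeyError:
            # the column is missing from this row
            new_list.append(None)
    # preview the first three values; str() guards against None entries
    print(source_column + ": " + ", ".join(str(value) for value in new_list[0:3]))
    return new_list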
order = extract_column(csv_content, 'Order')
family = extract_column(csv_content, 'Family')
species = extract_column(csv_content, 'Species')
asymmetry = extract_column(csv_content, 'Asymmetry')
ellipticity = extract_column(csv_content, 'Ellipticity')
avgLength = extract_column(csv_content, 'AvgLength (cm)')
noImages = extract_column(csv_content, 'Number of images')
noEggs = extract_column(csv_content, 'Number of eggs')
print()
print(order[0:3])
print(family[0:3])
print(species[0:3])
print(asymmetry[0:3])
print(ellipticity[0:3])
print(avgLength[0:3])
print(noImages[0:3])
print(noEggs[0:3])
In [70]:
# use the extract_column function in a loop to automatically extract all of the columns from the list
# of dictionaries to create a dictionary representing each column of values
columns = {}
for column in csv_content[0].keys():
    columns[column] = extract_column(csv_content, column)
columns
Out[70]:
An example of reading a data file and doing basic work with it illustrates all of these concepts. It also illustrates the concept of writing a script: a file that combines all of your commands and can be run as a unit (eggs.py in this case).
#!/usr/bin/env python
import csv
# create an empty list that will be filled with the rows of data from the CSV as dictionaries
csv_content = []
# open and loop through each line of the csv file to populate our data file
with open('aaj1945_DataS1_Egg_shape_by_species_v2.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader: # process each row of the csv file
        csv_content.append(row)
print("keys: " + ", ".join(csv_content[0].keys()))
print()
print()
# define a function that can extract a named column from a named list of dictionaries
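def extract_column(source_list, source_column):
    new_list = []
    for item in source_list:
        try:
            new_list.append(item[source_column])
        except KeyError:
            # the column is missing from this row
            new_list.append(None)
    # preview the first three values; str() guards against None entries
    print(source_column + ": " + ", ".join(str(value) for value in new_list[0:3]))
    return new_list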
order = extract_column(csv_content, 'Order')
family = extract_column(csv_content, 'Family')
species = extract_column(csv_content, 'Species')
asymmetry = extract_column(csv_content, 'Asymmetry')
ellipticity = extract_column(csv_content, 'Ellipticity')
avgLength = extract_column(csv_content, 'AvgLength (cm)')
noImages = extract_column(csv_content, 'Number of images')
noEggs = extract_column(csv_content, 'Number of eggs')
print()
print(order[0:3])
print(family[0:3])
print(species[0:3])
print(asymmetry[0:3])
print(ellipticity[0:3])
print(avgLength[0:3])
print(noImages[0:3])
print(noEggs[0:3])
# Calculate and print some statistics
print()
mean_asymmetry = sum(map(float, asymmetry))/len(asymmetry)
print("Mean Asymmetry: ", str(mean_asymmetry))
mean_ellipticity = sum(map(float, ellipticity))/len(ellipticity)
print("Mean Ellipticity: ", str(mean_ellipticity))
mean_avglength = sum(map(float, avgLength))/len(avgLength)
print("Mean Average Length: ", str(mean_avglength))
To execute this script you can use a couple of strategies:
Run the python eggs.py command at the command line.
Use the #! line at the beginning of the script by making sure that the script is executable (ls -l can provide information about whether a file is executable, chmod u+x eggs.py can make your script executable for the user that owns the file), and entering the name of the script on the command line: ./eggs.py if the script is in the current directory.
While Python's Standard Library of modules is very powerful and diverse, you will encounter times when you need functionality that is not included in the base installation of Python. Fear not, there are over 100,000 additional packages that have been developed to extend the capabilities of Python beyond those provided in the default installation. The central repository for Python packages is the Python Package Index, which can be browsed on the web or interacted with programmatically using the pip utility.
Once installed, the functionality of a module (standard or not) is added to a script using the import command.
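The import command comes in a few common forms, sketched below using only Standard Library modules so that nothing extra needs to be installed. The file path is a made-up example and does not need to exist.

```python
# Import a whole module and reach its contents through the module name.
import os
print(os.path.basename('/data/eggs.csv'))  # eggs.csv

# Import a single name from a module.
from math import sqrt
print(sqrt(16))  # 4.0

# Import a module under a shorter alias.
import statistics as stats
print(stats.mean([1, 2, 3, 4]))  # 2.5
```

Packages installed from the Python Package Index are imported in the same way once they are installed.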
Some book-length resources: