ANZSMS 2015 Python Programming Workshop

Print Function

The print function prints output to the command-line interface (Terminal in OS X/Linux, PowerShell in Windows). In the following example, we print the text string "Hello, world!".


In [9]:
print("Hello, world!")


Hello, world!

Variables, numbers and basic math operations

Take the following statement, x = 33. In this statement, x is a variable with a value of 33 (integer literal). As its name suggests, variables can be reassigned when executing a program.

Simple data types in Python include integers, floating point numbers, strings and Boolean (True/False) values. In the following examples, the variables Ag and Au are assigned integer values. The variables are then reassigned to floating point numbers and some simple math operations are demonstrated.

Au and Ag variables assigned to integer values

Integers are whole numbers, e.g., 1, 0, -20 etc.


In [10]:
Ag = 107

In [11]:
Au = 197

In [12]:
type(Ag)


Out[12]:
int

In [13]:
type(Au)


Out[13]:
int

Au and Ag variables reassigned to floating point values

Floats use a decimal point or exponential notation. Note that values such as 20.0, 1.0, 0.0 etc. are floats, not integers. Also, a floating point value may not be an exact representation of the number, since computers use a binary (base-2) number system and many decimal fractions cannot be stored exactly in binary. This leads to small rounding errors when working with floating point values.


In [14]:
Ag = 106.9

In [15]:
Au = 197.0

In [16]:
type(Ag)


Out[16]:
float

In [17]:
type(Au)


Out[17]:
float
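
The rounding errors mentioned above can be seen with a small sketch. The values 0.1, 0.2 and 0.3 are illustrative only, and math.isclose is one common way (not the only way) to compare floats within a tolerance.

```python
import math

total = 0.1 + 0.2

# Direct comparison fails because none of these values is stored exactly in binary.
print(total == 0.3)              # False

# Comparing within a small tolerance is usually what is wanted.
print(math.isclose(total, 0.3))  # True
```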

Math operations

In Python 3, math operations behave as you would expect: the / operator always performs true division, even when both values are integers. In Python 2, dividing two integers with / is the same as floor division!


In [64]:
Au + Ag # addition (Note: You can add comments to your code by using the # symbol)


Out[64]:
303.9

In [19]:
Au - Ag # subtraction


Out[19]:
90.1

In [20]:
Au * 5 # multiplication


Out[20]:
985.0

In [21]:
Ag ** 2 # exponentiation - mass of silver squared in this case


Out[21]:
11427.61

In [22]:
Au / Ag # division


Out[22]:
1.842843779232928

In [23]:
Au // Ag # floor division


Out[23]:
1.0
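
As a brief illustration of the difference between true division and floor division in Python 3, the sketch below uses arbitrary example integers (7 and 2) rather than the atomic masses above.

```python
true_div = 7 / 2         # 3.5: true division always returns a float
floor_div = 7 // 2       # 3: floor division of two integers returns an integer
negative_div = -7 // 2   # -4: floor division rounds towards negative infinity
mixed_div = 7.0 // 2     # 3.0: the result is a float if either value is a float

print(true_div, floor_div, negative_div, mixed_div)
```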

Type conversion

We can convert a float to an integer or an integer to a float. Conversion of string representations of numbers to actual numbers (integers and floats) is also common in Python programming.


In [24]:
integer_Ag = int(Ag)

In [25]:
integer_Ag


Out[25]:
106

In [26]:
type(integer_Ag)


Out[26]:
int
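
Note that int() discards the fractional part rather than rounding. The following sketch contrasts it with the built-in round function, using the same Ag value as above.

```python
Ag = 106.9

truncated = int(Ag)            # 106: the fractional part is discarded
rounded = round(Ag)            # 107: nearest integer
negative = int(-106.9)         # -106: truncation is towards zero
back_to_float = float(rounded) # 107.0

print(truncated, rounded, negative, back_to_float)
```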

The modulo operator (%)

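The modulo (%) operator gives the remainder from division. This is commonly used to check whether a number is odd or even. In mass spectrometry, we could use this to test whether an ion has an odd number of nitrogen atoms (this assumes an even-electron ion such as [M+H]+, for which an even nominal m/z indicates an odd number of nitrogens). For example, after setting mz = 114, the condition mz % 2 == 0 is True, so an if statement guarding print("Ion has an odd number of nitrogens") would print the message.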


In [27]:
modulo = integer_Ag % 2

In [28]:
if modulo == 0:
    print("Ion is even")


Ion is even
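
The nitrogen test described above could be wrapped up as follows. This is a minimal sketch: the function name has_odd_nitrogen_count is hypothetical, and it assumes a singly charged, even-electron ion (such as [M+H]+) with an integer nominal m/z.

```python
def has_odd_nitrogen_count(nominal_mz):
    """Return True if the nominal m/z is even.

    Assumption: the ion is singly charged and even-electron (e.g. [M+H]+),
    so an even nominal m/z suggests an odd number of nitrogen atoms.
    """
    return nominal_mz % 2 == 0


mz = 114
if has_odd_nitrogen_count(mz):
    print("Ion has an odd number of nitrogens")
else:
    print("Ion has an even number of nitrogens (or none)")
```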

Strings and indexing

In the following example, we have a file name represented as a string. We can use indexing to select certain characters or substrings from this string.

Strings

Strings are simply sequences of characters (letters, numbers, symbols etc.). Strings are indicated using either single quotes ('This is a string') or double quotes ("This is another string"). Multiline strings can be made using triple quotes (i.e. """Multiline string""").


In [29]:
MS2_spectrum = "Liver_MS2_406.raw"

In [30]:
MS2_spectrum


Out[30]:
'Liver_MS2_406.raw'

Indexing

Single characters are indexed using square brackets after the variable name. In Python, the first character has index 0. Characters may also be indexed from the end of the string (starting at -1).


In [31]:
MS2_spectrum[0]


Out[31]:
'L'

In [32]:
MS2_spectrum[-1]


Out[32]:
'w'

Substrings can be selected by slicing with string[start:end]


In [33]:
sample = MS2_spectrum[0:5]

In [34]:
sample # Note: The character at position 5 is not included in the substring.


Out[34]:
'Liver'

In [35]:
ion = MS2_spectrum[10:13]

In [36]:
ion


Out[36]:
'406'

In [37]:
file_format = MS2_spectrum[-3:]

In [38]:
file_format


Out[38]:
'raw'
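
Fixed index positions only work when every file name has the same layout. As an alternative sketch, the name can be split on its separators instead. This assumes the file name always follows the pattern sample_scantype_ion.extension; the variable names stem, extension and scan_type are introduced here for illustration.

```python
import os

MS2_spectrum = "Liver_MS2_406.raw"

# Split off the extension, then split the remainder on underscores.
stem, extension = os.path.splitext(MS2_spectrum)
sample, scan_type, ion = stem.split("_")

print(sample, scan_type, ion, extension)
```

Unlike slicing, this still works if the sample name is longer or shorter than five characters.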

Type conversion - string to float


In [39]:
type(ion)


Out[39]:
str

In [40]:
float(ion)


Out[40]:
406.0

Collection data types

Lists

Lists are mutable (i.e., modifiable), ordered collections of items. Lists are created by enclosing a collection of items with square brackets. An empty list may also be created simply by assigning [] to a variable, i.e., empty_list = [].


In [41]:
MS_files = ["MS_spectrum", "MS2_405", "MS2_471", "MS2_495"]

In [42]:
MS_files


Out[42]:
['MS_spectrum', 'MS2_405', 'MS2_471', 'MS2_495']

Indexing in lists is the same as for strings


In [43]:
MS_files[2]


Out[43]:
'MS2_471'

Several list 'methods' exist for manipulating lists


In [44]:
MS_files.remove("MS2_405")

In [45]:
MS_files


Out[45]:
['MS_spectrum', 'MS2_471', 'MS2_495']

In [46]:
MS_files.append("MS3_225")

In [47]:
MS_files


Out[47]:
['MS_spectrum', 'MS2_471', 'MS2_495', 'MS3_225']
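
A few other list methods are sketched below, starting from the list as it stands after the cells above. The choice of methods (insert, pop, index) is illustrative rather than exhaustive.

```python
MS_files = ["MS_spectrum", "MS2_471", "MS2_495", "MS3_225"]

MS_files.insert(1, "MS2_405")         # insert an item at index 1
last_file = MS_files.pop()            # remove and return the last item
position = MS_files.index("MS2_471")  # index of the first matching item
number_of_files = len(MS_files)       # len() is a built-in function, not a method

print(MS_files, last_file, position, number_of_files)
```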

Tuples

Tuples are immutable (i.e., can't be modified after their creation), ordered collections of items and are the simplest collection data type. Tuples are created by enclosing a collection of items in parentheses.


In [48]:
Fe_isotopes = (53.9, 55.9, 56.9, 57.9)

In [49]:
Fe_isotopes


Out[49]:
(53.9, 55.9, 56.9, 57.9)

Indexing


In [50]:
Fe_isotopes[0]


Out[50]:
53.9
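
To illustrate what immutable means in practice, the sketch below tries to change an item of the tuple. The replacement value 54.0 and the name Fe_list are placeholders used only for demonstration.

```python
Fe_isotopes = (53.9, 55.9, 56.9, 57.9)

# Item assignment on a tuple raises a TypeError.
try:
    Fe_isotopes[0] = 54.0
except TypeError as error:
    print("Tuples cannot be modified:", error)

# A list built from the tuple can be modified; the tuple itself is unchanged.
Fe_list = list(Fe_isotopes)
Fe_list[0] = 54.0
print(Fe_isotopes[0], Fe_list[0])
```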

Dictionaries

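Dictionaries are mutable collections of key: value pairs. Before Python 3.7 they were unordered, which is why the outputs below do not follow the order in which the pairs were written; from Python 3.7 onwards, dictionaries preserve insertion order. Dictionaries are created by enclosing key: value pairs with curly brackets. Importantly, keys must be hashable. This means, for example, that lists can't be used as keys since the items inside a list may be modified.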


In [51]:
carbon_isotopes = {"12": 0.9893, "13": 0.0107}

Fetching the value for a certain key


In [52]:
carbon_isotopes["12"]


Out[52]:
0.9893

Dictionary methods


In [53]:
carbon_isotopes.keys()


Out[53]:
dict_keys(['13', '12'])

In [54]:
carbon_isotopes.values()


Out[54]:
dict_values([0.0107, 0.9893])

In [55]:
carbon_isotopes.items()


Out[55]:
dict_items([('13', 0.0107), ('12', 0.9893)])
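
The hashable-key requirement can be demonstrated with a short sketch. The tuple keys ("C", 12) and ("C", 13) and the name by_isotope are hypothetical alternatives to the string keys used above; the abundances are the same values.

```python
# A list is not hashable, so using one as a key raises a TypeError.
try:
    bad_dict = {["C", 12]: 0.9893}
except TypeError as error:
    print("Lists cannot be keys:", error)

# A tuple of hashable items is hashable and works as a key.
by_isotope = {("C", 12): 0.9893, ("C", 13): 0.0107}
print(by_isotope[("C", 12)])

# get() returns a default (None unless specified) instead of raising KeyError.
missing = by_isotope.get(("C", 14))
print(missing)
```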

Sets

Sets are another data type; a set is like an unordered list with no duplicates. Sets are especially useful for finding all the unique items in a list, as shown below.


In [56]:
phospholipids = ["PA(16:0/18:1)", "PA(16:0/18:2)", "PC(14:0/16:0)", "PC(16:0/16:1)", "PC(16:1/16:2)"]
# Let's assume we have applied a function that extracts the fatty acids from each phospholipid name, giving:
phospholipid_fatty_acids = ["16:0", "18:1", "16:0", "18:2", "14:0", "16:0", "16:0", "16:1", "16:1", "16:2"]

In [57]:
unique_fatty_acids = set(phospholipid_fatty_acids)

In [58]:
unique_fatty_acids


Out[58]:
{'14:0', '16:0', '16:1', '16:2', '18:1', '18:2'}

In [59]:
num_unique_fa = len(unique_fatty_acids)

In [60]:
num_unique_fa


Out[60]:
6
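
The function assumed in the comment above is not shown in the workshop. One possible sketch is given below; the name fatty_acids_from_name is hypothetical, and it assumes every lipid name has the form CLASS(fa1/fa2).

```python
def fatty_acids_from_name(name):
    """Return the fatty acids in a name such as 'PA(16:0/18:1)'.

    Assumption: the fatty acids sit between one pair of parentheses
    and are separated by '/'.
    """
    inside = name[name.index("(") + 1 : name.index(")")]
    return inside.split("/")


phospholipids = ["PA(16:0/18:1)", "PA(16:0/18:2)", "PC(14:0/16:0)",
                 "PC(16:0/16:1)", "PC(16:1/16:2)"]

phospholipid_fatty_acids = []
for name in phospholipids:
    phospholipid_fatty_acids.extend(fatty_acids_from_name(name))

unique_fatty_acids = set(phospholipid_fatty_acids)
print(sorted(unique_fatty_acids))
```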

Boolean operators

Comparison operators (>, <, ==, >=, <=) and Boolean operators (and, or) assess whether a statement is true or false.


In [61]:
Ag > Au


Out[61]:
False

In [62]:
Ag < Au


Out[62]:
True

In [63]:
Ag == 106.9


Out[63]:
True

In [48]:
Au >= 100


Out[48]:
True

In [49]:
Ag <= Au and Ag > 200


Out[49]:
False

In [50]:
Ag <= Au or Ag > 200


Out[50]:
True
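
Two further operators are often useful alongside those above: the not operator and chained comparisons. The sketch below reuses the Ag and Au values; the bounds 100 and 200 are arbitrary.

```python
Ag = 106.9
Au = 197.0

in_range = 100 < Ag < 200   # chained comparison: 100 < Ag and Ag < 200
not_heavier = not Ag > Au   # not negates the result of the comparison
different = Ag != Au        # != tests for inequality

print(in_range, not_heavier, different)
```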

Conditional statements

Code is executed only if the conditional statement evaluates to True. In the following example, Ag has a value greater than 100 and therefore only the "Ag is greater than 100 Da." string is printed. A colon follows the conditional statement and the following code block is indented by 4 spaces (always use 4 spaces rather than tabs; errors will result when mixing tabs with spaces!). Note, the elif and else statements are optional.


In [51]:
if Ag < 100:
    print("Ag is less than 100 Da")
elif Ag > 100:
    print("Ag is greater than 100 Da.")
else:
    print("Ag is equal to 100 Da.")


Ag is greater than 100 Da.

While loops

While loops repeat the execution of a code block while a condition evaluates to True. When using while loops, be careful not to make an infinite loop where the conditional statement never evaluates to False. (Note: You could, however, use 'break' to break from an infinite loop.)


In [52]:
mass_spectrometers = 0
while mass_spectrometers < 5:
    print("Ask for money")
    mass_spectrometers = mass_spectrometers + 1
    # Comment: This can be written as mass_spectrometers += 1
    print("Number of mass spectrometers equals", mass_spectrometers)
    
print("\nNow we need more lab space")


Ask for money
Number of mass spectrometers equals 1
Ask for money
Number of mass spectrometers equals 2
Ask for money
Number of mass spectrometers equals 3
Ask for money
Number of mass spectrometers equals 4
Ask for money
Number of mass spectrometers equals 5

Now we need more lab space
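
The use of break mentioned above can be sketched as follows. The loop condition is always True, so the loop only ends when the break statement is reached; the limit of 5 matches the example above.

```python
mass_spectrometers = 0
while True:  # would run forever without the break below
    mass_spectrometers += 1
    if mass_spectrometers == 5:
        break  # leave the loop immediately

print("Number of mass spectrometers equals", mass_spectrometers)
```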

For loops

For loops iterate over each item of a collection data type (list, tuple, dictionary or set). For loops can also be used to loop over the characters of a string. This will be used later to evaluate each amino acid residue of a peptide string.


In [36]:
lipid_masses = [674.5, 688.6, 690.6, 745.7]

In [37]:
Na = 23.0

lipid_Na_adducts = []
for mass in lipid_masses:
    lipid_Na_adducts.append(mass + Na)

In [38]:
lipid_Na_adducts


Out[38]:
[697.5, 711.6, 713.6, 768.7]
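
For loops over strings and dictionaries work in the same way as the list example above. The following sketch reuses the peptide string "GASPV" and the carbon_isotopes dictionary from this workshop; the names residues and total_abundance are introduced for illustration.

```python
# Looping over a string yields one character at a time.
peptide = "GASPV"
residues = []
for residue in peptide:
    residues.append(residue)
print(residues)

# Looping over items() yields (key, value) pairs.
carbon_isotopes = {"12": 0.9893, "13": 0.0107}
total_abundance = 0.0
for isotope, abundance in carbon_isotopes.items():
    print("Carbon-" + isotope, abundance)
    total_abundance += abundance
```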

List comprehension

The following is a list comprehension which performs the same operation as the for loop above but in fewer lines of code.


In [39]:
adducts_comp = [mass + Na for mass in lipid_masses]

In [40]:
adducts_comp


Out[40]:
[697.5, 711.6, 713.6, 768.7]

We could also add a predicate (a condition) to a list comprehension. Here, we calculate the sodium adduct masses only for lipids with a mass of less than 700 Da.


In [43]:
adducts_comp = [mass + Na for mass in lipid_masses if mass < 700]

In [44]:
adducts_comp


Out[44]:
[697.5, 711.6, 713.6]
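
For comparison, the list comprehension with a predicate is equivalent to a for loop containing an if statement. The sketch below shows both forms side by side; the name adducts_loop is introduced here for illustration.

```python
lipid_masses = [674.5, 688.6, 690.6, 745.7]
Na = 23.0

# For loop with an if statement.
adducts_loop = []
for mass in lipid_masses:
    if mass < 700:
        adducts_loop.append(mass + Na)

# Equivalent list comprehension with a predicate.
adducts_comp = [mass + Na for mass in lipid_masses if mass < 700]

print(adducts_loop == adducts_comp)  # True
```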

While and for loops with conditional statements

Both while and for loops can be combined with conditional statements for greater control of flow within a program.


In [11]:
mass_spectrometers = 0
while mass_spectrometers < 5:
    mass_spectrometers += 1
    print("Number of mass spectrometers equals", mass_spectrometers)
    if mass_spectrometers == 1:
        print("Woohoo, the first of many!")
    elif mass_spectrometers == 5:
        print("That'll do for now.")
    else:
        print("More!!")


Number of mass spectrometers equals 1
Woohoo, the first of many!
Number of mass spectrometers equals 2
More!!
Number of mass spectrometers equals 3
More!!
Number of mass spectrometers equals 4
More!!
Number of mass spectrometers equals 5
That'll do for now.

In [58]:
for MS_file in MS_files:
    if "spectrum" in MS_file:
        print("MS file:", MS_file)
    elif "MS2" in MS_file:
        print("MS2 file:", MS_file)
    else:
        print("MS3 file:", MS_file)


MS file: MS_spectrum
MS2 file: MS2_471
MS2 file: MS2_495
MS3 file: MS3_225

Exercise: Calculate peptide masses

In the following example, we will calculate the mass of a peptide from a string containing one-letter amino acid residue codes. For example, peptide = "GASPV". To do this, we first need a dictionary containing the one-letter codes as keys and the masses of the amino acid residues as values. We then create a variable to store the mass of the peptide, initialised to the mass of water (18.010565) to account for the terminal H and OH, and use a for loop to add the mass of each amino acid residue in the peptide.


In [6]:
amino_dict = {
    'G': 57.02147,
    'A': 71.03712,
    'S': 87.03203,
    'P': 97.05277,
    'V': 99.06842,
    'T': 101.04768,
    'C': 103.00919,
    'I': 113.08407,
    'L': 113.08407,
    'N': 114.04293,
    'D': 115.02695,
    'Q': 128.05858,
    'K': 128.09497,
    'E': 129.0426,
    'M': 131.04049,
    'H': 137.05891,
    'F': 147.06842,
    'R': 156.10112,
    'Y': 163.06333,
    'W': 186.07932,
    }

# Data modified from http://www.its.caltech.edu/~ppmal/sample_prep/work3.html

In [7]:
peptide_name = "SCIENCE"

In [8]:
mass = 18.010565
for amino_acid in peptide_name:
    mass += amino_dict[amino_acid]

In [9]:
mass


Out[9]:
796.2731749999999

Functions

Functions perform a specified task when called during the execution of a program. Functions reduce the amount of code that needs to be written and greatly improve code readability. (Note: readability matters!) The for loop created above is better placed in a function so that it doesn't need to be rewritten every time we wish to calculate the mass of a peptide. Pay careful attention to the syntax below.


In [10]:
def peptide_mass(peptide):
    mass = 18.010565
    for amino_acid in peptide:
        mass += amino_dict[amino_acid]
    return mass

In [11]:
peptide_mass(peptide_name)


Out[11]:
796.2731749999999
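
The function above raises a KeyError if the peptide contains a letter that is not in amino_dict. One possible variant is sketched below; the names checked_peptide_mass and WATER are hypothetical, and using sum() may give a result that differs from the loop version in the last decimal places because the additions happen in a different order.

```python
# Residue masses copied from amino_dict above.
amino_dict = {
    'G': 57.02147, 'A': 71.03712, 'S': 87.03203, 'P': 97.05277,
    'V': 99.06842, 'T': 101.04768, 'C': 103.00919, 'I': 113.08407,
    'L': 113.08407, 'N': 114.04293, 'D': 115.02695, 'Q': 128.05858,
    'K': 128.09497, 'E': 129.0426, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06842, 'R': 156.10112, 'Y': 163.06333, 'W': 186.07932,
}

WATER = 18.010565  # added once per peptide, as in peptide_mass above


def checked_peptide_mass(peptide):
    """Return the peptide mass, rejecting unknown residue codes."""
    unknown = sorted(set(peptide) - set(amino_dict))
    if unknown:
        raise ValueError("Unknown residue code(s): " + ", ".join(unknown))
    return WATER + sum(amino_dict[residue] for residue in peptide)


print(checked_peptide_mass("SCIENCE"))
```

Raising a ValueError with a clear message makes it easier to spot a mistyped sequence than the bare KeyError would.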

User input

A simple way to gather user input is to use the input function. This prompts the user to enter data, which is returned as a string and may be used within the program. In the example below, we prompt the user to enter a peptide name. The peptide name is then used for the function call to calculate the peptide's mass.


In [14]:
user_peptide = input("Enter peptide name: ")


Enter peptide name: WHATTHEWTF

In [15]:
peptide_mass(user_peptide)


Out[15]:
1314.5782049999998
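
User input often contains stray spaces or lowercase letters, which would not match the uppercase keys of amino_dict. A minimal clean-up sketch is shown below; the function name clean_peptide is hypothetical, and the call to input is left as a comment so that the example runs without a prompt.

```python
def clean_peptide(raw_text):
    """Remove surrounding whitespace and convert to uppercase."""
    return raw_text.strip().upper()


# In an interactive session this could be used as:
#     user_peptide = clean_peptide(input("Enter peptide name: "))
user_peptide = clean_peptide("  science ")
print(user_peptide)
```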