Working with Python: functions and modules

Session 1: Introduction to Python

Printing

You can include a comment in Python by prefixing some text with a # character. All text following the # on that line is then ignored by the interpreter.


In [ ]:
print('Hello from python!') # to print some text, enclose it between quotation marks - single
print("I'm here today!")    # or double
print(34)                   # print an integer
print(2 + 4)                # print the result of an arithmetic operation
print("The answer is", 42)  # print multiple expressions, separated by comma

Variables

A variable can be assigned a simple value or the result of a more complex expression. The = operator is used to assign a value to a variable.


In [ ]:
x = 3     # assignment of a simple value
print(x)
y = x + 5 # assignment of a more complex expression
print(y)
i = 12
print(i)
i = i + 1 # assignment of the current value of a variable incremented by 1 to itself
print(i)
i += 1    # shorter version with the special += operator
print(i)

Simple data types

Python has four main basic data types: integer, float, string and boolean. The special value None represents the absence of a value.


In [ ]:
a = 2           # integer
b = 5.0         # float
c = 'word'      # string
d = 4 > 5       # boolean True or False
e = None        # special built-in value to create a variable that has not been set to anything specific
print(a, b, c, d, e)
print(a, 'is of type', type(a)) # to check the type of a variable

Arithmetic operations


In [ ]:
a = 2             # assignment
a += 1            # change and assign (*=, /=)
3 + 2             # addition
3 - 2             # subtraction
3 * 2             # multiplication
3 / 2             # division: float result in Python 3 (integer result in Python 2)

3 // 2            # integer division
3 % 2             # remainder
3 ** 2            # exponent

Lists

A list is an ordered, mutable collection of elements.


In [ ]:
a = ['red', 'blue', 'green']       # manual initialisation
copy_of_a = a[:]                   # copy of a
another_a = a                      # another name for the same list as a (not a copy)
b = list(range(5))                 # initialise from iterable
c = [1, 2, 3, 4, 5, 6]             # manual initialisation
len(c)                             # length of the list
d = c[0]                           # access first element at index 0
e = c[1:3]                         # access a slice of the list, 
                                   # including element at index 1 up to but not including element at index 3
f = c[-1]                          # access last element
c[1] = 8                           # assign new value at index position 1
g = ['re', 'bl'] + ['gr']          # list concatenation
['re', 'bl'].index('re')           # returns index of 're'
a.append('yellow')                 # add new element to end of list
a.extend(b)                        # add elements from list `b` to end of list `a`
a.insert(1, 'yellow')              # insert element in specified position
're' in ['re', 'bl']               # true if 're' in list
'fi' not in ['re', 'bl']           # true if 'fi' not in list
c.sort()                           # sort list in place
h = sorted([3, 2, 1])              # returns sorted list
i = a.pop(2)                       # remove and return item at index (default last)
print(a, b, c, d, e, f, g, h, i)
print(a, copy_of_a, another_a)

Dictionaries

A dictionary is a collection of key-value pairs where keys must be unique. Since Python 3.7, dictionaries keep their keys in insertion order.


In [ ]:
a = {'A': 'Adenine', 'C': 'Cytosine'}        # dictionary
b = a['A']                                   # look up the value stored under key 'A'
c = a.get('N', 'no value found')             # return default value
'A' in a                                     # true if dictionary a contains key 'A'
a['G'] = 'Guanine'                           # assign new key, value pair to dictionary a
a['T'] = 'Thymine'                           # assign new key, value pair to dictionary a
print(a)
d = a.keys()                                 # get view of keys (use list(d) for a list)
e = a.values()                               # get view of values
f = a.items()                                # get view of key-value pairs
print(b, c, d, e, f)
del a['A']                                   # delete key and associated value
print(a)
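
The uniqueness of keys can be illustrated with a small sketch. The `bases` dictionary and the other variable names below are hypothetical and not part of the original notebook. Assigning to a key that already exists replaces its value instead of adding a second entry.

```python
# hypothetical example dictionary, not part of the original notebook
bases = {'A': 'adenine', 'C': 'cytosine'}
bases['A'] = 'Adenine'            # key 'A' already exists: its value is replaced
bases['U'] = 'Uracil'             # key 'U' is new: a new pair is added
n_pairs = len(bases)              # number of key-value pairs
key_list = list(bases.keys())     # convert the keys into a regular list
print(bases, n_pairs, key_list)
```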

Sets

A set is an unordered collection of unique elements.


In [ ]:
a = {1, 2, 3}                                # initialise manually
b = set(range(5))                            # initialise from iterable
c = set([1,2,2,2,2,4,5,6,6,6])               # initialise from list
a.add(13)                                    # add new element to set
a.remove(13)                                 # remove element from set
2 in {1, 2, 3}                               # true if 2 in set
5 not in {1, 2, 3}                           # true if 5 not in set
d = a.union(b)                               # return the union of sets as a new set
e = a.intersection(b)                        # return the intersection of sets as a new set
print(a, b, c, d, e)

Tuples

A tuple is an ordered, immutable collection of elements. Tuples are similar to lists, but the elements in a tuple cannot be modified. Most of the list operations seen above can be used on tuples, except those that change the contents, such as assigning a new value at a given index position.


In [ ]:
a = (123, 54, 92)              # initialise manually
b = ()                         # empty tuple
c = ("Ala",)                   # tuple of a single string (note the trailing ",")
d = (2, 3, False, "Arg", None) # a tuple of mixed types
print(a, b, c, d)
t = a, c, d                    # tuple packing
x, y, z = t                    # tuple unpacking
print(t, x, y, z)
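
As a minimal sketch of what immutability means in practice, the cell below uses a hypothetical tuple named `point`. Reading operations behave as they do for lists, while item assignment raises a TypeError. The try/except construct is not covered in this session and is used here only so that the cell keeps running after the error.

```python
# hypothetical example values, not part of the original notebook
point = (123, 54, 92)
first = point[0]            # indexing works as for lists
part = point[1:]            # slicing returns a new tuple
count = len(point)          # len() works as for lists
message = None
try:
    point[0] = 1            # item assignment is not allowed on tuples
except TypeError as err:
    message = str(err)
    print('Cannot modify a tuple:', message)
print(first, part, count)
```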

Strings

A string is an ordered, immutable sequence of characters.


In [ ]:
a = 'red'                          # assignment
char = a[2]                        # access individual characters
b = 'red' + 'blue'                 # string concatenation
c = '1, 2, three'.split(',')       # split string into list
d = '.'.join(['1', '2', 'three'])  # concatenate list into string
print(a, char, b, c, d)            
dna = 'ATGTCACCGTTT'               # assignment
seq = list(dna)                    # convert string into list of characters
e = len(dna)                       # return string length
f = dna[2:5]                       # slice string
g = dna.find('TGA')                # substring location, return -1 when not found
print(dna, seq, e, f, g)
text = '   chrom start end    '    # assignment
print('>', text, '<')
print('>', text.strip(), '<')      # remove unwanted whitespace at both ends of the string
print('{:.2f}'.format(0.4567))     # formatting string
print('{gene:s}\t{exp:+.2f}'.format(gene='Beta-Actin', exp=1.7))

Conditional execution

A conditional if/elif statement specifies that a block of code should only be executed if a conditional expression evaluates to True. There can be a final else statement to do something if all of the conditions are False. Python uses indentation to show which statements belong to a block of code.


In [ ]:
a, b = 1, 2           # assign different values to a and b
if a + b == 3:
    print('True')
elif a + b == 1:
    print('False')
else:
    print('?')

Comparison operations


In [ ]:
1 == 1            # equal value
1 != 2            # not equal
2 > 1             # greater than
2 < 1             # less than

1 != 2 and 2 < 3  # logical AND
1 != 2 or 2 < 3   # logical OR
not 1 == 2        # logical NOT

a = list('ATGTCACCGTTT')
b = a             # same as a
c = a[:]          # copy of a
'N' in a          # test if character 'N' is in a

print('a', a)      # print a
print('b', b)      # print b
print('c', c)      # print c
print('Is N in a?', 'N' in a)
print('Do objects b and a point to the same memory address?', b is a)
print('Do objects c and a point to the same memory address?', c is a)
print('Are values of b and a identical?', b == a)
print('Are values of c and a identical?', c == a)
a[0] = 'N'         # modify a
print('a', a)      # print a
print('b', b)      # print b
print('c', c)      # print c
print('Is N in a?', 'N' in a)
print('Do objects b and a point to the same memory address?', b is a)
print('Do objects c and a point to the same memory address?', c is a)
print('Are values of b and a identical?', b == a)
print('Are values of c and a identical?', c == a)

Loops

There are two ways of creating loops in Python: the for loop and the while loop.


In [ ]:
a = ['red', 'blue', 'green']
for color in a:
    print(color)

In [ ]:
number = 1
while number < 10:
    print(number)
    number += 1
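
To compare the two kinds of loop side by side, the following sketch computes the same list of squares twice, once with a for loop over range() and once with a while loop and an explicit counter. The variable names are hypothetical and chosen only for this illustration.

```python
# for loop over a range of numbers
squares_for = []
for n in range(1, 5):              # 1, 2, 3, 4 (the end value 5 is excluded)
    squares_for.append(n ** 2)

# equivalent while loop with an explicit counter
squares_while = []
n = 1
while n < 5:
    squares_while.append(n ** 2)
    n += 1                         # without this line the loop would never end

print(squares_for, squares_while)
```

A for loop is usually the more natural choice when iterating over a known collection, while a while loop fits cases where the number of iterations is not known in advance.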

Python has two ways of affecting the flow of the for or while loop inside the block. The break statement immediately causes all looping to finish, and execution is resumed at the next statement after the loop. The continue statement means that the rest of the code in the block is skipped for this particular item in the collection.


In [ ]:
# break
sequence = ['CAG','TAC','CAA','TAG','TAC','CAG','CAA']
for codon in sequence:
    if codon == 'TAG':
        break            # Quit looping at this point
    else:
        print(codon)

# continue
values = [10, -5, 3, -1, 7]
total = 0
for v in values:
    if v < 0:
        continue         # Skip this iteration   
    total += v
print(values, 'sum:', sum(values), 'total:', total)

Files

To read from a file, your program needs to open the file and then read the contents of the file. You can read the entire contents of the file at once, or read the file line by line. The with statement makes sure the file is closed properly when the program has finished accessing the file.

Passing the 'w' argument to open() tells Python you want to write to the file. Be careful; this will erase the contents of the file if it already exists. Passing the 'a' argument tells Python you want to append to the end of an existing file.


In [ ]:
# reading from file
with open("data/genes.txt") as f:
    for line in f:
        print(line.strip())

# writing to a file
with open('programming.txt', 'w') as f:
    f.write("I love programming in Python!\n")
    f.write("I love making scripts.\n")
    
# appending to a file 
with open('programming.txt', 'a') as f:
    f.write("I love working with data.\n")
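
Reading the entire contents of a file at once can be sketched as follows. The file name `example_lines.txt` is hypothetical; the file is created first so that the cell is self-contained.

```python
# hypothetical file name, created here so the example is self-contained
filename = 'example_lines.txt'
with open(filename, 'w') as f:
    f.write("first line\n")
    f.write("second line\n")

with open(filename) as f:
    content = f.read()             # whole file as a single string
with open(filename) as f:
    lines = f.readlines()          # whole file as a list of lines, newlines kept
print(repr(content))
print(lines)
```

Reading line by line, as in the cell above, is generally preferable for large files because only one line is held in memory at a time.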

Getting help

The Python 3 Standard Library documentation is the reference for all libraries included with Python, as well as for built-in functions and data types.


In [ ]:
help(len)          # help on built-in function
help(list.extend)  # help on list function

In [ ]:
# help within jupyter
len?

Exercise 1.1

We are going to look at a Gapminder dataset, made famous by Hans Rosling in his TED presentation ‘The best stats you’ve ever seen’.

  • Read the dataset from the file data/gapminder.txt
  • Find the earliest and latest years in the dataset programmatically
  • Calculate average life expectancy as well as global population increase between these two years
  • Find which country has the lowest life expectancy in 2002

Next session

Go to our next notebook: python_functions_and_modules_2