"Scripting language" means:
Multi-platform
"Battery included" means:
In [1]:
import this
In [2]:
print("hello")
Before starting, you need to know that in Python, code indentation is an essential part of the syntax. It is used to delimit code blocks such as loops and functions. It may seem cumbersome, but it makes all Python code consistent and readable. The following code is incorrect:
>>> a = 1
>>>   b = 2
since the two statements are not aligned despite being part of the same block of statements (the main block). Instead, they must be indented in the same way:
>>> a = 1
>>> b = 2
Here is another example involving a loop and a function (def):
def example():
    for i in [1, 2, 3]:
        print(i)
In C, it may look like:
void example(){
    int i;
    for (i=1; i<=3; i++){
        printf("%d\n", i);
    }
}
OR
void example(){
    int i;
    for (i=1; i<=3; i++)
    {
        printf("%d\n", i);
    }
}
Note: both tabs and spaces can be used to define the indentation, but conventionally 4 spaces are preferred.
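As a minimal sketch of why sticking to one indentation style matters, the snippet below compiles two small code strings: one indented consistently with spaces, and one that mixes a tab with spaces inside the same block. The helper name `indentation_error_name` and the two code strings are made up for this illustration.

```python
def indentation_error_name(source):
    """Return the name of the indentation error raised when compiling source, or None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as error:  # TabError is a subclass of IndentationError
        return type(error).__name__
    return None

# Both lines of the block are indented with 4 spaces
consistent = "if True:\n    a = 1\n    b = 2\n"
# First line of the block uses a tab, the second one uses 8 spaces
mixed = "if True:\n\ta = 1\n        b = 2\n"

print(indentation_error_name(consistent))
print(indentation_error_name(mixed))
```

Python 3 rejects the second string because it cannot decide unambiguously whether the two lines belong to the same block.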
Variable names are conventionally written in lower-case letters, with multiple words separated by underscores.
Other rules and style conventions: PEP8 style recommendations (https://www.python.org/dev/peps/pep-0008/)
Integers
In [3]:
a = 10
b = 2
a + b
Out[3]:
In [4]:
# incremental operators
a = 10
a += 2  # equivalent to a = a + 2 (there is no ++ operator as in C/C++)
a
Out[4]:
In [5]:
a = 10
a = a + 2
a
Out[5]:
Boolean
In [6]:
test = True
if test:
    print(test)
In [7]:
test = False
if not test:
    print(test)
In [8]:
# Other types can be treated as boolean
# The main example is integers
true_value = 1
false_value = 0
if true_value:
    print(true_value)
if not false_value:
    print(false_value)
Integers, Long, Float and Complex
In [9]:
long_integer = 2**63
float1 = 2.1
float2 = 2.0
float3 = 2.
complex_value = 1 + 2j
In [10]:
long_integer
Out[10]:
In [11]:
float3
Out[11]:
Basic mathematical operators
In [12]:
1 + 2
Out[12]:
In [13]:
1 - 2
Out[13]:
In [14]:
3 * 2
Out[14]:
In [1]:
3 / 2
Out[1]:
In [16]:
3
Out[16]:
In [17]:
float(3)
Out[17]:
In [2]:
3 // 2
Out[2]:
In [19]:
3 % 2
Out[19]:
In [20]:
3 ** 2
Out[20]:
Promotion: when you mix numeric types in an expression, all operands are converted (or coerced) to the most general type involved (int, then float, then complex).
In [21]:
5 + 3.1
Out[21]:
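The promotion rule can be checked by inspecting the type of a few results. This is a small sketch; the variable names are chosen only for this illustration.

```python
# int + float -> float
int_plus_float = 5 + 3.1
print(type(int_plus_float))

# float + complex -> complex
float_plus_complex = 2.0 + (1 + 2j)
print(type(float_plus_complex))

# bool is a subclass of int, so True behaves like 1 in arithmetic
bool_plus_int = True + 1
print(bool_plus_int, type(bool_plus_int))

# in Python 3, true division of two integers always gives a float
int_division = 4 / 2
print(int_division, type(int_division))
```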
Converting types: casting
A variable belonging to one type can be converted to another type through "casting"
In [22]:
int(3.1)
Out[22]:
In [23]:
float(3)
Out[23]:
In [24]:
bool(1)
Out[24]:
In [25]:
bool(0)
Out[25]:
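A few more conversions are worth trying, in particular how int() treats the fractional part and how strings can be converted. This is a short sketch with illustrative variable names.

```python
truncated_positive = int(3.9)    # the fractional part is dropped
truncated_negative = int(-3.9)   # truncation goes toward zero, not downward
parsed_int = int("42")           # a string holding a number can be converted
parsed_float = float("3.5")
as_text = str(10)                # numbers can be turned into strings
empty_string_flag = bool("")     # an empty string is treated as False
nonzero_flag = bool(-1)          # any non-zero number is treated as True

print(truncated_positive, truncated_negative)
print(parsed_int, parsed_float, as_text)
print(empty_string_flag, nonzero_flag)
```

Note that int("3.5") raises a ValueError: the string must represent an integer, so convert through float first if needed.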
In [26]:
import keyword
# Here we are using the "dot" operator, which gives access to an object's attributes and functions
print(keyword.kwlist)
In [28]:
raise = 1
In [29]:
print(dir(bool))
Standard Python modules are libraries that are available without the need to install additional software (they come together with the Python interpreter). They only need to be imported. The import keyword allows us to import standard (and non-standard) Python modules. Some common ones:
In [30]:
import os
os.listdir('.')
Out[30]:
In [31]:
os.path.exists('data.txt')
Out[31]:
In [32]:
os.path.isdir('.ipynb_checkpoints/')
Out[32]:
Import comes in different flavors
In [33]:
import math
math.pi
Out[33]:
In [34]:
from math import pi
pi
Out[34]:
In [35]:
# aliases are possible on the module itself
import math as m
m.pi
Out[35]:
In [36]:
# or an alias on the function/variable itself
from math import pi as PI
PI
Out[36]:
In [38]:
# "del pi" removes the name pi that was created by "from math import pi";
# "from math import pi as PI" only created PI, so using pi afterwards raises a NameError
del pi
pi
In [40]:
math.sqrt(4.)
Out[40]:
There are quite a few data structures available. The built-in data structures are lists, tuples, strings, sets and dictionaries.
Lists, strings and tuples are ordered sequences of objects. Unlike strings, which contain only characters, lists and tuples can contain objects of any type. Lists and tuples are like arrays. Tuples, like strings, are immutable. Lists are mutable, so they can be extended or reduced at will. Sets are mutable, unordered collections of unique elements.
Lists are enclosed in brackets:
l = [1, 2, "a"]
Tuples are enclosed in parentheses:
t = (1, 2, "a")
Tuples are generally faster to create and consume less memory than lists.
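The memory claim can be checked with sys.getsizeof, which reports the size of the container object itself in bytes. This is only a sketch: the exact numbers depend on the Python implementation and version (CPython is assumed here), and the variable names are illustrative.

```python
import sys

as_list = [1, 2, "a"]
as_tuple = (1, 2, "a")

# size of the container itself, not of the objects it refers to
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)

print(list_size, tuple_size)
```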
Dictionaries are built with curly brackets:
d = {"a":1, "b":2}
Sets are made using the set built-in function, or with curly brackets around the elements (e.g., {1, 2}). The table below summarizes the data structures:
| | immutable | mutable |
|---|---|---|
| ordered sequence | string | |
| ordered sequence | tuple | list |
| unordered sequence | | set |
| hash table | | dict |
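The difference between mutable and immutable types can be observed by trying to reassign an item. This is a minimal sketch; the helper name `try_item_assignment` is made up for this illustration.

```python
def try_item_assignment(sequence):
    """Return True if sequence[0] can be reassigned, False if a TypeError is raised."""
    try:
        sequence[0] = sequence[0]
    except TypeError:
        return False
    return True

print(try_item_assignment([1, 2, 3]))   # list
print(try_item_assignment((1, 2, 3)))   # tuple
print(try_item_assignment("abc"))       # string

# sets and dictionaries can be modified in place as well
example_set = {1, 2}
example_set.add(3)
example_dict = {"a": 1}
example_dict["b"] = 2
print(example_set, example_dict)
```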
Indexing starts at 0, like in C
In [41]:
s1 = "Example"
s1[0]
Out[41]:
In [42]:
# last index is therefore the length of the string minus 1
s1[len(s1)-1]
Out[42]:
In [43]:
s1[6]
Out[43]:
In [44]:
# Negative index can be used to start from the end
s1[-2]
Out[44]:
In [46]:
# Careful with indexing out of bounds
s1[100]
In [47]:
"Simple string"
Out[47]:
In [48]:
'Simple string'
Out[48]:
In [49]:
#single quotes can be used to use double quotes and vice versa
"John's book"
Out[49]:
In [50]:
#we can also use escaping
'John\'s book'
Out[50]:
In [51]:
"""This is an example of
a long string on several lines"""
Out[51]:
A little bit more on the print function: formatting
In [52]:
print('This {0} is {1} on format'.format('example', 'based'))
In [53]:
print('This {0} is {1} on format, isn't it a nice {0}?'.format('example', 'based'))
In [54]:
# Notice the escaping of the quote char
print('This {0} is {1} on format, isn\'t it a nice {0}?'.format('example', 'based'))
In [55]:
print("You can also use %s %s\n" % ('C-like', 'formatting'))
In [56]:
print("You can also format integers %d\n" % (1))
In [57]:
print("You can also specify the precision of floats: %f or %.20f\n" % (1., 1.))
String operations
In [58]:
s1 = "First string"
s2 = "Second string"
# + operator concatenates strings
s1 + " and " + s2
Out[58]:
In [59]:
# Strings are immutable
# Try
s1[0] = 'e'
In [60]:
# to change an item, you have to create a new string
'N' + s1[1:]
Out[60]:
Slicing sequence syntax
- [start:end:step] most general slicing
- [start:end:] (step=1)
- [start:end] (step=1)
- [start:] (step=1, up to the end of the sequence)
- [:] (start=0, up to the end of the sequence, step=1)
- [::2] (start=0, up to the end of the sequence, step=2)
In [61]:
s1 = 'Banana'
s1[1:6:2]
Out[61]:
In [62]:
s = 'TEST'
s[-1:-4:-2]
Out[62]:
In [63]:
# slicing. using one : character means from start to end index.
s1 = "First string"
s1[:]
Out[63]:
In [64]:
s1[::2]
Out[64]:
In [65]:
# indexing
s1[0]
Out[65]:
Other string operations
In [66]:
print(dir(s1))
Well, that's a lot ! Here are the common useful ones:
In [67]:
# split is very useful when parsing files
s = 'first second third'
s.split()
Out[67]:
In [68]:
# a different character can be used as well as separator
s = 'first,second,third'
s.split(',')
Out[68]:
In [69]:
# Upper is a very easy and handy method
s.upper()
Out[69]:
In [70]:
# Methods can be chained as well!
s.upper().lower().split(',')
Out[70]:
In [71]:
# you can put any kind of object in a list. This is not an array!
l = [1, 'a', 3]
l
Out[71]:
In [72]:
# slicing and indexing like for strings are available
l[0]
l[0::2]
Out[72]:
In [73]:
l
Out[73]:
In [74]:
# lists are mutable sequences:
l[1] = 2
l
Out[74]:
Mathematical operators can be applied to lists as well
In [75]:
[1, 2] + [3, 4]
Out[75]:
In [76]:
[1, 2] * 10
Out[76]:
Adding elements to a list: append vs. extend
Lists have several methods, among which are append and extend. The former appends a single object (which may itself be a list) to the end of the list, while the latter appends each element of an iterable object (e.g., another list) to the end of the list.
For example, we can append an object (here the character 'c') to the end of a simple list as follows:
In [77]:
stack = ['a','b']
stack.append('c')
stack
Out[77]:
In [78]:
stack.append(['d', 'e', 'f'])
stack
Out[78]:
In [79]:
stack[3]
Out[79]:
The object ['d', 'e', 'f'] has been appended to the existing list. However, sometimes what we want is to append the elements of a given list one by one rather than the list itself. You can do that manually of course, but a better solution is to use the extend() method as follows:
In [80]:
# the manual way
stack = ['a', 'b', 'c']
stack.append('d')
stack.append('e')
stack.append('f')
stack
Out[80]:
In [81]:
# semi-manual way, using a "for" loop
stack = ['a', 'b', 'c']
to_add = ['d', 'e', 'f']
for element in to_add:
    stack.append(element)
stack
Out[81]:
In [82]:
# the smarter way
stack = ['a', 'b', 'c']
stack.extend(['d', 'e','f'])
stack
Out[82]:
In [83]:
t = (1, 2, 3)
t
Out[83]:
In [84]:
# simple creation:
t = 1, 2, 3
print(t)
t[0] = 3
In [85]:
# Would this work?
(1)
Out[85]:
In [86]:
# To enforce a tuple creation, add a comma
(1,)
Out[86]:
Same operators as lists
In [87]:
(1,) * 5
Out[87]:
In [88]:
t1 = (1,0)
t1 += (1,)
t1
Out[88]:
Why tuples instead of lists?
In [89]:
a = {'1', '2', 'a', '4'}
a
Out[89]:
In [90]:
a = [1, 1, 1, 2, 2, 3, 4]
a
Out[90]:
In [91]:
a = {1, 2, 1, 2, 2, 3, 4}
a
Out[91]:
In [92]:
a = []
to_add = [1, 1, 1, 2, 2, 3, 4]
for element in to_add:
    if element in a:
        continue
    else:
        a.append(element)
a
Out[92]:
In [93]:
# Sets have the very handy "add" method
a = set()
to_add = [1, 1, 1, 2, 2, 3, 4]
for element in to_add:
    a.add(element)
a
Out[93]:
Sets have very interesting operators
What operators do we have ?
In [94]:
a = {'a', 'b', 'c'}
b = {'a', 'b', 'd'}
c = {'a', 'e', 'f'}
In [95]:
# intersection
a & b
Out[95]:
In [96]:
# union
a | b
Out[96]:
In [97]:
# difference
a - b
Out[97]:
In [98]:
# symmetric difference
a ^ b
Out[98]:
In [99]:
# is my set a proper subset of the other? (use <= to also accept equal sets)
a < b
Out[99]:
In [100]:
# operators can be chained as well
a & b & c
Out[100]:
In [101]:
# the same operations can be performed using the operator's name
a.intersection(b).intersection(c)
Out[101]:
In [102]:
# a more complex operation
a.intersection(b).difference(c)
Out[102]:
In [104]:
d = {} # an empty dictionary
In [105]:
d = {'first':1, 'second':2} # initialise a dictionary
In [106]:
# access to value given a key:
d['first']
Out[106]:
In [107]:
# add a new pair of key/value:
d['third'] = 3
In [108]:
# what are the keys ?
d.keys()
Out[108]:
In [109]:
# what are the values ?
d.values()
Out[109]:
In [110]:
# what are the key/values pairs?
d.items()
Out[110]:
In [111]:
# can be used in a for loop as well
for key, value in d.items():
    print(key, value)
In [112]:
# Delete a key (and its value)
del d['third']
d
Out[112]:
In [113]:
# naive for loop approach:
for key in d.keys():
    print(key, d[key])
In [114]:
# no need to call the "keys" method explicitly
for key in d:
    print(key, d[key])
In [115]:
# careful not to look for keys that are NOT in the dictionary
d['fourth']
In [116]:
# the "get" method allows a safe retrieval of a key
d.get('fourth')
In [117]:
# the "get" method returns None if the key is not present
# a different default value can be specified for a missing key
d.get('fourth', 4)
Out[117]:
Note on the "None" type
In [118]:
n = None
n
In [119]:
print(n)
In [120]:
None + 1
In [121]:
# None is treated as False in a boolean context (but it is not equal to False)
if n is None:
    print(1)
else:
    print(0)
In [122]:
# we can explicitly test for a variable being "None"
value = d.get('fourth')
if value is None:
    print('Key not found!')
In [123]:
range(10)
Out[123]:
In [124]:
my_list = [x*2 for x in range(10)]
my_list
Out[124]:
In [126]:
redundant_list = [x for x in my_list*2]
redundant_list
Out[126]:
In [127]:
my_set = {x for x in my_list*2}
my_set
Out[127]:
In [128]:
my_dict = {x:x+1 for x in my_list}
my_dict
Out[128]:
In [129]:
# an "if" condition can be used in a comprehension to filter elements
even_numbers = [x for x in range(10) if not x%2]
even_numbers
Out[129]:
In [130]:
# if/else can also be used to assign values based on a test
even_numbers = ['odd' if x%2 else 'even' for x in range(10)]
even_numbers
Out[130]:
In [131]:
a = [1, 2, 3]
a
Out[131]:
In [132]:
b = a  # b is a reference to the same list, not a copy
b[0] = 10
print(a)
In [133]:
# How to de-reference (copy) a list
a = [1, 2, 3]
# First, use the list() function
b1 = list(a)
# Second use the slice operator
b2 = a[:] # using slicing
b1[0] = 10
b2[0] = 10
a #unchanged
Out[133]:
In [134]:
# What about this ?
a = [1,2,3,[4,5,6]]
# copying the object
b = a[:]
b[3][0] = 10 # let us change the first item of the 4th item (4 to 10)
a
Out[134]:
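Here we see that the nested list is still shared between a and b: slicing performs a shallow copy, which copies the outer list but not the objects it contains. To copy the nested objects as well, use the deepcopy function from the copy module.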
In [135]:
from copy import deepcopy
a = [1,2,3,[4,5,6]]
b = deepcopy(a)
b[3][0] = 10 # let us change the first item of the 4th item (4 to 10)
a
Out[135]:
In [136]:
# if/elif/else
animals = ['dog', 'cat', 'cow']
if 'cats' in animals:
    print('Cats found!')
elif 'cat' in animals:
    print('Only one cat found!')
else:
    print('Nothing found!')
In [137]:
# for loop
foods = ['pasta', 'rice', 'lasagna']
for food in foods:
    print(food)
In [138]:
# nested for loops
foods = ['pasta', 'rice', 'lasagna']
desserts = ['cake', 'biscuit']
for food in foods:
    for dessert in desserts:
        print(food, dessert)
In [139]:
# for loop glueing together two lists
for animal, food in zip(animals, foods):
    print('The {0} is eating {1}'.format(animal, food))
In [140]:
animals
Out[140]:
In [141]:
# "zip" will glue together lists only until the shortest one is exhausted
foods = ['pasta', 'rice']
for animal, food in zip(animals, foods):
    print('The {0} is eating {1}'.format(animal, food))
In [142]:
# while loop
counter = 0
while counter < 10:
    counter += 1
counter
Out[142]:
In [143]:
# A normal loop
for value in range(10):
    print(value)
In [144]:
# continue: skip the rest of the loop body and go to the next element
for value in range(10):
    if not value%3:
        continue
    print(value)
In [145]:
# break: exit the "for" loop
for value in range(10):
    if not value%3:
        break
    print(value)
In [146]:
# break: exit the "for" loop
for value in range(1, 10):
    if not value%3:
        break
    print(value)
In [147]:
# break and continue only affect the innermost loop
foods = ['pasta', 'rice', 'lasagna']
desserts = ['cake', 'biscuit']
for food in foods:
    for dessert in desserts:
        if food == 'rice':
            continue
        print(food, dessert)
In [148]:
# break and continue only affect the innermost loop
foods = ['pasta', 'rice', 'lasagna']
desserts = ['cake', 'biscuit']
for food in foods:
    for dessert in desserts:
        if dessert == 'biscuit':
            break
        print(food, dessert)
In [149]:
d = {'first': 1,
     'second': 2}
d['third']
In [150]:
# Exceptions can be intercepted and crashes can be avoided
try:
    d['third']
except:
    print('Key not present')
In [151]:
# Specific exceptions can be intercepted
try:
    d['third']
except KeyError:
    print('Key not present')
except:
    print('Another error occurred')
In [152]:
# Specific exceptions can be intercepted
try:
    d['second'].non_existent_method()
except KeyError:
    print('Key not present')
except:
    print('Another error occurred')
In [153]:
# The exception can be assigned to a variable to inspect it
try:
    d['second'].non_existent_method()
except KeyError:
    print('Key not present')
except Exception as e:
    print('Another error occurred: {0}'.format(e))
In [154]:
# Exceptions can be created and "raised" by the user
if d['second'] == 2:
    raise Exception('I don\'t like 2 as a number')
In [155]:
def sum_numbers(first, second):
    return first + second
In [156]:
sum_numbers(1, 2)
Out[156]:
In [157]:
sum_numbers('one', 'two')
Out[157]:
In [158]:
sum_numbers('one', 2)
In [159]:
# positional vs. keyword arguments
def print_variables(first, second):
    print('First variable: {0}'.format(first))
    print('Second variable: {0}'.format(second))
In [160]:
print_variables(1, 2)
In [161]:
print_variables()
In [162]:
print_variables(1)
In [163]:
# default values make the arguments optional; they can then be passed by keyword
def print_variables(first=None, second=None):
    print('First variable: {0}'.format(first))
    print('Second variable: {0}'.format(second))
In [164]:
print_variables()
In [165]:
print_variables(second=2)
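Positional and keyword arguments can also be mixed in the same call, as long as the positional ones come first. The sketch below uses a hypothetical function `describe` that returns its text instead of printing it, so the three call styles can be compared directly.

```python
def describe(first=None, second=None):
    """Return a short description of the two arguments."""
    return 'first={0}, second={1}'.format(first, second)

both_positional = describe(1, 2)
mixed_call = describe(1, second=2)
keywords_any_order = describe(second=2, first=1)

print(both_positional)
print(mixed_call)
print(keywords_any_order)
```

All three calls bind the same values to the same names. Writing a keyword argument before a positional one, as in describe(second=2, 1), is a syntax error.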
In [189]:
mysequence = """>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA
"""
In [185]:
# First, we open a file in "w" write mode
fh = open("mysequence.fasta", "w")
# Second, we write the data into the file:
fh.write(mysequence)
# Third, we close:
fh.close()
Reading
In [186]:
# First, we open the file in read mode (r)
fh = open('mysequence.fasta', 'r')
# Second, we read the content of the file
data = fh.read()
# Third we close
fh.close()
# data is now a string that contains the content of the file being read
data
Out[186]:
In [187]:
print(data)
For both writing and reading you can use the "with" statement (a context manager), which automatically closes the file when the block ends, even if an exception is raised.
Writing
In [190]:
# First, we open a file in "w" write mode with the context manager
with open("mysequence.fasta", "w") as fh:
    # Second, we write the data into the file:
    fh.write(mysequence)
# When getting out of the block, the file is automatically closed in a secure way
Reading
In [191]:
# First, we open the file in read mode (r) with the context manager
with open('mysequence.fasta', 'r') as fh:
    # Second, we read the content of the file
    data = fh.read()
# When getting out of the block, the file is automatically closed in a secure way
Notice the \n character (newline) in the string...
In [192]:
data
Out[192]:
In [193]:
data.split("\n")
Out[193]:
In [194]:
data.split("\n", 1)
Out[194]:
In [195]:
header, sequence = data.split("\n", 1)
In [196]:
header
Out[196]:
In [197]:
sequence
Out[197]:
In [198]:
# we want to get rid of the \n characters
seq1 = sequence.replace("\n","")
In [199]:
# another way is to use the split/join pair
seq2 = "".join(sequence.split("\n"))
In [200]:
seq1 == seq2
Out[200]:
In [201]:
# make sure that every letter is upper case
seq1 = seq1.upper()
In [202]:
# With the sequence, we can now play around
seq1.count('A')
Out[202]:
In [203]:
counter = {}
counter['A'] = seq1.count('A')
counter['T'] = seq1.count('T')
counter['C'] = seq1.count('C')
counter['G'] = seq1.count('G')
counter
Out[203]:
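Counting every letter one at a time quickly becomes tedious. As a possible alternative, collections.Counter from the standard library counts all characters in a single pass. The sketch below uses only the first residues of the sequence above, stored in an illustrative variable `example_sequence`.

```python
from collections import Counter

# first 32 residues of the sequence above, used as a short example
example_sequence = "MNHLNVLAKALYDNVAESPDELSFRKGDIMTV"

# Counter behaves like a dictionary mapping each letter to its count
residue_counts = Counter(example_sequence)

print(residue_counts["A"])
print(residue_counts.most_common(3))
```

Looking up a letter that never occurs returns 0 instead of raising a KeyError.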
If a file is too big, using the "read" method could completely fill our memory! It is advisable to iterate over the file line by line with a "for" loop, so that only one line is held in memory at a time.
In [204]:
for line in open('mysequence.fasta'):
    # remove the newline character at the end of the line
    # also removes spaces and tabs at the right end of the string
    line = line.rstrip()
    print(line)
In [205]:
header = ''
sequence = ''
for line in open('mysequence.fasta'):
    # remove the newline character at the end of the line
    # also removes spaces and tabs at the right end of the string
    line = line.rstrip()
    if line.startswith('>'):
        header = line
    else:
        sequence += line
In [206]:
header
Out[206]:
In [207]:
sequence
Out[207]:
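The line-by-line parsing above can be combined with the "with" statement and wrapped in a function. This is a minimal sketch that assumes a file holding a single record; the function name `read_single_fasta` and the small demo record are made up for this illustration.

```python
import os
import tempfile

def read_single_fasta(path):
    """Return (header, sequence) for a FASTA file holding one record."""
    header = ''
    chunks = []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith('>'):
                header = line
            elif line:
                # collect the pieces and join them once at the end
                chunks.append(line)
    return header, ''.join(chunks)

# write a small made-up record to a temporary file, then parse it
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.fasta')
    with open(demo_path, 'w') as fh:
        fh.write('>demo record\nMNHL\nNVLA\n')
    demo_header, demo_sequence = read_single_fasta(demo_path)

print(demo_header)
print(demo_sequence)
```

Calling read_single_fasta('mysequence.fasta') on the file written earlier should give the same header and sequence as the loop above.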