Basic Python and native data structures

In a nutshell

Scripting language
Multi-platform (OS X, Linux, Windows)
Batteries included
Lots of third-party libraries (catching up with R for computational biology)
Lots of help available online (e.g. Stack Overflow)

"Scripting language" means:

no type declaration required.
many built-in data structures are already available: dictionary, lists...
no need for memory handling: there is a memory garbage collector
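
As a small illustration of the points above, the sketch below rebinds one name to values of different types without any declaration. The variable names are chosen only for this example.

```python
# The same name can be bound to objects of different types.
# No type declaration is needed, and objects that are no longer
# referenced are reclaimed automatically by the interpreter.
value = 10
first_type = type(value).__name__

value = "ten"
second_type = type(value).__name__

value = [1, 2, 3]
third_type = type(value).__name__

print(first_type, second_type, third_type)
```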

Multi-platform

Byte code can be executed on different platforms.

"Batteries included" means:

Many modules are already provided (e.g. to parse csv files)
No need to install additional libraries for most simple tasks
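
To make the csv example concrete, here is a minimal sketch that uses only the standard library. The in-memory text and the gene names in it are made up for illustration; a real file would be opened with open() instead of io.StringIO.

```python
import csv
import io

# A small in-memory "file"; the content is invented for this example.
text = "gene,length\ngeneA,120\ngeneB,95\n"

# csv.reader yields one list of strings per line.
rows = list(csv.reader(io.StringIO(text)))
header, records = rows[0], rows[1:]

print(header)
print(records)
```

Note that csv.reader returns every field as a string; numeric columns have to be converted explicitly.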

All of that in a more poetic form



In [1]:

    
import this









    



The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Resources

Hello world example



In [2]:

    
print("hello")









    



hello

About indentation

Before starting, you need to know that in Python, code indentation is an essential part of the syntax. It is used to delimit code blocks such as loops and functions. It may seem cumbersome, but it makes all Python code consistent and readable. The following code is incorrect:

>>> a = 1
>>>   b = 2

since the two statements are not aligned despite being part of the same block of statements (the main block). Instead, they must be indented in the same way:

>>> a = 1
>>> b = 2

Here is another example involving a loop and a function (def):

def example():
    for i in [1, 2, 3]:
        print(i)

In C, where indentation has no syntactic meaning, the same function can be written with or without indentation:

void example(){
  int i;
  for (i=1; i<=3; i++){
      printf("%d\n", i);
  }
}

void example(){
int i;
for (i=1; i<=3; i++)
{
printf("%d\n", i);
}
}

Note: both tabs and spaces can be used to define the indentation, but Python 3 rejects inconsistent mixing of the two. By convention (PEP8), 4 spaces per indentation level are preferred.

Rules and conventions on naming variables

Variable names are unlimited in length
Variable names start with a letter or underscore _ followed by letters, digits or underscores.
Variable names are case-sensitive
Variable names cannot be reserved keywords (see below)

Variable names conventionally have lower-case letters, with multiple words separated by underscores.

Other rules and style conventions: PEP8 style recommendations (https://www.python.org/dev/peps/pep-0008/)
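
The naming rules above can be checked programmatically. The sketch below combines the string method isidentifier with keyword.iskeyword from the standard library; the helper is_valid_name and the candidate names are hypothetical and only serve this example.

```python
import keyword

def is_valid_name(name):
    """Return True if name could be used as a variable name."""
    # isidentifier() checks the character rules,
    # iskeyword() rejects reserved words such as "class".
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["gene_count", "_tmp", "2fast", "class", "Gene_Count"]
validity = {name: is_valid_name(name) for name in candidates}

for name, ok in validity.items():
    print(name, ok)
```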

Basic numeric types

Integers



In [3]:

    
a = 10  
b = 2
a + b









    Out[3]:





12



In [4]:

    
# incremental operators
a = 10
a += 2    # equivalent to a = a + 2   (there is no ++ operator as in C/C++)
a









    Out[4]:





12



In [5]:

    
a = 10
a = a + 2
a









    Out[5]:





12

Boolean



In [6]:

    
test = True
if test:
    print(test)









    



True



In [7]:

    
test = False
if not test:
    print(test)









    



False



In [8]:

    
# Other types can be treated as boolean
# Main example are integers
true_value = 1
false_value = 0
if true_value:
    print(true_value)
if not false_value:
    print(false_value)

1
0
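
Integers are not the only values with a truth value. As a rough sketch (some of the containers used here are only introduced later in this notebook), zero, empty containers and None are treated as false, and everything else shown below is treated as true:

```python
# bool() returns the truth value used by "if" and "while".
examples = [0, 0.0, "", [], {}, None, 1, -1, "0", [0]]
truthiness = [bool(item) for item in examples]

for item, flag in zip(examples, truthiness):
    print(repr(item), flag)
```

Note that the string "0" and the list [0] are true because they are not empty.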

Integers, Float and Complex (Python 3 integers have arbitrary precision, so there is no separate Long type)



In [9]:

    
long_integer = 2**63

float1 = 2.1           
float2 = 2.0
float3 = 2.

complex_value = 1 + 2j



In [10]:

    
long_integer









    Out[10]:





9223372036854775808



In [11]:

    
float3









    Out[11]:





2.0

Basic mathematical operators



In [12]:

    
1 + 2









    Out[12]:





3



In [13]:

    
1 - 2









    Out[13]:





-1



In [14]:

    
3 * 2









    Out[14]:





6



In [1]:

    
3 / 2









    Out[1]:





1.5



In [16]:

    
3









    Out[16]:





3



In [17]:

    
float(3)









    Out[17]:





3.0



In [2]:

    
3 // 2









    Out[2]:





1



In [19]:

    
3 % 2









    Out[19]:





1



In [20]:

    
3 ** 2









    Out[20]:





9

Promotion: when you mix numeric types in an expression, all operands are converted (or coerced) to the widest type involved (int to float, float to complex).



In [21]:

    
5 + 3.1









    Out[21]:





8.1

Converting types: casting

A variable belonging to one type can be converted to another type through "casting"



In [22]:

    
int(3.1)









    Out[22]:





3



In [23]:

    
float(3)









    Out[23]:





3.0



In [24]:

    
bool(1)









    Out[24]:





True



In [25]:

    
bool(0)









    Out[25]:





False
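
A few more conversions are sketched below, as a complement to the cells above. The variable names are illustrative only. Note that int() truncates towards zero rather than rounding, and that converting an unsuitable string raises a ValueError.

```python
truncated_pos = int(3.9)     # truncation, not rounding
truncated_neg = int(-3.9)    # truncates towards zero
rounded = round(3.9)         # use round() to round instead

from_text = int("42")        # strings can be converted to numbers
as_float = float("2.5")
as_text = str(3.0)           # and numbers to strings

# A string that does not look like an integer cannot be converted.
try:
    int("3.1")
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(truncated_pos, truncated_neg, rounded, from_text, as_float, as_text)
```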

Keywords

Keywords are special names that are part of the Python language.
A variable cannot be named after a keyword: a SyntaxError would be raised.
The list of keywords can be obtained using these commands (import is itself a keyword and print is a built-in function; both will be explained during this course)



In [26]:

    
import keyword
# Here we are using the "dot" operator, which gives access to the attributes and functions of an object (here, a module)
print(keyword.kwlist)









    



['False', 'None', 'True', 'and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']



In [28]:

    
raise = 1









    



  File "<ipython-input-28-bdd5c3e307c3>", line 1
    raise = 1
          ^
SyntaxError: invalid syntax

A note about objects

Everything in Python is an object, which can be seen as an advanced version of a variable
Objects have methods
The dir built-in function allows the user to discover them



In [29]:

    
print(dir(bool))









    



['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
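
Most names in that listing start with a double underscore and are used internally by Python. As a small sketch, the public methods can be filtered out and called with the dot operator; the variable names below are only for illustration.

```python
number = 10

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(number) if not name.startswith('_')]
print(public_names)

# Methods are called with the dot operator.
bits = number.bit_length()   # number of bits needed to represent 10 (0b1010)
print(bits)
```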

Importing standard python modules

Standard python modules are libraries that are available without the need to install additional software (they come together with the python interpreter). They only need to be imported. The import keyword allows us to import standard (and non standard) Python modules. Some common ones:

os
math
sys
urllib.request (called urllib2 in Python 2)
dozens of others are available. See https://docs.python.org/3/py-modindex.html



In [30]:

    
import os
os.listdir('.')









    Out[30]:





['[0]-Introduction_to_Jupyter_Notebook.ipynb',
 '[4]-Useful_third_party_libraries_for_data_analysis.ipynb',
 'yeast.gexf.zip.1',
 'y2.txt',
 'Yeast.clu',
 '[3a]-Exercises.ipynb',
 '[3]-Data_visualization.ipynb',
 'Yeast.paj',
 '.ipynb_checkpoints',
 '1g59.pdb',
 'mysequence.fasta',
 'YeastL.net',
 'yeast.gexf',
 '[1a]-Exercises.ipynb',
 '[2a]-Exercices.ipynb',
 '.keep',
 'yeast.gexf.zip',
 'y1.txt',
 'yeast.zip',
 'YeastS.net',
 '[1]-Basic_python_and_native_python_data_structures.ipynb',
 '[2]-Advanced_data_structures-and-file-parsing.ipynb',
 'ecoli.fasta',
 '[4a]-Exercises.ipynb']



In [31]:

    
os.path.exists('data.txt')









    Out[31]:





False



In [32]:

    
os.path.isdir('.ipynb_checkpoints/')









    Out[32]:





True

Import comes in different flavors



In [33]:

    
import math
math.pi









    Out[33]:





3.141592653589793



In [34]:

    
from math import pi
pi









    Out[34]:





3.141592653589793



In [35]:

    
# aliases are possible on the module itself
import math as m
m.pi









    Out[35]:





3.141592653589793



In [36]:

    
# or alias on the function/variable itself
from math import pi as PI
PI









    Out[36]:





3.141592653589793



In [38]:

    
# pi is not defined here: it was deleted earlier, and "from math import pi as PI"
# created only the name PI in the local namespace, hence the error
del pi
pi



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-38-b8f8306817c3> in <module>()
      1 # pi is not defined here: it was deleted earlier, and "from math import pi as PI"
      2 # created only the name PI in the local namespace, hence the error
----> 3 del pi
      4 pi

NameError: name 'pi' is not defined



In [40]:

    
math.sqrt(4.)









    Out[40]:





2.0

Data structures

There are quite a few data structures available. The built-in data structures are:

lists
tuples
dictionaries
strings
sets

Lists, strings and tuples are ordered sequences of objects. Unlike strings, which contain only characters, lists and tuples can contain any type of object. Lists and tuples are like arrays. Tuples, like strings, are immutable. Lists are mutable, so they can be extended or reduced at will. Sets are mutable, unordered collections of unique elements.

Lists are enclosed in brackets:

l = [1, 2, "a"]

Tuples are enclosed in parentheses:

t = (1, 2, "a")

Tuples are generally faster to create and consume less memory than lists.

Dictionaries are built with curly brackets:

d = {"a":1, "b":2}

Sets are made using the set built-in function or curly braces. More about the data structures below:

                        immutable    mutable
ordered sequence        string
ordered sequence        tuple        list
unordered collection                 set
hash table                           dict

Indexing starts at 0, like in C



In [41]:

    
s1 = "Example"
s1[0]









    Out[41]:





'E'



In [42]:

    
# last index is therefore the length of the string minus 1
s1[len(s1)-1]









    Out[42]:





'e'



In [43]:

    
s1[6]









    Out[43]:





'e'



In [44]:

    
# Negative index can be used to start from the end
s1[-2]









    Out[44]:





'l'



In [46]:

    
# Careful with indexing out of bounds
s1[100]









    



---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-46-b700b117c112> in <module>()
      1 # Careful with indexing out of bounds
----> 2 s1[100]

IndexError: string index out of range

Strings and slicing

There are 4 ways to represent strings:

with single quotes
with double quotes
with triple single quotes
with triple double quotes



In [47]:

    
"Simple string"









    Out[47]:





'Simple string'



In [48]:

    
'Simple string'









    Out[48]:





'Simple string'



In [49]:

    
# double quotes can enclose a string that contains single quotes, and vice versa
"John's book"









    Out[49]:





"John's book"



In [50]:

    
#we can also use escaping
'John\'s book'









    Out[50]:





"John's book"



In [51]:

    
"""This is an example of 
a long string on several lines"""









    Out[51]:





'This is an example of \na long string on several lines'

A little bit more on the print function: formatting



In [52]:

    
print('This {0} is {1} on format'.format('example', 'based'))









    



This example is based on format



In [53]:

    
print('This {0} is {1} on format, isn't it a nice {0}?'.format('example', 'based'))









    



  File "<ipython-input-53-adf69de48897>", line 1
    print('This {0} is {1} on format, isn't it a nice {0}?'.format('example', 'based'))
                                          ^
SyntaxError: invalid syntax



In [54]:

    
# Notice the escaping of the quote char
print('This {0} is {1} on format, isn\'t it a nice {0}?'.format('example', 'based'))









    



This example is based on format, isn't it a nice example?



In [55]:

    
print("You can also use %s %s\n" % ('C-like', 'formatting'))









    



You can also use C-like formatting



In [56]:

    
print("You can also format integers %d\n" % (1))









    



You can also format integers 1



In [57]:

    
print("You can also specify the precision of floats: %f or %.20f\n" % (1., 1.))









    



You can also specify the precision of floats: 1.000000 or 1.00000000000000000000
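
The format method accepts the same kind of precision and width specifications after a colon. The following is a minimal sketch; the gene name and length are invented for the example.

```python
value = 3.14159265

# Precision: two digits after the decimal point.
two_decimals = '{:.2f}'.format(value)

# Width and alignment: right-aligned in 8 characters, 3 decimals.
padded = '{:>8.3f}'.format(value)

# Fields can also be referred to by name instead of by position.
named = '{gene} has {n} residues'.format(gene='geneA', n=120)

print(two_decimals)
print(padded)
print(named)
```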

String operations



In [58]:

    
s1 = "First string"
s2 = "Second string"
# + operator concatenates strings
s1 + " and " + s2









    Out[58]:





'First string and Second string'



In [59]:

    
# Strings are immutable
# Try
s1[0] = 'e'



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-59-0b3d9214f3f4> in <module>()
      1 # Strings are immutable
      2 # Try
----> 3 s1[0] = 'e'

TypeError: 'str' object does not support item assignment



In [60]:

    
# to change an item, you have to create a new string
'N' + s1[1:]









    Out[60]:





'Nirst string'

Slicing sequence syntax

- [start:end:step]   most general slicing (end is excluded)
- [start:end:]       (step=1)
- [start:end]        (step=1)
- [start:]           (step=1, end=length of the sequence)
- [:]                (start=0, end=length of the sequence, step=1)
- [::2]              (start=0, end=length of the sequence, step=2)
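
Omitted bounds default to the start and the end of the sequence, and negative values count from the end. A short sketch of these rules on a sample string:

```python
s = 'Banana'

first_three = s[:3]     # from the start up to, but excluding, index 3
from_third = s[2:]      # from index 2 to the end
last_three = s[-3:]     # the last three characters
inner = s[1:-1]         # everything except the first and last character
reversed_s = s[::-1]    # a negative step walks the sequence backwards
clipped = s[2:100]      # unlike indexing, out-of-range slices do not fail

print(first_three, from_third, last_three, inner, reversed_s, clipped)
```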



In [61]:

    
s1 = 'Banana'
s1[1:6:2]









    Out[61]:





'aaa'



In [62]:

    
s = 'TEST'
s[-1:-4:-2]









    Out[62]:





'TE'



In [63]:

    
# slicing. using one : character means from start to end index.
s1 = "First string"
s1[:]









    Out[63]:





'First string'



In [64]:

    
s1[::2]









    Out[64]:





'Frtsrn'



In [65]:

    
# indexing
s1[0]









    Out[65]:





'F'

Other string operations



In [66]:

    
print(dir(s1))









    



['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Well, that's a lot! Here are the most commonly used ones:

split
find
index
replace
lower
upper
endswith
startswith
strip



In [67]:

    
# split is very useful when parsing files
s = 'first second third'
s.split()









    Out[67]:





['first', 'second', 'third']



In [68]:

    
# a different character can be used as the separator
s = 'first,second,third'
s.split(',')









    Out[68]:





['first', 'second', 'third']



In [69]:

    
# Upper is a very easy and handy method
s.upper()









    Out[69]:





'FIRST,SECOND,THIRD'



In [70]:

    
# Methods can be chained as well!
s.upper().lower().split(',')









    Out[70]:





['first', 'second', 'third']

Lists

The syntax to create a list can be the function list or square brackets []



In [71]:

    
# you can put any kind of object in a list. This is not an array!
l = [1, 'a', 3]
l









    Out[71]:





[1, 'a', 3]



In [72]:

    
# slicing and indexing like for strings are available
l[0]
l[0::2]









    Out[72]:





[1, 3]



In [73]:

    
l









    Out[73]:





[1, 'a', 3]



In [74]:

    
# lists are mutable sequences:
l[1] = 2
l









    Out[74]:





[1, 2, 3]

Some mathematical operators (+ and *) can be applied to lists as well



In [75]:

    
[1, 2] + [3, 4]









    Out[75]:





[1, 2, 3, 4]



In [76]:

    
[1, 2] * 10









    Out[76]:





[1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

Adding elements to a list: append vs. extend

Lists have several methods, among which are the append and extend methods. The former appends a single object (e.g., another list) to the end of the list, while the latter appends each element of an iterable object (e.g., another list) to the end of the list.

For example, we can append an object (here the character 'c') to the end of a simple list as follows:



In [77]:

    
stack = ['a','b']
stack.append('c')
stack









    Out[77]:





['a', 'b', 'c']



In [78]:

    
stack.append(['d', 'e', 'f'])
stack









    Out[78]:





['a', 'b', 'c', ['d', 'e', 'f']]



In [79]:

    
stack[3]









    Out[79]:





['d', 'e', 'f']

The object ['d', 'e', 'f'] has been appended to the existing list as a single element. However, sometimes what we want is to append the elements of a given list one by one rather than the list itself. You can do that manually of course, but a better solution is to use the extend() method as follows:



In [80]:

    
# the manual way
stack = ['a', 'b', 'c']
stack.append('d')
stack.append('e')
stack.append('f')
stack









    Out[80]:





['a', 'b', 'c', 'd', 'e', 'f']



In [81]:

    
# semi-manual way, using a "for" loop
stack = ['a', 'b', 'c']
to_add = ['d', 'e', 'f']
for element in to_add:
    stack.append(element)
stack









    Out[81]:





['a', 'b', 'c', 'd', 'e', 'f']



In [82]:

    
# the smarter way
stack = ['a', 'b', 'c']
stack.extend(['d', 'e','f'])
stack









    Out[82]:





['a', 'b', 'c', 'd', 'e', 'f']

Tuples

Tuples are sequences similar to lists, but immutable. Use parentheses to create a tuple



In [83]:

    
t = (1, 2, 3)
t









    Out[83]:





(1, 2, 3)



In [84]:

    
# simple creation:
t = 1, 2, 3
print(t)
t[0] = 3









    



(1, 2, 3)






    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-84-469366529247> in <module>()
      2 t = 1, 2, 3
      3 print(t)
----> 4 t[0] = 3

TypeError: 'tuple' object does not support item assignment



In [85]:

    
# Would this work?
(1)









    Out[85]:





1



In [86]:

    
# To enforce a tuple creation, add a comma
(1,)









    Out[86]:





(1,)

Same operators as lists



In [87]:

    
(1,) * 5









    Out[87]:





(1, 1, 1, 1, 1)



In [88]:

    
t1 = (1,0)
t1 += (1,)
t1









    Out[88]:





(1, 0, 1)

Why tuples instead of lists?

faster than list
protects the data (immutable)
tuples can be used as keys on dictionaries (more on that later)
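
The last point can be sketched as follows: a tuple is accepted as a dictionary key, while a list is rejected because it is mutable. The pairs and distances below are made up for illustration.

```python
# Tuples are hashable, so they can be used as dictionary keys.
distances = {('A', 'B'): 1.5, ('A', 'C'): 2.0}
ab = distances[('A', 'B')]

# Lists are mutable and therefore not hashable: using one as a key fails.
try:
    bad = {['A', 'B']: 1.5}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(ab, list_key_allowed)
```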

Sets

Sets are constructed from a sequence (or some other iterable object). Since sets cannot have duplicates, they are usually used to build collections of unique items (e.g., a set of identifiers).

The syntax to create a set can be the function set or curly braces {} (note that empty braces create a dictionary, so use set() for an empty set)



In [89]:

    
a = {'1', '2', 'a', '4'}
a









    Out[89]:





{'1', '2', '4', 'a'}



In [90]:

    
a = [1, 1, 1, 2, 2, 3, 4]
a









    Out[90]:





[1, 1, 1, 2, 2, 3, 4]



In [91]:

    
a = {1, 2, 1, 2, 2, 3, 4}
a









    Out[91]:





{1, 2, 3, 4}



In [92]:

    
a = []
to_add = [1, 1, 1, 2, 2, 3, 4]
for element in to_add:
    if element in a:
        continue
    else:
        a.append(element)
a









    Out[92]:





[1, 2, 3, 4]



In [93]:

    
# Sets have the very handy "add" method
a = set()
to_add = [1, 1, 1, 2, 2, 3, 4]
for element in to_add:
    a.add(element)
a









    Out[93]:





{1, 2, 3, 4}

Sets have very interesting operators

What operators do we have?

| for union
& for intersection
< for proper subset (<= for subset)
- for difference
^ for symmetric difference



In [94]:

    
a = {'a', 'b', 'c'}
b = {'a', 'b', 'd'}
c = {'a', 'e', 'f'}



In [95]:

    
# intersection
a & b









    Out[95]:





{'a', 'b'}



In [96]:

    
# union
a | b









    Out[96]:





{'a', 'b', 'c', 'd'}



In [97]:

    
# difference
a - b









    Out[97]:





{'c'}



In [98]:

    
# symmetric difference
a ^ b









    Out[98]:





{'c', 'd'}



In [99]:

    
# is my set a subset of the other?
a < b









    Out[99]:





False



In [100]:

    
# operators can be chained as well
a & b & c









    Out[100]:





{'a'}



In [101]:

    
# the same operations can be performed using the corresponding method names
a.intersection(b).intersection(c)









    Out[101]:





{'a'}



In [102]:

    
# a more complex operation
a.intersection(b).difference(c)









    Out[102]:





{'b'}
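
The comparison operators deserve a short illustration of their own. In this sketch (the set names are arbitrary), <= tests for a subset, < for a proper subset, and in for membership:

```python
small = {'a', 'b'}
large = {'a', 'b', 'c'}

is_subset = small <= large            # every element of small is in large
is_proper_subset = small < large      # subset, and the two sets differ
self_subset = large <= large          # a set is a subset of itself
self_proper = large < large           # but not a proper subset of itself
by_method = small.issubset(large)     # same as small <= large
has_c = 'c' in large                  # membership test

print(is_subset, is_proper_subset, self_subset, self_proper, by_method, has_c)
```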

Dictionaries

A dictionary is a collection of items.
Each item is a pair made of a key and a value.
Dictionaries preserve insertion order since Python 3.7; in older versions (used for the outputs below) the order is arbitrary.
You can access the keys or the values independently.



In [104]:

    
d = {} # an empty dictionary



In [105]:

    
d = {'first':1, 'second':2} # initialise a dictionary



In [106]:

    
# access to value given a key:
d['first']









    Out[106]:





1



In [107]:

    
# add a new pair of key/value:
d['third'] = 3



In [108]:

    
# what are the keys ?
d.keys()









    Out[108]:





dict_keys(['second', 'first', 'third'])



In [109]:

    
# what are the values ?
d.values()









    Out[109]:





dict_values([2, 1, 3])



In [110]:

    
# what are the key/values pairs?
d.items()









    Out[110]:





dict_items([('second', 2), ('first', 1), ('third', 3)])



In [111]:

    
# can be used in a for loop as well
for key, value in d.items():
    print(key, value)









    



second 2
first 1
third 3



In [112]:

    
# Delete a key (and its value)
del d['third']
d









    Out[112]:





{'first': 1, 'second': 2}



In [113]:

    
# naive for loop approach:
for key in d.keys():
    print(key, d[key])









    



second 2
first 1



In [114]:

    
# no need to call the "keys" method explicitly
for key in d:
    print(key, d[key])









    



second 2
first 1



In [115]:

    
# careful not to look for keys that are NOT in the dictionary
d['fourth']









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-115-b5107ecd4c8a> in <module>()
      1 # careful not to look for keys that are NOT in the dictionary
----> 2 d['fourth']

KeyError: 'fourth'



In [116]:

    
# the "get" method allows safe retrieval of the value for a key
d.get('fourth')



In [117]:

    
# the "get" method returns None if the key is not present
# a different default value can be specified for a missing key
d.get('fourth', 4)









    Out[117]:





4
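
A common use of get with a default value is counting. The sketch below counts the residues of a short, arbitrary sequence; the names sequence and counts are only illustrative.

```python
sequence = "MNHLNVLAK"

counts = {}
for residue in sequence:
    # Start from 0 the first time a residue is seen.
    counts[residue] = counts.get(residue, 0) + 1

print(counts)
```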

Note on the "None" type



In [118]:

    
n = None
n



In [119]:

    
print(n)









    



None



In [120]:

    
None + 1









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-120-365f14229875> in <module>()
----> 1 None + 1

TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'



In [121]:

    
# None is treated as false in a test, but it is not the same object as False;
# test for it explicitly with "is None"
if n is None:
    print(1)
else:
    print(0)



In [122]:

    
# we can explicitly test for a variable being "None"
value = d.get('fourth')
if value is None:
    print('Key not found!')









    



Key not found!

List, set and dictionary comprehensions: more compact constructors

There is a more concise and advanced way to create a list, set or dictionary



In [123]:

    
range(10)









    Out[123]:





range(0, 10)



In [124]:

    
my_list = [x*2 for x in range(10)]
my_list









    Out[124]:





[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]



In [126]:

    
redundant_list = [x for x in my_list*2]
redundant_list









    Out[126]:





[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]



In [127]:

    
my_set = {x for x in my_list*2}
my_set









    Out[127]:





{0, 2, 4, 6, 8, 10, 12, 14, 16, 18}



In [128]:

    
my_dict = {x:x+1 for x in my_list}
my_dict









    Out[128]:





{0: 1, 2: 3, 4: 5, 6: 7, 8: 9, 10: 11, 12: 13, 14: 15, 16: 17, 18: 19}



In [129]:

    
# an "if" clause can be used in a comprehension to filter elements
even_numbers = [x for x in range(10) if not x%2]
even_numbers









    Out[129]:





[0, 2, 4, 6, 8]



In [130]:

    
# if/else can also be used to assign values based on a test
parities = ['odd' if x%2 else 'even' for x in range(10)]
parities









    Out[130]:





['even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd']
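
A comprehension is shorthand for a loop that fills a container. As a sketch of the equivalence, the two constructions below build the same list, and the last line applies the same idea to a dictionary; the variable names are chosen for this example.

```python
# Explicit loop
squares_loop = []
for x in range(10):
    if x % 2 == 0:
        squares_loop.append(x * x)

# Equivalent list comprehension
squares_comp = [x * x for x in range(10) if x % 2 == 0]

# Dictionary comprehension: map each word to its length
lengths = {word: len(word) for word in ['pasta', 'rice', 'lasagna']}

print(squares_loop)
print(squares_comp)
print(lengths)
```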

On objects and references

A common source of errors for python beginners



In [131]:

    
a = [1, 2, 3]
a









    Out[131]:





[1, 2, 3]



In [132]:

    
b = a  # b is a reference to the same list, not a copy
b[0] = 10
print(a)









    



[10, 2, 3]



In [133]:

    
# How to copy a list instead of creating a second reference to it
a = [1, 2, 3]

# First, use the list() function
b1 = list(a)

# Second use the slice operator
b2 = a[:]  # using slicing
b1[0] = 10
b2[0] = 10
a #unchanged









    Out[133]:





[1, 2, 3]



In [134]:

    
# What about this ?
a = [1,2,3,[4,5,6]]
# copying the object
b = a[:]
b[3][0] = 10 # let us change the first item of the 4th item (4 to 10)
a









    Out[134]:





[1, 2, 3, [10, 5, 6]]

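Here we see that the inner list is still shared. Slicing (like list()) performs a shallow copy: the outer list is new, but its elements are references to the same objects. To copy nested objects as well, use deepcopy from the copy module.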



In [135]:

    
from copy import deepcopy
a = [1,2,3,[4,5,6]]
b = deepcopy(a)
b[3][0] = 10 # let us change the first item of the 4th item (4 to 10)
a









    Out[135]:





[1, 2, 3, [4, 5, 6]]
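
The difference between the two kinds of copy can be checked with the is operator, which tests whether two names refer to the same object. A minimal sketch, with illustrative variable names:

```python
import copy

original = [1, 2, [4, 5, 6]]
shallow = copy.copy(original)      # same effect as original[:] or list(original)
deep = copy.deepcopy(original)

# The shallow copy shares the inner list, the deep copy does not.
shares_inner_shallow = shallow[2] is original[2]
shares_inner_deep = deep[2] is original[2]

# Changing the inner list through the shallow copy affects the original.
shallow[2][0] = 10

print(shares_inner_shallow, shares_inner_deep)
print(original, deep)
```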

Flow control operators



In [136]:

    
# if/elif/else
animals = ['dog', 'cat', 'cow']
if 'cats' in animals:
    print('Cats found!')
elif 'cat' in animals:
    print('Only one cat found!')
else:
    print('Nothing found!')









    



Only one cat found!



In [137]:

    
# for loop
foods = ['pasta', 'rice', 'lasagna']
for food in foods:
    print(food)









    



pasta
rice
lasagna



In [138]:

    
# nested for loops
foods = ['pasta', 'rice', 'lasagna']
desserts = ['cake', 'biscuit']
for food in foods:
    for dessert in desserts:
        print(food, dessert)









    



pasta cake
pasta biscuit
rice cake
rice biscuit
lasagna cake
lasagna biscuit



In [139]:

    
# for loop glueing together two lists
for animal, food in zip(animals, foods):
    print('The {0} is eating {1}'.format(animal, food))









    



The dog is eating pasta
The cat is eating rice
The cow is eating lasagna



In [140]:

    
animals









    Out[140]:





['dog', 'cat', 'cow']



In [141]:

    
# "zip" will glue together lists only until the shortest one 
foods = ['pasta', 'rice']
for animal, food in zip(animals, foods):
    print('The {0} is eating {1}'.format(animal, food))









    



The dog is eating pasta
The cat is eating rice



In [142]:

    
# while loop
counter = 0
while counter < 10:
    counter += 1
counter









    Out[142]:





10



In [143]:

    
# A normal loop
for value in range(10):
    print(value)



In [144]:

    
# continue: skip the rest of the loop body and go to the next element
for value in range(10):
    if not value%3:
        continue
    print(value)



In [145]:

    
# break: exit the "for" loop (here immediately, since 0 % 3 is 0, so nothing is printed)
for value in range(10):
    if not value%3:
        break
    print(value)



In [146]:

    
# break: exit the "for" loop
for value in range(1, 10):
    if not value%3:
        break
    print(value)

1
2



In [147]:

    
# break and continue only affect the innermost loop
foods = ['pasta', 'rice', 'lasagna']
desserts = ['cake', 'biscuit']
for food in foods:
    for dessert in desserts:
        if food == 'rice':
            continue
        print(food, dessert)









    



pasta cake
pasta biscuit
lasagna cake
lasagna biscuit



In [148]:

    
# break and continue only affect the innermost loop
foods = ['pasta', 'rice', 'lasagna']
desserts = ['cake', 'biscuit']
for food in foods:
    for dessert in desserts:
        if dessert == 'biscuit':
            break
        print(food, dessert)









    



pasta cake
rice cake
lasagna cake

Exceptions

Exceptions are used to avoid crashes and handle unexpected errors



In [149]:

    
d = {'first': 1,
     'second': 2}
d['third']









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-149-e48bcf49137b> in <module>()
      1 d = {'first': 1,
      2      'second': 2}
----> 3 d['third']

KeyError: 'third'



In [150]:

    
# Exceptions can be intercepted and crashes can be avoided
try:
    d['third']
except:
    print('Key not present')









    



Key not present



In [151]:

    
# Specific exceptions can be intercepted
try:
    d['third']
except KeyError:
    print('Key not present')
except:
    print('Another error occurred')









    



Key not present



In [152]:

    
# Any exception that is not a KeyError falls through to the bare "except"
try:
    d['second'].non_existent_method()
except KeyError:
    print('Key not present')
except:
    print('Another error occurred')









    



Another error occurred



In [153]:

    
# The exception can be assigned to a variable to inspect it
try:
    d['second'].non_existent_method()
except KeyError:
    print('Key not present')
except Exception as e:
    print('Another error occurred: {0}'.format(e))



Another error occurred: 'int' object has no attribute 'non_existent_method'



In [154]:

    
# Exception can be created and "raised" by the user
if d['second'] == 2:
    raise Exception('I don\'t like 2 as a number')









    



---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-154-3bfde4ce523d> in <module>()
      1 # Exception can be created and "raised" by the user
      2 if d['second'] == 2:
----> 3     raise Exception('I don\'t like 2 as a number')

Exception: I don't like 2 as a number
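
Raising and catching can be combined. In the sketch below, a hypothetical helper gc_fraction raises a ValueError for invalid input, and the caller binds the exception to a name with the Python 3 syntax "except ... as" in order to report it and carry on.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C characters in sequence."""
    if not sequence:
        # Refuse to divide by zero; raise a more informative error instead.
        raise ValueError('empty sequence')
    gc = sequence.count('G') + sequence.count('C')
    return gc / len(sequence)

results = []
for seq in ['GGCC', 'ATGC', '']:
    try:
        results.append(gc_fraction(seq))
    except ValueError as err:
        # The loop continues instead of crashing on the bad input.
        results.append('skipped: {0}'.format(err))

print(results)
```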

Functions

Functions allow code to be reused in a flexible way



In [155]:

    
def sum_numbers(first, second):
    return first + second



In [156]:

    
sum_numbers(1, 2)









    Out[156]:





3



In [157]:

    
sum_numbers('one', 'two')









    Out[157]:





'onetwo'



In [158]:

    
sum_numbers('one', 2)









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-158-f4c3468a59df> in <module>()
----> 1 sum_numbers('one', 2)

<ipython-input-155-f60a4a054510> in sum_numbers(first, second)
      1 def sum_numbers(first, second):
----> 2     return first + second

TypeError: Can't convert 'int' object to str implicitly



In [159]:

    
# positional vs. keyword arguments
def print_variables(first, second):
    print('First variable: {0}'.format(first))
    print('Second variable: {0}'.format(second))



In [160]:

    
print_variables(1, 2)









    



First variable: 1
Second variable: 2



In [161]:

    
print_variables()









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-161-c5ae5e6c57c2> in <module>()
----> 1 print_variables()

TypeError: print_variables() missing 2 required positional arguments: 'first' and 'second'



In [162]:

    
print_variables(1)









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-162-16c4637c0fbb> in <module>()
----> 1 print_variables(1)

TypeError: print_variables() missing 1 required positional argument: 'second'



In [163]:

    
# keyword arguments with default values
def print_variables(first=None, second=None):
    print('First variable: {0}'.format(first))
    print('Second variable: {0}'.format(second))



In [164]:

    
print_variables()









    



First variable: None
Second variable: None



In [165]:

    
print_variables(second=2)









    



First variable: None
Second variable: 2
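
Required positional arguments and optional keyword arguments can be combined in the same function. The sketch below uses a hypothetical function describe_variables (not in the original notebook) that returns a string instead of printing, so the different call styles are easy to compare:

```python
def describe_variables(first, second=None):
    """first is required; second is optional and defaults to None."""
    return 'first={0}, second={1}'.format(first, second)


# positional only: second falls back to its default
print(describe_variables(1))

# positional followed by keyword
print(describe_variables(1, second=2))

# keywords only: the order no longer matters
print(describe_variables(second=2, first=1))

# leaving out the required argument is still an error
try:
    describe_variables(second=2)
except TypeError as e:
    print('Error: {0}'.format(e))
```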

Simple file reading/writing

Writing



In [189]:

    
mysequence = """>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA
"""



In [185]:

    
# First, we open a file in "w" write mode
fh = open("mysequence.fasta", "w")
# Second, we write the data into the file:
fh.write(mysequence)
# Third, we close:
fh.close()

Reading



In [186]:

    
# First, we open the file in read mode (r)
fh = open('mysequence.fasta', 'r')
# Second, we read the content of the file
data = fh.read()
# Third we close
fh.close()
# data is now a string that contains the content of the file being read
data









    Out[186]:





'>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2\nMNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL\nKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP\nSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ\nVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL\nLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS\nNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED\nVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA\nPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL\nAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK\nLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN\nASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW\nMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA\nNWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA\nHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP\nSAAQDMVERVKELGHSTQQFRRVLGQLAAA\n'



In [187]:

    
print(data)









    



>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA

For both writing and reading, you can open the file with the "with" statement (a context manager), which closes the file automatically when the block ends, even if an exception is raised inside it.

Writing



In [190]:

    
# First, we open a file in "w" write mode with the context manager
with open("mysequence.fasta", "w") as fh:
    # Second, we write the data into the file:
    fh.write(mysequence)
# When getting out of the block, the file is automatically closed in a secure way

Reading



In [191]:

    
# First, we open the file in read mode (r) with the context manager
with open('mysequence.fasta', 'r') as fh:
    # Second, we read the content of the file
    data = fh.read()
# When getting out of the block, the file is automatically closed in a secure way
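
The claim that the file is closed even when an exception occurs can be checked with the file object's "closed" attribute. A minimal sketch, assuming a throwaway file in a temporary directory (the path and the simulated RuntimeError are made up for illustration):

```python
import os
import tempfile

# write to a temporary directory so the example does not touch real files
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

try:
    with open(path, 'w') as fh:
        fh.write('some data\n')
        # simulated failure inside the block
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

# the context manager closed the file despite the exception
print(fh.closed)
```

Without "with", the same guarantee would need an explicit try/finally block around the call to fh.close().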

Notice the \n character (newline) in the string...



In [192]:

    
data









    Out[192]:





'>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2\nMNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL\nKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP\nSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ\nVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL\nLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS\nNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED\nVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA\nPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL\nAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK\nLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN\nASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW\nMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA\nNWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA\nHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP\nSAAQDMVERVKELGHSTQQFRRVLGQLAAA\n'



In [193]:

    
data.split("\n")









    Out[193]:





['>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2',
 'MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL',
 'KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP',
 'SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ',
 'VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL',
 'LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS',
 'NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED',
 'VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA',
 'PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL',
 'AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK',
 'LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN',
 'ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW',
 'MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA',
 'NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA',
 'HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP',
 'SAAQDMVERVKELGHSTQQFRRVLGQLAAA',
 '']



In [194]:

    
data.split("\n", 1)









    Out[194]:





['>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2',
 'MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL\nKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP\nSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ\nVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL\nLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS\nNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED\nVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA\nPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL\nAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK\nLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN\nASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW\nMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA\nNWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA\nHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP\nSAAQDMVERVKELGHSTQQFRRVLGQLAAA\n']



In [195]:

    
header, sequence = data.split("\n", 1)



In [196]:

    
header









    Out[196]:





'>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2'



In [197]:

    
sequence









    Out[197]:





'MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL\nKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP\nSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ\nVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL\nLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS\nNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED\nVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA\nPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL\nAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK\nLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN\nASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW\nMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA\nNWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA\nHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP\nSAAQDMVERVKELGHSTQQFRRVLGQLAAA\n'



In [198]:

    
# we want to get rid of the \n characters
seq1 = sequence.replace("\n","")



In [199]:

    
# another way is to use the split/join pair
seq2 = "".join(sequence.split("\n"))



In [200]:

    
seq1 == seq2









    Out[200]:





True
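
A third option is the string method splitlines, which also copes with Windows-style "\r\n" line endings. The sketch below uses short made-up strings rather than the real sequence:

```python
# shortened, made-up stand-in for the multi-line sequence
sequence = 'MNHL\nKILV\nSAAQ\n'

seq1 = sequence.replace('\n', '')
seq3 = ''.join(sequence.splitlines())
print(seq1 == seq3)

# with Windows line endings, replace("\n", "") would leave "\r" behind
windows_sequence = 'MNHL\r\nKILV\r\n'
cleaned = ''.join(windows_sequence.splitlines())
print(cleaned)
```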



In [201]:

    
# make sure that every letter is upper case
seq1 = seq1.upper()



In [202]:

    
# With the sequence, we can now play around 
seq1.count('A')









    Out[202]:





88



In [203]:

    
counter = {}
counter['A'] = seq1.count('A')
counter['T'] = seq1.count('T')
counter['C'] = seq1.count('C')
counter['G'] = seq1.count('G')
counter









    Out[203]:





{'A': 88, 'C': 4, 'G': 67, 'T': 44}
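
Counting every letter by hand becomes tedious quickly. The standard library's collections.Counter builds the whole table in one pass. A sketch on a short made-up string (in a protein sequence such as the one above, A, T, C and G stand for the amino acids alanine, threonine, cysteine and glycine, not for nucleotides):

```python
from collections import Counter

# short made-up string standing in for seq1
seq = 'AACGTTA'

counter = Counter(seq)
print(counter['A'])

# letters that never occur are reported as 0 instead of raising KeyError
print(counter['W'])

# the most frequent letters, as (letter, count) pairs
print(counter.most_common(2))
```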

If a file is very large, the "read" method loads all of it at once and could fill up the memory. It is better to iterate over the file with a "for" loop, which reads one line at a time.



In [204]:

    
with open('mysequence.fasta') as fh:
    for line in fh:
        # remove the newline character at the end of the line
        # also removes spaces and tabs at the right end of the string
        line = line.rstrip()
        print(line)









    



>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA



In [205]:

    
header = ''
sequence = ''
with open('mysequence.fasta') as fh:
    for line in fh:
        # remove the newline character at the end of the line
        # also removes spaces and tabs at the right end of the string
        line = line.rstrip()
        if line.startswith('>'):
            header = line
        else:
            sequence += line



In [206]:

    
header









    Out[206]:





'>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2'



In [207]:

    
sequence









    Out[207]:





'MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRLKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTPSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHLLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPSNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAEDVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREAPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDLAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAKLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGNASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGWMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLANWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVAHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSPSAAQDMVERVKELGHSTQQFRRVLGQLAAA'
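
The loop above handles a file with a single record. FASTA files often contain several records, each starting with its own ">" header line. The sketch below extends the same line-by-line idea to that case; read_fasta is a hypothetical helper and the two-record file is made up for the example:

```python
import os
import tempfile


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file, one record at a time."""
    header = None
    chunks = []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith('>'):
                # a new header closes the previous record, if there was one
                if header is not None:
                    yield header, ''.join(chunks)
                header = line
                chunks = []
            elif line:
                chunks.append(line)
    # do not forget the last record in the file
    if header is not None:
        yield header, ''.join(chunks)


# small made-up file with two records
path = os.path.join(tempfile.mkdtemp(), 'two_records.fasta')
with open(path, 'w') as fh:
    fh.write('>seq1\nMNHL\nKILV\n>seq2\nSAAQ\n')

records = list(read_fasta(path))
for header, sequence in records:
    print(header, len(sequence))
```

Collecting the lines in a list and joining them once at the end avoids building a new string on every iteration, and the generator keeps only one record in memory at a time.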