Basic Python and native data structures

In a nutshell

  • Scripting language
  • Multi-platform (macOS, Linux, Windows)
  • Batteries included
  • Lots of third-party libraries (catching up with R for computational biology)
  • Lots of help available online (e.g. stackoverflow)

"Scripting language" means:

  • no type declaration required.
  • many built-in data structures are already available: dictionary, lists...
  • no need for memory handling: there is a memory garbage collector

Multi-platform

  • Byte code can be executed on different platforms.

"Batteries included" means:

  • Many modules are already provided (e.g. to parse csv files)
  • No need to install additional libraries for most simple tasks
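As a small illustration of the "batteries included" idea, the sketch below parses CSV text using only the standard library. The sample data and the names csv_text, header and records are made up for this example.

```python
import csv
import io

# Hypothetical CSV content, held in memory instead of a file on disk
csv_text = "gene,length\nBCAR1,870\nTP53,393\n"

# csv.reader splits each line into a list of strings
rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]
print(header)
print(records)
```

Note that every field is returned as a string; numeric columns have to be converted explicitly.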

All of that in a more poetic form


In [1]:
import this


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Hello world example


In [2]:
print("hello")


hello

About indentation

Before starting, you need to know that in Python, code indentation is an essential part of the syntax. It is used to delimit code blocks such as loops and functions. It may seem cumbersome, but it makes all Python code consistent and readable. The following code is incorrect:

>>> a = 1
>>>   b = 2

since the two statements are not aligned despite being part of the same block of statements (the main block). Instead, they must be indented in the same way:

>>> a = 1
>>> b = 2

Here is another example involving a loop and a function (def):

def example():
    for i in [1, 2, 3]:
        print(i)

In C, it may look like

void example(){
  int i;
  for (i=1; i<=3; i++){
      printf("%d\n", i);
  }
}

OR

void example(){
int i;
for (i=1; i<=3; i++)
{
printf("%d\n", i);
}
}

Note: both tabs and spaces can be used to define the indentation, but they should not be mixed; by convention (PEP8), 4 spaces per indentation level are preferred.

Rules and conventions on naming variables

  • Variable names are unlimited in length
  • Variable names start with a letter or underscore _ followed by letters, digits or underscores.
  • Variable names are case-sensitive
  • Variable names cannot be Python keywords (see below)

Variable names conventionally use lower-case letters, with multiple words separated by underscores.

Other rules and style conventions: PEP8 style recommendations (https://www.python.org/dev/peps/pep-0008/)
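The naming rules above can be checked programmatically. This is a small sketch using str.isidentifier and keyword.iskeyword from the standard library; the helper name is_valid_name is an assumption made for this example.

```python
import keyword

def is_valid_name(name):
    # A valid variable name is an identifier that is not a reserved keyword
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ['my_variable', '_private', '2nd_value', 'my-variable', 'for']:
    print(candidate, is_valid_name(candidate))
```

The first two candidates are accepted; the others are rejected because they start with a digit, contain a hyphen, or are a keyword.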

Basic numeric types

Integers


In [3]:
a = 10  
b = 2
a + b


Out[3]:
12

In [4]:
# augmented assignment operators
a = 10
a += 2    # equivalent to a = a + 2   (there is no ++ operator like in C/C++)
a


Out[4]:
12

In [5]:
a = 10
a = a + 2
a


Out[5]:
12

Boolean


In [6]:
test = True
if test:
    print(test)


True

In [7]:
test = False
if not test:
    print(test)


False

In [8]:
# Other types can be treated as booleans
# The main example is integers
true_value = 1
false_value = 0
if true_value:
    print(true_value)
if not false_value:
    print(false_value)


1
0

Integers (arbitrarily large), floats and complex numbers


In [9]:
long_integer = 2**63

float1 = 2.1           
float2 = 2.0
float3 = 2.

complex_value = 1 + 2j

In [10]:
long_integer


Out[10]:
9223372036854775808

In [11]:
float3


Out[11]:
2.0

Basic mathematical operators


In [12]:
1 + 2


Out[12]:
3

In [13]:
1 - 2


Out[13]:
-1

In [14]:
3 * 2


Out[14]:
6

In [1]:
3 / 2


Out[1]:
1.5

In [16]:
3


Out[16]:
3

In [17]:
float(3)


Out[17]:
3.0

In [2]:
3 // 2


Out[2]:
1

In [19]:
3 % 2


Out[19]:
1

In [20]:
3 ** 2


Out[20]:
9

Promotion: when you mix numeric types in an expression, all operands are converted (or coerced) to the most general type involved (int, then float, then complex).


In [21]:
5 + 3.1


Out[21]:
8.1
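To make promotion visible, the following sketch inspects the type of the result when numeric types are mixed; the variable names are only for illustration.

```python
int_plus_float = 5 + 3.1               # int + float gives a float
float_plus_complex = 2.0 + (1 + 2j)    # float + complex gives a complex
int_plus_int = 5 + 3                   # same types: no promotion needed

print(type(int_plus_float))
print(type(float_plus_complex))
print(type(int_plus_int))
```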

Converting types: casting

A value of one type can be converted to another type through "casting", by calling the target type as a function.


In [22]:
int(3.1)


Out[22]:
3

In [23]:
float(3)


Out[23]:
3.0

In [24]:
bool(1)


Out[24]:
True

In [25]:
bool(0)


Out[25]:
False

Keywords

  • Keywords are special names that are part of the Python language.
  • A variable cannot be named after a keyword: a SyntaxError would be raised.
  • The list of keywords can be obtained using these commands (import is itself a keyword, while print is a built-in function; both will be explained during this course)

In [26]:
import keyword
# Here we are using the "dot" operator, which gives access to the attributes and functions of an object (a variable, that is)
print(keyword.kwlist)


['False', 'None', 'True', 'and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']

In [28]:
raise = 1


  File "<ipython-input-28-bdd5c3e307c3>", line 1
    raise = 1
          ^
SyntaxError: invalid syntax

A note about objects

  • Everything in Python is an object, which can be seen as an advanced version of a variable
  • Objects have methods
  • The dir built-in function allows the user to discover them

In [29]:
print(dir(bool))


['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
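Most names listed by dir are special methods surrounded by double underscores. The following is a minimal sketch for keeping only the "public" names and calling one of them; the variable names public_names and number are assumptions for this example.

```python
# Keep only the names that do not start with an underscore
public_names = [name for name in dir(int) if not name.startswith('_')]
print(public_names)

# Methods are called with the "dot" operator
number = 10
print(number.bit_length())   # number of bits needed to represent 10 (binary 1010)
```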

Importing standard Python modules

Standard Python modules are libraries that are available without the need to install additional software (they come together with the Python interpreter). They only need to be imported. The import keyword allows us to import standard (and non-standard) Python modules. Some common ones:


In [30]:
import os
os.listdir('.')


Out[30]:
['[0]-Introduction_to_Jupyter_Notebook.ipynb',
 '[4]-Useful_third_party_libraries_for_data_analysis.ipynb',
 'yeast.gexf.zip.1',
 'y2.txt',
 'Yeast.clu',
 '[3a]-Exercises.ipynb',
 '[3]-Data_visualization.ipynb',
 'Yeast.paj',
 '.ipynb_checkpoints',
 '1g59.pdb',
 'mysequence.fasta',
 'YeastL.net',
 'yeast.gexf',
 '[1a]-Exercises.ipynb',
 '[2a]-Exercices.ipynb',
 '.keep',
 'yeast.gexf.zip',
 'y1.txt',
 'yeast.zip',
 'YeastS.net',
 '[1]-Basic_python_and_native_python_data_structures.ipynb',
 '[2]-Advanced_data_structures-and-file-parsing.ipynb',
 'ecoli.fasta',
 '[4a]-Exercises.ipynb']

In [31]:
os.path.exists('data.txt')


Out[31]:
False

In [32]:
os.path.isdir('.ipynb_checkpoints/')


Out[32]:
True

Import comes in different flavors


In [33]:
import math
math.pi


Out[33]:
3.141592653589793

In [34]:
from math import pi
pi


Out[34]:
3.141592653589793

In [35]:
# aliases are possible on the module itself
import math as m
m.pi


Out[35]:
3.141592653589793

In [36]:
# or alias on the function/variable itself
from math import pi as PI
PI


Out[36]:
3.141592653589793

In [38]:
# pi was deleted earlier, and "from math import pi as PI" did not create a pi
# variable in the local namespace, hence the error
del pi
pi


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-38-b8f8306817c3> in <module>()
      1 # pi was deleted earlier, and "from math import pi as PI" did not create a pi
      2 # variable in the local namespace, hence the error
----> 3 del pi
      4 pi

NameError: name 'pi' is not defined

In [40]:
math.sqrt(4.)


Out[40]:
2.0

Data structures

There are quite a few data structures available. The built-in data structures are:

  • lists
  • tuples
  • dictionaries
  • strings
  • sets

Lists, strings and tuples are ordered sequences of objects. Unlike strings, which contain only characters, lists and tuples can contain any type of object. Lists and tuples are like arrays. Tuples, like strings, are immutable. Lists are mutable, so they can be extended or reduced at will. Sets are mutable, unordered collections of unique elements.

Lists are enclosed in brackets:

l = [1, 2, "a"]

Tuples are enclosed in parentheses:

t = (1, 2, "a")

Tuples are typically slightly faster to create and consume less memory than lists.

Dictionaries are built with curly brackets:

d = {"a":1, "b":2}

Sets are made using the set built-in function or curly brackets around the elements. More about the data structures below:

                        immutable    mutable
ordered sequence        string
ordered sequence        tuple        list
unordered collection                 set
hash table                           dict
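The difference between mutable and immutable types can be sketched as follows: changing an item in place works for a list but raises a TypeError for a tuple or a string. The variable names are chosen for this example only.

```python
mutable_list = [1, 2, 'a']
immutable_tuple = (1, 2, 'a')
immutable_string = '12a'

mutable_list[0] = 100        # allowed: lists are mutable

outcomes = {}
for name, obj in [('tuple', immutable_tuple), ('string', immutable_string)]:
    try:
        obj[0] = 100         # not allowed: tuples and strings are immutable
        outcomes[name] = 'modified'
    except TypeError:
        outcomes[name] = 'TypeError'

print(mutable_list)
print(outcomes)
```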

Indexing starts at 0, like in C


In [41]:
s1 = "Example"
s1[0]


Out[41]:
'E'

In [42]:
# last index is therefore the length of the string minus 1
s1[len(s1)-1]


Out[42]:
'e'

In [43]:
s1[6]


Out[43]:
'e'

In [44]:
# Negative index can be used to start from the end
s1[-2]


Out[44]:
'l'

In [46]:
# Careful with indexing out of bounds
s1[100]


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-46-b700b117c112> in <module>()
      1 # Careful with indexing out of bounds
----> 2 s1[100]

IndexError: string index out of range

Strings and slicing

There are 4 ways to represent strings:

  • with single quotes
  • with double quotes
  • with triple single quotes
  • with triple double quotes

In [47]:
"Simple string"


Out[47]:
'Simple string'

In [48]:
'Simple string'


Out[48]:
'Simple string'

In [49]:
# double quotes can enclose a string that contains single quotes, and vice versa
"John's book"


Out[49]:
"John's book"

In [50]:
#we can also use escaping
'John\'s book'


Out[50]:
"John's book"

In [51]:
"""This is an example of 
a long string on several lines"""


Out[51]:
'This is an example of \na long string on several lines'

A little bit more on the print function: formatting


In [52]:
print('This {0} is {1} on format'.format('example', 'based'))


This example is based on format

In [53]:
print('This {0} is {1} on format, isn't it a nice {0}?'.format('example', 'based'))


  File "<ipython-input-53-adf69de48897>", line 1
    print('This {0} is {1} on format, isn't it a nice {0}?'.format('example', 'based'))
                                          ^
SyntaxError: invalid syntax

In [54]:
# Notice the escaping of the quote char
print('This {0} is {1} on format, isn\'t it a nice {0}?'.format('example', 'based'))


This example is based on format, isn't it a nice example?

In [55]:
print("You can also use %s %s\n" % ('C-like', 'formatting'))


You can also use C-like formatting


In [56]:
print("You can also format integers %d\n" % (1))


You can also format integers 1


In [57]:
print("You can also specify the precision of floats: %f or %.20f\n" % (1., 1.))


You can also specify the precision of floats: 1.000000 or 1.00000000000000000000

String operations


In [58]:
s1 = "First string"
s2 = "Second string"
# + operator concatenates strings
s1 + " and " + s2


Out[58]:
'First string and Second string'

In [59]:
# Strings are immutable
# Try
s1[0] = 'e'


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-59-0b3d9214f3f4> in <module>()
      1 # Strings are immutable
      2 # Try
----> 3 s1[0] = 'e'

TypeError: 'str' object does not support item assignment

In [60]:
# to change an item, you have to create a new string
'N' + s1[1:]


Out[60]:
'Nirst string'

Slicing sequence syntax

- [start:end:step]   most general slicing
- [start:end:]      (step=1)
- [start:end]       (step=1)
- [start:]          (step=1, end=length of the sequence)
- [:]               (start=0, end=length of the sequence, step=1)
- [::2]             (start=0, end=length of the sequence, step=2)
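A short sketch of the slicing forms listed above, using a made-up string; a negative step walks the sequence backwards.

```python
word = 'Python'

print(word[1:4])     # 'yth'    : from index 1 up to, but not including, index 4
print(word[2:])      # 'thon'   : end defaults to the length of the string
print(word[:2])      # 'Py'     : start defaults to 0
print(word[::2])     # 'Pto'    : every second character
print(word[::-1])    # 'nohtyP' : a negative step reverses the string
```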

In [61]:
s1 = 'Banana'
s1[1:6:2]


Out[61]:
'aaa'

In [62]:
s = 'TEST'
s[-1:-4:-2]


Out[62]:
'TE'

In [63]:
# slicing. using one : character means from start to end index.
s1 = "First string"
s1[:]


Out[63]:
'First string'

In [64]:
s1[::2]


Out[64]:
'Frtsrn'

In [65]:
# indexing
s1[0]


Out[65]:
'F'

Other string operations


In [66]:
print(dir(s1))


['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Well, that's a lot! Here are the most commonly used ones:

  • split
  • find
  • index
  • replace
  • lower
  • upper
  • endswith
  • startswith
  • strip
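A few of the methods listed above can be sketched on a made-up line of text (the content of the variable line is an assumption for this example). All of them return a new object and leave the original string unchanged.

```python
line = '  >sp|P56945|BCAR1_HUMAN  \n'

clean = line.strip()                 # remove leading/trailing whitespace
print(clean)
print(clean.startswith('>'))         # True
print(clean.endswith('HUMAN'))       # True
print(clean.find('BCAR1'))           # index of the first match
print(clean.find('XYZ'))             # -1 when the substring is absent
print(clean.replace('|', ' '))       # returns a new string
print(clean.lower())
```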

In [67]:
# split is very useful when parsing files
s = 'first second third'
s.split()


Out[67]:
['first', 'second', 'third']

In [68]:
# a different character can be used as well as separator
s = 'first,second,third'
s.split(',')


Out[68]:
['first', 'second', 'third']

In [69]:
# Upper is a very easy and handy method
s.upper()


Out[69]:
'FIRST,SECOND,THIRD'

In [70]:
# Methods can be chained as well!
s.upper().lower().split(',')


Out[70]:
['first', 'second', 'third']

Lists

The syntax to create a list can be the function list or square brackets []


In [71]:
# you can put any kind of object in a list. This is not an array!
l = [1, 'a', 3]
l


Out[71]:
[1, 'a', 3]

In [72]:
# slicing and indexing like for strings are available
l[0]
l[0::2]


Out[72]:
[1, 3]

In [73]:
l


Out[73]:
[1, 'a', 3]

In [74]:
# lists are mutable sequences:
l[1] = 2
l


Out[74]:
[1, 2, 3]

The + and * operators can be applied to lists as well (concatenation and repetition)


In [75]:
[1, 2] + [3, 4]


Out[75]:
[1, 2, 3, 4]

In [76]:
[1, 2] * 10


Out[76]:
[1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

Adding elements to a list: append vs. extend

Lists have several methods, among which are append and extend. The former appends a single object (which may itself be a list) to the end of the list, while the latter appends each element of an iterable object (e.g., another list) to the end of the list.

For example, we can append an object (here the character 'c') to the end of a simple list as follows:


In [77]:
stack = ['a','b']
stack.append('c')
stack


Out[77]:
['a', 'b', 'c']

In [78]:
stack.append(['d', 'e', 'f'])
stack


Out[78]:
['a', 'b', 'c', ['d', 'e', 'f']]

In [79]:
stack[3]


Out[79]:
['d', 'e', 'f']

The object ['d', 'e', 'f'] has been appended to the existing list as a single element. However, sometimes what we want is to append the elements of a given list one by one rather than the list itself. You can do that manually of course, but a better solution is to use the extend() method as follows:


In [80]:
# the manual way
stack = ['a', 'b', 'c']
stack.append('d')
stack.append('e')
stack.append('f')
stack


Out[80]:
['a', 'b', 'c', 'd', 'e', 'f']

In [81]:
# semi-manual way, using a "for" loop
stack = ['a', 'b', 'c']
to_add = ['d', 'e', 'f']
for element in to_add:
    stack.append(element)
stack


Out[81]:
['a', 'b', 'c', 'd', 'e', 'f']

In [82]:
# the smarter way
stack = ['a', 'b', 'c']
stack.extend(['d', 'e','f'])
stack


Out[82]:
['a', 'b', 'c', 'd', 'e', 'f']

Tuples

Tuples are sequences similar to lists, but immutable. Use parentheses to create a tuple.


In [83]:
t = (1, 2, 3)
t


Out[83]:
(1, 2, 3)

In [84]:
# simple creation:
t = 1, 2, 3
print(t)
t[0] = 3


(1, 2, 3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-84-469366529247> in <module>()
      2 t = 1, 2, 3
      3 print(t)
----> 4 t[0] = 3

TypeError: 'tuple' object does not support item assignment

In [85]:
# Would this work?
(1)


Out[85]:
1

In [86]:
# To enforce a tuple creation, add a comma
(1,)


Out[86]:
(1,)

Same operators as lists


In [87]:
(1,) * 5


Out[87]:
(1, 1, 1, 1, 1)

In [88]:
t1 = (1,0)
t1 += (1,)
t1


Out[88]:
(1, 0, 1)

Why tuples instead of lists?

  • faster than lists
  • protect the data (immutable)
  • tuples can be used as keys in dictionaries (more on that later)
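As a sketch of the last point: because tuples are immutable they are hashable and can be used as dictionary keys, while lists cannot. The coordinates and names below are invented for this example.

```python
# Map (row, column) coordinates to a value
grid = {(0, 0): 'origin', (0, 1): 'right'}
print(grid[(0, 1)])

# A list cannot be used as a key because it is mutable (unhashable)
try:
    grid[[1, 1]] = 'diagonal'
    list_key_accepted = True
except TypeError:
    list_key_accepted = False
print(list_key_accepted)
```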

Sets

Sets are constructed from a sequence (or some other iterable object). Since sets cannot have duplicates, they are usually used to build collections of unique items (e.g., a set of identifiers).

A set can be created with the set function or with curly braces around its elements. Note that empty braces {} create a dictionary; use set() for an empty set.


In [89]:
a = {'1', '2', 'a', '4'}
a


Out[89]:
{'1', '2', '4', 'a'}

In [90]:
a = [1, 1, 1, 2, 2, 3, 4]
a


Out[90]:
[1, 1, 1, 2, 2, 3, 4]

In [91]:
a = {1, 2, 1, 2, 2, 3, 4}
a


Out[91]:
{1, 2, 3, 4}

In [92]:
a = []
to_add = [1, 1, 1, 2, 2, 3, 4]
for element in to_add:
    if element in a:
        continue
    else:
        a.append(element)
a


Out[92]:
[1, 2, 3, 4]

In [93]:
# Sets have the very handy "add" method
a = set()
to_add = [1, 1, 1, 2, 2, 3, 4]
for element in to_add:
    a.add(element)
a


Out[93]:
{1, 2, 3, 4}

Sets have very interesting operators

What operators do we have ?

  • | for union
  • & for intersection
  • < for proper subset (<= for subset)
  • - for difference
  • ^ for symmetric difference

In [94]:
a = {'a', 'b', 'c'}
b = {'a', 'b', 'd'}
c = {'a', 'e', 'f'}

In [95]:
# intersection
a & b


Out[95]:
{'a', 'b'}

In [96]:
# union
a | b


Out[96]:
{'a', 'b', 'c', 'd'}

In [97]:
# difference
a - b


Out[97]:
{'c'}

In [98]:
# symmetric difference
a ^ b


Out[98]:
{'c', 'd'}

In [99]:
# is my set a subset of the other?
a < b


Out[99]:
False
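The result above is False because a contains 'c', which is not in b. Here is a sketch with a true subset, also showing the difference between < (proper subset) and <= (subset); the set names are chosen for this example.

```python
small = {'a', 'b'}
large = {'a', 'b', 'c'}

print(small < large)           # True: every element of small is in large, and the sets differ
print(large < large)           # False: a set is not a proper subset of itself
print(large <= large)          # True: <= also accepts equal sets
print(small.issubset(large))   # True: method equivalent of <=
```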

In [100]:
# operators can be chained as well
a & b & c


Out[100]:
{'a'}

In [101]:
# the same operations can be performed using the equivalent methods
a.intersection(b).intersection(c)


Out[101]:
{'a'}

In [102]:
# a more complex operation
a.intersection(b).difference(c)


Out[102]:
{'b'}

Dictionaries

  • A dictionary is a collection of items (a mapping).
  • Each item is a pair made of a key and a value.
  • Dictionaries are unordered in the Python version used here (since Python 3.7 they preserve insertion order).
  • You can access the keys or the values independently.

In [104]:
d = {} # an empty dictionary

In [105]:
d = {'first':1, 'second':2} # initialise a dictionary

In [106]:
# access to value given a key:
d['first']


Out[106]:
1

In [107]:
# add a new pair of key/value:
d['third'] = 3

In [108]:
# what are the keys ?
d.keys()


Out[108]:
dict_keys(['second', 'first', 'third'])

In [109]:
# what are the values ?
d.values()


Out[109]:
dict_values([2, 1, 3])

In [110]:
# what are the key/values pairs?
d.items()


Out[110]:
dict_items([('second', 2), ('first', 1), ('third', 3)])

In [111]:
# can be used in a for loop as well
for key, value in d.items():
    print(key, value)


second 2
first 1
third 3

In [112]:
# Delete a key (and its value)
del d['third']
d


Out[112]:
{'first': 1, 'second': 2}

In [113]:
# naive for loop approach:
for key in d.keys():
    print(key, d[key])


second 2
first 1

In [114]:
# no need to call the "keys" method explicitly
for key in d:
    print(key, d[key])


second 2
first 1

In [115]:
# careful not to look for keys that are NOT in the dictionary
d['fourth']


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-115-b5107ecd4c8a> in <module>()
      1 # careful not to look for keys that are NOT in the dictionary
----> 2 d['fourth']

KeyError: 'fourth'

In [116]:
# the "get" method allows a safe retrieval of a key
d.get('fourth')

In [117]:
# the "get" method returns None if the key is not present
# a different value can be specified in case of a missed key
d.get('fourth', 4)


Out[117]:
4
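Another way to avoid a KeyError is to test whether a key is present with the in operator before accessing it. A minimal sketch, using a made-up dictionary named counts:

```python
counts = {'first': 1, 'second': 2}

print('first' in counts)     # True
print('fourth' in counts)    # False

# "in" tests the keys, not the values
print(1 in counts)           # False

# typical pattern: update a value if the key exists, otherwise create it
if 'fourth' in counts:
    counts['fourth'] += 1
else:
    counts['fourth'] = 1
print(counts['fourth'])
```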

Note on the "None" type


In [118]:
n = None
n

In [119]:
print(n)


None

In [120]:
None + 1


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-120-365f14229875> in <module>()
----> 1 None + 1

TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

In [121]:
# None is falsy in a boolean test, but it is best tested explicitly with "is"
if n is None:
    print(1)
else:
    print(0)


1

In [122]:
# we can explicitly test for a variable being "None"
value = d.get('fourth')
if value is None:
    print('Key not found!')


Key not found!

List, set and dictionary comprehensions: more compact constructors

There is a more concise and advanced way to create a list, set or dictionary:


In [123]:
range(10)


Out[123]:
range(0, 10)

In [124]:
my_list = [x*2 for x in range(10)]
my_list


Out[124]:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [126]:
redundant_list = [x for x in my_list*2]
redundant_list


Out[126]:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [127]:
my_set = {x for x in my_list*2}
my_set


Out[127]:
{0, 2, 4, 6, 8, 10, 12, 14, 16, 18}

In [128]:
my_dict = {x:x+1 for x in my_list}
my_dict


Out[128]:
{0: 1, 2: 3, 4: 5, 6: 7, 8: 9, 10: 11, 12: 13, 14: 15, 16: 17, 18: 19}

In [129]:
# an if clause can be used in a comprehension to filter elements
even_numbers = [x for x in range(10) if not x%2]
even_numbers


Out[129]:
[0, 2, 4, 6, 8]

In [130]:
# if/else can also be used to assign values based on a test
parity = ['odd' if x%2 else 'even' for x in range(10)]
parity


Out[130]:
['even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd']

On objects and references

A common source of errors for python beginners


In [131]:
a = [1, 2, 3]
a


Out[131]:
[1, 2, 3]

In [132]:
b = a  # b is a reference to the same list, not a copy
b[0] = 10
print(a)


[10, 2, 3]

In [133]:
# How to de-reference (copy) a list
a = [1, 2, 3]

# First, use the list() function
b1 = list(a)

# Second use the slice operator
b2 = a[:]  # using slicing
b1[0] = 10
b2[0] = 10
a #unchanged


Out[133]:
[1, 2, 3]

In [134]:
# What about this ?
a = [1,2,3,[4,5,6]]
# copying the object
b = a[:]
b[3][0] = 10 # let us change the first item of the 4th item (4 to 10)
a


Out[134]:
[1, 2, 3, [10, 5, 6]]

Here we see that there is still a reference: slicing (like list()) performs a shallow copy, so the nested list is shared between a and b. To copy nested objects as well, use deepcopy from the copy module.


In [135]:
from copy import deepcopy
a = [1,2,3,[4,5,6]]
b = deepcopy(a)
b[3][0] = 10 # let us change the first item of the 4th item (4 to 10)
a


Out[135]:
[1, 2, 3, [4, 5, 6]]
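The is operator shows whether two names refer to the same object, which makes the difference between a reference, a shallow copy and a deep copy explicit. A sketch, with names chosen for this example:

```python
from copy import deepcopy

original = [1, 2, 3, [4, 5, 6]]
reference = original            # same object
shallow = original[:]           # new outer list, shared inner list
deep = deepcopy(original)       # everything is copied

print(reference is original)        # True
print(shallow is original)          # False
print(shallow[3] is original[3])    # True: the nested list is shared
print(deep[3] is original[3])       # False: the nested list was copied
```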

Flow control operators


In [136]:
# if/elif/else
animals = ['dog', 'cat', 'cow']
if 'cats' in animals:
    print('Cats found!')
elif 'cat' in animals:
    print('Only one cat found!')
else:
    print('Nothing found!')


Only one cat found!

In [137]:
# for loop
foods = ['pasta', 'rice', 'lasagna']
for food in foods:
    print(food)


pasta
rice
lasagna

In [138]:
# nested for loops
foods = ['pasta', 'rice', 'lasagna']
desserts = ['cake', 'biscuit']
for food in foods:
    for dessert in desserts:
        print(food, dessert)


pasta cake
pasta biscuit
rice cake
rice biscuit
lasagna cake
lasagna biscuit

In [139]:
# for loop glueing together two lists
for animal, food in zip(animals, foods):
    print('The {0} is eating {1}'.format(animal, food))


The dog is eating pasta
The cat is eating rice
The cow is eating lasagna

In [140]:
animals


Out[140]:
['dog', 'cat', 'cow']

In [141]:
# "zip" will glue together lists only until the shortest one 
foods = ['pasta', 'rice']
for animal, food in zip(animals, foods):
    print('The {0} is eating {1}'.format(animal, food))


The dog is eating pasta
The cat is eating rice

In [142]:
# while loop
counter = 0
while counter < 10:
    counter += 1
counter


Out[142]:
10

In [143]:
# A normal loop
for value in range(10):
    print(value)


0
1
2
3
4
5
6
7
8
9

In [144]:
# continue: skip the rest of the loop body and go to the next element
for value in range(10):
    if not value%3:
        continue
    print(value)


1
2
4
5
7
8

In [145]:
# break: exit the "for" loop (0%3 is 0, so the loop stops at the first element and nothing is printed)
for value in range(10):
    if not value%3:
        break
    print(value)

In [146]:
# break: exit the "for" loop
for value in range(1, 10):
    if not value%3:
        break
    print(value)


1
2

In [147]:
# continue only skips to the next iteration of the innermost loop
foods = ['pasta', 'rice', 'lasagna']
desserts = ['cake', 'biscuit']
for food in foods:
    for dessert in desserts:
        if food == 'rice':
            continue
        print(food, dessert)


pasta cake
pasta biscuit
lasagna cake
lasagna biscuit

In [148]:
# break only exits from the innermost loop
foods = ['pasta', 'rice', 'lasagna']
desserts = ['cake', 'biscuit']
for food in foods:
    for dessert in desserts:
        if dessert == 'biscuit':
            break
        print(food, dessert)


pasta cake
rice cake
lasagna cake

Exceptions

Exceptions are used to handle unexpected errors and avoid crashes.


In [149]:
d = {'first': 1,
     'second': 2}
d['third']


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-149-e48bcf49137b> in <module>()
      1 d = {'first': 1,
      2      'second': 2}
----> 3 d['third']

KeyError: 'third'

In [150]:
# Exceptions can be intercepted and crashes can be avoided
try:
    d['third']
except:
    print('Key not present')


Key not present

In [151]:
# Specific exceptions can be intercepted
try:
    d['third']
except KeyError:
    print('Key not present')
except:
    print('Another error occurred')


Key not present

In [152]:
# Any other exception is handled by the generic "except" clause
try:
    d['second'].non_existent_method()
except KeyError:
    print('Key not present')
except:
    print('Another error occurred')


Another error occurred

In [153]:
# The exception can be assigned to a variable to inspect it
try:
    d['second'].non_existent_method()
except KeyError:
    print('Key not present')
except Exception as e:
    print('Another error occurred: {0}'.format(e))


Another error occurred: 'int' object has no attribute 'non_existent_method'

In [154]:
# Exception can be created and "raised" by the user
if d['second'] == 2:
    raise Exception('I don\'t like 2 as a number')


---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-154-3bfde4ce523d> in <module>()
      1 # Exception can be created and "raised" by the user
      2 if d['second'] == 2:
----> 3     raise Exception('I don\'t like 2 as a number')

Exception: I don't like 2 as a number
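In practice it is usually more informative to raise one of the built-in exception types, such as ValueError, than a plain Exception, so that the caller can intercept it specifically. A minimal sketch; the function check_positive is hypothetical:

```python
def check_positive(number):
    # Raise a specific built-in exception when the input is not acceptable
    if number <= 0:
        raise ValueError('Expected a positive number, got {0}'.format(number))
    return number

try:
    check_positive(-2)
except ValueError as e:
    # The message passed to the exception is available through str(e)
    message = str(e)
    print('Invalid input: {0}'.format(message))
```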

Functions

Functions allow code to be re-used in a flexible way.


In [155]:
def sum_numbers(first, second):
    return first + second

In [156]:
sum_numbers(1, 2)


Out[156]:
3

In [157]:
sum_numbers('one', 'two')


Out[157]:
'onetwo'

In [158]:
sum_numbers('one', 2)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-158-f4c3468a59df> in <module>()
----> 1 sum_numbers('one', 2)

<ipython-input-155-f60a4a054510> in sum_numbers(first, second)
      1 def sum_numbers(first, second):
----> 2     return first + second

TypeError: Can't convert 'int' object to str implicitly

In [159]:
# positional vs. keyword arguments
def print_variables(first, second):
    print('First variable: {0}'.format(first))
    print('Second variable: {0}'.format(second))

In [160]:
print_variables(1, 2)


First variable: 1
Second variable: 2

In [161]:
print_variables()


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-161-c5ae5e6c57c2> in <module>()
----> 1 print_variables()

TypeError: print_variables() missing 2 required positional arguments: 'first' and 'second'

In [162]:
print_variables(1)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-162-16c4637c0fbb> in <module>()
----> 1 print_variables(1)

TypeError: print_variables() missing 1 required positional argument: 'second'

In [163]:
# keyword arguments with default values
def print_variables(first=None, second=None):
    print('First variable: {0}'.format(first))
    print('Second variable: {0}'.format(second))

In [164]:
print_variables()


First variable: None
Second variable: None

In [165]:
print_variables(second=2)


First variable: None
Second variable: 2
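Positional and keyword arguments can be combined in a single call, as long as the positional ones come first. A sketch with a hypothetical function describe that returns a string instead of printing:

```python
def describe(first, second=None):
    # "first" is required, "second" is optional
    return 'first={0}, second={1}'.format(first, second)

print(describe(1))                    # only the required argument
print(describe(1, 2))                 # both passed by position
print(describe(1, second=2))          # positional, then keyword
print(describe(second=2, first=1))    # keywords can be given in any order
```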

Simple file reading/writing

Writing


In [189]:
mysequence = """>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA
"""

In [185]:
# First, we open a file in "w" write mode
fh = open("mysequence.fasta", "w")
# Second, we write the data into the file:
fh.write(mysequence)
# Third, we close:
fh.close()

Reading


In [186]:
# First, we open the file in read mode (r)
fh = open('mysequence.fasta', 'r')
# Second, we read the content of the file
data = fh.read()
# Third we close
fh.close()
# data is now a string that contains the content of the file being read
data


Out[186]:
'>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2\nMNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL\nKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP\nSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ\nVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL\nLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS\nNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED\nVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA\nPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL\nAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK\nLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN\nASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW\nMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA\nNWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA\nHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP\nSAAQDMVERVKELGHSTQQFRRVLGQLAAA\n'

In [187]:
print(data)


>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA

For both writing and reading, you can use the "with" statement (a context manager), which automatically closes the file when the block ends, even if an exception is raised.

Writing


In [190]:
# First, we open a file in "w" write mode with the context manager
with open("mysequence.fasta", "w") as fh:
    # Second, we write the data into the file:
    fh.write(mysequence)
# On leaving the block, the file is closed automatically, even if an error occurred
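
To see what the context manager saves us from writing, here is a rough sketch of the equivalent code with an explicit try/finally block. The file name demo.fasta and the short record are made up for illustration, and the file is written to a temporary directory so that nothing from the other examples is overwritten.

```python
import os
import tempfile

# Hypothetical short record, used only for this illustration
record = ">demo\nMNHLNV\n"
path = os.path.join(tempfile.mkdtemp(), "demo.fasta")

# Roughly what "with open(path, 'w') as fh:" does for us
fh = open(path, "w")
try:
    fh.write(record)
finally:
    # executed whether or not write() raised an exception
    fh.close()

print(fh.closed)   # True
```

The "with" form is shorter and makes it harder to forget the close() call.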

Reading


In [191]:
# First, we open the file in read mode (r) with the context manager
with open('mysequence.fasta', 'r') as fh:
    # Second, we read the content of the file
    data = fh.read()
# On leaving the block, the file is closed automatically, even if an error occurred

Notice the \n character (newline) in the string...


In [192]:
data


Out[192]:
'>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2\nMNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL\nKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP\nSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ\nVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL\nLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS\nNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED\nVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA\nPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL\nAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK\nLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN\nASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW\nMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA\nNWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA\nHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP\nSAAQDMVERVKELGHSTQQFRRVLGQLAAA\n'

In [193]:
data.split("\n")


Out[193]:
['>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2',
 'MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL',
 'KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP',
 'SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ',
 'VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL',
 'LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS',
 'NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED',
 'VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA',
 'PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL',
 'AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK',
 'LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN',
 'ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW',
 'MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA',
 'NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA',
 'HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP',
 'SAAQDMVERVKELGHSTQQFRRVLGQLAAA',
 '']
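
Note the empty string at the end of the list: it appears because the text ends with a newline. As a small sketch on a made-up record (not the BCAR1 file), the string method splitlines() splits on line boundaries without producing that trailing empty item:

```python
# Hypothetical short record, used only for this illustration
text = ">demo\nMNHLNV\nLAKALY\n"

print(text.split("\n"))    # ['>demo', 'MNHLNV', 'LAKALY', '']
print(text.splitlines())   # ['>demo', 'MNHLNV', 'LAKALY']
```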

In [194]:
data.split("\n", 1)


Out[194]:
['>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2',
 'MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL\nKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP\nSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ\nVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL\nLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS\nNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED\nVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA\nPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL\nAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK\nLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN\nASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW\nMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA\nNWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA\nHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP\nSAAQDMVERVKELGHSTQQFRRVLGQLAAA\n']

In [195]:
header, sequence = data.split("\n", 1)

In [196]:
header


Out[196]:
'>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2'

In [197]:
sequence


Out[197]:
'MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL\nKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP\nSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ\nVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL\nLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS\nNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED\nVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA\nPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL\nAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK\nLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN\nASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW\nMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA\nNWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA\nHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP\nSAAQDMVERVKELGHSTQQFRRVLGQLAAA\n'

In [198]:
# we want to get rid of the \n characters
seq1 = sequence.replace("\n","")

In [199]:
# another way is to use the split/join pair
seq2 = "".join(sequence.split("\n"))

In [200]:
seq1 == seq2


Out[200]:
True

In [201]:
# make sure that every letter is upper case
seq1 = seq1.upper()

In [202]:
# With the sequence, we can now play around 
seq1.count('A')


Out[202]:
88

In [203]:
counter = {}
counter['A'] = seq1.count('A')
counter['T'] = seq1.count('T')
counter['C'] = seq1.count('C')
counter['G'] = seq1.count('G')
counter


Out[203]:
{'A': 88, 'C': 4, 'G': 67, 'T': 44}
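
The letter-by-letter counting above can also be written with collections.Counter from the standard library, which counts every character in one pass. This is only a sketch on a made-up short sequence (the name seq is an assumption for this example), not the sequence read from the file.

```python
from collections import Counter

# Hypothetical short sequence, used only for this illustration
seq = "MNHLNVLAKALYDNVAES"

# Counter builds a dictionary-like object: letter -> number of occurrences
counts = Counter(seq)

print(counts["A"])   # 3
print(counts["Z"])   # 0 (a missing key gives 0 instead of raising KeyError)

# Keep only the letters we are interested in
subset = {letter: counts[letter] for letter in "ATCG"}
print(subset)        # {'A': 3, 'T': 0, 'C': 0, 'G': 0}
```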

If a file is very large, reading it in one go with the "read" method could fill the memory. In that case, it is advisable to iterate over the file line by line with a "for" loop.


In [204]:
with open('mysequence.fasta') as fh:
    for line in fh:
        # remove the newline character at the end of the line
        # also removes spaces and tabs at the right end of the string
        line = line.rstrip()
        print(line)


>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2
MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRL
KILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTP
SKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQ
VPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHL
LAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPS
NHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAED
VYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREA
PAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDL
AGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAK
LSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGN
ASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGW
MEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLA
NWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVA
HSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSP
SAAQDMVERVKELGHSTQQFRRVLGQLAAA

In [205]:
header = ''
sequence = ''
with open('mysequence.fasta') as fh:
    for line in fh:
        # remove the newline character at the end of the line
        # also removes spaces and tabs at the right end of the string
        line = line.rstrip()
        if line.startswith('>'):
            # the header line starts with ">"
            header = line
        else:
            # any other line is a piece of the sequence
            sequence += line

In [206]:
header


Out[206]:
'>sp|P56945|BCAR1_HUMAN Breast cancer anti-estrogen resistance protein 1 OS=Homo sapiens GN=BCAR1 PE=1 SV=2'

In [207]:
sequence


Out[207]:
'MNHLNVLAKALYDNVAESPDELSFRKGDIMTVLEQDTQGLDGWWLCSLHGRQGIVPGNRLKILVGMYDKKPAGPGPGPPATPAQPQPGLHAPAPPASQYTPMLPNTYQPQPDSVYLVPTPSKAQQGLYQVPGPSPQFQSPPAKQTSTFSKQTPHHPFPSPATDLYQVPPGPGGPAQDIYQVPPSAGMGHDIYQVPPSMDTRSWEGTKPPAKVVVPTRVGQGYVYEAAQPEQDEYDIPRHLLAPGPQDIYDVPPVRGLLPSQYGQEVYDTPPMAVKGPNGRDPLLEVYDVPPSVEKGLPPSNHHAVYDVPPSVSKDVPDGPLLREETYDVPPAFAKAKPFDPARTPLVLAAPPPDSPPAEDVYDVPPPAPDLYDVPPGLRRPGPGTLYDVPRERVLPPEVADGGVVDSGVYAVPPPAEREAPAEGKRLSASSTGSTRSSQSASSLEVAGPGREPLELEVAVEALARLQQGVSATVAHLLDLAGSAGATGSWRSPSEPQEPLVQDLQAAVAAVQSAVHELLEFARSAVGNAAHTSDRALHAKLSRQLQKMEDVHQTLVAHGQALDAGRGGSGATLEDLDRLVACSRAVPEDAKQLASFLHGNASLLFRRTKATAPGPEGGGTLHPNPTDKTSSIQSRPLPSPPKFTSQDSPDGQYENSEGGWMEDYDYVHLQGKEEFEKTQKELLEKGSITRQGKSQLELQQLKQFERLEQEVSRPIDHDLANWTPAQPLAPGRTGGLGPSDRQLLLFYLEQCEANLTTLTNAVDAFFTAVATNQPPKIFVAHSKFVILSAHKLVFIGDTLSRQAKAADVRSQVTHYSNLLCDLLRGIVATTKAAALQYPSPSAAQDMVERVKELGHSTQQFRRVLGQLAAA'
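
The line-by-line parsing above can be wrapped in a small function so that it can be reused. The sketch below is one possible way to do it, assuming the file holds a single record; the function name read_single_fasta, the file demo.fasta and its content are made up for this example. The pieces of the sequence are collected in a list and joined once at the end.

```python
import os
import tempfile

def read_single_fasta(path):
    """Return (header, sequence) for a FASTA file holding one record."""
    header = ""
    chunks = []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                header = line
            elif line:
                # skip empty lines, keep the sequence pieces
                chunks.append(line)
    return header, "".join(chunks)

# Hypothetical small file, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.fasta")
with open(path, "w") as fh:
    fh.write(">demo\nMNHLNV\nLAKALY\n")

demo_header, demo_sequence = read_single_fasta(path)
print(demo_header)     # >demo
print(demo_sequence)   # MNHLNVLAKALY
```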