As well as the basic data types we introduced above, you will very commonly want to store and operate on collections of values, and Python has several data structures for this. The general idea is that you can place several items into a single collection and then refer to that collection as a whole. Which one you use depends on the problem you are trying to solve.
A tuple is created by using round brackets around the items it contains, with commas separating the individual elements.
In [ ]:
a = (123, 54, 92) # tuple of 3 integers
b = () # empty tuple
c = ("Ala",) # tuple of a single string (note the trailing ",")
d = (2, 3, False, "Arg", None) # a tuple of mixed types
print(a)
print(b)
print(c)
print(d)
You can of course use variables in tuples and other data structures:
In [ ]:
x = 1.2
y = -0.3
z = 0.9
t = (x, y, z)
print(t)
Tuples can be packed and unpacked with a convenient syntax. The number of variables used to unpack the tuple must match the number of elements in the tuple.
In [ ]:
t = 2, 3, 4 # tuple packing
print('t is', t)
x, y, z = t # tuple unpacking
print('x is', x)
print('y is', y)
print('z is', z)
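As a small illustration of the matching rule above, the sketch below tries to unpack a three-element tuple into two variables. The `try`/`except` construct and the `unpack_ok` variable are only there so the cell runs to completion and are not part of the original material; without them Python would stop with a `ValueError`.

```python
t = 2, 3, 4  # a tuple of three elements

# Unpacking into the wrong number of variables raises a ValueError
try:
    x, y = t
    unpack_ok = True
except ValueError as error:
    unpack_ok = False
    print("Unpacking failed:", error)

print("unpack_ok is", unpack_ok)
```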
In [ ]:
a = [1, 3, 9]
b = ["ATG"]
c = []
print(a)
print(b)
print(c)
Lists and tuples can contain other lists and tuples, or any other type of collection:
In [ ]:
matrix = [[1, 0], [0, 2]]
print(matrix)
You can convert between tuples and lists with the tuple and list functions. Note that these create a new collection with the same items, and leave the original unaffected.
In [ ]:
a = (1, 4, 9, 16) # A tuple of numbers
b = ['G','C','A','T'] # A list of characters
print(a)
print(b)
l = list(a) # Make a list based on a tuple
print(l)
t = tuple(b) # Make a tuple based on a list
print(t)
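To see that the conversion really produces an independent collection, a minimal sketch is to change the new list and then look at the original tuple again:

```python
a = (1, 4, 9, 16)  # the original tuple
l = list(a)        # a new list holding the same items
l.append(25)       # changing the list...

print(a)           # ...leaves the tuple as it was: (1, 4, 9, 16)
print(l)           # [1, 4, 9, 16, 25]
```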
In [ ]:
t = (123, 54, 92, 87, 33)
x = [123, 54, 92, 87, 33]
print('t is', t)
print('t[0] is', t[0])
print('t[2] is', t[2])
print('x is', x)
print('x[-1] is', x[-1])
In [ ]:
t = (123, 54, 92, 87, 33)
x = [123, 54, 92, 87, 33]
print('t[1:3] is', t[1:3])
print('x[2:] is', x[2:])
print('x[:-1] is', x[:-1])
In [ ]:
t = (123, 54, 92, 87, 33)
x = [123, 54, 92, 87, 33]
print('123 in', x, 123 in x)
print('234 in', t, 234 in t)
print('999 not in', x, 999 not in x)
In [ ]:
t = (123, 54, 92, 87, 33)
x = [123, 54, 92, 87, 33]
print("length of t is", len(t))
print("number of 33s in x is", x.count(33))
In [ ]:
x = [123, 54, 92, 87, 33]
print(x)
x[2] = 33
print(x)
Tuples cannot be altered once they have been created. If you try to do so, you'll get an error.
In [ ]:
t = (123, 54, 92, 87, 33)
print(t)
t[1] = 4 # raises a TypeError: tuples do not support item assignment
You can add elements to the end of a list with append()
In [ ]:
x = [123, 54, 92, 87, 33]
x.append(101)
print(x)
or insert values at a certain position with insert(), by supplying the desired position as well as the new value
In [ ]:
x = [123, 54, 92, 87, 33]
x.insert(3, 1111)
print(x)
You can remove values with remove()
In [ ]:
x = [123, 54, 92, 87, 33]
x.remove(123)
print(x)
and delete values by index with del
In [ ]:
x = [123, 54, 92, 87, 33]
print(x)
del x[0]
print(x)
It's often useful to be able to combine lists, which can be done with extend(). Using append() instead would add the whole second list as a single element of the first.
In [ ]:
a = [1,2,3]
b = [4,5,6]
a.extend(b)
print(a)
a.append(b)
print(a)
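The plus symbol + concatenates two lists. Assigning the result back to the first list gives the same contents as extend(), although + builds a new list rather than changing the existing one: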
In [ ]:
a = [1, 2, 3]
b = [4, 5, 6]
a = a + b
print(a)
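One practical difference between the two approaches can be sketched as follows: `+` builds a new list and leaves both operands alone, whereas `extend()` changes the list it is called on. The variable name `c` is introduced here purely for illustration.

```python
a = [1, 2, 3]
b = [4, 5, 6]

c = a + b      # a new list; a and b are unchanged
print(a)       # [1, 2, 3]
print(c)       # [1, 2, 3, 4, 5, 6]

a.extend(b)    # modifies a in place
print(a)       # [1, 2, 3, 4, 5, 6]
```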
Slice syntax can be used on the left-hand side of an assignment to replace a subregion of a list. The replacement does not need to be the same length as the slice it replaces:
In [ ]:
a = [1, 2, 3, 4, 5, 6]
a[1:3] = [9, 9, 9, 9]
print(a)
You can change the order of elements in a list
In [ ]:
a = [1, 3, 5, 4, 2]
a.reverse()
print(a)
a.sort()
print(a)
Note that both of these change the list itself. If you want a sorted copy of the list while leaving the original untouched, use sorted().
In [ ]:
a = [2, 5, 7, 1]
b = sorted(a)
print(a)
print(b)
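A common slip that follows from this is assigning the result of `sort()` to a variable. The sketch below shows what happens: `sort()` rearranges the list in place and returns `None`. It also uses the `reverse=True` argument of `sorted()`, which is not covered above, to get a descending copy.

```python
a = [2, 5, 7, 1]

result = a.sort()            # sorts a in place and returns None
print(result)                # None
print(a)                     # [1, 2, 5, 7]

d = sorted(a, reverse=True)  # a new list in descending order
print(d)                     # [7, 5, 2, 1]
print(a)                     # still [1, 2, 5, 7]
```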
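The most complete information is online at https://www.python.org/, which should be used as a reference guide. Within a notebook you can also use the built-in help() function to display the documentation of a function or method.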
In [ ]:
help(len)
In [ ]:
help(list)
In [ ]:
help(list.insert)
In [ ]:
help(list.count)
In [ ]:
text = "ATGTCATTTGT"
print(text[0])
print(text[-2])
print(text[0:6])
print("ATG" in text)
print("TGA" in text)
print(len(text))
Just as with tuples, trying to assign a value to an element or slice of a string results in an error.
In [ ]:
text = "ATGTCATTTGT"
text[0:2] = "CCC" # raises a TypeError: strings do not support item assignment
Python provides a number of useful methods that let you manipulate strings.
The in operator lets you check if a substring is contained within a larger string, but it does not tell you where the substring is located. This is often useful to know, and Python provides the .find() method, which returns the index of the first occurrence of the search string, and the .rfind() method, which searches from the end of the string and returns the index of the last occurrence.
If the search string is not found in the string both these methods return -1.
In [ ]:
dna = "ATGTCACCGTTT"
index = dna.find("TCA")
print("TCA is at position:", index)
index = dna.rfind('C')
print("The last Cytosine is at position:", index)
print("Position of a stop codon:", dna.find("TGA"))
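Because -1 is itself a valid index (it refers to the last character), it is worth checking the result of `.find()` before using it to index the string. A minimal sketch of the pitfall, reusing the sequence from the cell above:

```python
dna = "ATGTCACCGTTT"

index = dna.find("TGA")
print(index)          # -1, because "TGA" is not in the sequence
print("TGA" in dna)   # False

# Using the result as an index without checking silently
# returns the last character instead of raising an error
print(dna[index])     # T
```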
When we are reading text from files (which we will see later on), often there is unwanted whitespace at the start or end of the string. We can remove leading whitespace with the .lstrip() method, trailing whitespace with .rstrip(), and whitespace from both ends with .strip().
All of these methods return a copy of the changed string, so if you want to replace the original you can assign the result of the method call to the original variable.
In [ ]:
s = " Chromosome Start End "
print(len(s), s)
s = s.lstrip()
print(len(s), s)
s = s.rstrip()
print(len(s), s)
s = " Chromosome Start End "
s = s.strip()
print(len(s), s)
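The point about copies can be checked directly. In this sketch the result is stored under a new, illustrative name (`stripped`) so that both strings can be compared; `repr()` is used only to make the surrounding whitespace visible in the output.

```python
s = "  Chromosome Start End  "
stripped = s.strip()   # returns a new string

print(repr(s))         # the original still has its whitespace
print(repr(stripped))  # 'Chromosome Start End'
```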
You can split a string into a list of substrings using the .split() method, supplying the delimiter as an argument to the method. If you don't supply any delimiter, the method splits the string on whitespace by default (which is very often what you want!).
To split a string into its component characters you can simply convert the string to a list.
In [ ]:
seq = "ATG TCA CCG GGC"
codons = seq.split(" ")
print(codons)
bases = list(seq) # a string converted into a list of its characters (spaces included)
print(bases)
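The difference between the default behaviour and an explicit `" "` delimiter shows up when fields are separated by more than one space or by tabs. The example string and variable names below are made up for illustration.

```python
line = "ATG  TCA\tCCG GGC"  # two spaces, then a tab, then one space

by_whitespace = line.split()   # any run of whitespace is one separator
by_space = line.split(" ")     # every single space is a separator

print(by_whitespace)  # ['ATG', 'TCA', 'CCG', 'GGC']
print(by_space)       # ['ATG', '', 'TCA\tCCG', 'GGC']
```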
The counterpart to .split() is the .join() method, which joins the elements of a list into a single string. It is called on the separator string, and it only works if all the elements of the list are strings:
In [ ]:
seq = "ATG TCA CCG GGC"
codons = seq.split(" ")
print(codons)
print("|".join(codons))
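If the list holds values that are not strings, one possible approach is to convert each element with `str()` before joining. The sketch below uses a list comprehension, a construct not introduced in this section, and the `positions` list is made up for illustration.

```python
positions = [12, 48, 105]  # integers, not strings

# ",".join(positions) would raise a TypeError,
# so convert every element to a string first
as_text = [str(p) for p in positions]
joined = ",".join(as_text)

print(joined)  # 12,48,105
```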
We also saw earlier that the + operator lets you concatenate strings together into a larger string.
Note that this operator only works on values of the same type. If you want to concatenate a string with an integer (or some other type), you first have to convert the integer to a string with the str() function.
In [ ]:
s = "chr"
chrom_number = 2
print(s + str(chrom_number))
To get more information about the two methods split() and join(), we could look online in the Python documentation, starting from www.python.org, or get help using the built-in help() function.
In [ ]:
help(str.split)
help(str.join)
Go to our next notebook: python_basic_1_4