Week 2 - Printing and manipulating text

We started the first week by printing Hello world (you can try it below). This taught us a number of things: strings, functions, and statements. As biologists, one of the primary entities we deal with is the string, in the form of sequences, whether they be DNA, RNA, or amino acid sequences.


In [ ]:
print("Hello world")

To Python, a sequence is a string. A string of what? A string of characters. A string is an immutable (unchangeable) sequence of characters arranged in a specific order. What is a character? A character is a letter, number, or punctuation mark: anything that can be typed on a keyboard.

Thus "Hello world" is the string that we passed to the print function in our Hello world program.

"Hello world"

What did we do with that string?

We "printed", or wrote, the string to the terminal. The Python command print is a function: a collection of Python source code that has been written to perform a particular action. We call (use) a function by typing its name followed by an opening and a closing parenthesis. Function calls always have parentheses! In the case of the print function, the value to print (here a string) goes between the parentheses as an argument.
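As a small illustration of why the parentheses matter, the sketch below contrasts calling print with merely naming it. The variable names greeting, same_function and is_callable are made up for this example.

```python
greeting = "Hello world"

# With parentheses, the function is called and the string is written out.
print(greeting)

# Without parentheses, nothing is printed: the name only refers to the
# function object, which we can store in another variable.
same_function = print
is_callable = callable(same_function)
print(is_callable)
```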

Getting help

NOTE: Would you like to get more information on print? Type

print?
or
help(print)

In [3]:
help(print)


Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.
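The optional keyword arguments listed in the help text can be tried out directly. The following is a minimal sketch, assuming we want to look at exactly what print writes; the names buffer and captured are invented for the example, and io.StringIO is used as the file-like object.

```python
import io

# Collect the printed text in an in-memory, file-like object
# so that we can inspect exactly what print wrote.
buffer = io.StringIO()

# sep replaces the default space between values;
# end replaces the default newline after the last value.
print("atg", "tag", sep="-", end="!\n", file=buffer)

captured = buffer.getvalue()
print(captured)
```

Leaving out file=buffer sends the same text to the screen instead.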

Single or double quotes?

As we can see from the code below, python does not care if we use single or double quotes around our text.


In [4]:
"Hello world" == 'Hello world'


Out[4]:
True

In [5]:
print("Hello world")


Hello world

In [6]:
print('Hello world')


Hello world

In [8]:
print("Hello world") # What happens when you try to run this cell?


Hello world

Comments, or what is that text at the end of the statement above?

Comments are a way to include text in your code that is ignored by the Python interpreter. Comments help readers understand your code without interfering with its execution or logic.

Comments are preceded by a pound sign "#", followed by a space by convention.

# This is a comment

Advanced: Docstrings

Splitting a statement over two lines

Sometimes, in fact often, a Python statement will be longer than one line on your screen. Good Python practice says that a line of code should be no longer than about 80 characters (the PEP 8 style guide sets the limit at 79). If a line is longer than that, you can wrap the statement and add a backslash "\" at the end of each line that is continued.

print("This is a long python statement that needs to be wrapped.")

print("This is a long python statement \
that needs to be wrapped.")

Try typing each, or something similar, below to test it out.


In [ ]:
print("This is a long python statement that needs to be wrapped.")

In [ ]:
print("hello") # this is a test of 
               # carry over
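Besides the backslash, a statement can also continue across lines inside parentheses, and string literals written next to each other are joined into one string. The sketch below shows this alternative; the variable name long_message is chosen only for illustration.

```python
# Inside parentheses a statement may span several lines without a backslash.
# Adjacent string literals are joined into a single string.
long_message = (
    "This is a long python statement "
    "that needs to be wrapped."
)
print(long_message)
```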

Special characters

The backslash, also called the escape character, enables us to use invisible or special characters in our Python statements.

Print a new line character and python will go to the next line. Like this:

print("Hello world\n!")

To see what special characters are available, see this tutorial page.

What happens when you try it below?


In [10]:
# Please type your code here:

print("Hello world\nThis is the date\nMy name\nreport title")


Hello world
This is the date
My name
report title
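A few other common escape sequences behave the same way as the newline. This is a short sketch with made-up variable names (tabbed, two_lines, with_backslash) showing the tab, the newline, and a literal backslash.

```python
# \t inserts a tab, \n starts a new line, and \\ produces one backslash.
tabbed = "name\tsequence"
two_lines = "line 1\nline 2"
with_backslash = "one backslash: \\"

print(tabbed)
print(two_lines)
print(with_backslash)
```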

Combining strings

Strings can be combined using the plus operator. We know what one plus one is and so does Python: 1 + 1 equals 2. The plus operator also works for strings. Try this below:

print("Hello" + "World!")

What is the result? Is there anything wrong with it? If so, how do you fix it?


In [ ]:
# Please type your code here:

Variables for strings

Thus far we have been working directly with strings, or text. We can create a variable to store the text that we want.

message = "Hello world!"

We can then use that variable in our print statement:

print(message)

What happens when you run the statement above?


In [22]:
# Please type your code here:

message = "Hello world!"
print(message)


Hello world!

Variables as objects

In the cell above, the word "message" is a variable. It holds the string "Hello world!". From the Python perspective, every value a variable refers to is an object. In fact, everything in Python is an object. What is an object? An object is a value made from a type, and the type acts like a template or cookie cutter that gives the object certain characteristics. Strings, integers and floating point numbers are examples of types. String objects have properties and methods (built-in functions that belong to the object). We will look at a number of methods below.

variable_name_1 = "A value"
variable_name_2 = 10

Check and see what type of object something is by using the built-in function "type":

type(message)

In [21]:
# Please type your code here:
variable_name_1 = "A test"
variable_name_1


Out[21]:
'A test'
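To see the type function at work on different kinds of objects, here is a minimal sketch. The variables residue_count and gc_fraction and their values are invented for the example.

```python
message = "Hello world!"
residue_count = 10
gc_fraction = 0.5

# type() reports which kind of object each variable refers to.
print(type(message))        # <class 'str'>
print(type(residue_count))  # <class 'int'>
print(type(gc_fraction))    # <class 'float'>
```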

Utilizing the python str (string) methods

To see all available methods (functions), please look at the String Methods section of the Python Standard Library documentation.


Operations available to all sequence objects:

in

To check whether a nucleotide (i.e. a character) or a subsequence is in our DNA sequence, use the in operator. The examples assume new_dna = 'atgtag', which is defined in a cell further below.

'a' in new_dna    # Returns True
'atg' in new_dna    # Returns True
'aaa' in new_dna    # Returns False

More generically, to check to see if a python object (character, string, number, etc) is in another python object (string, list, etc):

x in s          True if an item of s is equal to x, else False
x not in s      False if an item of s is equal to x, else True
s + t           the concatenation of s and t
s * n or n * s  equivalent to adding s to itself n times
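The four operations in the table can be tried on a short sequence. This is a small sketch; the variable names other than new_dna are made up for the example.

```python
new_dna = "atgtag"

has_start = "atg" in new_dna   # True: the substring occurs in the sequence
lacks_n = "n" not in new_dna   # True: there is no 'n' in the sequence
poly_a = "a" * 5               # 'aaaaa': the string repeated 5 times
with_tail = new_dna + poly_a   # 'atgtagaaaaa': the two strings joined

print(has_start, lacks_n, poly_a, with_tail)
```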

In [24]:
# Please type your code here:
message


Out[24]:
'Hello world!'

In [28]:
message.upper()


Out[28]:
'HELLO WORLD!'

In [29]:
new_dna = 'atgtag'

slice [ i : j : k ]

To slice a character or subsequence out of a sequence, use square brackets ("[", "]").

new_dna[0]

new_dna[0:3]

NOTE: The first number is INCLUSIVE (included), while the second number is EXCLUSIVE (not included).

Generically:

s[i]        ith item of s, origin 0
s[i:j]      slice of s from i to j
s[i:j:k]    slice of s from i to j with step k

In [32]:
# Please type your code here:
new_dna


Out[32]:
'atgtag'

In [33]:
new_dna[0]


Out[33]:
'a'

In [36]:
new_dna[::-1]


Out[36]:
'gatgta'
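The three slice forms can be compared side by side on the same sequence. The sketch below uses illustrative variable names and relies on the inclusive start and exclusive end described in the note above.

```python
new_dna = "atgtag"

first_codon = new_dna[0:3]    # 'atg': positions 0, 1 and 2
second_codon = new_dna[3:6]   # 'tag': positions 3, 4 and 5
last_base = new_dna[-1]       # 'g': negative indexes count from the end
every_other = new_dna[::2]    # 'aga': positions 0, 2 and 4

print(first_codon, second_codon, last_base, every_other)
```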

len

To get the length, or total count of residues, of our sequence, use the len function:

len(new_dna)    # length of new_dna

In [37]:
# Please type your code here:

len(new_dna)


Out[37]:
6

count

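To count the number of times a nucleotide occurs in our string, we use the count method. The count method is case sensitive: our sequence is lowercase, so counting "A" returns 0, while counting "a" returns 2.

new_dna.count('A')    # total number of occurrences of 'A' in new_dna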

In [39]:
# Please type your code here:

new_dna.count('A')


Out[39]:
0
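The result of 0 above comes from the difference between uppercase and lowercase letters. Here is a minimal sketch of ways to handle that; the variable names are chosen for illustration only.

```python
new_dna = "atgtag"

upper_hits = new_dna.count("A")             # 0: there is no uppercase 'A'
lower_hits = new_dna.count("a")             # 2: lowercase 'a' occurs twice
any_case_hits = new_dna.upper().count("A")  # 2: convert the case first

# count also works for substrings longer than one character.
ta_hits = new_dna.count("ta")               # 1

print(upper_hits, lower_hits, any_case_hits, ta_hits)
```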

String methods (that are particularly important in bioinformatics)

To see what methods or properties (object variables) are available, type the name of an object, usually a variable name, type a period "." afterwards, and hit the tab key. If the variable has already been defined, you will see what methods and properties are available.

message.<hit tab>
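Where tab completion is not available, the built-in dir function gives a similar listing. This is a small sketch; the name public_names is made up, and filtering out names that start with an underscore is just one way to shorten the list.

```python
message = "Hello world!"

# dir() returns the names of an object's attributes and methods.
# Names starting with an underscore are skipped here to keep the list short.
public_names = [name for name in dir(message) if not name.startswith("_")]
print(public_names)
```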

Concatenation

Concatenation joins strings together. The concatenation symbol, the plus sign (+), joins two strings into a single string. Let's say you would like to join two DNA sequences together. You would do the following:

dna1 = "atgaattgg"
dna2 = "ttaaggtag"
new_dna = dna1 + dna2

In [ ]:
# Please type your code here:

Changing case

We can also change the case of a string using built-in string methods. Let's see how:

For uppercase, use the upper() method. In the documentation (above link) we see it listed as: str.upper()

new_dna.upper()

For lowercase, use the lower() method.

new_dna.lower()

In [ ]:
# Please type your code here:
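Because strings are immutable, upper() and lower() return a new string rather than changing the original. This short sketch, with an illustrative variable upper_dna, shows that the original sequence stays as it was.

```python
new_dna = "atgtag"

# upper() returns a new string; new_dna itself is not modified.
upper_dna = new_dna.upper()
print(upper_dna)   # ATGTAG
print(new_dna)     # atgtag

# lower() converts back, again returning a new string.
print(upper_dna.lower() == new_dna)
```

To keep the converted sequence, assign the result to a variable as done with upper_dna.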

Substring

One can also extract a substring from a sequence using slicing. As we mentioned above, a string is a sequence, or collection, of characters (Unicode characters).

We use square brackets "[" and "]" to extract a subsequence.


In [ ]:
# Please type your code here:

Find

In addition to pulling out a subsequence, we can find out whether a subsequence exists in a sequence, and where.

str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is found within the slice s[start:end]. 
Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.

Note: The find() method should be used only if you need to know the position of sub. To check if sub is a 
substring or not, use the in operator:

'Py' in 'Python'

Find the position of the codon for methionine:

new_dna.find("atg")

Find the position of the stop codon:

new_dna.find("tag")

In [41]:
# Please type your code here:
new_dna.find('tag')


Out[41]:
3

In [43]:
new_dna[3:]


Out[43]:
'tag'
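The positions returned by find can be fed straight into a slice. The following is a minimal sketch, assuming the short sequence from the cells above; the names start, stop, missing and coding_region are invented for the example.

```python
new_dna = "atgtag"

start = new_dna.find("atg")    # 0: index where the start codon begins
stop = new_dna.find("tag")     # 3: index where the stop codon begins
missing = new_dna.find("ccc")  # -1: the substring is not present

# Slice from the start codon through the end of the stop codon.
# The stop codon is 3 bases long and the slice end is exclusive.
coding_region = new_dna[start:stop + 3]

print(start, stop, missing, coding_region)
```

Since find returns -1 when nothing is found, it is worth checking for -1 before using the result as a slice position.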

Reversing

We can use a slicing trick to reverse a string: leave the start and end empty and use -1 as the step (the final position).

new_dna[::-1]

In [ ]:
# Please type your code here:
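As a quick check of the reversing trick, reversing twice should give back the original sequence. This sketch uses the made-up names reversed_dna and restored.

```python
new_dna = "atgtag"

# A step of -1 walks through the string from the last character to the first.
reversed_dna = new_dna[::-1]
print(reversed_dna)   # gatgta

# Reversing the reversed string restores the original order.
restored = reversed_dna[::-1]
print(restored == new_dna)
```

Note that this only reverses the order of the bases; it does not complement them.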