In [ ]:
chinese_zodiac = "Rat Ox Tiger Rabbit Dragon Snake Horse Goat Monkey Rooster Dog Pig"
print(chinese_zodiac[0])
print(chinese_zodiac[1])
In [ ]:
You can get the length (number of characters) of a string by using the len function:
len(my_string)
In [ ]:
print(len(chinese_zodiac))
Be careful with length: it is the number of characters, not the last index.
The last index is len(string) - 1.
In [ ]:
zlen = len(chinese_zodiac)
# WRONG: zlen is one past the last index, so this raises an IndexError
print(chinese_zodiac[zlen])
# RIGHT
print(chinese_zodiac[zlen - 1])
But actually you can use negative indexing to get the last character in a string. -1 is the last character, -2 is the second to last and so on.
Wondering why negative indexing starts with -1 and not 0? It's because -0 and 0 are the same thing, so you would just get the first character.
In [ ]:
print(chinese_zodiac[-1])
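As a quick check of the rule above, a negative index -k should point at the same character as len(string) - k. This is only a small sketch; it redefines chinese_zodiac so it can run on its own.

```python
chinese_zodiac = "Rat Ox Tiger Rabbit Dragon Snake Horse Goat Monkey Rooster Dog Pig"
zlen = len(chinese_zodiac)

# -1 and zlen - 1 point at the same (last) character
print(chinese_zodiac[-1] == chinese_zodiac[zlen - 1])

# -0 is just 0, so it gives the first character, not the last
print(chinese_zodiac[-0] == chinese_zodiac[0])

# -zlen is the most negative index that still works: the first character
print(chinese_zodiac[-zlen])
```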
In [ ]:
In [ ]:
second_animal = chinese_zodiac[4:6]
print(second_animal)
You can omit the first index and the slice will start at the beginning; you can omit the last index and it will go to the end.
In [ ]:
first_six = chinese_zodiac[:32]
print(first_six)
last_six = chinese_zodiac[33:]
print(last_six)
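One handy consequence of omitting indexes: cutting a string at the same position with [:n] and [n:] gives two pieces that add back up to the original. The names first_part and rest are made up for this sketch.

```python
chinese_zodiac = "Rat Ox Tiger Rabbit Dragon Snake Horse Goat Monkey Rooster Dog Pig"

# Cut at index 32: everything before it, and everything from it onward
first_part = chinese_zodiac[:32]
rest = chinese_zodiac[32:]

# The two pieces never overlap and never skip a character
print(first_part + rest == chinese_zodiac)

# Omitting both indexes gives the whole string
print(chinese_zodiac[:] == chinese_zodiac)
```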
In [ ]:
In [ ]:
# This will fail with a TypeError because strings are immutable
chinese_zodiac[0] = 'A'
In [ ]:
# This is better
print(chinese_zodiac)
chinese_zodiac_minus_r = chinese_zodiac[1:]
print(chinese_zodiac_minus_r)
ate_the_rat = "C" + chinese_zodiac_minus_r
print(ate_the_rat)
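If you want to see the failure without stopping the notebook, you can catch the error. This is a minimal sketch: the variable name message is only for illustration, and try/except is something we have not covered yet.

```python
chinese_zodiac = "Rat Ox Tiger Rabbit Dragon Snake Horse Goat Monkey Rooster Dog Pig"

try:
    # Strings are immutable, so assigning to an index is not allowed
    chinese_zodiac[0] = 'C'
    message = 'assignment worked'
except TypeError as error:
    message = 'TypeError: ' + str(error)

print(message)

# The original string is untouched
print(chinese_zodiac[:3])
```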
In [ ]:
print('Cat in zodiac:')
print('Cat' in chinese_zodiac)
print('Dragon in zodiac:')
print('Dragon' in chinese_zodiac)
In [ ]:
for character in 'Rat':
    print(character)
In [ ]:
# Print only vowels
for letter in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
    if letter in 'AEIOUY':
        print(letter)
In [ ]:
You can compare strings using the ==, >, >=, <, and <= operators.
Digits come first, then capital letters, and then lowercase letters. Strings are compared character by character, based on each character's ASCII value (more generally, its Unicode code point): http://ascii.cl/
In [ ]:
print('A' > 'a')
print('A' < 'a')
print('A' > 'B')
print('0' > 'A')
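The built-in ord function shows the number behind each character, which may make the results above less surprising. A small sketch; the list name code_points is made up here.

```python
# ord gives the numeric code of a single character
for character in '0AaB':
    print(character, ord(character))

# Digit, then capital, then lowercase: the codes are already in increasing order
code_points = [ord('0'), ord('A'), ord('a')]
print(code_points == sorted(code_points))

# Longer strings are compared one character at a time from the left
print('Rat' < 'Rooster')  # 'a' comes before 'o'
```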
In [ ]:
# Case matters in equality
print('cat' == 'Cat')
print('cat' == 'cat')
In [ ]:
dir(chinese_zodiac)
lower, upper, title, capitalize, and swapcase all return a copy of the string with its case changed (the original string is not modified)
HINT: use one of these methods to transform input to functions so that you don't have to worry about what case the user's input is in
In [ ]:
print(chinese_zodiac.lower())
print(chinese_zodiac.upper())
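One possible way to use the hint: lower-case both the user's text and the string you are searching before you compare them. The function name is_in_zodiac is made up for this sketch, and it is only a simple substring check.

```python
chinese_zodiac = "Rat Ox Tiger Rabbit Dragon Snake Horse Goat Monkey Rooster Dog Pig"

def is_in_zodiac(animal):
    """Return True if animal appears in the zodiac string, ignoring case."""
    # Lower-case both sides so 'DRAGON', 'Dragon' and 'dragon' all match
    return animal.lower() in chinese_zodiac.lower()

print(is_in_zodiac('DRAGON'))
print(is_in_zodiac('dragon'))
print(is_in_zodiac('Cat'))
```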
The split method slices the string into a list.
It takes one parameter, the character(s) to split on:
my_str.split(slice_string)
And the join method merges a list into a string.
It operates on the 'glue' string, and the list is the parameter:
glue_string.join(my_list)
In [ ]:
cz_list = chinese_zodiac.split(' ')
print(cz_list)
print(', '.join(cz_list))
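split and join undo each other when you use the same separator for both. A short sketch, with rejoined as a made-up name:

```python
chinese_zodiac = "Rat Ox Tiger Rabbit Dragon Snake Horse Goat Monkey Rooster Dog Pig"

# Split on a single space, then glue the pieces back together with a space
cz_list = chinese_zodiac.split(' ')
rejoined = ' '.join(cz_list)

print(len(cz_list))                # one list item per animal
print(rejoined == chinese_zodiac)  # same string as we started with
```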
The find method finds the index of a substring (and returns -1 if it doesn't exist) (Why -1 and not 0?)
count counts the occurrences of a substring
startswith and endswith check if a string starts or ends with a given substring and return a boolean
In [ ]:
print(chinese_zodiac.find('Snake'))
print(chinese_zodiac.count('at'))
print(chinese_zodiac.startswith('Ra'))
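A hint for the -1 question: 0 is a perfectly good index, so it cannot also mean "not found". The sketch below compares a substring that sits at index 0 with one that is missing; rat_idx and cat_idx are made-up names.

```python
chinese_zodiac = "Rat Ox Tiger Rabbit Dragon Snake Horse Goat Monkey Rooster Dog Pig"

rat_idx = chinese_zodiac.find('Rat')  # found at the very start of the string
cat_idx = chinese_zodiac.find('Cat')  # not in the string at all

print(rat_idx)
print(cat_idx)

# Always compare against -1 before using the result as an index
if cat_idx == -1:
    print('Cat not found')

# endswith works like startswith, but checks the end of the string
print(chinese_zodiac.endswith('Pig'))
```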
You can chain some string methods if the method also returns a string
In [ ]:
print(''.join(cz_list).lower().startswith('ra'))
In [ ]:
In [ ]:
# Let's find the zodiac animals between Ox and Monkey
ox_idx = chinese_zodiac.find('Ox')
monkey_idx = chinese_zodiac.find('Monkey')
print(chinese_zodiac[ox_idx:monkey_idx])
In [ ]:
# Wait, I wanted to exclude Ox
ox_end = ox_idx + len('Ox ')
print(chinese_zodiac[ox_end:monkey_idx])
In [ ]:
email = 'my.name@gmail.com'
String concatenation gets old really fast, and casting numbers and booleans as strings does too. Luckily, there is a better option.
String formatting allows you to include variables directly in your string.
'string {0}'.format(var1)
Each parameter passed to format has an index and can be accessed in the string using {idx}. They don't have to be in order and can be repeated.
In [ ]:
print('The {0} has {1} toes per limb and thus is considered {2}'.format('ox', 4, 'yin'))
print('The {0} has {1} toes per limb and thus is considered {2}'.format('tiger', 5, 'yang'))
# I learned something when creating this notebook
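To see why concatenation gets old, here is the same sentence built both ways. A small sketch; the variable names are made up for illustration.

```python
animal = 'ox'
toes = 4

# Concatenation: the number has to be cast with str() by hand
by_concat = 'The ' + animal + ' has ' + str(toes) + ' toes per limb'

# Formatting: format does the conversion for you
by_format = 'The {0} has {1} toes per limb'.format(animal, toes)

print(by_format)
print(by_concat == by_format)
```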
You don't have to put the elements in the correct order
In [ ]:
print('The {1} has {0} toes per limb and thus is considered {2}'.format('ox', 4, 'yin'))
print('The {2} has {2} toes per limb and thus is considered {2}'.format('tiger', 5, 'yang'))
You can also use variable names. Pass the values using key=value syntax, or unpack a dictionary (you'll learn about these soon) with **.
In [ ]:
"The {animal}'s attribute is {attribute}".format(animal='snake', attribute='flexibility')
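The dictionary version could look like the sketch below. The ** in front of the dictionary unpacks it into key=value arguments; the names traits and sentence are made up here.

```python
# A dictionary maps names to values (more on these soon)
traits = {'animal': 'snake', 'attribute': 'flexibility'}

# **traits is the same as writing animal='snake', attribute='flexibility'
sentence = "The {animal}'s attribute is {attribute}".format(**traits)
print(sentence)
```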
You can even format the variables in various ways. Reference the docs for everything; there is just too much you can do and I only have so much time to show you.
In [ ]:
print('{:<30}'.format('left aligned'))
print('{:>30}'.format('right aligned'))
print('{:0.2f}; {:0.7f}'.format(3.14, -3.14))
In [ ]:
You are going to create a program that does some very simple bioinformatics functions on a DNA input.
A little bit of molecular biology. Codons are non-overlapping triplets of nucleotides.
ATG CCC CTG GTA ... - this corresponds to four codons; spaces added for emphasis
The start codon is 'ATG'
Stop codons can be 'TGA' , 'TAA', or 'TAG', but they must be 'in frame' with the start codon. The first stop codon usually determines the end of the gene. In other words:
'ATGCCTGA...' - here TGA is not a stop codon, because the T is part of CCT
'ATGCCTTGA...' - here TGA is a stop codon because it is in frame (i.e. a multiple of 3 nucleotides from ATG)
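The "in frame" idea from the two examples above comes down to a remainder check. This is only a sketch of that one idea, not a solution to the exercise: the helper name is_in_frame is made up, and the test strings are the two examples without the trailing dots.

```python
def is_in_frame(start_idx, candidate_idx):
    """Return True if candidate_idx is a multiple of 3 bases from start_idx."""
    return (candidate_idx - start_idx) % 3 == 0

for dna in ['ATGCCTGA', 'ATGCCTTGA']:
    start_idx = dna.find('ATG')
    tga_idx = dna.find('TGA')
    # Show the sequence, where TGA sits, and whether it lines up with ATG
    print(dna, tga_idx, is_in_frame(start_idx, tga_idx))
```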
The gene is start codon to stop codon, inclusive. Example:
dna - GGCATGAAAGTCAGGGCAGAGCCATCTATTTGAGCTTAC
gene - ATGAAAGTCAGGGCAGAGCCATCTATTTGA
Write the following functions:
numCodons that takes a DNA string and returns how many codons are in it (a codon is a group of 3 DNA bases). Examples: AAACCC -> 2, GT -> 0
startCodonIndex which finds the index of the first start codon 'ATG' and returns -1 if none are found.
stopCodonIndex which finds the index of the first stop codon 'TAA' or 'TAG' or 'TGA' in frame with the start codon (found from startCodonIndex) and returns -1 if none are found.
codingDNA which returns the substring of the DNA from the beginning of the start codon to the end of the stop codon (please for the love of all things, use the functions you already wrote to calculate start and stop)
transcription that takes the DNA and transcribes it to RNA. Each letter should be converted using these mappings (A->U), (T->A), (C->G), (G->C).
Write a function called DNAExtravaganza that calls your functions and prints out (using string formatting)
DNA: [DNA]
CODONS: [Number of codons]
START: [start index]
STOP: [stop index]
CODING DNA: [coding DNA string]
TRANSCRIBED RNA: [transcribed DNA]
You can use these as test DNA strings:
dna='GGCATGAAAGTCAGGGCAGAGCCATCTATTGCTTACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGACAGGTTTAAGGAGACCAATAGAAACTGGGCATGTGGAGACAGAGAAGACTCTTGGGTTTCTGATAGGCACTGACTCTCTCTGCCTATTGGTCTATTTTCCCACCCTTAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGGTGAGTCTATGGGACCCTTGATGTTTTCTTTCCCCTTCTTTTCTATGGTTAAGTTCATGTCATAGGAAGGGGAGAAGTAACAGGGTACAGTTTAGAATGGGAAACAGACGAATGATT'
dna = 'GGGATGTTTGGGCCCTACGGGCCCTGATCGGCT'
In [ ]: