2: Reading the file in


The story is stored in the "story.txt" file. Open the file and read the contents into the story variable.


In [28]:
# The story is stored in the file "story.txt".
f = open("story.txt", "r")
story = f.read()
f.close()
print(story)

There was once a great and noble frmer named Julius.  He was the best farmer in his village, and prabably even the whole world.  One day, he decidid to grw potatoes.

 Julius knew that potatoes were hard to grow, so he kniw he hd to goe to the magic farmer in the sky to seek his guidance.  Julius set out on his journey aroudn noon one day.  It started raining almosty immediately.

 Julius wondered if this was a sign that he shouldn't go on, but he perserved.  He became soaked, and stoped in the store to buy an umbrella.  He told the storekeeper, Reggie, about his journey.

 Reggie told him that he was crzy to seek out the magic farmer; the last 10 people to try to find him had never come back.  Julius was undetered and decided to keep going.

 He travelled many long days in alternatng searing heat and freezing cold.  At night, he curled into a ball and tried to sleep underneath trees along the roadside.

 After mich anguish, Julius found the magic farmer, who gave him the secret of growing potatoes.

 Julius came back to the village, and managed to grow the finest crop the village had ever seen.  Everyone had potatoes to eat for months, and sang Julius's praises.
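
A common alternative to managing the file handle by hand is a with block, which closes the file automatically when the block ends. The sketch below is a minimal illustration, not the exercise's required answer; the temporary file, its name (sample_story.txt), and its one-line contents are assumptions so that the snippet runs without story.txt being present.

```python
import os
import tempfile

# Hypothetical sample file, created only so this sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_story.txt")
with open(sample_path, "w") as f:
    f.write("There was once a great and noble frmer named Julius.\n")

# The file is closed automatically when the with block exits.
with open(sample_path, "r") as f:
    sample_story = f.read()

print(sample_story)
print(f.closed)
```

Inside the with block the file is open; once the block ends, f.closed reports that the handle has been released.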

3: Tokenizing the file


The story is loaded into the story variable.

Tokenize the story, and store the tokens in the tokenized_story variable.


In [29]:
# We can split strings into lists with the .split() method.
# Passing a space to .split() splits the string at every single space.
text = "Bears are probably better than sharks, but I can't get close enough to one to be sure."
tokenized_text = text.split(" ")
tokenized_story = story.split(" ")
print(tokenized_story)


['There', 'was', 'once', 'a', 'great', 'and', 'noble', 'frmer', 'named', 'Julius.', '', 'He', 'was', 'the', 'best', 'farmer', 'in', 'his', 'village,', 'and', 'prabably', 'even', 'the', 'whole', 'world.', '', 'One', 'day,', 'he', 'decidid', 'to', 'grw', 'potatoes.\n\n', 'Julius', 'knew', 'that', 'potatoes', 'were', 'hard', 'to', 'grow,', 'so', 'he', 'kniw', 'he', 'hd', 'to', 'goe', 'to', 'the', 'magic', 'farmer', 'in', 'the', 'sky', 'to', 'seek', 'his', 'guidance.', '', 'Julius', 'set', 'out', 'on', 'his', 'journey', 'aroudn', 'noon', 'one', 'day.', '', 'It', 'started', 'raining', 'almosty', 'immediately.\n\n', 'Julius', 'wondered', 'if', 'this', 'was', 'a', 'sign', 'that', 'he', "shouldn't", 'go', 'on,', 'but', 'he', 'perserved.', '', 'He', 'became', 'soaked,', 'and', 'stoped', 'in', 'the', 'store', 'to', 'buy', 'an', 'umbrella.', '', 'He', 'told', 'the', 'storekeeper,', 'Reggie,', 'about', 'his', 'journey.\n\n', 'Reggie', 'told', 'him', 'that', 'he', 'was', 'crzy', 'to', 'seek', 'out', 'the', 'magic', 'farmer;', 'the', 'last', '10', 'people', 'to', 'try', 'to', 'find', 'him', 'had', 'never', 'come', 'back.', '', 'Julius', 'was', 'undetered', 'and', 'decided', 'to', 'keep', 'going.\n\n', 'He', 'travelled', 'many', 'long', 'days', 'in', 'alternatng', 'searing', 'heat', 'and', 'freezing', 'cold.', '', 'At', 'night,', 'he', 'curled', 'into', 'a', 'ball', 'and', 'tried', 'to', 'sleep', 'underneath', 'trees', 'along', 'the', 'roadside.\n\n', 'After', 'mich', 'anguish,', 'Julius', 'found', 'the', 'magic', 'farmer,', 'who', 'gave', 'him', 'the', 'secret', 'of', 'growing', 'potatoes.\n\n', 'Julius', 'came', 'back', 'to', 'the', 'village,', 'and', 'managed', 'to', 'grow', 'the', 'finest', 'crop', 'the', 'village', 'had', 'ever', 'seen.', '', 'Everyone', 'had', 'potatoes', 'to', 'eat', 'for', 'months,', 'and', 'sang', "Julius's", 'praises.\n']
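
The output above shows two side effects of splitting on a single space: the double spaces after each sentence produce empty-string tokens, and paragraph breaks stay attached to the preceding token as "\n\n". The sketch below compares .split(" ") with .split() called with no argument, which splits on any run of whitespace. The exercise itself uses .split(" "), so this is only a comparison, and the sample string is made up for illustration.

```python
# Made-up sample with a double space and a paragraph break, like the story.
sample = "He was the best.  One day, he grew potatoes.\n\n Julius knew."

# Splitting on a single space keeps empty strings and embedded newlines.
by_space = sample.split(" ")

# Splitting with no argument treats any run of whitespace as one separator.
by_whitespace = sample.split()

print(by_space)
print(by_whitespace)
```

With no argument, the empty tokens and the newlines disappear, which would leave less to clean up in the later steps.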

4: Replacing punctuation


The story has been loaded into tokenized_story.

Remove all of the punctuation from each of the tokens by replacing it with an empty string.

You'll need to loop through tokenized_story to do so.

You'll need to use multiple replace statements, one for each punctuation character to replace.

Append the token to no_punctuation_tokens once you are done replacing characters.

Don't forget to remove newlines!

Print out no_punctuation_tokens if you want to see which types of punctuation are still in the data.


In [30]:
# We can use the .replace function to replace punctuation in a string.
text = "Who really shot John F. Kennedy?"
text = text.replace("?", "?!")

# The question mark has been replaced with ?!.

# We can replace a substring with an empty string, which simply removes it.
text = text.replace("?", "")

# The question mark is gone now.

no_punctuation_tokens = []

for token in tokenized_story:
    for p in [".", ",", "\n", "'", ";", "?", "!", "-", ":"]:
        token = token.replace(p, "")
    no_punctuation_tokens.append(token)

print(no_punctuation_tokens)

['There', 'was', 'once', 'a', 'great', 'and', 'noble', 'frmer', 'named', 'Julius', '', 'He', 'was', 'the', 'best', 'farmer', 'in', 'his', 'village', 'and', 'prabably', 'even', 'the', 'whole', 'world', '', 'One', 'day', 'he', 'decidid', 'to', 'grw', 'potatoes', 'Julius', 'knew', 'that', 'potatoes', 'were', 'hard', 'to', 'grow', 'so', 'he', 'kniw', 'he', 'hd', 'to', 'goe', 'to', 'the', 'magic', 'farmer', 'in', 'the', 'sky', 'to', 'seek', 'his', 'guidance', '', 'Julius', 'set', 'out', 'on', 'his', 'journey', 'aroudn', 'noon', 'one', 'day', '', 'It', 'started', 'raining', 'almosty', 'immediately', 'Julius', 'wondered', 'if', 'this', 'was', 'a', 'sign', 'that', 'he', 'shouldnt', 'go', 'on', 'but', 'he', 'perserved', '', 'He', 'became', 'soaked', 'and', 'stoped', 'in', 'the', 'store', 'to', 'buy', 'an', 'umbrella', '', 'He', 'told', 'the', 'storekeeper', 'Reggie', 'about', 'his', 'journey', 'Reggie', 'told', 'him', 'that', 'he', 'was', 'crzy', 'to', 'seek', 'out', 'the', 'magic', 'farmer', 'the', 'last', '10', 'people', 'to', 'try', 'to', 'find', 'him', 'had', 'never', 'come', 'back', '', 'Julius', 'was', 'undetered', 'and', 'decided', 'to', 'keep', 'going', 'He', 'travelled', 'many', 'long', 'days', 'in', 'alternatng', 'searing', 'heat', 'and', 'freezing', 'cold', '', 'At', 'night', 'he', 'curled', 'into', 'a', 'ball', 'and', 'tried', 'to', 'sleep', 'underneath', 'trees', 'along', 'the', 'roadside', 'After', 'mich', 'anguish', 'Julius', 'found', 'the', 'magic', 'farmer', 'who', 'gave', 'him', 'the', 'secret', 'of', 'growing', 'potatoes', 'Julius', 'came', 'back', 'to', 'the', 'village', 'and', 'managed', 'to', 'grow', 'the', 'finest', 'crop', 'the', 'village', 'had', 'ever', 'seen', '', 'Everyone', 'had', 'potatoes', 'to', 'eat', 'for', 'months', 'and', 'sang', 'Juliuss', 'praises']
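
The exercise asks for one .replace() call per punctuation character. As an alternative, the same cleanup can be sketched with str.maketrans() and .translate(), which delete a whole set of characters in one pass. This is a swapped-in technique, not the approach the exercise prescribes; the sample tokens are a hand-picked subset of the story's tokens, and string.punctuation covers more characters than the nine listed above.

```python
import string

# Hand-picked sample tokens; not the full tokenized story.
sample_tokens = ["Julius.", "", "village,", "shouldn't", "potatoes.\n\n", "farmer;"]

# Build a table that deletes every ASCII punctuation character and newlines.
table = str.maketrans("", "", string.punctuation + "\n")

cleaned = [token.translate(table) for token in sample_tokens]
print(cleaned)
```

Empty-string tokens pass through unchanged, just as they do in the loop version.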

5: Lowercasing the words


The tokens without punctuation have been loaded into no_punctuation_tokens.

Loop through the tokens and lowercase each one.

Append each token to lowercase_tokens when you're done lowercasing.


In [31]:
# We can make strings all lowercase using the .lower() method.
text = "MY CAPS LOCK IS STUCK"
text = text.lower()

# The text is much nicer to read now.
print(text)

lowercase_tokens = []

for token in no_punctuation_tokens:
    lowercase_tokens.append(token.lower())

print(lowercase_tokens)

my caps lock is stuck
['there', 'was', 'once', 'a', 'great', 'and', 'noble', 'frmer', 'named', 'julius', '', 'he', 'was', 'the', 'best', 'farmer', 'in', 'his', 'village', 'and', 'prabably', 'even', 'the', 'whole', 'world', '', 'one', 'day', 'he', 'decidid', 'to', 'grw', 'potatoes', 'julius', 'knew', 'that', 'potatoes', 'were', 'hard', 'to', 'grow', 'so', 'he', 'kniw', 'he', 'hd', 'to', 'goe', 'to', 'the', 'magic', 'farmer', 'in', 'the', 'sky', 'to', 'seek', 'his', 'guidance', '', 'julius', 'set', 'out', 'on', 'his', 'journey', 'aroudn', 'noon', 'one', 'day', '', 'it', 'started', 'raining', 'almosty', 'immediately', 'julius', 'wondered', 'if', 'this', 'was', 'a', 'sign', 'that', 'he', 'shouldnt', 'go', 'on', 'but', 'he', 'perserved', '', 'he', 'became', 'soaked', 'and', 'stoped', 'in', 'the', 'store', 'to', 'buy', 'an', 'umbrella', '', 'he', 'told', 'the', 'storekeeper', 'reggie', 'about', 'his', 'journey', 'reggie', 'told', 'him', 'that', 'he', 'was', 'crzy', 'to', 'seek', 'out', 'the', 'magic', 'farmer', 'the', 'last', '10', 'people', 'to', 'try', 'to', 'find', 'him', 'had', 'never', 'come', 'back', '', 'julius', 'was', 'undetered', 'and', 'decided', 'to', 'keep', 'going', 'he', 'travelled', 'many', 'long', 'days', 'in', 'alternatng', 'searing', 'heat', 'and', 'freezing', 'cold', '', 'at', 'night', 'he', 'curled', 'into', 'a', 'ball', 'and', 'tried', 'to', 'sleep', 'underneath', 'trees', 'along', 'the', 'roadside', 'after', 'mich', 'anguish', 'julius', 'found', 'the', 'magic', 'farmer', 'who', 'gave', 'him', 'the', 'secret', 'of', 'growing', 'potatoes', 'julius', 'came', 'back', 'to', 'the', 'village', 'and', 'managed', 'to', 'grow', 'the', 'finest', 'crop', 'the', 'village', 'had', 'ever', 'seen', '', 'everyone', 'had', 'potatoes', 'to', 'eat', 'for', 'months', 'and', 'sang', 'juliuss', 'praises']
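
The loop-and-append pattern used for lowercasing can also be written as a list comprehension. The sketch below shows both forms producing the same list; the sample tokens and variable names are made up for illustration.

```python
# Made-up sample tokens, including an empty string like those in the story.
sample_tokens = ["There", "was", "once", "Julius", "", "He"]

# Loop form: build the list one token at a time.
looped = []
for token in sample_tokens:
    looped.append(token.lower())

# Comprehension form: the same transformation in a single expression.
comprehended = [token.lower() for token in sample_tokens]

print(looped == comprehended)
print(comprehended)
```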

7: Making a basic function


Define a function that takes degrees Fahrenheit as input and returns degrees Celsius.

Use it to convert 100 degrees Fahrenheit to Celsius. Assign the result to celsius_100.

Use it to convert 150 degrees Fahrenheit to Celsius. Assign the result to celsius_150.


In [32]:
# A simple function that takes in a number of miles, and turns it into kilometers
# The input at position 0 will be put into the miles variable.
def miles_to_km(miles):
    # return is a special keyword that indicates that the function will output whatever comes after it.
    return miles/0.62137

# Returns the number of kilometers equivalent to one mile
miles_to_km(1)

# Convert a, which holds 10 miles, to kilometers
a = 10
a = miles_to_km(a)

# We can convert and assign to a different variable
b = 50
c = miles_to_km(b)

fahrenheit = 80
celsius = (fahrenheit - 32)/1.8

def f2c(f):
    c = (f - 32)/1.8
    return c

celsius_100 = f2c(100)

celsius_150 = f2c(150)

print(celsius_100, celsius_150)

37.77777777777778 65.55555555555556
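
The conversion used above is C = (F - 32) / 1.8. One way to sanity-check such a function is against the freezing and boiling points of water (32 °F = 0 °C, 212 °F = 100 °C). In the sketch below, the function name fahrenheit_to_celsius is a stand-in for illustration, and math.isclose is used for the boiling point because floating-point division is not guaranteed to give an exact result.

```python
import math

def fahrenheit_to_celsius(fahrenheit):
    # Same formula as the exercise: subtract 32, then divide by 1.8.
    return (fahrenheit - 32) / 1.8

freezing = fahrenheit_to_celsius(32)
boiling = fahrenheit_to_celsius(212)

print(freezing)
print(math.isclose(boiling, 100.0))
```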

8: Practice: functions


Make a function that takes a string as input and outputs a lowercase version.

Then use it to turn the string lowercase_me to lowercase.

Assign the result to lowercased_string.


In [33]:
def split_string(text):
    return text.split(" ")

sally = "Sally sells seashells by the seashore."
# This splits the string into a list.
print(split_string(sally))

# We can assign the output of a function to a variable.
sally_tokens = split_string(sally)

lowercase_me = "I wish I was in ALL lowercase"

def to_lowercase(text):
    return text.lower()

lowercased_string = to_lowercase(lowercase_me)
print(lowercased_string)

['Sally', 'sells', 'seashells', 'by', 'the', 'seashore.']
i wish i was in all lowercase

9: Types of errors


There are multiple syntax errors in the code cell below. You can tell because an error shows up in the results panel instead of the normal output. Fix the errors and get the code running properly. It should print all of the items in a.


In [34]:
# Sometimes, you will have problems with your code that cause Python to throw an exception.
# Don't worry, it happens to all of us many times a day.
# An exception means that the program can't run, so you'll get an error in the results view instead of the normal output.
# There are a few different types of exceptions.
# The first we'll look at is a SyntaxError.
# This means that something is typed incorrectly (statements misspelled, quotes missing, and so on).

a = ["Errors are no fun!", "But they can be fixed", "Just fix the syntax and everything will be fine"]
b = 5

for item in a:
    if b == 5:
        print(item)

Errors are no fun!
But they can be fixed
Just fix the syntax and everything will be fine
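
A SyntaxError is raised when Python parses the code, before any line of it runs. The sketch below demonstrates this with the built-in compile(), which parses a string of source code without executing it. The helper name has_syntax_error and both source strings are made up for illustration; they are not the broken cell from the exercise.

```python
# A for statement with its colon missing, and the corrected version.
broken_source = "for item in a\n    print(item)\n"
fixed_source = "for item in a:\n    print(item)\n"

def has_syntax_error(source):
    # compile() only parses the source; nothing in it is executed,
    # so the undefined name `a` does not matter here.
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False

print(has_syntax_error(broken_source))
print(has_syntax_error(fixed_source))
```

Because the check happens at parse time, a single missing colon or quote stops the whole cell from running, not just the line that contains it.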

10: More syntax errors


The code below has multiple syntax errors. Fix them so the code prints out "I never liked that 6".


In [35]:
a = 5

if a == 6:
    print("6 is obviously the best number")
    print("What's going on, guys?")
else:
    print("I never liked that 6")

I never liked that 6