Assignment 3a: Revision of block 3

Due: Friday the 23rd of November 2017, 23:59

  • Please name your IPython notebook using the following convention: ASSIGNMENT_3a_FIRSTNAME_LASTNAME.ipynb

  • Please submit your assignment (the notebooks for parts 1 and 2 plus any additional files) as a single .zip file using this Google form

  • If you have questions about this topic, please refer to the forum on the Canvas site.

In this block, we covered a lot of ground:

  • Chapter 11 - Functions and scope
  • Chapter 12 - Importing external modules
  • Chapter 13 - Working with Python scripts
  • Chapter 14 - Reading and writing text files
  • Chapter 15 - Off to analyzing text

In this assignment, you will first complete a number of small exercises about each chapter to make sure you are familiar with the most important concepts. In the second part of the assignment, you will apply your newly acquired skills to write your very own text processing program (ASSIGNMENT-3b) :-). But don't worry, there will be instructions and hints along the way.

Part 1: Practicing some core notions

In the first part of this assignment, you will revise some of the basic notions we covered in the previous chapters. Most of the exercises can be completed rather quickly. If you get stuck, going back to the chapters should help you complete them. The purpose of this part is to give you some practice and confidence so that you are ready to move on to part 2 of the assignment: processing and analyzing some text!

Functions & scope

Exercise 1

Define a function called split_sort_text which takes a string as input, splits it at space characters and returns all the unique words in the string in alphabetical order.

  • Hint 1: There is a specific Python container which does not allow duplicates and simply removes them. Use this one.
  • Hint 2: There is a function called 'sorted' which sorts the items in an iterable. Look at the documentation to see how it is used.
  • Hint 3: Don't forget to write a docstring (here and in all future functions - we won't remind you every single time).
  • Hint 4: Test your function.
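
As a small illustration of Hint 2 (not a solution to the exercise), the sketch below shows how the built-in sorted behaves on a list of strings. The list fruits is made-up example data.

```python
# Hypothetical example data, not part of the assignment.
fruits = ["pear", "apple", "fig"]

# sorted() accepts any iterable and returns a new list;
# the original list is left unchanged.
sorted_fruits = sorted(fruits)

print(sorted_fruits)  # ['apple', 'fig', 'pear']
print(fruits)         # ['pear', 'apple', 'fig']
```

Because sorted returns a new list, you can use its result directly as the return value of a function.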

In [ ]:
# your code here

Working with external modules

Exercise 2

NLTK offers a way of using WordNet in Python. Do some research (using Google, because quite frankly, that's what we do very often) and see if you can find out how to import it. WordNet is a computational lexicon which organizes words according to their senses (collected in synsets). See if you can print all the synsets (i.e. entries) of the word 'dog'.


In [ ]:
# your code here

Working with Python scripts

Exercise 3

a.) Define a function called count which counts the words in a string. Do not use NLTK just yet. Find a way to test it.

  • Hint 1: Write a helper function called preprocess which preprocesses the string (splits it, removes punctuation, and returns it in a container that you think works best for the next steps).

  • Hint 2: Remember that there are string methods which you can use to get rid of unwanted characters. Test the preprocess function using the string 'this is a (tricky) test'.

  • Hint 3: Remember how we used dictionaries to count words? If not, have a look at the containers chapter.

  • Hint 4: Test your function using an example string which will tell you whether it fulfills the requirements (remove punctuation, split, count). You will get a point for good testing.
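
As a reminder of the kind of string method Hint 2 refers to, here is a minimal sketch using str.strip together with string.punctuation from the standard library. The tokens are made-up examples, and this is only one possible approach, not the required one.

```python
import string

# Hypothetical token, chosen only for illustration.
token = "hello!?"

# str.strip removes any of the given characters from both ends of the string,
# but leaves characters in the middle untouched.
cleaned = token.strip(string.punctuation)

print(cleaned)                             # hello
print("don't!".strip(string.punctuation))  # don't
```

Note that strip only looks at the beginning and end of a string, so you will still need to decide when to apply it in your own preprocess function.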

b.) Create a Python script

Use your editor to create a Python script called word_counts.py. Move your code into the script and add a function call. Move your helper function to a separate script called utils.py and import it into word_counts.py. Test whether everything works as expected by calling the script word_counts.py from the terminal. Include your tests in the word_counts.py script.
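
One common way to lay out a script that both defines a function and calls it is sketched below. The function shout is a hypothetical example, and the `__name__` check is an optional convention which may go beyond what the chapter covered; a plain function call at the bottom of the script works as well.

```python
def shout(text):
    """Return the text in upper case (hypothetical example function)."""
    return text.upper()


# This block runs when the file is executed as a script from the terminal,
# but not when the file is imported from another script.
if __name__ == "__main__":
    print(shout("it works"))
```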

Please submit your scripts together with this notebook in a single folder and upload the entire folder to the Google form.

Don't forget to add docstrings to your functions.


In [ ]:
# Feel free to use this cell to try out your code.

Dealing with text files

Exercise 4

Playing with lyrics

a.) Write a function called load_text which opens and reads a file and returns the text in the file. It should take a filepath as a parameter. Test it by loading this file: ../Data/lyrics/walrus.txt

  • Hint: Remember that it is best practice to use a context manager.
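
As a reminder of what a context manager looks like, the sketch below writes a throwaway file and reads it back using with-statements. The file name and content are made up for illustration; this is not the load_text function itself.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained (hypothetical content).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.txt")

# The with-statement closes the file automatically, even if an error occurs.
with open(path, "w", encoding="utf-8") as outfile:
    outfile.write("first line\nsecond line\n")

with open(path, "r", encoding="utf-8") as infile:
    content = infile.read()

print(content)
print(infile.closed)  # True
```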

b.) Write a function called replace_walrus which takes lyrics as input and replaces every instance of 'walrus' with 'hippo' (make sure to account for upper and lower case - it is fine to transform everything to lower case). The function should write the new version of the song to a file called 'walrus_hippo.txt', stored in ../Data/lyrics.

Don't forget to add docstrings to your functions.


In [ ]:
# your code here

Analyzing text with NLTK

Exercise 5

Building a simple NLP pipeline

For this exercise, you will need NLTK. Don't forget to import it.

Write a function called tag_text which takes raw text as input and returns the tagged text. To do this, make sure you follow the steps below:

  • Tokenize the text.

  • Perform part-of-speech tagging on the list of tokens.

  • Return the tagged text.

Then test your function using the text snippet below (test_text) as input.


In [ ]:
test_text = """Two households, both alike in dignity,
    In fair Verona, where we lay our scene,
    From ancient grudge break to new mutiny,
    Where civil blood makes civil hands unclean."""

In [ ]:
# your code here

Python knowledge

Exercise 6

6.a) How many for-loops can you nest in one another?

[answer]


In [ ]:
# your code here

6.b) What is the difference between the modes 'w' and 'a' when opening a file?

[answer]