Assignment 6. Files

Exercise 1

Write a function called sed that takes as arguments a pattern string, a replacement string, and two filenames; it should read the first file and write the contents into the second file (creating it if necessary). Every occurrence of the pattern string in the file should be replaced with the replacement string.

If an error occurs while opening, reading, writing or closing files, your program should catch the exception, print an error message, and exit.
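
One possible starting point is sketched below in Python. It is only a sketch under a few assumptions: the parameter names source and dest are made up for illustration, the files are treated as text, and the replacement is done line by line, so a pattern that spans a line break would not be matched.

```python
import sys


def sed(pattern, replacement, source, dest):
    """Copy source to dest, replacing every occurrence of pattern."""
    try:
        # Both files are closed automatically when the with block ends,
        # even if reading or writing fails part-way through.
        with open(source, "r") as fin, open(dest, "w") as fout:
            for line in fin:
                fout.write(line.replace(pattern, replacement))
    except OSError as err:
        # OSError covers missing files, permission problems and I/O errors.
        print(f"sed: file error: {err}")
        sys.exit(1)
```

Catching OSError rather than a bare except keeps unrelated bugs (such as a misspelled variable name) visible instead of hiding them behind the error message.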

Exercise 2

Write a function anagram that reads a word list from a file and prints all the sets of words that are anagrams of one another.

Here is an example of what the output might look like:

["deltas", "desalt", "lasted", "salted", "slated", "staled"]
["retainers", "ternaries"]
["generating", "greatening"]
["resmelts", "smelters", "termless"]

Hint: you might want to build a dictionary that maps from a collection of letters to an array of words that can be spelled with those letters. The question is, how can you represent the collection of letters in a way that can be used as a key?
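
One common answer to that question, sketched here in Python, is to sort the letters of a word and join them back into a string: all anagrams of a word then share the same key. The helper names signature and all_anagrams are hypothetical, and the sketch assumes the word list has one word per line.

```python
def signature(word):
    """Return the letters of word in sorted order, as a string."""
    return "".join(sorted(word))


def all_anagrams(filename):
    """Map each letter signature to the list of words that share it."""
    groups = {}
    with open(filename) as fin:
        for line in fin:
            word = line.strip().lower()
            if word:  # skip blank lines
                groups.setdefault(signature(word), []).append(word)
    return groups


def anagram(filename):
    """Print every group that contains more than one word."""
    for words in all_anagrams(filename).values():
        if len(words) > 1:
            print(words)
```

A string works as a dictionary key because it is immutable; a list of letters would not.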

Write two new functions: store_anagrams should store the anagram dictionary in a “shelf”; read_anagrams should look up a word and return an array of its anagrams.
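
A minimal sketch of these two functions, assuming Python's standard shelve module and a dictionary keyed by the sorted letters of each word. The parameter names shelf_name and groups are made up for illustration, and the returned list includes the looked-up word itself if it was stored.

```python
import shelve


def store_anagrams(shelf_name, groups):
    """Save a dictionary of signature -> word list in a shelf."""
    # Flag "c" creates the shelf if it does not exist yet.
    with shelve.open(shelf_name, "c") as shelf:
        for sig, words in groups.items():
            shelf[sig] = words


def read_anagrams(shelf_name, word):
    """Look up word in the shelf and return the words with the same letters."""
    sig = "".join(sorted(word))  # same key scheme used when storing
    # Flag "r" opens the existing shelf read-only.
    with shelve.open(shelf_name, "r") as shelf:
        return shelf.get(sig, [])
```

Shelf keys must be strings, which is another reason a joined string of sorted letters is a convenient key.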

Exercise* 3

In a large collection of MP3 files, there may be more than one copy of the same song, stored in different directories or with different file names. The goal of this exercise is to search for duplicates.

Write a program that searches a directory and all of its subdirectories, recursively, and returns a list of complete paths for all files with a given suffix (like .mp3).
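
The search could be sketched as follows in Python. This version leans on os.walk from the standard library, which visits the subdirectories for you, instead of writing the recursion by hand; the function name find_files and the case-insensitive suffix match are assumptions, not requirements of the exercise.

```python
import os


def find_files(root, suffix):
    """Return absolute paths of all files under root whose names end in suffix."""
    matches = []
    # os.walk yields one (directory, subdirectories, files) triple
    # for root and for every directory below it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffix.lower()):
                full_path = os.path.join(dirpath, name)
                matches.append(os.path.abspath(full_path))
    return matches
```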

To recognize duplicates, you can use md5sum to compute a “checksum” for each file. If two files have the same checksum, they probably have the same contents. To double-check, you can use the Unix command diff.
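
As a rough sketch of the checksum step, the Python code below groups files by their MD5 digest. Note the substitution: it uses the standard hashlib module in place of the md5sum command, so it does not depend on Unix tools. The function names md5_of and find_duplicates are hypothetical.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Return the MD5 checksum of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as fin:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths that share a checksum (likely duplicates)."""
    by_checksum = {}
    for path in paths:
        by_checksum.setdefault(md5_of(path), []).append(path)
    return [group for group in by_checksum.values() if len(group) > 1]
```

For the double-check, filecmp.cmp(a, b, shallow=False) from the standard library compares two files byte by byte and could stand in for diff.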