Now You Code 4: Email Harvest Training

Let's teach you how to extract emails from text. This has a variety of applications, the most common being building a list of emails for spamming... er, I meant "mass marketing."

The best way to find the emails in the mailbox file is to search for lines that begin with "From:" (similar to what we did in the lab). When you find one, write just the email address (not "From:" and the address) to "NYC4-emails.txt", and don't worry about duplicates.
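Here is a minimal sketch of that matching rule. It assumes that on each "From:" line the address is the second whitespace-separated token. The function name `extract_emails` and the sample lines (with example.com addresses) are made up for illustration. Mailbox files also contain lines that begin with "From " (no colon), which mark where each message starts. `startswith("From:")` skips those, so each message is counted only once.

```python
def extract_emails(lines):
    """Return the address from every line that starts with "From:".

    Assumes the address is the first token after "From:",
    e.g. "From: alice@example.com".
    """
    emails = []
    for line in lines:
        if line.startswith("From:"):
            parts = line.split()  # parts[0] is "From:", parts[1] the address
            if len(parts) > 1:    # skip a bare "From:" with nothing after it
                emails.append(parts[1])
    return emails


# Made-up sample lines in mailbox style
sample = [
    "From alice@example.com Sat Jan 5 09:14:16 2008\n",  # no colon: skipped
    "From: alice@example.com\n",
    "Subject: hello\n",
    "From: bob@example.org\n",
]
print(extract_emails(sample))
# → ['alice@example.com', 'bob@example.org']
```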

The program should print the number of emails it wrote to the file.

Example Run:

Wrote 27 emails to NYC4-emails.txt

Step 1: Problem Analysis

Input: No user input, but we do read from the file NYC4-mbox-short.txt.

Output: The number of emails collected from the file, and a new file NYC4-emails.txt with one email per line.


(write here)

In [1]:
## Step 2: Write the code
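One possible shape for the Step 2 code is sketched below. It assumes NYC4-mbox-short.txt is in the same folder as the notebook. The helper `harvest_emails` is a hypothetical name. It applies the "From:" rule, writes one address per line, keeps duplicates, and returns the count so the cell can print the message shown in the example run.

```python
import os

MBOX_FILE = "NYC4-mbox-short.txt"  # input file named in the assignment
EMAILS_FILE = "NYC4-emails.txt"    # output file named in the assignment


def harvest_emails(in_path, out_path):
    """Write the address from each "From:" line of in_path to out_path.

    One address per line, duplicates kept. Returns how many were written.
    """
    count = 0
    with open(in_path) as mbox, open(out_path, "w") as out:
        for line in mbox:
            if line.startswith("From:"):
                parts = line.split()
                if len(parts) > 1:
                    out.write(parts[1] + "\n")
                    count += 1
    return count


if os.path.exists(MBOX_FILE):
    total = harvest_emails(MBOX_FILE, EMAILS_FILE)
    print(f"Wrote {total} emails to {EMAILS_FILE}")
else:
    print(f"Could not find {MBOX_FILE}")
```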

In [2]:
## Step 3: Write another program to read the emails from NYC4-emails.txt and print them to the console
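Here is a minimal sketch for Step 3. It assumes NYC4-emails.txt already exists with one address per line. `read_emails` is a hypothetical helper. `strip()` removes the newline at the end of each line so that `print` does not double-space the output, and blank lines are skipped.

```python
EMAILS_FILE = "NYC4-emails.txt"  # file written in Step 2


def read_emails(path):
    """Return the non-blank lines of path with surrounding whitespace removed."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


try:
    for email in read_emails(EMAILS_FILE):
        print(email)
except FileNotFoundError:
    print(f"{EMAILS_FILE} not found; run the Step 2 code first.")
```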

Step 4: Questions

  1. Does this code actually "detect" emails? How does it find emails in the text?
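For comparison, this sketch shows what pattern-based detection could look like. It uses Python's `re` module instead of the "From:" rule, which is a swapped-in technique and not what the assignment asks for. The pattern is a deliberately loose assumption and will miss or mis-match some valid addresses. `find_emails` is a hypothetical name.

```python
import re

# Loose, assumed pattern: name characters, "@", then dot-separated domain labels
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


print(find_emails("Contact alice@example.com or bob.smith@mail.example.org today."))
# → ['alice@example.com', 'bob.smith@mail.example.org']
```

Unlike the "From:" rule, this approach would also pick up addresses in "To:" lines or in message bodies.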


  2. Explain how this program could be improved to prompt for the email file at runtime.
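Here is one way this could look, as a sketch. Replace the hard-coded file name with a call to `input()`, and keep asking until the name refers to an existing file. `prompt_for_file` and its `ask` parameter are hypothetical. `ask` defaults to the built-in `input` and exists only so the function can be tested without typing.

```python
import os


def prompt_for_file(ask=input):
    """Ask for a file name until the user enters one that exists, then return it."""
    while True:
        path = ask("Enter the mailbox file name: ").strip()
        if os.path.isfile(path):
            return path
        print(f"Cannot find {path!r}, please try again.")
```

The returned name would then be opened in place of the hard-coded NYC4-mbox-short.txt.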


  3. Devise an approach to remove duplicate emails from the output file. You don't have to write it as code, just explain it.
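Here is a sketch of one approach. Remember every address already kept in a set, and skip any address that is already in it. `remove_duplicates` is a hypothetical helper. It compares addresses exactly as written, so "Bob@example.org" and "bob@example.org" would both be kept unless you lowercase them first.

```python
def remove_duplicates(emails):
    """Return emails with repeats removed, keeping first-seen order."""
    seen = set()
    unique = []
    for email in emails:
        if email not in seen:  # set membership checks are fast on average
            seen.add(email)
            unique.append(email)
    return unique


print(remove_duplicates(["a@x.com", "b@y.org", "a@x.com"]))
# → ['a@x.com', 'b@y.org']
```

On Python 3.7 and later, `list(dict.fromkeys(emails))` gives the same result in one line, because dictionaries keep insertion order.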


Step 5: Reflection

Reflect upon your experience completing this assignment. This should be a personal narrative, in your own voice, that cites specifics relevant to the activity so the grader can understand how you arrived at the code you submitted. Things to consider touching upon: Elaborate on the process itself. Did your original problem analysis work as designed? How many iterations did you go through before you arrived at the solution? Where did you struggle along the way and how did you overcome it? What did you learn from completing the assignment? What do you need to work on to get better? What was most valuable and least valuable about this exercise? Do you have any suggestions for improvements?

To make a good reflection, you should journal your thoughts, questions and comments while you complete the exercise.

Keep your response between 100 and 250 words.

--== Write Your Reflection Below Here ==--

In [ ]:
from ist256.submission import Submission