Python Text Basics Assessment - Solutions

Welcome to your assessment! Complete the tasks described in bold below by typing the relevant code in the cells.

f-Strings

1. Print an f-string that displays "NLP stands for Natural Language Processing" using the variables provided.


In [1]:
abbr = 'NLP'
full_text = 'Natural Language Processing'

# Enter your code here:
print(f'{abbr} stands for {full_text}')


NLP stands for Natural Language Processing
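
As a small aside, the braces in an f-string accept any expression, plus an optional format spec or conversion. The sketch below reuses abbr and full_text from the cell above; the names summary, padded and debug are illustrative only and are not part of the assessment.

```python
abbr = 'NLP'
full_text = 'Natural Language Processing'

# Any expression can go inside the braces, including function and method calls.
summary = f'{abbr} ({len(full_text)} characters spelled out) stands for {full_text.upper()}'

# A format spec after the colon controls layout: left-align abbr in 6 columns.
padded = f'[{abbr:<6}]'

# The !r conversion inserts the repr, which helps when checking for stray whitespace.
debug = f'{abbr!r}'

print(summary)
print(padded)
print(debug)
```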

Files

2. Create a file in the current working directory called contacts.txt by running the cell below:


In [6]:
%%writefile contacts.txt
First_Name Last_Name, Title, Extension, Email


Overwriting contacts.txt

3. Open the file and use .read() to save the contents of the file to a string called fields. Make sure the file is closed at the end.


In [3]:
# Write your code here:
with open('contacts.txt') as c:
    fields = c.read()

    
# Run fields to see the contents of contacts.txt:
fields


Out[3]:
'First_Name Last_Name, Title, Extension, Email'
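
To see that the with statement really does close the file, one option is to check the file object's closed attribute inside and after the block. This is a minimal sketch that writes to a temporary directory so the real contacts.txt is left alone; the file name contacts_demo.txt and the variable names are assumptions made for the demo.

```python
import os
import tempfile

# Work in a throwaway directory so the real contacts.txt is not touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'contacts_demo.txt')  # hypothetical file name

    with open(demo_path, 'w') as out:
        out.write('First_Name Last_Name, Title, Extension, Email')

    with open(demo_path) as c:
        fields_demo = c.read()
        open_inside = c.closed   # False: the file is still open inside the block

    closed_after = c.closed      # True: leaving the with block closed the file

print(fields_demo)
print(open_inside, closed_after)
```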

Working with PDF Files

4. Use PyPDF2 to open the file Business_Proposal.pdf. Extract the text of page 2.


In [4]:
# Perform import
import PyPDF2

# Open the file as a binary object
f = open('Business_Proposal.pdf','rb')

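# Use PyPDF2 to read the text of the file
# (PdfFileReader, getPage and extractText are the PyPDF2 1.x/2.x names;
# PyPDF2 3.0 replaced them with PdfReader, .pages[1] and .extract_text())
pdf_reader = PyPDF2.PdfFileReader(f)


# Get the text from page 2 (CHALLENGE: Do this in one step!)
# Pages are zero-indexed, so page 2 is index 1
page_two_text = pdf_reader.getPage(1).extractText()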



# Close the file
f.close()

# Print the contents of page_two_text
print(page_two_text)


AUTHORS:
 
Amy Baker, Finance Chair, x345, abaker@ourcompany.com
 
Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
 
Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
 

5. Open the file contacts.txt in append mode. Add the text of page 2 from above to contacts.txt.

CHALLENGE: See if you can remove the word "AUTHORS:"


In [5]:
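# Simple Solution:
# 'a+' opens the file for appending and reading; writes always go to the end
with open('contacts.txt','a+') as c:
    c.write(page_two_text)
    # Move back to the start of the file before reading it
    c.seek(0)
    print(c.read())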


First_Name Last_Name, Title, Extension, EmailAUTHORS:
 
Amy Baker, Finance Chair, x345, abaker@ourcompany.com
 
Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
 
Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
 


In [7]:
# CHALLENGE Solution (re-run the %%writefile cell above to obtain an unmodified contacts.txt file):
# 'AUTHORS:' is 8 characters long, so slicing from index 8 skips it
with open('contacts.txt','a+') as c:
    c.write(page_two_text[8:])
    c.seek(0)
    print(c.read())


First_Name Last_Name, Title, Extension, Email
 
Amy Baker, Finance Chair, x345, abaker@ourcompany.com
 
Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
 
Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
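
Slicing at a fixed index works only while the text begins with exactly that heading. A slightly more defensive variant, sketched here on a made-up sample string shaped like the page 2 output (sample_page_text, header and without_header are illustrative names), checks for the heading first and derives the slice index from its length.

```python
# Sample text shaped like the page 2 output shown above (an assumption for the demo).
sample_page_text = (
    'AUTHORS:\n \n'
    'Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n \n'
)

header = 'AUTHORS:'

# Slice by the header's length only when the text really starts with it.
if sample_page_text.startswith(header):
    without_header = sample_page_text[len(header):]
else:
    without_header = sample_page_text

print(without_header)
```

The same check could be applied to page_two_text before writing it to contacts.txt.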
 

Regular Expressions

6. Using the page_two_text variable created above, extract any email addresses that were contained in the file Business_Proposal.pdf.


In [8]:
import re

# Enter your regex pattern here. This may take several tries!
# The dot is escaped so it matches a literal '.' rather than any character
pattern = r'\w+@\w+\.\w{3}'

re.findall(pattern, page_two_text)


Out[8]:
['abaker@ourcompany.com',
 'cdonaldson@ourcompany.com',
 'efreeman@ourcompany.com']
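
In a regular expression an unescaped dot matches any character, while \. matches only a literal dot. The sketch below compares the two on a made-up sentence; the sample text and the names loose and strict are assumptions for illustration.

```python
import re

# Made-up sample text; 'carol@example-com' has a hyphen where the dot should be.
sample = 'Write to abaker@ourcompany.com or carol@example-com today'

loose = r'\w+@\w+.\w{3}'    # '.' matches any character, including '-'
strict = r'\w+@\w+\.\w{3}'  # '\.' matches only a literal dot

loose_matches = re.findall(loose, sample)
strict_matches = re.findall(strict, sample)

print(loose_matches)
print(strict_matches)
```

Both patterns still assume a three-character ending, so an address with a longer ending such as .info would be cut short.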

Great job!