Python Text Basics Assessment - Solutions

Welcome to your assessment! Complete the tasks described in bold below by typing the relevant code in the cells.

f-Strings

1. Print an f-string that displays `NLP stands for Natural Language Processing` using the variables provided.



In [1]:

    
abbr = 'NLP'
full_text = 'Natural Language Processing'

# Enter your code here:
print(f'{abbr} stands for {full_text}')

NLP stands for Natural Language Processing
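
For comparison, here is a small sketch showing that the f-string builds the same text as the older `str.format` call. The names `with_fstring` and `with_format` are illustrative and not part of the assessment.

```python
abbr = 'NLP'
full_text = 'Natural Language Processing'

# f-string: expressions inside {} are evaluated in place
with_fstring = f'{abbr} stands for {full_text}'

# Equivalent str.format call, shown only for comparison
with_format = '{} stands for {}'.format(abbr, full_text)

print(with_fstring == with_format)
```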

Files

2. Create a file in the current working directory called `contacts.txt` by running the cell below:



In [6]:

    
%%writefile contacts.txt
First_Name Last_Name, Title, Extension, Email

Overwriting contacts.txt

3. Open the file and use .read() to save the contents of the file to a string called `fields`. Make sure the file is closed at the end.



In [3]:

    
# Write your code here:
with open('contacts.txt') as c:
    fields = c.read()

    
# Run fields to see the contents of contacts.txt:
fields

Out[3]:

'First_Name Last_Name, Title, Extension, Email'
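
One way to confirm that the file really is closed is to check the file object's `closed` attribute after the `with` block. This is a minimal sketch that writes to a hypothetical scratch file (`contacts_demo.txt` in a temporary directory) so it does not touch the real `contacts.txt`.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), 'contacts_demo.txt')

with open(path, 'w') as f:
    f.write('First_Name Last_Name, Title, Extension, Email')

with open(path) as c:
    fields = c.read()

# Leaving the with block closes the file automatically
print(c.closed)
print(fields)
```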

Working with PDF Files

4. Use PyPDF2 to open the file `Business_Proposal.pdf`. Extract the text of page 2.



In [4]:

    
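# Perform import
# Note: PdfFileReader, getPage and extractText are the older PyPDF2 names.
# In PyPDF2 3.0 and later they were replaced by PdfReader, reader.pages[1]
# and extract_text().
import PyPDF2

# Open the file as a binary object
f = open('Business_Proposal.pdf','rb')

# Use PyPDF2 to read the text of the file
pdf_reader = PyPDF2.PdfFileReader(f)

# Get the text from page 2 (CHALLENGE: Do this in one step!)
# Pages are zero-indexed, so page 2 is index 1
page_two_text = pdf_reader.getPage(1).extractText()

# Close the file
f.close()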

# Print the contents of page_two_text
print(page_two_text)

AUTHORS:
 
Amy Baker, Finance Chair, x345, abaker@ourcompany.com
 
Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
 
Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com

5. Open the file `contacts.txt` in append mode. Add the text of page 2 from above to `contacts.txt`.

CHALLENGE: See if you can remove the word "AUTHORS:"



In [5]:

    
# Simple Solution:
with open('contacts.txt','a+') as c:
    c.write(page_two_text)  # 'a+' always writes at the end of the file
    c.seek(0)               # move back to the start before reading
    print(c.read())

First_Name Last_Name, Title, Extension, EmailAUTHORS:
 
Amy Baker, Finance Chair, x345, abaker@ourcompany.com
 
Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
 
Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com



In [7]:

    
# CHALLENGE Solution (re-run the %%writefile cell above to obtain an unmodified contacts.txt file):
with open('contacts.txt','a+') as c:
    c.write(page_two_text[8:])  # skip the 8 characters of 'AUTHORS:'
    c.seek(0)                   # move back to the start before reading
    print(c.read())

First_Name Last_Name, Title, Extension, Email
 
Amy Baker, Finance Chair, x345, abaker@ourcompany.com
 
Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
 
Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
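
As an alternative to counting characters for the slice, the heading can be removed by name. This is a minimal sketch, assuming the extracted text begins with `AUTHORS:` as in the printed output; `sample_text` and the scratch file `contacts_demo.txt` are stand-ins for `page_two_text` and `contacts.txt`.

```python
import os
import tempfile

# Stand-in for page_two_text, abbreviated from the output shown earlier
sample_text = 'AUTHORS:\n \nAmy Baker, Finance Chair, x345, abaker@ourcompany.com\n'

# Remove only the first occurrence of the heading; unlike [8:], this does
# not depend on counting characters
cleaned = sample_text.replace('AUTHORS:', '', 1)

# Hypothetical scratch file, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), 'contacts_demo.txt')
with open(path, 'w') as f:
    f.write('First_Name Last_Name, Title, Extension, Email')

with open(path, 'a+') as c:
    c.write(cleaned)
    c.seek(0)
    result = c.read()

print(result)
```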

Regular Expressions

6. Using the `page_two_text` variable created above, extract any email addresses that were contained in the file `Business_Proposal.pdf`.



In [8]:

    
import re

# Enter your regex pattern here. This may take several tries!
pattern = r'\w+@\w+\.\w{3}'  # \. matches a literal dot; a bare . would match any character

re.findall(pattern, page_two_text)

Out[8]:

['abaker@ourcompany.com',
 'cdonaldson@ourcompany.com',
 'efreeman@ourcompany.com']
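
The sketch below illustrates why the dot in an email pattern is worth escaping. The names `loose`, `strict`, `sample_text`, and `not_an_email` are illustrative, and the sample lines are copied from the page 2 output rather than read from the PDF.

```python
import re

# Stand-in for page_two_text, copied from the printed output
sample_text = (
    'Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n'
    'Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n'
)

loose = r'\w+@\w+.\w{3}'    # unescaped dot matches any character
strict = r'\w+@\w+\.\w{3}'  # escaped dot matches only a literal '.'

emails = re.findall(strict, sample_text)
print(emails)

# A string with no dot at all still satisfies the loose pattern
not_an_email = 'user@hostXcom'
loose_hits = re.findall(loose, not_an_email)
strict_hits = re.findall(strict, not_an_email)
print(loose_hits, strict_hits)
```

Both patterns find the same addresses in this sample; they differ only on input like `not_an_email`, where no literal dot is present.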

Great job!
