Solutions to exercises

Building blocks

Function arguments



In [1]:

    
def my_power_func(base, pwr=2):
    """Return base raised to the power pwr (squares base by default)."""
    return base ** pwr
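
As a quick check of how the default argument behaves, the sketch below calls the function positionally, with an explicit exponent, and with keyword arguments. The function is repeated so the cell runs on its own; the variable names are only illustrative.

```python
def my_power_func(base, pwr=2):
    """Return base raised to the power pwr (squares base by default)."""
    return base ** pwr


squared = my_power_func(3)                   # pwr falls back to its default, 2
cubed = my_power_func(3, 3)                  # pwr given positionally
by_keyword = my_power_func(pwr=10, base=2)   # keyword arguments can come in any order

print(squared, cubed, by_keyword)
```

Because `pwr` has a default value, it must come after `base` in the signature; parameters without defaults cannot follow parameters with defaults.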

File I/O

Fixing encoding in CSV file

The file nobel-prize-winners.csv contains some odd-looking character sequences in the name column, such as '&egrave;'. These are HTML character references for characters outside the limited ASCII set. Python handles Unicode well, so let's convert them to something more pleasant to the eye.



In [12]:

    
import html  # part of the Python 3 standard library
with open('nobel-prize-winners.csv', 'rt') as fp:
    orig = fp.read()  # read the entire file as a single hunk of text

orig[727:780]  # show some characters, note the '\n'









    Out[12]:





'peace,Fr&eacute;d&eacute;ric Passy,\n1901,physics,Wilh'



In [13]:

    
print(orig[727:780])  # see how the '\n' gets converted to a newline









    



peace,Fr&eacute;d&eacute;ric Passy,
1901,physics,Wilh

With some Googling, we find this candidate function to fix the characters:



In [14]:

    
html.unescape?









    





Signature: html.unescape(s)
Docstring:
Convert all named and numeric character references (e.g. &gt;, &#62;,
&x3e;) in the string s to the corresponding unicode characters.
This function uses the rules defined by the HTML 5 standard
for both valid and invalid character references, and the list of
HTML 5 named character references defined in html.entities.html5.
File:      ~/miniconda_envs/ddhs/lib/python3.6/html/__init__.py
Type:      function
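
Before running the function on the whole file, it may help to try it on a few short strings. The sketch below uses made-up sample strings (not read from the CSV file) to show that named, decimal and hexadecimal references are all converted, and that ordinary text is left alone.

```python
import html

# Named reference, as found in the name column of the CSV file
name = html.unescape('Fr&eacute;d&eacute;ric Passy')

# Named, decimal and hexadecimal references for the same character
three_ways = html.unescape('&gt; &#62; &#x3e;')

# Text without any references comes back unchanged
plain = html.unescape('1901,physics')

print(name)
print(three_ways)
print(plain)
```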



In [15]:

    
fixed = html.unescape(orig)  # one line, less than a second...



In [17]:

    
print(fixed[727:780])  # much better









    



peace,Frédéric Passy,
1901,physics,Wilhelm Conrad Rön
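
Note that the same slice, [727:780], now shows more of the file than it did before. A likely explanation is that each character reference collapses to a single character, so the text before and inside the slice gets shorter. The sketch below illustrates this with a made-up sample string rather than the file contents.

```python
import html

reference = '&eacute;'
character = html.unescape(reference)
print(len(reference), len(character))  # 8 characters shrink to 1

escaped = 'Fr&eacute;d&eacute;ric Passy'
unescaped = html.unescape(escaped)
saved = len(escaped) - len(unescaped)  # 7 characters saved per reference
print(saved)
```

For this reason, character offsets in the original text and in the fixed text do not line up once a reference has been replaced.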



In [ ]:

    
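# The fixed text contains non-ASCII characters, so state the encoding explicitly
# instead of relying on the platform default.
with open('nobel-prize-winners-fixed.csv', 'wt', encoding='utf-8') as fp:
    fp.write(fixed)  # write back to disk, and we're done!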
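
To check that the accented characters survive the trip to disk, one option is to read the file back and compare it with the text that was written. The sketch below does this with a made-up sample row and a temporary directory, so it does not touch the real files; the file name sample-fixed.csv is hypothetical, and the UTF-8 encoding is an assumption that should match whatever encoding was used for writing.

```python
import html
import os
import tempfile

sample = '1901,peace,Fr&eacute;d&eacute;ric Passy\n'  # made-up sample row
fixed_sample = html.unescape(sample)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample-fixed.csv')  # hypothetical file name

    # Write and read with the same explicit encoding
    with open(path, 'wt', encoding='utf-8') as fp:
        fp.write(fixed_sample)
    with open(path, 'rt', encoding='utf-8') as fp:
        round_trip = fp.read()

print(round_trip == fixed_sample)
```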