Solutions to exercises

Building blocks

Function arguments


In [1]:
def my_power_func(base, pwr=2):
    return base ** pwr  # pwr falls back to 2 when the caller omits it
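Because `pwr` has a default value, callers can omit it, pass it positionally, or pass it by keyword. The calls below use illustrative values to show each case:

```python
def my_power_func(base, pwr=2):
    # pwr defaults to 2, so calling with only base squares it
    return base ** pwr

print(my_power_func(3))                 # default pwr=2, prints 9
print(my_power_func(3, 3))              # positional pwr, prints 27
print(my_power_func(2, pwr=10))         # keyword pwr, prints 1024
print(my_power_func(pwr=0.5, base=16))  # keywords in any order, prints 4.0
```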

File I/O

Fixing encoding in CSV file

The file nobel-prize-winners.csv contains some odd-looking character sequences in the name column, such as '&egrave;'. These are HTML character references (entities) that stand in for characters outside the limited ASCII set. Python 3 handles Unicode (and UTF-8) well, so let's convert them into the characters they represent, which are much easier on the eye.
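Before changing anything, it can help to see which references actually occur. Below is a minimal sketch that runs a regular expression over a short sample string. The sample row, the `REF_PATTERN` regex and the `count_references` helper are illustrative assumptions. In the notebook you would pass the file contents instead, and the pattern is a simplification of the full HTML 5 rules.

```python
import re
from collections import Counter

# Matches named (&eacute;), decimal (&#233;) and hexadecimal (&#xE9;) references.
# Simplified: it requires the trailing ';', which HTML 5 does not always demand.
REF_PATTERN = re.compile(r'&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')

def count_references(text):
    """Return a Counter mapping each character reference to how often it occurs."""
    return Counter(REF_PATTERN.findall(text))

# Hypothetical sample in the style of the Nobel file
sample = 'peace,Fr&eacute;d&eacute;ric Passy,\n1901,physics,Wilhelm Conrad R&ouml;ntgen,\n'
print(count_references(sample))
```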


In [12]:
import html  # part of the Python 3 standard library
with open('nobel-prize-winners.csv', 'rt', encoding='utf-8') as fp:
    orig = fp.read()  # read the entire file into a single string

orig[727:780]  # show some characters, note the '\n'


Out[12]:
'peace,Fr&eacute;d&eacute;ric Passy,\n1901,physics,Wilh'

In [13]:
print(orig[727:780])  # see how the '\n' gets converted to a newline


peace,Fr&eacute;d&eacute;ric Passy,
1901,physics,Wilh
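The two cells differ because the notebook echoes the repr() of a cell's last expression, while print() writes the string itself. The short snippet below is an illustrative example, not taken from the file, and shows that '\n' is a single character:

```python
snippet = 'Passy,\n1901'

print(repr(snippet))  # quoted, with the newline shown as the escape sequence \n
print(snippet)        # 'Passy,' and '1901' on two separate lines
print(len(snippet))   # 11, because '\n' counts as one character
```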

With some Googling, we find a candidate function for fixing these characters:


In [14]:
html.unescape?


Signature: html.unescape(s)
Docstring:
Convert all named and numeric character references (e.g. &gt;, &#62;,
&x3e;) in the string s to the corresponding unicode characters.
This function uses the rules defined by the HTML 5 standard
for both valid and invalid character references, and the list of
HTML 5 named character references defined in html.entities.html5.
File:      ~/miniconda_envs/ddhs/lib/python3.6/html/__init__.py
Type:      function
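A quick check that `html.unescape` treats named, decimal and hexadecimal references alike. Note that the hexadecimal form is written `&#x3e;`. Without the `#`, `&x3e;` is not a valid reference and is left unchanged.

```python
import html

# Named, decimal and hexadecimal references for '>' all decode to the same character
for ref in ['&gt;', '&#62;', '&#x3e;']:
    print(ref, '->', html.unescape(ref))

# Named references like the ones in the Nobel file decode to accented letters
print(html.unescape('Fr&eacute;d&eacute;ric Passy'))
```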

In [15]:
fixed = html.unescape(orig)  # one line, less than a second...

In [17]:
print(fixed[727:780])  # much better


peace,Frédéric Passy,
1901,physics,Wilhelm Conrad Rön

In [ ]:
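# Be explicit about UTF-8: the platform's default encoding may not cover every character
with open('nobel-prize-winners-fixed.csv', 'wt', encoding='utf-8') as fp:
    fp.write(fixed)  # write back to disk, and we're done!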
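Unescaping the whole file in one go works here because the decoded characters are letters. If a field could decode to a comma or a quote (for example `&#44;` or `&quot;`), the CSV column layout could break. The sketch below is a more defensive variant. It parses the CSV first, unescapes each field, and lets the csv module re-quote fields on output. The function name `unescape_csv`, the header and the sample rows are hypothetical. File objects are passed in so the function is easy to test with in-memory strings.

```python
import csv
import html
import io

def unescape_csv(src, dst):
    """Copy CSV rows from src to dst, unescaping HTML references in every field.

    src and dst are text file objects. csv.writer quotes any field that now
    contains a comma, quote or newline, so the column layout is preserved.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator='\n')
    for row in reader:
        writer.writerow([html.unescape(field) for field in row])

# Usage with in-memory files. With real files, open them with newline=''
# (as the csv module documentation recommends) and encoding='utf-8'.
src = io.StringIO('year,name\n1901,Fr&eacute;d&eacute;ric Passy\n1999,A&#44; B\n')
dst = io.StringIO()
unescape_csv(src, dst)
print(dst.getvalue())
```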