In [1]:
def my_power_func(base, pwr=2):
    return base**pwr
The file nobel-prize-winners.csv contains some odd-looking character sequences in the name column, such as the HTML code for 'è'. These HTML codes stand in for characters outside the limited ASCII set. Python handles Unicode/UTF-8 well, so let's convert them to something more pleasant to the eye.
In [12]:
import html # part of the Python 3 standard library
with open('nobel-prize-winners.csv', 'rt') as fp:
    orig = fp.read() # read the entire file as a single hunk of text
orig[727:780] # show some characters, note the '\n'
Out[12]:
In [13]:
print(orig[727:780]) # see how the '\n' gets converted to a newline
With some Googling, we find this candidate function to fix the characters:
In [14]:
html.unescape?
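To get a feel for what html.unescape does before running it on the whole file, here is a minimal sketch on a made-up string. The sample text is illustrative only and is not read from nobel-prize-winners.csv; it just shows that named and numeric HTML codes are both converted.

```python
import html

# Hypothetical sample text (an assumption for illustration, not a row from the CSV)
sample = "Fr&eacute;d&eacute;ric Joliot, Ir&egrave;ne Joliot-Curie &amp; co."

# Named character references are replaced by the characters they stand for
decoded = html.unescape(sample)
print(decoded)

# Numeric references (decimal and hexadecimal) are handled as well
numeric = html.unescape("&#232; &#xe8;")
print(numeric)
```

Text without any HTML codes passes through unchanged, so running the function over the entire file should be safe for the ordinary ASCII parts.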
In [15]:
fixed = html.unescape(orig) # one line, less than a second...
In [17]:
print(fixed[727:780]) # much better
In [ ]:
with open('nobel-prize-winners-fixed.csv', 'wt') as fp:
    fp.write(fixed) # write back to disk, and we're done!
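One caveat: the fixed text now contains non-ASCII characters, and open() uses a platform-dependent default encoding when none is given. The sketch below is a hedged variant of the same read, fix, write steps with encoding='utf-8' spelled out. It uses a temporary directory and a made-up two-line sample (both assumptions for illustration) so that it runs without the real CSV.

```python
import html
import os
import tempfile

# Hypothetical stand-in for the CSV contents (not the real file)
raw = "name\nIr&egrave;ne Joliot-Curie\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.csv")        # hypothetical file names
    dst = os.path.join(tmp, "sample-fixed.csv")

    # Create the sample input file
    with open(src, "wt", encoding="utf-8") as fp:
        fp.write(raw)

    # Read, unescape, and write back with an explicit encoding
    with open(src, "rt", encoding="utf-8") as fp:
        text = fp.read()
    with open(dst, "wt", encoding="utf-8") as fp:
        fp.write(html.unescape(text))

    # Read the result again to confirm the characters survived the round trip
    with open(dst, "rt", encoding="utf-8") as fp:
        round_trip = fp.read()

print(round_trip)
```

Whether the explicit encoding matters depends on the platform default, but stating it makes the output file the same everywhere.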