Our goal: Scrape a table of U.S. nuclear reactors into a CSV.
In [5]:
# define the url
# get the page
# specify the encoding
# turn it into soup
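A minimal sketch of the four steps above, assuming the `requests` and `beautifulsoup4` packages are installed. The URL is a placeholder rather than the address the notebook used, the `get_soup` helper name is made up for illustration, and UTF-8 is an assumed encoding.

```python
import requests
from bs4 import BeautifulSoup

# define the url
# (placeholder only -- swap in the real address of the reactor table)
URL = 'https://example.com/reactors.html'


def get_soup(url):
    """Fetch a page and return it as a BeautifulSoup object."""
    # get the page
    r = requests.get(url, timeout=30)
    r.raise_for_status()

    # specify the encoding (assumed to be UTF-8 here)
    r.encoding = 'utf-8'

    # turn it into soup, using the parser that ships with Python
    return BeautifulSoup(r.text, 'html.parser')
```

Calling `soup = get_soup(URL)` with the real address would hand back the parsed page; nothing is downloaded until the function is called.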
In [27]:
# each <tr> has some <td> cells inside it; we'll move these into variables,
# do some string manipulations and write to the CSV
# reactor name, detail page link and docket number are all part of the first cell
# the .contents attribute (not a method call) holds a list of a tag's children -->
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children
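Here is a rough sketch of pulling the three values out of the first cell. The sample row is made up, and its layout (a link, then a `<br/>`, then the docket number) is an assumption about the page's markup, so check it against the real HTML before relying on the indexes.

```python
from bs4 import BeautifulSoup

# made-up sample row; the real page's markup may differ
SAMPLE = '''<table><tr>
<td><a href="/reactors/sample1.html">Sample Unit 1</a><br/>05000000</td>
<td>XX-00</td>
</tr></table>'''

soup = BeautifulSoup(SAMPLE, 'html.parser')
row = soup.find('tr')
cells = row.find_all('td')

first = cells[0]

# .contents is a plain list of the tag's direct children:
# here an <a> tag, a <br/> tag and a trailing string
children = first.contents

# reactor name and detail page link both come from the <a> tag
link_tag = first.find('a')
name = link_tag.get_text(strip=True)
link = link_tag['href']

# docket number is the last child, a bare string after the <br/>
docket = children[-1].strip()

print(name, link, docket)
```

Grabbing the link with `find('a')` instead of `children[0]` keeps the lookup working even if stray whitespace shows up as an extra child at the start of the cell.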
# license number is in the second cell
# reactor type is in the third cell
# location is in the fourth cell
# some of the locations have multiple internal spaces -- here's a trick for dealing with that
# https://stackoverflow.com/a/1546251
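The trick in question is presumably the split-and-join idiom: `str.split()` with no arguments splits on any run of whitespace, and `' '.join()` glues the pieces back together with single spaces. A small sketch with a made-up location string and a hypothetical helper name:

```python
def squeeze_spaces(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return ' '.join(text.split())


# made-up example value, not taken from the real table
location = squeeze_spaces('  Sampletown,   ZZ  (10 miles   N of Example City)  ')
print(location)
```

This also flattens tabs and newlines, which is handy when a cell's text wraps across lines in the HTML source.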
# owner is in the fifth cell
# region is in the sixth cell
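Putting the cell-by-cell notes together, a row can be turned into one CSV line roughly like this. The sample HTML, the column names and the `row_to_record` helper are all illustrative assumptions, and an in-memory buffer stands in for the output file so the sketch runs without touching disk.

```python
import csv
import io

from bs4 import BeautifulSoup

# made-up sample row; the real page's markup may differ
SAMPLE = '''<table><tr>
<td><a href="/reactors/sample1.html">Sample Unit 1</a><br/>05000000</td>
<td>XX-00</td>
<td>PWR</td>
<td>Sampletown,   ZZ</td>
<td>Example Power Co.</td>
<td>4</td>
</tr></table>'''

# assumed column names for the CSV header
HEADERS = ['name', 'link', 'docket', 'license',
           'type', 'location', 'owner', 'region']


def row_to_record(row):
    """Turn one <tr> into a list of values in HEADERS order."""
    cells = row.find_all('td')
    link_tag = cells[0].find('a')
    return [
        link_tag.get_text(strip=True),           # reactor name
        link_tag['href'],                        # detail page link
        cells[0].contents[-1].strip(),           # docket number
        cells[1].get_text(strip=True),           # license number
        cells[2].get_text(strip=True),           # reactor type
        ' '.join(cells[3].get_text().split()),   # location, spaces collapsed
        cells[4].get_text(strip=True),           # owner
        cells[5].get_text(strip=True),           # region
    ]


soup = BeautifulSoup(SAMPLE, 'html.parser')

# stand-in for open('reactors.csv', 'w', newline='')
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADERS)

for row in soup.find_all('tr'):
    # skip header rows, which have <th> cells but no <td> cells
    if row.find('td'):
        writer.writerow(row_to_record(row))

print(buffer.getvalue())
```

Because the location contains a comma, `csv.writer` wraps that field in quotes automatically, which is one reason to use the `csv` module rather than joining strings by hand.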