Let's scrape some nuclear reactors

Our goal: Scrape a table of U.S. nuclear reactors into a CSV.

Import the libraries


In [1]:
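Here's a minimal version of this cell, assuming we're using requests to fetch the page, BeautifulSoup to parse it and Python's built-in csv module to write the results:

import csv

import requests
from bs4 import BeautifulSoup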

Fetch and parse the HTML


In [5]:
# define the url


# get the page


# specify the encoding


# turn it into soup
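One way to fill in this cell -- the URL is an assumption (the NRC's list of U.S. power reactor units, which matches the columns described below), and the encoding is a guess you'd check against the page's headers:

# define the url (assumption: the NRC's list of power reactor units page)
url = 'https://www.nrc.gov/reactors/operating/list-power-reactor-units.html'

# get the page
r = requests.get(url)

# specify the encoding
r.encoding = 'utf-8'

# turn it into soup
soup = BeautifulSoup(r.text, 'html.parser')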

Find the table


In [8]:
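A sketch of this step, assuming the reactor listing is the first table on the page -- if it isn't, pass a class or id to find() to target the right one. We also skip the header row while we're at it:

# grab the table and its rows, skipping the header row
table = soup.find('table')
rows = table.find_all('tr')[1:]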

Loop over the rows and write to CSV


In [27]:
        # each <tr> has some <td> cells inside it; we'll move these into variables,
        # do some string manipulations and write to the CSV


        # reactor name, detail page link and docket number are all part of the first cell
        # the .contents attribute is a list of a tag's children -->
        # https://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children



        
        # license number is in the second cell

        
        # reactor type is in the third cell


        # location is in the fourth cell

        
        # some of the locations have multiple internal spaces -- here's a trick for dealing with that
        # https://stackoverflow.com/a/1546251

        
        # owner is in the fifth cell

        
        # region is in the sixth cell
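Putting the comments above together, here's one hedged version of the full cell. It assumes the soup and rows variables from the earlier cells, the output filename and column names are just one reasonable choice, and the indexing into the first cell's .contents is a guess you'd verify against the actual markup:

# open a CSV file to write into
with open('reactors.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)

    # write a header row
    writer.writerow(['name', 'link', 'docket', 'license', 'type',
                     'location', 'owner', 'region'])

    # each <tr> has some <td> cells inside it
    for row in rows:
        cells = row.find_all('td')

        # reactor name, detail page link and docket number are all in the first cell --
        # here we assume the <a> tag is the first child and the docket number is the
        # last child (a text node); check that against the real HTML
        first_cell = cells[0]
        link_tag = first_cell.contents[0]
        name = link_tag.text.strip()
        link = link_tag.get('href')
        docket = first_cell.contents[-1].strip()

        # license number is in the second cell
        license_number = cells[1].text.strip()

        # reactor type is in the third cell
        reactor_type = cells[2].text.strip()

        # location is in the fourth cell -- collapse any runs of internal whitespace
        # with the split/join trick (https://stackoverflow.com/a/1546251)
        location = ' '.join(cells[3].text.split())

        # owner is in the fifth cell
        owner = cells[4].text.strip()

        # region is in the sixth cell
        region = cells[5].text.strip()

        # write the row to the CSV
        writer.writerow([name, link, docket, license_number, reactor_type,
                         location, owner, region])

If the first cell's markup doesn't match this assumption, swapping .contents[0] for first_cell.find('a') is a more forgiving way to grab the link tag.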