A full-fledged scraper

Import the modules and packages we will need to scrape a website: requests, bs4, and csv


In [ ]:
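One possible sketch of the imports for this notebook (requests and bs4 are third-party packages; install them with pip if they are missing):

```python
import csv  # standard library, for writing the output file

import requests  # third-party, for fetching the page
from bs4 import BeautifulSoup  # third-party, for parsing the HTML
```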

Make a request to the URL of the webpage we are scraping. The URL is: https://s3-us-west-2.amazonaws.com/nicar-2015/Weekly+Rankings+-+Weekend+Box+Office+Results+++Rentrak.html


In [ ]:

Assign the HTML from that response to a variable


In [ ]:
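The two steps above (making the request and capturing the page's HTML) might look like the sketch below; the `timeout` and the empty-string fallback are defensive additions, not part of the original exercise, since the archived page may no longer be reachable:

```python
import requests

# the archived page from the tutorial
url = ('https://s3-us-west-2.amazonaws.com/nicar-2015/'
       'Weekly+Rankings+-+Weekend+Box+Office+Results+++Rentrak.html')

try:
    # make the request and assign the page's HTML to a variable
    r = requests.get(url, timeout=30)
    html = r.text
except requests.RequestException:
    # fall back to an empty string if the page is unreachable
    html = ''
```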

Alternatively, to read this from a local file in the html/ directory, uncomment the next lines

r = open('../project2/html/movies.html', 'r')
html = r.read()

Parse the HTML


In [ ]:

Isolate the table


In [ ]:

Find the rows; at the same time, we are going to use slicing to skip the first two header rows.


In [ ]:
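The parse, isolate, and slice steps above might be sketched as follows. Since the live page may be offline, this uses a small stand-in HTML snippet with the same shape (two header rows, then data rows); in the notebook you would parse the `html` variable from the earlier cell instead:

```python
from bs4 import BeautifulSoup

# stand-in HTML with the same structure as the box-office page:
# two header rows followed by data rows (contents are illustrative)
html = """
<table>
  <tr><th>Weekend Box Office</th></tr>
  <tr><th>Rank</th><th>Title</th><th>Gross</th></tr>
  <tr><td>1</td><td>Example Movie</td><td>$1,000,000</td></tr>
</table>
"""

# parse the HTML
soup = BeautifulSoup(html, 'html.parser')

# isolate the table
table = soup.find('table')

# find the rows, slicing off the first two header rows
rows = table.find_all('tr')[2:]
```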

We are going to use the csv module's DictWriter to write out our results. The DictWriter requires two things when we create it: the file and the fieldnames. First, open our output file:


In [ ]:

Next, specify the fieldnames.


In [ ]:

Point our csv.DictWriter at the output file and specify the fieldnames along with other necessary parameters.


In [ ]:
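Putting the last three steps together, one hedged sketch looks like this. The output filename and the fieldnames are placeholders, not from the source; your fieldnames should match the real columns of the box-office table:

```python
import csv

# open the output file ('movies.csv' is a placeholder name)
csvfile = open('movies.csv', 'w', newline='')

# illustrative fieldnames; replace with the table's actual columns
fieldnames = ['rank', 'title', 'weekend_gross']

# point the DictWriter at the output file and give it the fieldnames
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

# write the header row once, before the data rows
writer.writeheader()
```

Passing `newline=''` is the csv module's recommended way to open output files in Python 3, so the writer controls line endings itself.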


In [1]:
#loop through the rows
for row in rows:
    #grab the table cells from each row
    cells = row.find_all('td')
    #skip the blank rows
    if not cells:
        continue
    #create a dictionary and assign the cell values to keys in our dictionary
    #(the keys must match the fieldnames given to the DictWriter)
    record = dict(zip(fieldnames, [cell.text.strip() for cell in cells]))
    #write the variables out to a csv file
    writer.writerow(record)

In [ ]:
#close the csv file

In [ ]:
#win