A full-fledged scraper

Import the modules and packages we will need to scrape a website: requests, bs4, and csv


In [ ]:
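One possible sketch of the imports for this notebook (requests and bs4 are third-party packages; install them with pip if they are missing):

```python
import csv  # standard library, for writing the output file

import requests  # third-party, for fetching the page
from bs4 import BeautifulSoup  # third-party, for parsing the HTML
```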

Make a request to the URL of the webpage we are scraping. The URL is: https://s3-us-west-2.amazonaws.com/nicar-2015/Weekly+Rankings+-+Weekend+Box+Office+Results+++Rentrak.html


In [ ]:

Assign the HTML from that response to a variable


In [ ]:
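The two steps above (making the request and capturing the page's HTML) might look like the sketch below; the `timeout` and the empty-string fallback are defensive additions, not part of the original exercise, since the archived page may no longer be reachable:

```python
import requests

# the archived page from the tutorial
url = ('https://s3-us-west-2.amazonaws.com/nicar-2015/'
       'Weekly+Rankings+-+Weekend+Box+Office+Results+++Rentrak.html')

try:
    # make the request and assign the page's HTML to a variable
    r = requests.get(url, timeout=30)
    html = r.text
except requests.RequestException:
    # fall back to an empty string if the page is unreachable
    html = ''
```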

Alternatively, to read this from a local file in the html/ directory, uncomment the next lines

r = open('../project2/html/movies.html', 'r')
html = r.read()

Parse the HTML


In [ ]:

Isolate the table


In [ ]:

Find the rows; at the same time, we are going to use slicing to skip the first two header rows.


In [ ]:
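The parse, isolate, and slice steps above might be sketched as follows. Since the live page may be offline, this uses a small stand-in HTML snippet with the same shape (two header rows, then data rows); in the notebook you would parse the `html` variable from the earlier cell instead:

```python
from bs4 import BeautifulSoup

# stand-in HTML with the same structure as the box-office page:
# two header rows followed by data rows (contents are illustrative)
html = """
<table>
  <tr><th>Weekend Box Office</th></tr>
  <tr><th>Rank</th><th>Title</th><th>Gross</th></tr>
  <tr><td>1</td><td>Example Movie</td><td>$1,000,000</td></tr>
</table>
"""

# parse the HTML
soup = BeautifulSoup(html, 'html.parser')

# isolate the table
table = soup.find('table')

# find the rows, slicing off the first two header rows
rows = table.find_all('tr')[2:]
```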

We are going to use the csv module's DictWriter to write out our results. The DictWriter requires two things when we create it: the file and the fieldnames. First, open our output file:


In [ ]:

Next, specify the fieldnames.


In [ ]:

Point our csv.DictWriter at the output file and specify the fieldnames along with other necessary parameters.


In [ ]:
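Putting the last three steps together, one hedged sketch looks like this. The output filename and the fieldnames are placeholders, not from the source; your fieldnames should match the real columns of the box-office table:

```python
import csv

# open the output file ('movies.csv' is a placeholder name)
csvfile = open('movies.csv', 'w', newline='')

# illustrative fieldnames; replace with the table's actual columns
fieldnames = ['rank', 'title', 'weekend_gross']

# point the DictWriter at the output file and give it the fieldnames
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

# write the header row once, before the data rows
writer.writeheader()
```

Passing `newline=''` is the csv module's recommended way to open output files in Python 3, so the writer controls line endings itself.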


In [1]:
#loop through the rows
for row in rows:
    #grab the table cells from each row
    cells = row.find_all('td')
    #skip the blank rows
    if not cells:
        continue
    #create a dictionary and assign the cell values to keys in our dictionary
    #(the keys must match the fieldnames given to the DictWriter)
    record = dict(zip(fieldnames, [cell.text.strip() for cell in cells]))
    #write the variables out to a csv file
    writer.writerow(record)

In [ ]:
#close the csv file

In [ ]:
#win