Import the modules and packages we will need to scrape the website: requests, bs4, and csv.
In [ ]:
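One way to fill in this cell:

```python
import requests                 # fetch the page over HTTP
from bs4 import BeautifulSoup   # parse the HTML
import csv                      # write the results out to a file
```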
Make a request to the webpage URL that we are scraping. The URL is: https://s3-us-west-2.amazonaws.com/nicar-2015/Weekly+Rankings+-+Weekend+Box+Office+Results+++Rentrak.html
In [ ]:
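A sketch of the request, wrapped in a try/except so the cell degrades gracefully when offline or if the file has since moved (the 10-second timeout is an arbitrary choice):

```python
import requests

url = ('https://s3-us-west-2.amazonaws.com/nicar-2015/'
       'Weekly+Rankings+-+Weekend+Box+Office+Results+++Rentrak.html')

try:
    r = requests.get(url, timeout=10)   # fetch the page
except requests.RequestException:
    r = None                            # no network: fall through gracefully
```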
Assign the html code from that site to a variable
In [ ]:
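The Response object's `.text` attribute holds the page source as one string. Repeating the request here (with the same offline guard) so this cell runs on its own:

```python
import requests

url = ('https://s3-us-west-2.amazonaws.com/nicar-2015/'
       'Weekly+Rankings+-+Weekend+Box+Office+Results+++Rentrak.html')

try:
    html = requests.get(url, timeout=10).text   # the page's HTML as a string
except requests.RequestException:
    html = ''                                   # offline fallback
```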
Alternatively, to read the page from a local file in the html/ directory, uncomment the next lines:

# with open('../project2/html/movies.html') as r:
#     html = r.read()
Parse the HTML.
In [ ]:
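A minimal parsing sketch using Python's built-in html.parser. The inline snippet below is an invented stand-in for the real page (two header rows, then data rows) so the cell runs on its own:

```python
from bs4 import BeautifulSoup

# Made-up sample HTML; the real page's layout may differ.
html = """<table>
<tr><th>Weekend Box Office</th></tr>
<tr><th>Rank</th><th>Title</th><th>Gross</th></tr>
<tr><td>1</td><td>Movie A</td><td>$1,000,000</td></tr>
<tr><td>2</td><td>Movie B</td><td>$500,000</td></tr>
</table>"""

soup = BeautifulSoup(html, 'html.parser')   # parsed document tree
```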
Isolate the table
In [ ]:
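`soup.find('table')` returns the first table in the document; if the real page has several tables you would need a more specific selector, so treat this as a sketch:

```python
from bs4 import BeautifulSoup

# Tiny sample document so this cell stands alone.
soup = BeautifulSoup('<table><tr><td>1</td></tr></table>', 'html.parser')

table = soup.find('table')   # the first (and here only) <table>
```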
Find the rows; at the same time, use slicing to skip the first two header rows.
In [ ]:
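`find_all('tr')` collects every row, and the `[2:]` slice drops the first two (the headers). Again using an invented sample table so the cell is self-contained:

```python
from bs4 import BeautifulSoup

html = """<table>
<tr><th>Weekend Box Office</th></tr>
<tr><th>Rank</th><th>Title</th><th>Gross</th></tr>
<tr><td>1</td><td>Movie A</td><td>$1,000,000</td></tr>
</table>"""
table = BeautifulSoup(html, 'html.parser').find('table')

rows = table.find_all('tr')[2:]   # [2:] skips the two header rows
```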
We are going to use the csv module's DictWriter to write out our results. The DictWriter requires two things when we create it: the file and the fieldnames. First, open our output file:
In [ ]:
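The filename `movies.csv` is an arbitrary choice. `newline=''` is the csv module's recommended setting, so the writer controls line endings itself:

```python
# 'w' truncates any existing file of the same name.
f = open('movies.csv', 'w', newline='')
```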
Next, specify the fieldnames.
In [ ]:
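The column names below are hypothetical — check them against the table's actual header row before using them:

```python
# Hypothetical names; adjust to match the table you scraped.
fieldnames = ['rank', 'title', 'gross']
```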
Point our csv.DictWriter at the output file and specify the fieldnames along with other necessary parameters.
In [ ]:
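A sketch of creating the DictWriter, repeating the file-open and (hypothetical) fieldnames so the cell runs on its own. Writing the header row right away is optional but usual:

```python
import csv

fieldnames = ['rank', 'title', 'gross']            # hypothetical column names
f = open('movies.csv', 'w', newline='')
output = csv.DictWriter(f, fieldnames=fieldnames)
output.writeheader()                               # header row comes first
```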
In [1]:
#loop through the rows
#grab the table cells from each row
#skip the blank rows
#create a dictionary and assign the cell values to keys in our dictionary
#write the variables out to a csv file
In [ ]:
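The five comments above map onto a single loop. A self-contained sketch — the inline HTML is invented (two header rows, one blank row, two data rows) and the fieldnames are hypothetical:

```python
import csv
from bs4 import BeautifulSoup

# Made-up stand-in for the scraped page.
html = """<table>
<tr><th>Weekend Box Office</th></tr>
<tr><th>Rank</th><th>Title</th><th>Gross</th></tr>
<tr><td>1</td><td>Movie A</td><td>$1,000,000</td></tr>
<tr></tr>
<tr><td>2</td><td>Movie B</td><td>$500,000</td></tr>
</table>"""
rows = BeautifulSoup(html, 'html.parser').find_all('tr')[2:]

fieldnames = ['rank', 'title', 'gross']
f = open('movies.csv', 'w', newline='')
output = csv.DictWriter(f, fieldnames=fieldnames)
output.writeheader()

# Loop through the rows.
for row in rows:
    cells = row.find_all('td')        # grab the table cells from each row
    if not cells:                     # skip the blank rows
        continue
    record = {                        # assign cell values to dictionary keys
        'rank': cells[0].get_text(strip=True),
        'title': cells[1].get_text(strip=True),
        'gross': cells[2].get_text(strip=True),
    }
    output.writerow(record)           # write the dictionary out as a CSV row
```

The file stays open here on purpose; closing it is the next step.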
#close the csv file
In [ ]:
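Closing the file flushes any buffered rows to disk. Opening a throwaway handle here so the cell runs alone; in the notebook you would simply call `f.close()` on the file from the earlier cell:

```python
f = open('movies.csv', 'a', newline='')   # stand-in for the file opened earlier
f.close()                                 # flush buffered rows; the CSV is complete
```

In a standalone script you would more often use `with open(...) as f:` so the file closes automatically.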
#win