Let's scrape the IRE homepage

Our goal: Print out the headlines from the IRE home page.

requests is a handy third-party library for making HTTP requests. It does the same thing your browser does when you type in a URL and hit enter (it sends a message to a server and asks for a copy of the page), but it lets us do this programmatically instead of pointing and clicking. For our purposes today, we're interested in the library's get() method. To pick the headlines out of the HTML that comes back, we'll use a second third-party library, BeautifulSoup.

Import the libraries



In [ ]:
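
One way this cell might look, as a minimal sketch. It assumes both packages are already installed (for example with `pip install requests beautifulsoup4`); note that BeautifulSoup is imported from a package named `bs4`.

```python
# requests fetches pages over HTTP
import requests

# BeautifulSoup parses the HTML that requests brings back
from bs4 import BeautifulSoup
```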

Fetch and parse the HTML



In [ ]:

    
# use the `get()` method to fetch a copy of the IRE home page


# feed the text of the web page to a BeautifulSoup object
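
The two steps in the comments above could be sketched roughly like this. The address in `IRE_URL`, the helper names `fetch_html` and `make_soup`, and the 10-second timeout are all assumptions made for illustration; nothing here is the notebook's official answer.

```python
import requests
from bs4 import BeautifulSoup

# Assumed address of the IRE home page; adjust if it differs.
IRE_URL = "https://www.ire.org/"


def fetch_html(url, timeout=10):
    """Hypothetical helper: fetch a page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    # Raise an error on a 4xx/5xx response instead of parsing an error page.
    response.raise_for_status()
    return response.text


def make_soup(html):
    """Hypothetical helper: feed HTML text to a BeautifulSoup object."""
    # "html.parser" is the parser that ships with Python.
    return BeautifulSoup(html, "html.parser")
```

With those in place, `soup = make_soup(fetch_html(IRE_URL))` would fetch and parse the page in one line. Wrapping the steps in functions is only a convenience; calling `requests.get()` and `BeautifulSoup()` directly in the cell works just as well.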

Target the headlines

View source on the IRE homepage and find the headlines. What's the pattern?



In [ ]:

    
# get a list of headlines we're interested in
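
Here is a sketch of the targeting step on a small stand-in page. The markup in `SAMPLE_HTML` (an `h3` tag with a `headline` class) is invented for illustration and is almost certainly not what the real IRE home page uses, so swap in whatever pattern you spot when you view source.

```python
from bs4 import BeautifulSoup

# Invented stand-in for a news page; the real markup will differ.
SAMPLE_HTML = """
<div class="news">
  <h3 class="headline"><a href="/news/first-story/">First story</a></h3>
  <h3 class="headline"><a href="/news/second-story/">Second story</a></h3>
  <h3 class="sidebar"><a href="/about/">About us</a></h3>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns every tag matching the name and class.
# The keyword is class_ (with an underscore) because class is
# a reserved word in Python.
headlines = soup.find_all("h3", class_="headline")

print(len(headlines))
```

The `sidebar` heading is left out because its class doesn't match, which is the point of finding a pattern that is specific to the headlines.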

Loop over the heds, printing out the text

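You can drill down into a nested tag using a period: if a headline tag contains a link, for example, adding .a to the headline gives you the first a tag inside it.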



In [ ]:
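
To see the period trick in action, here is a minimal sketch run against invented sample markup (the `h3` tags and `headline` class are assumptions, not the IRE page's real structure). Writing `hed.a` reaches into each headline and returns the first `a` tag nested inside it.

```python
from bs4 import BeautifulSoup

# Invented stand-in markup; the real page will differ.
SAMPLE_HTML = """
<h3 class="headline"><a href="/news/first-story/">First story</a></h3>
<h3 class="headline"><a href="/news/second-story/">Second story</a></h3>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
headlines = soup.find_all("h3", class_="headline")

titles = []
for hed in headlines:
    # hed.a drills down to the first <a> tag inside the headline;
    # .text gives the text between its opening and closing tags.
    title = hed.a.text
    titles.append(title)
    print(title)
```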

Exercise: Print the links

Your mission: Loop over the headlines and print the link (the value of the href attribute) for each one. You can access a tag's attributes the same way you'd access values in a dictionary. (This might require some Googling.)
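
As a hint rather than a full answer, here is how dictionary-style attribute access looks on a single invented tag (the markup is made up for illustration). Applying the same idea inside your loop is left to you.

```python
from bs4 import BeautifulSoup

# One invented link, just to show attribute access.
soup = BeautifulSoup(
    '<a href="/news/first-story/" class="story-link">First story</a>',
    "html.parser",
)
link = soup.a

# Square brackets work like a dictionary lookup on the tag's attributes.
href = link["href"]
print(href)

# .get() also mirrors a dictionary: it returns None instead of raising
# a KeyError when the attribute is missing.
missing = link.get("title")
print(missing)
```

Square brackets are fine when you know the attribute is there; `.get()` is the safer choice when some tags might lack it.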



In [ ]: