Fetching Webpages with the Requests Library

Requests is good for times when you just need to fetch a webpage and do something with the raw HTML. It doesn't give you much more than that, but it does that one job incredibly well. The Requests homepage has lots of good examples and full documentation.


In [1]:
import requests
r = requests.get('http://www.imdb.com/name/nm0000125/')

You can make sure the request actually worked (i.e., returned HTTP status code 200).


In [2]:
r.status_code


Out[2]:
200
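Any code in the 2xx range means success, not just 200. A minimal helper that captures that range (my own sketch, not part of Requests):

```python
def is_success(status_code):
    """Return True for any 2xx status code (the HTTP success range)."""
    return 200 <= status_code < 300

print(is_success(200))  # True
print(is_success(404))  # False
```

In practice, Requests also provides `r.raise_for_status()`, which raises an `HTTPError` for 4xx/5xx responses so you don't have to check the code by hand.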

You can check what type of content the webpage returned (i.e., text, JSON, CSV, etc.).


In [3]:
r.headers['content-type']


Out[3]:
'text/html;charset=UTF-8'
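If you need to split that header into a MIME type and a charset yourself, the standard library can do the parsing. A sketch using `email.message.Message`, which understands MIME-style header parameters (the header string below is the one from the output above):

```python
from email.message import Message

def parse_content_type(header_value):
    """Split a Content-Type header into (mime_type, charset)."""
    msg = Message()
    msg['Content-Type'] = header_value
    return msg.get_content_type(), msg.get_param('charset')

mime, charset = parse_content_type('text/html;charset=UTF-8')
print(mime, charset)  # text/html UTF-8
```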

You can check the character set (hopefully it's UTF-8!).


In [18]:
r.encoding


Out[18]:
'UTF-8'
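Requests guesses the encoding from the response headers, and the guess can be wrong. If `r.text` comes back mangled, you can decode `r.content` (the raw bytes) yourself with a fallback; `decode_with_fallback` is a hypothetical helper, not part of Requests:

```python
def decode_with_fallback(raw_bytes, declared='utf-8'):
    """Try the declared encoding; fall back to latin-1, which never fails."""
    try:
        return raw_bytes.decode(declared)
    except (UnicodeDecodeError, LookupError):
        return raw_bytes.decode('latin-1')

print(decode_with_fallback(b'caf\xc3\xa9'))  # café
```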

Of course, you can get the actual HTML text too!


In [4]:
r.text[0:200]


Out[4]:
u'\n\n\n\n<!DOCTYPE html>\n<html\nxmlns:og="http://ogp.me/ns#"\nxmlns:fb="http://www.facebook.com/2008/fbml">\n    <head>\n        <meta charset="utf-8">\n        <meta http-equiv="X-UA-Compatible" content="IE=ed'
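From here you would typically hand `r.text` to an HTML parser rather than work with the raw string. A minimal stdlib sketch that pulls out the `<title>` element (the sample HTML below is hypothetical, not the IMDb page):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

p = TitleParser()
p.feed('<html><head><title>Example</title></head></html>')
print(p.title)  # Example
```

For real scraping work, a dedicated parser such as Beautiful Soup is usually a better fit than hand-rolled `HTMLParser` subclasses.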

But sometimes you just want the headers, for instance to resolve redirects without downloading the full webpage content.


In [5]:
r = requests.head("http://feeds.foxnews.com/~r/foxnews/national/~3/vZ_mHFtNHag/", allow_redirects=True)
r.url


Out[5]:
u'http://www.foxnews.com/us/2014/11/21/nypd-rookie-kills-unarmed-man-in-brooklyn.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%253A+foxnews%252Fnational+%2528Internal+-+US+Latest+-+Text%2529'
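Notice the FeedBurner tracking parameters tacked onto the resolved URL. If you want a canonical URL, you can strip them with `urllib.parse`; `strip_utm` is a hypothetical helper, not something Requests provides:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def strip_utm(url):
    """Drop utm_* tracking parameters from a URL's query string."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.startswith('utm_')]
    return urlunparse(parts._replace(query=urlencode(query)))

print(strip_utm('http://example.com/a?utm_source=x&id=1'))
# http://example.com/a?id=1
```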
