HTTP requests

This tutorial covers how to make requests over the HTTP protocol. For more information on related topics, see:

Keep in mind that this tutorial works only with static content. Obtaining dynamic web content is not covered here; if you need it, look into the Selenium Python bindings.

Get HTML page content

This section shows how to get an HTTP response with two different libraries:

  • urllib (part of the Python 3 standard library)
  • Requests (installable through pip)

The rest of this tutorial mainly uses the Requests library, as the preferred option.

urllib library

The following example gets the static content of a web page with urllib:


In [1]:
from urllib.request import urlopen

r = urlopen('http://www.python.org/')
data = r.read()

print("Status code:", r.getcode())


Status code: 200

The variable data contains the returned HTML code (the full page) as a bytes object; call data.decode() to turn it into a string. You can process it, save it, or do anything else you need.
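
Because read() returns bytes, a decoding step is usually needed before the page can be handled as text. The following is a minimal sketch of that step. To keep it runnable without network access, it opens a data: URL (an assumption made purely for illustration) instead of a real site, and it falls back to UTF-8 when the response declares no charset.

```python
from urllib.request import urlopen

# A data: URL stands in for a real page so the sketch runs offline.
# With a real site you would pass e.g. 'http://www.python.org/' instead.
url = "data:text/html;charset=utf-8,%3Ch1%3EHello%3C%2Fh1%3E"

with urlopen(url) as r:
    raw = r.read()  # bytes, not str
    # Charset declared in the Content-Type header, if any.
    charset = r.headers.get_content_charset() or "utf-8"

html = raw.decode(charset)
print(type(raw).__name__, "->", type(html).__name__)
print(html)
```

Reading the charset from the response headers is a design choice; hard-coding "utf-8" is simpler but may fail on pages served in another encoding.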

Requests

The following example gets the static content of a web page with Requests:


In [2]:
import requests

r = requests.get("http://www.python.org/")
data = r.text

print("Status code:", r.status_code)


Status code: 200
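
Printing the status code works for a quick check, but a script usually has to react when a request fails. The sketch below is one possible way to do that, not the only one: the helper name fetch_html and the 10 second timeout are assumptions chosen for illustration. It uses raise_for_status() to turn 4xx and 5xx responses into exceptions and catches the common base class RequestException.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails.

    fetch_html is a hypothetical helper, not part of Requests.
    """
    try:
        # Without a timeout, a request can wait indefinitely.
        r = requests.get(url, timeout=timeout)
        # Raise an HTTPError for 4xx and 5xx status codes.
        r.raise_for_status()
    except requests.exceptions.RequestException as exc:
        print("Request failed:", exc)
        return None
    return r.text


# A URL without a scheme is rejected before anything is sent.
print(fetch_html("not-a-url"))
```

With a reachable address, for example fetch_html("http://www.python.org/"), the function would return the HTML as a string.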

Get JSON data from an API

This task is demonstrated on Open Notify, an open source project that provides a simple programming interface for some of NASA’s awesome data.

The example below shows how to obtain the current position of the ISS. With the Requests library, JSON can be fetched from the API in the same way as HTML data.


In [3]:
import requests

r = requests.get("http://api.open-notify.org/iss-now.json")
obj = r.json()

print(obj)


{'message': 'success', 'iss_position': {'longitude': '-48.4604', 'latitude': '-33.8884'}, 'timestamp': 1490553838}

The Requests method json() converts the JSON response to a Python dictionary. Values can then be read from it by key, for example obj['iss_position']['latitude'].
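
Reading values from the resulting dictionary can be sketched as follows. To run without network access, the sketch parses the sample response shown above from a literal string with the standard json module; with a live response, r.json() would yield the same structure. The variable names are chosen for illustration.

```python
import json
from datetime import datetime, timezone

# Sample response copied from the output above; a live call would use r.json().
raw = (
    '{"message": "success", '
    '"iss_position": {"longitude": "-48.4604", "latitude": "-33.8884"}, '
    '"timestamp": 1490553838}'
)

obj = json.loads(raw)

# The API returns the coordinates as strings, so convert them to floats.
latitude = float(obj["iss_position"]["latitude"])
longitude = float(obj["iss_position"]["longitude"])

# The timestamp is Unix time (seconds since 1970-01-01 UTC).
when = datetime.fromtimestamp(obj["timestamp"], tz=timezone.utc)

print("Latitude:", latitude)
print("Longitude:", longitude)
print("Time (UTC):", when.isoformat())
```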

Persistent session with Requests

Sessions in Requests are handy when you need to reuse the same cookies (session cookies, for example) or authentication across multiple requests.


In [4]:
s = requests.Session()
print("No cookies on start: ")
print(dict(s.cookies))
r = s.get('http://google.cz/')
print("\nA cookie from google: ")
print(dict(s.cookies))
r = s.get('http://google.cz/?q=cat')
print("\nThe cookie is persistent:")
print(dict(s.cookies))


No cookies on start: 
{}

A cookie from google: 
{'NID': '99=F_pIEFBnvtQc1poKGMdpGVl8H7ekYVJIEkYZ__apuE5nibL1j9AbqfYzdlLgwJEQ2FGpB1dBGVvqpbRGMWPjQ4AlFyKI3vt7adS2DwMeMMfraFXvGK4liLBvmr3DPhU2'}

The cookie is persistent:
{'NID': '99=F_pIEFBnvtQc1poKGMdpGVl8H7ekYVJIEkYZ__apuE5nibL1j9AbqfYzdlLgwJEQ2FGpB1dBGVvqpbRGMWPjQ4AlFyKI3vt7adS2DwMeMMfraFXvGK4liLBvmr3DPhU2'}

Compare the output of the code above with the example below, which does not use a session.


In [5]:
r = requests.get('http://google.cz/')
print("\nA cookie from google: ")
print(dict(r.cookies))
r = requests.get('http://google.cz/?q=cat')
print("\nDifferent cookie:")
print(dict(r.cookies))


A cookie from google: 
{'NID': '99=n0oJjmdKl8ZBirmaW2Nn0y-o6MN3ZQ9_ZRBgcvP_zxxUS0il3u2vHycFuUIvrpglyoxHKVG58GlaMy41ADWEoQ7hjAUQLroOpfYHU5ueWcpEfa_dOVgLz2uoXQYkQNzC'}

Different cookie:
{'NID': '99=euuRxzgwcRT8rBwxVdWlYJ5vhZP4k-Hww-pXCSfml5LYu3jb6IuaZQo-cUqe3sypuMO8TE81TPbDQ8Ehp6BEdmG-MwnDOeBS8NKmDSdnlXgXjWrM3nGQtfc2ves_uRVT'}
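
To see what a session carries from one request to the next, it can help to build the requests without sending them. The sketch below does that with prepare_request(). The cookie name session_id, its value, the Accept-Language header and the example.org address are hypothetical placeholders; in the examples above the cookie came from the server instead of being set by hand.

```python
import requests

s = requests.Session()
# Hypothetical state, set by hand so the sketch needs no network access.
s.cookies.set("session_id", "abc123")
s.headers.update({"Accept-Language": "en"})

# Build two requests through the session without sending them.
for url in ("http://example.org/", "http://example.org/?q=cat"):
    prepared = s.prepare_request(requests.Request("GET", url))
    print(url)
    print("  Cookie:", prepared.headers.get("Cookie"))
    print("  Accept-Language:", prepared.headers.get("Accept-Language"))

# A request built outside the session carries none of that state.
plain = requests.Request("GET", "http://example.org/").prepare()
print("Without session:", plain.headers.get("Cookie"))
```

Both requests built through the session share the same cookie and header, while the request prepared on its own has no Cookie header at all.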

Custom headers

The response headers are easy to inspect, as the following example shows.


In [6]:
r = requests.get("http://www.python.org/")
print(r.headers)


CaseInsensitiveDict({'x-cache-hits': '4', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'connection': 'keep-alive', 'x-frame-options': 'SAMEORIGIN', 'content-length': '47426', 'via': '1.1 varnish', 'x-clacks-overhead': 'GNU Terry Pratchett', 'server': 'nginx', 'age': '3119', 'date': 'Sun, 26 Mar 2017 18:43:59 GMT', 'vary': 'Cookie', 'public-key-pins': 'max-age=600; includeSubDomains; pin-sha256="WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18="; pin-sha256="5C8kvU039KouVrl52D0eZSGf4Onjo4Khs8tmyTlV3nU="; pin-sha256="5C8kvU039KouVrl52D0eZSGf4Onjo4Khs8tmyTlV3nU="; pin-sha256="lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU="; pin-sha256="TUDnr0MEoJ3of7+YliBMBVFB4/gJsv5zO7IxD9+YoWI="; pin-sha256="x4QzPSC810K5/cMjb05Qm4k3Bw5zBn4lTdO/nEW/Td4=";', 'x-cache': 'HIT', 'x-served-by': 'cache-ams4143-AMS', 'x-timer': 'S1490553839.466339,VS0,VE0', 'accept-ranges': 'bytes', 'content-type': 'text/html; charset=utf-8'})
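
The CaseInsensitiveDict printed above behaves like a dictionary whose keys ignore letter case. The sketch below rebuilds a few of those headers by hand (an assumption made so that it runs without network access) and shows how individual values might be read. With a real response, r.headers can be used in the same way.

```python
from requests.structures import CaseInsensitiveDict

# A few of the headers from the output above, rebuilt by hand for illustration.
headers = CaseInsensitiveDict({
    "server": "nginx",
    "content-length": "47426",
    "content-type": "text/html; charset=utf-8",
})

# Lookups ignore the case of the header name.
print(headers["Content-Type"])
print(headers["CONTENT-TYPE"])

# Header values are strings; convert them when a number is needed.
size = int(headers["content-length"])
print("Body size in bytes:", size)

# get() avoids a KeyError for headers the server did not send.
print(headers.get("etag", "no ETag header"))
```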

The request headers can be modified in a simple way, as follows.


In [7]:
headers = {
    "Accept": "text/plain",
}

r = requests.get("http://www.python.org/", headers=headers)
print(r.status_code)


200
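
A status code of 200 does not show which headers were actually sent. One way to check, sketched below, is to prepare the request without sending it and print its headers. The User-Agent value is a hypothetical client name used only as an example.

```python
import requests

headers = {
    "Accept": "text/plain",
    # Hypothetical client name, used here only as an example value.
    "User-Agent": "my-tutorial-client/0.1",
}

# Prepare the request without sending it, then look at its headers.
req = requests.Request("GET", "http://www.python.org/", headers=headers)
prepared = req.prepare()

for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

After a real call such as requests.get(...), the headers that were sent are available in the same form as r.request.headers.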

More information about HTTP headers can be found on the Wikipedia page List of HTTP header fields.