This tutorial covers how to make requests over the HTTP protocol. For more information about related topics, see:
Keep in mind that this tutorial works only with static content. Obtaining dynamic web content is not covered here. If you need to deal with dynamic content, study the Selenium Python bindings.
This section gives examples of how to get an HTTP response with two different libraries: urllib and Requests.
This tutorial mainly uses the Requests library, as the preferred option.
The following example gets the static content of a web page with urllib.request, the Python 3 successor to urllib2:
In [1]:
from urllib.request import urlopen
r = urlopen('http://www.python.org/')
data = r.read()
print("Status code:", r.getcode())
In [2]:
import requests
r = requests.get("http://www.python.org/")
data = r.text
print("Status code:", r.status_code)
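One difference between the two examples above is worth keeping in mind: read() on a urlopen() response returns raw bytes, while r.text in Requests is an already decoded string. A minimal sketch of decoding the bytes by hand follows. The helper name decode_body and the sample header and body are made up for illustration; with a real urlopen() response you would pass r.headers and r.read() instead.

```python
from email.message import Message


def decode_body(raw, headers, default="utf-8"):
    """Decode response bytes using the charset from the Content-Type header."""
    # get_content_charset() returns None when no charset is declared.
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors="replace")


# Made-up stand-ins for r.headers and r.read() of a real urlopen() response.
sample_headers = Message()
sample_headers["Content-Type"] = "text/html; charset=ISO-8859-1"
sample_body = "caf\u00e9".encode("latin-1")

text = decode_body(sample_body, sample_headers)
print(type(sample_body).__name__, "->", type(text).__name__)
print(text)
```

The headers of a urlopen() response are an email.message.Message subclass, which is why the sample can be built with Message directly.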
This task is demonstrated on Open Notify, an open source project that provides a simple programming interface for some of NASA’s awesome data.
The examples below show how to obtain the current position of the ISS. With the Requests library, JSON can be fetched from the API in the same way as HTML data.
In [3]:
import requests
r = requests.get("http://api.open-notify.org/iss-now.json")
obj = r.json()
print(obj)
The Requests method json() converts the JSON response to a Python dictionary. Data can then be read from the obtained response by key, as with any other dictionary.
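A minimal sketch of reading values from such a dictionary follows. To stay runnable offline, it parses a hard-coded sample string instead of calling the API. The field names (message, timestamp, iss_position) are an assumption about the shape of the iss-now.json response, and the values are made up.

```python
import json
from datetime import datetime, timezone

# Made-up sample shaped like the iss-now.json response (an assumption).
sample_response = """
{
  "message": "success",
  "timestamp": 1700000000,
  "iss_position": {"latitude": "12.3456", "longitude": "-65.4321"}
}
"""

# json.loads() stands in for r.json(): both produce a Python dictionary.
obj = json.loads(sample_response)

# Nested values are reached by chaining keys. The coordinates in this
# sample are strings, so they are converted to floats before use.
latitude = float(obj["iss_position"]["latitude"])
longitude = float(obj["iss_position"]["longitude"])

# The timestamp is interpreted as seconds since the Unix epoch.
when = datetime.fromtimestamp(obj["timestamp"], tz=timezone.utc)

print("Status:", obj["message"])
print("Position:", latitude, longitude)
print("Time (UTC):", when.isoformat())
```

With a live response, replacing the json.loads() line with obj = r.json() should leave the rest unchanged, provided the response has the assumed fields.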
Sessions in Requests are handy when you need to use the same cookies (session cookies, for example) or the same authentication for multiple requests.
In [4]:
s = requests.Session()
print("No cookies on start: ")
print(dict(s.cookies))
r = s.get('http://google.cz/')
print("\nA cookie from google: ")
print(dict(s.cookies))
r = s.get('http://google.cz/?q=cat')
print("\nThe cookie is persistent:")
print(dict(s.cookies))
Compare the output of the code above with the example below.
In [5]:
r = requests.get('http://google.cz/')
print("\nA cookie from google: ")
print(dict(r.cookies))
r = requests.get('http://google.cz/?q=cat')
print("\nDifferent cookie:")
print(dict(r.cookies))
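The difference comes from the cookie jar that a Session keeps between requests. The cookie jar in Requests builds on the standard library's http.cookiejar.CookieJar, so the mechanism can be sketched offline with the standard library alone. In the sketch below, FakeResponse, cookie_sent_with, the example.com URL, and the sid=abc123 cookie are all made up for illustration, and no request is actually sent.

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    """Hypothetical stand-in for a server response carrying headers."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        # CookieJar.extract_cookies() reads the headers through info().
        return self._headers


def cookie_sent_with(jar, url):
    """Return the Cookie header the jar would attach to a request for url."""
    request = Request(url)
    jar.add_cookie_header(request)
    return request.get_header("Cookie")


# A made-up response that sets a session cookie.
headers = Message()
headers["Set-Cookie"] = "sid=abc123; Path=/"

# Like a Session: one jar is reused, so the cookie is stored and sent back.
shared_jar = CookieJar()
shared_jar.extract_cookies(FakeResponse(headers), Request("http://example.com/"))
with_session = cookie_sent_with(shared_jar, "http://example.com/?q=cat")

# Like plain requests.get(): every call starts with an empty jar.
without_session = cookie_sent_with(CookieJar(), "http://example.com/?q=cat")

print("Shared jar sends:", with_session)
print("Fresh jar sends:", without_session)
```

The shared jar returns the stored cookie on the second request, while the fresh jar has nothing to send. This mirrors why the session example keeps its cookie and the plain requests.get() calls each receive a new one.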
In [6]:
r = requests.get("http://www.python.org/")
print(r.headers)
Request headers can be modified in a simple way, as follows.
In [7]:
headers = {
"Accept": "text/plain",
}
r = requests.get("http://www.python.org/", headers=headers)
print(r.status_code)
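For comparison, request headers can be set in a similar way with the standard library. The sketch below only builds the request object and inspects it, so it runs without a network connection. Passing the object to urlopen() would send it. The Accept-Language header is added purely as an illustration.

```python
from urllib.request import Request

headers = {
    "Accept": "text/plain",
}

# Build the request without sending it.
req = Request("http://www.python.org/", headers=headers)

# Headers can also be added after the object is created.
req.add_header("Accept-Language", "en")

print(req.get_method())

# urllib stores header names with only the first letter capitalized,
# so "Accept-Language" is listed as "Accept-language".
for name, value in req.header_items():
    print(name + ":", value)
```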
More information about HTTP headers can be found on the Wikipedia page List of HTTP header fields.