Via Wikipedia:
Hypertext is structured text that uses logical links (hyperlinks) between nodes containing text. HTTP is the protocol to exchange or transfer hypertext.
Let's say we want to know the exchange rates of the euro, the US dollar and the Polish złoty (PLN). There is a website offering this service for free at www.nbp.pl. We can access that information using an HTTP client.
What is a client? A web browser such as Chrome is one example. We can also write our own small client in Python.
The NBP portal offers a public API to access this data using HTTP messages. The instructions are here: http://api.nbp.pl/ An API (Application Programming Interface) is a set of rules that tells other programs how to communicate with a service using messages sent automatically from code.
A message can do a lot of things.
It can retrieve ("download") something, using the GET message.
It can remove something, using the DELETE message.
It can replace something, using the PUT message.
It can create a new item, using the POST message.
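To make the idea of a "message" concrete, here is a minimal sketch of what a GET message looks like as plain text before it travels over the network. The helper name `build_get_message` is made up for this illustration, and a real client normally adds more headers than the single one shown.

```python
def build_get_message(host, path):
    """Return the text of a minimal HTTP/1.1 GET request (illustration only)."""
    lines = [
        "GET {0} HTTP/1.1".format(path),  # request line: method, resource, protocol version
        "Host: {0}".format(host),         # the Host header is required in HTTP/1.1
        "",                               # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


message = build_get_message("api.nbp.pl", "/api/exchangerates/rates/a/usd/2017-09-12")
print(message)
```

A DELETE, PUT or POST message starts the same way, only with a different method name in the first line.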
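Python already has some built-in modules that allow HTTP communication (for example urllib and http.client). Here we use requests, a popular third-party package that has to be installed separately.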
Our algorithm for now will be:
In [2]:
# Let's import required Python libraries
import requests
In [13]:
# Define the connection address as a raw string. The 'r' prefix means that Python
# does not treat the backslash as an escape character. It is optional here,
# because the address contains only forward slashes.
address = r"http://api.nbp.pl"
# The instructions say to build the request address using the following rule:
# http://api.nbp.pl/api/exchangerates/rates/ {table} / {code} / {date}
# Where:
# {table} - can be A, B, C
# {code} - is a currency code, like EUR, USD
# {date} - is a date in the format: 2017-10-12
rates_path = "api/exchangerates/rates"
table = "a"
code = "usd"
date = "2017-09-12"
# A few ways of creating the query address:
# Add manually
resource_address = address + "/" + rates_path + "/" + table + "/" + code + "/" + date
print(resource_address)
# Join using the str.join method
resource_address = "/".join([address, rates_path, table, code, date])
print(resource_address)
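A third option, shown only as a sketch, is to wrap the rule in a small function so the address can be rebuilt for any currency. The function name `build_rates_address` is an assumption made for this example; it relies only on the address rule quoted in the comments above.

```python
def build_rates_address(table, code, date,
                        base="http://api.nbp.pl/api/exchangerates/rates"):
    """Fill the {table}/{code}/{date} rule with concrete values."""
    return "{0}/{1}/{2}/{3}".format(base, table, code, date)


usd_address = build_rates_address("a", "usd", "2017-09-12")
eur_address = build_rates_address("a", "eur", "2017-09-12")
print(usd_address)
print(eur_address)
```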
In [14]:
# Perform the GET query
response = requests.get(resource_address)
print(response)
HTTP response codes: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
The most common ones: 200 = OK, 404 = Not Found.
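Python's standard library knows these codes too. The short sketch below looks them up with `http.HTTPStatus`; the helper `is_success` is a made-up name for this example.

```python
from http import HTTPStatus

# Look up the official name and phrase for the two codes mentioned above.
for code in (200, 404):
    status = HTTPStatus(code)
    print(code, status.name, "-", status.phrase)


def is_success(status_code):
    """Codes in the 2xx range mean the request succeeded."""
    return 200 <= status_code < 300


print(is_success(200), is_success(404))
```

With requests, the numeric code of a response is available as `response.status_code`, so it can be passed to a check like this one.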
Let's inspect the response
In [19]:
type(response)
Out[19]:
In [20]:
dir(response)
Out[20]:
In [25]:
# Let's view only the attributes, and not the Python built-in object fields.
attributes = response.__dict__
print(attributes)
In [26]:
# Let's print it better.
response.__dict__
Out[26]:
Now. Some of these are only about the HTTP message:
The response we actually need is:
One other thing to notice is that the response inside 'content' came in a specific format. The most often used formats include JSON and XML.
The format of our response can be seen inside 'headers':
In [27]:
response.headers
Out[27]:
In [29]:
# The format itself:
response.headers['Content-Type']
Out[29]:
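A Content-Type value usually carries two pieces of information: the media type and the character encoding. The sketch below splits them using the standard library. The header value is an assumption (a typical one for JSON), since the actual output is not shown here, and `parse_content_type` is a name invented for the example.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header into (media type, charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


# Assumed example value, not copied from a real NBP response.
media_type, charset = parse_content_type("application/json; charset=utf-8")
print(media_type)
print(charset)
```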
In [30]:
# The response:
# 'content' is the public attribute; '_content' is a private implementation detail.
response.content
Out[30]:
In [33]:
# Let's inspect the contents.
contents = response.content
In [34]:
type(contents)
Out[34]:
In [35]:
len(contents)
Out[35]:
That's not quite usable yet. It's an object of type 'bytes'. Let's decode the bytes to a string.
In [36]:
contents = contents.decode('utf-8')
print(contents)
In [37]:
type(contents)
Out[37]:
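Why does decoding matter? Bytes and text are different things, and one character can take more than one byte. A small sketch with a sample Polish phrase (chosen for illustration, not taken from the response):

```python
text = "dolar amerykański"        # sample text containing the non-ASCII letter 'ń'
raw = text.encode("utf-8")        # str -> bytes, as it would travel over the network

print(type(raw), len(raw))        # bytes: counts bytes
print(type(text), len(text))      # str: counts characters

decoded = raw.decode("utf-8")     # bytes -> str, the step performed above
print(decoded == text)
```

The byte count is one higher than the character count because 'ń' takes two bytes in UTF-8.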
We are close. Now, to make this string usable, let's use the Python module designed for working with the JSON format.
In [38]:
import json
contents_from_json = json.loads(contents)
print(contents_from_json)
In [39]:
type(contents_from_json)
Out[39]:
Great! Now we can access the various elements of this information using dictionary keys. Let's see how:
In [40]:
contents_from_json['currency']
Out[40]:
In [45]:
currency_value = contents_from_json['rates'][0]['mid']
print(currency_value)
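The same access pattern can be tried offline on a hand-written sample. The keys 'currency', 'rates' and 'mid' are the ones used above; the 'code' and 'effectiveDate' keys and all the values are placeholders assumed for this sketch, not real NBP data.

```python
import json

# Hand-written sample with placeholder values.
sample = (
    '{"currency": "dolar", "code": "USD", '
    '"rates": [{"effectiveDate": "2017-09-12", "mid": 3.5}]}'
)

data = json.loads(sample)
print(type(data))                 # JSON objects become Python dictionaries

first_rate = data["rates"][0]     # 'rates' is a list, so pick its first item
print(type(data["rates"]), type(first_rate))
print(first_rate["mid"])          # JSON numbers become int or float
```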
Let's write it all again in a concise way.
In [46]:
import requests
import json
# Address.
address = r"http://api.nbp.pl"
rates_path = "api/exchangerates/rates"
# Resource.
table = "a"
code = "usd"
date = "2017-09-12"
# Message.
resource_address = "/".join([address, rates_path, table, code, date])
response = requests.get(resource_address)
# Work on response.
contents = response.content
contents = contents.decode('utf-8')
contents_from_json = json.loads(contents)
# Print value.
currency_value = contents_from_json['rates'][0]['mid']
print(currency_value)
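Response objects from requests also provide `raise_for_status()`, which raises an exception for error codes such as 404, and `json()`, which decodes and parses the body in one step. The sketch below uses both. To keep it runnable without a network connection, the function receives the "get" function as a parameter, and `FakeResponse` and `fake_get` are hypothetical stand-ins written for this example.

```python
class FakeResponse:
    """Hypothetical stand-in mimicking the two response methods used below."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # a real response raises an exception here for 4xx/5xx codes

    def json(self):
        return self._payload


def fetch_mid_rate(get, resource_address):
    """Send a GET message with the given function and return the 'mid' value."""
    response = get(resource_address)
    response.raise_for_status()
    return response.json()["rates"][0]["mid"]


def fake_get(resource_address):
    return FakeResponse({"rates": [{"mid": 3.5}]})  # placeholder value


rate = fetch_mid_rate(fake_get, "http://api.nbp.pl/api/exchangerates/rates/a/usd/2017-09-12")
print(rate)
```

With the real library the call would be `fetch_mid_rate(requests.get, resource_address)`.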
Now, let's try to do the same thing, but using a different data format: XML. NBP.pl also offers XML responses, but we have to ask for them when we send the query message.
In [47]:
import requests
# Address.
address = r"http://api.nbp.pl"
rates_path = "api/exchangerates/rates"
# Resource.
table = "a"
code = "usd"
date = "2017-09-12"
# Message.
resource_address = "/".join([address, rates_path, table, code, date])
# Specify format.
resource_address += "/?format=xml"
response = requests.get(resource_address)
# Work on response.
contents = response.content
contents = contents.decode('utf-8')
print(contents)
In [48]:
type(contents)
Out[48]:
Although our response is a string, it is not very readable. We could again create a dictionary from this XML.
We can also use the ElementTree module. XML data is essentially a 'tree' that we can walk through: we start from the root, then visit the branches going out from the root, then the branches going out from those branches, and so on.
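The walk described above can be sketched as a short recursive function. The sample XML is hand-written: only the 'Mid' tag is confirmed by this notebook, while the other tag names and all values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; tag names other than 'Mid' and all values are assumed.
sample_xml = (
    "<ExchangeRatesSeries>"
    "<Table>A</Table><Currency>dolar</Currency><Code>USD</Code>"
    "<Rates><Rate><No>1</No><EffectiveDate>2017-09-12</EffectiveDate>"
    "<Mid>3.5</Mid></Rate></Rates>"
    "</ExchangeRatesSeries>"
)


def walk(element, depth=0):
    """Return one indented line per element: the root first, then its branches."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:  # iterating over an element yields its direct children
        lines.extend(walk(child, depth + 1))
    return lines


tree_lines = walk(ET.fromstring(sample_xml))
print("\n".join(tree_lines))
```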
In [52]:
# Let's import ElementTree, a submodule of Python's xml library
import xml.etree.ElementTree as ET
contents_tree = ET.fromstring(contents)
In [53]:
type(contents_tree)
Out[53]:
In [61]:
# Let's see the root node.
contents_tree.tag
Out[61]:
In [63]:
# getchildren() was removed in Python 3.9; list() gives the same list of child elements.
list(contents_tree)
Out[63]:
In [73]:
# Elements can be indexed like lists: 4th child of the root, its 1st child, then that element's 3rd child.
element_mid = contents_tree[3][0][2]
In [74]:
element_mid.text
Out[74]:
Let's write that again in a clearer way.
In [76]:
# findall returns a list of all matching elements, so take the first one.
element_mid = contents_tree.findall(r".//Mid")[0]
element_mid.text
Out[76]:
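ElementTree offers a few related lookup methods, and it helps to know how they differ. A minimal sketch on a tiny hand-written XML snippet (the value 3.5 is a placeholder, and 'Bid' is used only as an example of a tag that is absent from it):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<Rates><Rate><Mid>3.5</Mid></Rate></Rates>")

all_mid = root.findall(".//Mid")    # list of every matching element
first_mid = root.find(".//Mid")     # first matching element, or None
mid_text = root.findtext(".//Mid")  # text of the first match, or None
missing = root.find(".//Bid")       # no such tag in this snippet

print(len(all_mid), first_mid.text, mid_text, missing)
```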
Now the final program.
In [81]:
import requests
import xml.etree.ElementTree as ET
# Address.
address = r"http://api.nbp.pl"
rates_path = "api/exchangerates/rates"
# Resource.
table = "a"
code = "usd"
date = "2017-09-12"
# Message.
resource_address = "/".join([address, rates_path, table, code, date])
# Specify format.
resource_address += "/?format=xml"
response = requests.get(resource_address)
# Work on response.
contents = response.content
contents = contents.decode('utf-8')
contents_tree = ET.fromstring(contents)
element_mid = contents_tree.findall(r".//Mid")
print(str(element_mid[0].text))
To make the program look better and handle both formats, we should refactor it. Let's write more functions.
In [83]:
import xml.etree.ElementTree as ET
import json
def extract_value(string_contents, data_format):
    if data_format == 'xml':
        return extract_value_from_xml(string_contents)
    else:
        return extract_value_from_json(string_contents)

def extract_value_from_xml(string_contents):
    # Parse the argument, not a global variable.
    contents_tree = ET.fromstring(string_contents)
    element_mid = contents_tree.findall(r".//Mid")
    # findall returns a list, so take the only element from inside ('unpack from list').
    return element_mid[0].text

def extract_value_from_json(string_contents):
    contents_from_json = json.loads(string_contents)
    return contents_from_json['rates'][0]['mid']

def construct_address(table, code, date, data_type):
    # The base address already contains the rates path.
    address = r"http://api.nbp.pl/api/exchangerates/rates"
    query = "?format={0}".format(data_type)
    return "/".join([address, table, code, date, query])
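As an alternative for the query part of the address, the standard library can build the "format=..." text for us, which also takes care of escaping special characters. This is only a sketch; the function name `construct_address_with_query` is invented here so that it does not clash with the function above.

```python
from urllib.parse import urlencode


def construct_address_with_query(table, code, date, data_type):
    """Build the rates address and append the format as a query string."""
    base = "http://api.nbp.pl/api/exchangerates/rates"
    path = "/".join([base, table, code, date])
    return path + "/?" + urlencode({"format": data_type})


json_address = construct_address_with_query("a", "usd", "2017-09-12", "json")
print(json_address)
```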
Now, our program looks like this:
In [84]:
import requests
# Address.
data_type = "json"
address = construct_address("a", "usd", "2017-09-12", data_type)
response = requests.get(address)
# Work on response.
contents = response.content
contents = contents.decode('utf-8')
value = extract_value(contents, data_type)
print(value)
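One possible next step: a function that picks the extractor with if/else treats every format other than 'xml' as JSON. The sketch below looks the extractor up in a dictionary instead and reports unknown formats. All names in it are made up for this example, and the sample inputs use the placeholder value 3.5.

```python
import json
import xml.etree.ElementTree as ET


def mid_from_json(text):
    return json.loads(text)["rates"][0]["mid"]


def mid_from_xml(text):
    return ET.fromstring(text).findtext(".//Mid")


# Map each supported format name to the function that handles it.
EXTRACTORS = {"json": mid_from_json, "xml": mid_from_xml}


def extract_mid(text, data_format):
    try:
        extractor = EXTRACTORS[data_format]
    except KeyError:
        raise ValueError("unsupported format: {0}".format(data_format))
    return extractor(text)


print(extract_mid('{"rates": [{"mid": 3.5}]}', "json"))
print(extract_mid("<Rates><Rate><Mid>3.5</Mid></Rate></Rates>", "xml"))
```

Note that the JSON path returns a number while the XML path returns a string; wrap the XML result in float() if a number is needed.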