Lesson_01_HTTP_client


Python WebApps - Lesson 1 - HTTP client console application

Via Wikipedia:

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, and hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.

Hypertext is structured text that uses logical links (hyperlinks) between nodes containing text. HTTP is the protocol to exchange or transfer hypertext.

Let's say we want to know the exchange rates between the Euro, the US Dollar and the Polish złoty (PLN). A website, www.nbp.pl, offers this information for free. We can access it using an HTTP client.

What is a client? A web browser such as Chrome is one example. We can also write our own small client in Python.

The NBP portal offers a public API for accessing this data with HTTP messages. The instructions are here: http://api.nbp.pl/

An API (Application Programming Interface) is a set of rules that tells other programs how to communicate with a service through messages sent automatically from code.

HTTP message types

An HTTP message can ask the server to do several different things.

It can retrieve ("download") something, using the GET message.

It can remove something, using the DELETE message.

It can replace something, using the PUT message.

It can create a new item, using the POST message.
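
As a minimal sketch of how these four message types look from code, the snippet below builds (but does not send) one request of each type with the standard library's urllib.request module. The addresses under http://example.com and the payload are made-up placeholders, not part of the NBP API.

```python
import urllib.request

# Hypothetical resource addresses, used only for illustration.
item_url = "http://example.com/items/1"
collection_url = "http://example.com/items"
payload = b'{"name": "new"}'

# Build one request object per HTTP method. Nothing is sent over the network here.
get_request = urllib.request.Request(item_url, method="GET")
delete_request = urllib.request.Request(item_url, method="DELETE")
put_request = urllib.request.Request(item_url, data=payload, method="PUT")
post_request = urllib.request.Request(collection_url, data=payload, method="POST")

for request in (get_request, delete_request, put_request, post_request):
    print(request.get_method(), request.full_url)
```

Only GET is needed for the rest of this lesson, because we only want to read data.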

Let's 'GET' the value of a currency in PLN

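Python's standard library already includes modules that allow HTTP communication, such as urllib. In this lesson we use the third-party requests package, which is more convenient but has to be installed separately (for example with pip).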

Our algorithm for now will be:

  1. Define connection - we want to connect to api.nbp.pl website interface
  2. Compose a message query - we need to say what exactly we want, like from which date, which currency etc
  3. Send the message
  4. The program will wait for a response.
  5. Receive the response.
  6. Inspect the contents of response.
  7. Try to extract from the whole message, the information that is interesting for us.
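
The steps above can be sketched as a single function. This is only an outline under stated assumptions: the names get_mid_rate, fetch_over_http and fake_fetch are hypothetical, and the sketch uses the standard library's urllib.request instead of the requests package used in the rest of the lesson. Passing a stand-in fetch function lets it run without a network connection.

```python
import json
import urllib.request


def fetch_over_http(url):
    """Steps 3-5: send a GET request, wait for the response and read its body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def get_mid_rate(table, code, date, fetch=fetch_over_http):
    # Step 1: define the connection.
    address = "http://api.nbp.pl"
    # Step 2: compose the query.
    url = "/".join([address, "api/exchangerates/rates", table, code, date])
    # Steps 3-5: send the message, wait, receive the response body as bytes.
    body = fetch(url)
    # Step 6: inspect the contents (bytes -> str -> dict).
    data = json.loads(body.decode("utf-8"))
    # Step 7: extract the value we are interested in.
    return data["rates"][0]["mid"]


def fake_fetch(url):
    # Canned body, abridged from the response shown later in this lesson.
    return b'{"table":"A","code":"USD","rates":[{"effectiveDate":"2017-09-12","mid":3.5552}]}'


print(get_mid_rate("a", "usd", "2017-09-12", fetch=fake_fetch))
```

The cells below go through the same steps one at a time, using requests.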

In [2]:
# Let's import required Python libraries
import requests

In [13]:
# Define the connection address as a raw string - raw is indicated by 'r' in front,
# and means that Python should not interpret the backslash as the 'escape character'.
# (This URL contains only forward slashes, so the prefix is optional here.)
address = r"http://api.nbp.pl"

# In the instructions it is written, that we should create the request address by the following rule:
# http://api.nbp.pl/api/exchangerates/rates/  {table}  /  {code}  /  {date}
# Where:
# {table} - can be A, B, C
# {date} - is a date in a format: 2017-10-12
# {code} - currency code, like EUR, USD
rates_path = "api/exchangerates/rates"
table = "a"
code = "usd"
date = "2017-09-12"

# A few ways of creating the query address:

# Add manually
resource_address = address + "/" + rates_path + "/" + table + "/" + code + "/" + date
print(resource_address)
# Join using the str.join method
resource_address = "/".join([address, rates_path, table, code, date])
print(resource_address)


http://api.nbp.pl/api/exchangerates/rates/a/usd/2017-09-12
http://api.nbp.pl/api/exchangerates/rates/a/usd/2017-09-12
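
A third option, shown here as a sketch, wraps the same join in a small function so that the parts cannot be mixed up, and builds the date string from a datetime.date object so that it always has the expected format. The function name build_rate_url is hypothetical; the path layout is the one quoted from the NBP instructions above.

```python
import datetime


def build_rate_url(table, code, date, address="http://api.nbp.pl"):
    """Return the address of a single exchange rate resource."""
    rates_path = "api/exchangerates/rates"
    return "/".join([address, rates_path, table, code, date])


# isoformat() produces the YYYY-MM-DD form the API expects.
date = datetime.date(2017, 9, 12).isoformat()
url = build_rate_url("a", "usd", date)
print(url)
```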

In [14]:
# Perform the GET query
response = requests.get(resource_address)
print(response)


<Response [200]>

HTTP response codes: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

Most common: 200 = OK, 404 = Not Found.
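
The standard library knows these codes too. As a small sketch, http.HTTPStatus maps a numeric code to its standard phrase, and the first digit of the code tells the general class of the outcome. The helper name describe_status is made up for this example.

```python
from http import HTTPStatus

# The first digit of a status code gives its general class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return a short description such as '404 Not Found (client error)'."""
    phrase = HTTPStatus(code).phrase
    status_class = STATUS_CLASSES.get(code // 100, "other")
    return "{0} {1} ({2})".format(code, phrase, status_class)


print(describe_status(200))
print(describe_status(404))
```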

Let's inspect the response


In [19]:
type(response)


Out[19]:
requests.models.Response

In [20]:
dir(response)


Out[20]:
['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_content',
 '_content_consumed',
 '_next',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']

In [25]:
# Let's view only the attributes, and not the Python built-in object fields.
attributes = response.__dict__
print(attributes)


{'_content': b'{"table":"A","currency":"dolar ameryka\xc5\x84ski","code":"USD","rates":[{"no":"176/A/NBP/2017","effectiveDate":"2017-09-12","mid":3.5552}]}', '_content_consumed': True, '_next': None, 'status_code': 200, 'headers': {'Cache-Control': 'no-cache', 'Pragma': 'no-cache', 'Content-Length': '134', 'Content-Type': 'application/json; charset=utf-8', 'Expires': '-1', 'ETag': '"MJbg40JKR9bFS2a08iC96X93xYmps65GNaz4N32vlG4="', 'Date': 'Sat, 14 Oct 2017 11:12:38 GMT'}, 'raw': <urllib3.response.HTTPResponse object at 0x04B2C6D0>, 'url': 'http://api.nbp.pl/api/exchangerates/rates/a/usd/2017-09-12', 'encoding': 'utf-8', 'history': [], 'reason': 'OK', 'cookies': <RequestsCookieJar[]>, 'elapsed': datetime.timedelta(0, 0, 97624), 'request': <PreparedRequest [GET]>, 'connection': <requests.adapters.HTTPAdapter object at 0x04B13D50>}

In [26]:
# Let the notebook display it in a more readable form, one entry per line.
response.__dict__


Out[26]:
{'_content': b'{"table":"A","currency":"dolar ameryka\xc5\x84ski","code":"USD","rates":[{"no":"176/A/NBP/2017","effectiveDate":"2017-09-12","mid":3.5552}]}',
 '_content_consumed': True,
 '_next': None,
 'connection': <requests.adapters.HTTPAdapter at 0x4b13d50>,
 'cookies': <RequestsCookieJar[]>,
 'elapsed': datetime.timedelta(0, 0, 97624),
 'encoding': 'utf-8',
 'headers': {'Cache-Control': 'no-cache', 'Pragma': 'no-cache', 'Content-Length': '134', 'Content-Type': 'application/json; charset=utf-8', 'Expires': '-1', 'ETag': '"MJbg40JKR9bFS2a08iC96X93xYmps65GNaz4N32vlG4="', 'Date': 'Sat, 14 Oct 2017 11:12:38 GMT'},
 'history': [],
 'raw': <urllib3.response.HTTPResponse at 0x4b2c6d0>,
 'reason': 'OK',
 'request': <PreparedRequest [GET]>,
 'status_code': 200,
 'url': 'http://api.nbp.pl/api/exchangerates/rates/a/usd/2017-09-12'}

Some of these attributes describe the HTTP message itself:

  • status_code: 200 - the numeric result of the request; 200 means OK
  • reason: OK - the text phrase that accompanies the status code
  • url - Uniform Resource Locator - the address of the resource we just downloaded
  • headers - a very important part: the headers that the RESPONSE message carried
  • encoding: utf-8 - the character encoding used to turn the bytes of the body into text
  • elapsed - the time between sending the request and receiving the response

The part we actually need is:

  • _content - the body of the response as raw bytes, also available through the public attribute response.content

One other thing to notice is that the body inside 'content' arrived in a specific format. The most commonly used formats are:

  • JSON - JavaScript Object Notation
  • XML - Extensible Markup Language

The format of our response is given in 'headers':


In [27]:
response.headers


Out[27]:
{'Cache-Control': 'no-cache', 'Pragma': 'no-cache', 'Content-Length': '134', 'Content-Type': 'application/json; charset=utf-8', 'Expires': '-1', 'ETag': '"MJbg40JKR9bFS2a08iC96X93xYmps65GNaz4N32vlG4="', 'Date': 'Sat, 14 Oct 2017 11:12:38 GMT'}

In [29]:
# The format itself:
response.headers['Content-Type']


Out[29]:
'application/json; charset=utf-8'
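
This header value packs two pieces of information: the media type and the character set. One way to split them, sketched here with the standard library's email.message.Message class (e-mail headers use a similar syntax), is:

```python
from email.message import Message

# The header value copied from the response above.
header_value = "application/json; charset=utf-8"

message = Message()
message["Content-Type"] = header_value

media_type = message.get_content_type()    # the part before the semicolon
charset = message.get_content_charset()    # the value of the 'charset' parameter

print(media_type)
print(charset)
```

The media type tells us which parser to use, and the character set tells us how to decode the bytes.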

In [30]:
# The response body (the public 'content' attribute returns the same bytes as '_content'):
response.content


Out[30]:
b'{"table":"A","currency":"dolar ameryka\xc5\x84ski","code":"USD","rates":[{"no":"176/A/NBP/2017","effectiveDate":"2017-09-12","mid":3.5552}]}'

In [33]:
# Let's inspect the contents.
contents = response.content

In [34]:
type(contents)


Out[34]:
bytes

In [35]:
len(contents)


Out[35]:
134

That's not quite usable yet: it is an object of type 'bytes'. Let's decode the bytes into a string.


In [36]:
contents = contents.decode('utf-8')
print(contents)


{"table":"A","currency":"dolar amerykański","code":"USD","rates":[{"no":"176/A/NBP/2017","effectiveDate":"2017-09-12","mid":3.5552}]}

In [37]:
type(contents)


Out[37]:
str
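
To see what decoding changes, compare the lengths before and after. The sketch below uses only the currency name from the response: the letter 'ń' occupies two bytes in UTF-8 but is a single character once decoded.

```python
raw = b"dolar ameryka\xc5\x84ski"   # bytes, as received from the network
text = raw.decode("utf-8")          # str, after decoding

print(len(raw))    # number of bytes
print(len(text))   # number of characters
print(text)
```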

We are close. To turn this string into something we can work with, let's use the Python module designed for the JSON format.


In [38]:
import json

contents_from_json = json.loads(contents)
print(contents_from_json)


{'table': 'A', 'currency': 'dolar amerykański', 'code': 'USD', 'rates': [{'no': '176/A/NBP/2017', 'effectiveDate': '2017-09-12', 'mid': 3.5552}]}

In [39]:
type(contents_from_json)


Out[39]:
dict

Great! Now we can access the individual pieces of this information using dictionary keys. Let's see how:


In [40]:
contents_from_json['currency']


Out[40]:
'dolar amerykański'

In [45]:
currency_value = contents_from_json['rates'][0]['mid']
print(currency_value)


3.5552
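
The 'rates' entry is a list, which in general can hold more than one element; that is why the index [0] is needed. A short sketch that loops over the list instead of indexing it, using the response text shown above as a literal so that it runs without a network connection:

```python
import json

contents = (
    '{"table":"A","currency":"dolar amerykański","code":"USD",'
    '"rates":[{"no":"176/A/NBP/2017","effectiveDate":"2017-09-12","mid":3.5552}]}'
)
data = json.loads(contents)

# Each element of 'rates' is itself a dictionary.
for rate in data["rates"]:
    print(data["code"], rate["effectiveDate"], rate["mid"])
```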

Let's write it all again in a concise way.


In [46]:
import requests
import json

# Address.
address = r"http://api.nbp.pl"
rates_path = "api/exchangerates/rates"

# Resource.
table = "a"
code = "usd"
date = "2017-09-12"

# Message.
resource_address = "/".join([address, rates_path, table, code, date])

response = requests.get(resource_address)

# Work on response.
contents = response.content
contents = contents.decode('utf-8')
contents_from_json = json.loads(contents)

# Print value.
currency_value = contents_from_json['rates'][0]['mid']
print(currency_value)


3.5552

Now, let's try to do the same thing using a different data format: XML. The NBP API also offers XML responses, but we have to ask for this format when we send the query message.


In [47]:
import requests

# Address.
address = r"http://api.nbp.pl"
rates_path = "api/exchangerates/rates"

# Resource.
table = "a"
code = "usd"
date = "2017-09-12"

# Message.
resource_address = "/".join([address, rates_path, table, code, date])

# Specify format.
resource_address += "/?format=xml"

response = requests.get(resource_address)

# Work on response.
contents = response.content
contents = contents.decode('utf-8')

print(contents)


<?xml version="1.0" encoding="utf-8"?><ExchangeRatesSeries xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Table>A</Table><Currency>dolar amerykański</Currency><Code>USD</Code><Rates><Rate><No>176/A/NBP/2017</No><EffectiveDate>2017-09-12</EffectiveDate><Mid>3.5552</Mid></Rate></Rates></ExchangeRatesSeries>

In [48]:
type(contents)


Out[48]:
str

Although our response is now a string, it is not very readable. As with JSON, we need to parse it into a structure that we can navigate.

We can use the ElementTree module for this. XML data is essentially a 'tree' that we can walk through: we start from the root, then visit the branches going out from the root, then the branches of those branches, and so on.


In [52]:
# Let's import the XML library, selecting only the ElementTree submodule
import xml.etree.ElementTree as ET

contents_tree = ET.fromstring(contents)

In [53]:
type(contents_tree)


Out[53]:
xml.etree.ElementTree.Element

In [61]:
# Let's see the root node.
contents_tree.tag


Out[61]:
'ExchangeRatesSeries'

In [63]:
# getchildren() was removed in Python 3.9; converting the element to a list gives its children.
list(contents_tree)


Out[63]:
[<Element 'Table' at 0x04BB3330>,
 <Element 'Currency' at 0x04BB3360>,
 <Element 'Code' at 0x04BB3390>,
 <Element 'Rates' at 0x04BB33C0>]

In [73]:
# Index into the tree: Rates (4th child of the root) -> first Rate -> Mid (3rd child of Rate).
element_mid = contents_tree[3][0][2]

In [74]:
element_mid.text


Out[74]:
'3.5552'
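
To make the 'tree' idea concrete, here is a sketch that walks every branch and indents each tag by its depth. It runs on a shortened fragment of the XML shown above; the function name walk is made up for this example.

```python
import xml.etree.ElementTree as ET

# A shortened fragment of the response, enough to show the nesting.
xml_text = "<Rates><Rate><No>176/A/NBP/2017</No><Mid>3.5552</Mid></Rate></Rates>"


def walk(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating over an Element yields its direct children
        lines.extend(walk(child, depth + 1))
    return lines


print("\n".join(walk(ET.fromstring(xml_text))))
```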

Let's write that again in a clearer way.


In [76]:
# find returns the first matching element; findall would return a list of all matches.
element_mid = contents_tree.find(r".//Mid")
element_mid.text


Out[76]:
'3.5552'
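
ElementTree also offers findtext, which returns the text of the first matching element directly. A sketch on a copy of the XML shown above (the namespace declarations are omitted for brevity):

```python
import xml.etree.ElementTree as ET

xml_text = (
    "<ExchangeRatesSeries><Table>A</Table><Currency>dolar amerykański</Currency>"
    "<Code>USD</Code><Rates><Rate><No>176/A/NBP/2017</No>"
    "<EffectiveDate>2017-09-12</EffectiveDate><Mid>3.5552</Mid></Rate></Rates>"
    "</ExchangeRatesSeries>"
)
root = ET.fromstring(xml_text)

# './/Mid' means: any <Mid> element at any depth below the root.
mid_text = root.findtext(".//Mid")
# Element text is always a string, so convert it before doing arithmetic.
mid_value = float(mid_text)

print(mid_text, mid_value)
```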

Now the final program.


In [81]:
import requests
import xml.etree.ElementTree as ET

# Address.
address = r"http://api.nbp.pl"
rates_path = "api/exchangerates/rates"

# Resource.
table = "a"
code = "usd"
date = "2017-09-12"

# Message.
resource_address = "/".join([address, rates_path, table, code, date])

# Specify format.
resource_address += "/?format=xml"

response = requests.get(resource_address)

# Work on response.
contents = response.content
contents = contents.decode('utf-8')

contents_tree = ET.fromstring(contents)
element_mid = contents_tree.findall(r".//Mid")

print(str(element_mid[0].text))


3.5552

To make the program look better and handle both formats, we should refactor it. Let's write more functions.


In [83]:
import xml.etree.ElementTree as ET
import json


def extract_value(string_contents, data_format):
    # Anything other than 'xml' is treated as JSON.
    if data_format == 'xml':
        return extract_value_from_xml(string_contents)
    else:
        return extract_value_from_json(string_contents)


def extract_value_from_xml(string_contents):
    # Parse the function argument, not a global variable.
    contents_tree = ET.fromstring(string_contents)
    element_mid = contents_tree.findall(r".//Mid")

    # It returns a list, let's take the only element from inside - 'unpack from list'
    # Note: this is a string, while the JSON variant returns a float.
    return element_mid[0].text


def extract_value_from_json(string_contents):
    contents_from_json = json.loads(string_contents)
    return contents_from_json['rates'][0]['mid']


def construct_address(table, code, date, data_type):
    address = r"http://api.nbp.pl"
    rates_path = "api/exchangerates/rates"
    resource_address = "/".join([address, rates_path, table, code, date])
    # Append the format query in the same form as in the earlier examples.
    return resource_address + "/?format={0}".format(data_type)

Now, our program looks like this:


In [84]:
import requests

# Address.
data_type = "json"
address = construct_address("a", "usd", "2017-09-12", data_type)

response = requests.get(address)

# Work on response.
contents = response.content
contents = contents.decode('utf-8')

value = extract_value(contents, data_type)
print(value)


3.5552
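
One possible refinement, sketched below, replaces the if/else with a dictionary that maps each format name to its parser, so that an unknown format raises an error instead of silently being treated as JSON. The names PARSERS, parse_json, parse_xml and parse_value are hypothetical, and the two sample strings are shortened versions of the responses shown earlier.

```python
import json
import xml.etree.ElementTree as ET


def parse_json(text):
    return json.loads(text)["rates"][0]["mid"]


def parse_xml(text):
    # Convert to float so that both parsers return the same type.
    return float(ET.fromstring(text).findtext(".//Mid"))


# Maps a format name to the function that understands it.
PARSERS = {"json": parse_json, "xml": parse_xml}


def parse_value(text, data_format):
    try:
        parser = PARSERS[data_format]
    except KeyError:
        raise ValueError("Unsupported format: {0}".format(data_format))
    return parser(text)


json_text = '{"rates":[{"mid":3.5552}]}'
xml_text = "<ExchangeRatesSeries><Rates><Rate><Mid>3.5552</Mid></Rate></Rates></ExchangeRatesSeries>"

print(parse_value(json_text, "json"))
print(parse_value(xml_text, "xml"))
```

Adding another format then only requires one new function and one new dictionary entry.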
