APIs

Let's start by looking at the OMDb API.

OMDb describes itself this way: "The OMDb API is a free web service to obtain movie information, all content and images on the site are contributed and maintained by our users."

The Python package urllib can be used to fetch resources from the internet.

OMDb tells us what kinds of requests we can make. We are going to do a title search. As you can see below, we have an additional parameter "&Season=1" which does not appear in the parameter tables. If you read through the change log, you will see it documented there.

Using the urllib and json packages allows us to call an API and store the results locally.


In [ ]:
import json
import urllib.request

In [ ]:
url = 'http://www.omdbapi.com/?t=Game%20of%20Thrones&Season=1'
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode('utf8'))
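
The URL above has its query string encoded by hand (%20 stands for a space). As a minimal sketch, the same URL can be assembled from a dictionary with urllib.parse.urlencode. The helper name build_omdb_url is made up for this illustration; it is not part of OMDb or urllib.

```python
from urllib.parse import quote, urlencode

OMDB_URL = 'http://www.omdbapi.com/'

def build_omdb_url(title, season):
    """Return an OMDb title-search URL (hypothetical helper for illustration)."""
    params = {'t': title, 'Season': season}
    # quote_via=quote encodes spaces as %20 rather than the default '+'
    return OMDB_URL + '?' + urlencode(params, quote_via=quote)

print(build_omdb_url('Game of Thrones', 1))
```

Building the URL this way avoids hand-encoding mistakes when a title contains spaces or punctuation.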

What should we expect the type to be for the variable data?


In [ ]:
print(type(data))

What do you think the data will look like?


In [ ]:
data

We now have a dictionary object of our data. We can use Python to manipulate it in a variety of ways. For example, we can print the title and IMDb rating of each episode.


In [ ]:
for episode in data['Episodes']:
    print(episode['Title'], episode['imdbRating'])
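
As another example of manipulating the dictionary, here is a rough sketch that picks the highest-rated episode. It assumes the ratings arrive as strings and that some may not be numeric; the sample list is made up and only mimics the shape of data['Episodes'].

```python
def best_episode(episodes):
    """Return the episode dict with the highest numeric imdbRating."""
    rated = []
    for episode in episodes:
        try:
            rated.append((float(episode['imdbRating']), episode))
        except ValueError:
            # skip ratings that are not numbers, such as 'N/A'
            continue
    return max(rated, key=lambda pair: pair[0])[1]

# made-up sample in the same shape as data['Episodes']
sample = [
    {'Title': 'Episode A', 'imdbRating': '8.9'},
    {'Title': 'Episode B', 'imdbRating': 'N/A'},
    {'Title': 'Episode C', 'imdbRating': '9.1'},
]
print(best_episode(sample)['Title'])
```

With the real response, the call would be best_episode(data['Episodes']).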

We can use pandas to convert the episode information to a dataframe.


In [ ]:
import pandas as pd

# data['Episodes'] is a list of dictionaries, one per episode
df = pd.DataFrame(data['Episodes'])

In [ ]:
df

And, we can save our data locally to use later.


In [ ]:
with open('omdb_api_data.json', 'w') as f:
    json.dump(data, f)
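
To use the saved file later, json.load reads it back into a dictionary. The sketch below does a full round trip on a made-up stand-in for the API response, writing to a temporary directory so it does not touch your real file; the helper name save_and_reload is hypothetical.

```python
import json
import os
import tempfile

def save_and_reload(obj, path):
    """Write obj to path as JSON, then read it back and return the result."""
    with open(path, 'w') as f:
        json.dump(obj, f)
    with open(path) as f:
        return json.load(f)

# made-up stand-in for the API response
sample = {'Title': 'Example Show', 'Episodes': [{'Title': 'Pilot'}]}
path = os.path.join(tempfile.mkdtemp(), 'omdb_api_data.json')
print(save_and_reload(sample, path) == sample)
```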

Let's try an API that requires an API key!

"The Digital Public Library of America brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world. It strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science."

And, they have an API.

In order to use the API, you need to request a key. You can do this with an HTTP POST request.

If you are using OS X or Linux, replace "YOUR_EMAIL@example.com" in the cell below with your email address and execute the cell. This will send the request to DPLA, and they will email your API key to the address you provided. To successfully query the API, you must include the ?api_key= parameter followed by your 32-character key.


In [ ]:
# execute this on OS X or Linux
! curl -v -XPOST http://api.dp.la/v2/api_key/YOUR_EMAIL@example.com

If you are on Windows 7 or 10, open PowerShell. Replace "YOUR_EMAIL@example.com" in the cell below with your email address. Copy the code and paste it at the command prompt in PowerShell. This will send the request to DPLA, and they will email your API key to the address you provided. To successfully query the API, you must include the ?api_key= parameter followed by your 32-character key.


In [ ]:
# execute this in PowerShell on Windows
Invoke-WebRequest -Uri ("http://api.dp.la/v2/api_key/YOUR_EMAIL@example.com") -Method POST -Verbose -usebasicparsing

You will get a response similar to what is shown below and will receive an email fairly quickly from DPLA with your key.

*   Trying 52.2.169.251...
* Connected to api.dp.la (52.2.169.251) port 80 (#0)
> POST /v2/api_key/YOUR_EMAIL@example.com HTTP/1.1
> Host: api.dp.la
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 201 Created
< Access-Control-Allow-Origin: *
< Cache-Control: max-age=0, private, must-revalidate
< Content-Type: application/json; charset=utf-8
< Date: Thu, 20 Oct 2016 20:53:24 GMT
< ETag: "8b66d9fe7ded79e3151d5a22f0580d99"
< Server: nginx/1.1.19
< Status: 201 Created
< X-Request-Id: d61618751a376452ac3540b3157dcf48
< X-Runtime: 0.179920
< X-UA-Compatible: IE=Edge,chrome=1
< Content-Length: 89
< Connection: keep-alive
< 
* Connection #0 to host api.dp.la left intact
{"message":"API key created and sent via email. Be sure to check your Spam folder, too."}
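
The same POST can also be prepared from Python with the urllib package already imported in this notebook. This is only a sketch: build_key_request is a made-up helper name, and the block builds the request without sending it so that it runs offline.

```python
import urllib.request

def build_key_request(email):
    """Build (but do not send) the POST request for a DPLA API key."""
    url = 'http://api.dp.la/v2/api_key/' + email
    # method='POST' with no body mirrors `curl -XPOST`
    return urllib.request.Request(url, method='POST')

req = build_key_request('YOUR_EMAIL@example.com')
print(req.get_method(), req.full_url)

# To actually send it, uncomment the next two lines:
# with urllib.request.urlopen(req) as response:
#     print(response.status, response.read().decode('utf8'))
```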

It is good practice not to put your keys in your code. You should store them in a file and read them in from there. If you are pushing your code to GitHub, make sure you put your key files in .gitignore.

I created a file on my drive called "dpla_config_secret.json". The contents of the file look like this:

{ "api_key" : "my api key here" }

I can then write code to read the information in.


In [ ]:
with open("../api/dpla_config_secret.json") as key_file:
    key = json.load(key_file)

In [ ]:
key
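
If the file is missing its "api_key" entry, the problem only shows up later when the query fails. One possible way to catch that early is a small loader that checks for the entry. The function name load_api_key and the placeholder key value are hypothetical, and the demo writes to a temporary directory rather than the real config file.

```python
import json
import os
import tempfile

def load_api_key(path):
    """Read the 'api_key' value from a JSON config file."""
    with open(path) as key_file:
        config = json.load(key_file)
    if not config.get('api_key'):
        raise KeyError("no 'api_key' entry found in " + path)
    return config['api_key']

# demo with a throwaway file and a placeholder key
demo_path = os.path.join(tempfile.mkdtemp(), 'dpla_config_secret.json')
with open(demo_path, 'w') as f:
    json.dump({'api_key': 'not-a-real-key'}, f)

print(load_api_key(demo_path))
```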

Then, when I create my API query, I can use a variable in place of my actual key.

The Requests library allows us to build URLs with different parameters. You build the parameters as a dictionary that contains key/value pairs for everything after the '?' in your URL.


In [ ]:
import requests

In [ ]:
# we are specifying our url and parameters here as variables
# requests URL-encodes the values for us, so we write spaces rather than '+'
url = 'http://api.dp.la/v2/items/'
params = {'api_key' : key['api_key'], 'q' : 'goats AND cats'}

In [ ]:
# we are creating a response object, r
r = requests.get(url, params=params)

In [ ]:
type(r)

In [ ]:
# we can look at the url that was created by requests with our specified variables
r.url
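
When checking the generated URL, it helps to know how parameter values get encoded. The sketch below uses the standard library's urlencode as a stand-in, on the assumption that Requests form-encodes dictionary values the same way; it runs without a network call or an API key.

```python
from urllib.parse import urlencode

# a space in a value is encoded as '+'
print(urlencode({'q': 'goats AND cats'}))    # q=goats+AND+cats

# a literal '+' in a value is encoded as '%2B'
print(urlencode({'q': 'goats+AND+cats'}))    # q=goats%2BAND%2Bcats
```

In other words, values in the dictionary should be written as plain text, and the encoding left to the library.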

In [ ]:
# we can check the status code of our request
r.status_code

In [ ]:
# we can look at the content of the response
print(r.content)
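
The raw bytes are JSON, so they can be parsed into a dictionary just like the OMDb data. Here is a minimal sketch on a made-up body: the 'count' field is the one discussed in this notebook, while 'docs' as the name of the item list is an assumption to check against the real response.

```python
import json

def summarize(body):
    """Return (total matches, items in this page) from a DPLA-style JSON body."""
    payload = json.loads(body)
    return payload['count'], len(payload['docs'])

# made-up body: 'docs' is an assumed field name
sample_body = b'{"count": 29, "docs": [{"id": "a"}, {"id": "b"}]}'
print(summarize(sample_body))
```

With a real response object, r.json() performs the same parsing and summarize(r.content) would work as well.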

By default, DPLA returns 10 items at a time. The count value shows that our query has 29 results. DPLA provides a page_size parameter that we can set to get up to 500 items at a time.


In [ ]:
params = {'api_key' : key['api_key'], 'q' : 'goats AND cats', 'page_size': 500}
r = requests.get(url, params=params)
print(r.content)

If we were working with an API that limited us to only 10 items at a time, we could write a loop to pull our data.

The file "seeclickfix_api.py" in the api folder of this repo is an example of how you can pull multiple pages of data from an API.
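
As a rough sketch of such a loop (not the code in "seeclickfix_api.py"), the function below keeps requesting pages until it has collected as many items as the response's count reports. The response shape ('count' and 'docs') and the names fetch_all and fake_fetch_page are assumptions for illustration; a fake page function stands in for the real API call so the block runs offline.

```python
def fetch_all(fetch_page, page_size):
    """Collect items from every page.

    fetch_page(page, page_size) must return a dict with a total 'count'
    and a list of 'docs' for that page (assumed response shape).
    """
    items = []
    page = 1
    while True:
        payload = fetch_page(page, page_size)
        items.extend(payload['docs'])
        # stop when everything is collected or a page comes back empty
        if not payload['docs'] or len(items) >= payload['count']:
            break
        page += 1
    return items

# stand-in for a real API call: serves 29 fake items, page by page
def fake_fetch_page(page, page_size):
    start = (page - 1) * page_size
    everything = list(range(29))
    return {'count': 29, 'docs': everything[start:start + page_size]}

print(len(fetch_all(fake_fetch_page, 10)))
```

With Requests, fetch_page could wrap a call such as requests.get(url, params=...).json(), passing the page number in whatever parameter the API documents for it.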