Graded = 7/8
NYT
All APIs: http://developer.nytimes.com/
Article search API: http://developer.nytimes.com/article_search_v2.json
Best-seller API: http://developer.nytimes.com/books_api.json#/Documentation
Test/build queries: http://developer.nytimes.com/
Tip: Remember to include your API key in all requests! You'll need to register to get one. And their interactive web thing for building queries is pretty bad.
In [2]:
import config
import requests
# load the API keys from the config file so they stay out of the notebook
nyt_articles_api = config.nyt_articles_api
nyt_books_api = config.nyt_books_api
nyt_movie_api = config.nyt_movie_api
response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api)
data = response.json()
# print(data)
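The tip about including the key in every request can be handled in one place. The sketch below is a minimal example, assuming a hypothetical helper named `build_url` and a placeholder key (neither is part of the NYT API or of this notebook). It uses only the standard library to attach `api-key` and URL-encode the other parameters.

```python
from urllib.parse import urlencode

ARTICLE_SEARCH_URL = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'

def build_url(base_url, api_key, **params):
    """Return base_url with the API key and any extra query parameters attached."""
    # build_url is a hypothetical helper; the key is always sent as 'api-key'
    query = {'api-key': api_key}
    query.update(params)
    # urlencode escapes spaces, quotes and other special characters
    return base_url + '?' + urlencode(query)

# 'YOUR_KEY' is a placeholder; in the notebook this would be config.nyt_articles_api
example_url = build_url(ARTICLE_SEARCH_URL, 'YOUR_KEY', q='Gaddafi Libya')
print(example_url)
```

The resulting string can be passed straight to `requests.get`, which keeps the key handling out of each individual query.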
1) What books topped the Hardcover Fiction NYT best-sellers list on Mother's Day in 2009 and 2010? How about Father's Day?
In [61]:
# Mother's Day: 2009-05-10 and 2010-05-09
# Father's Day: 2009-06-21 and 2010-06-20
dates = ['2009-05-10', '2010-05-09', '2009-06-21', '2010-06-20']
for date in dates:
    response = requests.get('https://api.nytimes.com/svc/books/v3/lists//.json?api-key=' + nyt_books_api + "&list-name=hardcover-fiction&published-date=" + date)
    bestseller_data = response.json()
    # take the first result, which this code treats as the top-ranked book
    results = bestseller_data['results'][0]
    print("The best selling book on", date, "was", results['book_details'][0]['title'])
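The four dates above were worked out by hand. As a cross-check, here is a small sketch that derives them, assuming the U.S. conventions that Mother's Day is the second Sunday in May and Father's Day is the third Sunday in June. The helper name `nth_sunday` is made up for this example.

```python
from datetime import date, timedelta

def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday of the given month."""
    first = date(year, month, 1)
    # weekday() is 0 for Monday and 6 for Sunday
    days_until_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_until_sunday + 7 * (n - 1))

holiday_dates = []
for year in (2009, 2010):
    # assumption: U.S. Mother's Day is the second Sunday in May
    holiday_dates.append(nth_sunday(year, 5, 2).isoformat())
for year in (2009, 2010):
    # assumption: U.S. Father's Day is the third Sunday in June
    holiday_dates.append(nth_sunday(year, 6, 3).isoformat())

print(holiday_dates)
```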
2) What are all the different book categories the NYT ranked on June 6, 2009? How about June 6, 2015?
In [85]:
response = requests.get('https://api.nytimes.com/svc/books/v3/lists/names.json?api-key=' + nyt_books_api)
bestseller_ldata = response.json()
# each result describes one best-seller list, including the oldest and newest dates it was published
# dates are YYYY-MM-DD strings, so comparing them as strings compares them chronologically
for date, label in [('2009-06-06', 'June 6, 2009'), ('2015-06-06', 'June 6, 2015')]:
    print("\nOn", label, "the NYT published the following best-seller lists:")
    for book_list in bestseller_ldata['results']:
        # keep the list if the date falls within its publication span (both ends included)
        if book_list['oldest_published_date'] <= date <= book_list['newest_published_date']:
            print(book_list['display_name'])
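The date filter can be tried without calling the API. This is a minimal sketch on made-up records (the list names and dates below are invented, not real API output). It assumes the same three field names used in the cell above.

```python
def lists_published_on(lists, date):
    """Return display names of lists whose publication span includes date."""
    # ISO dates (YYYY-MM-DD) sort the same way as strings and as calendar dates
    return [
        entry['display_name']
        for entry in lists
        if entry['oldest_published_date'] <= date <= entry['newest_published_date']
    ]

# made-up sample records, not real API output
sample_lists = [
    {'display_name': 'List A', 'oldest_published_date': '2008-06-08', 'newest_published_date': '2016-06-12'},
    {'display_name': 'List B', 'oldest_published_date': '2011-02-13', 'newest_published_date': '2016-06-12'},
    {'display_name': 'List C', 'oldest_published_date': '2008-06-08', 'newest_published_date': '2014-01-05'},
]

print(lists_published_on(sample_lists, '2009-06-06'))  # ['List A', 'List C']
print(lists_published_on(sample_lists, '2015-06-06'))  # ['List A', 'List B']
```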
3) Muammar Gaddafi's name can be transliterated many, many ways. His last name alone has a million and one versions: Gadafi, Gaddafi, Kadafi, and Qaddafi, to name a few. How many times has the New York Times referred to him by each of those names?
Tip: Add "Libya" to your search to make sure (-ish) you're talking about the right guy.
In [239]:
ppl = ['Gaddafi','Gadafi', 'Kadafi','Qaddafi']
for person in ppl:
    # fq yields a lot more results than plain q; still need to figure out the difference between hits and times mentioned
    # response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&fq=' + person + ' Libya')
    response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&q=' + person + ' Libya')
    muammar_data = response.json()
    # 'hits' is the number of matching articles, used here as the number of mentions
    print("Muammar was referred to as", person, muammar_data['response']['meta']['hits'], "times in the New York Times.")
# print(muammar_data)
# print(muammar_data['response']['docs'])
# print(muammar_data['response']['docs'][0]['keywords'])
# print(muammar_data['response']['docs'][0])
# keywords = []
# ppl = ['Gaddafi','Gadafi', 'Kadafi','Qaddafi']
#for article in muammar_data['response']['docs']:
# for keyword in article['keywords']:
# print(keyword['value'])
# for person in ppl:
#print(x)
# if person in keyword:
# print("print", keyword['value'], "was found")
# print(keyword['value'])
# keywords.append(keyword['value'])
#from collections import Counter
#counts = Counter(keywords)
#print(counts)
In [189]:
#for article in muammar_data['response']['docs']:
# print(article["keywords"])
In [190]:
# len(muammar_data['response']['docs'])
4) What's the title of the first story to mention the word 'hipster' in 1995? What's the first paragraph?
In [224]:
response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&q=hipster&begin_date=19950101&end_date=19951231&sort=oldest')
hipster_data = response.json()
# results are sorted oldest first, so the first doc is the earliest 1995 story
first_story = hipster_data['response']['docs'][0]
print("The first story to mention the word 'hipster' in 1995 was titled", first_story['headline']['kicker'] + "; " + first_story['headline']['main'])
# assumption: the 'lead_paragraph' field holds the story's first paragraph
print("Its first paragraph was:", first_story['lead_paragraph'])
5) How many times was gay marriage mentioned in the NYT between 1950-1959, 1960-1969, 1970-1979, 1980-1989, 1990-1999, 2000-2009, and 2010-present?
Tip: You'll want to put quotes around the search term so it isn't just looking for "gay" and "marriage" in the same article.
Tip: Write code to find the number of mentions between Jan 1, 1950 and Dec 31, 1959.
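To see what the quotes in the first tip change, the sketch below encodes the search term both ways with the standard library. It only builds the query string and sends nothing to the API.

```python
from urllib.parse import urlencode

# without quotes, the query is two separate words
loose_query = urlencode({'q': 'gay marriage'})
# with double quotes, the two words are sent as one quoted phrase
phrase_query = urlencode({'q': '"gay marriage"'})

print(loose_query)   # q=gay+marriage
print(phrase_query)  # q=%22gay+marriage%22
```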
In [294]:
#Ta-Stephan: Because you added to the start and end date early, the 1950s weren't counted.
start_date = 19500101
end_date = 19591231
for n in range(7):
    if n == 6:
        # the last range runs from 2010 to the present (hard-coded as June 9, 2016)
        end_date = 20160609
    # the search term is wrapped in double quotes so it is matched as a phrase
    response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&q="gay marriage"&begin_date=' + str(start_date) + '&end_date=' + str(end_date) + '&sort=oldest')
    gay_marriage_data = response.json()
    gay_marriage_hits = gay_marriage_data['response']['meta']['hits']
    # keep only the year (first four digits) for the printed summary
    start_str = str(start_date)[:4]
    end_str = str(end_date)[:4]
    print("There were", gay_marriage_hits, "mentions of gay marriage between", start_str, "and", end_str)
    # move on to the next decade only after the current one has been counted
    start_date = start_date + 100000
    end_date = end_date + 100000
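An alternative to adjusting integer dates inside the loop is to build all the date ranges first and then loop over them. The sketch below assumes a made-up helper named `decade_ranges`. The `'20160609'` cutoff mirrors the hard-coded end date in the cell above.

```python
def decade_ranges(first_year, last_decade_start, today):
    """Return (begin_date, end_date) pairs in YYYYMMDD form, one per decade."""
    ranges = []
    for year in range(first_year, last_decade_start + 1, 10):
        begin = f'{year}0101'
        end = f'{year + 9}1231'
        # the final decade is cut off at today's date
        if year == last_decade_start:
            end = today
        ranges.append((begin, end))
    return ranges

# '20160609' mirrors the hard-coded end date used in the notebook
ranges = decade_ranges(1950, 2010, '20160609')
for begin, end in ranges:
    print(begin, end)
```

Each pair can then be dropped into the `begin_date` and `end_date` parameters of the request, so the first decade is never skipped.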
6) What section talks about motorcycles the most?
Tip: You'll be using facets
In [298]:
response = requests.get('https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=' + nyt_articles_api + '&q=motorcycle&facet_field=section_name')
moto_data = response.json()
# print(moto_data['response']['facets']['section_name']['terms'])
# documentation found re: facets in NYT API
# https://data-gov.tw.rpi.edu/wiki/How_to_use_New_York_Times_Article_Search_API
moto_sections = moto_data['response']['facets']['section_name']['terms']
moto_count = 0
most_motos = ""
# keep track of the section with the highest count seen so far
for section in moto_sections:
    if section['count'] > moto_count:
        moto_count = section['count']
        most_motos = section['term']
print("The section of the New York Times that mentions motorcycles the most is the", most_motos, "section, which mentions motorcycles", moto_count, "times.")
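The same search for the largest count can be written with Python's built-in `max`. This sketch runs on made-up facet terms (the section names and counts are invented) shaped like the `terms` list used above.

```python
# made-up facet terms in the same shape as the 'terms' list used above
sample_terms = [
    {'term': 'Sports', 'count': 12},
    {'term': 'Automobiles', 'count': 30},
    {'term': 'World', 'count': 7},
]

# max() with a key function returns the whole dictionary with the largest count
top_section = max(sample_terms, key=lambda section: section['count'])
print(top_section['term'], top_section['count'])  # Automobiles 30
```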
7) How many of the last 20 movies reviewed by the NYT were Critics' Picks? How about the last 40? The last 60?
Tip: You really don't want to do this 3 separate times (1-20, 21-40 and 41-60) and add them together. What if, perhaps, you were able to figure out how to combine two lists? Then you could have a 1-20 list, a 1-40 list, and a 1-60 list, and then just run similar code for each of them.
In [46]:
criticPickCount = 0
# each request returns 20 reviews, so offsets 0, 20, 40 and 60 cover the last 80 reviews
for offset in [0, 20, 40, 60]:
    response = requests.get('https://api.nytimes.com/svc/movies/v2/reviews/search.json?api-key=' + nyt_movie_api + '&offset=' + str(offset))
    movie_data = response.json()
    for movie in movie_data['results']:
        if movie['critics_pick'] == 1:
            criticPickCount = criticPickCount + 1
    # the running total now covers every review up to offset + 20
    print("There were", criticPickCount, "Critics' Picks in the last", offset + 20, "movies that were reviewed.")
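The tip's idea of combining lists can be sketched without the API. The pages below are made-up stand-ins with two reviews each instead of 20, and `count_picks` is a hypothetical helper. Lists are joined with `+` and then sliced to get the cumulative subsets.

```python
# made-up review pages; real pages would come from the API, 20 reviews at a time
page_1 = [{'display_title': 'A', 'critics_pick': 1}, {'display_title': 'B', 'critics_pick': 0}]
page_2 = [{'display_title': 'C', 'critics_pick': 1}, {'display_title': 'D', 'critics_pick': 1}]
page_3 = [{'display_title': 'E', 'critics_pick': 0}, {'display_title': 'F', 'critics_pick': 0}]

# the + operator joins lists end to end
all_reviews = page_1 + page_2 + page_3
page_size = len(page_1)

def count_picks(reviews):
    """Count the reviews flagged as Critics' Picks."""
    return sum(1 for review in reviews if review['critics_pick'] == 1)

pick_counts = []
for n_pages in (1, 2, 3):
    # slicing from the start gives the cumulative list: first page, first two pages, and so on
    subset = all_reviews[:n_pages * page_size]
    pick_counts.append(count_picks(subset))
    print("Critics' Picks in the last", len(subset), "reviews:", pick_counts[-1])
```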
8) Out of the last 40 movie reviews from the NYT, which critic has written the most reviews?
In [34]:
from collections import Counter

authors = []
# two pages of 20 reviews each cover the last 40 reviews
for offset in [0, 20]:
    response = requests.get('https://api.nytimes.com/svc/movies/v2/reviews/search.json?api-key=' + nyt_movie_api + '&offset=' + str(offset))
    movie_data = response.json()
    # the critic's name is stored in the byline
    for movie in movie_data['results']:
        authors.append(movie['byline'])
counts = Counter(authors)
# most_common(1) returns a one-item list holding a (name, count) pair
top_critic, review_count = counts.most_common(1)[0]
print(top_critic, 'has written the most reviews out of the last 40 NYT reviews, with', review_count, 'reviews.')