Mining acknowledgments in ADS

Author: David Jaimes

Date: 2017 June 24

Note: This Jupyter Notebook was created with Python 3.6.1 :: Anaconda 4.4.0 (x86_64).

Generate ADS API Token key

Go to https://ui.adsabs.harvard.edu, create an account, and log in. Then go to Account > Customize Settings > API Token and click the "Generate a new key" button.

Python code

Import package libraries.



In [2]:

    
import json
import numpy
import os
import requests
import matplotlib.pyplot as plt
%matplotlib inline

Read the ADS API token from the ADS_API_KEY environment variable and set the base URL for search queries.



In [3]:

    
ADS_KEY = os.environ["ADS_API_KEY"]
BASE_URL = "https://api.adsabs.harvard.edu/v1/search/query"
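If the environment variable is not set, the lookup above fails with a bare KeyError. A minimal sketch of a friendlier check follows, assuming the same variable name ADS_API_KEY; the helper name load_ads_token is hypothetical and not part of any ADS library.

```python
import os


def load_ads_token(env=None, name="ADS_API_KEY"):
    """Return the ADS token from the environment, or raise a clear error.

    `load_ads_token` is a hypothetical helper used for illustration only.
    """
    # Allow a plain dict to be passed in, which makes the check easy to test
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            "Set the {0} environment variable to your ADS API token "
            "before starting the notebook.".format(name)
        )
    return token
```

With this helper, the cell above could read `ADS_KEY = load_ads_token()`.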

Define query_acknowledgments function:



In [4]:

    
def query_acknowledgments(word):
    """Return the publication years of refereed papers whose
    acknowledgments mention `word`."""
    # Set query parameters; the search term is quoted so that multi-word
    # phrases are matched as a whole
    params = {
              'q': 'ack:"{0:s}" property:REFEREED'.format(word),
              'fl': 'pubdate',
              'rows': 200,
              'start': 0,
             }
    headers = {'Authorization': 'Bearer ' + ADS_KEY}
    pub_years = []
    while True:
        # Execute the query
        r = requests.get(BASE_URL, params=params, headers=headers)

        # Stop on errors instead of retrying the same request forever
        if r.status_code != requests.codes.ok:
            raise RuntimeError("Error retrieving results: HTTP {0:d}\n{1:s}"
                               .format(r.status_code, r.text))

        # Extract results
        data = json.loads(r.text)
        for d in data['response']['docs']:
            pub_years.append(float(d['pubdate'].split('-')[0]))

        # Update starting point
        params['start'] += params['rows']

        # Check if finished
        if params['start'] >= data["response"]["numFound"]:
            break
    return numpy.array(pub_years)
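The function above pages through the results 200 rows at a time until `start` reaches `numFound`. The sketch below isolates that paging logic so it can be checked without network access. The names `collect_pub_years`, `fetch_page`, and `fake_fetch` are hypothetical, and the fake response only mimics the `response.docs` and `response.numFound` layout used above.

```python
def collect_pub_years(fetch_page, rows=200):
    """Page through results and return the publication year of every doc.

    `fetch_page(start, rows)` is a hypothetical callable returning a dict
    shaped like the search response used above:
    {"response": {"numFound": int, "docs": [{"pubdate": "YYYY-MM-DD"}, ...]}}
    """
    years = []
    start = 0
    while True:
        response = fetch_page(start, rows)["response"]
        for doc in response["docs"]:
            # Keep only the year part of the pubdate string
            years.append(int(doc["pubdate"].split("-")[0]))
        start += rows
        # Stop once every result has been requested, or if a page comes
        # back empty (guards against looping forever on a bad response)
        if start >= response["numFound"] or not response["docs"]:
            break
    return years


def fake_fetch(start, rows):
    """Stand-in for the HTTP request: five made-up records served in pages."""
    dates = ["2014-01-00", "2014-06-00", "2015-03-00",
             "2016-11-00", "2016-12-00"]
    docs = [{"pubdate": d} for d in dates[start:start + rows]]
    return {"response": {"numFound": len(dates), "docs": docs}}


print(collect_pub_years(fake_fetch, rows=2))
```

Passing the fetch step in as a callable is a design choice made for this illustration only; it keeps the loop testable with made-up data.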

Define the total_number function and get the total number of refereed papers per year. These totals are used to normalize the keyword counts.



In [5]:

    
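def total_number(year):
    # Only the total (numFound) is needed, so request a single row
    params = {
          'q': 'pubdate:{0:s} property:REFEREED'.format(year),
          'rows': 1
          }
    headers = {'Authorization': 'Bearer ' + ADS_KEY}
    r = requests.get(BASE_URL, params=params, headers=headers)
    # Fail loudly on HTTP errors rather than parsing an error body
    r.raise_for_status()
    data = json.loads(r.text)
    return data["response"]["numFound"]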


YEARS = list(range(1995, 2017))
TOTAL_COUNT = []
for year in YEARS:
    date = '{0:04d}'.format(year)
    TOTAL_COUNT.append(total_number(date))
TOTAL_COUNT = numpy.array(TOTAL_COUNT)
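The yearly totals serve as the denominator when keyword counts are turned into percentages. A minimal sketch of that normalization, using made-up numbers rather than real ADS counts; `percent_of_total` is a hypothetical helper, and reporting 0 for a year with no papers is an assumption made for this example.

```python
import numpy


def percent_of_total(query_count, total_count):
    """Return query_count / total_count * 100, elementwise.

    Years with a total of zero yield 0 rather than a division warning.
    """
    query_count = numpy.asarray(query_count, dtype=float)
    total_count = numpy.asarray(total_count, dtype=float)
    percent = numpy.zeros_like(query_count)
    nonzero = total_count > 0
    percent[nonzero] = query_count[nonzero] / total_count[nonzero] * 100.0
    return percent


# Made-up numbers for illustration only, not real ADS counts:
# 100 of 20000 papers, 250 of 25000 papers, and an empty year
print(percent_of_total([100, 250, 0], [20000, 25000, 0]))
```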

Define plot_yearly_trend function:



In [6]:

    
def plot_yearly_trend(keyword, label=None):
    pub_years = query_acknowledgments(keyword)
    query_count = numpy.array([numpy.sum(pub_years == year) for year in YEARS])
    plt.plot(YEARS, query_count / TOTAL_COUNT * 100., label=label, lw=2,
             alpha=0.8)
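The per-year count above compares the full array of publication years against each year in turn. A sketch of an equivalent single-pass count with `collections.Counter` follows, checked on made-up data; `count_per_year` is a hypothetical name, and this is an alternative for illustration, not the method used in this notebook.

```python
from collections import Counter


def count_per_year(pub_years, years):
    """Count how many entries of pub_years fall in each year of `years`.

    Years outside `years` are ignored; years with no papers count as 0.
    """
    counts = Counter(int(y) for y in pub_years)
    return [counts.get(year, 0) for year in years]


# Made-up publication years (floats, as returned by the query function)
print(count_per_year([2014.0, 2014.0, 2016.0, 1990.0], [2014, 2015, 2016]))
```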

Plot online databases:



In [7]:

    
plot_yearly_trend('ned', label='NED')
plot_yearly_trend('simbad', label='Simbad')
plot_yearly_trend('vizier', label='Vizier')
plt.legend(loc=2)
plt.xlabel("Year")
plt.ylabel("% of papers mentioning various keywords");

Plot programming languages:



In [9]:

    
plot_yearly_trend('fortran', label='Fortran')
plot_yearly_trend('idl', label='IDL')
plot_yearly_trend('julia', label='Julia')
plot_yearly_trend('matlab', label='Matlab')
plot_yearly_trend('python', label='Python')
plt.legend(loc=2)
plt.xlabel("Year")
plt.ylabel("% of papers mentioning various keywords");

Plot ADS:



In [10]:

    
plot_yearly_trend('Astrophysics Data System', label='Astrophysics Data System')
plt.legend(loc=2)
plt.xlabel("Year")
plt.ylabel("% of papers mentioning various keywords");


