Dates pivot table


In [1]:
from datetime import datetime
start = datetime.utcnow() # For measuring the total processing time

In [2]:
import json
from urllib.request import urlopen
import pandas as pd
import numpy as np



Get collection information from ArticleMeta


In [3]:
AMC_URL = "http://articlemeta.scielo.org/api/v1/collection/identifiers/"
amc_data = pd.DataFrame(json.load(urlopen(AMC_URL)))

print("Number of collections: " + str(amc_data.shape[0]))
amc_data.head(2)


Number of collections: 33
Out[3]:
acron acron2 code document_count domain has_analytics is_active journal_count name original_name status type
0 arg ar arg 39006.0 www.scielo.org.ar True True {'deceased': 22, 'current': 125} {'en': 'Argentina', 'pt': 'Argentina', 'es': '... Argentina certified journals
1 chl cl chl 63467.0 www.scielo.cl True True {'deceased': 13, 'suspended': 1, 'current': 105} {'en': 'Chile', 'pt': 'Chile', 'es': 'Chile'} Chile certified journals
Filter valid collections and rename 'code' to 'collection'

Some collections are not analyzed, mainly to avoid duplicates, because some articles appear in more than one collection. Part of spa (the Public Health collection) should be kept in the result, but spa is not a collection whose journals/articles are assigned to a single country, so it is excluded here and its journals are selected by ISSN below. The collections kept here are each linked to a single country:


In [4]:
dont_evaluate = ["bio", "cci", "cic", "ecu", "psi", "pry", "rve", "rvo", "rvt", "sss", "spa", "wid"]
amc_names_map = {"code": "collection"}
# Keep collections with a two-letter acron2 code and exclude every code
# listed in dont_evaluate (the ~ operator negates the isin mask)
amc_pairs = amc_data[(amc_data["acron2"].str.len() == 2) &
                     ~amc_data["code"].isin(dont_evaluate)].rename(columns=amc_names_map)
print("Number of collections: " + str(amc_pairs.shape[0]))

collections = amc_pairs[['collection']].copy()
collections


Number of collections: 14
Out[4]:
collection
0 arg
1 chl
2 col
3 cub
4 esp
5 mex
6 prt
8 scl
11 sza
12 ven
14 bol
15 cri
16 per
19 ury
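The filter above combines a length test on `acron2` with a negated `isin` mask. A minimal sketch on a made-up toy frame (hypothetical rows, not real ArticleMeta data) that also shows the number of collections is simply `shape[0]`, with no offset for the 0-based index; the surviving index labels are not contiguous, just as in the output above:

```python
import pandas as pd

# Hypothetical toy rows standing in for the ArticleMeta collection list
amc_toy = pd.DataFrame({
    "code": ["arg", "chl", "spa", "wid", "scl"],
    "acron2": ["ar", "cl", "sp", "", "br"],
})
dont_evaluate = ["spa", "wid"]

# Two-letter acron2 AND code not in the exclusion list (~ negates the mask)
mask = (amc_toy["acron2"].str.len() == 2) & ~amc_toy["code"].isin(dont_evaluate)
kept = amc_toy[mask].rename(columns={"code": "collection"})

# shape[0] is already the row count; the index labels (0, 1, 4) do not matter
print("Number of collections:", kept.shape[0])
print(kept["collection"].tolist())
```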

ISSN selection from spa

Only these journals of the spa collection are kept, identified by ISSN:


In [5]:
spa_issn_country = pd.DataFrame({"issn": [
    "0021-2571",
    "0042-9686",
    "1020-4989",
    "1555-7960",
]})
spa_issn_country # For collection = "spa", only!


Out[5]:
issn
0 0021-2571
1 0042-9686
2 1020-4989
3 1555-7960

Dates dataset

This dataset is the Network spreadsheet/CSV pack available on the SciELO Analytics reports web page.

Unzip the CSV file


In [6]:
import zipfile

# Use the Zip file in jcatalog/data/scielo
with zipfile.ZipFile("../../data/scielo/tabs_network_190210.zip", 'r') as zip_ref:
    zip_ref.extract('documents_dates.csv', 'csv_files')

In [7]:
df0 = pd.read_csv('csv_files/documents_dates.csv', keep_default_na=False, low_memory=False)
df0.shape


Out[7]:
(877068, 49)
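An alternative sketch that reads the member straight from the archive instead of extracting it to `csv_files/` first: `zipfile.ZipFile.open` returns a file-like object that `pd.read_csv` accepts. The archive here is built in memory from made-up rows, so the content is only a stand-in for the real pack. It also shows what `keep_default_na=False` does: empty fields stay as empty strings instead of becoming NaN.

```python
import io
import zipfile

import pandas as pd

# Build a small ZIP in memory with made-up rows (stand-in for the real pack)
csv_text = "collection,document published at year\nscl,1998\nscl,\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("documents_dates.csv", csv_text)

# Read the CSV member directly, without extracting it to disk
buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    with zf.open("documents_dates.csv") as member:
        df_toy = pd.read_csv(member, keep_default_na=False, low_memory=False)

# With keep_default_na=False the empty field is kept as "" rather than NaN
print(df_toy["document published at year"].tolist())
```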

Simplify the column names


In [8]:
names_map = {
    "ISSN SciELO": "issn",
    "title at SciELO":"title",
    "document publishing ID (PID SciELO)": "docs",
    "document type":"type",
    "document is citable": "is_citable",
    "document publishing year": "year"
}
df0.rename(columns=names_map, inplace=True)
df0.head(2)


Out[8]:
extraction date study unit collection issn ISSN's title title thematic areas title is agricultural sciences title is applied social sciences title is biological sciences ... document published at month document published at day document published in SciELO at document published in SciELO at year document published in SciELO at month document published in SciELO at day document updated in SciELO at document updated in SciELO at year document updated in SciELO at month document updated in SciELO at day
0 2019-02-10 document scl 0100-879X 0100-879X;1414-431X Brazilian Journal of Medical and Biological Re... Biological Sciences;Health Sciences 0 0 1 ... 08 1998-09-21 1998 9 21 2016-06-30 2016 6 30
1 2019-02-10 document scl 0100-879X 0100-879X;1414-431X Brazilian Journal of Medical and Biological Re... Biological Sciences;Health Sciences 0 0 1 ... 08 1998-09-21 1998 9 21 2016-06-30 2016 6 30

2 rows × 49 columns

Create a new DataFrame: keep only the selected spa journals and drop the collections that are not analyzed


In [9]:
df = pd.concat([
    pd.merge(df0[df0["collection"] != "spa"], collections,      how="inner", on="collection"),
    pd.merge(df0[df0["collection"] == "spa"], spa_issn_country, how="inner", on="issn"),
])
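An inner merge against a one-column frame keeps only the rows whose key appears in that frame, so the concatenation above acts as a filter. A toy sketch with made-up rows and a hypothetical ISSN list, comparing it with an equivalent boolean-mask version built from `isin`:

```python
import pandas as pd

# Made-up rows: two country collections, one excluded collection, two spa journals
df0_toy = pd.DataFrame({
    "collection": ["arg", "chl", "bio", "spa", "spa"],
    "issn": ["1111-1111", "2222-2222", "3333-3333", "0021-2571", "9999-9999"],
})
collections_toy = pd.DataFrame({"collection": ["arg", "chl"]})
spa_keep = pd.DataFrame({"issn": ["0021-2571"]})

# Merge-based selection, as in the cell above
via_merge = pd.concat([
    pd.merge(df0_toy[df0_toy["collection"] != "spa"], collections_toy, how="inner", on="collection"),
    pd.merge(df0_toy[df0_toy["collection"] == "spa"], spa_keep, how="inner", on="issn"),
], ignore_index=True)

# Equivalent boolean-mask selection
is_spa = df0_toy["collection"] == "spa"
via_mask = df0_toy[
    (~is_spa & df0_toy["collection"].isin(collections_toy["collection"]))
    | (is_spa & df0_toy["issn"].isin(spa_keep["issn"]))
].reset_index(drop=True)

print(via_merge)
print(via_merge.equals(via_mask))
```

One difference worth keeping in mind: if the right-hand frame contained duplicate keys, the merge would duplicate rows, while `isin` would not.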

In [10]:
df.head(2)


Out[10]:
extraction date study unit collection issn ISSN's title title thematic areas title is agricultural sciences title is applied social sciences title is biological sciences ... document published at month document published at day document published in SciELO at document published in SciELO at year document published in SciELO at month document published in SciELO at day document updated in SciELO at document updated in SciELO at year document updated in SciELO at month document updated in SciELO at day
0 2019-02-10 document scl 0100-879X 0100-879X;1414-431X Brazilian Journal of Medical and Biological Re... Biological Sciences;Health Sciences 0 0 1 ... 08 1998-09-21 1998 9 21 2016-06-30 2016 6 30
1 2019-02-10 document scl 0100-879X 0100-879X;1414-431X Brazilian Journal of Medical and Biological Re... Biological Sciences;Health Sciences 0 0 1 ... 08 1998-09-21 1998 9 21 2016-06-30 2016 6 30

2 rows × 49 columns


In [11]:
# compare
df0.shape


Out[11]:
(877068, 49)

In [12]:
df.shape


Out[12]:
(793648, 49)

In [13]:
set(df.collection)


Out[13]:
{'arg',
 'bol',
 'chl',
 'col',
 'cri',
 'cub',
 'esp',
 'mex',
 'per',
 'prt',
 'scl',
 'spa',
 'sza',
 'ury',
 'ven'}

Add pub_year (publication year; years up to 1996 are grouped as ate_1996, "até 1996")


In [14]:
df["pub_year"] = np.where(df['year'] <= 1996, 'ate_1996', df["year"])
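The result of this `np.where` is a column of strings, which is why the pivot-table column labels further down are strings such as `'1997'`. A toy sketch on made-up years; converting the years with `astype(str)` is a small change from the cell above that makes the string result explicit rather than relying on NumPy's type promotion:

```python
import numpy as np
import pandas as pd

years = pd.Series([1990, 1996, 1997, 2018])  # made-up publication years

# Years up to 1996 collapse into one bucket; later years keep their own label
pub_year = np.where(years <= 1996, "ate_1996", years.astype(str))
print(pub_year.tolist())  # ['ate_1996', 'ate_1996', '1997', '2018']
```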

Convert the date-part columns from strings to numbers (empty or invalid values become NaN)


In [15]:
df['document published at year'] = pd.to_numeric(df['document published at year'], errors='coerce')
df['document published at month'] = pd.to_numeric(df['document published at month'], errors='coerce')

df['document accepted at year'] = pd.to_numeric(df['document accepted at year'], errors='coerce')
df['document accepted at month'] = pd.to_numeric(df['document accepted at month'], errors='coerce')

df['document submitted at year'] = pd.to_numeric(df['document submitted at year'], errors='coerce')
df['document submitted at month'] = pd.to_numeric(df['document submitted at month'], errors='coerce')
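Because the CSV was read with `keep_default_na=False`, missing date parts arrive as empty strings, and `pd.to_numeric(..., errors='coerce')` turns anything that is not a number into NaN. Since NaN forces a float dtype, these columns end up as floats, not ints. A toy sketch with made-up values; the loop over a hypothetical `date_parts` list is just a more compact way to write the per-column calls above:

```python
import pandas as pd

# Made-up raw values as read with keep_default_na=False ("" for missing)
df_toy = pd.DataFrame({
    "document submitted at year": ["2016", "", "n/a"],
    "document submitted at month": ["08", "", "3"],
})

# Hypothetical compact version of the per-column conversions
date_parts = ["document submitted at year", "document submitted at month"]
for col in date_parts:
    df_toy[col] = pd.to_numeric(df_toy[col], errors="coerce")

print(df_toy.dtypes)
print(df_toy.iloc[0].tolist())  # [2016.0, 8.0]
```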

Get the current year


In [16]:
current_year = datetime.now().year
print(current_year)


2019

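Add date-validity check columns (0 = valid, 1 = invalid or missing). A date is valid when its year is between 1997 and the current year and its month is between 1 and 12; the SciELO publication date must also have a day between 1 and 31.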


In [17]:
df['check_doc_pub_scielo'] = np.where(
    (df['document published in SciELO at year'] >= 1997) & 
    (df['document published in SciELO at year'] <= current_year) & 
    (df['document published in SciELO at month'] >= 1) & 
    (df['document published in SciELO at month'] <= 12) &
    (df['document published in SciELO at day'] >= 1) & 
    (df['document published in SciELO at day'] <= 31), 0,1)

In [18]:
df['check_doc_pub'] = np.where(
    (df['document published at year'] >= 1997) & 
    (df['document published at year'] <= current_year) & 
    (df['document published at month'] >= 1) & 
    (df['document published at month'] <= 12), 0, 1)

In [19]:
df['check_doc_accepted'] = np.where(
    (df['document accepted at year'] >= 1997) & 
    (df['document accepted at year'] <= current_year) & 
    (df['document accepted at month'] >= 1) & 
    (df['document accepted at month'] <= 12), 0, 1)

In [20]:
df['check_doc_submitted'] = np.where(
    (df['document submitted at year'] >= 1997) & 
    (df['document submitted at year'] <= current_year) & 
    (df['document submitted at month'] >= 1) & 
    (df['document submitted at month'] <= 12), 0, 1)
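Each check column is 1 unless every condition holds, and because any comparison with NaN is False, a missing or non-numeric date part also yields 1. A minimal sketch of the same rule as a reusable function; `date_check` and its arguments are hypothetical names, not part of the notebook:

```python
import numpy as np
import pandas as pd

def date_check(df, prefix, last_year, with_day=False):
    """Return 0 where the date is plausible, 1 otherwise (hypothetical helper)."""
    year = df[f"{prefix} year"]
    month = df[f"{prefix} month"]
    ok = year.between(1997, last_year) & month.between(1, 12)
    if with_day:
        ok &= df[f"{prefix} day"].between(1, 31)
    # NaN fails every comparison, so missing values end up flagged as 1
    return np.where(ok, 0, 1)

df_toy = pd.DataFrame({
    "document accepted at year": [2016, 1967, np.nan, 2030],
    "document accepted at month": [5, 6, 4, 1],
})
print(date_check(df_toy, "document accepted at", last_year=2019))  # [0 1 1 1]
```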

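Add columns with the interval in months between two dates. In the column names, meses = months, sub = submission, aprov = acceptance and pub = publication; the _scielo variants use the date of publication in SciELO. An interval is computed only when both of its dates pass their checks; otherwise it is NaN.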


In [21]:
df['meses_sub_aprov'] = np.where(
    (df.check_doc_submitted == 0) & (df.check_doc_accepted == 0),
    (df['document accepted at year'] * 12 + df['document accepted at month']) - 
    (df['document submitted at year'] * 12 + df['document submitted at month']), np.nan)

In [22]:
df['meses_aprov_pub'] = np.where(
    (df.check_doc_accepted == 0) & (df.check_doc_pub == 0),
    (df['document published at year'] * 12 + df['document published at month']) - 
    (df['document accepted at year'] * 12 + df['document accepted at month']), np.nan)

In [23]:
df['meses_sub_pub'] = np.where(
    (df.check_doc_submitted == 0) & (df.check_doc_pub == 0),
    (df['document published at year'] * 12 + df['document published at month']) - 
    (df['document submitted at year'] * 12 + df['document submitted at month']), np.nan)

In [24]:
df['meses_aprov_pub_scielo'] = np.where(
    (df.check_doc_accepted == 0) & (df.check_doc_pub_scielo == 0),
    (df['document published in SciELO at year'] * 12 + df['document published in SciELO at month']) - 
    (df['document accepted at year'] * 12 + df['document accepted at month']), np.nan)

In [25]:
df['meses_sub_pub_scielo'] = np.where(
    (df.check_doc_submitted == 0) & (df.check_doc_pub_scielo == 0),
    (df['document published in SciELO at year'] * 12 + df['document published in SciELO at month']) - 
    (df['document submitted at year'] * 12 + df['document submitted at month']), np.nan)
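Each interval converts a (year, month) pair into a month count, year × 12 + month, and subtracts. For example, a document submitted in November 2015 and accepted in February 2016 gives (2016 × 12 + 2) - (2015 × 12 + 11) = 24194 - 24191 = 3 months. Days are ignored, so intervals are only accurate to about a month, and a negative value means the later-stage date precedes the earlier one in the record (some negative means appear in the pivot output). A small sketch of the same arithmetic; `months_between` is a hypothetical helper name:

```python
import numpy as np

def months_between(start_year, start_month, end_year, end_month):
    """Whole-month difference between two (year, month) dates; days are ignored."""
    return (end_year * 12 + end_month) - (start_year * 12 + start_month)

# Submitted Nov 2015, accepted Feb 2016
print(months_between(2015, 11, 2016, 2))  # 3

# Element-wise on arrays as well; missing values simply propagate as NaN
sub_y = np.array([2015.0, 2017.0, np.nan])
sub_m = np.array([11.0, 6.0, 1.0])
acc_y = np.array([2016.0, 2017.0, 2018.0])
acc_m = np.array([2.0, 4.0, 3.0])
print(months_between(sub_y, sub_m, acc_y, acc_m))
```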

Filter citable documents


In [26]:
dfcit = df[df['is_citable'] == 1]
dfcit.shape


Out[26]:
(700756, 59)

Pivot table: mean and standard deviation of each interval per journal (ISSN) and pub_year


In [27]:
values_list = ['meses_sub_aprov', 
               'meses_aprov_pub', 
               'meses_sub_pub', 
               'meses_aprov_pub_scielo', 
               'meses_sub_pub_scielo']

td = dfcit.pivot_table(
     index=["issn"],
     values=values_list,
     columns=["pub_year"],
     aggfunc=[np.nanmean, np.nanstd],
     fill_value="")


/home/ednilson/.virtualenvs/jupyter/local/lib/python3.6/site-packages/pandas/core/groupby/groupby.py:1062: RuntimeWarning: Mean of empty slice
  f = lambda x: func(x, *args, **kwargs)
/home/ednilson/.virtualenvs/jupyter/local/lib/python3.6/site-packages/numpy/lib/nanfunctions.py:1545: RuntimeWarning: Degrees of freedom <= 0 for slice.
  keepdims=keepdims)
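A toy sketch of the shape of the result, on made-up rows. It swaps in pandas' built-in "mean" and "std" aggregations for `np.nanmean` and `np.nanstd`: both skip NaN, but pandas' `std` uses the sample formula (ddof=1) while `np.nanstd` defaults to the population formula (ddof=0), so the standard deviations differ between the two. The warnings above come from groups in which every value is NaN.

```python
import pandas as pd

# Made-up citable documents: two journals, two publication-year buckets
dfcit_toy = pd.DataFrame({
    "issn": ["A", "A", "B", "B"],
    "pub_year": ["2018", "2018", "2018", "2019"],
    "meses_sub_aprov": [2.0, 4.0, 6.0, 8.0],
})

# "mean" and "std" are pandas' built-in aggregations (NaN is skipped by default)
td_toy = dfcit_toy.pivot_table(
    index=["issn"],
    values=["meses_sub_aprov"],
    columns=["pub_year"],
    aggfunc=["mean", "std"],
)

# Columns form a 3-level MultiIndex: (aggregation, measure, pub_year)
print(td_toy.columns.nlevels)  # 3
print(td_toy.loc["A", ("mean", "meses_sub_aprov", "2018")])  # 3.0
```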

In [28]:
td.head(10).T


Out[28]:
issn 0001-3714 0001-3765 0001-6002 0001-6365 0002-0591 0002-192X 0002-7014 0003-2573 0004-0592 0004-0614
pub_year
nanmean meses_aprov_pub 1997
1998 1.46875
1999 1.42553
2000 5.10638 4.22222 6.16667
2001 5.71739 2.15 6.8125
2002 5.16 3.45455 6.95238
2003 4.675 2.5 8.5
2004 5.10976 3.26087 11.7826
2005 6.91667 3.26087 11.9286 10.0357
2006 7.70968 4.56667 16.5484 11.2041
2007 8 3.54839 14.4348 4.96552 6.43396
2008 6.44068 5.8125 14.1791 6.55102 3.55556
2009 7.0274 8.19231 14.4712 5.13636 4.52174 19.5897
2010 8.86022 5.22727 15.1667 -3.86667 7.08571 23.9714
2011 7.57009 4.2 -12.3636 1.71429 16.5366
2012 10.8081 4.83871 -13.6667 3.3913 9.64062
2013 9.86395 4.48276 11.4444 -12.2778 7 9.51429
2014 8.05405 4.76923 10.5385 -16.3 10.0714 5.97368
2015 7.56618 5 5.67857
2016 9.0359 7.2069
2017 6.1032 7.52
2018 7.85502 4 8.64
2019 6 5
ate_1996
meses_aprov_pub_scielo 1997
1998 7.53125
1999 8.06349
2000 7.38298 113.333 68.8333
2001 5.97826 87.6 57.5938
2002 6.46 88.6364 45.1905
... ... ... ... ... ... ... ... ... ... ... ... ...
nanstd meses_sub_pub 2015 4.78488 3.07613 4.72456
2016 4.16104 5.6816
2017 5.85289 4.52124
2018 6.65419 4.2459 4.81929
2019 0 5.93191
ate_1996
meses_sub_pub_scielo 1997
1998 4.33632
1999 6.95057
2000 5.22876 3.65148 4.71298
2001 5.50477 26.8892 3.94382
2002 5.16624 6.13973 4.96792
2003 10.5067 3.61613 4.45897 8.02557
2004 5.55682 3.34867 3.4527 6.61438
2005 4.57727 16.2863 7.49365 8.73957 8.51462
2006 4.34889 3.97157 3.71266 5.87918 4.72289
2007 8.30634 11.3478 4.52797 8.45538 8.26919
2008 6.77534 6.16663 8.64687 6.39515 4.84513 2.38537
2009 4.01793 9.55527 6.93194 10.5727 5.61425 7.02256
2010 5.23306 8.46346 3.4154 8.19241 6.98219 7.57047
2011 6.97633 4.29809 10.1592 4.38748 10.4556
2012 5.7614 4.40957 11.7364 7.95512 7.33939 7.55499
2013 5.911 3.52973 12.6745 6.06956 8.94841 6.97033
2014 4.70691 3.3435 5.00177 0.916515 8.66047 7.82911
2015 4.79202 3.10303 5
2016 4.50642 5.18206
2017 6.79606 4.23585
2018 6.67466 4.37277 4.34953
2019 0 5.93191
ate_1996

240 rows × 10 columns


In [29]:
td.columns.levels


Out[29]:
FrozenList([['nanmean', 'nanstd'], ['meses_aprov_pub', 'meses_aprov_pub_scielo', 'meses_sub_aprov', 'meses_sub_pub', 'meses_sub_pub_scielo'], ['1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', 'ate_1996']])

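Rename the column labels for the CSV: each (aggregation, measure, pub_year) tuple becomes a single label such as media_meses_aprov_pub_1997 (media = mean, desvp = standard deviation)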


In [30]:
for k in td.keys():
    print(k)


('nanmean', 'meses_aprov_pub', '1997')
('nanmean', 'meses_aprov_pub', '1998')
('nanmean', 'meses_aprov_pub', '1999')
('nanmean', 'meses_aprov_pub', '2000')
('nanmean', 'meses_aprov_pub', '2001')
('nanmean', 'meses_aprov_pub', '2002')
('nanmean', 'meses_aprov_pub', '2003')
('nanmean', 'meses_aprov_pub', '2004')
('nanmean', 'meses_aprov_pub', '2005')
('nanmean', 'meses_aprov_pub', '2006')
('nanmean', 'meses_aprov_pub', '2007')
('nanmean', 'meses_aprov_pub', '2008')
('nanmean', 'meses_aprov_pub', '2009')
('nanmean', 'meses_aprov_pub', '2010')
('nanmean', 'meses_aprov_pub', '2011')
('nanmean', 'meses_aprov_pub', '2012')
('nanmean', 'meses_aprov_pub', '2013')
('nanmean', 'meses_aprov_pub', '2014')
('nanmean', 'meses_aprov_pub', '2015')
('nanmean', 'meses_aprov_pub', '2016')
('nanmean', 'meses_aprov_pub', '2017')
('nanmean', 'meses_aprov_pub', '2018')
('nanmean', 'meses_aprov_pub', '2019')
('nanmean', 'meses_aprov_pub', 'ate_1996')
('nanmean', 'meses_aprov_pub_scielo', '1997')
('nanmean', 'meses_aprov_pub_scielo', '1998')
('nanmean', 'meses_aprov_pub_scielo', '1999')
('nanmean', 'meses_aprov_pub_scielo', '2000')
('nanmean', 'meses_aprov_pub_scielo', '2001')
('nanmean', 'meses_aprov_pub_scielo', '2002')
('nanmean', 'meses_aprov_pub_scielo', '2003')
('nanmean', 'meses_aprov_pub_scielo', '2004')
('nanmean', 'meses_aprov_pub_scielo', '2005')
('nanmean', 'meses_aprov_pub_scielo', '2006')
('nanmean', 'meses_aprov_pub_scielo', '2007')
('nanmean', 'meses_aprov_pub_scielo', '2008')
('nanmean', 'meses_aprov_pub_scielo', '2009')
('nanmean', 'meses_aprov_pub_scielo', '2010')
('nanmean', 'meses_aprov_pub_scielo', '2011')
('nanmean', 'meses_aprov_pub_scielo', '2012')
('nanmean', 'meses_aprov_pub_scielo', '2013')
('nanmean', 'meses_aprov_pub_scielo', '2014')
('nanmean', 'meses_aprov_pub_scielo', '2015')
('nanmean', 'meses_aprov_pub_scielo', '2016')
('nanmean', 'meses_aprov_pub_scielo', '2017')
('nanmean', 'meses_aprov_pub_scielo', '2018')
('nanmean', 'meses_aprov_pub_scielo', '2019')
('nanmean', 'meses_aprov_pub_scielo', 'ate_1996')
('nanmean', 'meses_sub_aprov', '1997')
('nanmean', 'meses_sub_aprov', '1998')
('nanmean', 'meses_sub_aprov', '1999')
('nanmean', 'meses_sub_aprov', '2000')
('nanmean', 'meses_sub_aprov', '2001')
('nanmean', 'meses_sub_aprov', '2002')
('nanmean', 'meses_sub_aprov', '2003')
('nanmean', 'meses_sub_aprov', '2004')
('nanmean', 'meses_sub_aprov', '2005')
('nanmean', 'meses_sub_aprov', '2006')
('nanmean', 'meses_sub_aprov', '2007')
('nanmean', 'meses_sub_aprov', '2008')
('nanmean', 'meses_sub_aprov', '2009')
('nanmean', 'meses_sub_aprov', '2010')
('nanmean', 'meses_sub_aprov', '2011')
('nanmean', 'meses_sub_aprov', '2012')
('nanmean', 'meses_sub_aprov', '2013')
('nanmean', 'meses_sub_aprov', '2014')
('nanmean', 'meses_sub_aprov', '2015')
('nanmean', 'meses_sub_aprov', '2016')
('nanmean', 'meses_sub_aprov', '2017')
('nanmean', 'meses_sub_aprov', '2018')
('nanmean', 'meses_sub_aprov', '2019')
('nanmean', 'meses_sub_aprov', 'ate_1996')
('nanmean', 'meses_sub_pub', '1997')
('nanmean', 'meses_sub_pub', '1998')
('nanmean', 'meses_sub_pub', '1999')
('nanmean', 'meses_sub_pub', '2000')
('nanmean', 'meses_sub_pub', '2001')
('nanmean', 'meses_sub_pub', '2002')
('nanmean', 'meses_sub_pub', '2003')
('nanmean', 'meses_sub_pub', '2004')
('nanmean', 'meses_sub_pub', '2005')
('nanmean', 'meses_sub_pub', '2006')
('nanmean', 'meses_sub_pub', '2007')
('nanmean', 'meses_sub_pub', '2008')
('nanmean', 'meses_sub_pub', '2009')
('nanmean', 'meses_sub_pub', '2010')
('nanmean', 'meses_sub_pub', '2011')
('nanmean', 'meses_sub_pub', '2012')
('nanmean', 'meses_sub_pub', '2013')
('nanmean', 'meses_sub_pub', '2014')
('nanmean', 'meses_sub_pub', '2015')
('nanmean', 'meses_sub_pub', '2016')
('nanmean', 'meses_sub_pub', '2017')
('nanmean', 'meses_sub_pub', '2018')
('nanmean', 'meses_sub_pub', '2019')
('nanmean', 'meses_sub_pub', 'ate_1996')
('nanmean', 'meses_sub_pub_scielo', '1997')
('nanmean', 'meses_sub_pub_scielo', '1998')
('nanmean', 'meses_sub_pub_scielo', '1999')
('nanmean', 'meses_sub_pub_scielo', '2000')
('nanmean', 'meses_sub_pub_scielo', '2001')
('nanmean', 'meses_sub_pub_scielo', '2002')
('nanmean', 'meses_sub_pub_scielo', '2003')
('nanmean', 'meses_sub_pub_scielo', '2004')
('nanmean', 'meses_sub_pub_scielo', '2005')
('nanmean', 'meses_sub_pub_scielo', '2006')
('nanmean', 'meses_sub_pub_scielo', '2007')
('nanmean', 'meses_sub_pub_scielo', '2008')
('nanmean', 'meses_sub_pub_scielo', '2009')
('nanmean', 'meses_sub_pub_scielo', '2010')
('nanmean', 'meses_sub_pub_scielo', '2011')
('nanmean', 'meses_sub_pub_scielo', '2012')
('nanmean', 'meses_sub_pub_scielo', '2013')
('nanmean', 'meses_sub_pub_scielo', '2014')
('nanmean', 'meses_sub_pub_scielo', '2015')
('nanmean', 'meses_sub_pub_scielo', '2016')
('nanmean', 'meses_sub_pub_scielo', '2017')
('nanmean', 'meses_sub_pub_scielo', '2018')
('nanmean', 'meses_sub_pub_scielo', '2019')
('nanmean', 'meses_sub_pub_scielo', 'ate_1996')
('nanstd', 'meses_aprov_pub', '1997')
('nanstd', 'meses_aprov_pub', '1998')
('nanstd', 'meses_aprov_pub', '1999')
('nanstd', 'meses_aprov_pub', '2000')
('nanstd', 'meses_aprov_pub', '2001')
('nanstd', 'meses_aprov_pub', '2002')
('nanstd', 'meses_aprov_pub', '2003')
('nanstd', 'meses_aprov_pub', '2004')
('nanstd', 'meses_aprov_pub', '2005')
('nanstd', 'meses_aprov_pub', '2006')
('nanstd', 'meses_aprov_pub', '2007')
('nanstd', 'meses_aprov_pub', '2008')
('nanstd', 'meses_aprov_pub', '2009')
('nanstd', 'meses_aprov_pub', '2010')
('nanstd', 'meses_aprov_pub', '2011')
('nanstd', 'meses_aprov_pub', '2012')
('nanstd', 'meses_aprov_pub', '2013')
('nanstd', 'meses_aprov_pub', '2014')
('nanstd', 'meses_aprov_pub', '2015')
('nanstd', 'meses_aprov_pub', '2016')
('nanstd', 'meses_aprov_pub', '2017')
('nanstd', 'meses_aprov_pub', '2018')
('nanstd', 'meses_aprov_pub', '2019')
('nanstd', 'meses_aprov_pub', 'ate_1996')
('nanstd', 'meses_aprov_pub_scielo', '1997')
('nanstd', 'meses_aprov_pub_scielo', '1998')
('nanstd', 'meses_aprov_pub_scielo', '1999')
('nanstd', 'meses_aprov_pub_scielo', '2000')
('nanstd', 'meses_aprov_pub_scielo', '2001')
('nanstd', 'meses_aprov_pub_scielo', '2002')
('nanstd', 'meses_aprov_pub_scielo', '2003')
('nanstd', 'meses_aprov_pub_scielo', '2004')
('nanstd', 'meses_aprov_pub_scielo', '2005')
('nanstd', 'meses_aprov_pub_scielo', '2006')
('nanstd', 'meses_aprov_pub_scielo', '2007')
('nanstd', 'meses_aprov_pub_scielo', '2008')
('nanstd', 'meses_aprov_pub_scielo', '2009')
('nanstd', 'meses_aprov_pub_scielo', '2010')
('nanstd', 'meses_aprov_pub_scielo', '2011')
('nanstd', 'meses_aprov_pub_scielo', '2012')
('nanstd', 'meses_aprov_pub_scielo', '2013')
('nanstd', 'meses_aprov_pub_scielo', '2014')
('nanstd', 'meses_aprov_pub_scielo', '2015')
('nanstd', 'meses_aprov_pub_scielo', '2016')
('nanstd', 'meses_aprov_pub_scielo', '2017')
('nanstd', 'meses_aprov_pub_scielo', '2018')
('nanstd', 'meses_aprov_pub_scielo', '2019')
('nanstd', 'meses_aprov_pub_scielo', 'ate_1996')
('nanstd', 'meses_sub_aprov', '1997')
('nanstd', 'meses_sub_aprov', '1998')
('nanstd', 'meses_sub_aprov', '1999')
('nanstd', 'meses_sub_aprov', '2000')
('nanstd', 'meses_sub_aprov', '2001')
('nanstd', 'meses_sub_aprov', '2002')
('nanstd', 'meses_sub_aprov', '2003')
('nanstd', 'meses_sub_aprov', '2004')
('nanstd', 'meses_sub_aprov', '2005')
('nanstd', 'meses_sub_aprov', '2006')
('nanstd', 'meses_sub_aprov', '2007')
('nanstd', 'meses_sub_aprov', '2008')
('nanstd', 'meses_sub_aprov', '2009')
('nanstd', 'meses_sub_aprov', '2010')
('nanstd', 'meses_sub_aprov', '2011')
('nanstd', 'meses_sub_aprov', '2012')
('nanstd', 'meses_sub_aprov', '2013')
('nanstd', 'meses_sub_aprov', '2014')
('nanstd', 'meses_sub_aprov', '2015')
('nanstd', 'meses_sub_aprov', '2016')
('nanstd', 'meses_sub_aprov', '2017')
('nanstd', 'meses_sub_aprov', '2018')
('nanstd', 'meses_sub_aprov', '2019')
('nanstd', 'meses_sub_aprov', 'ate_1996')
('nanstd', 'meses_sub_pub', '1997')
('nanstd', 'meses_sub_pub', '1998')
('nanstd', 'meses_sub_pub', '1999')
('nanstd', 'meses_sub_pub', '2000')
('nanstd', 'meses_sub_pub', '2001')
('nanstd', 'meses_sub_pub', '2002')
('nanstd', 'meses_sub_pub', '2003')
('nanstd', 'meses_sub_pub', '2004')
('nanstd', 'meses_sub_pub', '2005')
('nanstd', 'meses_sub_pub', '2006')
('nanstd', 'meses_sub_pub', '2007')
('nanstd', 'meses_sub_pub', '2008')
('nanstd', 'meses_sub_pub', '2009')
('nanstd', 'meses_sub_pub', '2010')
('nanstd', 'meses_sub_pub', '2011')
('nanstd', 'meses_sub_pub', '2012')
('nanstd', 'meses_sub_pub', '2013')
('nanstd', 'meses_sub_pub', '2014')
('nanstd', 'meses_sub_pub', '2015')
('nanstd', 'meses_sub_pub', '2016')
('nanstd', 'meses_sub_pub', '2017')
('nanstd', 'meses_sub_pub', '2018')
('nanstd', 'meses_sub_pub', '2019')
('nanstd', 'meses_sub_pub', 'ate_1996')
('nanstd', 'meses_sub_pub_scielo', '1997')
('nanstd', 'meses_sub_pub_scielo', '1998')
('nanstd', 'meses_sub_pub_scielo', '1999')
('nanstd', 'meses_sub_pub_scielo', '2000')
('nanstd', 'meses_sub_pub_scielo', '2001')
('nanstd', 'meses_sub_pub_scielo', '2002')
('nanstd', 'meses_sub_pub_scielo', '2003')
('nanstd', 'meses_sub_pub_scielo', '2004')
('nanstd', 'meses_sub_pub_scielo', '2005')
('nanstd', 'meses_sub_pub_scielo', '2006')
('nanstd', 'meses_sub_pub_scielo', '2007')
('nanstd', 'meses_sub_pub_scielo', '2008')
('nanstd', 'meses_sub_pub_scielo', '2009')
('nanstd', 'meses_sub_pub_scielo', '2010')
('nanstd', 'meses_sub_pub_scielo', '2011')
('nanstd', 'meses_sub_pub_scielo', '2012')
('nanstd', 'meses_sub_pub_scielo', '2013')
('nanstd', 'meses_sub_pub_scielo', '2014')
('nanstd', 'meses_sub_pub_scielo', '2015')
('nanstd', 'meses_sub_pub_scielo', '2016')
('nanstd', 'meses_sub_pub_scielo', '2017')
('nanstd', 'meses_sub_pub_scielo', '2018')
('nanstd', 'meses_sub_pub_scielo', '2019')
('nanstd', 'meses_sub_pub_scielo', 'ate_1996')

In [31]:
# Flatten each (aggregation, measure, pub_year) column tuple into one label
newlabel = []
for k in td.keys():
    newlabel.append(k[0]
                    .replace('nanmean', 'media')
                    .replace('nanstd', 'desvp')+'_'+k[1]+'_'+k[2])
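The same flattening can be written as a list comprehension over the column tuples, with a dictionary lookup for the abbreviations. A sketch on a hypothetical two-bucket column index; a lookup instead of chained `str.replace` calls avoids accidental substring replacements:

```python
import pandas as pd

# Hypothetical column index shaped like td.columns: (aggregation, measure, year)
cols = pd.MultiIndex.from_product(
    [["nanmean", "nanstd"], ["meses_sub_aprov"], ["2018", "ate_1996"]]
)

stat_names = {"nanmean": "media", "nanstd": "desvp"}
flat = [f"{stat_names[stat]}_{measure}_{year}" for stat, measure, year in cols]
print(flat)
```

Assigning such a list back with `td.columns = ...` only works when its length matches the number of columns: 240 here, that is 2 statistics × 5 measures × 24 year buckets, which is also why `newlabel[0::24]` picks the first year of each block.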

In [32]:
newlabel


Out[32]:
['media_meses_aprov_pub_1997',
 'media_meses_aprov_pub_1998',
 'media_meses_aprov_pub_1999',
 'media_meses_aprov_pub_2000',
 'media_meses_aprov_pub_2001',
 'media_meses_aprov_pub_2002',
 'media_meses_aprov_pub_2003',
 'media_meses_aprov_pub_2004',
 'media_meses_aprov_pub_2005',
 'media_meses_aprov_pub_2006',
 'media_meses_aprov_pub_2007',
 'media_meses_aprov_pub_2008',
 'media_meses_aprov_pub_2009',
 'media_meses_aprov_pub_2010',
 'media_meses_aprov_pub_2011',
 'media_meses_aprov_pub_2012',
 'media_meses_aprov_pub_2013',
 'media_meses_aprov_pub_2014',
 'media_meses_aprov_pub_2015',
 'media_meses_aprov_pub_2016',
 'media_meses_aprov_pub_2017',
 'media_meses_aprov_pub_2018',
 'media_meses_aprov_pub_2019',
 'media_meses_aprov_pub_ate_1996',
 'media_meses_aprov_pub_scielo_1997',
 'media_meses_aprov_pub_scielo_1998',
 'media_meses_aprov_pub_scielo_1999',
 'media_meses_aprov_pub_scielo_2000',
 'media_meses_aprov_pub_scielo_2001',
 'media_meses_aprov_pub_scielo_2002',
 'media_meses_aprov_pub_scielo_2003',
 'media_meses_aprov_pub_scielo_2004',
 'media_meses_aprov_pub_scielo_2005',
 'media_meses_aprov_pub_scielo_2006',
 'media_meses_aprov_pub_scielo_2007',
 'media_meses_aprov_pub_scielo_2008',
 'media_meses_aprov_pub_scielo_2009',
 'media_meses_aprov_pub_scielo_2010',
 'media_meses_aprov_pub_scielo_2011',
 'media_meses_aprov_pub_scielo_2012',
 'media_meses_aprov_pub_scielo_2013',
 'media_meses_aprov_pub_scielo_2014',
 'media_meses_aprov_pub_scielo_2015',
 'media_meses_aprov_pub_scielo_2016',
 'media_meses_aprov_pub_scielo_2017',
 'media_meses_aprov_pub_scielo_2018',
 'media_meses_aprov_pub_scielo_2019',
 'media_meses_aprov_pub_scielo_ate_1996',
 'media_meses_sub_aprov_1997',
 'media_meses_sub_aprov_1998',
 'media_meses_sub_aprov_1999',
 'media_meses_sub_aprov_2000',
 'media_meses_sub_aprov_2001',
 'media_meses_sub_aprov_2002',
 'media_meses_sub_aprov_2003',
 'media_meses_sub_aprov_2004',
 'media_meses_sub_aprov_2005',
 'media_meses_sub_aprov_2006',
 'media_meses_sub_aprov_2007',
 'media_meses_sub_aprov_2008',
 'media_meses_sub_aprov_2009',
 'media_meses_sub_aprov_2010',
 'media_meses_sub_aprov_2011',
 'media_meses_sub_aprov_2012',
 'media_meses_sub_aprov_2013',
 'media_meses_sub_aprov_2014',
 'media_meses_sub_aprov_2015',
 'media_meses_sub_aprov_2016',
 'media_meses_sub_aprov_2017',
 'media_meses_sub_aprov_2018',
 'media_meses_sub_aprov_2019',
 'media_meses_sub_aprov_ate_1996',
 'media_meses_sub_pub_1997',
 'media_meses_sub_pub_1998',
 'media_meses_sub_pub_1999',
 'media_meses_sub_pub_2000',
 'media_meses_sub_pub_2001',
 'media_meses_sub_pub_2002',
 'media_meses_sub_pub_2003',
 'media_meses_sub_pub_2004',
 'media_meses_sub_pub_2005',
 'media_meses_sub_pub_2006',
 'media_meses_sub_pub_2007',
 'media_meses_sub_pub_2008',
 'media_meses_sub_pub_2009',
 'media_meses_sub_pub_2010',
 'media_meses_sub_pub_2011',
 'media_meses_sub_pub_2012',
 'media_meses_sub_pub_2013',
 'media_meses_sub_pub_2014',
 'media_meses_sub_pub_2015',
 'media_meses_sub_pub_2016',
 'media_meses_sub_pub_2017',
 'media_meses_sub_pub_2018',
 'media_meses_sub_pub_2019',
 'media_meses_sub_pub_ate_1996',
 'media_meses_sub_pub_scielo_1997',
 'media_meses_sub_pub_scielo_1998',
 'media_meses_sub_pub_scielo_1999',
 'media_meses_sub_pub_scielo_2000',
 'media_meses_sub_pub_scielo_2001',
 'media_meses_sub_pub_scielo_2002',
 'media_meses_sub_pub_scielo_2003',
 'media_meses_sub_pub_scielo_2004',
 'media_meses_sub_pub_scielo_2005',
 'media_meses_sub_pub_scielo_2006',
 'media_meses_sub_pub_scielo_2007',
 'media_meses_sub_pub_scielo_2008',
 'media_meses_sub_pub_scielo_2009',
 'media_meses_sub_pub_scielo_2010',
 'media_meses_sub_pub_scielo_2011',
 'media_meses_sub_pub_scielo_2012',
 'media_meses_sub_pub_scielo_2013',
 'media_meses_sub_pub_scielo_2014',
 'media_meses_sub_pub_scielo_2015',
 'media_meses_sub_pub_scielo_2016',
 'media_meses_sub_pub_scielo_2017',
 'media_meses_sub_pub_scielo_2018',
 'media_meses_sub_pub_scielo_2019',
 'media_meses_sub_pub_scielo_ate_1996',
 'desvp_meses_aprov_pub_1997',
 'desvp_meses_aprov_pub_1998',
 'desvp_meses_aprov_pub_1999',
 'desvp_meses_aprov_pub_2000',
 'desvp_meses_aprov_pub_2001',
 'desvp_meses_aprov_pub_2002',
 'desvp_meses_aprov_pub_2003',
 'desvp_meses_aprov_pub_2004',
 'desvp_meses_aprov_pub_2005',
 'desvp_meses_aprov_pub_2006',
 'desvp_meses_aprov_pub_2007',
 'desvp_meses_aprov_pub_2008',
 'desvp_meses_aprov_pub_2009',
 'desvp_meses_aprov_pub_2010',
 'desvp_meses_aprov_pub_2011',
 'desvp_meses_aprov_pub_2012',
 'desvp_meses_aprov_pub_2013',
 'desvp_meses_aprov_pub_2014',
 'desvp_meses_aprov_pub_2015',
 'desvp_meses_aprov_pub_2016',
 'desvp_meses_aprov_pub_2017',
 'desvp_meses_aprov_pub_2018',
 'desvp_meses_aprov_pub_2019',
 'desvp_meses_aprov_pub_ate_1996',
 'desvp_meses_aprov_pub_scielo_1997',
 'desvp_meses_aprov_pub_scielo_1998',
 'desvp_meses_aprov_pub_scielo_1999',
 'desvp_meses_aprov_pub_scielo_2000',
 'desvp_meses_aprov_pub_scielo_2001',
 'desvp_meses_aprov_pub_scielo_2002',
 'desvp_meses_aprov_pub_scielo_2003',
 'desvp_meses_aprov_pub_scielo_2004',
 'desvp_meses_aprov_pub_scielo_2005',
 'desvp_meses_aprov_pub_scielo_2006',
 'desvp_meses_aprov_pub_scielo_2007',
 'desvp_meses_aprov_pub_scielo_2008',
 'desvp_meses_aprov_pub_scielo_2009',
 'desvp_meses_aprov_pub_scielo_2010',
 'desvp_meses_aprov_pub_scielo_2011',
 'desvp_meses_aprov_pub_scielo_2012',
 'desvp_meses_aprov_pub_scielo_2013',
 'desvp_meses_aprov_pub_scielo_2014',
 'desvp_meses_aprov_pub_scielo_2015',
 'desvp_meses_aprov_pub_scielo_2016',
 'desvp_meses_aprov_pub_scielo_2017',
 'desvp_meses_aprov_pub_scielo_2018',
 'desvp_meses_aprov_pub_scielo_2019',
 'desvp_meses_aprov_pub_scielo_ate_1996',
 'desvp_meses_sub_aprov_1997',
 'desvp_meses_sub_aprov_1998',
 'desvp_meses_sub_aprov_1999',
 'desvp_meses_sub_aprov_2000',
 'desvp_meses_sub_aprov_2001',
 'desvp_meses_sub_aprov_2002',
 'desvp_meses_sub_aprov_2003',
 'desvp_meses_sub_aprov_2004',
 'desvp_meses_sub_aprov_2005',
 'desvp_meses_sub_aprov_2006',
 'desvp_meses_sub_aprov_2007',
 'desvp_meses_sub_aprov_2008',
 'desvp_meses_sub_aprov_2009',
 'desvp_meses_sub_aprov_2010',
 'desvp_meses_sub_aprov_2011',
 'desvp_meses_sub_aprov_2012',
 'desvp_meses_sub_aprov_2013',
 'desvp_meses_sub_aprov_2014',
 'desvp_meses_sub_aprov_2015',
 'desvp_meses_sub_aprov_2016',
 'desvp_meses_sub_aprov_2017',
 'desvp_meses_sub_aprov_2018',
 'desvp_meses_sub_aprov_2019',
 'desvp_meses_sub_aprov_ate_1996',
 'desvp_meses_sub_pub_1997',
 'desvp_meses_sub_pub_1998',
 'desvp_meses_sub_pub_1999',
 'desvp_meses_sub_pub_2000',
 'desvp_meses_sub_pub_2001',
 'desvp_meses_sub_pub_2002',
 'desvp_meses_sub_pub_2003',
 'desvp_meses_sub_pub_2004',
 'desvp_meses_sub_pub_2005',
 'desvp_meses_sub_pub_2006',
 'desvp_meses_sub_pub_2007',
 'desvp_meses_sub_pub_2008',
 'desvp_meses_sub_pub_2009',
 'desvp_meses_sub_pub_2010',
 'desvp_meses_sub_pub_2011',
 'desvp_meses_sub_pub_2012',
 'desvp_meses_sub_pub_2013',
 'desvp_meses_sub_pub_2014',
 'desvp_meses_sub_pub_2015',
 'desvp_meses_sub_pub_2016',
 'desvp_meses_sub_pub_2017',
 'desvp_meses_sub_pub_2018',
 'desvp_meses_sub_pub_2019',
 'desvp_meses_sub_pub_ate_1996',
 'desvp_meses_sub_pub_scielo_1997',
 'desvp_meses_sub_pub_scielo_1998',
 'desvp_meses_sub_pub_scielo_1999',
 'desvp_meses_sub_pub_scielo_2000',
 'desvp_meses_sub_pub_scielo_2001',
 'desvp_meses_sub_pub_scielo_2002',
 'desvp_meses_sub_pub_scielo_2003',
 'desvp_meses_sub_pub_scielo_2004',
 'desvp_meses_sub_pub_scielo_2005',
 'desvp_meses_sub_pub_scielo_2006',
 'desvp_meses_sub_pub_scielo_2007',
 'desvp_meses_sub_pub_scielo_2008',
 'desvp_meses_sub_pub_scielo_2009',
 'desvp_meses_sub_pub_scielo_2010',
 'desvp_meses_sub_pub_scielo_2011',
 'desvp_meses_sub_pub_scielo_2012',
 'desvp_meses_sub_pub_scielo_2013',
 'desvp_meses_sub_pub_scielo_2014',
 'desvp_meses_sub_pub_scielo_2015',
 'desvp_meses_sub_pub_scielo_2016',
 'desvp_meses_sub_pub_scielo_2017',
 'desvp_meses_sub_pub_scielo_2018',
 'desvp_meses_sub_pub_scielo_2019',
 'desvp_meses_sub_pub_scielo_ate_1996']

In [33]:
newlabel[0::24]


Out[33]:
['media_meses_aprov_pub_1997',
 'media_meses_aprov_pub_scielo_1997',
 'media_meses_sub_aprov_1997',
 'media_meses_sub_pub_1997',
 'media_meses_sub_pub_scielo_1997',
 'desvp_meses_aprov_pub_1997',
 'desvp_meses_aprov_pub_scielo_1997',
 'desvp_meses_sub_aprov_1997',
 'desvp_meses_sub_pub_1997',
 'desvp_meses_sub_pub_scielo_1997']

In [34]:
td.columns = newlabel

In [35]:
td.T


Out[35]:
issn 0001-3714 0001-3765 0001-6002 0001-6365 0002-0591 0002-192X 0002-7014 0003-2573 0004-0592 0004-0614 ... 2504-3145 2518-4431 2520-9868 2526-8910 2531-0488 2531-1379 2545-7756 2594-1321 2595-3192 2619-6573
media_meses_aprov_pub_1997 ...
media_meses_aprov_pub_1998 1.46875 ...
media_meses_aprov_pub_1999 1.42553 ...
media_meses_aprov_pub_2000 5.10638 4.22222 6.16667 ...
media_meses_aprov_pub_2001 5.71739 2.15 6.8125 ...
media_meses_aprov_pub_2002 5.16 3.45455 6.95238 ...
media_meses_aprov_pub_2003 4.675 2.5 8.5 ...
media_meses_aprov_pub_2004 5.10976 3.26087 11.7826 ...
media_meses_aprov_pub_2005 6.91667 3.26087 11.9286 10.0357 ...
media_meses_aprov_pub_2006 7.70968 4.56667 16.5484 11.2041 ...
media_meses_aprov_pub_2007 8 3.54839 14.4348 4.96552 6.43396 ...
media_meses_aprov_pub_2008 6.44068 5.8125 14.1791 6.55102 3.55556 ...
media_meses_aprov_pub_2009 7.0274 8.19231 14.4712 5.13636 4.52174 19.5897 ...
media_meses_aprov_pub_2010 8.86022 5.22727 15.1667 -3.86667 7.08571 23.9714 ...
media_meses_aprov_pub_2011 7.57009 4.2 -12.3636 1.71429 16.5366 ...
media_meses_aprov_pub_2012 10.8081 4.83871 -13.6667 3.3913 9.64062 ...
media_meses_aprov_pub_2013 9.86395 4.48276 11.4444 -12.2778 7 9.51429 ...
media_meses_aprov_pub_2014 8.05405 4.76923 10.5385 -16.3 10.0714 5.97368 ...
media_meses_aprov_pub_2015 7.56618 5 5.67857 ...
media_meses_aprov_pub_2016 9.0359 7.2069 ... -1.33333
media_meses_aprov_pub_2017 6.1032 7.52 ... 0.625 3.11765
media_meses_aprov_pub_2018 7.85502 4 8.64 ... 3.5614 10.85 6.76364 1.4 2.375 2.10606 4.3
media_meses_aprov_pub_2019 6 5 ...
media_meses_aprov_pub_ate_1996 ...
media_meses_aprov_pub_scielo_1997 ...
media_meses_aprov_pub_scielo_1998 7.53125 ...
media_meses_aprov_pub_scielo_1999 8.06349 ...
media_meses_aprov_pub_scielo_2000 7.38298 113.333 68.8333 ...
media_meses_aprov_pub_scielo_2001 5.97826 87.6 57.5938 ...
media_meses_aprov_pub_scielo_2002 6.46 88.6364 45.1905 ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
desvp_meses_sub_pub_2015 4.78488 3.07613 4.72456 ...
desvp_meses_sub_pub_2016 4.16104 5.6816 ... 12.8798
desvp_meses_sub_pub_2017 5.85289 4.52124 ... 6.21993 2.10782
desvp_meses_sub_pub_2018 6.65419 4.2459 4.81929 ... 4.62287 3.8092 5.84019 3.61109 3.80583 3.05794 4.47661
desvp_meses_sub_pub_2019 0 5.93191 ...
desvp_meses_sub_pub_ate_1996 ...
desvp_meses_sub_pub_scielo_1997 ...
desvp_meses_sub_pub_scielo_1998 4.33632 ...
desvp_meses_sub_pub_scielo_1999 6.95057 ...
desvp_meses_sub_pub_scielo_2000 5.22876 3.65148 4.71298 ...
desvp_meses_sub_pub_scielo_2001 5.50477 26.8892 3.94382 ...
desvp_meses_sub_pub_scielo_2002 5.16624 6.13973 4.96792 ...
desvp_meses_sub_pub_scielo_2003 10.5067 3.61613 4.45897 8.02557 ...
desvp_meses_sub_pub_scielo_2004 5.55682 3.34867 3.4527 6.61438 ...
desvp_meses_sub_pub_scielo_2005 4.57727 16.2863 7.49365 8.73957 8.51462 ...
desvp_meses_sub_pub_scielo_2006 4.34889 3.97157 3.71266 5.87918 4.72289 ...
desvp_meses_sub_pub_scielo_2007 8.30634 11.3478 4.52797 8.45538 8.26919 ...
desvp_meses_sub_pub_scielo_2008 6.77534 6.16663 8.64687 6.39515 4.84513 2.38537 ...
desvp_meses_sub_pub_scielo_2009 4.01793 9.55527 6.93194 10.5727 5.61425 7.02256 ...
desvp_meses_sub_pub_scielo_2010 5.23306 8.46346 3.4154 8.19241 6.98219 7.57047 ...
desvp_meses_sub_pub_scielo_2011 6.97633 4.29809 10.1592 4.38748 10.4556 ...
desvp_meses_sub_pub_scielo_2012 5.7614 4.40957 11.7364 7.95512 7.33939 7.55499 ...
desvp_meses_sub_pub_scielo_2013 5.911 3.52973 12.6745 6.06956 8.94841 6.97033 ...
desvp_meses_sub_pub_scielo_2014 4.70691 3.3435 5.00177 0.916515 8.66047 7.82911 ... 4.86438
desvp_meses_sub_pub_scielo_2015 4.79202 3.10303 5 ... 4.43471
desvp_meses_sub_pub_scielo_2016 4.50642 5.18206 ... 2.60396 10.435
desvp_meses_sub_pub_scielo_2017 6.79606 4.23585 ... 11.5684 0.992157 5.2054 6.21993 3.97222
desvp_meses_sub_pub_scielo_2018 6.67466 4.37277 4.34953 ... 4.69798 5.20707 4.30806 5.8585 3.61109 3.80583 2.95404 4.47661
desvp_meses_sub_pub_scielo_2019 0 5.93191 ...
desvp_meses_sub_pub_scielo_ate_1996 ...

240 rows × 1432 columns


In [36]:
import os
os.makedirs("output", exist_ok=True)  # make sure the output directory exists
td.to_csv("output/td_documents_dates_network.csv")

In [37]:
print(f"Notebook processing duration: {datetime.utcnow() - start}")


Notebook processing duration: 0:00:34.852477

Check a specific document


In [38]:
b = df[df['docs'].str.contains('S0100-40421998000500015')]

In [40]:
b[['document submitted at year',
   'document accepted at year',
   'document published at year',
   'document published in SciELO at year']].astype(int)


Out[40]:
document submitted at year document accepted at year document published at year document published in SciELO at year
9768 1967 1998 1998 2001
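According to its metadata this document was submitted in 1967, so its submission check is 1 and every interval that starts at submission is NaN for it. To look for similar records in bulk, the check columns can be summed per collection; a sketch on made-up rows (the real `df` uses the same column names):

```python
import pandas as pd

# Made-up rows with the notebook's check columns (1 = invalid or missing date)
df_toy = pd.DataFrame({
    "collection": ["scl", "scl", "arg", "arg", "arg"],
    "check_doc_submitted": [1, 0, 0, 1, 1],
    "check_doc_accepted": [0, 0, 1, 1, 0],
})

# Number of documents with an invalid submission or acceptance date per collection
invalid_counts = df_toy.groupby("collection")[["check_doc_submitted", "check_doc_accepted"]].sum()
print(invalid_counts)
```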
