In [1]:
import sys
sys.path.append("../scripts/")

Load original profile


In [2]:
import json
from data_collection_util import *

In [17]:
# load original profile
with open("../profile/profile.json") as f:
    orig_profile = json.load(f)

In [18]:
import codecs
with codec.open("../profile/profile2.json", "w") as f:
    json.dump(f)

In [3]:
# load original profile
with open("../profile/profile.json") as f:
    orig_profile = json.load(f)

In original profile, each gear member has the following arrtibutes:

  • website
  • gear_collaborators
  • pos_x
  • pos_y
  • surname
  • name
  • title
  • photo
  • other_collaborators
  • member_type
  • short_bio
  • mathsci_id
  • cluster_id
  • member_id
  • organization
  • research_interests

A sample member profile looks like this:


In [4]:
sample_profile = {u'cluster_id': 0,
 u'gear_collaborators': [],
 u'mathsci_id': u'MR304864',
 u'member_id': 12,
 u'member_type': u'member',
 u'name': u'Steven',
 u'organization': u'University of Illinois at Urbana-Champaign',
 u'other_collaborators': u'Indranil Biswas, Jim Glazebrook, Tomas Gomez, Adam Jacob, Franz Kamber, Vincent Mercat, Vicente Munoz, Peter Newstead, Mathias Stemmler',
 u'photo': u'BradlowSteven.jpg',
 u'pos_x': 0,
 u'pos_y': 0,
 u'research_interests': u'Higgs Bundles',
 u'short_bio': u"I'm interested in moduli spaces associated with holomorphic vector bundles. In particular, I'm a big fan of applications of Higgs bundle technology to the study of surface group representation varieties. Before I die, I'd like to be able to compute the surface group representation corresponding to any given Higgs bundle, and vice versa.",
 u'surname': u'Bradlow',
 u'title': u'GEAR Member',
 u'website': u''}

Build mappers for id and mathscinet id

In this step, we build mapping between gear_id and mathscinet_id. For example, given a gear member id, gear_mathsci_mapper will return a mathscinet id


In [5]:
mappers = make_mappers(orig_profile)
gear_mathsci_mapper = mappers[0]
mathsci_gear_mapper = mappers[1]

Download paper list for each member with mathsci_id

In this step, the program iterates through all members. If a member has valid mathscinet id, then we retrieve the paper list of that member.


In [6]:
paper_set = download_full_paper_set(orig_profile)


0%                                                                  100%
[                                                                      ]/Users/erichsu/anaconda/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))
[######################################################################] | ETA: 00:00:00
Total time elapsed: 00:04:14

paper_set has the following structure:

- 'member 0': paper a, paper b
- 'member 1': paper b, paper c, paper d
- 'member 2': paper c

for each paper, the structure is as follows:

  • date
  • url
  • id
  • description
  • authors

A sample paper looks like this:


In [7]:
sp={'authors': ['MR367870', 'MR1001390'],
   'date': 2012,
   'description': u'Conner, Gregory R. ; Kent, Curtis Inverse limits of finite rank free groups. J. Group Theory 15 (2012), no. 6, 823\u2013829. (Reviewer: David Meier) 20E05 (20E18)',
   'id': u'MR2997025',
   'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2997025'}

In our paper_set, Professor Bradlow has the following papers:


In [8]:
paper_set['MR304864']


Out[8]:
[{'authors': ['MR304864', 'MR341742', 'MR355366'],
  'date': 2015,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar ; Gothen, Peter B. Higgs bundles for the non-compact dual of the special orthogonal group. Geom. Dedicata 175 (2015), 1\u201348. (Reviewer: Peter G. Dalakov) 14H60 (53C07 58D29)',
  'id': u'MR3323627',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR3323627'},
 {'authors': ['MR340073', 'MR304864', 'MR855479', 'MR976054'],
  'date': 2014,
  'description': u'Biswas, Indranil ; Bradlow, Steven B. ; Jacob, Adam ; Stemmler, Matthias Automorphisms and connections on Higgs bundles over compact K\xe4hler manifolds. Differential Geom. Appl. 32 (2014), 139\u2013152. (Reviewer: Antonella Nannicini) 53C07 (32L05 32Q15)',
  'id': u'MR3147201',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR3147201'},
 {'authors': ['MR340073', 'MR304864', 'MR855479', 'MR976054'],
  'date': 2013,
  'description': u'Biswas, Indranil ; Bradlow, Steven B. ; Jacob, Adam ; Stemmler, Matthias Approximate Hermitian-Einstein connections on principal bundles over a compact Riemann surface. Ann. Global Anal. Geom. 44 (2013), no. 3, 257\u2013268. (Reviewer: Antonella Nannicini) 53C07 (32L05)',
  'id': u'MR3101858',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR3101858'},
 {'authors': ['MR304864', 'MR339422', 'MR625886', 'MR348860'],
  'date': 2012,
  'description': u"Bradlow, Steven ; Ebenfelt, Peter ; Tyson, Jeremy T. ; Varolin, Dror Foreword [In honor of the 60th birthday of John D'Angelo]. Illinois J. Math. 56 (2012), no. 1, iii\u2013iv (2013). 32-06",
  'id': u'MR3117012',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR3117012'},
 {'authors': ['MR304864', 'MR341742', 'MR355366'],
  'date': 2012,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar ; Gothen, Peter B. Deformations of maximal representations in ${\\rm Sp}(4,\\Bbb R)$ {\\rm Sp}(4,\\Bbb R) . Q. J. Math. 63 (2012), no. 4, 795\u2013843. (Reviewer: Matthias Stemmler) 14D20 (14F45 14H60)',
  'id': u'MR2999985',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2999985'},
 {'authors': ['MR304864', 'MR794406'],
  'date': 2012,
  'description': u'Bradlow, Steven B. ; Wilkin, Graeme Morse theory, Higgs fields, and Yang-Mills-Higgs functionals. J. Fixed Point Theory Appl. 11 (2012), no. 1, 1\u201341. (Reviewer: Addolorata Salvatore) 58E15',
  'id': u'MR2955005',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2955005'},
 {'authors': ['MR304864', 'MR341742', 'MR642809', 'MR602264', 'MR197503'],
  'date': 2009,
  'description': u'Bradlow, S. B. ; Garc\xeda-Prada, O. ; Mercat, V. ; Mu\xf1oz, V. ; Newstead, P. E. Moduli spaces of coherent systems of small slope on algebraic curves. Comm. Algebra 37 (2009), no. 8, 2649\u20132678. (Reviewer: H. Lange) 14H60 (14D20 14H51)',
  'id': u'MR2543511 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2543511 '},
 {'authors': ['MR304864'],
  'date': 2009,
  'description': u'Bradlow, S. B. Coherent systems: a brief survey. With an appendix by H. Lange. London Math. Soc. Lecture Note Ser., 359, Moduli spaces and vector bundles, 229\u2013264, Cambridge Univ. Press, Cambridge, 2009. (Reviewer: Swaminathan Subramanian) 14D20',
  'id': u'MR2537071 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2537071 '},
 {'authors': ['MR304864', 'MR341742', 'MR355366'],
  'date': 2008,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar ; Gothen, Peter B. Homotopy groups of moduli spaces of representations. Topology 47 (2008), no. 4, 203\u2013224. (Reviewer: Graeme Wilkin) 53D30 (53C07 53C26 55Q52 58D27 58E05)',
  'id': u'MR2416769 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2416769 '},
 {'authors': ['MR304864', 'MR341742', 'MR355366'],
  'date': 2007,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar ; Gothen, Peter B. What is $\\dots$ \\dots a Higgs bundle? Notices Amer. Math. Soc. 54 (2007), no. 8, 980\u2013981. (Reviewer: Graeme Wilkin) 53C07 (14D21)',
  'id': u'MR2343296 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2343296 '},
 {'authors': ['MR304864', 'MR341742', 'MR642809', 'MR602264', 'MR197503'],
  'date': 2007,
  'description': u'Bradlow, S. B. ; Garc\xeda-Prada, O. ; Mercat, V. ; Mu\xf1oz, V. ; Newstead, P. E. On the geometry of moduli spaces of coherent systems on algebraic curves. Internat. J. Math. 18 (2007), no. 4, 411\u2013453. 14H60 (14D20 14H51)',
  'id': u'MR2325354 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2325354 '},
 {'authors': ['MR304864', 'MR341742', 'MR355366'],
  'date': 2006,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar ; Gothen, Peter B. Maximal surface group representations in isometry groups of classical Hermitian symmetric spaces. Geom. Dedicata 122 (2006), 185\u2013213. (Reviewer: Anna Wienhard) 14D20 (53D30 58D29)',
  'id': u'MR2295550 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2295550 '},
 {'authors': ['MR304864', 'MR341742', 'MR355366'],
  'date': 2004,
  'description': u'Bradlow, S. B. ; Garc\xeda-Prada, O. ; Gothen, P. B. Representations of surface groups in the general linear group. Proceedings of the XII Fall Workshop on Geometry and Physics, 83\u201394, Publ. R. Soc. Mat. Esp., 7, R. Soc. Mat. Esp., Madrid, 2004. (Reviewer: Usha N. Bhosle) 14D20 (14H30)',
  'id': u'MR2123494 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2123494 '},
 {'authors': ['MR304864', 'MR341742', 'MR355366'],
  'date': 2004,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar ; Gothen, Peter B. Moduli spaces of holomorphic triples over compact Riemann surfaces. Math. Ann. 328 (2004), no. 1-2, 299\u2013351. (Reviewer: Zhenbo Qin) 32G13 (14D20 14D21)',
  'id': u'MR2030379 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2030379 '},
 {'authors': ['MR304864', 'MR341742', 'MR355366'],
  'date': 2003,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar ; Gothen, Peter B. Surface group representations and ${\\rm U}(p,q)$ {\\rm U}(p,q) -Higgs bundles. J. Differential Geom. 64 (2003), no. 1, 111\u2013170. (Reviewer: Ignasi Mundet-Riera) 53D30 (57R19)',
  'id': u'MR2015045 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2015045 '},
 {'authors': ['MR304864', 'MR341742', 'MR602264', 'MR197503'],
  'date': 2003,
  'description': u'Bradlow, S. B. ; Garc\xeda-Prada, O. ; Mu\xf1oz, V. ; Newstead, P. E. Coherent systems and Brill-Noether theory. Internat. J. Math. 14 (2003), no. 7, 683\u2013733. (Reviewer: Montserrat Teixidor i Bigas) 14H60 (14D20 14H51)',
  'id': u'MR2001263 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR2001263 '},
 {'authors': ['MR304864', 'MR341742', 'MR642261'],
  'date': 2003,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar ; Mundet i Riera, Ignasi Relative Hitchin-Kobayashi correspondences for principal pairs. Q. J. Math. 54 (2003), no. 2, 171\u2013208. (Reviewer: Adrian Langer) 53C07 (32Q15 58E15)',
  'id': u'MR1989871 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1989871 '},
 {'authors': ['MR304864', 'MR671318'],
  'date': 2002,
  'description': u'Bradlow, Steven B. ; G\xf3mez, Tom\xe1s L. Extensions of Higgs bundles. Illinois J. Math. 46 (2002), no. 2, 587\u2013625. (Reviewer: Adrian Langer) 32L05 (14D20 32G13 32Q20 53C07)',
  'id': u'MR1936939 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1936939 '},
 {'authors': ['MR304864', 'MR341742'],
  'date': 2002,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar An application of coherent systems to a Brill-Noether problem. J. Reine Angew. Math. 551 (2002), 123\u2013143. (Reviewer: Tohru Nakashima) 14H51 (14H60)',
  'id': u'MR1932176 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1932176 '},
 {'authors': ['MR304864', 'MR341742', 'MR355366'],
  'date': 2001,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar ; Gothen, Peter B. Representations of the fundamental group of a surface in ${\\rm PU}(p,q)$ {\\rm PU}(p,q) and holomorphic triples. C. R. Acad. Sci. Paris S\xe9r. I Math. 333 (2001), no. 4, 347\u2013352. (Reviewer: Usha N. Bhosle) 14D20 (14H60 32G13 53C07)',
  'id': u'MR1854778 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1854778 '},
 {'authors': ['MR304864', 'MR97445', 'MR74195'],
  'date': 2000,
  'description': u'Bradlow, Steven B. ; Kamber, Franz W. ; Glazebrook, James F. The Hitchin-Kobayashi correspondence for twisted triples. Internat. J. Math. 11 (2000), no. 4, 493\u2013508. 32L05 (32Q20 53C07 53C25)',
  'id': u'MR1768170 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1768170 '},
 {'authors': ['MR304864', 'MR341742'],
  'date': -1,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar A Hitchin-Kobayashi correspondence for coherent systems on Riemann surfaces. J. London Math. Soc. (2) 60 (1999), no. 1, 155\u2013170. (Reviewer: Nyshadham Raghavendra) 32L05 (14D20 32Q20)',
  'id': u'MR1721822 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1721822 '},
 {'authors': ['MR304864', 'MR74195', 'MR97445'],
  'date': -1,
  'description': u'Bradlow, Steven B. ; Glazebrook, James F. ; Kamber, Franz W. Reduction of the Hermitian-Einstein equation on K\xe4hlerian fiber bundles. Tohoku Math. J. (2) 51 (1999), no. 1, 81\u2013123. 32Q20 (32L05 53C25 53C55)',
  'id': u'MR1671747 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1671747 '},
 {'authors': ['MR304864', 'MR74195', 'MR97445'],
  'date': -1,
  'description': u'Bradlow, Steven ; Glazebrook, James ; Kamber, Franz A new look at the vortex equations and dimensional reduction. Geometry, topology and physics (Campinas, 1996), 83\u2013106, de Gruyter, Berlin, 1997. (Reviewer: Manfred Lehn) 32L07 (53C07 53C55)',
  'id': u'MR1605208 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1605208 '},
 {'authors': ['MR304864', 'MR341742'],
  'date': -1,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar Non-abelian monopoles and vortices. Geometry and physics (Aarhus, 1995), 567\u2013589, Lecture Notes in Pure and Appl. Math., 184, Dekker, New York, 1997. (Reviewer: Daniel Huybrechts) 53C07 (32L07 57R57 58D27)',
  'id': u'MR1423193 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1423193 '},
 {'authors': ['MR304864', 'MR313609', 'MR309514'],
  'date': -1,
  'description': u'Bradlow, Steven B. ; Daskalopoulos, Georgios D. ; Wentworth, Richard A. Birational equivalences of vortex moduli. Topology 35 (1996), no. 3, 731\u2013748. (Reviewer: Nicholas Buchdahl) 32G13 (32L07 32L10 53C07 58D27)',
  'id': u'MR1396775 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1396775 '},
 {'authors': ['MR304864', 'MR341742'],
  'date': -1,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar Stable triples, equivariant bundles and dimensional reduction. Math. Ann. 304 (1996), no. 2, 225\u2013252. (Reviewer: Daniel Huybrechts) 32L07 (32G13 53C07)',
  'id': u'MR1371765 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1371765 '},
 {'authors': ['MR304864', 'MR341742'],
  'date': -1,
  'description': u'Bradlow, Steven B. ; Garc\xeda-Prada, Oscar Higher cohomology triples and holomorphic extensions. Comm. Anal. Geom. 3 (1995), no. 3-4, 421\u2013463. (Reviewer: Nicholas Buchdahl) 32L07 (14J60 32L10 53C07 58D27)',
  'id': u'MR1371205 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1371205 '},
 {'authors': ['MR304864'],
  'date': -1,
  'description': u'Bradlow, Steven B. Hermitian-Einstein inequalities and Harder-Narasimhan filtrations. Internat. J. Math. 6 (1995), no. 5, 645\u2013656. (Reviewer: Daniel Huybrechts) 32L07 (53C07)',
  'id': u'MR1351159 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1351159 '},
 {'authors': ['MR304864', 'MR313609', 'MR341742', 'MR309514'],
  'date': -1,
  'description': u'Bradlow, Steven ; Daskalopoulos, Georgios D. ; Garc\xeda-Prada, Oscar ; Wentworth, Richard Stable augmented bundles over Riemann surfaces. Vector bundles in algebraic geometry (Durham, 1993), 15\u201367, London Math. Soc. Lecture Note Ser., 208, Cambridge Univ. Press, Cambridge, 1995. (Reviewer: Daniel Huybrechts) 14H60 (14D20 32L07 53C07)',
  'id': u'MR1338412 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1338412 '},
 {'authors': ['MR304864', 'MR313609'],
  'date': -1,
  'description': u'Bradlow, Steven ; Daskalopoulos, Georgios D. Moduli of stable pairs for holomorphic bundles over Riemann surfaces. II. Internat. J. Math. 4 (1993), no. 6, 903\u2013925. (Reviewer: Oscar Garc\xeda-Prada) 32G13 (14D20 14H55 32L07 53C07 58D27 58E15)',
  'id': u'MR1250254 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1250254 '},
 {'authors': ['MR304864'],
  'date': -1,
  'description': u'Bradlow, Steven B. Nonabelian vortices and a new Yang-Mills-Higgs energy. Differential geometry: geometry in mathematical physics and related topics (Los Angeles, CA, 1990), 85\u201398, Proc. Sympos. Pure Math., 54, Part 2, Amer. Math. Soc., Providence, RI, 1993. (Reviewer: J. S. Joel) 32G13 (32L07 53C07 58E15)',
  'id': u'MR1216530 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1216530 '},
 {'authors': ['MR304864', 'MR313609'],
  'date': -1,
  'description': u'Bradlow, Steven B. ; Daskalopoulos, Georgios D. Moduli of stable pairs for holomorphic bundles over Riemann surfaces. Internat. J. Math. 2 (1991), no. 5, 477\u2013513. (Reviewer: N. J. Hitchin) 58D27 (14D20 14H60 32G13 58E15)',
  'id': u'MR1124279 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1124279 '},
 {'authors': ['MR304864'],
  'date': -1,
  'description': u'Bradlow, Steven B. Special metrics and stability for holomorphic bundles with global sections. J. Differential Geom. 33 (1991), no. 1, 169\u2013213. (Reviewer: N. J. Hitchin) 32L07 (53C80)',
  'id': u'MR1085139 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1085139 '},
 {'authors': ['MR304864'],
  'date': -1,
  'description': u'Bradlow, Steven B. Vortices in holomorphic line bundles over closed K\xe4hler manifolds. Comm. Math. Phys. 135 (1990), no. 1, 1\u201317. (Reviewer: Chun-Li Shen) 32L10 (32G13 53C55 58E15)',
  'id': u'MR1086749 ',
  'url': u'http://www.ams.org/mathscinet/search/publdoc.html?pg1=MR&s1=MR1086749 '}]

Keep in mind that we only look at papers published after 2011. Hence, we filter the papers and get paper_set_2011


In [9]:
paper_2011_meta = filter_2011(paper_set)

paper_set_2011 = paper_2011_meta[0]
count_2011 = paper_2011_meta[1]


Completed! 1576 papers are published after 2011 (including 2011). 

For co-citation papers, we need to know what papers are citing papers in paper_set_2011.

This process may take 10 minutes, depending on network.


In [10]:
# download papers citing gear member papers
download_gear_papers(paper_set_2011, count_2011)


0%                                                                  100%
[######################################################################] | ETA: 00:00:00
Total time elapsed: 00:12:12

Matrix generation

We have one function update_collaborators that updates the co-authorship relation and the other function update_citations that updates the co-citation relation.


In [11]:
# update coauthorship/cocitation data
full_paper_list = []
useful_paper = set()

for ending_year in range(2011, 2017):
    update_collaborators(orig_profile, paper_set_2011, 2011, ending_year, mathsci_gear_mapper, useful_paper)
    update_citations(orig_profile, paper_set_2011, 2011, ending_year, mathsci_gear_mapper, full_paper_list, useful_paper)

In [12]:
len(useful_paper)


Out[12]:
700

These two functions will add additional data fields to authors.

Let's look at Professor Bradlow's profile again:


In [13]:
orig_profile['items'][12]


Out[13]:
{'2011-2011 citation details': {},
 '2011-2011 citation sizes': {},
 '2011-2011 collaborators details': {},
 '2011-2011 collaborators sizes': {},
 '2011-2012 citation details': {24: [u'MR2955005'],
  43: [u'MR3385630', u'MR3323627', u'MR3451467', u'MR2955005', u'MR3347573'],
  49: [u'MR3385630', u'MR3323627', u'MR3451467', u'MR2955005', u'MR3347573'],
  51: [u'MR3385630', u'MR3347573'],
  66: [u'MR3385630'],
  89: [u'MR3323627', u'MR2955005'],
  118: [u'MR2955005'],
  119: [u'MR3385630', u'MR3347573'],
  120: [u'MR2955005']},
 '2011-2012 citation sizes': {24: 1,
  43: 5,
  49: 5,
  51: 2,
  66: 1,
  89: 2,
  118: 1,
  119: 2,
  120: 1},
 '2011-2012 collaborators details': {43: [u'MR2999985'],
  49: [u'MR2999985'],
  120: [u'MR2955005']},
 '2011-2012 collaborators sizes': {43: 1, 49: 1, 120: 1},
 '2011-2013 citation details': {24: [u'MR2955005'],
  43: [u'MR3385630', u'MR3323627', u'MR3451467', u'MR2955005', u'MR3347573'],
  49: [u'MR3385630', u'MR2955005', u'MR3451467', u'MR3323627', u'MR3347573'],
  51: [u'MR3385630', u'MR3347573'],
  66: [u'MR3385630'],
  87: [u'MR3451467', u'MR3323627', u'MR3347573'],
  89: [u'MR3323627', u'MR2955005'],
  118: [u'MR2955005'],
  119: [u'MR3385630', u'MR3347573'],
  120: [u'MR2955005']},
 '2011-2013 citation sizes': {24: 1,
  43: 5,
  49: 5,
  51: 2,
  66: 1,
  87: 3,
  89: 2,
  118: 1,
  119: 2,
  120: 1},
 '2011-2013 collaborators details': {43: [u'MR2999985'],
  49: [u'MR2999985'],
  120: [u'MR2955005']},
 '2011-2013 collaborators sizes': {43: 1, 49: 1, 120: 1},
 '2011-2014 citation details': {24: [u'MR2955005'],
  43: [u'MR3385630', u'MR3323627', u'MR3451467', u'MR2955005', u'MR3347573'],
  49: [u'MR3385630', u'MR2955005', u'MR3451467', u'MR3323627', u'MR3347573'],
  51: [u'MR3385630', u'MR3347573'],
  66: [u'MR3385630'],
  70: [u'MR3263936'],
  87: [u'MR3451467', u'MR3323627', u'MR3347573'],
  89: [u'MR3323627', u'MR2955005'],
  103: [u'MR3385630'],
  118: [u'MR2955005'],
  119: [u'MR3385630', u'MR3347573'],
  120: [u'MR2955005'],
  254: [u'MR3323627']},
 '2011-2014 citation sizes': {24: 1,
  43: 5,
  49: 5,
  51: 2,
  66: 1,
  70: 1,
  87: 3,
  89: 2,
  103: 1,
  118: 1,
  119: 2,
  120: 1,
  254: 1},
 '2011-2014 collaborators details': {43: [u'MR2999985'],
  49: [u'MR2999985'],
  120: [u'MR2955005']},
 '2011-2014 collaborators sizes': {43: 1, 49: 1, 120: 1},
 '2011-2015 citation details': {24: [u'MR2955005'],
  43: [u'MR3385630', u'MR3323627', u'MR3451467', u'MR2955005', u'MR3347573'],
  49: [u'MR3385630', u'MR2955005', u'MR3451467', u'MR3323627', u'MR3347573'],
  51: [u'MR3385630', u'MR3347573'],
  66: [u'MR3385630'],
  70: [u'MR3263936'],
  87: [u'MR3451467', u'MR3323627', u'MR3347573'],
  89: [u'MR3323627', u'MR2955005'],
  103: [u'MR3385630'],
  118: [u'MR2955005'],
  119: [u'MR3385630', u'MR3347573'],
  120: [u'MR2955005'],
  254: [u'MR3323627']},
 '2011-2015 citation sizes': {24: 1,
  43: 5,
  49: 5,
  51: 2,
  66: 1,
  70: 1,
  87: 3,
  89: 2,
  103: 1,
  118: 1,
  119: 2,
  120: 1,
  254: 1},
 '2011-2015 collaborators details': {43: [u'MR3323627', u'MR2999985'],
  49: [u'MR3323627', u'MR2999985'],
  120: [u'MR2955005']},
 '2011-2015 collaborators sizes': {43: 2, 49: 2, 120: 1},
 '2011-2016 citation details': {24: [u'MR2955005'],
  43: [u'MR3385630', u'MR3323627', u'MR3451467', u'MR2955005', u'MR3347573'],
  49: [u'MR3385630', u'MR2955005', u'MR3451467', u'MR3323627', u'MR3347573'],
  51: [u'MR3385630', u'MR3347573'],
  66: [u'MR3385630'],
  70: [u'MR3263936'],
  87: [u'MR3451467', u'MR3323627', u'MR3347573'],
  89: [u'MR3323627', u'MR2955005'],
  103: [u'MR3385630'],
  118: [u'MR2955005'],
  119: [u'MR3385630', u'MR3347573'],
  120: [u'MR2955005'],
  254: [u'MR3323627']},
 '2011-2016 citation sizes': {24: 1,
  43: 5,
  49: 5,
  51: 2,
  66: 1,
  70: 1,
  87: 3,
  89: 2,
  103: 1,
  118: 1,
  119: 2,
  120: 1,
  254: 1},
 '2011-2016 collaborators details': {43: [u'MR3323627', u'MR2999985'],
  49: [u'MR3323627', u'MR2999985'],
  120: [u'MR2955005']},
 '2011-2016 collaborators sizes': {43: 2, 49: 2, 120: 1},
 u'cluster_id': 0,
 u'gear_collaborators': [],
 u'mathsci_id': u'MR304864',
 u'member_id': 12,
 u'member_type': u'member',
 u'name': u'Steven',
 u'organization': u'University of Illinois at Urbana-Champaign',
 u'other_collaborators': u'Indranil Biswas, Jim Glazebrook, Tomas Gomez, Adam Jacob, Franz Kamber, Vincent Mercat, Vicente Munoz, Peter Newstead, Mathias Stemmler',
 u'photo': u'BradlowSteven.jpg',
 u'pos_x': 0,
 u'pos_y': 0,
 u'research_interests': u'Higgs Bundles',
 u'short_bio': u"I'm interested in moduli spaces associated with holomorphic vector bundles. In particular, I'm a big fan of applications of Higgs bundle technology to the study of surface group representation varieties. Before I die, I'd like to be able to compute the surface group representation corresponding to any given Higgs bundle, and vice versa.",
 u'surname': u'Bradlow',
 u'title': u'GEAR Member',
 u'website': u''}

Co-author

For co-author, we look at two types of fields:

'2011-2015 collaborators details': 43: [u'MR3323627', u'MR2999985'], 49: [u'MR3323627', u'MR2999985']
'2011-2015 collaborators sizes': 43: 2, 49: 2

It means that, Professor Bradlow (member id 12), has co-authored 2 papers (with paper id 'MR3323627' and 'MR2999985') with Member 43, and 2 papers (with paper id 'MR3323627' and 'MR2999985') with Member 49.

For co-citation, the idea is the similar

Matrix export

We then output the matrix to files


In [14]:
# print matrix 
for ending_year in range(2011, 2017):
    matrix_maker(orig_profile, 2011, ending_year)


../gephi_input/2011_2011_citation.csv
../gephi_input/2011_2011_coauthor.csv
../gephi_input/2011_2012_citation.csv
../gephi_input/2011_2012_coauthor.csv
../gephi_input/2011_2013_citation.csv
../gephi_input/2011_2013_coauthor.csv
../gephi_input/2011_2014_citation.csv
../gephi_input/2011_2014_coauthor.csv
../gephi_input/2011_2015_citation.csv
../gephi_input/2011_2015_coauthor.csv
../gephi_input/2011_2016_citation.csv
../gephi_input/2011_2016_coauthor.csv

In [15]:
import codecs

def export_paper(the_paper_list): 
    print "Exporting papers ..."
    output_path = os.path.join( '..', 'website_input', 'papers.json') 
    export = {} 
    for p in the_paper_list: 
        export[p['id']] = p 
    with codecs.open(output_path, "w", 'utf-8') as f: 
        json.dump(export, f, indent=4, separators=(',', ': '), ensure_ascii = False)

In [16]:
len(full_paper_list)


Out[16]:
19456080

In [ ]:


In [ ]:
export_paper(full_paper_list)

In [ ]:
export_profile(orig_profile)

In [ ]:
def export_profile(profile):
    output_path = os.path.join( '..', 'website_input', 'profile.json') 
    with open(output_path, "w") as f: 
        json.dump(unicode(profile), f, ensure_ascii = False)

In [ ]:
output_path = os.path.join( '..', 'website_input', 'profile.json') 
with open(output_path, "w") as f: 
    json.dump(unicode(profile), f, ensure_ascii = False)

In [ ]:
output_path = os.path.join( '..', 'website_input', 'profile.json') 


with open(output_path, "r") as f: 
    p = json.load(f)

In [ ]:
p.keys()

In [ ]: