Increasing the Productivity of Scholarship

The Case for Knowledge Graphs

Paul Groth - Elsevier Labs

@pgroth | pgroth.com | labs.elsevier.com

Outline

  • Productivity in scholarship
  • A model
  • Current solutions
  • Knowledge Graphs & Why
  • Challenges

Productivity

35% growth in labor productivity

Global Scholarly Output

64% growth in scholarly output

Growth in researchers: 4–5% per year (STM Report)

Scholarly productivity is not increasing

Why?

The Burden of Knowledge

  • Benjamin F. Jones
  • The Burden of Knowledge and the ‘Death of the Renaissance Man’: Is Innovation Getting Harder?
  • NBER Working Paper 11360
  • http://www.nber.org/papers/w11360

The Model (Roughly)

"If one is to stand on the shoulders of giants, one must first climb up their backs, and the greater the body of knowledge, the harder this climb becomes." - Jones

  • Knowledge accumulates
  • Need for more education
  • => Innovators narrow their expertise
  • => Larger teams
  • More overhead to produce new knowledge

Facts & Figures

  • U.S. team size is increasing by 17% per decade
  • Specialization is increasing by 6% per decade
  • The age at which Nobel Prize winners and great inventors produced their key work increased by 6 years over the 20th century
  • R&D employment has risen dramatically, yet total factor productivity (TFP) growth has been flat (Jones, 1995b)
  • The average number of patents produced per R&D worker has been falling over time across countries (Evenson, 1984)

Reading more but with less time

  • Time spent reading each article fell from "45-50 minutes in the mid-1990s to just over 30 minutes" - The 2015 STM Report

Citing more...

  • Long-Term Variations in the Aging of Scientific Literature: From Exponential Growth to Steady-State Science
  • Vincent Larivière, Éric Archambault, Yves Gingras. JASIST

All is not lost

101 Innovations in Scholarly Communication by Jeroen Bosman & Bianca Kramer

Knowledge Graphs

Knowledge Graph Definition:

"graph structured knowledge bases (KBs) which store factual information in form of relationships between entities"

  • Nickel et al. "A Review of Relational Machine Learning for Knowledge Graphs" arXiv:1503.00759v1
  • Typically integrated with some form of context or probabilities associated with facts
  • A nice tutorial

These are primarily used for encyclopedic knowledge, but I think they can help science too.
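
To make the definition concrete, here is a minimal sketch of a knowledge graph as a list of (subject, predicate, object) facts, each carrying a confidence score and a provenance label, matching the "context or probabilities" point above. The `Fact` and `KnowledgeGraph` names, the example facts, their scores, and the source labels are all illustrative assumptions, not a real system or dataset.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Fact:
    """One edge in the graph: (subject, predicate, object) plus context."""
    subject: str
    predicate: str
    obj: str
    confidence: float = 1.0       # probability-like score attached to the fact
    source: Optional[str] = None  # provenance, e.g. a DOI or database name


class KnowledgeGraph:
    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def add(self, fact: Fact) -> None:
        self.facts.append(fact)

    def query(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None, min_confidence: float = 0.0) -> Iterator[Fact]:
        """Yield facts matching the pattern; None acts as a wildcard."""
        for f in self.facts:
            if subject is not None and f.subject != subject:
                continue
            if predicate is not None and f.predicate != predicate:
                continue
            if obj is not None and f.obj != obj:
                continue
            if f.confidence >= min_confidence:
                yield f


# Illustrative facts; scores and source labels are placeholders, not real records.
kg = KnowledgeGraph()
kg.add(Fact("aspirin", "inhibits", "COX-1", 0.95, source="curated-db"))
kg.add(Fact("aspirin", "inhibits", "COX-2", 0.70, source="text-mining"))
kg.add(Fact("ibuprofen", "inhibits", "COX-2", 0.90, source="curated-db"))

confident = list(kg.query(predicate="inhibits", min_confidence=0.8))
print([(f.subject, f.obj) for f in confident])
# [('aspirin', 'COX-1'), ('ibuprofen', 'COX-2')]
```

A production graph would use a proper triple store and indexes; the linear scan here only keeps the idea visible.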

How does the concept help?

Integrate four core knowledge types

  1. Databases
  2. Text
  3. Models
  4. Social Networks
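
A rough sketch of what "integrating" these four types could mean in practice: each source is normalised into the same (subject, predicate, object, origin) edge form, so one query can cross all of them. Every input below (compounds `C1`/`C2`, target `P1`, the papers and authors) and the helper `authors_linked_to_target` are hypothetical, chosen only to show a query that needs a database, text, a model and a social network at once.

```python
# Hypothetical toy inputs, one per knowledge type on the slide.
database_rows = [{"compound": "C1", "target": "P1"}]    # 1. curated database
text_extractions = [("paper:A", "mentions", "C1"),      # 2. facts mined from text
                    ("paper:B", "mentions", "C2")]
model_predictions = [("C2", "targets", "P1")]           # 3. links predicted by a model
social_edges = [("alice", "authored", "paper:A"),       # 4. authorship network
                ("bob", "authored", "paper:B")]

# Normalise every source into (subject, predicate, object, origin) edges.
edges = [(r["compound"], "targets", r["target"], "database") for r in database_rows]
edges += [(s, p, o, "text") for s, p, o in text_extractions]
edges += [(s, p, o, "model") for s, p, o in model_predictions]
edges += [(s, p, o, "social") for s, p, o in social_edges]


def authors_linked_to_target(target, include_predicted=False):
    """Authors of papers that mention a compound linked to `target`."""
    allowed = {"database", "model"} if include_predicted else {"database"}
    compounds = {s for s, p, o, src in edges
                 if p == "targets" and o == target and src in allowed}
    papers = {s for s, p, o, _ in edges if p == "mentions" and o in compounds}
    return sorted({s for s, p, o, _ in edges if p == "authored" and o in papers})


print(authors_linked_to_target("P1"))                          # ['alice']
print(authors_linked_to_target("P1", include_predicted=True))  # ['alice', 'bob']
```

Keeping the `origin` on every edge lets a user decide whether model predictions should count as evidence, which is the kind of context the definition above calls for.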

Lots of text

2.5 million articles a year

A couple of notes:

  • We too often look at articles as independent blocks
  • Many databases are curated from text (e.g. ChEMBL, Reaxys)

Can we connect them?

Example: Paleontological Databases

  • Paleobiology Database (PBDB; http://paleobiodb.org)
  • Peters SE, Zhang C, Livny M, Ré C (2014) A Machine Reading System for Assembling Synthetic Paleontological Databases. PLoS ONE 9(12): e113523. doi:10.1371/journal.pone.0113523
    • "the majority of the data were extracted from approximately 40,000 publications"
    • "leverages only a small fraction of all published paleontological knowledge"
    • "because the end product of manual data entry is a list of facts that are divorced from most, if not all, original contexts, assessing the quality of the database and the reproducibility of results is difficult."

DeepDive Architecture: Text + DB
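
The PBDB quote above complains that manually entered facts are "divorced from most, if not all, original contexts". The toy sketch below shows the general text + DB idea while keeping the evidence: it extracts (taxon, formation) pairs with one hand-written pattern, stores the source sentence and document id with each fact, and joins the result to a curated table. This is not DeepDive's pipeline (DeepDive uses statistical inference rather than a single regex); the corpus, taxa, formations and the `formation_age` table are invented for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical corpus: (document id, sentence). Not real publications.
corpus: List[Tuple[str, str]] = [
    ("doc-1", "Specimens of Examplesaurus were recovered from the Sample Creek Formation."),
    ("doc-2", "The Sample Creek Formation also yields Fakeodon teeth."),
    ("doc-3", "Fakeodon was recovered from the Mock Hill Formation."),
]

# Hypothetical curated database: formation -> geological age.
formation_age: Dict[str, str] = {"Sample Creek": "Maastrichtian"}

# One hand-written extraction pattern; real systems combine many noisy signals.
PATTERN = re.compile(r"(?P<taxon>[A-Z][a-z]+) (?:was|were) recovered from the "
                     r"(?P<formation>(?:[A-Z][a-z]+ )+)Formation")


def extract(docs: List[Tuple[str, str]]) -> List[Tuple[str, str, str, str]]:
    """Return (taxon, formation, doc_id, sentence); the sentence is kept as provenance."""
    facts = []
    for doc_id, sentence in docs:
        for m in PATTERN.finditer(sentence):
            facts.append((m["taxon"], m["formation"].strip(), doc_id, sentence))
    return facts


# Join the extracted facts with the database, keeping the evidence alongside.
for taxon, formation, doc_id, sentence in extract(corpus):
    age = formation_age.get(formation, "unknown")
    print(f"{taxon} | {formation} | {age} | evidence: {doc_id}")
# Examplesaurus | Sample Creek | Maastrichtian | evidence: doc-1
# Fakeodon | Mock Hill | unknown | evidence: doc-3
```

Note that doc-2 is missed because its phrasing does not fit the pattern, a small example of why hand-built extractors leave most published knowledge untouched.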

Models

  • Many models can be expressed with respect to graphical structures.
  • Examples
    • Link Prediction
    • Collective classification
    • Entity Resolution
    • Cellular Networks
    • Input to QA systems
  • Potential for models to share common variables
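
As an illustration of the first example, link prediction, here is a sketch of one of the simplest heuristics: score every unlinked pair of nodes by the Jaccard overlap of their neighbourhoods. This heuristic is swapped in for clarity; the learned latent-feature models surveyed by Nickel et al. are far stronger. The graph and the helper names `build_adjacency` and `jaccard_link_scores` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_adjacency(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def jaccard_link_scores(adj: Dict[str, Set[str]]) -> List[Tuple[float, str, str]]:
    """Score each unlinked pair by the Jaccard overlap of their neighbourhoods."""
    scores = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        union = adj[u] | adj[v]
        if union:
            scores.append((len(adj[u] & adj[v]) / len(union), u, v))
    return sorted(scores, reverse=True)  # highest-scoring candidate links first


# Hypothetical interaction graph between entities A..E.
adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"),
                       ("B", "D"), ("C", "D"), ("D", "E")])
score, u, v = jaccard_link_scores(adj)[0]
print(u, v, round(score, 2))  # A D 0.67
```

Dividing by the size of the combined neighbourhood keeps highly connected nodes from dominating the ranking; the same adjacency structure could also feed collective classification or entity resolution.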

FoxPSL

Sara Magliacane et al. FoxPSL: An Extended And Scalable PSL Implementation. AAAI Spring Symposium 2015 on Knowledge Representation and Reasoning.
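
FoxPSL implements PSL (Probabilistic Soft Logic), where atoms take soft truth values in [0, 1] and each weighted rule becomes a hinge-loss penalty via Łukasiewicz logic: a conjunction scores max(0, sum - (n - 1)), and a rule body → head is violated by max(0, I(body) - I(head)). The sketch below computes that loss for one ground rule. It illustrates general PSL semantics under these assumptions (linear rather than squared hinges, made-up atoms and weights), not FoxPSL's actual code or API.

```python
from typing import Dict, List, Tuple

# Soft truth values in [0, 1] for ground atoms (hypothetical example data).
truth: Dict[str, float] = {
    "Friends(ann,bob)": 0.9,
    "Votes(ann,green)": 0.8,
    "Votes(bob,green)": 0.3,
}


def luk_and(*values: float) -> float:
    """Lukasiewicz t-norm: max(0, sum(values) - (n - 1))."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body: List[str], head: str) -> float:
    """Violation of body -> head under Lukasiewicz logic: max(0, I(body) - I(head))."""
    return max(0.0, luk_and(*(truth[a] for a in body)) - truth[head])


# Weighted ground rule: Friends(a,b) & Votes(a,p) -> Votes(b,p)
rules: List[Tuple[float, List[str], str]] = [
    (2.0, ["Friends(ann,bob)", "Votes(ann,green)"], "Votes(bob,green)"),
]

# Total weighted hinge loss; MAP inference would minimise this over unknown atoms.
loss = sum(w * distance_to_satisfaction(body, head) for w, body, head in rules)
print(round(loss, 2))  # 0.8
```

Because every term is a hinge over linear functions of the truth values, minimising the total is a convex problem, which is what lets implementations such as FoxPSL scale inference to large graphs.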

The Burden of Knowledge?

  • Have computers attack the problem
  • Perform synthesis between the various kinds of knowledge we produce
  • Come up to speed by having information in one place
  • Ability to make smaller contributions that spread faster
    • e.g. Wikidata

Challenges

  • Integration with modeling environments
  • User interaction
    • Are cards the only thing?
    • Is voice really the right way?
  • Tackling highly specific domain knowledge

Conclusion

  • The problem of too much knowledge is very real in scholarship
    • The Burden of Knowledge
  • New computational tools are necessary, but
    • Look at addressing systemic problems
    • See also:
      • discoveryinformaticsinitiative.org
      • DARPA Big Mechanism
  • One exciting tool is knowledge graphs
