**metaknowledge**
*NetLab, University of Waterloo*
Reid McIlroy-Young, John McLevey, and Jillian Anderson

My Outline

  • Introduction
    • What the purpose of this notebook is and what it can be used for.
    • Where do the files need to be placed?
    • Install packages??
    • Import packages
  • Networks
    • Set variables
    • Do the backend processing
      • Make network
      • Add centrality measures
      • Give them the ability to filter? (Advanced feature)
    • Make the file
    • Display the file
  • RPYS
    • Set variables
    • Standard
    • Multi

Getting Set Up

The very first time you use this Jupyter notebook, you will need to run the cell directly below. You do not need to run it again on later uses. If you do, nothing bad will happen; it just isn't necessary.


In [ ]:
# Only run this the VERY first time
!pip install metaknowledge
!pip install networkx
!pip install pandas
!pip install python-louvain

In [1]:
# Run this before you do anything else
import metaknowledge as mk
import networkx as nx
import pandas
import community
import webbrowser

Networks

Define Variables

Next, we need to define some variables:

  • inputFile should be set to the file path of your ISI file.
  • networkType should be "CoCitation", "CoAuthor", "Citation", or "BibCoupling".
  • nodeType must be set to one of "full", "original", "author", "journal", or "year". It is only used for the "CoCitation" and "Citation" network types.

In [2]:
inputFile = "/Users/jilliananderson/Desktop/mkD3Test/pos.txt"
networkType = "CoCitation"
nodeType = "author"
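
Before building the network, it can help to check these variables, since a typo in networkType or nodeType otherwise only surfaces later. The following is a minimal sketch and not part of metaknowledge: the helper name check_settings is hypothetical, and the accepted values are taken from the list above and from the branches of the network cell that follows.

```python
import os

# Accepted values, taken from this notebook's text and its network cell.
NETWORK_TYPES = ("CoCitation", "CoAuthor", "Citation", "BibCoupling")
NODE_TYPES = ("full", "original", "author", "journal", "year")


def check_settings(input_file, network_type, node_type):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not os.path.isfile(input_file):
        problems.append("inputFile does not point to an existing file: " + input_file)
    if network_type not in NETWORK_TYPES:
        problems.append("networkType must be one of " + ", ".join(NETWORK_TYPES))
    if node_type not in NODE_TYPES:
        problems.append("nodeType must be one of " + ", ".join(NODE_TYPES))
    return problems
```

Calling check_settings(inputFile, networkType, nodeType) and printing each returned string gives a quick report; values are case-sensitive, so "Cocitation" would be flagged.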

Make Network


In [3]:
# This cell creates the network based on 
# the variables you provided above.
RC = mk.RecordCollection(inputFile)

if networkType == "CoCitation":
    net = RC.networkCoCitation(nodeType=nodeType, coreOnly=True)
elif networkType == "CoAuthor":
    net = RC.networkCoAuthor(coreOnly=True)
elif networkType == "Citation":
    net = RC.networkCitation(nodeType=nodeType, coreOnly=True)
elif networkType == "BibCoupling":
    net = RC.networkBibCoupling()
else:
    # Stop here: without a valid networkType there is no network to analyse.
    raise ValueError("Please ensure networkType has been set to one of the accepted values")

    
# This code detects communities and computes
# centrality measures for your network.
partition = community.best_partition(net)
# closeness = nx.closeness_centrality(net)
betweenness = nx.betweenness_centrality(net)
# eigenVect = nx.eigenvector_centrality(net)
for n in net.nodes():
    comm = partition[n]
#     clos = round(closeness[n], 3)
    betw = round(betweenness[n], 3)
#     eVct = round(eigenVect[n], 3)
    net.add_node(n, community=comm, betweenness=betw)
    
# This code writes two .csv files to your computer.
# One is the edgeList and the other is the node Attribute file
mk.writeGraph(net, "myNet")
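
The two files written by mk.writeGraph are what the visualization reads later (the HTML cell in the next section refers to myNet_edgeList.csv and myNet_nodeAttributes.csv), so a quick look at them can catch an empty network early. This is a minimal sketch using only the standard library; the helper name preview_csv is hypothetical, and it makes no assumption about the column names metaknowledge writes.

```python
import csv


def preview_csv(path, max_rows=5):
    """Return (header, rows) for the first few data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no lines
        rows = []
        for row in reader:
            if len(rows) >= max_rows:
                break
            rows.append(row)
    return header, rows
```

For example, preview_csv("myNet_nodeAttributes.csv") returns the column names and up to five rows; an empty rows list suggests the network has no nodes.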

Writing the HTML file

To display the network, we first write an HTML page that loads the mkD3 library and points it at the two .csv files created above.


In [7]:
%%writefile network.html
<!DOCTYPE html>
<head>
    <meta charset="utf-8">
    <title>Title Here</title>
    <link rel="stylesheet" href="http://networkslab.org/mkD3/styles.css">
    <script src="https://d3js.org/d3.v4.js"></script>
    <script src="http://networkslab.org/mkD3/mkd3.js"></script>
</head>
<body>
    <script type = "text/javascript">
        mkd3.networkGraph("myNet_edgeList.csv", "myNet_nodeAttributes.csv")
    </script>
</body>


Writing network.html
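
The HTML pages in this notebook differ only in the file name, the mkd3 function that is called, and the two .csv files passed to it. If you would rather generate them from Python than repeat the %%writefile cell, the following sketch fills in a template. The function name write_mkd3_page and its parameters are hypothetical; the markup and the mkd3 function names are copied from the cells in this notebook.

```python
from string import Template

# Page skeleton copied from the %%writefile cells; $-names are placeholders.
PAGE = Template("""<!DOCTYPE html>
<head>
    <meta charset="utf-8">
    <title>$title</title>
    <link rel="stylesheet" href="http://networkslab.org/mkD3/styles.css">
    <script src="https://d3js.org/d3.v4.js"></script>
    <script src="http://networkslab.org/mkD3/mkd3.js"></script>
</head>
<body>
    <script type="text/javascript">
        mkd3.$function("$first_csv", "$second_csv")
    </script>
</body>
""")


def write_mkd3_page(filename, function, first_csv, second_csv, title="Title Here"):
    """Write an HTML page that calls one mkd3 plotting function."""
    html = PAGE.substitute(title=title, function=function,
                           first_csv=first_csv, second_csv=second_csv)
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write(html)
    return html
```

With this helper, write_mkd3_page("network.html", "networkGraph", "myNet_edgeList.csv", "myNet_nodeAttributes.csv") would produce the same page as the cell above.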

Display the Network

Running the next cell opens network.html in your default web browser.


In [30]:
url = 'http://localhost:8888/files/network.html'
webbrowser.open(url)


Out[30]:
True
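
The URL above assumes the notebook server is listening on localhost port 8888 and that network.html sits in the directory the server was started from. If your server uses another port, the address has to change accordingly. A small sketch of building it follows; notebook_file_url is a hypothetical helper, not a Jupyter function.

```python
from urllib.parse import quote


def notebook_file_url(filename, host="localhost", port=8888):
    """Build a /files/ URL of the form used in this notebook."""
    # quote() escapes spaces and other unsafe characters but keeps "/".
    return "http://{}:{}/files/{}".format(host, port, quote(filename))
```

The result can be passed straight to webbrowser.open, for instance webbrowser.open(notebook_file_url("network.html", port=8889)) for a server on port 8889.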

RPYS Visualization


In [11]:
inputFile = "/Users/jilliananderson/Desktop/mkD3Test/pos.txt"
minYear = 1900
maxYear = 2016
rpysType = "StandardBar"

Standard RPYS


In [23]:
RC = mk.RecordCollection(inputFile)

# Use the year range defined in the variables cell above
rpys = RC.rpys(minYear=minYear, maxYear=maxYear)
df = pandas.DataFrame.from_dict(rpys)
df.to_csv("standard_rpys.csv")

# Create the citation file
citations = RC.getCitations()
df = pandas.DataFrame.from_dict(citations)
df.to_csv("standard_citation.csv")
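
For orientation, RPYS (reference publication year spectroscopy) counts how often each publication year appears among the cited references, then compares each year's count with the median of a five-year window around it. Years that stand well above that median may point to influential early works. The sketch below illustrates this general idea on a plain list of cited years. It is not metaknowledge's implementation, the function name simple_rpys is hypothetical, and treating years without citations as zero is an assumption made for the example.

```python
from collections import Counter
from statistics import median


def simple_rpys(cited_years, min_year, max_year):
    """Count citations per cited year and compare with a 5-year median."""
    counts = Counter(y for y in cited_years if min_year <= y <= max_year)
    result = {"year": [], "count": [], "deviation": []}
    for year in range(min_year, max_year + 1):
        # Window of year-2 .. year+2; years without citations count as 0.
        window = [counts.get(y, 0) for y in range(year - 2, year + 3)]
        result["year"].append(year)
        result["count"].append(counts.get(year, 0))
        # Signed difference between this year's count and the window median.
        result["deviation"].append(counts.get(year, 0) - median(window))
    return result
```

The returned dictionary of equal-length lists can be handed to pandas.DataFrame.from_dict in the same way as the rpys dictionary in the cell above.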

In [28]:
%%writefile standardBar.html
<!DOCTYPE html>
<head>
    <meta charset="utf-8">
    <title>Title Here</title>
    <link rel="stylesheet" href="http://networkslab.org/mkD3/styles.css">
    <script src="https://d3js.org/d3.v4.js"></script>
    <script src="http://networkslab.org/mkD3/mkd3.js"></script>
</head>
<body>
    <script type = "text/javascript">
        mkd3.standardBar("standard_rpys.csv", "standard_citation.csv")
    </script>
</body>


Overwriting standardBar.html

In [29]:
url = 'http://localhost:8888/files/standardBar.html'
webbrowser.open(url)


Out[29]:
True

In [26]:
%%writefile standardLine.html
<!DOCTYPE html>
<head>
    <meta charset="utf-8">
    <title>Title Here</title>
    <link rel="stylesheet" href="http://networkslab.org/mkD3/styles.css">
    <script src="https://d3js.org/d3.v4.js"></script>
    <script src="http://networkslab.org/mkD3/mkd3.js"></script>
</head>
<body>
    <script type = "text/javascript">
        mkd3.standardLine("standard_rpys.csv", "standard_citation.csv")
    </script>
</body>


Writing standardLine.html

In [27]:
url = 'http://localhost:8888/files/standardLine.html'
webbrowser.open(url)


Out[27]:
True

Multi RPYS


In [14]:
years = range(minYear, maxYear+1)
RC = mk.RecordCollection(inputFile)

# ***************************
#  Create the multiRPYS file
# ***************************
dictionary = {'CPY': [],
             "abs_deviation": [],
             "num_cites": [],
             "rank": [],
             "RPY": []}
for i in years:
    try:
        RCyear = RC.yearSplit(i, i)
        if len(RCyear) > 0:
            rpys = RCyear.rpys(minYear=minYear, maxYear=maxYear)
            length = len(rpys['year'])
            rpys['CPY'] = [i]*length

            dictionary['CPY'] += rpys['CPY']
            dictionary['abs_deviation'] += rpys['abs-deviation']
            dictionary['num_cites'] += rpys['count']
            dictionary['rank'] += rpys['rank']
            dictionary['RPY'] += rpys['year']
    except Exception:
        # Skip years that cannot be processed
        pass

df = pandas.DataFrame.from_dict(dictionary)
df.to_csv("multi_rpys.csv")


# ***************************
#  Create the citation file
# ***************************
dictionary = {"author": [],
              "journal": [],
              "num_cites": [],
              "RPY": [],
              "CPY": []}

for i in years:
    try:
        RCyear = RC.yearSplit(i, i)
        if len(RCyear) > 0:
            citations = RCyear.getCitations()
            length = len(citations['year'])
            citations['CPY'] = [i]*length

            dictionary['CPY'] += citations['CPY']
            dictionary['author'] += citations['author']
            dictionary['journal'] += citations['journal']
            dictionary['num_cites'] += citations['num-cites']
            dictionary['RPY'] += citations['year']
    except Exception:
        # Skip years that cannot be processed
        pass

df = pandas.DataFrame.from_dict(dictionary)

df.to_csv("multi_citation.csv")
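
Both loops above follow the same pattern: compute a dictionary of equal-length lists for one citing year, tag every row with that year (CPY), and append the lists to a combined dictionary under renamed keys. Here is a minimal sketch of that pattern on made-up data, so it can be run without an ISI file; the helper name append_year and the sample numbers are hypothetical.

```python
def append_year(combined, per_year, cpy, key_map, length_key="year"):
    """Append one year's columns to the combined long-format dictionary.

    key_map maps column names in `combined` to column names in `per_year`.
    """
    n_rows = len(per_year[length_key])
    combined.setdefault("CPY", []).extend([cpy] * n_rows)
    for target, source in key_map.items():
        combined.setdefault(target, []).extend(per_year[source])
    return combined


# Made-up results for two citing years, shaped like the rpys dictionaries above.
fake_results = {
    2015: {"year": [1990, 1991], "count": [4, 1]},
    2016: {"year": [1990], "count": [2]},
}
combined = {}
for cpy, per_year in sorted(fake_results.items()):
    append_year(combined, per_year, cpy, {"RPY": "year", "num_cites": "count"})
```

After the loop, every list in combined has one entry per (citing year, referenced year) pair, which is the long format written to multi_rpys.csv.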

In [16]:
%%writefile multiRPYS.html
<!DOCTYPE html>
<head>
    <meta charset="utf-8">
    <title>Title Here</title>
    <link rel="stylesheet" href="http://networkslab.org/mkD3/styles.css">
    <script src="https://d3js.org/d3.v4.js"></script>
    <script src="http://networkslab.org/mkD3/mkd3.js"></script>
</head>
<body>
    <script type = "text/javascript">
        mkd3.multiRPYS("multi_rpys.csv", "multi_citation.csv")
    </script>
</body>


Writing multiRPYS.html

In [19]:
url = 'http://localhost:8888/files/multiRPYS.html'
webbrowser.open(url)


Out[19]:
True
