For this class, most of the types of network you will want to make can be produced by metaknowledge. The first three (co-citation network, citation network and co-author network) are specialized versions of the last three (one-mode network, two-mode network and multi-mode network).
First we need to import metaknowledge and, because we will be dealing with graphs, the graph package networkx should be imported as well.
In [1]:
import metaknowledge as mk
import networkx as nx
And so that we can visualize the graphs:
In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
import metaknowledge.contour.plotting as mkv
Before we start we should also get a RecordCollection to work with.
In [3]:
RC = mk.RecordCollection('../savedrecs.txt')
Now let's look at the different types of graph.
To make a basic co-citation network of Records use networkCoCitation().
In [4]:
CoCitation = RC.networkCoCitation()
print(mk.graphStats(CoCitation, makeString = True)) # makeString is True by default, so it is not strictly necessary to include it
graphStats() is a function that extracts some of the statistics of a graph and makes them into a nice string.
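To give a feel for the kind of summary involved, here is a minimal plain-Python sketch that computes a few common graph statistics for a toy undirected graph. It does not use metaknowledge; the helper name `basic_graph_stats` and the toy data are hypothetical, and the exact set of statistics graphStats() reports may differ.

```python
# Illustrative only: a hand-rolled summary of an undirected graph.
# basic_graph_stats and the toy graph are hypothetical, not metaknowledge code.

def basic_graph_stats(nodes, edges):
    """Count nodes, edges, isolates and self loops, and compute the density."""
    degree = {n: 0 for n in nodes}
    self_loops = 0
    for u, v in edges:
        if u == v:
            self_loops += 1
            degree[u] += 2  # by convention a self loop adds 2 to the degree
        else:
            degree[u] += 1
            degree[v] += 1
    n = len(degree)
    isolates = sum(1 for d in degree.values() if d == 0)
    # Density of an undirected graph: 2m / (n(n - 1))
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": len(edges),
        "isolates": isolates,
        "self_loops": self_loops,
        "density": density,
    }

toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("B", "C"), ("C", "C")]
stats = basic_graph_stats(toy_nodes, toy_edges)
print(stats)
```

Node "D" has no edges, so it is the single isolate in this toy graph.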
CoCitation is now a networkx graph of the co-citation network, with the hashes of the Citations as nodes and the full citations stored as attributes. Let's look at one node
In [5]:
list(CoCitation.nodes(data = True))[0] # list() keeps this working on networkx 2.x, where nodes() returns a view
Out[5]:
and an edge
In [6]:
list(CoCitation.edges(data = True))[0] # list() keeps this working on networkx 2.x, where edges() returns a view
Out[6]:
All the graphs metaknowledge uses are networkx graphs. A few functions to trim them are implemented in metaknowledge (see the example section), but many more useful functions are implemented by networkx itself. Read the networkx documentation for more information.
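To make concrete what a co-citation network records, the following plain-Python sketch counts how often two works appear together in the same reference list. It is only an illustration of the idea, not how networkCoCitation() is implemented; the record names and citations are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each record maps to the works it cites.
reference_lists = {
    "paper1": ["Smith 2001", "Jones 1999", "Lee 2005"],
    "paper2": ["Smith 2001", "Jones 1999"],
    "paper3": ["Lee 2005"],
}

# Two works are co-cited once for every record that cites both of them.
co_citation_weights = Counter()
for cited in reference_lists.values():
    # sorted() gives each pair a single canonical orientation
    for a, b in combinations(sorted(set(cited)), 2):
        co_citation_weights[(a, b)] += 1

for pair, weight in sorted(co_citation_weights.items()):
    print(pair, weight)
```

The pair ("Jones 1999", "Smith 2001") is cited together by two records, so it gets the heaviest edge; "paper3" cites a single work and contributes no edge at all.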
The networkCoCitation() function has many options for filtering and determining the nodes. The default is to use the Citations themselves. If you wanted to make a network of co-citations of journals you would have to make the node type 'journal' and remove the non-journals.
In [7]:
coCiteJournals = RC.networkCoCitation(nodeType = 'journal', dropNonJournals = True)
print(mk.graphStats(coCiteJournals))
Let's take a look at the graph after a quick spring layout
In [8]:
nx.draw_spring(coCiteJournals)
A bit basic, but it gives a general idea. If you want to make a much better looking and more informative visualization you could try Gephi or visone. Exporting to them is covered below in Exporting graphs.
The networkCitation() method is nearly identical to networkCoCitation() in its parameters. It has one additional keyword argument, directed, that controls whether it produces a directed network. Read Making a co-citation network to learn more about networkCitation().
One small example is still worth providing. If you want to make a network of the citations of years by other years, keeping only the citations that have the letter 'A' in them, you would write:
In [9]:
citationsA = RC.networkCitation(nodeType = 'year', keyWords = ['A'])
print(mk.graphStats(citationsA))
In [10]:
nx.draw_spring(citationsA, with_labels = True)
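As a rough picture of what a year-level citation network holds, the sketch below counts directed edges between publication years in plain Python. The toy records are invented, and the edge direction (citing year to cited year) is an assumption of this sketch rather than something taken from networkCitation() itself.

```python
from collections import Counter

# Hypothetical toy data: (year of the citing record, years of the works it cites)
records = [
    (2010, [2001, 2005]),
    (2010, [2005]),
    (2012, [2010, 2005]),
]

# Each citation adds 1 to the weight of the edge citing year -> cited year.
directed_weights = Counter()
for citing_year, cited_years in records:
    for cited_year in cited_years:
        directed_weights[(citing_year, cited_year)] += 1

for (src, dst), weight in sorted(directed_weights.items()):
    print(src, "->", dst, "weight", weight)
```

Unlike a co-citation edge, each edge here has a direction: (2010, 2005) is present but (2005, 2010) is not.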
The networkCoAuthor() function produces the co-authorship network of the RecordCollection and is used as shown:
In [11]:
coAuths = RC.networkCoAuthor()
print(mk.graphStats(coAuths))
In addition to the specialized network generators, metaknowledge lets you make a one-mode co-occurrence network of any of the WOS tags with the oneModeNetwork() function. For example, the WOS subject tag 'WC' can be examined.
In [12]:
wcCoOccurs = RC.oneModeNetwork('WC')
print(mk.graphStats(wcCoOccurs))
In [13]:
nx.draw_spring(wcCoOccurs, with_labels = True)
If you wish to study the relationships between two tags you can use the twoModeNetwork() function, which creates a two-mode network showing the connections between the tags. For example, to look at the connections between titles ('TI') and subjects ('WC'):
In [14]:
ti_wc = RC.twoModeNetwork('WC', 'title')
print(mk.graphStats(ti_wc))
The network is directed by default with the first tag going to the second.
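The shape of such a two-mode network can be sketched in plain Python: every value of the first tag in a record points at every value of the second tag in the same record. This is only an illustration with invented records, not the twoModeNetwork() implementation.

```python
from collections import Counter
from itertools import product

# Hypothetical toy records using WOS-style tags; all values are invented.
records = [
    {"TI": ["Graphs in science"], "WC": ["Physics", "Computer Science"]},
    {"TI": ["Citation analysis"], "WC": ["Information Science"]},
    {"TI": ["Networks of authors"], "WC": ["Physics"]},
]

first_tag, second_tag = "WC", "TI"

edges = Counter()  # (source, target) -> weight
node_types = {}    # node -> tag it came from
for rec in records:
    for src, dst in product(rec[first_tag], rec[second_tag]):
        edges[(src, dst)] += 1
        node_types[src] = first_tag
        node_types[dst] = second_tag

for (src, dst), weight in sorted(edges.items()):
    print(src, "->", dst, "weight", weight)
```

No edge ever joins two subjects or two titles, which is what makes the network two-mode.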
In [15]:
mkv.quickVisual(ti_wc, showLabel = False) #default is False as there are usually lots of labels
quickVisual() makes a graph with the different types of nodes coloured differently and a couple of other small visual tweaks from networkx's draw_spring.
For any number of tags, the nModeNetwork() function does the same thing as oneModeNetwork(), but it accepts any number of tags and keeps track of their types. So to look at the co-occurrence of titles ('TI'), WOS numbers ('UT') and authors ('AU'):
In [16]:
tags = ['TI', 'UT', 'AU']
multiModeNet = RC.nModeNetwork(tags)
mk.graphStats(multiModeNet)
Out[16]:
In [17]:
mkv.quickVisual(multiModeNet)
Beware, this can very easily produce hairballs:
In [18]:
tags = mk.tagsAndNames #All the tags, twice
sillyMultiModeNet = RC.nModeNetwork(tags)
mk.graphStats(sillyMultiModeNet)
Out[18]:
In [19]:
mkv.quickVisual(sillyMultiModeNet)
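Why multi-mode networks turn into hairballs so quickly can be seen from a small plain-Python sketch. It assumes that every pair of values found in the same record is linked, whatever tags they come from; that assumption, the toy records and the WOS numbers are all invented for illustration and are not taken from the nModeNetwork() source.

```python
from itertools import combinations

# Hypothetical toy records; tags are WOS-style, values are invented.
records = [
    {"TI": ["Graphs in science"], "UT": ["WOS:0001"], "AU": ["Doe, J", "Roe, R"]},
    {"TI": ["Citation analysis"], "UT": ["WOS:0002"], "AU": ["Doe, J"]},
]
tags = ["TI", "UT", "AU"]

node_types = {}  # node -> tag, so the different modes can be told apart later
edges = set()
for rec in records:
    values = []
    for tag in tags:
        for value in rec.get(tag, []):
            node_types[value] = tag
            values.append(value)
    # A record with k values contributes k * (k - 1) / 2 edges.
    for a, b in combinations(sorted(set(values)), 2):
        edges.add((a, b))

print(len(node_types), "nodes,", len(edges), "edges")
```

Because the edge count per record grows quadratically with the number of values, adding more tags inflates the graph far faster than it adds nodes.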
If you wish to apply a well-known algorithm or process to a graph, networkx is a good place to look, as it does a good job of implementing them.
One feature it lacks, though, is pruning of graphs; metaknowledge has these capabilities. To remove edges outside of some weight range, use dropEdges(). For example, to remove the self loops and the edges with weight less than 3 or higher than 10 from coCiteJournals:
In [20]:
minWeight = 3
maxWeight = 10
proccessedCoCiteJournals = mk.dropEdges(coCiteJournals, minWeight, maxWeight, dropSelfLoops = True)
mk.graphStats(proccessedCoCiteJournals)
Out[20]:
Then, to remove all the isolates, i.e. nodes with degree less than 1, use dropNodesByDegree():
In [21]:
proccessedCoCiteJournals = mk.dropNodesByDegree(proccessedCoCiteJournals, 1)
mk.graphStats(proccessedCoCiteJournals)
Out[21]:
The graph before the processing was drawn earlier, in the spring layout of coCiteJournals. After the processing it looks like:
In [22]:
nx.draw_spring(proccessedCoCiteJournals)
Hm, it looks a bit thinner. Using a visualizer will make the difference a bit more noticeable.
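The two pruning steps can be pictured with a plain-Python sketch on a toy weighted edge list. The journal pairs and weights are invented, and treating both weight bounds as inclusive is an assumption of this sketch, not a statement about dropEdges().

```python
# Hypothetical toy co-citation weights between journals.
weighted_edges = {
    ("Nature", "Science"): 12,
    ("Nature", "Cell"): 4,
    ("Science", "Cell"): 1,
    ("Cell", "Cell"): 5,      # a self loop
    ("Nature", "PNAS"): 3,
    ("Lancet", "Cell"): 2,
}
nodes = {"Nature", "Science", "Cell", "PNAS", "Lancet"}

min_weight, max_weight = 3, 10

# Step 1: drop self loops and edges outside the weight range.
kept_edges = {
    (u, v): w
    for (u, v), w in weighted_edges.items()
    if u != v and min_weight <= w <= max_weight
}

# Step 2: drop isolates, i.e. nodes left without any edge.
connected = {n for edge in kept_edges for n in edge}
kept_nodes = nodes & connected

print(sorted(kept_edges.items()))
print(sorted(kept_nodes))
```

"Science" and "Lancet" only had edges that were too heavy or too light, so they become isolates after the first step and disappear in the second.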
Now that you have a graph, the last step is to write it to disk. networkx has a few ways of doing this, but they tend to be slow. metaknowledge can write an edge list and a node attribute file that together contain all the information of the graph. The function to do this is called writeGraph(). You give it the start of the file name and it makes two labeled files containing the graph.
In [23]:
mk.writeGraph(proccessedCoCiteJournals, "FinalJournalCoCites")
These files are simple CSVs and can be read easily by most systems. If you want to read them back into Python, the readGraph() function will do that.
In [24]:
FinalJournalCoCites = mk.readGraph("FinalJournalCoCites_edgeList.csv", "FinalJournalCoCites_nodeAttributes.csv")
mk.graphStats(FinalJournalCoCites)
Out[24]:
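Since the two files are ordinary CSVs, the general idea of an edge list plus a node attribute table can be sketched with Python's csv module alone. The column names ("From", "To", "weight", "ID", "count") and the data are assumptions made for this illustration; check the files writeGraph() actually produces before relying on any particular header. In-memory buffers stand in for the files so that nothing is written to disk.

```python
import csv
import io

# Hypothetical toy graph: weighted edges and one attribute per node.
edges = [("Nature", "Cell", 4), ("Nature", "PNAS", 3)]
node_attrs = {"Nature": {"count": 7}, "Cell": {"count": 4}, "PNAS": {"count": 3}}

# Write the edge list (column names are assumed for this sketch).
edge_file = io.StringIO()
edge_writer = csv.writer(edge_file)
edge_writer.writerow(["From", "To", "weight"])
edge_writer.writerows(edges)

# Write the node attribute table.
node_file = io.StringIO()
node_writer = csv.writer(node_file)
node_writer.writerow(["ID", "count"])
for node, attrs in node_attrs.items():
    node_writer.writerow([node, attrs["count"]])

# Read both tables back and rebuild the same structures.
edge_file.seek(0)
node_file.seek(0)
read_edges = [
    (row["From"], row["To"], int(row["weight"]))
    for row in csv.DictReader(edge_file)
]
read_nodes = {
    row["ID"]: {"count": int(row["count"])}
    for row in csv.DictReader(node_file)
}

print(read_edges)
print(read_nodes)
```

Keeping node attributes in their own table means that isolated nodes and per-node data survive the round trip, which a bare edge list cannot guarantee.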
This is a full example workflow for metaknowledge. The package is flexible and hopefully you will be able to customize it to do what you want (I assume you do not want the Records starting with 'A').