This notebook compares two journal articles, and with them two ideas. It can be read as a piece of meta-research. For now, to test our capabilities and to uncover problems in the package, we compare which disciplines are mentioned in the different classification schemes.
Here we will use three publications:
We start by analyzing each article separately. Then, in the second part, we see what we can get by fusing them together.
In [67]:
from disciplines.theory import consensus_map_of_science
from IPython.display import Image
i = Image(url='http://wiki.cns.iu.edu/download/attachments/1245876/worddavecfd04f904a8c7a15eaac3c2b9a6305a.png')
i
Out[67]:
If we look into the article, we find the main findings presented in the form of a table. We entered the data from that table into Python lists and dictionaries. (In the future we will store it in a more pandas-friendly format.)
In [68]:
discipline_name_dictionary = consensus_map_of_science.name_dict
consensus_table = consensus_map_of_science.TABLE_3
consensus_map_of_science.TABLE_3.keys()
Out[68]:
In [69]:
consensus_map_of_science.TABLE_3['columns']
Out[69]:
In [70]:
consensus_map_of_science.TABLE_3['table'][0:2]
Out[70]:
In [71]:
import pandas as pd
df = pd.DataFrame([x[1:] for x in consensus_table['table']])
df.columns = consensus_table['columns']
df.index = range(1, len(df) +1) # Here we do some reindexing so it stays closer to original form.
In [72]:
df['Rank'] = df['Rank'].apply(lambda x: discipline_name_dictionary[x])
df['Pair'] = df['Pair'].apply(lambda x: discipline_name_dictionary[x])
df['Rank'] = df['Rank'].str.rstrip(' ')
df['Pair'] = df['Pair'].str.rstrip(' ')
In [73]:
unique_disciplines_in_consensus = set(df['Rank'].unique()) | set(df['Pair'].unique())
In [88]:
import networkx as nx
from disciplines.theory.consensus_map_of_science import TABLE_3
from disciplines.theory.consensus_map_of_science import name_dict
g_cms = nx.Graph()
# Each row of TABLE_3 becomes one weighted edge between two named disciplines.
#g_cms.add_edges_from([(x[1], x[2], {'weight':x[5]}) for x in TABLE_3['table']])
g_cms.add_edges_from([(name_dict[x[1]], name_dict[x[2]], {'weight': x[4]}) for x in TABLE_3['table']])
nx.draw(g_cms)
Here we have a list of the disciplines, or formations, mentioned in the consensus map of science module. Can we compare it with the disciplines mentioned elsewhere? What about Biglan's?
A separate notebook goes deeper into the subject.
Biglan
In [74]:
from disciplines.theory import biglan
columns = ['discipline','pure', 'hard', 'life']
the = biglan.the_classification
mylist = []
for line in the:
    for discipline in line[0]:
        mylist.append([discipline, line[1]['pure'], line[1]['hard'], line[1]['life']])
df_biglan = pd.DataFrame(mylist, columns=columns)
df_biglan['discipline'] = df_biglan['discipline'].str.rstrip(' ')
unique_disciplines_in_biglan = set(df_biglan['discipline'].unique())
Currently we have a few small problems: for example, psychology and psychiatry are merged into one entry. Also, brain research has a second name, neuroscience, and earth sciences have a second name, geoscience. We have to handle this complexity somehow.
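One possible way to handle these naming problems is sketched below. It is only an illustration: the `ALIASES` table and the helper names `name_variants` and `share_a_name` are assumptions made for this example, not part of the `disciplines` package. The idea is to split a label on "/" and on parentheses, then map known second names onto one canonical name before comparing.

```python
import re

# Hypothetical alias table; the pairs are the ones named in the text above.
ALIASES = {
    'neuroscience': 'brain research',
    'geoscience': 'earth sciences',
}


def name_variants(label):
    """Return the set of canonical names contained in one label."""
    # Split on "/", "(" and ")" so that "a/b" and "a (b)" both give two parts.
    parts = re.split(r'[/()]', label.lower())
    cleaned = (part.strip() for part in parts)
    # Replace a known second name by its canonical form, drop empty parts.
    return {ALIASES.get(part, part) for part in cleaned if part}


def share_a_name(label1, label2):
    """True when the two labels have at least one canonical name in common."""
    return bool(name_variants(label1) & name_variants(label2))


print(share_a_name('Psychology/psychiatry', 'Psychology'))
print(share_a_name('Geoscience', 'Earth sciences'))
```

With such a helper, a merged label like 'Psychology/psychiatry' would match both of its parts, and the alias table would have to be extended by hand whenever a new pair of synonyms shows up.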
In [75]:
def see_similarity(terms):
    '''
    We expect terms to be split for two reasons: the appearance of "()" and "/".
    '''
    if len(terms) == 2:
        terms = [term.lower() for term in terms]
        term1, term2 = [terms[0]], [terms[1]]
        # "a/b" in the first term becomes ['a', 'b'].
        if '/' in term1[0]:
            term1 = term1[0].split('/')
        # "a (b)" in the second term becomes ['a', 'b'].
        if ('(' in term2[0]) and (')' in term2[0]):
            term2 = [part.strip() for part in term2[0].rstrip(')').split('(')]
        # Similar when any part of the first term appears among the parts of the second.
        answer = any(term in term2 for term in term1)
        return [term1, term2], answer
see_similarity(['Psychology/psychiatry', 'Psychology'])
Out[75]:
In [12]:
from disciplines.theory import web_of_science_categories
wos = set(web_of_science_categories.categories)
In [90]:
g_biglan = nx.Graph()
# we add all the disciplines and record their properties
# if properties match, we assign a weight:
# one property matches - 1
# two properties match - 2
# three properties match - 3
for x in biglan.the_classification:
    for discipline in x[0]:
        # x[1] is the dict of properties; store it as node attributes.
        g_biglan.add_node(discipline, **x[1])
# Two ways to go:
## connect each node to a node that represents a value, or
## connect nodes that have the same value.
# Here: connecting to the node that represents the value.
# list() takes a snapshot, because the loop adds new nodes to the graph.
for node, info in list(g_biglan.nodes(data=True)):
    if info['life'] == True:
        g_biglan.add_edge(node, 'life')
    if info['life'] == False:
        g_biglan.add_edge(node, 'non-life')
    if info['pure'] == True:
        g_biglan.add_edge(node, 'pure')
    if info['pure'] == False:
        g_biglan.add_edge(node, 'applied')
    if info['hard'] == True:
        g_biglan.add_edge(node, 'hard')
    if info['hard'] == False:
        g_biglan.add_edge(node, 'soft')
import matplotlib.pyplot as plt
plt.figure(figsize=(12,12))
nx.draw(g_biglan)
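The comments in the cell above also mention a second way to go: connecting disciplines that share property values, with a weight equal to the number of matching properties. A minimal sketch of that idea follows. The toy data and the function name `shared_property_edges` are assumptions made for illustration; only the data shape (a list of names plus a dict with the keys pure, hard and life) is taken from how the notebook reads `biglan.the_classification`.

```python
from itertools import combinations

# Hypothetical toy data in the shape the loops above expect:
# (list of discipline names, dict of the three properties).
toy_classification = [
    (['physics', 'chemistry'], {'pure': True, 'hard': True, 'life': False}),
    (['botany'], {'pure': True, 'hard': True, 'life': True}),
    (['education'], {'pure': False, 'hard': False, 'life': True}),
]


def shared_property_edges(classification):
    """Build weighted edges between disciplines that share property values."""
    properties = {}
    for names, values in classification:
        for name in names:
            properties[name] = values
    edges = []
    # Visit every unordered pair of disciplines exactly once.
    for first, second in combinations(sorted(properties), 2):
        weight = sum(properties[first][key] == properties[second][key]
                     for key in ('pure', 'hard', 'life'))
        # Pairs with no matching property get no edge at all.
        if weight:
            edges.append((first, second, {'weight': weight}))
    return edges


for edge in shared_property_edges(toy_classification):
    print(edge)
```

The returned triples have the (node, node, attribute dict) form that `add_edges_from` accepts, so they could be added to a graph directly. Note that this variant grows quadratically with the number of disciplines, while the value-node variant used above adds only three edges per discipline.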
Now that we have the datasets, we can compare them. The comparison reveals a few things.
In [76]:
overlaping = unique_disciplines_in_biglan & unique_disciplines_in_consensus
a_but_not_b = unique_disciplines_in_biglan - unique_disciplines_in_consensus
b_but_not_a = unique_disciplines_in_consensus - unique_disciplines_in_biglan
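The three set operations in the cell above split the names into those found in both sources and those found in only one. The sketch below shows the same split on two small made-up sets (the set contents and the function name `partition` are assumptions for illustration), and it makes one limitation visible: exact matching treats 'psychology' and 'psychology/psychiatry' as unrelated names.

```python
# Hypothetical toy sets standing in for the two real ones.
biglan_like = {'physics', 'psychology', 'education'}
consensus_like = {'physics', 'psychology/psychiatry', 'brain research'}


def partition(set_a, set_b):
    """Split the names into shared ones and ones unique to each set."""
    return {
        'both': set_a & set_b,
        'only_a': set_a - set_b,
        'only_b': set_b - set_a,
    }


parts = partition(biglan_like, consensus_like)
for key in ('both', 'only_a', 'only_b'):
    print(key, sorted(parts[key]))
```

Only 'physics' lands in the shared part here, which is why a looser notion of similarity is needed on top of plain set arithmetic.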
In [77]:
def compare_two_sets_of_disciplines(set1, set2):
    """Compare two sets of discipline names.

    Args:
        set1 (set): discipline names from the first source.
        set2 (set): discipline names from the second source.
    Vars:
        discipline1 (str): a name taken from set1.
        discipline2 (str): a name taken from set2.
    Returns:
        answer (dict): has three keys: same, similar and different.
            'same' holds names found in both sets, 'similar' holds pairs
            where one name contains the other; 'different' is not filled yet.
    """
    answer = {'same': set(),
              'similar': set(),
              'different': set()}
    for discipline1 in set1:
        for discipline2 in set2:
            if discipline1 == discipline2:
                answer['same'].add(discipline1)
            elif (discipline1 in discipline2) or (discipline2 in discipline1):
                answer['similar'].add((discipline1, discipline2))
    return answer
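The "similar" test in the function above is a substring check, so a short name can match inside an unrelated longer one. The sketch below contrasts it with a word-based check. Both helper functions and all example names are assumptions made for illustration, not part of the notebook's data.

```python
import re


def tokens(name):
    """Lower-case words of a name; '/' and ',' act as separators."""
    return set(re.findall(r'[a-z]+', name.lower()))


def similar_by_substring(name1, name2):
    """The rule used above: one name is contained in the other."""
    return name1 != name2 and (name1 in name2 or name2 in name1)


def similar_by_tokens(name1, name2):
    """Alternative rule: the two names share at least one whole word."""
    return name1 != name2 and bool(tokens(name1) & tokens(name2))


pairs = [
    ('art', 'earth sciences'),
    ('psychology', 'psychology/psychiatry'),
    ('social psychology', 'psychology/psychiatry'),
]
for name1, name2 in pairs:
    print(name1, '|', name2,
          similar_by_substring(name1, name2),
          similar_by_tokens(name1, name2))
```

The word-based rule is not a full answer either: generic words such as "sciences" would link 'earth sciences' to 'social sciences', so a list of words to ignore would probably be needed.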
In [78]:
# Change to permutations
compared = []
compared.append(compare_two_sets_of_disciplines(unique_disciplines_in_biglan, unique_disciplines_in_consensus))
In [79]:
import networkx as nx
g = nx.Graph()
g.add_nodes_from(compared[0]['same'])
g.add_edges_from(compared[0]['similar'])
%matplotlib inline
nx.draw(g)
Now we have two ways of extending the code. First, we can see what kind of
Now we will merge the two graphs, that is, the two categorizations. They come from very different schools.
In [32]:
plt.figure(figsize=(10,10))
first_composition = nx.compose(g, g_biglan)
nx.draw(first_composition)
In [33]:
g_composed = nx.Graph()
new_nbunch = []
new_ebunch = []
set1 = [node.rstrip(' ') for node, data in g.nodes(data=True)]
set2 = [node for node, data in g_biglan.nodes(data=True)]
#all_nodes = g_composed.nodes()
def find_similar(set1, set2):
    # Link names from the two graphs when one name contains the other.
    answer = []
    for node in set1:
        for node1 in set2:
            if ((node in node1) or (node1 in node)) and (node != node1):
                answer.append((node, node1, {'weight': 100}))
    return answer
temporary_result = find_similar(set1, set2)
first_composition.add_edges_from(temporary_result)
plt.figure(figsize=(10,10))
nx.draw(first_composition)
Next, we add a list of humanities and social sciences.
In [17]:
from disciplines.theory import Zhang
zhang = Zhang.columns
This is a list of dictionaries, each with a string as key and a list of disciplines as value. Problems: first, each key is a set of distinct disciplines. Second, some disciplines contained in a list are actually
In [18]:
import networkx as nx
g= nx.Graph()
little_bunch = []
for key in zhang:
    for key1 in key.keys():
        for discipline in key.values():
            for subdiscipline in discipline:
                little_bunch.append((subdiscipline, key1, {'source':'Zhang'}))
        #print key1.split(',') # if we will want to split
g.add_edges_from(little_bunch)
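The structure described above, and the optional splitting of keys hinted at in the commented-out line, can be sketched on made-up data. The sample content and the function names `flatten` and `split_heading` are assumptions for illustration; only the shape (a list of dictionaries with a string key and a list of disciplines as value) follows the description of `Zhang.columns`.

```python
# Hypothetical sample in the shape described above.
sample_columns = [
    {'Arts, humanities': ['History', 'Philosophy']},
    {'Social sciences': ['Sociology', 'Economics']},
]


def flatten(columns, source):
    """Turn the list of dictionaries into (discipline, heading, data) edges."""
    edges = []
    for column in columns:
        # items() keeps every heading paired with its own list only.
        for heading, disciplines in column.items():
            for discipline in disciplines:
                edges.append((discipline, heading, {'source': source}))
    return edges


def split_heading(heading):
    """Split a comma-separated heading into its separate names."""
    return [part.strip() for part in heading.split(',') if part.strip()]


for edge in flatten(sample_columns, 'Zhang'):
    print(edge)
print(split_heading('Arts, humanities'))
```

Iterating with `items()` matters only if a dictionary holds more than one key: pairing `keys()` with `values()` in nested loops would then attach every discipline to every heading of that dictionary.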
In [47]:
def create_relations_based_on_similar_named(graph):
    """Takes a nx.Graph object and creates relations based on similar names.
    Notes:
        This is complicated because some names are separated with "," and similar
        separators.
        Another way to do a similar thing is to retrieve keywords such as engineering,
        humanities, multidisciplinary, psychology, etc."""
    answer = []
    count = 0
    # Iterate over the graph passed in as the argument.
    for node in graph.nodes():
        for node1 in graph.nodes():
            if ',' in node1:
                for separated_node in node1.split(','):
                    if (separated_node in node) and (node1, node) not in answer:
                        answer.append((node1, node))
                        count += 1
    return answer
ebunch = create_relations_based_on_similar_named(g)
In [53]:
[len(x) for x in ebunch]
Out[53]:
In [57]:
import networkx as nx
g_zhang_2 = nx.Graph()
g_zhang_2.add_edges_from(ebunch)
In [65]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(20,20))
nx.draw(g_zhang_2)
In [66]:
compared.append(compare_two_sets_of_disciplines(unique_disciplines_in_consensus, wos))
compared.append(compare_two_sets_of_disciplines(unique_disciplines_in_biglan, wos))