Emergence

This notebook shows how to use disciplines to analyse a particular theory. Here we look at the theory of emergence.

First, let's import the theory.


In [1]:
from disciplines.theory import emergence

We can get basic information about the theory from its docstring.


In [2]:
print emergence.__doc__


 Emergence of discipline 

This module is an implementation of theoretical work by Ziman.
It checks a stage of development and contains multiple functions 
designed for looking for evidence of stage of emergence.

Vars:
    author
    concepts
    data
    claim
    theory
    approach

The docstring above is written manually; in future editions it will be generated from the data associated with the module.

Authors

Now let's see which authors are associated with the theory.


In [3]:
emergence.authors


Out[3]:
['Ziman']

Both author and authors work in the same way.

Concepts

You can get the list of concepts used within the theory like this. For the moment, the list of concepts was extracted manually by reading the theory itself. In future versions of disciplines, however, we will try to automate the extraction of the main concepts from the text, most likely with NLP algorithms; I have seen some candidates but cannot name them yet. So, what concepts are used in the emergence theory? (For now we include only the concepts used by Ziman.)


In [4]:
emergence.concepts


Out[4]:
['nodal point',
 'organize little research conferences',
 'hierarchy of authority',
 'association develops into a learned society',
 'newsletter becomes reputable primary journal',
 'primary journal']

The concepts listed here should also be defined. Two types of definitions should be available:

a) definitions given by the author, and
b) definitions based on general consensus use.

In both cases we can look into term extraction.

Data

In the theory of the emergence of disciplines, as in any other theory, we have some idea of what kind of data we need. The theory's data requirements were extracted from the text itself. The data types can be accessed like this:


In [6]:
emergence.data


Out[6]:
['citation network', 'journal', 'newsletter', 'conference participants']

Based on this list, scripts can work out what kind of data will have to be prepared from the database and other locations.

In this case we might need a network of citations and lists of journals, newsletters, and conferences.

Still, this list is problematic because it does not fully specify what we will need. For example, it does not say that we will need a list of conferences, but this can be deduced from the fact that we will need a list of conference participants.

Also, we do not know why we need a list of journals.

Theory

The description of a particular theory can be retrieved like this.


In [5]:
print emergence.theory


    1. Emerging specialty is only observable as a nodal point in the network of 
    citations. 
    2. Scientists whose research is associated with this co-citation cluster 
    organize little research conferences to discuss their common interests, or are 
    commissioned to write articles for a special issues of a primary journal 
    drawing attention to progress in this particular problem area. 
    3. An ‘invisible college’ begins to condense out, in the form, say, of a semi-
    official association held together by further conferences, the regular 
    exchange of pre-prints and re-prints and the publication of an informal 
    ‘newsletter’. 
    4. The association develops into a regular learned society, whose newsletter 
    has become a reputable primary journal. 
    5. A hierarch of authority is soon set up to preside over conferences, edit 
    journals, allocate resources, and confer recognition on the members of the new
    discipline.
    

Currently this is only the text that was used to formulate the claims. That is going to change.

In the future, the theory will be associated with many related texts and particular extracts from them. For now, we have only this extract, as it is dense enough to continue with.

Before we continue, let's look at the description of our approach.


In [6]:
print emergence.approach


    The object of analysis is easy to specify when it comes to first point. It is 
    some citations. In second point we have to identify a) scientists associated 
    with citation cluster, b) conferences. Or and to look for special issues of 
    primary journals. It is hard to identify third point as we would need to 
    find "invisible college", exchange of pre-prints nadd reprits. But, it is 
    possible to find some newsletters. The fourth step is straightforward, it is 
    rather easy to identify learned society when it already exists, the same about 
    the journal. The fifth. Who presides over conferences, who edit journal and 
    who allocate resources and confer recognition is easy to get. Most of this 
    information can be get online.
    Several strategies are available to do such things. It depends what we want to 
    do. The later stage of development the easier is to get data. However, even 
    when it comes to finding NPNOC it can be accomplished.
    Other thing to consider is the starting point of research. Whether to start 
    from specific disciplines or just run over all. The first type is easier to 
    do, but then we would come to a problem of integrating the data. The second 
    approach is harder to do but is better one. For know, I think the best way to 
    go is to have small case studies, built on earlier research, and follow by 
    constructing the overarching algorithms.
    Functionality: identification of turning points of development of a discipline 
    by identifying NPNOC, people, conferences and etc date is the prime focus.
    

This part is written entirely by humans. It glues together the hard-to-relate aspects of developing this theory. At this point I can hardly imagine this being done automatically.

However, we plan to bring this to a stage where the text written here can be interpreted with NLP and the necessary actions can be proposed. We still have a long way to go before that. Or maybe not that long.

Every theory is made of small details. Every sentence or paragraph can be implemented as a small piece of code. All of them should be stored as lists, dictionaries, functions and objects. For now, let's look at the functions that are already available.


In [7]:
import inspect
all_functions = inspect.getmembers(emergence, inspect.isfunction)
[x[0] for x in all_functions]


Out[7]:
['detect_emergences',
 'detect_hierarchy_of_authority',
 'detect_regular_learned_society',
 'get_conferences_organized_by_cluster',
 'get_invisible_college',
 'get_regular_exchange_of_pre_prints',
 'get_special_issues_of_a_primary_journal',
 'is_emergence_state',
 'is_stage',
 'observe_nodal_points',
 'recreate_emerge',
 'reputable_primary_journals',
 'what_emergence_state',
 'who_confers_recognition',
 'who_edits_journals',
 'who_presides_over_conferences']
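
The function names above line up fairly naturally with the five points of the theory text. The grouping below is only one possible reading, made up for illustration; it is not taken from the emergence module, and STAGE_FUNCTIONS and highest_stage are hypothetical names. It sketches how each point of the theory could be stored as a dictionary entry and combined into a simple stage check.

```python
# Hypothetical mapping of the five points of the theory to the function
# names listed above. The grouping is an assumption, not part of the module.
STAGE_FUNCTIONS = {
    1: ['observe_nodal_points'],
    2: ['get_conferences_organized_by_cluster',
        'get_special_issues_of_a_primary_journal'],
    3: ['get_invisible_college', 'get_regular_exchange_of_pre_prints'],
    4: ['detect_regular_learned_society', 'reputable_primary_journals'],
    5: ['detect_hierarchy_of_authority', 'who_presides_over_conferences',
        'who_edits_journals', 'who_confers_recognition'],
}


def highest_stage(evidence):
    """Return the last stage reached without a gap, or 0 if stage 1 is missing.

    `evidence` maps a function name to True when that check found something.
    A stage counts as reached when at least one of its checks succeeded.
    """
    reached = 0
    for stage in sorted(STAGE_FUNCTIONS):
        if any(evidence.get(name, False) for name in STAGE_FUNCTIONS[stage]):
            reached = stage
        else:
            break  # stop at the first stage without evidence
    return reached


example_evidence = {
    'observe_nodal_points': True,
    'get_conferences_organized_by_cluster': True,
    'detect_regular_learned_society': True,  # ignored: stage 3 has no evidence
}
print(highest_stage(example_evidence))
```

Requiring the stages to be reached in order is a design choice of this sketch; a real check might instead report every stage for which evidence exists.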

To work

Let's get back to the data types that are required to test the theory.


In [8]:
emergence.data


Out[8]:
['citation network', 'journal', 'newsletter', 'conference participants']

Now let's check whether our current database has any of the required data.


In [9]:
from data import availability
availability('citation network')  # return all datasets that fulfill the criteria of a citation network.
availability('journal')  # return all datasets that fulfill the criteria of a journal.
# Problem: a more specific set is required, not a journal, but special editions, in which a particular set of researchers are going
availability('newsletter')  # check whether a historic account of newsletters is available.
availability('conference participants')  # check whether a historic account of conference participants is available.


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-9-f22606a24aa7> in <module>()
----> 1 from data import availability
      2 availability('citation network')  # return all datasets that fulfill the criteria of a citation network.
      3 availability('journal')  # return all datasets that fulfill the criteria of a journal.
      4 # Problem: a more specific set is required, not a journal, but special editions, in which a particular set of researchers are going
      5 availability('newsletter')  # check whether a historic account of newsletters is available.

ImportError: No module named data

We have found that only the citation network and the hierarchy of a journal exist. Therefore, we will continue the research based on the available information.
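
Since the data module could not be imported here, the following is only a stand-in sketch of what an availability check might look like. It assumes a plain dictionary catalogue that maps each data type to the datasets providing it; the catalogue contents and the function body are hypothetical and are not the real data.availability.

```python
# Hypothetical catalogue: data type -> names of datasets that provide it.
# The entries are assumptions chosen to match the statement above that only
# the citation network and journal information are available.
CATALOGUE = {
    'citation network': ['DBLP_Citation_2014_May'],
    'journal': ['DBLP_Citation_2014_May'],
    'newsletter': [],
    'conference participants': [],
}


def availability(data_type, catalogue=CATALOGUE):
    """Return the datasets registered for `data_type` (empty list if none)."""
    return list(catalogue.get(data_type, []))


required = ['citation network', 'journal', 'newsletter',
            'conference participants']
# Data types the theory asks for but the catalogue cannot supply.
missing = [name for name in required if not availability(name)]
print(missing)
```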


In [10]:
url = r'd:\Desktop\DBLP_Citation_2014_May\publications.txt'  # raw string keeps the backslashes literal

In [11]:
# Group the file into records, assuming every record takes exactly 8 lines:
# 7 field lines followed by 1 separator line.
with open(url) as infile:
    mylist = []
    thelist = []
    for count, line in enumerate(infile, 1):
        if count % 8:
            # lines 1-7 of the block belong to the current record
            mylist.append(line)
        else:
            # line 8 is the separator: store the record and start a new one
            thelist.append(mylist)
            mylist = []

In [12]:
len(thelist)


Out[12]:
2599712

In [14]:
thelist[2]


Out[14]:
['#*Overview of the ADDS System.\n',
 '#@Yuri Breitbart,Tom C. Reyes\n',
 '#t1995\n',
 '#cModern Database Systems\n',
 '#index4\n',
 '#% \n',
 '#!\n']
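
The record above is still a list of raw lines. A minimal sketch of turning such a record into a dictionary follows; the prefixes are the ones visible in the output above, while the field names and the parse_record function are illustrative assumptions rather than part of disciplines.

```python
# Prefixes as seen in the record shown above; the field names are illustrative.
# '#index' is listed first so it is matched before the two-character prefixes.
PREFIXES = [
    ('#index', 'index_id'),
    ('#*', 'paper_title'),
    ('#@', 'authors'),
    ('#t', 'year'),
    ('#c', 'publication_venue'),
    ('#%', 'references'),
    ('#!', 'abstract'),
]


def parse_record(record):
    """Turn one record (a list of raw lines) into a dictionary of fields."""
    paper = {'authors': [], 'references': []}
    for raw in record:
        line = raw.strip()
        for prefix, field in PREFIXES:
            if not line.startswith(prefix):
                continue
            value = line[len(prefix):].strip()
            if field == 'authors':
                paper[field] = [a.strip() for a in value.split(',') if a.strip()]
            elif field == 'references':
                if value:  # an empty '#%' line means no reference
                    paper[field].append(value)
            else:
                paper[field] = value
            break
    return paper


record = ['#*Overview of the ADDS System.\n',
          '#@Yuri Breitbart,Tom C. Reyes\n',
          '#t1995\n',
          '#cModern Database Systems\n',
          '#index4\n',
          '#% \n',
          '#!\n']
paper = parse_record(record)
print(paper['authors'])
```

Collecting references in a list allows for records that carry more than one '#%' line, which is an assumption about the file format that a fixed number of lines per record would not cover.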

In [40]:
import os

url = r'd:\Desktop\DBLP_Citation_2014_May\domains'

def get_citations(path):
    # Read one domain file and split it into records (lists of lines).
    print('reading {}'.format(path))
    with open(path) as infile:
        mylist = []
        thelist = []
        for line in infile:
            mylist.append(line)
            # A line containing two spaces and a newline closes the record.
            if '  \n' in line:
                mylist.pop()  # drop the separator line itself
                thelist.append(mylist)
                mylist = []
    return thelist

list_of_lists = []
for x in os.listdir(url):
    link = os.path.join(url, x)
    citations = get_citations(link)
    list_of_lists.append(citations)

len(list_of_lists)


reading d:\Desktop\DBLP_Citation_2014_May\domains\Artificial intelligence.txt
reading d:\Desktop\DBLP_Citation_2014_May\domains\Compter graphics_Multimedia.txt
reading d:\Desktop\DBLP_Citation_2014_May\domains\Computer networks.txt
reading d:\Desktop\DBLP_Citation_2014_May\domains\Database_Data mining_Information retrieval.txt
reading d:\Desktop\DBLP_Citation_2014_May\domains\High-Performance Computing.txt
reading d:\Desktop\DBLP_Citation_2014_May\domains\Human computer interaction_Ubiquitous computing.txt
reading d:\Desktop\DBLP_Citation_2014_May\domains\Information security.txt
reading d:\Desktop\DBLP_Citation_2014_May\domains\Interdisciplinary Studies.txt
reading d:\Desktop\DBLP_Citation_2014_May\domains\Software engineering.txt
reading d:\Desktop\DBLP_Citation_2014_May\domains\Theoretical computer science.txt
Out[40]:
10

In [24]:
references = []
# Assumption: `line` runs over the lines of one record, such as thelist[2].
for line in thelist[2]:
    if line.startswith('#*'):
        paperTitle = line
    elif line.startswith('#@'):
        Authors = line
    elif line.startswith('#t'):
        Year = line
    elif line.startswith('#c'):
        publication_venue = line
    elif line.startswith('#index'):
        index_id = line
    elif line.startswith('#%'):
        references.append(line)
    elif line.startswith('#!'):
        abstract = line

In [1]:
emergence.what_emergence_state('social studies of science')


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-90ee3c756d44> in <module>()
----> 1 emergence.what_emergence_state('social studies of science')

NameError: name 'emergence' is not defined

In [ ]:
emergence.is_emergence_state(1, 'social studies of science')

In [ ]:
network_of_citations = 'some graph'

In [ ]:
emergence.observe_nodal_points(network_of_citations, 3)
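
The body of observe_nodal_points is not shown in this notebook. What follows is a minimal sketch of one way to look for candidate nodal points, assuming the citation network is given as (citing, cited) pairs and that a nodal point is simply a paper cited at least threshold times. Both assumptions and the function name are hypothetical. The theory text speaks of co-citation clusters, which this plain citation count does not capture.

```python
from collections import Counter


def nodal_points_by_in_degree(edges, threshold):
    """Return papers cited at least `threshold` times, most cited first.

    `edges` is an iterable of (citing_id, cited_id) pairs.
    """
    in_degree = Counter(cited for _citing, cited in edges)
    hubs = [(paper, n) for paper, n in in_degree.items() if n >= threshold]
    # Sort by citation count (descending), then by id for a stable order.
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


# Toy network: p1 is cited three times, p2 twice, p3 once.
toy_edges = [
    ('p2', 'p1'), ('p3', 'p1'), ('p4', 'p1'),
    ('p4', 'p2'), ('p5', 'p2'),
    ('p5', 'p3'),
]
print(nodal_points_by_in_degree(toy_edges, 3))
```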

In [ ]:
person_list = ['John Peter', 'Pete Johner']

In [ ]:
emergence.get_conferences_organized_by_cluster(person_list)

In [ ]:
emergence.recreate_emerge('sociology')

In [ ]:
emergence.detect_emergences()

In [ ]:
emergence.get_special_issues_of_a_primary_journal('Science and Society Studies')  # what is a primary journal