Linguistic change over the course of membership in an online support group

Daniel McDonald

SUMMARY: This IPython Notebook demonstrates the methodology I used to extract findings from a corpus of posts to a bipolar disorder online support group. The tools and methods here can easily be applied to corpus linguistic and digital humanities research, and can also be used as an introduction to Python or natural language processing.

Setup

If you haven't already done so, the first things we need to do are install corpkit, download data for NLTK's tokeniser, and unzip our corpus.


In [ ]:
# install corpkit with either pip or easy_install
try:
    import pip
    pip.main(['install', 'corpkit'])
except ImportError:
    import easy_install
    easy_install.main(["-U","corpkit"])

In [ ]:
# download nltk tokeniser data
import nltk
nltk.download('punkt')

In [ ]:
# unzip and untar our data
! gzip -dc data/bipolar.tar.gz | tar -xf - -C data

In [207]:
print 'hello everyone'


hello everyone

Great! Now we have everything we need to start.

Quickstart

Let's first perform a quickstart, just so we know what we're getting into.


In [5]:
# plot our figures in this window
%matplotlib inline
# import everything we need
import corpkit
from corpkit import interroplot
# path to data
path = 'data/postcounts'
# words to find
query = ['meds', 'diagnosed', 'support', 'bipolar', 'pdoc', 'board']
# search and plot
interroplot(path, query)


 15:49:19: Finished! 6 unique results, 69098 total.

On the x-axis are groups of posts. Group 1 comprises all first posts. Group 2 contains second and third posts. This progresses until group 10, which stores 560th posts and above. We'll go into this in more depth later.

Here, the y-axis measures each result as a percentage of all results in that group of posts.

We can see here how certain lexical items are used more often by new members than by veterans, and vice versa. Interestingly, bipolar itself drops in relative frequency quite a lot! By the end of this Notebook, we should have some reasons why this is so.

Orientation

First, let's import the functions we'll be using to investigate the corpus. These functions were designed for this investigation, but with more general use in mind, so you can likely use them on your own corpora.

Function name Purpose
interrogator() interrogate parsed corpora
editor() edit interrogator() results
plotter() visualise interrogator() results
quickview() view interrogator() results
multiquery() run a list of interrogator() queries
conc() complex concordancing of subcorpora
keywords() get keywords and ngrams from conc() output, subcorpora
collocates() get collocates from conc() output, subcorpora
quicktree() visually represent a parse tree
searchtree() search a parse tree with a Tregex query
save_result() save a result to disk
load_result() load a saved result
load_all_results() load every saved result into a dict

In [206]:
%matplotlib inline
import corpkit
from corpkit import *
import pandas as pd
from IPython.display import HTML
# r = load_all_results()

def show(lines, index, show = 'thread'):
    # display the forum thread linked from a concordance line in an iframe
    url = lines.ix[index]['link'].replace('<a href=', '').replace('>link</a>', '')
    return HTML('<iframe src=%s width=1000 height=500></iframe>' % url)



The first thing we need to do is set a path to our corpus. If you have different data, you can change this.


In [2]:
corpus = 'data/postcounts' # path to corpora

Let's also quickly set some options for displaying raw data:


In [3]:
pd.set_option('display.max_rows', 20)
pd.set_option('display.max_columns', 10)
pd.set_option('max_colwidth',70)
pd.set_option('display.width', 1000)
pd.set_option('expand_frame_repr', False)

The data

The data comprises every post made to a 10-year-old online support group for bipolar disorder. Most members have bipolar disorder themselves, but friends and family of people living with the condition also post.

Like many forums, this one is characterised by a high dropout rate, with most users having a total postcount under five.


In [195]:
from data.tallies import postcounter
num_posts = postcounter("data/postcounts")


14:44:09: Working ... 

14:46:30: Done!

In [199]:
plotter('Number of users by number of posts', num_posts, logx = True, x_label = "User's total post count", 
        y_label = 'Number of users', legend = False, kind = 'area', figsize = (16, 7))


To create the corpus, each page of the forum was downloaded and stored. Posts were extracted using Beautiful Soup and grouped into ten subcorpora representing different stages of membership. The first subcorpus contains all first posts. The second contains all 2nd and 3rd posts, and so forth. The final subcorpus contains all 560th posts and above. Each subcorpus is approximately equal in size.

Post group   Post range      # texts   # users   Wordcount
1            1st post        5818      5818      1135520
2            2-3 posts       5689      3348      832198
3            4-7 posts       5607      1777      827893
4            8-15 posts      5937      1004      853123
5            16-30 posts     5790      529       828538
6            31-58 posts     5875      284       831341
7            59-115 posts    5848      148       803965
8            116-219 posts   5757      76        870124
9            220-559 posts   5789      38        755476
10           560+ posts      5570      8         737896
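
To give a concrete sense of this grouping, here is a minimal sketch (not the original preprocessing code) that maps a user's nth post to one of the ten post groups, using the post ranges in the table above:

In [ ]:
# a minimal sketch (not the original preprocessing code): map a user's nth post
# to one of the ten post-group subcorpora, using the ranges in the table above
ranges = [(1, 1), (2, 3), (4, 7), (8, 15), (16, 30),
          (31, 58), (59, 115), (116, 219), (220, 559), (560, float('inf'))]

def post_group(n):
    # return the post group (1-10) that a user's nth post belongs to
    for group, (low, high) in enumerate(ranges, start = 1):
        if low <= n <= high:
            return group

print post_group(1), post_group(12), post_group(600)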

The texts have been parsed for part of speech and grammatical structure by Stanford CoreNLP, and in this notebook we work only with the parsed versions of the texts. It's definitely worthwhile to learn the Tregex syntax; if you'd rather not, a series of Tregex queries that you can copy and paste in is provided at the end of this notebook.

The interrogator() and conc() functions rely on Tregex to interrogate the corpora. Tregex allows very complex searching of parsed trees, in combination with Java Regular Expressions.
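
If you haven't seen Tregex before, here are a few illustrative patterns to give a feel for the syntax (a sketch for orientation only; these particular patterns aren't used in the analysis below):

In [ ]:
# some illustrative Tregex patterns, for orientation only:
#   A < B    A immediately dominates B
#   A << B   A dominates B at any depth
#   A <<# B  B is the head word of A
#   __       matches any node; ! negates a relation
example_queries = [r'NP <<# /(?i)^lithium$/',      # noun phrases headed by 'lithium'
                   r'VP < (VB < /(?i)^take$/)',    # verb phrases whose verb is 'take'
                   r'/[A-Za-z0-9]/ !< __']         # word tokens containing letters or digits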

Interrogating the corpus

So, let's generate some general information about this corpus. First, let's define a query to find every word in the corpus. Run the cell below to define the allwords_query variable as the Tregex query to its right.

When writing Tregex queries or Regular Expressions, remember to always use r'...' quotes!


In [28]:
# any token containing letters or numbers (i.e. no punctuation):
allwords_query = r'/[A-Za-z0-9]/ !< __'

Corpus interrogation is handled by interrogator(). Its most important arguments are:

  1. path to corpus
  2. search options
  3. a search query

There are many kinds of search options available:

Option Function
b get tag and word of Tregex match
c count Tregex matches
d get dependent of regular expression match and the relationship
f get dependency function of regular expression match
g get governor of regular expression match and the relationship
i get dependency index of regular expression match
k find keywords
n find n-grams
p get part-of-speech tag with Tregex
r regular expression, for plaintext corpora
s simple search string or list of strings for plaintext corpora
w get word(s) returned by Tregex/keywords/ngrams

Right now, we only need to count tokens, so we can use the c option. The cell below will run interrogator() over each subcorpus and count the number of matches for the query.


In [69]:
allwords = interrogator(corpus, 'c', allwords_query, quicksave = 'allwords')


 21:36:33: Finished! 8476074 total occurrences.

When the interrogation has finished, we can view our results:


In [29]:
# from the allwords results, print the totals
allwords.totals


Out[29]:
1     1135520
2      832198
3      827893
4      853123
5      828538
6      831341
7      803965
8      870124
9      755476
10     737896
Name: Total, dtype: int64

If you want to see the query and options that created the results, you can use:


In [30]:
allwords.query


Out[30]:
{'datatype': dtype('int64'),
 'dep_type': 'basic-dependencies',
 'dictionary': 'bnc.p',
 'function_filter': False,
 'lemmatag': False,
 'lemmatise': False,
 'options': 'count',
 'path': 'data/postcounts',
 'phrases': False,
 'plaintext': False,
 'query': '/.?[A-Za-z0-9].?/ !< __',
 'quicksave': False,
 'spelling': False,
 'table_size': 50,
 'time_ended': '2015-06-11 16:53:31',
 'time_started': '2015-06-11 16:49:18',
 'titlefilter': False,
 'translated_options': 'C'}

Plotting results

Lists of post groups and totals are pretty dry. Luckily, we can use the plotter() function to visualise our results, and the table function to table them. At minimum, plotter() needs two arguments:

  1. a title (in quotation marks)
  2. a list of results to plot

In [4]:
plotter('Word counts in each subcorpus', allwords.totals, figsize = (16, 7), kind = 'area')


Great! So, we can see that the number of words per postgroup actually varies quite a lot. That's worth keeping in mind.

Pilot study: medication names

Let's get to know the functions available to us by performing a pilot study of the use of medication names in the community. If we simply provide a list of words, corpkit will automatically create a Tregex query that matches any word in the list:


In [33]:
# our query:
med_words_query = ['lithium', 'lexapro', 'prozac', 'ssri', 'quetiapine', 'seroquel',
                   'depakote', 'risperdol', 'risperidone', 'depakene', 'lamictal', 
                   'klonopin', 'abilify', 'geodon', 'topomax', 'zyprexa', 'tegretol', 
                   'carbamazepine']
med_words = interrogator(corpus, 'words', med_words_query, quicksave = 'med_words')


 16:55:08: Finished! 18 unique results, 30181 total.


17:00:28: Data saved: data/saved_interrogations/med_words.p

Even when we do not use the count option, we can access the total number of matches as before:


In [6]:
plotter('Medication words', med_words.totals, figsize = (16, 7))


At the moment, it's hard to tell whether these counts simply reflect the different sizes of our post-group samples. To account for this, we can calculate the percentage of parsed words that are medication words. This means combining the two interrogations we have already performed.

We can do this by using editor():


In [186]:
rel_medwords = editor(med_words.results, '%', allwords.totals)
plotter('Relative frequency of medication words', rel_medwords.totals, num_to_plot = 'all')


***Processing results***
========================

***Done!***
========================


In [189]:
plotter('Relative frequency of medication words', rel_medwords.results, legend_pos = 'o r',
        style = 'bmh', figsize = (14, 6))


That's more helpful. We can now see that first posts contain the highest relative frequency of medication words.

Perhaps we're interested not only in the relative frequency of medication words, but also in the frequency with which the different medications are mentioned. We actually already collected this data during our last interrogator() query.

We can print just the top five results in a number of different ways:


In [81]:
# 1. pandas syntax
rel_medwords.results.iloc[:,0:5]


Out[81]:
lamictal lithium seroquel depakote abilify
01 19.530393 20.298442 12.639895 10.840465 7.614659
02 18.170619 21.127194 13.242994 11.733908 7.668617
03 19.565978 19.950998 12.565628 10.465523 8.155408
04 19.288512 20.104439 12.271540 12.695822 7.571802
05 23.552426 21.478873 12.676056 9.976526 7.981221
06 23.876901 21.577644 10.753449 11.071808 8.348072
07 22.459350 23.069106 12.940379 7.554201 9.891599
08 20.844687 23.228883 12.806540 8.208447 6.539510
09 27.334669 24.128257 12.104208 6.492986 6.372745
10 27.360595 14.684015 14.089219 8.810409 9.665428

In [82]:
# 2. pandas syntax
rel_medwords.results[rel_medwords.results.columns[:5]]


Out[82]:
lamictal lithium seroquel depakote abilify
01 19.530393 20.298442 12.639895 10.840465 7.614659
02 18.170619 21.127194 13.242994 11.733908 7.668617
03 19.565978 19.950998 12.565628 10.465523 8.155408
04 19.288512 20.104439 12.271540 12.695822 7.571802
05 23.552426 21.478873 12.676056 9.976526 7.981221
06 23.876901 21.577644 10.753449 11.071808 8.348072
07 22.459350 23.069106 12.940379 7.554201 9.891599
08 20.844687 23.228883 12.806540 8.208447 6.539510
09 27.334669 24.128257 12.104208 6.492986 6.372745
10 27.360595 14.684015 14.089219 8.810409 9.665428

In [83]:
# 3. loop plus pandas syntax
for col in rel_medwords.results.columns[:5]:
    print rel_medwords.results[col]


01    19.530393
02    18.170619
03    19.565978
04    19.288512
05    23.552426
06    23.876901
07    22.459350
08    20.844687
09    27.334669
10    27.360595
Name: lamictal, dtype: float64
01    20.298442
02    21.127194
03    19.950998
04    20.104439
05    21.478873
06    21.577644
07    23.069106
08    23.228883
09    24.128257
10    14.684015
Name: lithium, dtype: float64
01    12.639895
02    13.242994
03    12.565628
04    12.271540
05    12.676056
06    10.753449
07    12.940379
08    12.806540
09    12.104208
10    14.089219
Name: seroquel, dtype: float64
01    10.840465
02    11.733908
03    10.465523
04    12.695822
05     9.976526
06    11.071808
07     7.554201
08     8.208447
09     6.492986
10     8.810409
Name: depakote, dtype: float64
01    7.614659
02    7.668617
03    8.155408
04    7.571802
05    7.981221
06    8.348072
07    9.891599
08    6.539510
09    6.372745
10    9.665428
Name: abilify, dtype: float64

In [85]:
# 4. a corpkit function
quickview(rel_medwords, 5)


  0: lamictal
  1: lithium
  2: seroquel
  3: depakote
  4: abilify

With all of this data, we can now do some serious plotting.


In [191]:
perc_medwords = editor(med_words.results, '%', allwords.totals)

plotter('Medication word / all medication words', rel_medwords.results, figsize = (12, 6))

# selecting a slice using some pandas syntax
plotter('Medication word / all words', perc_medwords.results.iloc[0:5,5:10], figsize = (12, 6))


***Processing results***
========================

***Done!***
========================

Customising visualisations

We can use plotter() arguments to customise what our chart shows.

plotter()'s possible arguments are:

plotter() argument Mandatory/default? Use Type
title mandatory A title for your plot string
results mandatory the results you want to plot interrogator() or editor() output
num_to_plot 7 Number of top entries to show int
x_label False custom label for the x-axis str
y_label False custom label for the y-axis str
figsize (13, 6) set the size of the figure tuple: (length, width)
tex 'try' use TeX to generate image text boolean
style 'ggplot' use Matplotlib styles str: 'dark_background', 'bmh', 'grayscale', 'ggplot', 'fivethirtyeight'
legend_pos 'default' legend position str: 'outside right' to move legend outside chart
show_totals False Print totals on legend or plot where possible str: 'legend', 'plot', 'both', or 'False'
save False Save to file True: save as title.png. str: save as str
colours 'Paired' plot colours str: any of Matplotlib's colormaps
cumulative False plot entries cumulatively bool
**kwargs False pass other options to Pandas plot/Matplotlib rot = 45, subplots = True, fontsize = 16, etc.

You can easily use these to get different kinds of output. Try changing some parameters below:


In [ ]:
plotter('Relative frequencies of medication words', rel_medwords.results, y_label = 'Percentage of all medication words', num_to_plot = 5, style = 'fivethirtyeight', legend_pos = 'lower left')
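
The other arguments in the table work in the same way. For example, here's a quick sketch that plots the top three entries cumulatively, shows totals in the legend, and saves the figure, using only arguments documented above:

In [ ]:
# a quick sketch using a few more of the documented plotter() arguments
plotter('Medication words, cumulative', rel_medwords.results, num_to_plot = 3,
        show_totals = 'legend', cumulative = True, save = True)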

Another neat thing you can do is save the results of an interrogation, so that they don't have to be generated again the next time you load this notebook:


In [ ]:
# specify what to save, and a name for the file.
save_result(allwords, 'allwords')

You can then load these results:


In [ ]:
fromfile_allwords = load_result('allwords')
fromfile_allwords.totals

You can also load every saved interrogation into a dictionary:


In [ ]:
r = load_all_results()

quickview()

quickview() is a function that quickly shows the n most frequent items in a list. Its arguments are:

  1. an interrogator() result
  2. number of results to show (default = 50)

In [203]:
quickview(med_words, n = 10)


  0: lamictal
  1: lithium
  2: seroquel
  3: depakote
  4: abilify
  5: zyprexa
  6: prozac
  7: lexapro
  8: geodon
  9: klonopin

conc()

For concordancing, there is conc(), which produces concordances of a subcorpus based on Tregex queries. Its arguments are:

  1. A subcorpus to search (remember to put it in quotation marks!)
  2. A Tregex query

In [204]:
lines = conc('data/postcounts/10', ['angry', 'mad', 'upset'], random = True, n = 20)


16:03:51: Getting concordances for data/postcounts/10 ... 
          Query: /(?i)^(angry|mad|upset)$/ !< __

0                      I find myself getting   angry   with him a lot and making fun of him    
1                                   I get so   angry   at times and do n't know how to deal    
2                                She got all   upset   at my saying that ... she started       
3                                         As   upset   as her life is she only gets more of her
4   go ... do your best and please do n't be   upset   if you do n't get your license just yet 
5                     Sometimes I still feel   angry   about the fact that I have bipolar and  
6      of the time it 's just a page full of   angry   words , jumbles , sentences written all 
7   the happy kind , I 'm super agitated and   angry   at everyone and everything              
8                                     He got   upset   when they told him that in order for him
9   m feeling a little better , but am still   angry   and tearful                             
10      her that he ca n't eat when they get     mad   at one another                          
11       stable , so that you 're not always   upset   or angry or sad , it can and does happen
12   the church specifically because she was   upset   and they knew you 'd be there           
13     Hello Jenn , I hope I do n't make you   angry   by what I 'm going to say , -LRB- that  
14    he did he left and Erin came to me all   upset   saying that she had to go after him     
15    `` Is this really how I want to feel ,   angry   , irritated , depressed at everything   
16   all of a sudden walking in your home so   angry   and irritable all the time , shouting   
17              While I understood her being   upset   with that I needed her to understand my 
18    and how easily she gets frustrated and   upset   I am not really comfortable signing off 
19                                So she got   upset   about that insisting that she had to    

conc() automatically prints concordance lines alongside their index, so that you can easily manipulate them.
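
For example, having noted some interesting indexes, you can pull those lines back out for closer reading. This is a small sketch, assuming pandas-style row selection (as the show() helper defined earlier also assumes):

In [ ]:
# a small sketch: select particular concordance lines by their printed index
# (assumes pandas-style row selection, as used by the show() helper above)
for index in [2, 8, 11]:
    print lines.ix[index]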


In [ ]:
lines = conc('data/postcounts/4', r'/VB.?/ << /(?i).?\bhelp.*?\b/', 
          random = True, window = 50)

Tip: you can concordance any interrogator() query, to make sure the expected things are being matched.

Keywords, ngrams and collocates

There are also functions for keywording, ngramming and collocation. Currently, these work with csv output from conc(). keywords() produces both keywords and ngrams. It relies on code from the Spindle project.


In [ ]:
keys, ngrams = keywords('data/postcounts/01', dictionary = 'bnc.p')
for key in keys[:10]:
    print key
for ngram in ngrams:
    print ngram


With the collocates() function, you can specify the maximum distance at which two tokens will be considered collocates.
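
Here is a hedged sketch of what such a call might look like; the window keyword name is an assumption, so check the corpkit documentation for the exact signature:

In [ ]:
# a hedged sketch: the 'window' keyword name is an assumption here;
# check the corpkit documentation for the exact signature
colls = collocates(lines, window = 5)
for coll in colls[:10]:
    print coll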


Now you're familiar with the corpus and functions. Before beginning the corpus interrogation, let's also learn a bit about Systemic Functional Linguistics---the theory of language that underlies my analytical approach.

Functional linguistics

Functional linguistics is a research area concerned with how realised language (lexis and grammar) works to achieve meaningful social functions. One functional linguistic theory is Systemic Functional Linguistics, developed by Michael Halliday.


In [ ]:
from IPython.display import HTML
HTML('<iframe src=http://en.mobile.wikipedia.org/wiki/Michael_Halliday?useformat=mobile width=700 height=350></iframe>')

Central to the theory is a division between experiential meanings and interpersonal meanings.

  • Experiential meanings communicate what happened to whom, under what circumstances.
  • Interpersonal meanings negotiate identities and role relationships between speakers.

Halliday argues that these two kinds of meaning are realised simultaneously through different parts of English grammar.

  • Experiential meanings are made through transitivity choices.
  • Interpersonal meanings are made through mood choices.

Here's one visualisation of it. We're concerned with the two left-hand columns. Each level is an abstraction of the one below it.



Transitivity choices include fitting together configurations of:

  • Participants (a man, green bikes)
  • Processes (sleep, has always been, is considering)
  • Circumstances (on the weekend, in Australia)

As in the following example:

I            can wear    the same outfit    all weekend
Participant  Process     Participant        Circumstance

Mood features of a language include:

  • Mood types (declarative, interrogative, imperative)
  • Modality (would, can, might)
  • Lexical density---the number of words per clause, the ratio of content to non-content words, etc.

Lexical density is usually a good indicator of the general tone of texts. The language of academia, for example, often has a very high ratio of nouns to verbs. We can approximate an academic tone simply by making nominally dense clauses:

  The consideration of interest is the potential for a participant of a certain demographic to be in Group A or Group B.

Notice that not only are there many nouns (consideration, interest, potential, etc.), but the verbs are very simple (is, to be).

In comparison, informal speech is characterised by smaller clauses, and thus more verbs.

  A: Did you feel like dropping by?
  B: I thought I did, but now I don't think I want to

Here, we have only a few simple nouns (you, I), with more expressive verbs (feel, dropping by, think, want).
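
As a rough illustration, here is a small sketch of one way to approximate lexical density with NLTK, counting nouns, verbs, adjectives and adverbs as content words (you may need to download extra tagger data via nltk.download() first):

In [ ]:
# a rough sketch: approximate lexical density as the proportion of content words
# (nouns, verbs, adjectives, adverbs); nltk.pos_tag may need extra tagger data
import nltk

def lexical_density(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    content = [word for word, tag in tagged if tag[:2] in ('NN', 'VB', 'JJ', 'RB')]
    return float(len(content)) / len(tagged)

print lexical_density('The consideration of interest is the potential for a participant to be in Group A.')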

Like transitivity, mood is realised through configurations of groups and phrases. Within the clause, the mood constituents are the Subject, Finite, Predicator, Complement, and Adjunct.

I          can      wear         the same outfit   all weekend
Subject    Finite   Predicator   Complement        Adjunct

Note that these do not always coincide perfectly with transitivity annotations.

Note: SFL argues that, through grammatical metaphor, one linguistic feature can stand in for another. Would you please shut the door? is an interrogative, but it functions as a command. Invitation is a nominalisation of a process, invite. We don't have time to deal with these kinds of realisations, unfortunately.

A functional analysis of language in the bipolar disorder forum

A discourse analysis that is not based on grammar is not an analysis at all, but simply a running commentary on a text. - M.A.K. Halliday, 1994

Mood features

Sorry, because of substantial revisions to corpkit, this part is still under construction.

Mood types


In [39]:
query = ([u'Declarative', r'S < (NP $++ VP)'],
    [u'Interrogative', r'ROOT !<<, (MD < __ ) ( < /(SBARQ|SINV|SQ)/ | << (/\?/ !< __))'],
    [u'Imperative', r'VP !<1 (__ < /(?i)\b(thank|hello|hey|hi|am|had)\b/) !<1 /(?i)\bto\b/ !<<# /\b(VBG|VBN|VBZ|VBD)\b/ >1 (S !>> VP > (ROOT !<< /\?/))'],
    [u'Modalised interrogative', r'ROOT <<, (MD < __ ) ( < /(SBARQ|SINV|SQ)/ | << (/\?/ !< __))'])
moods = multiquery(corpus, query)


 00:43:25: Finished! 2110 total occurrences.

00:43:25: Finished! 4 unique results, 1072521 total.

In [118]:
# 'clauses' is one of the saved interrogations loaded earlier via r = load_all_results()
clauses = r['clauses']
rel_moods = editor(moods.results, '%', moods.totals)
plotter('Moods', rel_moods.results, figsize = (12, 10), style = 'fivethirtyeight', subplots = True)


***Processing results***
========================

***Done!***
========================

Modalisation


In [35]:
modals_unlemmatised = interrogator(corpus, 'words', 'modals', lemmatise = False)


 23:52:22: Finished! 18 unique results, 159544 total.


In [27]:
modals = interrogator(corpus, 'words', 'modals', lemmatise = True)


 23:17:16: Finished! 8 unique results, 159544 total.


In [36]:
rel_modals_unlemmatised = editor(modals_unlemmatised.results, '%', modals_unlemmatised.totals)
rel_modals2_unlemmatised = editor(modals_unlemmatised.results, '%', allwords.totals)
rel_modals = editor(modals.results, '%', modals.totals)
rel_modals2 = editor(modals.results, '%', allwords.totals)


***Processing results***
========================

***Done!***
========================


***Processing results***
========================

***Done!***
========================


***Processing results***
========================

***Done!***
========================


***Processing results***
========================

***Done!***
========================


In [202]:
#plotter('Modalisation over the course of membership (unlemmatised)', rel_modals_unlemmatised.results, 
#        figsize = (14, 6), style = 'bmh')
plotter('Modalisation over the course of membership (unlemmatised)', rel_modals2_unlemmatised.results, 
        figsize = (14, 6), style = 'bmh')
#plotter('Modalisation over the course of membership (lemmatised)', rel_modals.results, 
#        figsize = (14, 6), style = 'bmh')
plotter('Modalisation over the course of membership (lemmatised)', rel_modals2.results, 
        figsize = (14, 6), style = 'bmh')


We can use concordancing to understand what's happening:


In [18]:
# the modal 'would' inside a VP that also contains an adverb (RB), with a sister NP headed by 'I'
i_would_adjunct = r'MD < /(?i)would/ > (VP << RB $ (NP <<# /(?i)^i$/))'

In [20]:
lines = conc('data/postcounts/01', i_would_adjunct, n = 30, random = True, window = 50)


23:00:48: Getting concordances for data/postcounts/01 ... 
          Query: MD < /(?i)would/ > (VP << RB $ (NP <<# /(?i)^i$/))

0                           Thanks for listening and I   would   greatly appreciate any response                   
1                            So I quit taking it and I   would   get so much energy and I could be up half the     
2                                                    I   would   literally destroy anything he liked or had before 
3                                                 Or I   would   see my family and clam up again                   
4                                                    I   would   keep calling the doc for sure ... and also start  
5        But as I have had several wrong diagnosis , I   would   like your real life experience to also compare my 
6    I think this is what anxiety feels like ... but I   would   n't know for sure                                 
7             it does get me to sleep and without it I   would   be awake for days                                 
8     I have forgotten what they are ... I guess all I   would   really like to know is , does this seem like      
9                                          Sometimes I   would   have to turn around and come right back because   
10    that Zoloft lowers sperm count and my wife and i   would   like to start a family soon                       
11                                                   I   would   like to know if anyone else has this problem and  
12                                                   I   would   like any info on the two ever being linked or     
13                                                   i   would   b really gratefull                                
14                                                   I   would   say if you are n't feeling an effect by now , ask 
15                               I would n't sleep , I   would   n't stop                                          
16                                                   I   would   consider getting a new lawyer as well ... this is 
17                   If I took the prescribed dosage I   would   not have been able to work                        
18                                         Dr.s said I   would   never be able to walk again normally              
19                         If anyone can advise me , I   would   be so thankful                                    
20  - and I hope she also has some suggestions , but I   would   love to hear about something that may help me now 
21                                                   I   would   rather die than be fat                            
22                                                   I   would   be physically agressive with my parents and sister
23                                                   I   would   not call it bipolar but like the rest of us you do
24                                                   I   would   suggest getting an professional opinion , whether 
25     excellent , i felt like i could do anything , I   would   sign myself up for loads of courses at night      
26                                                   i   would   be on the look out for a moderate to severe       
27                                                   I   would   love to get stats to show my family that most     
28                Hello Everyone , I am new here and I   would   like to ask a question , but first let me tell you
29                                   Hope i helped , I   would   sure appreciate any information on the disability 

In [24]:
from dictionaries.process_types import processes
# i_would_adjunct = r'MD < /(?i)would/ > (VP << RB $ (NP <<# /(?i)^i$/))'
# as above, but restricted to mental process verbs (think, feel, know, want ...)
i_would_adjunct_mental = r'MD < /(?i)would/ > (VP ( <+(VP) (VP < (/VB.?/ < /%s/) !< VP)) << RB $ (NP <<# /(?i)^i$/))' % processes.mental

In [25]:
lines = conc('data/postcounts/01', i_would_adjunct_mental, n = 30, random = True, window = 50)


23:11:46: Getting concordances for data/postcounts/01 ... 
          Query: MD < /(?i)would/ > (VP ( <+(VP) (VP < (/VB.?/ < /(?i)\b((abide{0,2}|abominate{0,2}|accept{0,2}|acknowledge{0,2}|acquiesce{0,2}|adjudge{0,2}|adore{0,2}|affirm{0,2}|agree{0,2}|allow{0,2}|allure{0,2}|anticipate{0,2}|appreciate{0,2}|ascertain{0,2}|aspire{0,2}|assent{0,2}|assume{0,2}|begrudge{0,2}|believe{0,2}|calculate{0,2}|care{0,2}|conceal{0,2}|concede{0,2}|conceive{0,2}|concern{0,2}|conclude{0,2}|concur{0,2}|condone{0,2}|conjecture{0,2}|consent{0,2}|consider{0,2}|contemplate{0,2}|convince{0,2}|crave{0,2}|decide{0,2}|deduce{0,2}|deem{0,2}|delight{0,2}|desire{0,2}|determine{0,2}|detest{0,2}|discern{0,2}|discover{0,2}|dislike{0,2}|doubt{0,2}|dread{0,2}|enjoy{0,2}|envisage{0,2}|estimate{0,2}|excuse{0,2}|expect{0,2}|exult{0,2}|fear{0,2}|foreknow{0,2}|foresee{0,2}|gather{0,2}|grant{0,2}|grasp{0,2}|hate{0,2}|hope{0,2}|hurt{0,2}|hypothesise{0,2}|hypothesize{0,2}|imagine{0,2}|infer{0,2}|inspire{0,2}|intend{0,2}|intuit{0,2}|judge{0,2}|ken{0,2}|lament{0,2}|like{0,2}|loathe{0,2}|love{0,2}|marvel{0,2}|mind{0,2}|miss{0,2}|need{0,2}|neglect{0,2}|notice{0,2}|observe{0,2}|omit{0,2}|opine{0,2}|perceive{0,2}|plan{0,2}|please{0,2}|posit{0,2}|postulate{0,2}|pray{0,2}|preclude{0,2}|prefer{0,2}|presume{0,2}|presuppose{0,2}|pretend{0,2}|provoke{0,2}|realize{0,2}|realise{0,2}|reason{0,2}|recall{0,2}|reckon{0,2}|recognise{0,2}|recognize{0,2}|recollect{0,2}|reflect{0,2}|regret{0,2}|rejoice{0,2}|relish{0,2}|remember{0,2}|resent{0,2}|resolve{0,2}|rue{0,2}|scent{0,2}|scorn{0,2}|sense{0,2}|settle{0,2}|speculate{0,2}|suffer{0,2}|suppose{0,2}|surmise{0,2}|surprise{0,2}|suspect{0,2}|trust{0,2}|visualise{0,2}|visualize{0,2}|want{0,2}|wish{0,2}|wonder{0,2}|yearn{0,2}|rediscover{0,2})(s|es|ed|ing|)|(choose|chooses|chose|chosen|choosing|dream|dreams|dreamed|dreamt|dreaming|fancy|fancies|fancied|fancying|feel|feels|felt|feeling|find|finds|found|finding|figure|figures|figured|figuring|forget|forgets|forgot|forgotten|forgetting|hear|hears|heard|hearing|know|justify|justifies|justified|justifying|knows|knew|known|knowing|learn|learns|learned|learnt|learning|mean|means|meant|meaning|overhear|overhears|overheard|overhearing|prove|proves|proved|proven|proving|read|reads|see|sees|saw|seen|seeing|smell|smells|smelled|smelt|smelling|think|thinks|thought|thinking|understand|understands|understood|understanding|worry|worries|worried|worrying))\b/) !< VP)) << RB $ (NP <<# /(?i)^i$/))

0        So if any one is thinking about taking them I   would   say go for it and see what happens the wrost thing
1                                                    I   would   appreciate any help with these specific problems  
2                                               Here I   would   like to note that before starting the program I   
3      or that I 'm not good enough for her and then i   would   just think about that                             
4                                       i am new but i   would   like to said o am pleased to hear that you are out
5                                                    i   would   n't want to be on that cocktail , personally      
6                           My boyfriend is mad that I   would   choose to be miserable over seeking help because  
7                                          Sometimes I   would   feel like something was going to get me if I did n
8                                             I really   would   n't worry bout the lithium the people I know on it
9                                                    I   would   n't even think of it in the depression family and 
10   either as someone with bp or a spouse or mate , I   would   appreciate it so much                             
11      , i was going deeper and deeper into sleep , i   would   also hear voices , only when i would nap during   
12                                                   I   would   also like to know how to deal with her moods      
13                             Analyze if you will , I   would   truly appreciate it                               
14                                  As far as luck , i   would   n't know                                          
15                                                   I   would   just like to know how can you tell if someone is  
16     interesting topic , and I did n't expect that I   would   ever find a course like that out there            
17                                                   I   would   really like to talk with someone who has been     
18                                           I guess I   would   just like to hear if anyone has been where I 'm at
19                                         I certainly   would   never have thought bipolar was an issue for me    
20   addressed more than one concern above ... but , I   would   really like to know about anyone else 's          
21  reactions to certain medications are different , I   would   just like to know everyone 's experience with the 
22                                                   I   would   consider taking Zyprexa again                     
23                                       It was like I   would   n't even realize what I was doing until I got     
24                                                   i   would   literally find it impossible to do anything that i
25       I would start a story and forget the end so I   would   just be like `` so yeah ... '' and I was tired and
26                                                   I   would   think that for this to work , he had to suffer the
27        It 's good to know I am not alone , though I   would   never wish this on anyone                         
28                People would confront me later and i   would   n't recall what happened                          
29                                                   I   would   like to know aswell , how can you possibly go     

In [121]:
from dictionaries.process_types import processes
# i_would_adjunct = r'MD < /(?i)would/ > (VP << RB $ (NP <<# /(?i)^i$/))'
# as above, but excluding verbal and mental process verbs
i_would_not_mental = r'MD < /(?i)would/ > (VP ( <+(VP) (VP < (/VB.?/ !< /%s/ !< /%s/) !< VP)) << RB $ (NP <<# /(?i)^i$/))' % (processes.verbal, processes.mental)

In [122]:
lines = conc('data/postcounts/01', i_would_not_mental, n = 30, random = True, window = 50)


10:51:26: Getting concordances for data/postcounts/01 ... 
          Query: MD < /(?i)would/ > (VP ( <+(VP) (VP < (/VB.?/ !< /(?i)\b((accede{0,2}|add{0,2}|admit{0,2}|advise{0,2}|advocate{0,2}|allege{0,2}|announce{0,2}|answer{0,2}|apprise{0,2}|argue{0,2}|ask{0,2}|assert{0,2}|assure{0,2}|attest{0,2}|aver{0,2}|avow{0,2}|bark{0,2}|beg{0,2}|bellow{0,2}|blubber{0,2}|boast{0,2}|brag{0,2}|cable{0,2}|claim{0,2}|comment{0,2}|complain{0,2}|confess{0,2}|confide{0,2}|confirm{0,2}|contend{0,2}|convey{0,2}|counsel{0,2}|declare{0,2}|demand{0,2}|disclaim{0,2}|disclose{0,2}|divulge{0,2}|emphasise{0,2}|emphasize{0,2}|exclaim{0,2}|explain{0,2}|forecast{0,2}|gesture{0,2}|grizzle{0,2}|guarantee{0,2}|hint{0,2}|holler{0,2}|indicate{0,2}|inform{0,2}|insist{0,2}|intimate{0,2}|mention{0,2}|moan{0,2}|mumble{0,2}|murmur{0,2}|mutter{0,2}|note{0,2}|object{0,2}|offer{0,2}|phone{0,2}|pledge{0,2}|preach{0,2}|predicate{0,2}|preordain{0,2}|proclaim{0,2}|profess{0,2}|prohibit{0,2}|promise{0,2}|propose{0,2}|protest{0,2}|reaffirm{0,2}|reassure{0,2}|rejoin{0,2}|remark{0,2}|remind{0,2}|repeat{0,2}|report{0,2}|request{0,2}|require{0,2}|respond{0,2}|retort{0,2}|reveal{0,2}|riposte{0,2}|roar{0,2}|scream{0,2}|shout{0,2}|signal{0,2}|state{0,2}|stipulate{0,2}|telegraph{0,2}|telephone{0,2}|testify{0,2}|threaten{0,2}|vow{0,2}|warn{0,2}|wire{0,2}|reemphasise{0,2}|reemphasize{0,2}|rumor{0,2}|rumour{0,2})(s|es|ed|ing|)|(certify|certifies|certified|certified|certifying|deny|denies|denied|denied|denying|forbid|forbids|forbade|forbidden|forbidding|foretell|foretells|foretold|foretold|foretelling|forswear|forswears|forswore|forsworn|forswearing|imply|implies|implied|implied|implying|move|moves|moved|moved|moving|notify|notifies|notified|notified|notifying|prophesy|prophesies|prophesied|prophesied|prophesying|reply|replies|replied|replied|replying|say|says|said|said|saying|specify|specifies|specified|specified|specifying|swear|swears|swore|sworn|swearing|tell|tells|told|told|telling|write|writes|wrote|written|writing))\b/ !< 
/(?i)\b((abide{0,2}|abominate{0,2}|accept{0,2}|acknowledge{0,2}|acquiesce{0,2}|adjudge{0,2}|adore{0,2}|affirm{0,2}|agree{0,2}|allow{0,2}|allure{0,2}|anticipate{0,2}|appreciate{0,2}|ascertain{0,2}|aspire{0,2}|assent{0,2}|assume{0,2}|begrudge{0,2}|believe{0,2}|calculate{0,2}|care{0,2}|conceal{0,2}|concede{0,2}|conceive{0,2}|concern{0,2}|conclude{0,2}|concur{0,2}|condone{0,2}|conjecture{0,2}|consent{0,2}|consider{0,2}|contemplate{0,2}|convince{0,2}|crave{0,2}|decide{0,2}|deduce{0,2}|deem{0,2}|delight{0,2}|desire{0,2}|determine{0,2}|detest{0,2}|discern{0,2}|discover{0,2}|dislike{0,2}|doubt{0,2}|dread{0,2}|enjoy{0,2}|envisage{0,2}|estimate{0,2}|excuse{0,2}|expect{0,2}|exult{0,2}|fear{0,2}|foreknow{0,2}|foresee{0,2}|gather{0,2}|grant{0,2}|grasp{0,2}|hate{0,2}|hope{0,2}|hurt{0,2}|hypothesise{0,2}|hypothesize{0,2}|imagine{0,2}|infer{0,2}|inspire{0,2}|intend{0,2}|intuit{0,2}|judge{0,2}|ken{0,2}|lament{0,2}|like{0,2}|loathe{0,2}|love{0,2}|marvel{0,2}|mind{0,2}|miss{0,2}|need{0,2}|neglect{0,2}|notice{0,2}|observe{0,2}|omit{0,2}|opine{0,2}|perceive{0,2}|plan{0,2}|please{0,2}|posit{0,2}|postulate{0,2}|pray{0,2}|preclude{0,2}|prefer{0,2}|presume{0,2}|presuppose{0,2}|pretend{0,2}|provoke{0,2}|realize{0,2}|realise{0,2}|reason{0,2}|recall{0,2}|reckon{0,2}|recognise{0,2}|recognize{0,2}|recollect{0,2}|reflect{0,2}|regret{0,2}|rejoice{0,2}|relish{0,2}|remember{0,2}|resent{0,2}|resolve{0,2}|rue{0,2}|scent{0,2}|scorn{0,2}|sense{0,2}|settle{0,2}|speculate{0,2}|suffer{0,2}|suppose{0,2}|surmise{0,2}|surprise{0,2}|suspect{0,2}|trust{0,2}|visualise{0,2}|visualize{0,2}|want{0,2}|wish{0,2}|wonder{0,2}|yearn{0,2}|rediscover{0,2})(s|es|ed|ing|)|(choose|chooses|chose|chosen|choosing|dream|dreams|dreamed|dreamt|dreaming|fancy|fancies|fancied|fancying|feel|feels|felt|feeling|find|finds|found|finding|figure|figures|figured|figuring|forget|forgets|forgot|forgotten|forgetting|hear|hears|heard|hearing|know|justify|justifies|justified|justifying|knows|knew|known|knowing|learn|learns|learned|learnt|learning|mean|means|meant|meaning|overhear|overhears|overheard|overhearing|prove|proves|proved|proven|proving|read|reads|see|sees|saw|seen|seeing|smell|smells|smelled|smelt|smelling|think|thinks|thought|thinking|understand|understands|understood|understanding|worry|worries|worried|worrying))\b/) !< VP)) << RB $ (NP <<# /(?i)^i$/))

0    through this pregnancy I felt terrible , my moods   would   swing severly , I would pick fights with my       
1        So if any one is thinking about taking them I   would   say go for it and see what happens the wrost thing
2                    had I would return , sometimes it   would   work and sometimes it would not                   
3    are up for being responsible for another life , i   would   highly recommend adding a lil pup to your life    
4   He would spend days sleeping in our basement and I   would   not be permitted to know why                      
5      and go to school with TONS of energy ... and it   would   be like that all week ... and I would eventually  
6   He then admits to me that `` he has always known I   would   leave him , it was just a matter of when '' -LRB  
7   The trouble I have with the fast track - is that I   would   have to also become like that to one-up him - and 
8                                             I know I   would   n't do it                                         
9                         AT this point in my life , I   would   do almost anything to be me again                 
10    i had to flush my meds down the toilet just so i   would   n't take em , the pain is excrutiating &amp; i was
11            - I would think after all this time , he   would   trust me more than that ? -RRB                    
12  m fine around her but when we get into auguments i   would   yell a lot and just leave and I 'm always thinking
13      on the medication just to feel something but i   would   not recommend it                                  
14       this post ... it does n't sound like me ... I   would   never call myself `` solctice '' out of youthful  
15            I would have dreams , no nightmares that   would   wake me and leave me sweating from being scared of
16                                      So I thought I   would   just give a short intro while lurking around here 
17  the church -LRB- attached to my school -RRB- and i   would   die holding this thing -LRB- ca n't remember what 
18   but it 's because I really do n't love him , or I   would   n't act the way I act ... something about T-shirts
19        making a bad situation a whole lot worse - I   would   be starting to crawl up my own whatsit by now     
20                                                   I   would   try this , First you need to get off your         
21                                        One minute I   would   be laughing hysterically and the next minute I    
22                                          Hi Kat , I   would   gladly exchange a little of my depression with    
23                                                       would   quite making me go through the full blown swings  
24         I ca n't even do simple daily things that I   would   do without even thinking about                    
25                            If I were in HOllywood I   would   probably earn lots of money and be accepted       
26  't want to go on and upset others with conditons i   would   n't be bothered about if i had , in respect for   
27                              If I was a carrier , I   would   not have kid because it is a fatal disorder , but 
28                                                   I   would   definitely encourage him to do so                 
29                                       I do things I   would   never normally do , ie ; try and set fire to my   

Thematic coding of 'I would + adjunct' in first and last post groups


In [201]:



Being and having bipolar


In [1]:
from corpkit import *
%matplotlib inline
govs = load_result('bp_govrole_lem_collcc')

In [2]:
govs.query


Out[2]:
{'datatype': dtype('int64'),
 'dep_type': 'collapsed-ccprocessed-dependencies',
 'dictionary': 'bnc.p',
 'function': 'interrogator',
 'function_filter': False,
 'lemmatag': False,
 'lemmatise': True,
 'option': 'g',
 'path': 'data/postcounts',
 'phrases': False,
 'plaintext': False,
 'query': '(?i)\\b(bipolar|bi-polar|bp|b-p)',
 'quicksave': 'bp_govrole_lem_collcc.p',
 'spelling': False,
 'table_size': 50,
 'time_ended': '2015-06-18 06:37:20',
 'time_started': '2015-06-17 22:20:00',
 'titlefilter': False,
 'translated_option': 'g'}

In [127]:
govs.results


Out[127]:
    amod:disorder  root:root  dobj:have  prep_with:diagnose  ccomp:think  acomp:have  ...
01            727        416        252                 353          143          93  ...
02            296        225        147                 135           55          39  ...
03            252        236        126                  91           40          48  ...
04            179        194        148                  62           29          25  ...
05            219        197        132                  46           36          31  ...
06            279        167        142                  51           24          26  ...
07            249        166        150                  39           31          25  ...
08            338        151        134                  26           18          17  ...
09            313        150        172                  45           15          19  ...
09 313 150 172 45 15 19 21 8 29 12 13 22 12 14 23 10 31 8 8 16 17 9 3 12 15 4 13 20 13 12 25 17 3 4 14 1 5 6 19 7 2 14 93 9 10 10 10 4 14 4 4 20 4 4 5 10 9 5 1 10 3 2 3 10 12 8 4 1 13 17 3 7 12 6 1 4 8 7 7 2 3 5 4 0 3 20 5 8 2 7 2 0 5 4 5 1 0 6 4 3 3 4 6 2 3 3 1 7 7 0 6 12 0 1 2 6 2 3 4 3 1 2 4 4 7 0 3 7 2 2 1 4 1 2 2 6 3 0 6 2 1 0 1 2 4 2 5 7 4 3 8 2 2 4 7 1 3 0 2 2 7 4 1 5 0 1 1 4 3 2 1 3 0 5 5 0 0 1 3 0 3 7 1 3 1 1 3 1 1 3 3 1 1 4 2 2 2 3 1 2 1 1 4 2 3 2 3 0 1 0 3 1 2 1 0 1 1 2 2 0 1 0 4 2 1 1 1 4 2 2 2 1 1 3 3 1 0 2 4 1 1 4 0 2 0 2 4 0 2 2 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
10 374 60 333 60 12 52 8 8 34 5 11 39 20 11 26 27 7 5 15 10 34 1 7 17 19 8 10 40 9 35 9 28 4 3 18 4 0 8 12 0 3 2 0 3 13 14 12 0 6 2 5 17 1 14 1 19 10 4 12 6 7 10 5 40 4 7 6 2 24 12 2 5 10 1 2 6 4 14 4 2 1 0 5 2 5 10 4 14 9 6 8 3 5 4 18 1 0 2 3 9 3 10 14 1 0 2 2 6 4 1 8 9 4 0 0 7 1 2 5 2 2 13 0 2 4 5 0 0 2 1 6 5 0 5 3 4 2 3 2 13 6 0 3 4 6 3 2 7 3 4 2 0 5 4 5 2 10 1 6 1 4 2 0 5 8 1 2 3 3 0 3 6 1 5 2 0 0 3 0 1 5 6 13 2 0 3 0 5 2 1 1 1 0 2 6 0 3 0 1 4 2 0 1 13 3 0 5 1 0 0 2 1 1 5 7 1 3 4 1 1 0 2 8 0 4 2 0 0 0 0 1 0 2 0 0 1 1 1 1 1 1 0 4 2 1 4 0 1 1 2 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0

10 rows × 8097 columns


In [6]:
#govs = interrogator(corpus, 'g', bp_words, lemmatise = True)
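# govs is assumed to be in memory already (e.g. restored from a saved interrogation)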
from dictionaries.process_types import processes
%matplotlib inline

renames = [(r'root:root', 'be bipolar'),
           (r'(dobj|acomp):have', 'have bipolar'),
           (r'dobj:.*', 'other processes')]

be_have = editor(govs.results, '%', 'self', sort_by = 'total',
                 replace_names = renames, just_entries = [n for r, n in renames])

plotter('Being and having bipolar', be_have.results, figsize = (10, 5), save = True, style = 'fivethirtyeight',
    y_label = 'Percentage of all processes')


***Processing results***
========================

Replacing "root:root" with "be bipolar" ...

Replacing "(dobj|acomp):have" with "have bipolar" ...

Replacing "dobj:.*" with "other processes" ...

Merging duplicate entries ... 

Keeping 3 entries:
    be bipolar
    have bipolar
    other processes

***Done!***
========================


09:52:52: images/being-and-having-bipolar.png created.

In [133]:
be_have = editor(govs.results, merge_entries = ['dobj:have', 'acomp:have'], newname = 'have bipolar')
be_have = editor(be_have.results, merge_entries = ['root:root'], newname = 'be bipolar')
be_have = editor(be_have.results, merge_entries = r'dobj:%s' % processes.relational, newname = 'other relational processes')
be_have = editor(be_have.results, '%', be_have.totals, sort_by = 'total',
                 just_entries = ['have bipolar', 'be bipolar', 'other relational processes'])
plotter('Being and having bipolar', be_have.results, num_to_plot = 3, y_label = 'Percentage of all relational processes',
        style = 'fivethirtyeight', figsize = (14, 6))


***Processing results***
========================

Merging 2 entries as "have bipolar":
    dobj:have
    acomp:have

***Done!***
========================


***Processing results***
========================

Merging 1 entries as "be bipolar":
    root:root

***Done!***
========================


***Processing results***
========================

Merging 8 entries as "other relational processes":
    dobj:see
    dobj:look
    dobj:sound
    dobj:feel
    dobj:be
    dobj:be/have
    dobj:are/have
    dobj:is/was

***Done!***
========================


***Processing results***
========================

Keeping 3 entries:
    have bipolar
    be bipolar
    other relational processes

***Done!***
========================

Transitivity features

Key participants


In [45]:
part_query = r'/(NN|JJ).?/ >># (/(NP|ADJP)/ $ VP | > VP)'
parts = interrogator(corpus, 'words', part_query, lemmatise = True)


 09:12:33: Finished! 31171 unique results, 816077 total.


In [46]:
quickview(parts, 50)


  0: thing
  1: time
  2: person
  3: med
  4: bipolar
  5: day
  6: something
  7: way
  8: problem
  9: sure
 10: doctor
 11: able
 12: lot
 13: better
 14: pdoc
 15: life
 16: good
 17: anyone
 18: anything
 19: someone
 20: friend
 21: effect
 22: husband
 23: bp
 24: help
 25: hard
 26: year
 27: everything
 28: today
 29: care
 30: week
 31: medication
 32: son
 33: doc
 34: depression
 35: everyone
 36: disorder
 37: sorry
 38: job
 39: feeling
 40: symptom
 41: thought
 42: one
 43: bad
 44: night
 45: episode
 46: drug
 47: part
 48: daughter
 49: great

Plotting by total:


In [114]:
tot_part = editor(parts.results, '%', parts.totals, sort_by = 'total')
plotter('Participants, by total', tot_part.results, figsize = (10, 5), 
        style = 'fivethirtyeight', num_to_plot = 20, interactive = True)


***Processing results***
========================

***Done!***
========================

Out[114]:

Plotting by decreasing frequency:


In [63]:
dec_part = editor(parts.results, '%', parts.totals, sort_by = 'decrease')
plotter('Participants, decreasing', dec_part.results, figsize = (16, 7), num_to_plot = 10, 
        style = 'fivethirtyeight')


Plotting by increasing frequency:


In [58]:
inc_part = editor(parts.results, '%', parts.totals, sort_by = 'increase')
plotter('Participants, increasing', inc_part.results, figsize = (16, 7), num_to_plot = 10, 
        style = 'fivethirtyeight', kind = 'area')


Jargonisation

First, we copy our results a few times:
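
Below is a minimal sketch of that copying step, assuming it is the participant results (parts) being duplicated, and that interrogation results can be deep-copied like ordinary Python objects:


In [ ]:
# sketch only: duplicate the participant results so that the jargon and
# non-jargon merges below each operate on their own copy
from copy import deepcopy
jargon = deepcopy(parts)
non_jargon = deepcopy(parts)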


We then make dictionaries of jargon and non-jargon terms, each paired with a search query:


In [157]:
jargon_terms = {'pdoc': r'^pdoc',
                'tdoc': r'^tdoc',
                'meds': r'^med$'}

non_jargon_terms = {'psychologist/psychiatrist': r'^(psychologist|psychiatrist)',
                    'diagnos*': r'^diagnos',
                    'therapist': r'^therapist',
                    'medication/medicine': r'^(medicine|medication)s*',
                    'doc/doctor': r'^doc(tor)*s*$'}

Now we iterate through these items, merging matching entries in the copied results:


In [158]:
for name, regex in jargon_terms.items():
    jargon = editor(jargon.results, merge_entries = regex, newname = name)
jargon = editor(jargon.results, '%', parts.totals, just_entries = jargon_terms.keys())

for name, regex in non_jargon_terms.items():
    non_jargon = editor(non_jargon.results, merge_entries = regex, newname = name)
non_jargon = editor(non_jargon.results, '%', parts.totals, just_entries = non_jargon_terms.keys())


***Processing results***
========================

Merging 1 entries as "meds":
    med

***Done!***
========================


***Processing results***
========================

Merging 3 entries as "tdoc":
    tdoc
    tdocs
    tdoc/pdoc

***Done!***
========================


***Processing results***
========================

Merging 12 entries as "pdoc":
    pdoc
    pdocs
    pdoctor
    pdoc/tdoc
    pdocs/tdocs
    pdoc/doc
    pdoc/md
    pdoc.i
    pdoc/rx
    pdoc/tdoc/dh
... and 2 more ... 

***Done!***
========================


***Processing results***
========================

Keeping 3 entries:
    meds
    tdoc
    pdoc

***Done!***
========================


***Processing results***
========================

Merging 9 entries as "therapist":
    therapist
    therapist/pdoc
    therapist/psychologist
    therapists/drugs
    therapist?j
    therapist/psych
    therapist/doctor
    therapist/md
    therapist/np

***Done!***
========================


***Processing results***
========================

Merging 13 entries as "medication/medicine":
    medication
    medicine
    medication-related
    medication-therapy
    medication/treatment
    medication.your
    medication?i
    medicationless
    medications/lifestyles
    medications/treatments
... and 3 more ... 

***Done!***
========================


***Processing results***
========================

Merging 2 entries as "doc/doctor":
    doctor
    doc

***Done!***
========================


***Processing results***
========================

Merging 11 entries as "psychologist/psychiatrist":
    psychiatrist
    psychologist
    psychiatrist/psychologist
    psychiatrist/therapist
    psychologist/psychiatrist
    psychiatrist.i
    psychiatristi
    psychologist/counselor
    psychologist/pdoc
    psychologists/psychiatrists
... and 1 more ... 

***Done!***
========================


***Processing results***
========================

Merging 28 entries as "diagnos*":
    diagnosis
    diagnosed
    diagnosing
    diagnosis/treatment
    diagnosable
    diagnosed-you
    diagnosed.they
    diagnosed.well
    diagnosed/lithium
    diagnosedhim
... and 18 more ... 

***Done!***
========================


***Processing results***
========================

Keeping 5 entries:
    therapist
    medication/medicine
    doc/doctor
    psychologist/psychiatrist
    diagnos*

***Done!***
========================


In [159]:
plotter('Jargon terms', jargon.results, subplots = True, style = 'fivethirtyeight', figsize = (9, 9))
plotter('Non-jargon terms', non_jargon.results, subplots = True, style = 'fivethirtyeight', figsize = (9, 9))



In [184]:
parts = r['participants']
proc = r['processes']
allwords = r['allwords']

# a list of tuples (rather than a dict) keeps the order stable, so the first
# two entries are the participant queries and the last two the process queries
nms = [('diagnose as participant', r'^diagno*'),
       ('dx as participant', r'^dx'),
       ('diagnose as process', r'^diagno'),
       ('dx as process', r'^dx')]

diag = []
for name, regex in nms[:2]:
    tmp = editor(parts.results, merge_entries = regex, newname = name)
    diag.append(tmp.results[name])

for name, regex in nms[2:]:
    tmp = editor(proc.results, merge_entries = regex, newname = name)
    diag.append(tmp.results[name])

diag = pd.concat(diag, axis = 1)
diagnosis = editor(diag, '%', allwords.totals, just_entries = [name for name, regex in nms])
plotter('Diagnosis as jargon process', diagnosis.results, subplots = True,
       figsize = (10, 10))


***Processing results***
========================

Merging 38 entries as "diagnose as participant":
    diagnosis
    diagnosed
    diagnosing
    diagnosis/treatment
    diagnonsense
    diagnosable
    diagnosed-you
    diagnosed.they
    diagnosed.well
    diagnosed/lithium
... and 28 more ... 

***Done!***
========================


***Processing results***
========================

Merging 9 entries as "dx as participant":
    dx
    dxd
    dxm
    dx-ed
    dxs
    dx.he
    dx/rx
    dxed.this
    dxing

***Done!***
========================


***Processing results***
========================

Merging 12 entries as "diagnose as process":
    diagnose
    diagnoised
    diagnosised
    diagnoed
    diagnossed
    diagnos
    diagnosied
    diagnozed
    diagnonised
    diagnose/prescribe
... and 2 more ... 

***Done!***
========================


***Processing results***
========================

Merging 5 entries as "dx as process":
    dxed
    dx
    dxd
    dxing
    dx-ed

***Done!***
========================


***Processing results***
========================

Keeping 4 entries:
    diagnose as participant
    dx as participant
    diagnose as process
    dx as process

***Done!***
========================

Key processes

With first person actor


In [75]:
what_i_do = r['what_i_do_deps']
what_i_do.query


Out[75]:
{'datatype': dtype('int64'),
 'dep_type': 'basic-dependencies',
 'dictionary': 'bnc.p',
 'function': 'interrogator',
 'function_filter': '^(nsubj|xsubj|agent)$',
 'lemmatag': False,
 'lemmatise': True,
 'option': 'gov',
 'path': 'data/postcounts',
 'phrases': False,
 'plaintext': False,
 'query': '(?i)^I$',
 'quicksave': 'what_i_do_deps.p',
 'spelling': False,
 'table_size': 50,
 'time_ended': '2015-06-15 02:54:09',
 'time_started': '2015-06-14 11:16:36',
 'titlefilter': False,
 'translated_option': 'g'}

In [78]:
tot_proc = editor(what_i_do.results, '%', what_i_do.totals, sort_by = 'total')
plotter('Processes with first person subject as actor', tot_proc.results, num_to_plot = 10,
        style = 'bmh', figsize = (16, 7))


***Processing results***
========================

***Done!***
========================


In [87]:
inc_proc = editor(what_i_do.results, '%', what_i_do.totals, sort_by = 'increase')
plotter('Processes with first person subject as actor', inc_proc.results, num_to_plot = 10,
        style = 'bmh', figsize = (16, 7), legend_pos = 'upper left')



In [77]:
dec_proc = editor(what_i_do.results, '%', what_i_do.totals, sort_by = 'decrease')
plotter('Processes with first person subject as actor, decreasing', dec_proc.results, num_to_plot = 10,
        style = 'bmh', figsize = (16, 7))


***Processing results***
========================

***Done!***
========================

We can also get results with the least slope:


In [88]:
stat = editor(what_i_do.results, keep_top = 100, print_info = False)
stat_proc = editor(stat.results, '%', what_i_do.totals, sort_by = 'static')
plotter('Processes with first person subject as actor, least slope', stat_proc.results, num_to_plot = 10,
        style = 'bmh', figsize = (16, 7))


***Processing results***
========================

***Done!***
========================

With second person actor:


In [69]:
what_you_do = r['what_you_do_deps']
what_you_do.query


Out[69]:
{'datatype': dtype('int64'),
 'dep_type': 'basic-dependencies',
 'dictionary': 'bnc.p',
 'function': 'interrogator',
 'function_filter': '^(nsubj|xsubj|agent)$',
 'lemmatag': False,
 'lemmatise': True,
 'option': 'gov',
 'path': 'data/postcounts',
 'phrases': False,
 'plaintext': False,
 'query': '(?i)^you$',
 'quicksave': 'what_you_do_deps.p',
 'spelling': False,
 'table_size': 50,
 'time_ended': '2015-06-15 17:08:46',
 'time_started': '2015-06-15 07:35:41',
 'titlefilter': False,
 'translated_option': 'g'}

In [74]:
inc_proc_2 = editor(what_you_do.results, '%', what_you_do.totals, sort_by = 'increase')
plotter('Processes with second person subject as actor, increasing', inc_proc_2.results, num_to_plot = 10,
        style = 'bmh', figsize = (16, 7))


***Processing results***
========================

***Done!***
========================


In [73]:
dec_proc_2 = editor(what_you_do.results, '%', what_you_do.totals, sort_by = 'decrease')
plotter('Processes with second person subject as actor, decreasing', dec_proc_2.results, num_to_plot = 10,
        style = 'bmh', figsize = (16, 7))


***Processing results***
========================

***Done!***
========================

Process types


In [96]:
from dictionaries.process_types import processes
proc_types_query = [('Relational processes', r'/VB.?/ < /%s/ >># ( VP >+(VP) (VP !> VP $ NP))' % processes.relational), 
              ('Mental processes', r'/VB.?/ < /%s/ >># ( VP >+(VP) (VP !> VP $ NP))' % processes.mental), 
              ('Verbal processes', r'/VB.?/ < /%s/ >># ( VP >+(VP) (VP !> VP $ NP))' % processes.verbal)]
proc_types = multiquery(corpus, proc_types_query)


 10:34:47: Finished! 54252 total occurrences.

10:34:47: Finished! 3 unique results, 276109 total.

In [193]:
proc_total = r['processes']
rel_proc_types = editor(proc_types.results, '%', proc_total.totals)
plotter('Process types', rel_proc_types.results, kind = 'line', figsize = (11, 6), 
        style = 'fivethirtyeight', interactive = True)


***Processing results***
========================

***Done!***
========================

Out[193]:

Work in progress


In [24]:
#part_in_diag = interrogator(corpus, 'd', query = r'(?i)\bdiagno', 
               #lemmatise = True, dep_type = 'collapsed-ccprocessed-dependencies')

part_in_diag = load_result('diagnose_coll_deps')
# get only roles that equate to participant and process
to_match = r'^(dobj|nsubj|nsubjpass|csubj|acomp|iobj|csubjpass|tmod|agent|advmod|prep_[a-z]*):'
parts_circs = editor(part_in_diag.results, just_entries = to_match)

# edit names to sfl categories
# normalise bipolar and dr words
roles = [(r':(bp|bipolar|ii|bi-polar).*', ':bipolar'),
         (r':(doc|pdoc|tdoc|psychiatrist|dr).*', ':doctor'),
         (r':(you|yourself|yourselves).*', ':you'),
         (r'^(nsubj|agent):', 'actor:'), 
         (r'^(nsubjpass|dobj):', 'goal:'),
         (r'^prep_by:', 'actor:'),
         (r'(prep_[a-z]*|acomp):', 'range:'),
         (r'^tmod:', 'circ:'),
         (r'^advmod:', 'circ:'),
         (r'goal:bipolar', 'range:bipolar')]
         #(r'goal:(?!(i|you)).*', 'goal:3rdperson'),
         #(r'actor:(?!(i|you)).*', 'actor:3rdperson')]
parts_circs = editor(parts_circs.results, replace_names = roles, 
    merge_subcorpora = ['06', '07', '08', '09', '10'], new_subcorpus_name = '6+')


***Processing results***
========================

Keeping 2106 entries:
    nsubjpass:i
    prep_with:bipolar
    prep_with:disorder
    advmod:ago
    advmod:recently
    advmod:when
    advmod:just
    dobj:i
    prep_with:bp
    prep_as:bipolar
... and 2096 more ... 

***Done!***
========================


***Processing results***
========================

Replacing ":(bp|bipolar|ii|bi-polar).*" with ":bipolar" ...

Replacing ":(doc|pdoc|tdoc|psychiatrist|dr).*" with ":doctor" ...

Replacing ":(you|yourself|yourselves).*" with ":you" ...

Replacing "^(nsubj|agent):" with "actor:" ...

Replacing "^(nsubjpass|dobj):" with "goal:" ...

Replacing "^prep_by:" with "actor:" ...

Replacing "(prep_[a-z]*|acomp):" with "range:" ...

Replacing "^tmod:" with "circ:" ...

Replacing "^advmod:" with "circ:" ...

Replacing "goal:bipolar" with "range:bipolar" ...

Merging duplicate entries ... 

Merging 5 entries as "6+":
    06
    07
    08
    09
    10

***Done!***
========================


In [26]:
p = editor(parts_circs.results, '%', 'self', just_entries = r'(actor|goal|range):', 
    sort_by = 'total', just_subcorpora = ['01', '6+'], keep_stats = False, keep_top = 12)
p.results.rename(index={'01': 'First posts', '6+': 'Veteran posts (groups 6$+$)'}, 
                 inplace = True)
p = editor(p.results, replace_names = [(':', r'\\textbf{: '), (r'$', r'}')]) # $
plotter('Participants selected by \emph{diagnose} as Event', p.results.T.head(7), 
        kind = 'bar', y_label = 'Percentage of all \emph{diagnose} circumstances', 
        x_label = 'Word', figsize = (10, 5), rot = 0, save = True)


***Processing results***
========================

Keeping 2 subcorpora:
    01
    6+

Keeping 1247 entries:
    goal:i
    range:bipolar
    range:disorder
    goal:you
    range:depression
    goal:he
    goal:she
    actor:doctor
    range:year
    goal:who
... and 1237 more ... 

***Done!***
========================

                             goal:i  range:bipolar  range:disorder  goal:you    ...     goal:he  range:year  goal:daughter  goal:son
First posts                   30.03          16.85            6.13      0.77    ...        1.98        1.64           0.40      0.77
Veteran posts (groups 6$+$)   21.31          13.31            2.68      6.53    ...        1.68        1.60           1.71      1.08

[2 rows x 12 columns]

***Processing results***
========================

Replacing ":" with "\\textbf{: " ...

Replacing "$" with "}" ...

Merging duplicate entries ... 

***Done!***
========================


00:58:21: images/participants-selected-by-emphdiagnose-as-event.png created.

In [15]:
# get relative frequencies of circumstances
c = editor(parts_circs.results, just_entries = r'^circ:', 
    sort_by = 'total', just_subcorpora = ['01', '6+'], keep_stats = False)

c = editor(c.results, '%', c.totals, replace_names = r'^circ:')

plotter(r'Common circumstances of \emph{diagnose} processes in first posts', 
    c.results.ix['01'].order(ascending=False), kind = 'bar', num_to_plot = 13, 
    figsize = (10, 5), show_totals = 'plot', x_label = 'Participant', save = True, 
    y_label = 'Percentage of all circ. in \emph{diagnose} process')

# a new chart, comparing first posts to veteran posts

key_terms = ['recently', 'just', 'ago', 'now', 'year', 'week',
             'properly', 'correctly', 'officially', 'accurately', 'here']

selected_c = editor(c.results, just_entries = key_terms, 
               just_subcorpora = ['01', '6+'], sort_by = 'decrease')

selected_c.results.rename(index={'01': 'First posts', '6+': 'Veteran posts (6$+$)'}, inplace = True)

plotter('Circumstances surrounding the process of diagnosis', selected_c.results.T, num_to_plot = 'all', 
        kind = 'bar', y_label = 'Percentage of all \emph{diagnose} circumstances', x_label = 'Word',
        figsize = (10, 5), save = True)


***Processing results***
========================

Keeping 2 subcorpora:
    01
    6+

Keeping 270 entries:
    circ:ago
    circ:recently
    circ:when
    circ:just
    circ:first
    circ:properly
    circ:only
    circ:finally
    circ:also
    circ:now
... and 260 more ... 

***Done!***
========================


***Processing results***
========================

Replacing "^circ:" with "" ...

Merging duplicate entries ... 

***Done!***
========================


00:49:56: images/common-circumstances-of-emphdiagnose-processes-in-first-posts.png created.
***Processing results***
========================

Keeping 2 subcorpora:
    01
    6+

Keeping 11 entries:
    recently
    just
    ago
    now
    year
    week
    properly
    correctly
    officially
    accurately
... and 1 more ... 

***Done!***
========================


00:50:00: images/circumstances-surrounding-the-process-of-diagnosis.png created.

In [16]:
c.results


Out[16]:
           recently  when    ago   just  ...  apart  anymore    and  wednesday
01            14.80  5.47  15.55   9.90  ...   0.09     0.09   0.09       0.09
6+             8.30  9.96   5.85   5.42  ...   0.00     0.00   0.00       0.00
slope         -6.49  4.50  -9.70  -4.48  ...  -0.09    -0.09  -0.09      -0.09
intercept     14.80  5.47  15.55   9.90  ...   0.09     0.09   0.09       0.09
r             -1.00  1.00  -1.00  -1.00  ...  -1.00    -1.00  -1.00      -1.00
p              0.00  0.00   0.00   0.00  ...   0.00     0.00   0.00       0.00
stderr         0.00  0.00   0.00   0.00  ...   0.00     0.00   0.00       0.00

7 rows × 270 columns


In [17]:
#part_in_diag = interrogator(corpus, 'd', query = r'(?i)\bdiagno', 
               #lemmatise = True, dep_type = 'collapsed-ccprocessed-dependencies')

part_in_diag = load_result('diagnose_coll_deps')
# get only roles that equate to participant and process
to_match = r'^(dobj|nsubj|nsubjpass|csubj|acomp|iobj|csubjpass|tmod|agent|advmod|prep_[a-z]*):'
parts_circs = editor(part_in_diag.results, just_entries = to_match)

# edit names to sfl categories
# normalise bipolar and dr words
roles = [(r':(bp|bipolar|ii|bi-polar).*', ':bipolar'),
         (r':(doc|pdoc|tdoc|psychiatrist|dr).*', ':doctor'),
         (r':(you|yourself|yourselves).*', ':you'),
         (r'^(nsubj|agent):', 'actor:'), 
         (r'^(nsubjpass|dobj):', 'goal:'),
         (r'^prep_by:', 'actor:'),
         (r'(prep_[a-z]*|acomp):', 'range:'),
         (r'^tmod:', 'circ:'),
         (r'^advmod:', 'circ:'),
         (r'goal:bipolar', 'range:bipolar')]
         #(r'goal:(?!(i|you)).*', 'goal:3rdperson'),
         #(r'actor:(?!(i|you)).*', 'actor:3rdperson')]
parts_circs = editor(parts_circs.results, replace_names = roles)


***Processing results***
========================

Keeping 2106 entries:
    nsubjpass:i
    prep_with:bipolar
    prep_with:disorder
    advmod:ago
    advmod:recently
    advmod:when
    advmod:just
    dobj:i
    prep_with:bp
    prep_as:bipolar
... and 2096 more ... 

***Done!***
========================


***Processing results***
========================

Replacing ":(bp|bipolar|ii|bi-polar).*" with ":bipolar" ...

Replacing ":(doc|pdoc|tdoc|psychiatrist|dr).*" with ":doctor" ...

Replacing ":(you|yourself|yourselves).*" with ":you" ...

Replacing "^(nsubj|agent):" with "actor:" ...

Replacing "^(nsubjpass|dobj):" with "goal:" ...

Replacing "^prep_by:" with "actor:" ...

Replacing "(prep_[a-z]*|acomp):" with "range:" ...

Replacing "^tmod:" with "circ:" ...

Replacing "^advmod:" with "circ:" ...

Replacing "goal:bipolar" with "range:bipolar" ...

Merging duplicate entries ... 

***Done!***
========================


In [18]:
# get relative frequencies of circumstances
c = editor(parts_circs.results, just_entries = r'^circ:', 
    sort_by = 'total', keep_stats = False)

c = editor(c.results, '%', c.totals, replace_names = r'^circ:')

# a new chart, comparing first posts to veteran posts
inc_c = editor(c.results, sort_by = 'increase')


***Processing results***
========================

Keeping 270 entries:
    circ:ago
    circ:recently
    circ:when
    circ:just
    circ:first
    circ:properly
    circ:only
    circ:finally
    circ:also
    circ:now
... and 260 more ... 

***Done!***
========================


***Processing results***
========================

Replacing "^circ:" with "" ...

Merging duplicate entries ... 

***Done!***
========================


***Processing results***
========================

***Done!***
========================


In [21]:
list(inc_c.results.columns)[:30]


Out[21]:
[u'properly',
 u'correctly',
 u'finally',
 u'when',
 u'here',
 u'sooner',
 u'even',
 u'first',
 u'accurately',
 u'where',
 u'how',
 u'really',
 u'often',
 u'ever',
 u'still',
 u'much',
 u'especially',
 u'truly',
 u'since',
 u'about',
 u'so',
 u'anyway',
 u'why',
 u'shortly',
 u'long',
 u'inability',
 u'up',
 u'either',
 u'originally',
 u'usually']

In [ ]:
key_terms = ['recently', 'just', 'ago', 'now', 'year', 'week',
             'properly', 'correctly', 'officially', 'accurately', 'here']

selected_c = editor(c.results, just_entries = key_terms, 
               just_subcorpora = ['01', '6+'], sort_by = 'decrease')

selected_c.results.rename(index={'01': 'First posts', '6+': 'Veteran posts (6$+$)'}, inplace = True)

plotter('Circumstances surrounding the process of diagnosis', selected_c.results.T, num_to_plot = 'all', 
        kind = 'bar', y_label = 'Percentage of all \emph{diagnose} circumstances', x_label = 'Word',
        figsize = (14, 7))


In [29]:
#proc = load_result('tree_processeslem')
#query = r'/VB.?/ >># ( VP >+(VP) (VP !> VP $ NP))'
#proc = interrogator(corpus, 'words', query, lemmatise = True)
p = editor(proc.results.ix[0], 'k', proc.results)
p.results


***Done!***
========================

Out[29]:
diagnose     521.30
appreciate   113.01
suffer        88.68
be            75.35
wonder        45.61
date          36.13
have          29.75
start         28.90
put           22.12
marry         20.34
cry           18.75
sleep         17.06
become        16.57
depress       15.79
lose          15.61
              ...  
learn        -21.82
make         -22.16
use          -23.02
let          -24.24
need         -27.27
get          -27.74
mean         -28.07
work         -30.25
hope         -31.68
say          -31.68
find         -32.54
see          -35.28
help         -37.65
keep         -39.83
do           -52.72
Name: 01: keyness, dtype: float64

Connecting the levels of abstraction

With IPython Notebooks, you can embed HTML pages:


In [20]:
from IPython.display import HTML
HTML('<iframe src=https://en.wikipedia.org/wiki/ETH_Zurich width=900 height=350></iframe>')


Out[20]:

This has a lot of potential for CMC research that hopes to deal with contextualised data.

Let's make some concordance lines for crazy in veteran posts:


In [12]:
lines = conc('data/postcounts/10', r'/crazy/', window = 50, n = 50, add_links = True)


07:59:47: Getting concordances for data/postcounts/10 ... 
          Query: /crazy/

0           i just hid because i thought i was somehow   crazy   or just a bad person                              
1      have irritation/anger , others spend money like   crazy   sometimes , talk really really fast , etc         
2       but if he takes off and is spending money like   crazy   you need to make sure that you are legally        
3      i usually only recommend books that i 'm really   crazy   about , and i do n't feel that way about this one 
4   being able to keep weight on as well ... the whole   crazy   ride is just a big balancing act                  
5           i hope that helps ... there 's site called   crazy   meds that have people 's accounts of their        
6                        goody , do n't drive yourself   crazy   by trying to anticipate something that may or may 
7                           in part this is due to the   crazy   schedule most college kids keep , the long hours  
8     for instance will tell me about someone and some   crazy   or stupid thing they did and say , do you think   
9        to join the board and our little community of   crazy   bp ` ers                                          
10       who said he just was n't the same old fun and   crazy   guy                                               
11                           apparently there are more   crazy   variances with lamictal than with a lot of other  
12            accepting the bp diagnosis makes you not   crazy   , if anything                                     
13          there , because my thoughts just race like   crazy   and they try to crowd out the peaceful thoughts   
14  so maybe he 'll up out there -- but he 's not that   crazy   about the areas where he 's been                  
15                                          i would go   crazy   cycling as often as you do                        
16                                 she is getting stir   crazy   and the attitude is starting to come back         
17         of her dreams when my friends thought i was   crazy   to allow her to do such a thing                   
18     i 'm sick though given my manic episode and how   crazy   our weather has been lately                       
19     rehab ... she went bolistic and said that i was   crazy   ... that only `` f-ups '' go there and that she   
20      and keep him calm and basically from not going   crazy   when he was awake                                 
21    impulsive actions -lrb- like spending money like   crazy   -rrb- , sleeping around , drinking to excess      
22                                  but i do shed like   crazy   and it clogs my drains                            
23              also ... she went on to say that i was   crazy   and no other mom sits up til all hours of the     
24  doing ; excessive drug and/or alcohol use , wild ,   crazy   behavior and a feeling that the individual is     
25   erin and her crying telling me that everybody was   crazy   and she was n't                                   
26  see that show but i 'm glad oprah used the word ``   crazy   '' because that is what people expect once they   
27                                          as for the   crazy   part , yeah well i think we 're all a lil crazy   
28      who get questioned adn accused and told we 're   crazy   and we have to change                             
29     of these pdocs must have mothers who drive them   crazy   with concerns and he made the mistake of          
30       yesterday that previously would have made him   crazy   -- carrying mattresses down a tight hall and      
31  like that for over 2 months now , and it drives me   crazy   ... so you definitely do n't want to let that get 
32    girlfriend likes to talk about me and how i 'm a   crazy   , insane , drug user , so i had to tell him if i  
33  a reason , it just is ... so do n't drive yourself   crazy   trying to figure out why you feel that way        
34     it seems like i catch colds and infections like   crazy   and recover very slowly . -rrb                    
35       things that at the time drove my mom and aunt   crazy   , but we all often sit around the table laughing  
36                                  but calling you ``   crazy   '' is different                                   
37                            oh well ... just another   crazy   event happening in goodyland                      
38    you 've been and way you 've been , it 's not ``   crazy   '' , it 's an illnes                              
39   that i would never be let out again , that 's how   crazy   and depressed i felt at that moment               
40    , ca n't sit still , just bursting with all this   crazy   energy like i 've never had before heightened mood
41     me that there were kids who saw things and went   crazy   and needed 5 people to hold them down and then    
42        or i was on to her she pulled the `` you 're   crazy   '' routine and started crying like she did just   

Each line also contains an HTML link, which is clickable:


In [14]:
lines.ix[0]


Out[14]:
l                                  i just hid because i thought i was somehow 
m                                                                        crazy
r                                                         or just a bad person
link    http://www.healthboards.com/boards/bipolar-disorder/695089-labels.html
Name: 0, dtype: object
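
To make those links clickable for the whole set of lines at once, one option (just a sketch, using pandas' to_html with escape = False so that markup in the link column is rendered rather than escaped) is:


In [ ]:
# sketch only: render the concordance DataFrame as an HTML table,
# leaving the 'link' column unescaped so its markup stays clickable
from IPython.display import HTML
HTML(lines.to_html(escape = False))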

So, we could also display it in context:


In [18]:
show(lines, 0, show = 'thread')


Out[18]:

Voila! Contextualised corpus linguistics!


In [16]: