Epilepsy Comorbidities (using Brown MySQL server)

This script runs a PubMed-Comorbidities pipeline with the following characteristics:

  • Main MeSH Heading: "Epilepsy"
  • UMLS filtering concepts: "Disease or Syndrome", "Mental or Behavioral Dysfunction" or "Neoplastic Process"
  • Articles analysed: All MEDLINE 2017AA articles tagged with the main MeSH Heading. Note that this is equivalent to searching PubMed with the [MH:noexp] field tag. Total number of articles found: 66720
  • UMLS concept filtering: Comorbidities are analysed on all other MeSH descriptors associated with the specified UMLS concepts
  • This script uses Brown MySQL databases:
    • medline
    • umls_meta
    • pubmed_comorbidities

In [10]:
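# Settings
# Main MeSH Heading, set to match the title and description of this notebook
const mh = "Epilepsy"
# UMLS semantic types used to filter the co-occurring MeSH descriptors
const concepts = ("Disease or Syndrome", "Mental or Behavioral Dysfunction", "Neoplastic Process");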


WARNING: redefining constant mh
WARNING: redefining constant concepts

Retrieve Results and Analyze Simple Occurrences


In [11]:
using Revise #used during development to detect changes in module
using PubMedMiner

@time occurrence_df = get_semantic_occurrences_df(mh, concepts...);


123.160312 seconds (3.71 M allocations: 73.901 MiB, 0.04% gc time)

In [12]:
@time stats = mesh_stats(occurrence_df, 20);


  1.737753 seconds (1.56 M allocations: 1.345 GiB, 46.08% gc time)
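
As a rough illustration of what a "top n" occurrence summary involves, the sketch below counts descriptor frequencies in a small binary occurrence matrix and keeps the most frequent ones. The matrix, the labels and the variable names are made up for this example; this is not the internal implementation of mesh_stats, and the real occurrence data may be laid out differently.

```julia
# Toy binary occurrence matrix (hypothetical data):
# rows are articles, columns are MeSH descriptors.
labels = ["Seizures", "Depression", "Brain Neoplasms", "Stroke"]
occ = [1 1 0 0;
       1 0 1 0;
       1 1 0 0;
       0 1 0 1;
       1 0 0 0]

# Number of articles tagged with each descriptor (column sums).
counts = vec(sum(occ, dims=1))

# Indices of the n most frequent descriptors, most frequent first.
n = 2
order = sortperm(counts, rev=true)
topn = order[1:n]

topn_labels = labels[topn]
topn_counts = counts[topn]
println(collect(zip(topn_labels, topn_counts)))
```

The resulting label and count vectors have the same shape as the two arguments passed to the bar plot in the next cell.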

In [13]:
PubMedMiner.plot_bar_topn(stats.topn_mesh_labels, stats.topn_mesh_counts)


Out[13]:

Analyze and Plot Pair Statistics

  • Correlation Coefficient
  • Pointwise mutual information
  • Co-occurrence matrix
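
For readers unfamiliar with these pair statistics, here is a minimal sketch of how a co-occurrence matrix and a correlation matrix can be derived from a binary article-by-descriptor matrix. The toy matrix is hypothetical, and the code only illustrates the general technique; how PubMedMiner computes its own statistics is not shown in this notebook.

```julia
using Statistics

# Toy binary occurrence matrix (hypothetical data):
# rows are articles, columns are MeSH descriptors.
occ = [1 1 0 0;
       1 0 1 0;
       1 1 0 0;
       0 1 0 1;
       1 0 0 0]

# Co-occurrence counts: entry (i, j) is the number of articles tagged
# with both descriptor i and descriptor j. The diagonal holds each
# descriptor's own article count.
coo = occ' * occ

# Pearson correlation between descriptor columns.
# Values lie in [-1, 1]; the diagonal is 1.
corr = cor(Float64.(occ))

println(coo)
println(round.(corr, digits=3))
```

A descriptor that appears in every article or in none has zero variance, so its correlations would be NaN; real data may need such columns filtered out first.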

Correlation Coefficient


In [14]:
PubMedMiner.plot_stat_mat(stats.corrcoef, stats.topn_mesh_labels)


Out[14]:

Pointwise Mutual Information
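
Pointwise mutual information compares how often two descriptors appear together with how often they would be expected to if they were independent: pmi(i, j) = log2( p(i, j) / (p(i) p(j)) ). The sketch below applies that formula to a hypothetical toy matrix. Setting pairs that never co-occur to 0 (instead of -Inf) is a choice made for this example only and is not necessarily what stats.pmi_sp contains.

```julia
# Toy binary occurrence matrix (hypothetical data):
# rows are articles, columns are MeSH descriptors.
occ = [1 1 0 0;
       1 0 1 0;
       1 1 0 0;
       0 1 0 1;
       1 0 0 0]

n_articles = size(occ, 1)
coo = occ' * occ

# Marginal probability of each descriptor, taken from the diagonal.
p = [coo[i, i] / n_articles for i in 1:size(occ, 2)]

# PMI for every pair that co-occurs at least once.
# Pairs with no co-occurrence are left at 0 (assumption of this sketch).
pmi = zeros(size(coo))
for i in axes(coo, 1), j in axes(coo, 2)
    if i != j && coo[i, j] > 0
        pmi[i, j] = log2((coo[i, j] / n_articles) / (p[i] * p[j]))
    end
end

println(round.(pmi, digits=3))
```

Positive values indicate pairs that co-occur more often than independence would predict, negative values pairs that co-occur less often.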


In [15]:
PubMedMiner.plot_stat_mat(stats.pmi_sp, stats.topn_mesh_labels)


Out[15]:

Co-Occurrences Matrix


In [16]:
PubMedMiner.plot_chord_coo(stats.top_coo_sp, stats.topn_mesh_labels)


Out[16]:

Frequent Item Sets

Visualization of Frequent Item Sets

  • Basic visualization of frequent item sets using a Sankey diagram (experimental - use with caution)
  • Future work includes a better layout for larger numbers of links and the ability to dynamically change the number of itemsets
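
To make the idea of a frequent item set concrete, the following sketch counts how often each pair of descriptors appears together across articles and keeps the pairs that meet a minimum count. The function frequent_pairs, the toy transactions and the threshold are all hypothetical; this covers pairs only and is not the algorithm PubMedMiner uses to build the Sankey inputs.

```julia
# Each article is represented by the set of its descriptors (hypothetical data).
transactions = [Set(["Seizures", "Depression"]),
                Set(["Seizures", "Brain Neoplasms"]),
                Set(["Seizures", "Depression"]),
                Set(["Depression", "Stroke"]),
                Set(["Seizures"])]

# Return the support (fraction of articles) of every descriptor pair
# that appears in at least `min_count` articles.
function frequent_pairs(transactions, min_count)
    n = length(transactions)
    counts = Dict{Tuple{String,String},Int}()
    for t in transactions
        items = sort(collect(t))  # fixed order so (a, b) and (b, a) share a key
        for a in 1:length(items)-1, b in a+1:length(items)
            key = (items[a], items[b])
            counts[key] = get(counts, key, 0) + 1
        end
    end
    return Dict(k => v / n for (k, v) in counts if v >= min_count)
end

frequent = frequent_pairs(transactions, 2)
println(frequent)
```

Larger item sets could be built from the frequent pairs in the style of the Apriori algorithm, at the cost of more passes over the data.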

In [17]:
PubMedMiner.plot_sankey_arules(stats.sankey_sources, stats.sankey_targets, stats.sankey_vals, stats.mesh_names)


Out[17]: