Here we create an instance of Topics called t.
This reads the corpus (in JSON) using pandas, builds a document-term matrix of the corpus using CountVectorizer, then performs Latent Dirichlet Allocation (LDA). Initialization is complete when "Topics instance ready" is printed.
In [1]:
from BioTechTopics import Topics
t=Topics()
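For readers who want to see roughly what happens during initialization, the following is a minimal sketch of the three steps described above (load JSON with pandas, count words with CountVectorizer, fit LDA), written with scikit-learn. It is not the actual Topics implementation: the tiny corpus, the "text" field name, the number of topics, and the stop-word setting are all assumptions made for illustration.

```python
import io

import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in corpus; the real corpus file and its field names are assumptions.
corpus_json = io.StringIO(
    '[{"text": "blood test startup raises funding for liquid biopsy"},'
    ' {"text": "microbiome therapy company targets gut disease"},'
    ' {"text": "new blood test detects cancer early"},'
    ' {"text": "plant microbiome improves crop yield"}]'
)

# Step 1: read the JSON corpus into a DataFrame.
df = pd.read_json(corpus_json)

# Step 2: build the document-term count matrix (documents x vocabulary).
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(df["text"])

# Step 3: fit LDA; n_components=2 is an arbitrary choice for this toy corpus.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(doc_term)  # per-document topic proportions

print("Topics instance ready")
```

Each row of doc_topic holds one document's topic proportions, and lda.components_ holds the unnormalized topic-word weights from which P(word|topic) can be derived.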
t.showTopicWordCloud(x) shows the word cloud for topic number x found by LDA. The size of each word in the word cloud is proportional to its count within the corpus multiplied by P(word|topic=x) (from LDA). The top words for the other topics are printed further below.
In [2]:
t.showTopicWordCloud(0)
Topic 0 appears to be about liquid biopsy, since its words reference blood tests, finger pricks, and Theranos. Future work will conduct sentiment analysis on sentences containing these keywords and display the sentiment as color in these word clouds.
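The word sizing rule used for the word cloud (corpus count multiplied by P(word|topic)) can be sketched as follows. All numbers here are made up for illustration, and the helper name word_sizes is hypothetical; the topic_word array only mimics the layout of scikit-learn's lda.components_ (topics x vocabulary, unnormalized).

```python
import numpy as np

# Hypothetical vocabulary and corpus-wide word counts.
vocab = ["blood", "test", "microbiome", "crop"]
corpus_counts = np.array([40, 30, 20, 10])

# Hypothetical unnormalized topic-word weights (rows: topics, columns: words).
topic_word = np.array([
    [8.0, 6.0, 1.0, 1.0],
    [1.0, 1.0, 9.0, 5.0],
])

# Normalize each row so it sums to 1, giving P(word | topic).
p_word_given_topic = topic_word / topic_word.sum(axis=1, keepdims=True)


def word_sizes(topic):
    """Return relative word sizes for one topic, scaled so the largest is 1."""
    weights = corpus_counts * p_word_given_topic[topic]
    return dict(zip(vocab, weights / weights.max()))


sizes = word_sizes(0)
print(sizes)
```

A dictionary of this shape is the kind of input a word-cloud renderer that accepts per-word frequencies would take; whether Topics renders its clouds this way is an assumption.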
In [3]:
t.printTopWords(10)
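As a rough illustration of how the top words per topic can be listed from a topic-word matrix, here is a small sketch. It assumes printTopWords(n) ranks words by their topic weight and keeps the first n, which is not confirmed by this notebook; the function name top_words and the data are hypothetical.

```python
import numpy as np

# Hypothetical vocabulary and topic-word weights (rows: topics, columns: words).
vocab = np.array(["blood", "test", "microbiome", "crop"])
topic_word = np.array([
    [8.0, 6.0, 1.0, 1.0],
    [1.0, 1.0, 9.0, 5.0],
])


def top_words(topic_word, vocab, n):
    """Return the n highest-weighted words for every topic."""
    # argsort is ascending, so reverse each row before taking the first n columns.
    order = np.argsort(topic_word, axis=1)[:, ::-1][:, :n]
    return [vocab[row].tolist() for row in order]


tops = top_words(topic_word, vocab, 2)
for topic_id, words in enumerate(tops):
    print(f"Topic {topic_id}: {', '.join(words)}")
```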
Here we use the "Who's Who" function, t.ww, to retrieve the named entities associated with the query "microbiome". Some examples include:
Indigo Agriculture: A startup using plant microbiomes to strengthen crops against disease and drought, increasing crop yield for farmers.
Noubar Afeyan: Cofounder of Evelo Therapeutics, which is using microbiome therapy to treat cancer.
Asit Parikh, MD, PhD: Head of Takeda's gastroenterology therapeutic area unit, which just acquired NuBiyota. NuBiyota is a microbiome therapeutic company that focuses on gastrointestinal indications.
In [4]:
t.ww('microbiome')