In [4]:
from BioTechTopics import Topics
t = Topics()
t.load()  # unpickles the LDA, tf, and tf-idf representations and loads the JSON text data into a pandas DataFrame
execfile('./plotBokehJpnb.py')  # runs plotBokehJpnb.py in the notebook's namespace (execfile is a Python 2 builtin)
All keyword extraction, named entity recognition, and LDA are performed beforehand, and the results are pickled or written to JSON
Scrapy was used to scrape the entire fiercebiotech.com website, resulting in 90 MB of text data across 38,000 separate documents
Interactive Bokeh plots were implemented
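
The precompute-then-load pattern described above (model objects pickled, text records stored as JSON) can be sketched with the standard library alone. This is only an illustration, not the actual code inside `BioTechTopics`: the helper names `save_precomputed` and `load_precomputed`, the file names, and the toy data are all hypothetical stand-ins.

```python
import json
import os
import pickle
import tempfile


def save_precomputed(out_dir, model, documents):
    """Pickle a model-like object and write document records to JSON.

    Hypothetical helper: the real project may organize its files differently.
    """
    model_path = os.path.join(out_dir, "model.pkl")
    docs_path = os.path.join(out_dir, "documents.json")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(docs_path, "w") as f:
        json.dump(documents, f)
    return model_path, docs_path


def load_precomputed(model_path, docs_path):
    """Load the pickled object and the JSON records written above."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(docs_path) as f:
        documents = json.load(f)
    return model, documents


# Toy stand-ins for a fitted topic model and scraped articles (assumed data).
toy_model = {"n_topics": 2, "vocabulary": ["antibody", "trial"]}
toy_docs = [{"title": "Example article", "keywords": ["trial"]}]

# Expensive work would happen once, offline; the notebook only needs the load step.
out_dir = tempfile.mkdtemp()
paths = save_precomputed(out_dir, toy_model, toy_docs)
loaded_model, loaded_docs = load_precomputed(*paths)
```

Doing the heavy processing offline keeps the notebook responsive: a single load call restores everything the interactive plots need. Note that pickle files should only be loaded from trusted sources.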