from BioTechTopics import Topics
t = Topics()
t.load()  # unpickles the LDA, tf, and tf-idf representations and loads the JSON text data into a pandas DataFrame
with open('./plotBokehJpnb.py') as f:  # run the Bokeh plotting script in the notebook's namespace
    exec(f.read())

Key Takeaways

Performance (compared to semi-final)

1) BioTechTopics is faster:

All keyword extraction, named entity recognition, and LDA topic modeling are performed ahead of time, and the results are pickled or stored as JSON, so they only need to be loaded rather than recomputed.
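The "compute once, load many times" pattern can be sketched with the standard library alone. This is a minimal illustration, not the BioTechTopics code: plain term-frequency counts stand in for the pickled LDA/tf/tf-idf models, and the corpus, file names (`tf.pkl`, `docs.json`), and the helpers `precompute` and `load` are all hypothetical.

```python
import json
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical toy corpus standing in for the scraped articles.
DOCS = [
    {"id": 1, "text": "gene therapy trial results announced"},
    {"id": 2, "text": "biotech startup raises funds for gene editing"},
]


def precompute(docs, out_dir):
    """Offline step: compute term frequencies once and persist them.

    Term counts are a stand-in for the expensive models (LDA, tf-idf)."""
    tf = [Counter(d["text"].lower().split()) for d in docs]
    with open(os.path.join(out_dir, "tf.pkl"), "wb") as f:
        pickle.dump(tf, f)  # model-like objects go to a pickle
    with open(os.path.join(out_dir, "docs.json"), "w") as f:
        json.dump(docs, f)  # raw text data goes to JSON


def load(out_dir):
    """Online step: read the precomputed results instead of recomputing."""
    with open(os.path.join(out_dir, "tf.pkl"), "rb") as f:
        tf = pickle.load(f)
    with open(os.path.join(out_dir, "docs.json")) as f:
        docs = json.load(f)
    return tf, docs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        precompute(DOCS, d)
        tf, docs = load(d)
        print(tf[1]["gene"], len(docs))
```

The design choice is that only `load` runs when a user queries the app; the costly `precompute` step runs once, offline.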

2) BioTechTopics is bigger:

Scrapy was used to scrape the entire fiercebiotech.com website, resulting in 90 MB of text data spread across 38,000 separate documents.
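Crawling a whole site comes down to repeatedly extracting links from each page and following only those that stay on the target domain. The sketch below shows that filtering step using the standard library's `html.parser` as a stand-in for Scrapy's own link following; it is not the spider used for this project, and the helper names `LinkExtractor` and `same_domain_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def same_domain_links(html, base_url, domain="fiercebiotech.com"):
    """Return absolute, de-duplicated links that stay on `domain`."""
    parser = LinkExtractor()
    parser.feed(html)
    seen, out = set(), []
    for href in parser.links:
        # Resolve relative links and drop #fragments so duplicates collapse.
        url = urljoin(base_url, href).split("#")[0]
        host = urlparse(url).netloc
        on_site = host == domain or host.endswith("." + domain)
        if on_site and url not in seen:
            seen.add(url)
            out.append(url)
    return out


if __name__ == "__main__":
    page = '<a href="/biotech/story-1">A</a><a href="https://example.com/x">B</a>'
    print(same_domain_links(page, "https://www.fiercebiotech.com/"))
```

A crawler would push each returned URL onto a queue, fetch it, and repeat until no unseen URLs remain; Scrapy handles that scheduling and de-duplication itself.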

3) BioTechTopics is more user-friendly:

Interactive Bokeh plots were implemented.
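As an illustration of what "interactive" means in Bokeh, here is a minimal sketch of a line plot with pan/zoom tools and hover tooltips. It is not the contents of `plotBokehJpnb.py`: the function `topic_trend_plot`, the idea of plotting yearly article counts for a topic, and the sample numbers are all hypothetical, and the sketch assumes Bokeh is installed.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, show


def topic_trend_plot(years, counts, topic):
    """Build an interactive plot of (hypothetical) yearly article counts for a topic."""
    source = ColumnDataSource(data={"year": years, "count": counts})
    p = figure(
        title="Mentions of '%s' per year" % topic,
        x_axis_label="Year",
        y_axis_label="Articles",
        tools="pan,wheel_zoom,box_zoom,reset,save",  # built-in interactive tools
    )
    p.line("year", "count", source=source, line_width=2)
    p.scatter("year", "count", source=source, size=6)
    # Hovering over a point shows its year and count.
    p.add_tools(HoverTool(tooltips=[("Year", "@year"), ("Articles", "@count")]))
    return p


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    show(topic_trend_plot([2014, 2015, 2016], [3, 5, 8], "CRISPR"))
```

In a notebook, calling `bokeh.io.output_notebook()` before `show` renders the plot inline instead of in a separate HTML file.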