Life Sciences - Anything that wiggles from 20 nm to 30 m in length
Most of the subjects I will touch on are incredibly deep and worthy of their own talk. Thankfully, the Research Triangle Analysts have already given some of them.
Fortunately, it requires near willful ignorance to acquire hacking skills and substantive expertise without also learning some math and statistics along the way. As such, the danger zone is sparsely populated, however, it does not take many to produce a lot of damage. - Drew Conway
The emphasis here is on finding a common understanding of the vocabulary between life scientists and analysts; things like pipelines, dataframes and representations.
A classic supervised learning problem.
1 Choose a representation
2 Train a classifier
3 Make predictions
4 Evaluation metrics
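The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the method used in the talk: the nearest-centroid classifier, the toy two-dimensional data, the "active"/"inactive" labels, and the helper names (`featurize`, `train`, `predict`, `accuracy`) are all hypothetical and chosen only to keep the example dependency-free.

```python
from collections import defaultdict


def featurize(sample):
    """Step 1: choose a representation (here: a plain list of floats)."""
    return [float(value) for value in sample]


def train(features, labels):
    """Step 2: train a classifier (here: one mean vector per class)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Step 3: predict the label of the closest class centroid."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda y: squared_distance(centroids[y], x))


def accuracy(y_true, y_pred):
    """Step 4: evaluate (here: fraction of correct predictions)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: two well-separated classes.
train_X = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]]
train_y = ["inactive", "inactive", "active", "active"]
test_X = [[0.9, 1.1], [3.1, 3.0], [2.9, 3.3]]
test_y = ["inactive", "active", "active"]

centroids = train([featurize(x) for x in train_X], train_y)
predictions = [predict(centroids, featurize(x)) for x in test_X]
score = accuracy(test_y, predictions)
print(predictions, score)
```

In practice each step would be swapped for something stronger (a domain-specific representation, a library classifier, held-out evaluation with several metrics), but the shape of the pipeline stays the same.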
In :
from rdkit import Chem
from rdkit.Chem import Draw
%matplotlib inline

In :
m3 = Chem.MolFromSmiles('O=C1OC2=C(C=C1)C1=C(C=CCO1)C=C2')
fig3 = Draw.MolToMPL(m3)

In :
smiles = ("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C",
          "CC(C)CCCCCC(=O)NCC1=CC(=C(C=C1)O)OC",
          "c1(C(=O)O)cc(OC)c(O)cc1")
mols = [Chem.MolFromSmiles(x) for x in smiles]
Draw.MolsToGridImage(mols)
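To make "choose a representation" concrete, the sketch below turns the same three SMILES strings into fixed-length numeric vectors using nothing but the standard library. It is a deliberately crude stand-in, assuming that counting carbon, nitrogen and oxygen atoms is enough for illustration; the names `ATOM_TOKEN`, `ELEMENTS` and `element_counts` are hypothetical, the regular expression ignores bracket atoms such as `[NH4+]`, and a real workflow would more likely use structural fingerprints from a cheminformatics toolkit.

```python
import re
from collections import Counter

# Matches organic-subset atoms outside brackets; two-letter symbols first.
# Assumption: the input contains no bracket atoms, which this pattern skips.
ATOM_TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

# Fixed element order so every molecule maps to a vector of the same length.
ELEMENTS = ("C", "N", "O")


def element_counts(smiles_string):
    """Return [n_C, n_N, n_O] for a SMILES string (aromatic atoms included)."""
    tokens = ATOM_TOKEN.findall(smiles_string)
    # Aromatic atoms are lowercase in SMILES; fold them onto the element symbol.
    counts = Counter(token.capitalize() for token in tokens)
    return [counts[element] for element in ELEMENTS]


smiles = ("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C",
          "CC(C)CCCCCC(=O)NCC1=CC(=C(C=C1)O)OC",
          "c1(C(=O)O)cc(OC)c(O)cc1")

features = [element_counts(s) for s in smiles]
for s, f in zip(smiles, features):
    print(f, s)
```

The point is only that each molecule, whatever its size, ends up as a row of numbers that a classifier can consume; this particular vector throws away all connectivity, which is exactly the information richer representations try to keep.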
Dealing with 30X genome-sized datasets initially.
Comparing RNA expression levels takes this from a big data problem back to another simple classification problem.
Picture of simple net
Picture of architecture
Example of Python code with Theano
New opportunities come from tying together multiple models.
For the hackers
For the employed
For the enthusiast
In [ ]:!jupyter nbconvert --to slides MLforLS.ipynb --post serve
Podcasts
Talking Machines podcast, starting after 10 minutes
a16z - breathless, but not all hype