Over the course of your project, you'll have many ups and downs with your data, so we'll use the analogy of a romantic relationship to explain the different steps of single-cell bioinformatics analyses.
1. Matchmaking: Getting the publicly available deets on your data
"PubMed stalking"... it's just like Facebook stalking!
Homework
- Spillover from what we didn't finish:
- Mapping/alignment spillover
- Downloading public data and filtering on expressed genes spillover
- Find another single-cell paper with a GEO/ArrayExpress accession, download its data, and compare gene expression filtering strategies (you will use this dataset throughout the course)
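To make "filtering on expressed genes" concrete, here is a minimal sketch of one common strategy: keep genes that are detected above a threshold in at least a minimum number of cells. The toy matrix, the gene and cell names, and both thresholds (`min_expression`, `min_cells`) are assumptions made up for illustration, not values from any paper; part of the homework is seeing how different papers choose them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy expression matrix: rows are cells, columns are genes
expression = pd.DataFrame(
    {
        "GeneA": [0, 0, 0, 1],
        "GeneB": [5, 12, 0, 30],
        "GeneC": [100, 80, 95, 110],
        "GeneD": [0, 2, 0, 0],
    },
    index=["cell1", "cell2", "cell3", "cell4"],
)

min_expression = 1  # assumed: a gene is "expressed" in a cell above this value
min_cells = 2       # assumed: keep genes expressed in at least this many cells

# Count, for every gene, the number of cells where it passes the threshold
n_cells_expressed = (expression > min_expression).sum(axis=0)

# Keep only the genes that are expressed in enough cells
expressed_genes = n_cells_expressed[n_cells_expressed >= min_cells].index
filtered = expression.loc[:, expressed_genes]

# Log-transform with a pseudocount of 1 so zeros stay at zero
log_filtered = np.log2(filtered + 1)

print(filtered.columns.tolist())
```

Changing `min_expression` or `min_cells` and re-running is a quick way to see how strongly the choice of filter changes which genes survive.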
Optional: Pandas from .head() to .tail()
The package we'll be using to deal with matrices and dataframes in Python is called pandas. Throughout the course, I've tried to show some different applications of pandas, but this is definitely not complete. For a full introduction, I recommend the following tutorial from Tom Augspurger.
This tutorial is aimed at newbies to Python and pandas, so the beginning will be review for intermediate to advanced Python and pandas users, but the last few notebooks should be of interest to non-newbies too:
- Groupby
- Life-changing concept that has saved me hours of work. There have been many days where I've said to myself, "I LOVE GROUPBY!!!!!!"
- Tidy Data
- Another awesome life-changing concept that helps you think about how to structure your data, even as you're making Excel files. Based on this paper by Hadley Wickham, the author of many, many dataframe manipulation packages in R.
- Pandas applied to Machine Learning and Statistics
- Categorical variables and transforming them to machine-learning friendly formats
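As a taste of the three ideas above, here is a small sketch on a made-up table. The cell names, cell types, gene names, and values are all hypothetical. It reshapes a "wide" table into tidy form with `melt`, summarizes it with `groupby`, and turns a categorical column into a machine-learning friendly one-hot matrix with `pd.get_dummies`.

```python
import pandas as pd

# Hypothetical "wide" table: one row per cell, one column per gene
wide = pd.DataFrame(
    {
        "cell": ["c1", "c2", "c3", "c4"],
        "celltype": ["neuron", "neuron", "glia", "glia"],
        "GeneA": [1.0, 3.0, 10.0, 20.0],
        "GeneB": [4.0, 6.0, 0.0, 2.0],
    }
)

# Tidy data: one row per observation, here one (cell, gene) measurement
tidy = wide.melt(
    id_vars=["cell", "celltype"], var_name="gene", value_name="expression"
)

# Groupby: mean expression of every gene within every cell type
mean_expression = tidy.groupby(["celltype", "gene"])["expression"].mean()

# Categorical variable -> one indicator column per category
celltype_onehot = pd.get_dummies(wide["celltype"])

print(mean_expression)
```

Once the data is tidy, most "per group" questions become a one-line `groupby`, which is a large part of why the two concepts are usually taught together.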
2. First date: Get your data's life story with dimensionality reduction
Homework
- Application spillover
- Same single cell dataset, compare all dimensionality reduction algorithms
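As one reference point for that comparison, here is a minimal sketch of principal component analysis (PCA) written directly with NumPy's singular value decomposition. The data is randomly generated (20 hypothetical cells by 50 hypothetical genes, with the first 10 cells shifted in 5 genes), so it only illustrates the mechanics. In practice you would more likely use a library implementation and compare it against other algorithms on your real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 cells x 50 genes of random noise
n_cells, n_genes = 20, 50
data = rng.normal(size=(n_cells, n_genes))
# Make the first 10 cells differ from the rest in the first 5 genes
data[:10, :5] += 5

# PCA step 1: center every gene (column) at zero
centered = data - data.mean(axis=0)

# PCA step 2: singular value decomposition of the centered matrix
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# PCA step 3: coordinates of every cell on the first two components
n_components = 2
reduced = u[:, :n_components] * s[:n_components]

# Fraction of the total variance captured by each component
explained_variance_ratio = s**2 / (s**2).sum()

print(reduced.shape)
```

The rows of `vt` are the component directions in gene space, so the genes with the largest absolute weights in `vt[0]` are the ones driving the first component.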
3. One-month anniversary: Give your boo some clusters
Homework
- Application spillover
- Same dataset, compare cluster finding
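To have a baseline for comparing cluster-finding methods, here is a bare-bones sketch of k-means (Lloyd's algorithm) in NumPy. The function name `kmeans`, its parameters, and the six toy points are all made up for illustration. Library implementations such as scikit-learn's `KMeans` add smarter initialization and convergence checks that this sketch leaves out.

```python
import numpy as np


def kmeans(points, n_clusters, n_iterations=20, seed=0):
    """Very small k-means sketch: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen distinct data points (fancy indexing copies)
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centroids = points[start]
    for _ in range(n_iterations):
        labels = assign(points, centroids)
        for k in range(n_clusters):
            members = points[labels == k]
            # Leave a centroid where it is if its cluster is empty
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    # Final assignment against the final centroids
    return assign(points, centroids), centroids


def assign(points, centroids):
    """Label every point with the index of its nearest centroid."""
    differences = points[:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(differences, axis=2)
    return distances.argmin(axis=1)


# Hypothetical toy data: two well-separated groups of three points
points = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)
labels, centroids = kmeans(points, n_clusters=2)
print(labels)
```

Which group gets label 0 and which gets label 1 depends on the random starting points, so when comparing algorithms, compare which cells end up together rather than the label numbers themselves.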
4. One-year anniversary: Find what makes your data tick using supervised learning
Homework
- Application spillover
- Same dataset, compare enriched genes in clusters
5. Ten-year anniversary: Reflect on where you've been together with pseudotime ordering
Pseudotime ordering is like biologically-driven "regression"
6. Couples counseling: Dealing with technical noise and batch effects
7. 50-year anniversary: Advanced topics
If you're already an experienced bioinformatician, you may be interested in working through the analysis steps of the papers assigned for the course. The simpler one is the Shalek2013 paper:
More advanced is the Macaulay2016 paper, which includes pseudotime ordering and Bayesian modeling.
8. Plotting tips
Tips for Python plotting with colors and such