
Over the course of your project, you'll have many ups and downs with your data, so we'll use the analogy of a romantic relationship to explain the different steps of single-cell bioinformatics analyses.

1. Matchmaking: Getting the publicly available deets on your data

"PubMed stalking"... it's just like Facebook stalking!

1.0_Introduction_to_bioinformatics.ipynb
1.1_Overview_of_analysis_steps.ipynb
1.2_Downloading_public_data_shalek2013.ipynb
1.3_Single-cell_overview_additional_reading.ipynb
1.4_Unix.ipynb - Optional additional exercises with the Unix command line. If you're on Linux or Mac, you can do these on your own computer; if you have Windows, do them on the Macs.

Homework

Spillover from what we didn't finish:
- Mapping/alignment spillover
- Downloading public data and filtering on expressed genes spillover
Find another single-cell paper with a GEO/ArrayExpress accession, download its data, and compare gene expression filtering strategies. (You will use this dataset throughout the course.)
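To make "filtering on expressed genes" concrete, here is a minimal sketch of one possible strategy: keep genes detected above a threshold in at least some number of cells. The toy matrix, the `filter_expressed` helper, and the thresholds are all made up for illustration and are not the values used in the course notebooks.

```python
import pandas as pd

# Toy expression matrix (hypothetical values): rows are cells, columns are genes
expression = pd.DataFrame(
    [[0, 12, 3, 0],
     [0, 15, 0, 1],
     [0, 9, 4, 0],
     [2, 11, 5, 0]],
    index=["cell1", "cell2", "cell3", "cell4"],
    columns=["geneA", "geneB", "geneC", "geneD"],
)


def filter_expressed(df, min_expression=1, min_cells=3):
    """Keep genes with expression > min_expression in at least min_cells cells."""
    n_cells_expressing = (df > min_expression).sum(axis=0)
    return df.loc[:, n_cells_expressing >= min_cells]


# Two illustrative strategies: a strict one and a lenient one
strict = filter_expressed(expression, min_expression=1, min_cells=3)
lenient = filter_expressed(expression, min_expression=0, min_cells=1)

print("strict keeps: ", strict.columns.tolist())
print("lenient keeps:", lenient.columns.tolist())
```

Comparing how many genes survive each setting is one simple way to see how much a filtering choice changes the data you carry into later steps.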

Optional: Pandas from `.head()` to `.tail()`

The package we'll be using to deal with matrices and dataframes in Python is called pandas. Throughout the course, I've tried to show some different applications of pandas, but the coverage is definitely not complete. For a full introduction, I recommend the following tutorial from Tom Augspurger.

This tutorial is aimed at newbies to Python and pandas, so the beginning will be review for intermediate to advanced users, but the last few notebooks should interest non-newbies too.

Groupby
- Life-changing concept that has saved me hours of work. There have been many days where I've said to myself, "I LOVE GROUPBY!!!!!!"
Tidy Data
- Another awesome, life-changing concept that helps you think about how to structure your data, even as you're making Excel files. Based on this paper by Hadley Wickham, the author of many dataframe manipulation packages in R.
Pandas applied to Machine Learning and Statistics
- Categorical variables and transforming them to machine-learning friendly formats
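As a quick taste of the three ideas above, here is a small sketch on a made-up table. The column names and values are hypothetical; the pandas calls (`melt`, `groupby`, `get_dummies`) are standard, but this is only one of many ways to do each step.

```python
import pandas as pd

# Hypothetical "wide" table: one row per cell, one column per gene
wide = pd.DataFrame({
    "cell": ["c1", "c2", "c3", "c4"],
    "condition": ["control", "control", "treated", "treated"],
    "geneA": [1.0, 3.0, 5.0, 7.0],
    "geneB": [2.0, 2.0, 4.0, 6.0],
})

# Tidy data: one row per (cell, gene) observation
tidy = wide.melt(
    id_vars=["cell", "condition"], var_name="gene", value_name="expression"
)

# Groupby: mean expression of each gene within each condition
means = tidy.groupby(["condition", "gene"])["expression"].mean()

# Categorical variable -> one indicator column per category,
# a format most machine learning libraries can consume
dummies = pd.get_dummies(wide["condition"])

print(tidy.head())
print(means)
print(dummies)
```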

2. First date: Get your data's life story with dimensionality reduction

2.0 Machine Learning Intro [Jake Vanderplas' tutorial]
2.1 Basic Principles in Machine Learning [Jake Vanderplas' tutorial]
2.2_Introduction_to_dimensionality_reduction.ipynb
2.3_PCA [Jake Vanderplas' tutorial]
2.4_ICA.ipynb
2.5_Manifold_learning.ipynb
2.6_Compare_dimensionality_reduction.ipynb
2.7_Apply_dimensionality_reduction_on_Shalek2013_Macaulay2016.ipynb
2.8_Additional reading.ipynb
2.9_tSNE_on_subsets_of_digits.ipynb

Homework

Application spillover
Using the same single-cell dataset, compare all the dimensionality reduction algorithms
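One way to set up such a comparison is to run each algorithm on the same matrix and collect the 2D embeddings side by side. The sketch below uses scikit-learn's digits dataset as a stand-in for your own expression matrix; the subset size and the choice of PCA, ICA, and t-SNE with default settings are assumptions for illustration only.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import TSNE

# Stand-in data: 300 handwritten digits, 64 features each.
# For the homework, X would be your cells x genes matrix instead.
digits = load_digits()
X = digits.data[:300]

reducers = {
    "PCA": PCA(n_components=2),
    "ICA": FastICA(n_components=2, random_state=0),
    "t-SNE": TSNE(n_components=2, random_state=0),
}

# Fit every algorithm on the same data and keep the 2D coordinates
embeddings = {
    name: reducer.fit_transform(X) for name, reducer in reducers.items()
}

for name, embedding in embeddings.items():
    print(name, embedding.shape)
```

From here, plotting each embedding colored by a known label (digit class, cell type, batch) makes the differences between the algorithms much easier to see.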

3. One-month anniversary: Give your boo some clusters

3.0_Introduction_to_clustering.ipynb
3.1 $K$-means_clustering [Jake Vanderplas' tutorial]
3.2_Hierarchical_clustering.ipynb
3.3_Apply_clustering_to_Shalek2013_Macaulay2016.ipynb
3.4_Plotting_colors_and_evaluating_clustering.ipynb

Homework

Application spillover
Same dataset, compare cluster finding
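A possible starting point for comparing cluster finding is to run two algorithms on the same data and score how much their labels agree. The sketch below uses synthetic, well-separated blobs (hypothetical data, not a real dataset) and the adjusted Rand index as the agreement measure; both choices are assumptions for illustration.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data: three tight, far-apart blobs (hypothetical stand-in for cells)
X, true_labels = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Two clustering approaches from this section
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Adjusted Rand index: 1.0 means identical groupings (label names may differ)
agreement = adjusted_rand_score(kmeans_labels, hier_labels)
print("K-means vs. hierarchical agreement:", agreement)
```

On real single-cell data the two methods will usually disagree somewhere, and those disagreements are often the interesting cells to look at.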

4. One-year anniversary: Find what makes your data tick using supervised learning

4.0_Introduction_to_classifiers.ipynb
4.1_Overfitting.ipynb
4.2_Support_vector_machines [Jake Vanderplas' tutorial]
4.3_Decision_trees [Jake Vanderplas' tutorial]
4.4_Apply_SVM_to_Shalek2013_clustered_heatmap.ipynb
4.5_Apply_SVM_to_Shalek2013_with_violinplots.ipynb
4.6_Assess_clustering_with_gene_ontology.ipynb
4.7_Apply_tree_classifiers_to_Shalek2013_with_gene_ontology.ipynb
4.8_Apply_classifiers_to_Macaulay2016_with_gene_ontology.ipynb

Homework

Application spillover
Same dataset, compare enriched genes in clusters
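One simple way to ask which genes separate two clusters is to fit a linear classifier on the cluster labels and rank genes by the size of their weights. The sketch below does this with a linear SVM on simulated data where only `geneB` differs between clusters by construction; the data, gene names, and the choice of absolute weight as a ranking are assumptions for illustration, not the course's exact procedure.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

rng = np.random.RandomState(0)
genes = ["geneA", "geneB", "geneC", "geneD"]

# Simulated data: 20 cells per cluster, only geneB is shifted in cluster 1
cluster0 = rng.normal(loc=0.0, scale=1.0, size=(20, 4))
cluster1 = rng.normal(loc=0.0, scale=1.0, size=(20, 4))
cluster1[:, 1] += 10.0

X = pd.DataFrame(np.vstack([cluster0, cluster1]), columns=genes)
y = np.array([0] * 20 + [1] * 20)

# A linear kernel exposes one weight per gene in coef_
classifier = SVC(kernel="linear").fit(X, y)

# Rank genes by absolute weight: larger means more useful for separation
weights = pd.Series(np.abs(classifier.coef_[0]), index=genes)
weights = weights.sort_values(ascending=False)
top_gene = weights.index[0]

print(weights)
print("Most discriminating gene:", top_gene)
```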

5. Ten-year anniversary: Reflect on where you've been together with pseudotime ordering

Pseudotime ordering is like biologically driven "regression".

6. Couples counseling: Dealing with technical noise and batch effects

7. 50-year anniversary: Advanced topics

If you're already an experienced bioinformatician, you may be interested in working through the analysis steps of the papers assigned for the course. The simpler one is the Shalek2013 paper:

7.2_Reproducing_Shalek2013_figures

More advanced is the Macaulay2016 paper, which includes pseudotime ordering and Bayesian modeling.

7.0_Case_Study_Macaulay2016.ipynb
- Links to the original notebooks supplied with the paper
7.1_Playing_with_analysis_decisions_in_Macaulay2016.ipynb
- Interactive widgets playing with PCA vs ICA vs MDS vs t-SNE, linkage methods, and distance metrics at key points of the Macaulay2016 analysis pipeline

8. Plotting tips

Tips for Python plotting with colors and such

8.0_Plotting_tips.ipynb

Single-cell Bioinformatics
