With great power comes great responsibility!

By this point, you've already built a near-perfect classifier of tumor/normal status from about 2000 breast biopsies, and you've applied it to an independent validation set of approximately 500 samples from TCGA.

You could now build a biomarker panel to identify people with a disease or people who would benefit from a drug, or to address many other use cases. How's that for power? If we're going to use these algorithms, we should get a handle on how they can be applied and what might go wrong.

For this, we're going to take a time machine back to the fall of 2015. The top song on the Billboard Hot 100 chart was "The Hills" by The Weeknd, and Prof. Voight was sitting in a conference room at the ASHG annual meeting. On stage, Tuck Ngun was sharing the results of his analysis to identify epigenetic marks associated with sexual orientation. This presentation would set off a Twitter storm of epic ferocity. The back and forth is captured in these three blog posts:

Before class, read these blog posts and think about them in the context of what you've learned thus far. Be ready to discuss how they fit in, as we'll spend the first part of class on this.

[In case you can't get to these blog posts but can get to SageMathCloud, we've uploaded PDFs of them to this folder.]

Q1: In a nutshell, who do you think is right, and why?