Unlike most other books, this mega-tutorial consistently makes use of a variety of tools. The advantage of this strategy is that it lets us work more quickly once we know how the tools work. A major disadvantage is that it requires learning several libraries and tools rather than just NumPy and Sklearn.
Throughout this mega-tutorial I make use of only a single dataset. Focusing on one dataset the entire time mirrors real-world practice, where data scientists typically spend a relatively long period of time inspecting many different aspects of the same dataset.
Analytics has become such an interdisciplinary subject over the years that nearly every concept seems to have at least five names. In the literature, the $\vec{x}$ in $p(y|\vec{x})$ can be referred to as any of the following:
As I am an econometrician by training, I will usually be calling $\vec{x}$. I believe that practitioners in the field will inevitably have to put up with the mixed-up jargon for another
See the glossary for synonyms of the terms and acronyms used in this mega-tutorial.
The appendix covers topics that are, strictly speaking, not necessary for applying the modeling techniques implemented in this tutorial. They are, however, important for anyone using these techniques in a context where the results carry real consequences.