We started this course by looking at astronomical data and talking about some of its nasty features. Now that you understand the inferential tools we have at our disposal and have tested them out on relatively simple problems, it's time to get back to the dirty business of drawing conclusions from real data, warts and all.
Setting the stage
Below are three real astronomical data sets, all presenting different aspects of the "line-fit" problem we've seen before.
Question: What features can you identify in these data sets that don't fit into the simple model we've used previously?
Some things to notice: errors in the horizontal direction as well as the vertical (i.e., 2D errors); points that are upper bounds only; and non-Gaussian, skewed errors. It's also not obvious that a "line" is a good thing to try fitting in the first place. Why would thing2 vs thing1 be well modeled by a line?
shape of the sampling distribution
intrinsic scatter
outliers/interlopers
limits/censored data
x-errors and correlated errors
selection effects/truncated data
Let's see how we can include these features in a model. To the whiteboard!
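As a rough preview of where such a model can end up, here is a minimal sketch of a log-likelihood that handles three of the features listed above: intrinsic scatter, x-errors, and upper limits. Everything in it is an assumption made for illustration, not the model derived in this course: the function name `log_likelihood` and its arguments are hypothetical, all errors are taken to be Gaussian and uncorrelated, the true x values get a flat prior (so they can be marginalized analytically), and an upper limit is treated as "the measurement would have fallen below the quoted value". Outliers, selection effects, and skewed errors are not addressed.

```python
import math


def log_likelihood(a, b, sigma_int, x, y, sx, sy, is_upper_limit=None):
    """Log-likelihood for the line y = a + b*x (illustrative sketch only).

    Assumptions: Gaussian, uncorrelated measurement errors sx and sy;
    Gaussian intrinsic scatter sigma_int in y; a flat prior on the true x,
    which lets us marginalize it out analytically.
    """
    if sigma_int < 0:
        return -math.inf
    if is_upper_limit is None:
        is_upper_limit = [False] * len(x)

    total = 0.0
    for xi, yi, sxi, syi, lim in zip(x, y, sx, sy, is_upper_limit):
        # The x-error is projected onto y through the slope, and the
        # intrinsic scatter adds in quadrature.
        var = syi**2 + (b * sxi) ** 2 + sigma_int**2
        resid = yi - (a + b * xi)
        if lim:
            # Censored point: yi is an upper limit, so the contribution is
            # the probability of a measurement falling below it.
            z = resid / math.sqrt(var)
            p = 0.5 * math.erfc(-z / math.sqrt(2.0))
            if p <= 0.0:
                return -math.inf
            total += math.log(p)
        else:
            # Detected point: ordinary Gaussian density in the residual.
            total += -0.5 * (resid**2 / var + math.log(2.0 * math.pi * var))
    return total
```

Note how each feature changes the sampling distribution instead of being bolted on afterwards: x-errors and intrinsic scatter widen the Gaussian, while a limit swaps the density for a cumulative probability. A function like this could be handed to any of the samplers or optimizers we've used on the simpler problems.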