Data Mining -- "unsupervised learning"
Machine Learning -- "supervised learning"
Stats Terminology:
set of individual measurements: $x_i$ (where $i = 1, \ldots, N$)
True Distribution
$h(x)$ - function that generates $x$ (population pdf)
$h(x)dx\equiv$ probability of a value lying between $x$ and $x+dx$
$H(x)=\int_{-\infty}^{x}h(x')dx'$ cumulative distribution function
Empirical Distribution
$f(x)$ - function derived from the data; the empirical analog of $h(x)$ (empirical pdf)
$f(x)dx\equiv$ empirical probability of a value lying between $x$ and $x+dx$
$F(x)=\int_{-\infty}^{x}f(x')dx'$ cumulative empirical distribution function
(Normalized such that $H(\infty)=F(\infty)=1$)
Since data sets are never infinitely large (nor perfectly sampled), $f(x)\neq h(x)$:
$f(x)$ is a model of $h(x)$
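To make the difference between $F(x)$ and $H(x)$ concrete, here is a minimal sketch that builds an empirical CDF from a finite sample and compares it with the true CDF. It assumes a toy setup in which $h(x)$ is a standard normal; the helper names `empirical_cdf` and `max_cdf_gap` are made up for this illustration and are not part of astroML.

```python
import bisect
import random
from statistics import NormalDist


def empirical_cdf(sample):
    """Return F(x): the fraction of sample values less than or equal to x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many of the sorted values are <= x
        return bisect.bisect_right(ordered, x) / n

    return F


def max_cdf_gap(n, seed=0):
    """Largest |F(x) - H(x)| on a grid, for n draws from a standard normal h(x)."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    F = empirical_cdf(sample)
    H = NormalDist(mu=0.0, sigma=1.0).cdf  # the "true" CDF in this toy setup
    grid = [-4.0 + 0.01 * k for k in range(801)]
    return max(abs(F(x) - H(x)) for x in grid)


gap_small = max_cdf_gap(100)
gap_large = max_cdf_gap(100_000)
print(f"N = 100:    max |F - H| = {gap_small:.4f}")
print(f"N = 100000: max |F - H| = {gap_large:.4f}")
```

The gap shrinks as $N$ grows but never reaches zero for a finite sample, which is the sense in which $f(x)$ is only a model of $h(x)$.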
Errors (associated with measurement $x_i$)
$e(x)=p(x|\mu,I)$ - $\mu$: true value, $I$: describes the error distribution
for a Gaussian:
$p(x|\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}\exp{\left(\frac{-(x-\mu)^2}{2\sigma^2}\right)}$
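The Gaussian above can be checked numerically. The sketch below codes the formula directly and integrates it with a simple trapezoidal rule to confirm the normalization. The values of `mu` and `sigma` are arbitrary illustrative choices, and `gaussian_pdf` and `integrate` are helper names invented here.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """p(x | mu, sigma) for a Gaussian with mean mu and standard deviation sigma."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def integrate(func, lo, hi, steps=20_000):
    """Trapezoidal-rule integral of func from lo to hi."""
    dx = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * dx)
    return total * dx


mu, sigma = 10.0, 2.0  # illustrative values only


def pdf(x):
    return gaussian_pdf(x, mu, sigma)


# Integrating over +/- 10 sigma stands in for the full (-inf, inf) range.
area_total = integrate(pdf, mu - 10 * sigma, mu + 10 * sigma)
area_1sigma = integrate(pdf, mu - sigma, mu + sigma)
print(f"total area          = {area_total:.6f}")
print(f"area within 1 sigma = {area_1sigma:.4f}")
```

The total area should come out very close to 1, and the area within one $\sigma$ of $\mu$ close to the familiar 68%.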
Ramble about broad $f(x)$:
a broad $f(x)$ could be due to measurement errors (a larger sample will lead to a better derivation of $h(x)$), or it could be due to an intrinsically broad $h(x)$
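A small simulation can show how the two sources of width combine. This is a sketch under stated assumptions: both $h(x)$ and the error distribution are taken to be Gaussian, the errors are independent of the true values, and the widths `sigma_intrinsic` and `sigma_error` are made-up numbers. Under those assumptions the observed width should be close to $\sqrt{\sigma_{\rm intrinsic}^2+\sigma_{\rm error}^2}$.

```python
import random
import statistics

rng = random.Random(42)
n = 50_000
sigma_intrinsic = 1.0  # width of the toy h(x)
sigma_error = 2.0      # width of the toy Gaussian measurement error

# Draw true values from h(x), then add an independent error to each one.
true_values = [rng.gauss(0.0, sigma_intrinsic) for _ in range(n)]
observed = [x + rng.gauss(0.0, sigma_error) for x in true_values]

width_true = statistics.pstdev(true_values)
width_observed = statistics.pstdev(observed)
width_expected = (sigma_intrinsic ** 2 + sigma_error ** 2) ** 0.5

print(f"width of true values     = {width_true:.3f}")
print(f"width of observed values = {width_observed:.3f}")
print(f"quadrature sum           = {width_expected:.3f}")
```

From the observed width alone one cannot tell which term dominates; some knowledge of the error distribution is needed to separate the two.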
astroML has "fetching functions" to download all of the datasets. To see a list in an IPython terminal, one could type:
In [ ]: from astroML.datasets import [TAB]
and it would list the options (in an IPython notebook this comes up as a scrolling list)
SDSS:
SDSS imaging (p.16)
examples from the text (fetching the imaging data for 330753 objects)
In [12]:
%matplotlib inline
In [13]:
from astroML.datasets import fetch_imaging_sample
data = fetch_imaging_sample()
# determine the shape (size) of the downloaded data
data.shape
Out[13]:
In [15]:
# The code below lists the field names (tags) and prints the first five positions (RA, Dec)
print(data.dtype.names)
print(data['ra'][:5], data['dec'][:5])
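The `dtype.names` and `data['ra']` access used above is standard NumPy structured-array behavior. The sketch below uses a tiny stand-in array with invented values (not SDSS data), so the indexing can be tried without downloading anything. The array `toy` and its two fields are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a fetched catalog: two named fields, three rows.
toy = np.array(
    [(10.5, -1.2), (10.6, -1.1), (200.3, 45.0)],
    dtype=[("ra", "f8"), ("dec", "f8")],
)

print(toy.shape)        # one entry per object
print(toy.dtype.names)  # the field ("tag") names
print(toy["ra"][:2])    # first two values of the 'ra' field

# A boolean mask selects rows; the result keeps all of the fields.
north = toy[toy["dec"] > 0]
print(len(north), north["ra"])
```

The same pattern (field name in brackets, then slicing or masking) applies to the much larger array returned by the fetching function.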
SDSS Spectroscopy: (p.19)
from astroML.datasets import fetch_sdss_spectrum
Galaxies (p.21)
from astroML.datasets import fetch_sdss_specgals
DR7 Quasar Catalog (p.23)
from astroML.datasets import fetch_dr7_quasar
SEGUE stellar parameters pipeline (SSPP) (p.25)
from astroML.datasets import fetch_sdss_sspp
SDSS standard stars from Stripe 82 (p.26)
from astroML.datasets import fetch_sdss_S82standards
SDSS moving object catalog (p.30)
from astroML.datasets import fetch_moving_objects
LINEAR stellar light curves: (p.27)
from astroML.datasets import fetch_LINEAR_sample
Tufte book (The Visual Display of Quantitative Information): I have a copy - come look at it if you haven't seen it before!
2D plotting:
3+D plotting:
In [36]:
# I'm including these partly to demonstrate that you can simply pull examples from the textbook using the URLs!
from IPython.display import Image
from IPython.display import display
simple_plot = Image(url='http://www.astroml.org/_images/fig_sdss_S82standards_1.png',width=300)
contour_plot = Image(url='http://www.astroml.org/_images/fig_S82_scatter_contour_1.png',width=300)
hist_plot = Image(url='http://www.astroml.org/_images/fig_S82_hess_1.png',width=300)
display(simple_plot,contour_plot,hist_plot)