Day 1: Exploratory Data Analysis

- Or: I have some data: now what? -

In an ideal world, one collects data to answer a specific research question. In that case, how the data will be analyzed is usually clear and laid out from the beginning. In astronomy, this is not always possible. We might get data from a new instrument, a new wavelength range, a new way of processing the observations, or another previously unexplored region of parameter space. We might not know what we will find, or even what exactly we are looking for. We might find unexpected sources in data taken for a different purpose. In all of these cases, a toolbox for exploring a data set is crucial for extracting information from the data.

In this tutorial, we will present and practice some basic ways to represent and explore data, using an example data set from [???]. Along the way, we will give short tutorials on some of the most useful Python packages for data analysis. The goal is to give you an overview of methods you might find useful, where to find them, and a starting point for exploring your own data sets.

A short overview of Python packages

  • What they do and where to find them
  • numerical computing: numpy + scipy
  • plotting: matplotlib and seaborn
  • data I/O: pandas, numpy (ASCII data), astropy.io, scipy.io.readsav

1: Loading and Working with Data

ASCII Files

  • numpy.loadtxt
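A minimal sketch of reading a whitespace-separated ASCII table with numpy.loadtxt. The three-column catalogue (ID, magnitude, magnitude error) is hypothetical and is held in an io.StringIO so the example runs on its own; with real data you would pass a file name instead.

```python
import io

import numpy as np

# Hypothetical ASCII catalogue: source ID, magnitude, magnitude error.
# Replace the StringIO with a path such as "catalog.txt" for real data.
ascii_table = io.StringIO(
    "# id  mag   mag_err\n"
    "1     15.2  0.05\n"
    "2     16.8  0.12\n"
    "3     14.1  0.03\n"
)

# loadtxt skips lines starting with '#' by default and splits on whitespace.
data = np.loadtxt(ascii_table)
ids, mag, mag_err = data.T  # one 1D array per column

# usecols selects a subset of columns; unpack=True returns them separately.
ascii_table.seek(0)
mag_only, err_only = np.loadtxt(ascii_table, usecols=(1, 2), unpack=True)

print(data.shape)
print(mag.mean(), err_only.max())
```

For tables with named or mixed-type columns, numpy.genfromtxt or pandas.read_csv are usually more convenient than loadtxt.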

Reading/Writing FITS files

  • astropy.io.fits
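A minimal round trip with astropy.io.fits: write an image with one header keyword, then read it back. The random "image", the EXPTIME value, and the temporary file path are all placeholders chosen for illustration.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

# Hypothetical 64x64 image: Gaussian noise standing in for detector data.
rng = np.random.default_rng(0)
image = rng.normal(loc=100.0, scale=5.0, size=(64, 64)).astype(np.float32)

# Build a primary HDU and attach a header keyword as a (value, comment) pair.
hdu = fits.PrimaryHDU(data=image)
hdu.header["EXPTIME"] = (30.0, "exposure time in seconds")

path = os.path.join(tempfile.mkdtemp(), "example.fits")
hdu.writeto(path, overwrite=True)

# fits.open returns an HDUList; the context manager closes the file afterwards.
with fits.open(path) as hdul:
    hdul.info()                  # prints a summary of every HDU in the file
    header = hdul[0].header
    data = hdul[0].data.copy()   # copy so the array stays valid after closing

print(header["EXPTIME"], data.shape)
```

For quick access, fits.getdata(path) and fits.getheader(path) return the data and header of an HDU without managing the HDUList yourself.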

IDL .sav files

  • scipy.io.readsav
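A small helper around scipy.io.readsav, sketched under the assumption that you already have an IDL save file (scipy can read but not write this format, so no sample file is created here). The function name load_idl_save and the file name in the usage comment are hypothetical.

```python
from scipy.io import readsav


def load_idl_save(filename):
    """Read an IDL .sav file and return its variables as a plain dict.

    readsav returns a dictionary-like object keyed by the IDL variable
    names (in lowercase); converting it to a dict makes it easy to
    inspect with .keys() or pass to other code.
    """
    contents = readsav(filename)
    return dict(contents)


# Usage, with a hypothetical file name:
# variables = load_idl_save("observations.sav")
# print(variables.keys())
```

IDL structures come back as numpy record arrays, so fields are accessed by name, e.g. variables["struct"]["field"].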

Excursion: Advanced Data Handling in Pandas

  • Pandas stuff
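A short sketch of common pandas operations on a small, hypothetical photometry table: summary statistics, a derived column, boolean masking, group-by aggregation, and reshaping. The column names and the signal-to-noise cut of 30 are illustrative choices, not taken from the tutorial's data set.

```python
import io

import pandas as pd

# Hypothetical catalogue: one row per (source, band) flux measurement.
csv_text = io.StringIO(
    "source,band,flux,flux_err\n"
    "A,g,12.1,0.4\n"
    "A,r,15.3,0.5\n"
    "B,g,8.7,0.3\n"
    "B,r,9.9,0.3\n"
    "C,g,20.2,0.8\n"
    "C,r,24.6,0.9\n"
)
df = pd.read_csv(csv_text)

# Count, mean, std, min, quartiles and max for every numeric column.
print(df.describe())

# Derived column and boolean masking: keep rows with S/N above 30.
df["snr"] = df["flux"] / df["flux_err"]
bright = df[df["snr"] > 30]

# Group-by: mean flux per band.
mean_flux_by_band = df.groupby("band")["flux"].mean()

# Reshape from long to wide: one row per source, one column per band.
wide = df.pivot(index="source", columns="band", values="flux")
print(wide)
```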

2: Numerical Summaries of Data

  • mean, median, mode
  • variance and standard deviation
  • standard errors
  • ???
  • excursion: random sampling and statistical distributions in numpy and scipy
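The summaries listed above can be computed with numpy and scipy.stats. This sketch draws a synthetic sample (the distribution parameters and seed are arbitrary assumptions) and computes the mean, median, sample variance, standard deviation, and standard error of the mean. It computes the mode on discrete Poisson counts, where a mode is well defined, and shows a frozen scipy.stats distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Hypothetical sample: 1000 draws from a normal distribution, mean 10, std 2.
sample = rng.normal(loc=10.0, scale=2.0, size=1000)

mean = np.mean(sample)
median = np.median(sample)
var = np.var(sample, ddof=1)   # ddof=1: sample variance, divides by N - 1
std = np.std(sample, ddof=1)
sem = stats.sem(sample)        # standard error of the mean: std(ddof=1) / sqrt(N)

# The mode is meaningful for discrete data, e.g. Poisson-distributed counts.
counts = rng.poisson(lam=4.0, size=1000)
values, frequency = np.unique(counts, return_counts=True)
mode = values[np.argmax(frequency)]

# A frozen distribution object: evaluate its PDF or draw random variates.
gauss = stats.norm(loc=10.0, scale=2.0)
peak = gauss.pdf(10.0)         # equals 1 / (sigma * sqrt(2 * pi)) with sigma = 2
more = gauss.rvs(size=5, random_state=rng)

print(f"mean={mean:.3f} median={median:.3f} std={std:.3f} "
      f"sem={sem:.4f} mode={mode} peak={peak:.4f}")
```

Note that numpy's np.var and np.std default to ddof=0, while scipy.stats.sem defaults to ddof=1, so it pays to set ddof explicitly.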

3: Visual Summaries of Data

  • Intro to matplotlib: line plots, histograms, errorbars, scatter plots
  • useful things for manipulating plots?
  • More useful plotting methods in seaborn: rug plots, bar plots and violin plots
  • customizing figures
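A minimal matplotlib sketch combining the basic plot types mentioned above (line plot with error bars, histogram, scatter plot) in one figure with labelled axes. The data are synthetic and the figure is saved to an in-memory buffer; with real data you would typically pass a file name such as "summary.png" to savefig. The seaborn plot types are not shown here.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical measurements y(x) = 2x + 1 with uncertainty 0.5 per point.
x = np.linspace(0.0, 10.0, 20)
y_err = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(scale=y_err)
sample = rng.normal(size=500)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Panel 1: model as a line, data as points with error bars.
axes[0].plot(x, 2.0 * x + 1.0, color="k", label="model")
axes[0].errorbar(x, y, yerr=y_err, fmt="o", label="data")
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")
axes[0].legend()

# Panel 2: histogram of a sample.
axes[1].hist(sample, bins=30, histtype="step")
axes[1].set_xlabel("value")
axes[1].set_ylabel("count")

# Panel 3: scatter plot of consecutive sample values.
axes[2].scatter(sample[:-1], sample[1:], s=5, alpha=0.5)
axes[2].set_xlabel("$x_i$")
axes[2].set_ylabel("$x_{i+1}$")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
```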

4: But wait, my data is multidimensional!

  • correlations and simple linear regression, (weighted) least squares (scipy)
  • more plotting stuff in matplotlib and seaborn
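A sketch of correlation coefficients and linear fits with scipy, on synthetic data with known, point-dependent uncertainties (the true line y = 3x - 2 and the noise model are assumptions for illustration). It compares Pearson and Spearman correlation coefficients, an unweighted fit with stats.linregress, and a weighted least-squares fit with optimize.curve_fit. The weighted fit minimises the sum of ((y - f(x)) / sigma)**2.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(7)

# Hypothetical data: y = 3x - 2 plus Gaussian noise whose sigma grows with x.
x = np.linspace(0.0, 5.0, 40)
sigma = 0.2 + 0.1 * x
y = 3.0 * x - 2.0 + rng.normal(scale=sigma)

# Pearson measures linear correlation; Spearman uses ranks (any monotonic trend).
r_pearson, p_pearson = stats.pearsonr(x, y)
rho_spearman, p_spearman = stats.spearmanr(x, y)

# Ordinary (unweighted) least squares.
ols = stats.linregress(x, y)


def line(x, slope, intercept):
    """Straight-line model for curve_fit."""
    return slope * x + intercept


# Weighted least squares: absolute_sigma=True treats sigma as true 1-sigma errors,
# so pcov gives parameter uncertainties in the same units as the parameters.
popt, pcov = optimize.curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)
slope_err, intercept_err = np.sqrt(np.diag(pcov))

print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {rho_spearman:.3f}")
print(f"OLS: slope = {ols.slope:.3f}, intercept = {ols.intercept:.3f}")
print(f"WLS: slope = {popt[0]:.3f} +/- {slope_err:.3f}, "
      f"intercept = {popt[1]:.3f} +/- {intercept_err:.3f}")
```

An equivalent weighted fit for polynomials is np.polyfit(x, y, 1, w=1/sigma); numpy expects weights of 1/sigma, not 1/sigma**2.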

Bonus: interactive plotting?
