In each case, it was useful to:

* Visualize the data in various ways
* Summarize the data, to reduce its dimensionality and bring out interesting structure
Classical statistics, such as

* Means, medians and variances
* Histograms
* Correlation functions
* and so on

are not only useful tools for summarizing data: they get passed forward to new analyses, where they are treated as data themselves.
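
As a rough illustration of that last point, here is a minimal sketch using only the Python standard library. The "measurements" are synthetic draws invented for this example (they are not one of the datasets discussed above), and the `histogram` helper is a hypothetical name, not a library function. The sketch reduces 1000 numbers to a few summary statistics and a set of bin counts, then treats the bin counts as data in a follow-up step.

```python
import random
import statistics

# Synthetic stand-in for a dataset (an assumption for illustration only):
# 1000 noisy measurements scattered around 10 with standard deviation 2.
rng = random.Random(42)
samples = [rng.gauss(10.0, 2.0) for _ in range(1000)]

# Classical summary statistics reduce 1000 numbers to three.
summary = {
    "mean": statistics.fmean(samples),
    "median": statistics.median(samples),
    "variance": statistics.variance(samples),  # sample variance (n - 1 denominator)
}


def histogram(values, lo, hi, n_bins):
    """Count the values falling in each of n_bins equal-width bins on [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


# A histogram is another summary: 1000 numbers become 20 counts.
counts = histogram(samples, 0.0, 20.0, 20)

# The summary is now passed forward and treated as data itself:
# here we ask which bin is the most populated.
peak_bin = max(range(len(counts)), key=counts.__getitem__)

print(summary)
print(counts)
print("most populated bin index:", peak_bin)
```

Once the counts are computed, the second step never looks at `samples` again: as far as that step is concerned, the histogram is the data.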
One answer to our question "What is Data?" is therefore:
Data are *constants*
(usually numbers)
that we are *handed*
(typically in a data file)
that *we hope to learn something from.*
In the next session we'll look at how learning from data (inference) works.