Data Frames

Definition: A Series is the programmatic embodiment of a data line, i.e., a labelled sequence of data points:

$$x_{i_1},x_{i_2},\dots, x_{i_n}$$

If the labels $i_1,\dots, i_n$ correspond to

  • unique IDs of individuals from a population $\Omega=\{1,\dots,N\}$, and the data points are the values of some characteristic $X:\Omega\rightarrow \mathbb R$ for these individuals, then the data line is called a population sample (of observations): $x_{i_l} = X(i_l).$
  • instants in time $i_1 < i_2 <\cdots < i_n$, and the data points are the values of the characteristic $X$ for a single fixed individual $\omega\in \Omega$ at different moments in time, then the data line is called a time series: $x_{i_l} = X_{t=i_l}(\omega).$
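
In R, which the section titles further down suggest these notes use, one simple stand-in for a data line is a named vector. The sketch below is only illustrative: the IDs, years, and values are made up, and the variable names `population_sample` and `time_series` are assumptions, not part of the original notes.

```r
# Hypothetical population sample: labels are individual IDs i_l,
# values are X(i_l) for some characteristic X (made-up numbers).
population_sample <- c("3" = 1.72, "17" = 1.65, "42" = 1.80)

# Look up a data point by its label (the ID), not by its position.
population_sample["17"]

# Hypothetical time series: labels are increasing instants in time,
# values are X_t(omega) for one fixed individual omega.
time_series <- c("2020" = 61.0, "2021" = 62.5, "2022" = 63.1)

# The labels of a data line are available through names().
names(time_series)
```

In both cases the object is the same kind of labelled sequence; only the interpretation of the labels changes.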

Definition: A Data Frame is the programmatic embodiment of a data table, i.e., a 2D array of values with explicitly labelled rows and columns. Usually,

  • the row labels, say $i_1,\dots, i_n$, correspond to a sample of the unique individual IDs in a population $\Omega = \{1,\dots, N\}$;
  • the column labels correspond to the names $X,Y,Z$ of some characteristics $X,Y,Z:\Omega\rightarrow\mathbb R$ of our population $\Omega$;
  • the actual values in the data table are the values of these characteristics for the sampled individuals.
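
As a minimal sketch of this definition in base R, the data table below has rows labelled by sampled individual IDs and columns labelled by the characteristics $X, Y, Z$. All values and the name `data_table` are made up for illustration.

```r
# Hypothetical sample of three individuals (IDs 3, 17, 42) from the population.
# Columns X, Y, Z hold made-up values of three characteristics.
data_table <- data.frame(
  X = c(1.72, 1.65, 1.80),
  Y = c(68.0, 59.5, 77.2),
  Z = c(34, 51, 27),
  row.names = c("3", "17", "42")
)

# Rows are labelled by individual ID, columns by characteristic name.
rownames(data_table)
colnames(data_table)

# Value of characteristic Y for the individual with ID 17, i.e. Y(17).
data_table["17", "Y"]
```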

The class data.frame
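
The original cell for this section is empty, so the following is only a sketch of what it might demonstrate, assuming base R. It builds a small data frame from made-up values and inspects its class and shape; the name `df` and the columns are hypothetical.

```r
# Made-up data for illustration.
df <- data.frame(
  name = c("Ann", "Bob", "Cleo"),
  age = c(34, 51, 27),
  stringsAsFactors = FALSE
)

class(df)          # the class attribute of the object
is.data.frame(df)  # tests whether an object is a data frame
dim(df)            # number of rows, then number of columns
str(df)            # compact overview of the columns and their types
```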



Reading data to a data frame
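
One common way to read tabular data into a data frame in base R is `read.csv()`. The sketch below first writes a tiny made-up CSV file to a temporary path so that it is self-contained; the file contents and the names `csv_path` and `people` are assumptions for illustration.

```r
# Create a small made-up CSV file in a temporary location;
# in practice the file would already exist on disk.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height,weight", "3,1.72,68.0", "17,1.65,59.5"), csv_path)

# read.csv() returns a data frame; by default the first line
# of the file provides the column names.
people <- read.csv(csv_path)

head(people)
```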



Writing data from a data frame
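
Going the other way, `write.csv()` saves a data frame as a CSV file. This is a minimal sketch with made-up data; the names `scores` and `out_path` are hypothetical, and a temporary file is used so that nothing permanent is written.

```r
# Made-up data frame to save.
scores <- data.frame(id = c(3, 17), score = c(12.5, 9.0))

out_path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing the automatic row labels
# as an extra, unnamed first column.
write.csv(scores, out_path, row.names = FALSE)

# Inspect the raw lines that were written.
readLines(out_path)
```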



Data frames as lists
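
In R a data frame is stored as a list whose components are the columns, so list operations apply to it. The sketch below illustrates this with made-up values; `df` and its columns are hypothetical.

```r
# Made-up data for illustration.
df <- data.frame(
  height = c(1.72, 1.65, 1.80),
  weight = c(68.0, 59.5, 77.2)
)

is.list(df)  # a data frame is a list of columns
length(df)   # number of list components, i.e. number of columns
names(df)    # component names, i.e. column names

# Extract a single column, as for any list.
df$height
df[["weight"]]

# Apply a function to every column.
sapply(df, mean)
```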



Data frames as matrices
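
A data frame can also be indexed like a matrix, with a row index and a column index. The following sketch uses made-up values; `df` and `m` are hypothetical names.

```r
# Made-up data for illustration.
df <- data.frame(
  height = c(1.72, 1.65, 1.80),
  weight = c(68.0, 59.5, 77.2)
)

dim(df)          # rows and columns, as for a matrix

df[2, "weight"]  # single entry: row 2, column "weight"
df[2, ]          # whole row (still a data frame)
df[, "height"]   # whole column (a plain vector)

# Convert to a genuine matrix; this works cleanly here
# because all columns are numeric.
m <- as.matrix(df)
t(m)             # transpose
```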



Factors and modes
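
The sketch below assumes that "modes" refers to the value returned by base R's `mode()` function, and that factors are introduced as the way to store categorical data. The example data and the names `colour` and `colour_factor` are made up.

```r
# Made-up categorical data stored as plain strings.
colour <- c("red", "blue", "red", "green")
mode(colour)                 # "character"

# A factor stores the distinct categories (levels) once,
# plus one integer code per observation.
colour_factor <- factor(colour)

levels(colour_factor)        # the distinct categories, sorted
as.integer(colour_factor)    # the underlying integer codes
mode(colour_factor)          # "numeric", because the codes are integers
class(colour_factor)         # "factor"

table(colour_factor)         # counts per category
```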



Data transformations
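
Which transformations this section was meant to cover is not stated, so the following is only a sketch of a few common ones in base R: adding a derived column, filtering rows, sorting, and aggregating by group. The data and all names (`df`, `ratio`, `tall`, `sorted`, `group_means`) are made up.

```r
# Made-up data for illustration.
df <- data.frame(
  group = c("a", "b", "a", "b"),
  height = c(1.72, 1.65, 1.80, 1.58),
  weight = c(68.0, 59.5, 77.2, 52.0)
)

# Add a derived column computed from existing columns.
df <- transform(df, ratio = weight / height^2)

# Keep only the rows that satisfy a condition.
tall <- subset(df, height > 1.70)

# Sort the rows by increasing weight.
sorted <- df[order(df$weight), ]

# Compute the mean weight within each group.
group_means <- aggregate(weight ~ group, data = df, FUN = mean)
group_means
```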

