Title: Descriptive Statistics For pandas DataFrame
Slug: pandas_dataframe_descriptive_stats
Summary: Descriptive Statistics For pandas DataFrame
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

Import modules


In [40]:
import pandas as pd

Create dataframe


In [41]:
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'age': [42, 52, 36, 24, 73], 
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore', 'postTestScore'])
df


Out[41]:
    name  age  preTestScore  postTestScore
0  Jason   42             4             25
1  Molly   52            24             94
2   Tina   36            31             57
3   Jake   24             2             62
4    Amy   73             3             70

5 rows × 4 columns

The sum of all the ages


In [42]:
df['age'].sum()


Out[42]:
227

Mean preTestScore


In [43]:
df['preTestScore'].mean()


Out[43]:
12.800000000000001

Cumulative sum of preTestScore, moving down the rows from the top


In [44]:
df['preTestScore'].cumsum()


Out[44]:
0     4
1    28
2    59
3    61
4    64
Name: preTestScore, dtype: int64
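
As a quick sanity check, the last entry of a cumulative sum should equal the plain sum of the column. The sketch below rebuilds the preTestScore values as a standalone Series; the variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same preTestScore values as in the DataFrame above
scores = pd.Series([4, 24, 31, 2, 3], name='preTestScore')

# Running total, row by row from the top
running_total = scores.cumsum()

# The final running total matches the overall sum
last_total = running_total.iloc[-1]
total = scores.sum()
print(last_total == total)
```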

Summary statistics on preTestScore


In [45]:
df['preTestScore'].describe()


Out[45]:
count     5.000000
mean     12.800000
std      13.663821
min       2.000000
25%       3.000000
50%       4.000000
75%      24.000000
max      31.000000
Name: preTestScore, dtype: float64
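
If the default quartiles are not the ones you need, `describe()` accepts a `percentiles` argument. A minimal sketch, assuming the same preTestScore values rebuilt as a standalone Series (the names `scores`, `p10` and `p90` are illustrative):

```python
import pandas as pd

# Same preTestScore values as in the DataFrame above
scores = pd.Series([4, 24, 31, 2, 3], name='preTestScore')

# Ask for the 10th and 90th percentiles instead of the default quartiles
summary = scores.describe(percentiles=[0.1, 0.9])

# Individual statistics can be read from the result by label
p10 = summary['10%']
p90 = summary['90%']
print(summary)
```

The result is an ordinary Series, so any single statistic can be pulled out by its index label, as done for `p10` and `p90` here.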

Count the number of non-NA values of preTestScore


In [46]:
df['preTestScore'].count()


Out[46]:
5
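
The example data has no missing values, so the count equals the number of rows. To see the difference, here is a small sketch using a hypothetical copy of the column in which one score has been replaced by `NaN` (this variant is not part of the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical variant of preTestScore with one missing value
scores_with_gap = pd.Series([4, 24, np.nan, 2, 3], name='preTestScore')

n_rows = len(scores_with_gap)             # every row, missing or not
n_present = scores_with_gap.count()       # non-NA values only
n_missing = scores_with_gap.isna().sum()  # NA values only
print(n_rows, n_present, n_missing)
```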

Minimum value of preTestScore


In [47]:
df['preTestScore'].min()


Out[47]:
2

Maximum value of preTestScore


In [48]:
df['preTestScore'].max()


Out[48]:
31

Median value of preTestScore


In [49]:
df['preTestScore'].median()


Out[49]:
4.0

Sample variance of preTestScore values


In [50]:
df['preTestScore'].var()


Out[50]:
186.69999999999999
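
By default `var()` divides by n - 1 (the sample variance). The sketch below recomputes that by hand and, for comparison, asks pandas for the population variance with `ddof=0`. The variable names are illustrative.

```python
import pandas as pd

# Same preTestScore values as in the DataFrame above
scores = pd.Series([4, 24, 31, 2, 3], name='preTestScore')

n = len(scores)
squared_deviations = (scores - scores.mean()) ** 2

# Sample variance: sum of squared deviations divided by n - 1
sample_var = squared_deviations.sum() / (n - 1)

# Population variance: divide by n instead, via ddof=0
population_var = scores.var(ddof=0)
print(sample_var, population_var)
```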

Sample standard deviation of preTestScore values


In [51]:
df['preTestScore'].std()


Out[51]:
13.663820841916802

Skewness of preTestScore values


In [52]:
df['preTestScore'].skew()


Out[52]:
0.74334524573267591

Kurtosis of preTestScore values


In [53]:
df['preTestScore'].kurt()


Out[53]:
-2.4673543738411525
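
The value returned by `kurt()` is the bias-corrected excess kurtosis (Fisher's definition, under which a normal distribution scores 0). The sketch below reproduces it by hand with the standard sample-adjustment formula; the variable names are illustrative and the formula is written out here as a worked check, not taken from the original notebook.

```python
import pandas as pd

# Same preTestScore values as in the DataFrame above
scores = pd.Series([4, 24, 31, 2, 3], name='preTestScore')

n = len(scores)
deviations = scores - scores.mean()

# Second and fourth central moments (dividing by n)
m2 = (deviations ** 2).mean()
m4 = (deviations ** 4).mean()

# Biased excess kurtosis, then the small-sample correction
g2 = m4 / m2 ** 2 - 3
manual_kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
print(manual_kurt, scores.kurt())
```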

Correlation Matrix Of Values


In [54]:
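# Select the numeric columns first; recent pandas versions raise an error
# if the string column 'name' is included
df[['age', 'preTestScore', 'postTestScore']].corr()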


Out[54]:
                    age  preTestScore  postTestScore
age            1.000000     -0.105651       0.328852
preTestScore  -0.105651      1.000000       0.378039
postTestScore  0.328852      0.378039       1.000000

3 rows × 3 columns
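
Each entry in the correlation matrix is the covariance of the two columns divided by the product of their standard deviations. A minimal sketch for the age and preTestScore pair, rebuilding just those two columns (the variable names are illustrative):

```python
import pandas as pd

# Same age and preTestScore values as in the DataFrame above
age = pd.Series([42, 52, 36, 24, 73], name='age')
pre = pd.Series([4, 24, 31, 2, 3], name='preTestScore')

# Pearson correlation computed by hand from covariance and standard deviations
r_manual = age.cov(pre) / (age.std() * pre.std())

# The same quantity straight from pandas
r_pandas = age.corr(pre)
print(r_manual, r_pandas)
```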

Covariance Matrix Of Values


In [55]:
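# Select the numeric columns first; recent pandas versions raise an error
# if the string column 'name' is included
df[['age', 'preTestScore', 'postTestScore']].cov()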


Out[55]:
                  age  preTestScore  postTestScore
age            340.80        -26.65         151.20
preTestScore   -26.65        186.70         128.65
postTestScore  151.20        128.65         620.30

3 rows × 3 columns