Small Multiples



In [1]:

    
%matplotlib inline
import pandas as pd
from stemgraphic.grid import small_multiples

Using the Titanic dataset.



In [2]:

    
df = pd.read_csv('../datasets/train.csv', index_col='PassengerId')



In [3]:

    
df.head()









    Out[3]:







  
    
      
      Survived
      Pclass
      Name
      Sex
      Age
      SibSp
      Parch
      Ticket
      Fare
      Cabin
      Embarked
    
    
      PassengerId
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      1
      0
      3
      Braund, Mr. Owen Harris
      male
      22.0
      1
      0
      A/5 21171
      7.2500
      NaN
      S
    
    
      2
      1
      1
      Cumings, Mrs. John Bradley (Florence Briggs Th...
      female
      38.0
      1
      0
      PC 17599
      71.2833
      C85
      C
    
    
      3
      1
      3
      Heikkinen, Miss. Laina
      female
      26.0
      0
      0
      STON/O2. 3101282
      7.9250
      NaN
      S
    
    
      4
      1
      1
      Futrelle, Mrs. Jacques Heath (Lily May Peel)
      female
      35.0
      1
      0
      113803
      53.1000
      C123
      S
    
    
      5
      0
      3
      Allen, Mr. William Henry
      male
      35.0
      0
      0
      373450
      8.0500
      NaN
      S

Distribution of name as stem-and-leaf (alpha bigram in this case), by Male and Female. Using shared stem so they are back to back.



In [4]:

    
small_multiples(df, 'Name', legend=None, cols='Sex', 
                density=False, shared_stem=True, stem_display=True, flip_axes=False,
                limit_var=False);

The same, but using a density plot. Density plot recognizes variable as categorical and generates a name_code variable, since it can't plot density on names directly.



In [5]:

    
small_multiples(df, 'Name', legend=None, cols='Sex', density=True, flip_axes=False,
                limit_var=True);

Distribution of age, by sex, again using a shared stem, by port (C, Q, S - we'll use categorical further down)



In [6]:

    
small_multiples(df, 'Age', legend=None, cols='Sex',  rows='Embarked', 
                density=False, shared_stem=True, stem_display=True, flip_axes=False,
                limit_var=False);

Going back to name, this time not using a shared stem. Note O' name beginnings in Queenstown (nowadays Cobh, Ireland).



In [7]:

    
small_multiples(df, 'Name', legend=None, cols='Sex',  rows='Embarked', 
                density=False, stem_display=True, flip_axes=False,
                limit_var=False);

Using a layout with the legend at the top. Still not using a categorical, so passing row labels.



In [8]:

    
small_multiples(df, 'Age', legend='top', hues='Pclass', rows='Embarked', density=True,
             hue_labels=['1st', '2nd', '3rd'], row_labels=['Cherbourg', 'Queenstown', 'Southampton']);

Age can't be less than zero. Let density_plot infer from data with limit_var.



In [9]:

    
small_multiples(df, 'Age', hues='Pclass', limit_var=True, rows='Embarked', density=True,
             hue_labels=['1st', '2nd', '3rd'], row_labels=['Cherbourg', 'Queenstown', 'Southampton']);



In [10]:

    
fig, ax, x = small_multiples(df, 'Fare', hues='Pclass', legend='on', limit_var=True, cols='Sex', density=True,
                          hue_labels=['1st', '2nd', '3rd'])



In [11]:

    
fig, ax, x = small_multiples(df[df.Embarked=='Q'], 'Fare', hues='Pclass', legend='on', rows='Survived', density=True,
                          hue_labels=['1st', '2nd', '3rd'], row_labels=['Did not survive', 'Survived'])









    



/home/fdion/anaconda3_5/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:494: RuntimeWarning: invalid value encountered in true_divide
  binned = fast_linbin(X,a,b,gridsize)/(delta*nobs)
/home/fdion/anaconda3_5/lib/python3.6/site-packages/statsmodels/nonparametric/kdetools.py:34: RuntimeWarning: invalid value encountered in double_scalars
  FAC1 = 2*(np.pi*bw/RANGE)**2

categorical



In [12]:

    
df2 = df.copy()



In [13]:

    
df2.Survived = df2.Survived.astype('category')



In [14]:

    
df2.Survived.cat.set_categories(['Did not survived', 'survived'], rename=True, inplace=True)



In [15]:

    
df2.Pclass = df2.Pclass.astype('category')
df2.Pclass.cat.set_categories(['1st', '2nd', '3rd'], rename=True, inplace=True)



In [16]:

    
df2.Embarked = df2.Embarked.astype('category')
df2.Embarked.cat.set_categories(['Cherbourg', 'Queenstown', 'Southampton'], rename=True, inplace=True)



In [17]:

    
df2.Sex = df2.Sex.astype('category')



In [18]:

    
df2.head()









    Out[18]:







  
    
      
      Survived
      Pclass
      Name
      Sex
      Age
      SibSp
      Parch
      Ticket
      Fare
      Cabin
      Embarked
      Name_code
    
    
      PassengerId
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      1
      Did not survived
      3rd
      Braund, Mr. Owen Harris
      male
      22.0
      1
      0
      A/5 21171
      7.2500
      NaN
      Southampton
      108
    
    
      2
      survived
      1st
      Cumings, Mrs. John Bradley (Florence Briggs Th...
      female
      38.0
      1
      0
      PC 17599
      71.2833
      C85
      Cherbourg
      190
    
    
      3
      survived
      3rd
      Heikkinen, Miss. Laina
      female
      26.0
      0
      0
      STON/O2. 3101282
      7.9250
      NaN
      Southampton
      353
    
    
      4
      survived
      1st
      Futrelle, Mrs. Jacques Heath (Lily May Peel)
      female
      35.0
      1
      0
      113803
      53.1000
      C123
      Southampton
      272
    
    
      5
      Did not survived
      3rd
      Allen, Mr. William Henry
      male
      35.0
      0
      0
      373450
      8.0500
      NaN
      Southampton
      15



In [19]:

    
fig, ax, x = small_multiples(df2[df2.Pclass=='3rd'], 'Age', hues='Pclass', rows='Embarked', cols='Sex', 
                             density=True, rug=True, limit_var=True);



In [20]:

    
small_multiples(df2[df2.Pclass.isin(['1st','2nd'])], 'Age', hues='Pclass', rows='Embarked', cols='Sex',
                density=True, hist=True, strip=True, jitter=True);



In [21]:

    
small_multiples(df2, 'Age', hues='Pclass', rows='Embarked', cols='Sex',
                density=True, hist=True, limit_var=True);



In [ ]:

	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
PassengerId
1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S
2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C
3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S
4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S
5	0	3	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	NaN	S

	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	Name_code
PassengerId
1	Did not survived	3rd	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	Southampton	108
2	survived	1st	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	Cherbourg	190
3	survived	3rd	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	Southampton	353
4	survived	1st	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	Southampton	272
5	Did not survived	3rd	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	NaN	Southampton	15