In [11]:

    
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
from scipy import stats
# import qgrid
import matplotlib.pyplot as plt
import seaborn as sns



In [12]:

    
df = pd.read_csv('buffy.csv')

Your first step should probably be to check whether the DataFrame is formatted as you expect. Here is the definition of a DataFrame from Python for Data Analysis:

A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).
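To make the "each column can be a different value type" part of that definition concrete, here is a minimal sketch. It does not read buffy.csv; instead it builds a small stand-in frame, named `sample` purely for illustration, from three of the rows that `df.head()` prints further down.

```python
import pandas as pd

# Hypothetical stand-in for the real df: three rows copied from the df.head() output.
sample = pd.DataFrame({
    "Character": ["Buffy", "Xander", "Willow"],      # text column
    "Height (inches)": [64.0, 70.0, 64.5],           # floating-point column
    "Number of Episodes": [145, 145, 144],           # integer column
})

# Each column carries its own dtype; the frame as a whole has a (rows, columns) shape.
print(sample.dtypes)
print(sample.shape)
```

On the real data, `df.dtypes` and `df.shape` give the same kind of quick sanity check right after `pd.read_csv`.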



In [24]:

    
df.columns









    Out[24]:





Index([u'Character', u'Species', u'Height (inches)', u'Actor DOB', u'Number of Episodes', u'Ranking', u'Gender'], dtype='object')



In [25]:

    
# display first five rows
df.head()









    Out[25]:






  
    
      
   Character  Species  Height (inches)  Actor DOB  Number of Episodes  Ranking  Gender
0      Buffy    Human             64.0    4/14/77                 145        5       F
1     Xander    Human             70.0    4/12/71                 145       11       M
2     Willow    Human             64.5    3/24/74                 144        1       F
3      Giles    Human             73.0    2/20/54                 123        3       M
4   Cordelia    Human             67.0    7/23/70                  58        7       F



In [26]:

    
df.tail()









    Out[26]:






  
    
      
    Character  Species  Height (inches)  Actor DOB  Number of Episodes  Ranking  Gender
9        Tara    Human               64     1/8/77                  47       13       F
10       Dawn    Human               66   10/11/85                  66       50       F
11      Joyce    Human               67    4/17/55                  58       24       F
12      Faith    Human               65   12/30/80                  20        4       F
13   Drusilla  Vampire               66    3/30/65                  17       12       F



In [27]:

    
df.describe()









    Out[27]:






  
    
      
       Height (inches)  Number of Episodes    Ranking
count        14.000000           14.000000  14.000000
mean         66.857143           78.857143  12.785714
std           3.236977           45.195060  12.741082
min          63.500000           17.000000   1.000000
25%          64.125000           49.750000   4.250000
50%          66.000000           62.500000  10.500000
75%          68.500000          116.500000  16.750000
max          73.000000          145.000000  50.000000



In [28]:

    
# check for null values: True marks a value that is present (not null)
pd.notnull(df)









    Out[28]:






  
    
      
    Character  Species  Height (inches)  Actor DOB  Number of Episodes  Ranking  Gender
0        True     True             True       True                True     True    True
1        True     True             True       True                True     True    True
2        True     True             True       True                True     True    True
3        True     True             True       True                True     True    True
4        True     True             True       True                True     True    True
5        True     True             True       True                True     True    True
6        True     True             True       True                True     True    True
7        True     True             True       True                True     True    True
8        True     True             True       True                True     True    True
9        True     True             True       True                True     True    True
10       True     True             True       True                True     True    True
11       True     True             True       True                True     True    True
12       True     True             True       True                True     True    True
13       True     True             True       True                True     True    True
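Scanning a full grid of True values by eye works for 14 rows but not for much more. One possible way to condense the check is sketched below. The frame `sample` is a hypothetical stand-in, and one height has been deliberately replaced with NaN so there is something to find; the real buffy.csv has no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in; the NaN is inserted on purpose for illustration only.
sample = pd.DataFrame({
    "Character": ["Buffy", "Xander", "Willow"],
    "Height (inches)": [64.0, np.nan, 64.5],
})

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()

# Single yes/no answers for the whole frame.
any_missing = bool(sample.isnull().values.any())
all_present = bool(sample.notnull().all().all())

print(missing_per_column)
print(any_missing, all_present)
```

For the real DataFrame, `df.isnull().sum()` should show zero for every column, which matches the all-True table above.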

A column in a DataFrame can be retrieved as a Series either by dict-like notation or by attribute:



In [29]:

    
# attribute
df.Character









    Out[29]:





0        Buffy
1       Xander
2       Willow
3        Giles
4     Cordelia
5        Angel
6           Oz
7        Spike
8         Anya
9         Tara
10        Dawn
11       Joyce
12       Faith
13    Drusilla
Name: Character, dtype: object



In [30]:

    
# dict
df['Character']









    Out[30]:





0        Buffy
1       Xander
2       Willow
3        Giles
4     Cordelia
5        Angel
6           Oz
7        Spike
8         Anya
9         Tara
10        Dawn
11       Joyce
12       Faith
13    Drusilla
Name: Character, dtype: object
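The two notations return the same Series, but attribute access only works when the column name is a valid Python identifier. A column such as 'Height (inches)' contains spaces and parentheses, so it has to be retrieved with dict-like notation. A minimal sketch, using a hypothetical stand-in frame named `sample`:

```python
import pandas as pd

# Hypothetical stand-in for the real df: three rows copied from the df.head() output.
sample = pd.DataFrame({
    "Character": ["Buffy", "Xander", "Willow"],
    "Height (inches)": [64.0, 70.0, 64.5],
})

# Both notations give the same Series for a simple column name.
same = sample.Character.equals(sample["Character"])

# Names with spaces or punctuation need dict-like notation.
heights = sample["Height (inches)"]

# A list of names returns a DataFrame rather than a Series.
pair = sample[["Character", "Height (inches)"]]

print(same)
print(type(heights).__name__, type(pair).__name__)
```

Dict-like notation is also the safer habit when a column name collides with an existing DataFrame attribute or method.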

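Rows can also be retrieved by position or by label in a couple of ways, such as the ix indexer (much more on this later). Note that ix was deprecated in pandas 0.20 and removed in 1.0; loc (label-based) and iloc (position-based) replace it: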



In [33]:

    
# row with index label 5 (loc is the label-based indexer)
df.loc[5]






    Out[33]:





Character               Angel
Species               Vampire
Height (inches)            73
Actor DOB             5/16/69
Number of Episodes         59
Ranking                    19
Gender                      M
Name: 5, dtype: object
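With the default integer index, a label and a position happen to be the same number, which hides the difference between the two kinds of lookup. The sketch below separates them using `loc` and `iloc`. The frame `sample` and the re-indexed `by_name` are hypothetical stand-ins built from the rows shown by `df.head()`.

```python
import pandas as pd

# Hypothetical stand-in for the real df: the five rows shown by df.head().
sample = pd.DataFrame({
    "Character": ["Buffy", "Xander", "Willow", "Giles", "Cordelia"],
    "Species": ["Human", "Human", "Human", "Human", "Human"],
    "Ranking": [5, 11, 1, 3, 7],
})

# Default index: label 3 and position 3 are the same row.
row_by_label = sample.loc[3]
row_by_pos = sample.iloc[3]

# A row label plus a list of column labels returns part of that row.
subset = sample.loc[3, ["Character", "Ranking"]]

# With a string index the two lookups are clearly different operations.
by_name = sample.set_index("Character")
giles_label = by_name.loc["Giles"]   # by label
giles_pos = by_name.iloc[3]          # by position (fourth row)

print(subset)
print(giles_label.equals(giles_pos))
```

Both lookups return a Series whose `name` is the row's index label, which is why the output above ends with `Name: 5`.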



In [ ]:

    
# select one row by label and a subset of its columns
df.loc[5, ['Character', 'Species']]
