In [11]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
from scipy import stats
# import qgrid
import matplotlib.pyplot as plt
import seaborn as sns
In [12]:
df = pd.read_csv('buffy.csv')
A sensible first step is to check that the DataFrame is structured the way you expect. Here is how Python for Data Analysis defines a DataFrame:
A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).
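To make that definition concrete, here is a minimal sketch of a DataFrame whose columns hold different value types. The frame `toy` and its column names are made up for illustration and have nothing to do with buffy.csv.

```python
import pandas as pd

# Hypothetical toy data: one string, one float and one boolean column
toy = pd.DataFrame({
    "label": ["a", "b", "c"],
    "value": [1.5, 2.0, 3.25],
    "flag": [True, False, True],
})

# Columns keep the order in which they were supplied
column_order = list(toy.columns)

# Each column carries its own dtype
column_types = toy.dtypes

print(column_order)
print(column_types)
```

Running `df.dtypes` and `df.shape` on the real data gives the same kind of quick structural check.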
In [24]:
df.columns
Out[24]:
In [25]:
# display first five rows
df.head()
Out[25]:
In [26]:
df.tail()
Out[26]:
In [27]:
df.describe()
Out[27]:
In [28]:
# flag non-null values: True where a value is present, False where it is missing
pd.notnull(df)
Out[28]:
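A full Boolean frame is hard to scan on anything but tiny data, so a per-column count of missing values is often more useful. The sketch below uses a small made-up frame (`toy`, with hypothetical columns `name` and `score`) rather than buffy.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in each column
toy = pd.DataFrame({
    "name": ["a", "b", None],
    "score": [1.0, np.nan, 3.0],
})

# True where a value is present, same shape as the frame
present = pd.notnull(toy)

# Number of missing values in each column
missing_per_column = toy.isnull().sum()

# Single yes/no answer: is anything missing at all?
any_missing = bool(toy.isnull().values.any())

print(missing_per_column)
print(any_missing)
```

The same two lines, `df.isnull().sum()` and `df.isnull().values.any()`, should work unchanged on the loaded frame.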
A column in a DataFrame can be retrieved as a Series either by dict-like notation or by attribute:
In [29]:
# attribute
df.Character
Out[29]:
In [30]:
# dict
df['Character']
Out[30]:
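The two notations return the same Series, but attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. A minimal sketch, using a made-up frame in which only the `Character` column name comes from this notebook; the values and the other two columns are hypothetical:

```python
import pandas as pd

# Hypothetical toy data; "count" and "Episode Title" are chosen to show the pitfalls
toy = pd.DataFrame({
    "Character": ["A", "B"],
    "count": [1, 2],
    "Episode Title": ["p", "q"],
})

by_attr = toy.Character
by_key = toy["Character"]
same = by_attr.equals(by_key)  # both notations give the same Series

# "count" is also a DataFrame method, so toy.count is the method, not the column
count_is_method = callable(toy.count)
count_col = toy["count"]

# A name containing a space can only be reached with dict-like notation
title_col = toy["Episode Title"]

print(same, count_is_method)
```

For that reason dict-like notation is the safer default in scripts, with attribute access kept for interactive exploration.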
Rows can also be retrieved by label with the loc indexer or by position with iloc (much more on this later). The ix indexer that older pandas code uses for both was deprecated in pandas 0.20 and removed in 1.0.
In [33]:
Out[33]:
In [ ]:
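# label-based selection; .ix was removed in pandas 1.0, so .loc is used instead
# assumes a frame whose index contains 'Colorado' and whose columns include 'two' and 'three'
df.loc['Colorado', ['two', 'three']]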
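Label-based and position-based selection can be compared side by side on a small frame. In the sketch below the 'Colorado', 'two' and 'three' labels match the cell above; the remaining labels and all values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string row labels and four named columns
frame = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Select by label: row 'Colorado', columns 'two' and 'three'
by_label = frame.loc["Colorado", ["two", "three"]]

# Select the same cells by integer position: row 1, columns 1 and 2
by_position = frame.iloc[1, [1, 2]]

print(by_label)
print(by_position)
```

Both calls return a Series indexed by the selected column names, so either can replace code written with the older ix indexer, depending on whether labels or positions were meant.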