Title: Dropping Rows And Columns In pandas Dataframe
Slug: pandas_dropping_column_and_rows
Summary: Dropping Rows And Columns In pandas Dataframe
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

Import modules


In [1]:
import pandas as pd

Create a dataframe


In [2]:
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
df


Out[2]:
             name  reports  year
Cochice     Jason        4  2012
Pima        Molly       24  2012
Santa Cruz   Tina       31  2013
Maricopa     Jake        2  2014
Yuma          Amy        3  2014

Drop observations (rows) by index label


In [3]:
df.drop(['Cochice', 'Pima'])


Out[3]:
            name  reports  year
Santa Cruz  Tina       31  2013
Maricopa    Jake        2  2014
Yuma         Amy        3  2014
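
Note that drop returns a new dataframe and leaves df unchanged, so the result has to be assigned to a name if you want to keep it. A minimal sketch, assuming the same data as above; the name df_dropped is made up for illustration:

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

# drop() returns a copy without the listed rows; df itself is not modified
df_dropped = df.drop(['Cochice', 'Pima'])  # df_dropped is an illustrative name

# df still has all five rows, df_dropped has the remaining three
print(len(df), len(df_dropped))
```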

Drop a variable (column)

Note: axis=1 denotes that we are referring to a column, not a row


In [4]:
df.drop('reports', axis=1)


Out[4]:
             name  year
Cochice     Jason  2012
Pima        Molly  2012
Santa Cruz   Tina  2013
Maricopa     Jake  2014
Yuma          Amy  2014
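
As an alternative to axis=1, reasonably recent pandas versions (0.21 and later, as far as I know) also accept a columns keyword, which some readers find easier to scan. A short sketch, assuming the same data as above; no_reports and name_only are illustrative names:

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

# columns= names the axis explicitly; same result as df.drop('reports', axis=1)
no_reports = df.drop(columns='reports')

# Several columns can be dropped at once by passing a list
name_only = df.drop(columns=['reports', 'year'])
```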

Drop a row if it contains a certain value (in this case, "Tina")

Specifically: select all rows where the value in the name column does not equal "Tina". This returns a new dataframe; df itself is unchanged unless you assign the result back to it.


In [5]:
df[df.name != 'Tina']


Out[5]:
           name  reports  year
Cochice   Jason        4  2012
Pima      Molly       24  2012
Maricopa   Jake        2  2014
Yuma        Amy        3  2014
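
The same filtering idea extends to several values at once. A sketch, assuming you want to drop both "Tina" and "Jake" (the second name is chosen only for illustration), using isin to build the mask and ~ to negate it:

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

# isin() is True for rows whose name is in the list; ~ flips the mask,
# so only rows with other names are kept
kept = df[~df['name'].isin(['Tina', 'Jake'])]
```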

Drop a row by row number (in this case, the third row)

Note that pandas uses zero-based numbering, so 0 is the first row, 1 is the second row, etc.


In [6]:
df.drop(df.index[2])


Out[6]:
           name  reports  year
Cochice   Jason        4  2012
Pima      Molly       24  2012
Maricopa   Jake        2  2014
Yuma        Amy        3  2014

This can be extended to dropping several rows at once by passing a list of row numbers (here, the third and fourth rows).


In [7]:
df.drop(df.index[[2,3]])


Out[7]:
          name  reports  year
Cochice  Jason        4  2012
Pima     Molly       24  2012
Yuma       Amy        3  2014

Negative numbers drop rows relative to the end of the dataframe (here, the second-to-last row).


In [8]:
df.drop(df.index[-2])


Out[8]:
             name  reports  year
Cochice     Jason        4  2012
Pima        Molly       24  2012
Santa Cruz   Tina       31  2013
Yuma          Amy        3  2014
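
One caveat on the df.index[...] examples above: drop works on labels, so df.index[2] is first turned into a label and then every row carrying that label is removed. With a unique index, as here, this is the same as dropping by position. The sketch below uses a made-up dataframe (dup) with a repeated label to show the difference, and a positional mask as one possible workaround:

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with a repeated index label
dup = pd.DataFrame({'reports': [4, 24, 31]}, index=['Pima', 'Yuma', 'Pima'])

# drop() works on labels: index[2] is 'Pima', so both 'Pima' rows are removed
by_label = dup.drop(dup.index[2])

# A boolean mask over row positions removes only the third row
keep = np.arange(len(dup)) != 2
by_position = dup[keep]
```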

You can also use slices to keep rows counted from the top or drop rows counted from the bottom of the dataframe.


In [9]:
df[:3]  # keep top 3


Out[9]:
             name  reports  year
Cochice     Jason        4  2012
Pima        Molly       24  2012
Santa Cruz   Tina       31  2013

In [10]:
df[:-3]  # drop bottom 3


Out[10]:
          name  reports  year
Cochice  Jason        4  2012
Pima     Molly       24  2012
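
The slices above can also be written with iloc or head, which make it explicit that rows are being selected by position. A brief sketch, assuming the same data as above; the result names are illustrative:

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

# Keep the top 3 rows, by position
top3 = df.iloc[:3]

# head(3) selects the same three rows
same_top3 = df.head(3)

# Drop the bottom 3 rows, by position
without_bottom3 = df.iloc[:-3]
```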