Title: Random Sampling Dataframe
Slug: pandas_sampling_dataframe
Summary: Random Sampling Dataframe
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

Import modules



In [1]:

    
import pandas as pd
import numpy as np

Create dataframe



In [2]:

    
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 
        'age': [42, 52, 36, 24, 73], 
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
df









    Out[2]:

  first_name last_name  age  preTestScore  postTestScore
0      Jason    Miller   42             4             25
1      Molly  Jacobson   52            24             94
2       Tina       Ali   36            31             57
3       Jake    Milner   24             2             62
4        Amy     Cooze   73             3             70

Select a random subset of 2 rows without replacement



In [3]:

    
df.take(np.random.permutation(len(df))[:2])









    Out[3]:

  first_name last_name  age  preTestScore  postTestScore
1      Molly  Jacobson   52            24             94
4        Amy     Cooze   73             3             70
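
The permutation approach above draws different rows on every run. A minimal sketch of two reproducible alternatives follows: the built-in `DataFrame.sample` method and a seeded NumPy generator. The seed value `0` and the names `subset`, `subset_perm`, and `rng` are assumptions chosen for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Same example data as in the dataframe created earlier
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
            'age': [42, 52, 36, 24, 73],
            'preTestScore': [4, 24, 31, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data)

# Built-in sampling: 2 rows, without replacement.
# random_state=0 is an arbitrary seed (assumption) that makes the draw repeatable.
subset = df.sample(n=2, replace=False, random_state=0)

# Seeded version of the permutation approach:
# shuffle the row positions, keep the first 2, and select those rows.
rng = np.random.default_rng(0)
subset_perm = df.take(rng.permutation(len(df))[:2])
```

Both calls return 2 distinct rows with their original index labels. Dropping `random_state` (or the seed passed to `default_rng`) restores a different draw on each run.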
