4. Explore the Data

"I don't know, what I don't know"

We want to first visually explore the data to see if we can confirm some of our initial hypotheses as well as make new hypothesis about the problem we are trying to solve.

For this we will start by loading the data and understanding the data structure of the dataframe we have.

Lets read the data



In [1]:

    
# Import the library we need, which is Pandas
import pandas as pd

You will find the variable df used quite often to store a dataframe



In [2]:

    
# Read the csv file of Monthwise Quantity and Price csv file we have.
df = pd.read_csv('MonthWiseMarketArrivals_clean.csv')

Understand Data Structure and Types



In [3]:

    
df.shape









    Out[3]:





(10227, 10)



In [4]:

    
df.head()









    Out[4]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
    
  
  
    
      0
      ABOHAR(PB)
      January
      2005
      2350
      404
      493
      446
      PB
      ABOHAR
      January-2005
    
    
      1
      ABOHAR(PB)
      January
      2006
      900
      487
      638
      563
      PB
      ABOHAR
      January-2006
    
    
      2
      ABOHAR(PB)
      January
      2010
      790
      1283
      1592
      1460
      PB
      ABOHAR
      January-2010
    
    
      3
      ABOHAR(PB)
      January
      2011
      245
      3067
      3750
      3433
      PB
      ABOHAR
      January-2011
    
    
      4
      ABOHAR(PB)
      January
      2012
      1035
      523
      686
      605
      PB
      ABOHAR
      January-2012

Data Structure

So we have ten columns in our dataset. Let us understand what each one is.

Three are about the location of the Wholesale Market where Onion where sold.

state: This is the 2/3 letter abbreviation for the state in India (PB is Punjab and so on)
city: This is the city in India (ABOHAR, BANGALORE and so on)
market: This is a string with the combination of the state and city

Three are related to the

month: Month in January, February and so on.
year: Year in YYYY representastion
date: The combination of the two above.

Four are about quantity and price in these wholesale market.

quantity: The quanity of Onion arriving in the market in that month in quintals (100 kg)
priceMin: The minimum price in the month in Rs./quintal
priceMax: The maximum price in the month in Rs./quintal
priceMod: The modal price in the month in Rs./quintal

We would expect the following the columns to be of the following type

CATEGORICAL: state, city, market
TIME INTERVAL: month, year, date
QUANTITATIVE: quantity, priceMin, priceMax, priceModal

Let us see what pandas dataframe is reading these columns as.



In [5]:

    
# Get the typeof each column
df.dtypes









    Out[5]:





market      object
month       object
year         int64
quantity     int64
priceMin     int64
priceMax     int64
priceMod     int64
state       object
city        object
date        object
dtype: object

So we are getting the quantitive columns are correctly being shown as integers and the categorical columns are showing as objects(strings) which is fine. However, the date columns is being read as an object and not a Time-Interval. Let us at least fix the date column and make it into a datetime object



In [6]:

    
# Changing the date column to a Time Interval columnn
df.date = pd.DatetimeIndex(df.date)



In [13]:

    
df.shape









    Out[13]:





(10227, 10)



In [14]:

    
# Now checking for type of each column
df.dtypes









    Out[14]:





market              object
month               object
year                 int64
quantity             int64
priceMin             int64
priceMax             int64
priceMod             int64
state               object
city                object
date        datetime64[ns]
dtype: object



In [15]:

    
# Let us see the dataframe again now
df.head()









    Out[15]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
    
  
  
    
      0
      ABOHAR(PB)
      January
      2005
      2350
      404
      493
      446
      PB
      ABOHAR
      2005-01-01
    
    
      1
      ABOHAR(PB)
      January
      2006
      900
      487
      638
      563
      PB
      ABOHAR
      2006-01-01
    
    
      2
      ABOHAR(PB)
      January
      2010
      790
      1283
      1592
      1460
      PB
      ABOHAR
      2010-01-01
    
    
      3
      ABOHAR(PB)
      January
      2011
      245
      3067
      3750
      3433
      PB
      ABOHAR
      2011-01-01
    
    
      4
      ABOHAR(PB)
      January
      2012
      1035
      523
      686
      605
      PB
      ABOHAR
      2012-01-01



In [21]:

    
# df.city.unique()
# Functional Approach
pd.unique(df.city)









    Out[21]:





array(['ABOHAR', 'AGRA', 'AHMEDABAD', 'AHMEDNAGAR', 'AJMER', 'ALIGARH',
       'ALWAR', 'AMRITSAR', 'BALLIA', 'BANGALORE', 'BAREILLY', 'BELGAUM',
       'BHATINDA', 'BHAVNAGAR', 'BHOPAL', 'BHUBNESWER', 'BIHARSHARIF',
       'BIJAPUR', 'BIKANER', 'BOMBORI', 'BURDWAN', 'CHAKAN', 'CHALLAKERE',
       'CHANDIGARH', 'CHANDVAD', 'CHENNAI', 'CHICKBALLAPUR', 'COIMBATORE',
       'DEESA', 'DEHRADOON', 'DELHI', 'DEORIA', 'DEVALA', 'DEWAS',
       'DHAVANGERE', 'DHULIA', 'DINDIGUL', 'DINDORI', 'ETAWAH', 'GONDAL',
       'GORAKHPUR', 'GUWAHATI', 'HALDWANI', 'HASSAN', 'HOSHIARPUR',
       'HUBLI', 'HYDERABAD', 'INDORE', 'JAIPUR', 'JALANDHAR', 'JALGAON',
       'JAMMU', 'JAMNAGAR', 'JODHPUR', 'JUNNAR', 'KALVAN', 'KANPUR',
       'KARNAL', 'KHANNA', 'KOLAR', 'KOLHAPUR', 'KOLKATA', 'KOPERGAON',
       'KOTA', 'KURNOOL', 'LASALGAON', 'LONAND', 'LUCKNOW', 'LUDHIANA',
       'MADURAI', 'MAHUVA', 'MALEGAON', 'MANDSOUR', 'MANMAD', 'MEERUT',
       'MIDNAPUR', 'MUMBAI', 'NAGPUR', 'NANDGAON', 'NASIK', 'NEEMUCH',
       'NEWASA', 'NIPHAD', 'PALAYAM', 'PATIALA', 'PATNA', 'PHALTAN ',
       'PIMPALGAON', 'PUNE', 'PURULIA', 'RAHATA', 'RAHURI', 'RAICHUR',
       'RAIPUR', 'RAJAHMUNDRY', 'RAJKOT', 'RANCHI', 'SAGAR', 'SAIKHEDA',
       'SANGALI', 'SANGAMNER', 'SATANA', 'SHEROAPHULY', 'SHIMLA',
       'SHRIRAMPUR', 'SINNAR', 'SOLAPUR', 'SRIGANGANAGAR', 'SRINAGAR',
       'SRIRAMPUR', 'SURAT', 'TRIVENDRUM', 'UDAIPUR', 'UJJAIN', 'VANI',
       'VARANASI', 'YEOLA'], dtype=object)

Question 1 - How big is the Bangalore onion market compared to other cities in India?

Let us try to do this examination for one of the year only. So we want to reduce our dataframe for only where the year = 2010. This process is called subset.

PRINCIPLE: Subset a Dataframe

How do you subset a dataframe on a given criteria

newDataframe = df[ <subset condition> ]



In [22]:

    
df2010 = df[df.year == 2010]



In [23]:

    
df2010.head()









    Out[23]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
    
  
  
    
      2
      ABOHAR(PB)
      January
      2010
      790
      1283
      1592
      1460
      PB
      ABOHAR
      2010-01-01
    
    
      10
      ABOHAR(PB)
      February
      2010
      555
      1143
      1460
      1322
      PB
      ABOHAR
      2010-02-01
    
    
      18
      ABOHAR(PB)
      March
      2010
      385
      510
      878
      688
      PB
      ABOHAR
      2010-03-01
    
    
      25
      ABOHAR(PB)
      April
      2010
      840
      466
      755
      611
      PB
      ABOHAR
      2010-04-01
    
    
      32
      ABOHAR(PB)
      May
      2010
      2050
      391
      578
      494
      PB
      ABOHAR
      2010-05-01



In [24]:

    
# We can also subset on multiple criterias
df2010Bang = df[(df.year == 2010) & (df.city == 'BANGALORE')]



In [25]:

    
df2010Bang.head()









    Out[25]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
    
  
  
    
      722
      BANGALORE
      January
      2010
      423649
      502
      1837
      1267
      KNT
      BANGALORE
      2010-01-01
    
    
      735
      BANGALORE
      February
      2010
      316685
      495
      1429
      1011
      KNT
      BANGALORE
      2010-02-01
    
    
      748
      BANGALORE
      March
      2010
      368644
      386
      895
      661
      KNT
      BANGALORE
      2010-03-01
    
    
      760
      BANGALORE
      April
      2010
      404805
      287
      730
      545
      KNT
      BANGALORE
      2010-04-01
    
    
      772
      BANGALORE
      May
      2010
      395519
      306
      827
      564
      KNT
      BANGALORE
      2010-05-01

Principle: Split Apply Combine

How do we get the sum of quantity for each city.

We need to SPLIT the data by each city, APPLY the sum to the quantity row and then COMBINE the data again

In pandas, we use the groupby function to do this.



In [35]:

    
# Group by using city
df2010City = df2010.groupby(['city']).sum()



In [36]:

    
df2010City.head()



In [37]:

    
type(df2010City)









    Out[37]:





pandas.core.frame.DataFrame



In [38]:

    
# If we only want to apply the sum function on quantity, then we specify the quantity column
df2010City = df2010.groupby(['city']).quantity.sum()



In [39]:

    
# Let us see this dataframe
df2010City.head()









    Out[39]:





city
ABOHAR          11835
AGRA           756755
AHMEDABAD     1135418
AHMEDNAGAR    1678032
ALWAR          561145
Name: quantity, dtype: int64



In [40]:

    
type(df2010City)









    Out[40]:





pandas.core.series.Series



In [41]:

    
# To create a dataframe again, it is best to specify index as false
df2010City = df2010.groupby(['city'], as_index=False).quantity.sum()



In [42]:

    
df2010City.head()



In [45]:

    
sorted(df2010City.quantity)



In [46]:

    
# Sort the Dataframe by Quantity to see which one is on top
df2010City = df2010City.sort_values(by = "quantity", ascending = False)
df2010City.head()



In [47]:

    
%timeit sorted(df2010City.quantity)









    



The slowest run took 92.81 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 25.5 µs per loop



In [49]:

    
%timeit df2010City.quantity.sort_values()









    



The slowest run took 20.41 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 170 µs per loop



In [48]:

    
%timeit df2010City.sort_values(by = "quantity", ascending = False)









    



1000 loops, best of 3: 381 µs per loop



In [ ]:

PRINCIPLE: Visual Exploration

Lets load the libraries required for plotting in python



In [50]:

    
# Load the visualisation libraries - Matplotlib
import matplotlib.pyplot as plt



In [51]:

    
# Let us see the output plots in the notebook itself
%matplotlib inline



In [52]:

    
# Set some parameters to get good visuals - style to ggplot and size to 15,10
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (15, 10)



In [54]:

    
# Plot the Data
df2010City.plot(kind ="barh", x = 'city', y = 'quantity')









    Out[54]:





<matplotlib.axes._subplots.AxesSubplot at 0x117bdac18>

Exercise: Find the State with Highest Quantity Sales in 2015?



In [ ]:

    
df2015 = df[]



In [ ]:

Exercise: Plot the State and Cities with Quantity Sales in 2015



In [ ]:



In [ ]:



In [ ]:

Question 2 - Have the price variation in Onion prices in Bangalore really gone up over the years?



In [55]:

    
df.head()









    Out[55]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
    
  
  
    
      0
      ABOHAR(PB)
      January
      2005
      2350
      404
      493
      446
      PB
      ABOHAR
      2005-01-01
    
    
      1
      ABOHAR(PB)
      January
      2006
      900
      487
      638
      563
      PB
      ABOHAR
      2006-01-01
    
    
      2
      ABOHAR(PB)
      January
      2010
      790
      1283
      1592
      1460
      PB
      ABOHAR
      2010-01-01
    
    
      3
      ABOHAR(PB)
      January
      2011
      245
      3067
      3750
      3433
      PB
      ABOHAR
      2011-01-01
    
    
      4
      ABOHAR(PB)
      January
      2012
      1035
      523
      686
      605
      PB
      ABOHAR
      2012-01-01



In [56]:

    
dfBang = df[df.city == 'BANGALORE']



In [57]:

    
dfBang.head()









    Out[57]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
    
  
  
    
      716
      BANGALORE
      January
      2004
      227832
      916
      1066
      991
      KNT
      BANGALORE
      2004-01-01
    
    
      717
      BANGALORE
      January
      2005
      335679
      470
      597
      522
      KNT
      BANGALORE
      2005-01-01
    
    
      718
      BANGALORE
      January
      2006
      412185
      286
      617
      537
      KNT
      BANGALORE
      2006-01-01
    
    
      719
      BANGALORE
      January
      2007
      268268
      586
      1167
      942
      KNT
      BANGALORE
      2007-01-01
    
    
      720
      BANGALORE
      January
      2008
      393806
      174
      671
      472
      KNT
      BANGALORE
      2008-01-01



In [58]:

    
dfBang.describe()









    Out[58]:






  
    
      
      year
      quantity
      priceMin
      priceMax
      priceMod
    
  
  
    
      count
      146.000000
      146.000000
      146.000000
      146.000000
      146.000000
    
    
      mean
      2009.589041
      523917.424658
      557.541096
      1313.883562
      1043.547945
    
    
      std
      3.522678
      302369.030403
      381.415623
      911.101944
      667.509329
    
    
      min
      2004.000000
      63824.000000
      145.000000
      338.000000
      320.000000
    
    
      25%
      2007.000000
      329690.000000
      307.000000
      685.000000
      551.000000
    
    
      50%
      2010.000000
      403530.500000
      448.000000
      1019.000000
      836.000000
    
    
      75%
      2013.000000
      661252.000000
      653.000000
      1636.000000
      1325.500000
    
    
      max
      2016.000000
      1639032.000000
      2377.000000
      4698.000000
      3430.000000



In [59]:

    
# Reduce the precision of numbers - so that it is easy to read
pd.set_option('precision', 0)



In [60]:

    
dfBang.describe()

PRINCIPLE: Setting Index



In [61]:

    
dfBang.head()









    Out[61]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
    
  
  
    
      716
      BANGALORE
      January
      2004
      227832
      916
      1066
      991
      KNT
      BANGALORE
      2004-01-01
    
    
      717
      BANGALORE
      January
      2005
      335679
      470
      597
      522
      KNT
      BANGALORE
      2005-01-01
    
    
      718
      BANGALORE
      January
      2006
      412185
      286
      617
      537
      KNT
      BANGALORE
      2006-01-01
    
    
      719
      BANGALORE
      January
      2007
      268268
      586
      1167
      942
      KNT
      BANGALORE
      2007-01-01
    
    
      720
      BANGALORE
      January
      2008
      393806
      174
      671
      472
      KNT
      BANGALORE
      2008-01-01



In [62]:

    
dfBang.index









    Out[62]:





Int64Index([716, 717, 718, 719, 720, 721, 722, 723, 724, 725,
            ...
            852, 853, 854, 855, 856, 857, 858, 859, 860, 861],
           dtype='int64', length=146)



In [63]:

    
# Set the index as date
dfBang = dfBang.sort_values(by = "date")
dfBang.head()









    Out[63]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
    
  
  
    
      716
      BANGALORE
      January
      2004
      227832
      916
      1066
      991
      KNT
      BANGALORE
      2004-01-01
    
    
      729
      BANGALORE
      February
      2004
      225133
      741
      870
      793
      KNT
      BANGALORE
      2004-02-01
    
    
      742
      BANGALORE
      March
      2004
      221952
      527
      586
      556
      KNT
      BANGALORE
      2004-03-01
    
    
      754
      BANGALORE
      April
      2004
      185150
      419
      518
      465
      KNT
      BANGALORE
      2004-04-01
    
    
      766
      BANGALORE
      May
      2004
      137390
      400
      516
      455
      KNT
      BANGALORE
      2004-05-01



In [64]:

    
# Set the Index for the Dataframe
dfBang.index = pd.PeriodIndex(dfBang.date, freq='M')



In [65]:

    
dfBang.head()









    Out[65]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
    
  
  
    
      2004-01
      BANGALORE
      January
      2004
      227832
      916
      1066
      991
      KNT
      BANGALORE
      2004-01-01
    
    
      2004-02
      BANGALORE
      February
      2004
      225133
      741
      870
      793
      KNT
      BANGALORE
      2004-02-01
    
    
      2004-03
      BANGALORE
      March
      2004
      221952
      527
      586
      556
      KNT
      BANGALORE
      2004-03-01
    
    
      2004-04
      BANGALORE
      April
      2004
      185150
      419
      518
      465
      KNT
      BANGALORE
      2004-04-01
    
    
      2004-05
      BANGALORE
      May
      2004
      137390
      400
      516
      455
      KNT
      BANGALORE
      2004-05-01



In [66]:

    
dfBang.priceMod.plot()









    Out[66]:





<matplotlib.axes._subplots.AxesSubplot at 0x1187e8438>



In [67]:

    
dfBang.plot(kind = "line", y = ['priceMin', 'priceMod', 'priceMax'])









    Out[67]:





<matplotlib.axes._subplots.AxesSubplot at 0x108c8c400>

To calculate the range of change, we will create a new price difference variable - which is the difference between the priceMin and priceMax



In [68]:

    
dfBang['priceDiff'] = dfBang['priceMax'] - dfBang['priceMin']



In [69]:

    
dfBang.head()









    Out[69]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
      priceDiff
    
  
  
    
      2004-01
      BANGALORE
      January
      2004
      227832
      916
      1066
      991
      KNT
      BANGALORE
      2004-01-01
      150
    
    
      2004-02
      BANGALORE
      February
      2004
      225133
      741
      870
      793
      KNT
      BANGALORE
      2004-02-01
      129
    
    
      2004-03
      BANGALORE
      March
      2004
      221952
      527
      586
      556
      KNT
      BANGALORE
      2004-03-01
      59
    
    
      2004-04
      BANGALORE
      April
      2004
      185150
      419
      518
      465
      KNT
      BANGALORE
      2004-04-01
      99
    
    
      2004-05
      BANGALORE
      May
      2004
      137390
      400
      516
      455
      KNT
      BANGALORE
      2004-05-01
      116



In [70]:

    
dfBang.plot(kind = 'line', y = 'priceDiff')









    Out[70]:





<matplotlib.axes._subplots.AxesSubplot at 0x1189360f0>

PRINCIPLE: Pivot Table

Pivot table is a way to summarize data frame data into index (rows), columns and value



In [71]:

    
# Create new variable for Integer Month
dfBang['monthVal'] = pd.DatetimeIndex(dfBang['date']).month



In [72]:

    
dfBang.head()









    Out[72]:






  
    
      
      market
      month
      year
      quantity
      priceMin
      priceMax
      priceMod
      state
      city
      date
      priceDiff
      monthVal
    
  
  
    
      2004-01
      BANGALORE
      January
      2004
      227832
      916
      1066
      991
      KNT
      BANGALORE
      2004-01-01
      150
      1
    
    
      2004-02
      BANGALORE
      February
      2004
      225133
      741
      870
      793
      KNT
      BANGALORE
      2004-02-01
      129
      2
    
    
      2004-03
      BANGALORE
      March
      2004
      221952
      527
      586
      556
      KNT
      BANGALORE
      2004-03-01
      59
      3
    
    
      2004-04
      BANGALORE
      April
      2004
      185150
      419
      518
      465
      KNT
      BANGALORE
      2004-04-01
      99
      4
    
    
      2004-05
      BANGALORE
      May
      2004
      137390
      400
      516
      455
      KNT
      BANGALORE
      2004-05-01
      116
      5



In [73]:

    
dfBangPivot = pd.pivot_table(dfBang, values = "priceDiff", 
                             columns = "year", index = "monthVal")



In [75]:

    
dfBangPivot



In [76]:

    
dfBangPivot.plot()









    Out[76]:





<matplotlib.axes._subplots.AxesSubplot at 0x108d23080>



In [77]:

    
dfBangPivot.plot(subplots = True, figsize=(15, 15), layout=(3, 5), sharey=True)









    Out[77]:





array([[<matplotlib.axes._subplots.AxesSubplot object at 0x108d565c0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x111b92d30>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x111be05c0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x111c1ba20>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x111c6b358>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x111ca7128>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x112b8ac18>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1195aca90>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x119ad39e8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x119fcf550>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x11a1346d8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11a17ebe0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11a1baf60>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11a3079b0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11a344cc0>]], dtype=object)

Exercise: Find the price variation for LASALGAON city?



In [78]:

    
dfBangPivot.plot(subplots = True, figsize=(15, 15), layout=(3, 5), 
                 sharey=False)









    Out[78]:





array([[<matplotlib.axes._subplots.AxesSubplot object at 0x11af325f8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11b3f02e8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11b543d68>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11b584198>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11b5cdb38>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x11b60ac50>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11b659908>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11b695748>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11b6de828>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11b71b518>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x11b768828>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11b7b2cc0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x119595be0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11ba03390>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11ba44048>]], dtype=object)



In [ ]:

	year	quantity	priceMin	priceMax	priceMod
city
ABOHAR	24120	11835	10896	16130	14012
AGRA	24120	756755	13213	14827	14149
AHMEDABAD	24120	1135418	8910	15819	13539
AHMEDNAGAR	24120	1678032	3955	20984	12923
ALWAR	24120	561145	8850	14717	11317

year	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016
monthVal
1	150	127	331	581	497	706	1335	1138	575	973	892	1654	1743
2	129	110	209	611	373	663	934	218	511	907	721	1409	1305
3	59	79	147	398	334	387	509	137	484	745	784	1022	NaN
4	99	75	169	181	274	341	443	196	421	634	859	1012	NaN
5	116	56	142	196	331	288	521	186	447	764	1245	1491	NaN
6	135	107	157	159	350	466	634	378	498	997	1675	1416	NaN
7	167	176	189	248	348	374	629	292	514	1143	1446	1608	NaN
8	145	169	192	381	317	379	668	465	609	2849	1580	3256	NaN
9	98	219	290	382	537	495	324	939	687	2793	1377	2864	NaN
10	111	602	319	914	588	1259	405	826	724	2485	1432	2980	NaN
11	120	1131	337	824	604	1616	725	866	1089	3083	2164	2933	NaN
12	177	521	189	643	604	1914	913	679	978	1463	1900	1813	NaN

	market	month	year	quantity	priceMin	priceMax	priceMod	state	city	date
0	ABOHAR(PB)	January	2005	2350	404	493	446	PB	ABOHAR	January-2005
1	ABOHAR(PB)	January	2006	900	487	638	563	PB	ABOHAR	January-2006
2	ABOHAR(PB)	January	2010	790	1283	1592	1460	PB	ABOHAR	January-2010
3	ABOHAR(PB)	January	2011	245	3067	3750	3433	PB	ABOHAR	January-2011
4	ABOHAR(PB)	January	2012	1035	523	686	605	PB	ABOHAR	January-2012

	market	month	year	quantity	priceMin	priceMax	priceMod	state	city	date
722	BANGALORE	January	2010	423649	502	1837	1267	KNT	BANGALORE	2010-01-01
735	BANGALORE	February	2010	316685	495	1429	1011	KNT	BANGALORE	2010-02-01
748	BANGALORE	March	2010	368644	386	895	661	KNT	BANGALORE	2010-03-01
760	BANGALORE	April	2010	404805	287	730	545	KNT	BANGALORE	2010-04-01
772	BANGALORE	May	2010	395519	306	827	564	KNT	BANGALORE	2010-05-01

	city	quantity
6	BANGALORE	6079067
16	DELHI	3508582
39	KOLKATA	3495320
56	PUNE	3326024
67	SOLAPUR	3310419

	year	quantity	priceMin	priceMax	priceMod
count	146.000000	146.000000	146.000000	146.000000	146.000000
mean	2009.589041	523917.424658	557.541096	1313.883562	1043.547945
std	3.522678	302369.030403	381.415623	911.101944	667.509329
min	2004.000000	63824.000000	145.000000	338.000000	320.000000
25%	2007.000000	329690.000000	307.000000	685.000000	551.000000
50%	2010.000000	403530.500000	448.000000	1019.000000	836.000000
75%	2013.000000	661252.000000	653.000000	1636.000000	1325.500000
max	2016.000000	1639032.000000	2377.000000	4698.000000	3430.000000

	market	month	year	quantity	priceMin	priceMax	priceMod	state	city	date
2004-01	BANGALORE	January	2004	227832	916	1066	991	KNT	BANGALORE	2004-01-01
2004-02	BANGALORE	February	2004	225133	741	870	793	KNT	BANGALORE	2004-02-01
2004-03	BANGALORE	March	2004	221952	527	586	556	KNT	BANGALORE	2004-03-01
2004-04	BANGALORE	April	2004	185150	419	518	465	KNT	BANGALORE	2004-04-01
2004-05	BANGALORE	May	2004	137390	400	516	455	KNT	BANGALORE	2004-05-01