3. Refine the Data

"Data is messy"

We will be performing the following operation on our Onion price to refine it

  • Remove e.g. remove redundant data from the data frame
  • Derive e.g. State and City from the market field
  • Parse e.g. extract date from year and month column

Other stuff you may need to do to refine are...

  • Missing e.g. Check for missing or incomplete data
  • Quality e.g. Check for duplicates, accuracy, unusual data
  • Convert e.g. free text to coded value
  • Calculate e.g. percentages, proportion
  • Merge e.g. first and surname for full name
  • Aggregate e.g. rollup by year, cluster by area
  • Filter e.g. exclude based on location
  • Sample e.g. extract a representative data
  • Summary e.g. show summary stats like mean

In [1]:
library(plyr)
library(dplyr)
library(tidyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:plyr':

    arrange, count, desc, failwith, id, mutate, rename, summarise,
    summarize

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union


In [2]:
df <- read.csv('MonthWiseMarketArrivalsAll.csv', stringsAsFactor = FALSE)

In [3]:
str(df)


'data.frame':	10321 obs. of  7 variables:
 $ Market              : chr  "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" ...
 $ Month.Name          : chr  "January" "January" "January" "January" ...
 $ Year                : chr  "2005" "2006" "2010" "2011" ...
 $ Arrival..q.         : int  2350 900 790 245 1035 675 440 1305 1400 1800 ...
 $ Price.Minimum..Rs.q.: chr  "404" "487" "1283" "3067" ...
 $ Price.Maximum..Rs.q.: chr  "493" "638" "1592" "3750" ...
 $ Modal.Price..Rs.q.  : chr  "446" "563" "1460" "3433" ...

In [4]:
dim(df)


Out[4]:
  1. 10321
  2. 7

In [5]:
column_names <- c('market', 'month', 'year', 'quantity', 'priceMin', 'priceMax', 'priceMod')

In [6]:
colnames(df) <- column_names

In [7]:
head(df)


Out[7]:
marketmonthyearquantitypriceMinpriceMaxpriceMod
1ABOHAR(PB)January20052350404493446
2ABOHAR(PB)January2006900487638563
3ABOHAR(PB)January2010790128315921460
4ABOHAR(PB)January2011245306737503433
5ABOHAR(PB)January20121035523686605
6ABOHAR(PB)January2013675132719001605

In [8]:
class(df)


Out[8]:
'data.frame'

In [9]:
str(df)


'data.frame':	10321 obs. of  7 variables:
 $ market  : chr  "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" ...
 $ month   : chr  "January" "January" "January" "January" ...
 $ year    : chr  "2005" "2006" "2010" "2011" ...
 $ quantity: int  2350 900 790 245 1035 675 440 1305 1400 1800 ...
 $ priceMin: chr  "404" "487" "1283" "3067" ...
 $ priceMax: chr  "493" "638" "1592" "3750" ...
 $ priceMod: chr  "446" "563" "1460" "3433" ...

In [10]:
tail(df,1)


Out[10]:
marketmonthyearquantitypriceMinpriceMaxpriceMod
10321Total789858881646(Avg)1211(Avg)983(Avg)

In [11]:
df <- df %>% 
      filter( year != "Total")

In [12]:
tail(df)


Out[12]:
marketmonthyearquantitypriceMinpriceMaxpriceMod
10315YEOLA(MS)December20105758654127131830
10316YEOLA(MS)December2011131326282612526
10317YEOLA(MS)December201220706648513271136
10318YEOLA(MS)December201321588347214271177
10319YEOLA(MS)December201420107744616541456
10320YEOLA(MS)December201522331560914461126

In [13]:
dim(df)


Out[13]:
  1. 10320
  2. 7

In [14]:
# selecting particular columns from the df
head(df %>% 
     select(priceMin,priceMax,priceMod))


Out[14]:
priceMinpriceMaxpriceMod
1404493446
2487638563
3128315921460
4306737503433
5523686605
6132719001605

In [15]:
str(df)


'data.frame':	10320 obs. of  7 variables:
 $ market  : chr  "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" ...
 $ month   : chr  "January" "January" "January" "January" ...
 $ year    : chr  "2005" "2006" "2010" "2011" ...
 $ quantity: int  2350 900 790 245 1035 675 440 1305 1400 1800 ...
 $ priceMin: chr  "404" "487" "1283" "3067" ...
 $ priceMax: chr  "493" "638" "1592" "3750" ...
 $ priceMod: chr  "446" "563" "1460" "3433" ...

In [16]:
#we see that all of them are character, we know that they are integers, so lets convert
df$quantity <- as.numeric(df$quantity)
df$year     <- as.numeric(df$year)
df$priceMin <- as.numeric(df$priceMin)
df$priceMax <- as.numeric(df$priceMax)
df$priceMod <- as.numeric(df$priceMod)

In [17]:
str(df)


'data.frame':	10320 obs. of  7 variables:
 $ market  : chr  "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" ...
 $ month   : chr  "January" "January" "January" "January" ...
 $ year    : num  2005 2006 2010 2011 2012 ...
 $ quantity: num  2350 900 790 245 1035 ...
 $ priceMin: num  404 487 1283 3067 523 ...
 $ priceMax: num  493 638 1592 3750 686 ...
 $ priceMod: num  446 563 1460 3433 605 ...

In [18]:
head(df)


Out[18]:
marketmonthyearquantitypriceMinpriceMaxpriceMod
1ABOHAR(PB)January20052350404493446
2ABOHAR(PB)January2006900487638563
3ABOHAR(PB)January2010790128315921460
4ABOHAR(PB)January2011245306737503433
5ABOHAR(PB)January20121035523686605
6ABOHAR(PB)January2013675132719001605

In [19]:
summary(df)


Out[19]:
    market             month                year         quantity      
 Length:10320       Length:10320       Min.   :1996   Min.   :     15  
 Class :character   Class :character   1st Qu.:2006   1st Qu.:   8862  
 Mode  :character   Mode  :character   Median :2009   Median :  27414  
                                       Mean   :2009   Mean   :  76537  
                                       3rd Qu.:2013   3rd Qu.:  87875  
                                       Max.   :2016   Max.   :1639032  
    priceMin         priceMax       priceMod     
 Min.   :  16.0   Min.   : 145   Min.   :  80.0  
 1st Qu.: 210.0   1st Qu.: 560   1st Qu.: 450.0  
 Median : 441.0   Median : 925   Median : 747.0  
 Mean   : 646.2   Mean   :1211   Mean   : 982.7  
 3rd Qu.: 827.0   3rd Qu.:1520   3rd Qu.:1240.2  
 Max.   :6000.0   Max.   :8192   Max.   :6400.0  

Extracting the states from market names


In [20]:
library(stringr)

In [21]:
# we see that city and state are combined, lets split them up
df$market[1]
strsplit(as.character(df$market[1]),'\\(')[[1]][2]


Out[21]:
'ABOHAR(PB)'
Out[21]:
'PB)'

In [22]:
head(df)


Out[22]:
marketmonthyearquantitypriceMinpriceMaxpriceMod
1ABOHAR(PB)January20052350404493446
2ABOHAR(PB)January2006900487638563
3ABOHAR(PB)January2010790128315921460
4ABOHAR(PB)January2011245306737503433
5ABOHAR(PB)January20121035523686605
6ABOHAR(PB)January2013675132719001605

In [25]:
head(df)


Out[25]:
marketmonthyearquantitypriceMinpriceMaxpriceMod
1ABOHAR(PB)January20052350404493446
2ABOHAR(PB)January2006900487638563
3ABOHAR(PB)January2010790128315921460
4ABOHAR(PB)January2011245306737503433
5ABOHAR(PB)January20121035523686605
6ABOHAR(PB)January2013675132719001605

In [24]:
tail(df)


Out[24]:
marketmonthyearquantitypriceMinpriceMaxpriceMod
10315YEOLA(MS)December20105758654127131830
10316YEOLA(MS)December2011131326282612526
10317YEOLA(MS)December201220706648513271136
10318YEOLA(MS)December201321588347214271177
10319YEOLA(MS)December201420107744616541456
10320YEOLA(MS)December201522331560914461126

In [26]:
df <- df %>%
      mutate(market1 = market) %>%
      separate(market1, c("city", "state"), sep = "\\(")


Warning message:
: Too many values at 157 locations: 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, ...Warning message:
: Too few values at 1880 locations: 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, ...

In [27]:
head(df)


Out[27]:
marketmonthyearquantitypriceMinpriceMaxpriceModcitystate
1ABOHAR(PB)January20052350404493446ABOHARPB)
2ABOHAR(PB)January2006900487638563ABOHARPB)
3ABOHAR(PB)January2010790128315921460ABOHARPB)
4ABOHAR(PB)January2011245306737503433ABOHARPB)
5ABOHAR(PB)January20121035523686605ABOHARPB)
6ABOHAR(PB)January2013675132719001605ABOHARPB)

In [28]:
df$state <- df$state %>% str_replace("\\)","")

In [29]:
head(df)


Out[29]:
marketmonthyearquantitypriceMinpriceMaxpriceModcitystate
1ABOHAR(PB)January20052350404493446ABOHARPB
2ABOHAR(PB)January2006900487638563ABOHARPB
3ABOHAR(PB)January2010790128315921460ABOHARPB
4ABOHAR(PB)January2011245306737503433ABOHARPB
5ABOHAR(PB)January20121035523686605ABOHARPB
6ABOHAR(PB)January2013675132719001605ABOHARPB

In [30]:
head(df %>% filter(city == 'BANGALORE'))


Out[30]:
marketmonthyearquantitypriceMinpriceMaxpriceModcitystate
1BANGALOREJanuary20042278329161066991BANGALORENA
2BANGALOREJanuary2005335679470597522BANGALORENA
3BANGALOREJanuary2006412185286617537BANGALORENA
4BANGALOREJanuary20072682685861167942BANGALORENA
5BANGALOREJanuary2008393806174671472BANGALORENA
6BANGALOREJanuary200937438084815541328BANGALORENA

In [31]:
df <- df %>%
      mutate(state = ifelse(is.na(state), market, state))

In [32]:
dim(df)


Out[32]:
  1. 10320
  2. 9

In [33]:
tail(df)


Out[33]:
marketmonthyearquantitypriceMinpriceMaxpriceModcitystate
10315YEOLA(MS)December20105758654127131830YEOLAMS
10316YEOLA(MS)December2011131326282612526YEOLAMS
10317YEOLA(MS)December201220706648513271136YEOLAMS
10318YEOLA(MS)December201321588347214271177YEOLAMS
10319YEOLA(MS)December201420107744616541456YEOLAMS
10320YEOLA(MS)December201522331560914461126YEOLAMS

In [34]:
str(df)


'data.frame':	10320 obs. of  9 variables:
 $ market  : chr  "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" ...
 $ month   : chr  "January" "January" "January" "January" ...
 $ year    : num  2005 2006 2010 2011 2012 ...
 $ quantity: num  2350 900 790 245 1035 ...
 $ priceMin: num  404 487 1283 3067 523 ...
 $ priceMax: num  493 638 1592 3750 686 ...
 $ priceMod: num  446 563 1460 3433 605 ...
 $ city    : chr  "ABOHAR" "ABOHAR" "ABOHAR" "ABOHAR" ...
 $ state   : chr  "PB" "PB" "PB" "PB" ...

In [35]:
unique(df$state)


Out[35]:
  1. 'PB'
  2. 'UP'
  3. 'GUJ'
  4. 'MS'
  5. 'RAJ'
  6. 'BANGALORE'
  7. 'KNT'
  8. 'BHOPAL'
  9. 'OR'
  10. 'BHR'
  11. 'WB'
  12. 'CHANDIGARH'
  13. 'CHENNAI'
  14. 'TN '
  15. 'UTT'
  16. 'DELHI'
  17. 'MP'
  18. 'TN'
  19. 'GUWAHATI'
  20. 'HYDERABAD'
  21. 'JAIPUR'
  22. 'WHITE'
  23. 'JAMMU'
  24. 'HR'
  25. 'KOLKATA'
  26. 'AP'
  27. 'LUCKNOW'
  28. 'MUMBAI'
  29. 'NAGPUR'
  30. 'KER'
  31. 'PATNA'
  32. 'CHGARH'
  33. 'JH'
  34. 'SHIMLA'
  35. 'SRINAGAR'
  36. 'TRIVENDRUM'

In [36]:
changeList <- c('BANGALORE' = 'KNT', 'BHOPAL' = 'MP', 'CHANDIGARH' = 'HR', 'CHENNAI' = 'TN', 'UTT' = 'UP', 
               'GUWAHATI' = 'ASM', 'HYDERABAD' = 'AP', 'DELHI' = 'DEL', 'JAIPUR' = 'RAJ', 'WHITE' = 'MS', 'JAMMU' = 'JK', 
                'KOLKATA' = 'WB', 'LUCKNOW' = 'UP', 'MUMBAI' = 'MH','NAGPUR' = 'MH', 'PATNA' = 'BHR', 
                'CHGARH' = 'HR', 'SHIMLA' = 'HP', 'SRINAGAR' = 'JK', 'TRIVENDRUM'= 'KEL')

In [37]:
df$state <- revalue(df$state, changeList)

In [38]:
unique(df$state)


Out[38]:
  1. 'PB'
  2. 'UP'
  3. 'GUJ'
  4. 'MS'
  5. 'RAJ'
  6. 'KNT'
  7. 'MP'
  8. 'OR'
  9. 'BHR'
  10. 'WB'
  11. 'HR'
  12. 'TN'
  13. 'TN '
  14. 'DEL'
  15. 'ASM'
  16. 'AP'
  17. 'JK'
  18. 'MH'
  19. 'KER'
  20. 'JH'
  21. 'HP'
  22. 'KEL'

Date Fix


In [39]:
df <- df %>%
      mutate(date = paste(month, year, sep="-"))

In [40]:
head(df)


Out[40]:
marketmonthyearquantitypriceMinpriceMaxpriceModcitystatedate
1ABOHAR(PB)January20052350404493446ABOHARPBJanuary-2005
2ABOHAR(PB)January2006900487638563ABOHARPBJanuary-2006
3ABOHAR(PB)January2010790128315921460ABOHARPBJanuary-2010
4ABOHAR(PB)January2011245306737503433ABOHARPBJanuary-2011
5ABOHAR(PB)January20121035523686605ABOHARPBJanuary-2012
6ABOHAR(PB)January2013675132719001605ABOHARPBJanuary-2013

In [41]:
df$date = as.Date(paste("01-",df$date,sep=""), "%d-%B-%Y")

In [42]:
head(df)


Out[42]:
marketmonthyearquantitypriceMinpriceMaxpriceModcitystatedate
1ABOHAR(PB)January20052350404493446ABOHARPB2005-01-01
2ABOHAR(PB)January2006900487638563ABOHARPB2006-01-01
3ABOHAR(PB)January2010790128315921460ABOHARPB2010-01-01
4ABOHAR(PB)January2011245306737503433ABOHARPB2011-01-01
5ABOHAR(PB)January20121035523686605ABOHARPB2012-01-01
6ABOHAR(PB)January2013675132719001605ABOHARPB2013-01-01

In [43]:
str(df)


'data.frame':	10320 obs. of  10 variables:
 $ market  : chr  "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" "ABOHAR(PB)" ...
 $ month   : chr  "January" "January" "January" "January" ...
 $ year    : num  2005 2006 2010 2011 2012 ...
 $ quantity: num  2350 900 790 245 1035 ...
 $ priceMin: num  404 487 1283 3067 523 ...
 $ priceMax: num  493 638 1592 3750 686 ...
 $ priceMod: num  446 563 1460 3433 605 ...
 $ city    : chr  "ABOHAR" "ABOHAR" "ABOHAR" "ABOHAR" ...
 $ state   : chr  "PB" "PB" "PB" "PB" ...
 $ date    : Date, format: "2005-01-01" "2006-01-01" ...

In [44]:
write.csv(df, 'MonthWiseMarketArrivals_Clean.csv', row.names = FALSE)

SQL Works too

  • Loading SQL
  • Basic verbs
  • Laziness
  • SQL Query

SQL helps you strive towards data independence. Data independence is defined as:

... the independence of application programs and terminal activities from the growth in data types and changes in data representations."

Let us just use sqlite for today, as it is the most simple to setup


In [45]:
install.packages('RSQLite',repos='http://ftp.iitm.ac.in/cran/')


Installing package into '/usr/local/lib/R/3.2/site-library'
(as 'lib' is unspecified)
The downloaded source packages are in
	'/private/var/folders/04/r20f0_4n2m7cv23lr8t97wp00000gn/T/RtmpPoJ2Cx/downloaded_packages'

In [46]:
library(RSQLite)


Loading required package: DBI

Loading data into SQL


In [47]:
my_db <- src_sqlite("my_db.sqlite3", create = TRUE)

Now let us load the csv into the database


In [48]:
df <- read.csv('MonthWiseMarketArrivalsAll.csv', stringsAsFactor = FALSE)

In [49]:
column_names <- c('market', 'month', 'year', 'quantity', 'priceMin', 'priceMax', 'priceMod')

In [50]:
colnames(df) <- column_names

In [51]:
onion_sqlite <- copy_to(dest=my_db,df=df,name='onion',temporary =  FALSE)

In [52]:
tbl(my_db,sql("SELECT * from onion limit 3"))


Out[52]:
Source: sqlite 3.8.6 [my_db.sqlite3]
From: <derived table> [?? x 7]

       market   month  year quantity priceMin priceMax priceMod
        (chr)   (chr) (chr)    (int)    (chr)    (chr)    (chr)
1  ABOHAR(PB) January  2005     2350      404      493      446
2  ABOHAR(PB) January  2006      900      487      638      563
3  ABOHAR(PB) January  2010      790     1283     1592     1460
..        ...     ...   ...      ...      ...      ...      ...

Basic verbs

  • select
  • filter
  • arrange
  • mutate
  • summarize

In [53]:
#let us select entries from the table
select(onion_sqlite,market,month)


Out[53]:
Source: sqlite 3.8.6 [my_db.sqlite3]
From: onion [10,321 x 2]

       market    month
        (chr)    (chr)
1  ABOHAR(PB)  January
2  ABOHAR(PB)  January
3  ABOHAR(PB)  January
4  ABOHAR(PB)  January
5  ABOHAR(PB)  January
6  ABOHAR(PB)  January
7  ABOHAR(PB)  January
8  ABOHAR(PB)  January
9  ABOHAR(PB) February
10 ABOHAR(PB) February
..        ...      ...

In [54]:
# now let us select entries which belong to the Market ABOHAR(PB)
filter(onion_sqlite, market =='ABOHAR(PB)')


Out[54]:
Source: sqlite 3.8.6 [my_db.sqlite3]
From: onion [90 x 7]
Filter: market == "ABOHAR(PB)" 

       market    month  year quantity priceMin priceMax priceMod
        (chr)    (chr) (chr)    (int)    (chr)    (chr)    (chr)
1  ABOHAR(PB)  January  2005     2350      404      493      446
2  ABOHAR(PB)  January  2006      900      487      638      563
3  ABOHAR(PB)  January  2010      790     1283     1592     1460
4  ABOHAR(PB)  January  2011      245     3067     3750     3433
5  ABOHAR(PB)  January  2012     1035      523      686      605
6  ABOHAR(PB)  January  2013      675     1327     1900     1605
7  ABOHAR(PB)  January  2014      440     1025     1481     1256
8  ABOHAR(PB)  January  2015     1305     1309     1858     1613
9  ABOHAR(PB) February  2005     1400      286      365      324
10 ABOHAR(PB) February  2006     1800      343      411      380
..        ...      ...   ...      ...      ...      ...      ...

In [55]:
# now let us sort on year for ABOHAR(FB)
filter(onion_sqlite, market =='ABOHAR(PB)') %>% arrange(year)


Out[55]:
Source: sqlite 3.8.6 [my_db.sqlite3]
From: onion [90 x 7]
Filter: market == "ABOHAR(PB)" 
Arrange: year 

       market     month  year quantity priceMin priceMax priceMod
        (chr)     (chr) (chr)    (int)    (chr)    (chr)    (chr)
1  ABOHAR(PB)   January  2005     2350      404      493      446
2  ABOHAR(PB)  February  2005     1400      286      365      324
3  ABOHAR(PB)     March  2005      900      281      357      322
4  ABOHAR(PB)       May  2005      375      390      550      475
5  ABOHAR(PB)      June  2005     1250      269      368      318
6  ABOHAR(PB)      July  2005      825      319      387      348
7  ABOHAR(PB)    August  2005     1400      457      543      499
8  ABOHAR(PB) September  2005     2525      896     1048      963
9  ABOHAR(PB)   October  2005      150      783     1370     1217
10 ABOHAR(PB)  November  2005      100      600      800      700
..        ...       ...   ...      ...      ...      ...      ...

In [56]:
#lets now create a new column where generate priceMax - priceMin
filter(onion_sqlite, market =='ABOHAR(PB)') %>% mutate(diff = priceMax - priceMin)


Out[56]:
Source: sqlite 3.8.6 [my_db.sqlite3]
From: onion [90 x 8]
Filter: market == "ABOHAR(PB)" 

       market    month  year quantity priceMin priceMax priceMod  diff
        (chr)    (chr) (chr)    (int)    (chr)    (chr)    (chr) (int)
1  ABOHAR(PB)  January  2005     2350      404      493      446    89
2  ABOHAR(PB)  January  2006      900      487      638      563   151
3  ABOHAR(PB)  January  2010      790     1283     1592     1460   309
4  ABOHAR(PB)  January  2011      245     3067     3750     3433   683
5  ABOHAR(PB)  January  2012     1035      523      686      605   163
6  ABOHAR(PB)  January  2013      675     1327     1900     1605   573
7  ABOHAR(PB)  January  2014      440     1025     1481     1256   456
8  ABOHAR(PB)  January  2015     1305     1309     1858     1613   549
9  ABOHAR(PB) February  2005     1400      286      365      324    79
10 ABOHAR(PB) February  2006     1800      343      411      380    68
..        ...      ...   ...      ...      ...      ...      ...   ...

In [57]:
# now let us generate avg of diff
filter(onion_sqlite, market =='ABOHAR(PB)') %>% 
mutate(diff = priceMax - priceMin) %>% 
summarise(meanDiff = mean(diff))


Out[57]:
Source: sqlite 3.8.6 [my_db.sqlite3]
From: <derived table> [?? x 1]

   meanDiff
      (dbl)
1  419.5778
..      ...

Laziness

When working with databases, dplyr tries to be as lazy as possible. It’s lazy in two ways:

It never pulls data back to R unless you explicitly ask for it.

It delays doing any work until the last possible minute, collecting together everything you want to do then sending that to the database in one step.

In [58]:
m1 <- filter(onion_sqlite, market =='ABOHAR(PB)')

In [59]:
m2 <- mutate(m1,diff = priceMax - priceMin)

In [60]:
m3 <- summarise(m2,meanDiff = mean(diff))

In [61]:
m3


Out[61]:
Source: sqlite 3.8.6 [my_db.sqlite3]
From: <derived table> [?? x 1]

   meanDiff
      (dbl)
1  419.5778
..      ...

In [62]:
explain(m3)


<SQL>
SELECT "meanDiff"
FROM (SELECT AVG("diff") AS "meanDiff"
FROM (SELECT "market", "month", "year", "quantity", "priceMin", "priceMax", "priceMod", "priceMax" - "priceMin" AS "diff"
FROM "onion"
WHERE "market" = 'ABOHAR(PB)') AS "zzz4") AS "zzz5"


<PLAN>
  selectid order from           detail
1        0     0    0 SCAN TABLE onion

SQL Hands on


In [63]:
tbl(my_db,sql("SELECT * from onion limit 3"))


Out[63]:
Source: sqlite 3.8.6 [my_db.sqlite3]
From: <derived table> [?? x 7]

       market   month  year quantity priceMin priceMax priceMod
        (chr)   (chr) (chr)    (int)    (chr)    (chr)    (chr)
1  ABOHAR(PB) January  2005     2350      404      493      446
2  ABOHAR(PB) January  2006      900      487      638      563
3  ABOHAR(PB) January  2010      790     1283     1592     1460
..        ...     ...   ...      ...      ...      ...      ...

In [ ]: