SQL vs R Exercise

Use the Adventure Works dataset to create the following reports. The dataset is availablt for download in it's original format at https://msdn.microsoft.com/en-us/library/hh403424.aspx (follow instructions to download).

Note

  1. We only use base functions in R. (With the exception of loading the Excel spreadsheets.)
  2. NULL values may be converted to '' (empty strings).
  3. Dates are loaded as factors and need to be converted to date values. We perform the conversion when it's needed.

Task

  1. write the R expression to produce a table as described in the problem statements.
  2. The SQL expression may give you a hint. It also allows you to see both systems side-by-side.
  3. If you don't know SQL just ignore the SQL code.

In [1]:
library(xlsx)


Loading required package: rJava
Loading required package: xlsxjars

In [5]:
system.time (
    Employees <- read.xlsx('/home/data/AdventureWorks/Employees.xls', 1)
)


Out[5]:
   user  system elapsed 
  4.914   0.022   4.750 

In [13]:
system.time (
    Territory <- read.xlsx('/home/data/AdventureWorks/SalesTerritory.xls', 1)
)


Out[13]:
   user  system elapsed 
  0.075   0.001   0.074 

In [15]:
system.time (
    Customers <- read.xlsx('/home/data/AdventureWorks/Customers.xls', 1)
)


Out[15]:
   user  system elapsed 
  0.099   0.000   0.093 

In [17]:
system.time (
    Orders <- read.xlsx('/home/data/AdventureWorks/ItemsOrdered.xls', 1)
)


Out[17]:
   user  system elapsed 
  0.123   0.001   0.122 

Filtering (with)

1. Provide a list of employees that are married.

SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.MaritalStatus = 'M';

In [22]:
Employees[Employees$MaritalStatus == 'M', c('EmployeeID', 'FirstName', 'LastName', 'MaritalStatus')]


Out[22]:
EmployeeIDFirstNameLastNameMaritalStatus
1259 Ben MillerM
2278 GarrettVargas M
3204 Gabe MaresM
478 ReubenD'sa M
5255 GordonHee M
1110 MichaelRaheem M
13175 ReedKochM
15144 Paul SinghM
16192 BrendaDiaz M
19203 Ken MyerM
20245 Barbara MorelandM
25163 Alex NaybergM
26118 Don HallM
27176 David LawrenceM
28147 Sandra Reátegui AlayoM
29138 SamanthaSmith M
3095 Jim ScardelisM
31231 Jo BerryM
3286 Ryan CornelsenM
34214 Andreas BerglundM
3572 Steven SelikoffM
37107 ChristopherHill M
38269 Dan BaconM
39187 YvonneMcKay M
4112 ThierryD'Hers M
4330 BrittaSimon M
44198 LorraineNay M
4631 MargieShoop M
493 Roberto TamburelloM
50199 Paula NartkerM
228136 SylvesterValdez M
23235 Brandon HeidepriemM
233228 Christian KleinermanM
234223 SairajUddin M
235100 LolanSong M
240200 FrankLee M
241279 Tsvi ReiterM
24473 CarolePolandM
25021 Terry EminhizerM
253254 FukikoOgisu M
25653 DianeGlimpM
257282 José SaraivaM
259209 KathieFlood M
26050 SidneyHiga M
26270 DavidOrtizM
265140 Prasanna SamarawickramaM
2675 Gail EricksonM
268252 ArvindRao M
270233 Magnus HedlundM
27183 PatrickCook M
273241 DavidLiu M
276239 Mindy MartinM
27938 Kim AbercrombieM
280154 RaymondSam M
281114 Mindaugas KrapauskasM
282123 VamsiKuppaM
284101 Houman PournassehM
285221 Chris NorredM
289289JaePakM
291999 ChadwickSmith M

2a. Show me a list of employees that have a lastname that begins with "R".

SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.LastName LIKE 'R%';

In [28]:
Employees[grep('^R', Employees$LastName), c('EmployeeID', 'FirstName', 'LastName')]


Out[28]:
EmployeeIDFirstNameLastName
10124 Kim Ralls
1110 MichaelRaheem
17166 Jack Richins
28147 Sandra Reátegui Alayo
45133 Michael Rothkugel
9644 Simon Rapier
10065 Randy Reeves
129145 CynthiaRandall
132149 AndyRuth
16774 Bjorn Rettig
18171 MichaelRay
189190 Robert Rounthwaite
220157 Linda Randall
241279 Tsvi Reiter
268252 ArvindRao

2b. Show me a list of employees that have a lastname that ends with "r"

SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.LastName LIKE '%r';

In [29]:
Employees[grep('r$', Employees$LastName), c('EmployeeID', 'FirstName', 'LastName')]


Out[29]:
EmployeeIDFirstNameLastName
1259 Ben Miller
9161 Kirk Koenigsbauer
19203 Ken Myer
50199 Paula Nartker
5441 BryanBaker
57104 Mary Baker
65225 Alan Brewer
76156 Lane Sacksteder
9644 Simon Rapier
9896 ElizabethKeyser
99229 Lori Penor
12839 Ed Dudenhoefer
13042 James Kramer
166117 Chad Niswonger
16957 Frank Miller
173105 KevinHomer
174253 Linda Meisner
1977 Dylan Miller
201183 BarbaraDecker
204142 OlindaTurner
21794 RussellHunter
223273 Brian Welcker
230216 Sean Alexander
241279 Tsvi Reiter
25021 Terry Eminhizer
255262 David Barber
28329 Mark McArthur

2c. Provide a list of employees that have a hyphenated lastname.

SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.LastName LIKE '%-%';

In [32]:
Employees[grep('-', Employees$LastName), c('EmployeeID', 'FirstName', 'LastName')]


Out[32]:
EmployeeIDFirstNameLastName
115284 Tete Mensa-Annan
135180 Katie McAskill-White
177280 Pamela Ansman-Wolfe

3a. Provide a list of employees that are on salary and have more than 35 vacation hours left.

SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.VacationHours ,e.SalariedFlag FROM dbo.Employees AS e WHERE (e.SalariedFlag = 1) AND (e.VacationHours > 35);

In [36]:
Employees[(Employees$SalariedFlag==1) & (Employees$VacationHours>35), c('EmployeeID', 'FirstName', 'LastName', 'SalariedFlag', 'VacationHours')]


Out[36]:
EmployeeIDFirstNameLastNameSalariedFlagVacationHours
7270 FrançoisAjenstat1 67
12248 Mike Seamans1 59
20245 Barbara Moreland1 58
38269 Dan Bacon1 72
58263 Jean Trenary1 65
66212 PengWu 1 81
67227 Gary Altman1 86
94286 Lynn Tsoflias1 36
10226 PeterKrebs1 43
10716 David Bradley1 40
109268 Ramesh Meyyappan1 73
115284 Tete Mensa-Annan1 39
122235 Paula Barreto de Mattos1 54
1241 Ken Sánchez1 99
126249 WendyKahn 1 55
1318 Diane Margheim1 62
141211 Hazem Abolrous1 80
146267 KarenBerg 1 74
148250 SheelaWord 1 49
152121 Pilar Ackerman1 93
168271 Dan Wilson1 66
170272 JanainaBueno 1 71
1977 Dylan Miller1 61
211264 StephanieConroy 1 68
2249 Gigi Matthew1 63
233228 Christian Kleinerman1 92
236275 MichaelBlythe 1 38
273241 DavidLiu 1 57
27425 James Hamilton1 64
289289JaePak1 37

3b. Show the same as above but limit it to American employees.

SELECT DISTINCT CountryName FROM dbo.Employees; SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.VacationHours ,e.SalariedFlag ,e.CountryName FROM dbo.Employees AS e WHERE (e.SalariedFlag = 1) AND (e.VacationHours > 5) AND e.CountryName = 'United States';

In [37]:
Employees[(Employees$SalariedFlag==1) & (Employees$VacationHours>35) & (Employees$CountryName=='United States')
          , c('EmployeeID', 'FirstName', 'LastName', 'SalariedFlag', 'VacationHours')]


Out[37]:
EmployeeIDFirstNameLastNameSalariedFlagVacationHours
7270 FrançoisAjenstat1 67
12248 Mike Seamans1 59
20245 Barbara Moreland1 58
38269 Dan Bacon1 72
58263 Jean Trenary1 65
66212 PengWu 1 81
67227 Gary Altman1 86
10226 PeterKrebs1 43
10716 David Bradley1 40
109268 Ramesh Meyyappan1 73
115284 Tete Mensa-Annan1 39
122235 Paula Barreto de Mattos1 54
1241 Ken Sánchez1 99
126249 WendyKahn 1 55
1318 Diane Margheim1 62
141211 Hazem Abolrous1 80
146267 KarenBerg 1 74
148250 SheelaWord 1 49
152121 Pilar Ackerman1 93
168271 Dan Wilson1 66
170272 JanainaBueno 1 71
1977 Dylan Miller1 61
211264 StephanieConroy 1 68
2249 Gigi Matthew1 63
233228 Christian Kleinerman1 92
236275 MichaelBlythe 1 38
273241 DavidLiu 1 57
27425 James Hamilton1 64

3c. Show the same as above but limit it to non-American employees.

SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.VacationHours ,e.SalariedFlag ,e.CountryName FROM dbo.Employees AS e WHERE (e.SalariedFlag = 1) AND (e.VacationHours > 5) AND e.CountryName != 'United States';

In [38]:
Employees[(Employees$SalariedFlag==1) & (Employees$VacationHours>35) & (Employees$CountryName!='United States')
          , c('EmployeeID', 'FirstName', 'LastName', 'SalariedFlag', 'VacationHours')]


Out[38]:
EmployeeIDFirstNameLastNameSalariedFlagVacationHours
94286 Lynn Tsoflias1 36
289289JaePak1 37

4a. List the married employees with more than 35 vacation hours, only ones living in Washington state.

SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.VacationHours ,e.SalariedFlag ,e.StateProvinceName ,e.CountryName FROM dbo.Employees AS e WHERE e.MaritalStatus = 'M' AND e.VacationHours > 35 AND e.StateProvinceName = 'Washington' ;

In [39]:
Employees[(Employees$MaritalStatus=='M') & (Employees$VacationHours>35) & (Employees$StateProvinceName=='Washington')
          , c('EmployeeID', 'FirstName', 'LastName', 'MaritalStatus', 'VacationHours', 'StateProvinceName')]


Out[39]:
EmployeeIDFirstNameLastNameMaritalStatusVacationHoursStateProvinceName
1259 Ben Miller M 55 Washington
3204 Gabe Mares M 57 Washington
478 Reuben D'sa M 72 Washington
5255 Gordon Hee M 52 Washington
16192 Brenda Diaz M 71 Washington
20245 Barbara Moreland M 58 Washington
25163 Alex Nayberg M 77 Washington
26118 Don Hall M 88 Washington
28147 Sandra Reátegui AlayoM 37 Washington
31231 Jo Berry M 91 Washington
3286 Ryan Cornelsen M 59 Washington
34214 Andreas Berglund M 84 Washington
3572 Steven Selikoff M 41 Washington
37107 ChristopherHill M 91 Washington
38269 Dan Bacon M 72 Washington
39187 Yvonne McKay M 79 Washington
44198 Lorraine Nay M 49 Washington
50199 Paula Nartker M 54 Washington
5148 Ruth EllerbrockM 83 Washington
52126 Jimmy Bischoff M 96 Washington
53230 Stuart Macrae M 88 Washington
56226 Brian LaMee M 48 Washington
57104 Mary Baker M 94 Washington
65225 Alan Brewer M 47 Washington
66212 Peng Wu M 81 Washington
67227 Gary Altman M 86 Washington
7062 John Campbell M 81 Washington
83220 Karen Berge M 79 Washington
8524 Jill Williams M 47 Washington
86132 Nicole Holliday M 67 Washington
175261 Reinout Hillmann M 51 Washington
178260 Annette Hill M 50 Washington
186165 Chris Preston M 76 Washington
188224 William Vong M 45 Washington
191247 Janet SheperdigianM 64 Washington
19276 Carol Philips M 45 Washington
19520 Wanida Benshoof M 41 Washington
1977 Dylan Miller M 61 Washington
20387 Cristian PetculescuM 67 Washington
20845 Thomas Michaels M 37 Washington
210208 Scott Gode M 86 Washington
218242 Deborah Poe M 60 Washington
2249 Gigi Matthew M 63 Washington
233228 Christian KleinermanM 92 Washington
234223 Sairaj Uddin M 46 Washington
240200 Frank Lee M 56 Washington
24473 Carole Poland M 42 Washington
25021 Terry Eminhizer M 44 Washington
253254 Fukiko Ogisu M 57 Washington
25653 Diane Glimp M 87 Washington
259209 Kathie Flood M 87 Washington
26050 Sidney Higa M 84 Washington
268252 Arvind Rao M 60 Washington
270233 Magnus Hedlund M 87 Washington
27183 Patrick Cook M 61 Washington
273241 David Liu M 57 Washington
276239 Mindy Martin M 51 Washington
281114 Mindaugas KrapauskasM 97 Washington
282123 Vamsi Kuppa M 95 Washington
285221 Chris Norred M 75 Washington

4b. Change the logic to include anyone who meets any of the 3 conditions (i.e., people who are either married, live in Washington state, or have more than 35 vacation hours left)

SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.MaritalStatus ,e.VacationHours ,e.SalariedFlag ,e.StateProvinceName ,e.CountryName FROM dbo.Employees AS e WHERE e.MaritalStatus = 'M' OR e.VacationHours > 35 OR e.StateProvinceName = 'Washington' ;

In [40]:
Employees[(Employees$MaritalStatus=='M') | (Employees$VacationHours>35) | (Employees$StateProvinceName=='Washington')
          , c('EmployeeID', 'FirstName', 'LastName', 'MaritalStatus', 'VacationHours', 'StateProvinceName')]


Out[40]:
EmployeeIDFirstNameLastNameMaritalStatusVacationHoursStateProvinceName
1259 Ben Miller M 55 Washington
2278 GarrettVargas M 33 Alberta
3204 Gabe Mares M 57 Washington
478 Reuben D'sa M 72 Washington
5255 Gordon Hee M 52 Washington
666 Karan Khanna S 28 Washington
7270 François Ajenstat S 67 Washington
822 Sariya HarnpadoungsatayaS 45 Washington
9161 Kirk KoenigsbauerS 74 Washington
10124 Kim Ralls S 98 Washington
1110 Michael Raheem M 16 California
12248 Mike Seamans S 59 Washington
13175 Reed Koch M 23 Washington
14155 Fadi Fakhouri S 16 Washington
15144 Paul Singh M 6 Washington
16192 Brenda Diaz M 71 Washington
17166 Jack Richins S 68 Washington
18112 John Evans S 91 Washington
19203 Ken Myer M 51 Alberta
20245 Barbara Moreland M 58 Washington
21257 Eric Kurjan S 54 Washington
22244 Bryan Walton S 62 Alberta
2369 Steve Masters S 31 Washington
24210 Belinda Newman S 83 Washington
25163 Alex Nayberg M 77 Washington
26118 Don Hall M 88 Washington
27176 David Lawrence M 29 Washington
28147 Sandra Reátegui AlayoM 37 Washington
29138 Samantha Smith M 7 Washington
3095 Jim Scardelis M 1 Washington
26050 Sidney Higa M 84 Washington
261222 A. Scott Wright S 44 Washington
26270 David Ortiz M 33 Washington
263219 Sean Chai S 78 Washington
264148 Jason Watters S 38 Washington
265140 Prasanna SamarawickramaM 5 Washington
2675 Gail Erickson M 5 Washington
268252 Arvind Rao M 60 Washington
269266 Peter Connelly S 69 Washington
270233 Magnus Hedlund M 87 Washington
27183 Patrick Cook M 61 Washington
272251 Mikael Sandberg S 59 Washington
273241 David Liu M 57 Washington
27425 James Hamilton S 64 Washington
275108 Jinghao Liu S 77 Washington
276239 Mindy Martin M 51 Washington
27727 Jo Brown S 80 Washington
278186 Shane Kim S 75 Washington
27938 Kim AbercrombieM 24 Washington
280154 Raymond Sam M 10 Washington
281114 Mindaugas KrapauskasM 97 Washington
282123 Vamsi Kuppa M 95 Washington
28329 Mark McArthur S 19 Washington
284101 Houman PournassehM 3 Washington
285221 Chris Norred M 75 Washington
28618 John Wood S 48 Washington
28746 Eugene Kogan S 36 Washington
288202 Tawana Nusbaum S 50 Washington
289289 Jae Pak M 37 England
291999 ChadwickSmith M 55 Georgia

4c. Show the same as above, but only for Production Technicians

SELECT DISTINCT JobTitle FROM dbo.Employees; --- look at job titles SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.MaritalStatus ,e.JobTitle ,e.VacationHours ,e.SalariedFlag ,e.StateProvinceName ,e.CountryName FROM dbo.Employees AS e WHERE e.JobTitle LIKE 'Production Technician%' AND (e.MaritalStatus = 'M' OR e.VacationHours > 35 OR e.StateProvinceName = 'Washington') ;

In [53]:
Employees[grepl('Production Technician', Employees$JobTitle) &
    (
        (Employees$MaritalStatus=='M')
      | (Employees$VacationHours>35)
      | (Employees$StateProvinceName=='Washington')
    )
    , c('EmployeeID', 'FirstName', 'LastName', 'JobTitle', 'MaritalStatus', 'VacationHours', 'StateProvinceName')]


Out[53]:
EmployeeIDFirstNameLastNameJobTitleMaritalStatusVacationHoursStateProvinceName
3204 Gabe Mares Production Technician - WC40M 57 Washington
666 Karan Khanna Production Technician - WC60S 28 Washington
9161 Kirk Koenigsbauer Production Technician - WC45S 74 Washington
13175 Reed Koch Production Technician - WC30M 23 Washington
14155 Fadi Fakhouri Production Technician - WC20S 16 Washington
15144 Paul Singh Production Technician - WC20M 6 Washington
18112 John Evans Production Technician - WC50S 91 Washington
19203 Ken Myer Production Technician - WC40M 51 Alberta
2369 Steve Masters Production Technician - WC60S 31 Washington
24210 Belinda Newman Production Technician - WC45S 83 Washington
25163 Alex Nayberg Production Technician - WC45M 77 Washington
26118 Don Hall Production Technician - WC50M 88 Washington
27176 David Lawrence Production Technician - WC30M 29 Washington
28147 Sandra Reátegui Alayo Production Technician - WC30M 37 Washington
29138 Samantha Smith Production Technician - WC20M 7 Washington
3095 Jim Scardelis Production Technician - WC50M 1 Washington
3286 Ryan Cornelsen Production Technician - WC40M 59 Washington
3572 Steven Selikoff Production Technician - WC30M 41 Washington
36162 Laura Steele Production Technician - WC45S 75 Washington
37107 Christopher Hill Production Technician - WC10M 91 Washington
39187 Yvonne McKay Production Technician - WC45M 79 Washington
4075 Michiko Osada Production Technician - WC30S 44 Washington
42116 Michael Patten Production Technician - WC50S 98 Washington
4330 Britta Simon Production Technician - WC60M 14 Washington
44198 Lorraine Nay Production Technician - WC40M 49 Washington
45133 Michael Rothkugel Production Technician - WC40S 70 Washington
4631 Margie Shoop Production Technician - WC60M 18 Washington
47168 Garrett Young Production Technician - WC30S 34 Washington
50199 Paula Nartker Production Technician - WC40M 54 Washington
5148 Ruth Ellerbrock Production Technician - WC10M 83 Washington
235100 Lolan Song Production Technician - WC50M 7 Washington
23785 Brian Goldstein Production Technician - WC40S 63 Washington
23861 Diane Tibbott Production Technician - WC50S 8 Washington
239106 John Kane Production Technician - WC10S 92 Washington
240200 Frank Lee Production Technician - WC40M 56 Washington
242146 Jian Shuo Wang Production Technician - WC30S 36 Washington
243103 Ebru Ersan Production Technician - WC10S 93 Washington
24473 Carole Poland Production Technician - WC30M 42 Washington
245185 Stefen Hesse Production Technician - WC20S 19 Washington
24637 Chris Okelberry Production Technician - WC60S 16 Washington
248179 Jan Miksovsky Production Technician - WC30S 30 Washington
24991 Kimberly Zimmerman Production Technician - WC10S 95 Washington
251207 Greg Alderson Production Technician - WC45S 85 Washington
25282 Jack Creasey Production Technician - WC40S 62 Washington
25467 Jay Adams Production Technician - WC60S 32 Washington
25653 Diane Glimp Production Technician - WC10M 87 Washington
258169 Susan Metters Production Technician - WC30S 31 Washington
259209 Kathie Flood Production Technician - WC45M 87 Washington
26050 Sidney Higa Production Technician - WC10M 84 Washington
26270 David Ortiz Production Technician - WC60M 33 Washington
264148 Jason Watters Production Technician - WC30S 38 Washington
265140 Prasanna Samarawickrama Production Technician - WC20M 5 Washington
27183 Patrick Cook Production Technician - WC40M 61 Washington
27938 Kim Abercrombie Production Technician - WC60M 24 Washington
280154 Raymond Sam Production Technician - WC20M 10 Washington
281114 Mindaugas Krapauskas Production Technician - WC50M 97 Washington
28329 Mark McArthur Production Technician - WC60S 19 Washington
284101 Houman Pournasseh Production Technician - WC50M 3 Washington
28746 Eugene Kogan Production Technician - WC60S 36 Washington
288202 Tawana Nusbaum Production Technician - WC40S 50 Washington

5a. List all employees living in Redmond, Seattle, and Bellevue, showing EmployeeID, FirstName, LastName, and City. Sort the list alphabetically by city.

SELECT e.EmployeeID, e.FirstName, e.LastName, e.City FROM dbo.Employees AS e WHERE e.City in ('Redmond', 'Seattle', 'Bellevue') ORDER BY e.City ;

In [65]:
Employees[order('City')]


Out[65]:
EmployeeID
1259
2278
3204
478
5255
666
7270
822
9161
10124
1110
12248
13175
14155
15144
16192
17166
18112
19203
20245
21257
22244
2369
24210
25163
26118
27176
28147
29138
3095
26270
263219
264148
265140
266290
2675
268252
269266
270233
27183
272251
273241
27425
275108
276239
27727
278186
27938
280154
281114
282123
28329
284101
285221
28618
28746
288202
289289
290288
291999

In [84]:
df <- Employees[Employees$City %in% c('Redmond', 'Seattle', 'Bellevue')
          , c('EmployeeID', 'FirstName', 'LastName', 'City', 'StateProvinceName')
         ]
df[order(df$City), ]


Out[84]:
EmployeeIDFirstNameLastNameCityStateProvinceName
5255 Gordon Hee Bellevue Washington
15144 Paul Singh Bellevue Washington
29138 Samantha Smith Bellevue Washington
55283 David Campbell Bellevue Washington
74258 Erin Hagens Bellevue Washington
7989 Patrick Wedge Bellevue Washington
8432 Rebecca Laszlo Bellevue Washington
89113 Linda Moschell Bellevue Washington
90196 Shammi Mohamed Bellevue Washington
9799 Nuan Yu Bellevue Washington
10226 Peter Krebs Bellevue Washington
10393 Kok-Ho Loh Bellevue Washington
105151 Rostislav Shabalin Bellevue Washington
112177 Russell King Bellevue Washington
122235 Paula Barreto de MattosBellevue Washington
132149 Andy Ruth Bellevue Washington
16134 Suchitra Mohan Bellevue Washington
168271 Dan Wilson Bellevue Washington
173105 Kevin Homer Bellevue Washington
174253 Linda Meisner Bellevue Washington
175261 Reinout Hillmann Bellevue Washington
17619 Mary Dempsey Bellevue Washington
180206 Stuart Munson Bellevue Washington
188224 William Vong Bellevue Washington
196159 Terrence Earls Bellevue Washington
204142 Olinda Turner Bellevue Washington
206128 Paul KomosinskiBellevue Washington
209141 Min Su Bellevue Washington
2249 Gigi Matthew Bellevue Washington
228136 Sylvester Valdez Bellevue Washington
12839 Ed DudenhoeferSeattle Washington
135180 Katie McAskill-WhiteSeattle Washington
137152 Yuhong Li Seattle Washington
138171 David Yalovsky Seattle Washington
14336 Jose Lugo Seattle Washington
14960 Pete Male Seattle Washington
150120 Kitti LertpiriyasuwatSeattle Washington
152121 Pilar Ackerman Seattle Washington
15777 Merav Netz Seattle Washington
16774 Bjorn Rettig Seattle Washington
18171 Michael Ray Seattle Washington
183102 Zheng Mu Seattle Washington
186165 Chris Preston Seattle Washington
189190 Robert RounthwaiteSeattle Washington
19940 JoLynn Dobney Seattle Washington
200193 Alejandro McGuel Seattle Washington
20387 Cristian PetculescuSeattle Washington
210208 Scott Gode Seattle Washington
21258 Kendall Keil Seattle Washington
216195 Kevin Liu Seattle Washington
21794 Russell Hunter Seattle Washington
220157 Linda Randall Seattle Washington
222191 Lionel Penuchot Seattle Washington
22633 Annik Stahl Seattle Washington
240200 Frank Lee Seattle Washington
24991 Kimberly Zimmerman Seattle Washington
25282 Jack Creasey Seattle Washington
26270 David Ortiz Seattle Washington
27183 Patrick Cook Seattle Washington
288202 Tawana Nusbaum Seattle Washington

5b. For the list above, make sure these are only in the state of Washington, just to be careful.

SELECT DISTINCT e.City, e.StateProvinceName, e.CountryName FROM dbo.Employees AS e WHERE e.City in ('Redmond', 'Seattle', 'Bellevue') ; SELECT e.EmployeeID, e.FirstName, e.LastName, e.City, e.StateProvinceName FROM dbo.Employees AS e WHERE e.City in ('Redmond', 'Seattle', 'Bellevue') AND e.StateProvinceName = 'Washington' ORDER BY e.City ;

In [85]:
df <- Employees[(Employees$City %in% c('Redmond', 'Seattle', 'Bellevue')) & (Employees$StateProvinceName=='Washington')
          , c('EmployeeID', 'FirstName', 'LastName', 'City', 'StateProvinceName')
         ]
df[order(df$City), ]


Out[85]:
EmployeeIDFirstNameLastNameCityStateProvinceName
5255 Gordon Hee Bellevue Washington
15144 Paul Singh Bellevue Washington
29138 Samantha Smith Bellevue Washington
55283 David Campbell Bellevue Washington
74258 Erin Hagens Bellevue Washington
7989 Patrick Wedge Bellevue Washington
8432 Rebecca Laszlo Bellevue Washington
89113 Linda Moschell Bellevue Washington
90196 Shammi Mohamed Bellevue Washington
9799 Nuan Yu Bellevue Washington
10226 Peter Krebs Bellevue Washington
10393 Kok-Ho Loh Bellevue Washington
105151 Rostislav Shabalin Bellevue Washington
112177 Russell King Bellevue Washington
122235 Paula Barreto de MattosBellevue Washington
132149 Andy Ruth Bellevue Washington
16134 Suchitra Mohan Bellevue Washington
168271 Dan Wilson Bellevue Washington
173105 Kevin Homer Bellevue Washington
174253 Linda Meisner Bellevue Washington
175261 Reinout Hillmann Bellevue Washington
17619 Mary Dempsey Bellevue Washington
180206 Stuart Munson Bellevue Washington
188224 William Vong Bellevue Washington
196159 Terrence Earls Bellevue Washington
204142 Olinda Turner Bellevue Washington
206128 Paul KomosinskiBellevue Washington
209141 Min Su Bellevue Washington
2249 Gigi Matthew Bellevue Washington
228136 Sylvester Valdez Bellevue Washington
12839 Ed DudenhoeferSeattle Washington
135180 Katie McAskill-WhiteSeattle Washington
137152 Yuhong Li Seattle Washington
138171 David Yalovsky Seattle Washington
14336 Jose Lugo Seattle Washington
14960 Pete Male Seattle Washington
150120 Kitti LertpiriyasuwatSeattle Washington
152121 Pilar Ackerman Seattle Washington
15777 Merav Netz Seattle Washington
16774 Bjorn Rettig Seattle Washington
18171 Michael Ray Seattle Washington
183102 Zheng Mu Seattle Washington
186165 Chris Preston Seattle Washington
189190 Robert RounthwaiteSeattle Washington
19940 JoLynn Dobney Seattle Washington
200193 Alejandro McGuel Seattle Washington
20387 Cristian PetculescuSeattle Washington
210208 Scott Gode Seattle Washington
21258 Kendall Keil Seattle Washington
216195 Kevin Liu Seattle Washington
21794 Russell Hunter Seattle Washington
220157 Linda Randall Seattle Washington
222191 Lionel Penuchot Seattle Washington
22633 Annik Stahl Seattle Washington
240200 Frank Lee Seattle Washington
24991 Kimberly Zimmerman Seattle Washington
25282 Jack Creasey Seattle Washington
26270 David Ortiz Seattle Washington
27183 Patrick Cook Seattle Washington
288202 Tawana Nusbaum Seattle Washington

6. Provide a list of employees who have no title, whether it's a NULL or empty string.

SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.Title IS NULL OR e.Title = '' ;

In [95]:
# when loading from the Excel, missing fields where replaced with ''
Employees[!(is.na(Employees$Title) | (Employees$Title=='')), c('EmployeeID', 'Title','FirstName', 'LastName')]


Out[95]:
EmployeeIDTitleFirstNameLastName
8524 Ms. Jill Williams
9313 Ms. JaniceGalvin
115284 Mr. Tete Mensa-Annan
1596 Mr. Jossef Goldberg
198139 Mr. Hung-FuTing
213285 Mr. Syed Abbas
223273 Mr. Brian Welcker
2675 Ms. Gail Erickson

7a. Provide a list of employees who have at least 60 vacation hours left.

SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours FROM dbo.Employees AS e WHERE e.VacationHours >= 60 ;

In [97]:
Employees[Employees$VacationHours>=60, c('EmployeeID', 'FirstName', 'LastName', 'VacationHours')]


Out[97]:
EmployeeIDFirstNameLastNameVacationHours
478 ReubenD'sa 72
7270 FrançoisAjenstat67
9161 Kirk Koenigsbauer74
10124 Kim Ralls98
16192 BrendaDiaz 71
17166 Jack Richins68
18112 John Evans91
22244 Bryan Walton62
24210 BelindaNewman 83
25163 Alex Nayberg77
26118 Don Hall88
31231 Jo Berry91
34214 Andreas Berglund84
36162 Laura Steele75
37107 ChristopherHill 91
38269 Dan Bacon72
39187 YvonneMcKay 79
42116 MichaelPatten 98
45133 Michael Rothkugel70
5148 Ruth Ellerbrock83
52126 Jimmy Bischoff96
53230 StuartMacrae88
57104 Mary Baker94
58263 Jean Trenary65
59232 Pat Coleman89
6181 MihailFrintu64
63119 MichaelEntin 94
66212 PengWu 81
67227 Gary Altman86
7062 John Campbell81
206128 Paul Komosinski68
210208 ScottGode 86
211264 StephanieConroy 68
218242 DeborahPoe 60
222191 Lionel Penuchot78
2249 Gigi Matthew63
230216 Sean Alexander82
233228 Christian Kleinerman92
23785 Brian Goldstein63
239106 JohnKane92
243103 Ebru Ersan93
247215 Mark Harrington83
24991 Kimberly Zimmerman95
251207 Greg Alderson85
25282 Jack Creasey62
25653 DianeGlimp87
259209 KathieFlood 87
26050 SidneyHiga 84
263219 SeanChai78
268252 ArvindRao 60
269266 Peter Connelly69
270233 Magnus Hedlund87
27183 PatrickCook 61
27425 James Hamilton64
275108 JinghaoLiu 77
27727 Jo Brown80
278186 ShaneKim 75
281114 Mindaugas Krapauskas97
282123 VamsiKuppa95
285221 Chris Norred75

7b. Provide a list of employees who have less than 60 vacation hours left.


In [ ]:
SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours
	FROM dbo.Employees AS e
	WHERE e.VacationHours < 60
	;

In [98]:
Employees[Employees$VacationHours<60, c('EmployeeID', 'FirstName', 'LastName', 'VacationHours')]


Out[98]:
EmployeeIDFirstNameLastNameVacationHours
1259 Ben Miller55
2278 GarrettVargas 33
3204 Gabe Mares57
5255 GordonHee 52
666 Karan Khanna28
822 Sariya Harnpadoungsataya45
1110 MichaelRaheem 16
12248 Mike Seamans59
13175 ReedKoch23
14155 Fadi Fakhouri16
15144 Paul Singh6
19203 Ken Myer51
20245 Barbara Moreland58
21257 Eric Kurjan54
2369 Steve Masters31
27176 David Lawrence29
28147 Sandra Reátegui Alayo37
29138 SamanthaSmith 7
3095 Jim Scardelis1
3286 Ryan Cornelsen59
33181 MichaelHines 20
3572 Steven Selikoff41
4075 MichikoOsada 44
4112 ThierryD'Hers 9
4330 BrittaSimon 14
44198 LorraineNay 49
4631 MargieShoop 18
47168 GarrettYoung 34
48236 Grant Culbertson53
493 Roberto Tamburello2
242146 Jian ShuoWang 36
24473 CarolePoland42
245185 StefenHesse 19
24637 Chris Okelberry16
248179 Jan Miksovsky30
25021 Terry Eminhizer44
253254 FukikoOgisu 57
25467 Jay Adams32
255262 David Barber56
257282 José Saraiva31
258169 Susan Metters31
261222 A. ScottWright 44
26270 DavidOrtiz33
264148 Jason Watters38
265140 Prasanna Samarawickrama5
266290 Ranjit Varkey Chudukatil34
2675 Gail Erickson5
272251 Mikael Sandberg59
273241 DavidLiu 57
276239 Mindy Martin51
27938 Kim Abercrombie24
280154 RaymondSam 10
28329 Mark McArthur19
284101 Houman Pournasseh3
28618 JohnWood48
28746 EugeneKogan 36
288202 Tawana Nusbaum50
289289JaePak37
290288 RachelValdez35
291999 ChadwickSmith 55

7c. Show me employees who have more than 20 and less than 60 vacation hours left.

SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours FROM dbo.Employees AS e WHERE e.VacationHours BETWEEN 21 AND 59 ;

In [100]:
Employees[(Employees$VacationHours>20)&(Employees$VacationHours<60),  c('EmployeeID', 'FirstName', 'LastName', 'VacationHours')]


Out[100]:
EmployeeIDFirstNameLastNameVacationHours
1259 Ben Miller55
2278 GarrettVargas 33
3204 Gabe Mares57
5255 GordonHee 52
666 Karan Khanna28
822 Sariya Harnpadoungsataya45
12248 Mike Seamans59
13175 ReedKoch23
19203 Ken Myer51
20245 Barbara Moreland58
21257 Eric Kurjan54
2369 Steve Masters31
27176 David Lawrence29
28147 Sandra Reátegui Alayo37
3286 Ryan Cornelsen59
3572 Steven Selikoff41
4075 MichikoOsada 44
44198 LorraineNay 49
47168 GarrettYoung 34
48236 Grant Culbertson53
50199 Paula Nartker54
5441 BryanBaker35
55283 David Campbell23
56226 BrianLaMee48
62172 Marc Ingle22
64276 Linda Mitchell27
65225 Alan Brewer47
69194 Fred Northup47
73150 Michael Vanderhyde35
74258 Erin Hagens53
229173 Eugene Zabokritski33
231277 JillianCarson 24
23235 Brandon Heidepriem22
234223 SairajUddin 46
236275 MichaelBlythe 38
240200 FrankLee 56
241279 Tsvi Reiter29
242146 Jian ShuoWang 36
24473 CarolePoland42
248179 Jan Miksovsky30
25021 Terry Eminhizer44
253254 FukikoOgisu 57
25467 Jay Adams32
255262 David Barber56
257282 José Saraiva31
258169 Susan Metters31
261222 A. ScottWright 44
26270 DavidOrtiz33
264148 Jason Watters38
266290 Ranjit Varkey Chudukatil34
272251 Mikael Sandberg59
273241 DavidLiu 57
276239 Mindy Martin51
27938 Kim Abercrombie24
28618 JohnWood48
28746 EugeneKogan 36
288202 Tawana Nusbaum50
289289JaePak37
290288 RachelValdez35
291999 ChadwickSmith 55

7d. If you did not use BETWEEN for 7c, do the same but use BETWEEN. If you did use BETWEEN for 7c, do it another way.

SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours FROM dbo.Employees AS e WHERE e.VacationHours > 20 AND e.VacationHours < 60 ;

In [102]:
is.between <- function(x, a, b) {
    a < x & x < b
}
Employees[is.between(Employees$VacationHours, 20, 60),  c('EmployeeID', 'FirstName', 'LastName', 'VacationHours')]


Out[102]:
EmployeeIDFirstNameLastNameVacationHours
1259 Ben Miller55
2278 GarrettVargas 33
3204 Gabe Mares57
5255 GordonHee 52
666 Karan Khanna28
822 Sariya Harnpadoungsataya45
12248 Mike Seamans59
13175 ReedKoch23
19203 Ken Myer51
20245 Barbara Moreland58
21257 Eric Kurjan54
2369 Steve Masters31
27176 David Lawrence29
28147 Sandra Reátegui Alayo37
3286 Ryan Cornelsen59
3572 Steven Selikoff41
4075 MichikoOsada 44
44198 LorraineNay 49
47168 GarrettYoung 34
48236 Grant Culbertson53
50199 Paula Nartker54
5441 BryanBaker35
55283 David Campbell23
56226 BrianLaMee48
62172 Marc Ingle22
64276 Linda Mitchell27
65225 Alan Brewer47
69194 Fred Northup47
73150 Michael Vanderhyde35
74258 Erin Hagens53
229173 Eugene Zabokritski33
231277 JillianCarson 24
23235 Brandon Heidepriem22
234223 SairajUddin 46
236275 MichaelBlythe 38
240200 FrankLee 56
241279 Tsvi Reiter29
242146 Jian ShuoWang 36
24473 CarolePoland42
248179 Jan Miksovsky30
25021 Terry Eminhizer44
253254 FukikoOgisu 57
25467 Jay Adams32
255262 David Barber56
257282 José Saraiva31
258169 Susan Metters31
261222 A. ScottWright 44
26270 DavidOrtiz33
264148 Jason Watters38
266290 Ranjit Varkey Chudukatil34
272251 Mikael Sandberg59
273241 DavidLiu 57
276239 Mindy Martin51
27938 Kim Abercrombie24
28618 JohnWood48
28746 EugeneKogan 36
288202 Tawana Nusbaum50
289289JaePak37
290288 RachelValdez35
291999 ChadwickSmith 55

Grouping

1a. What is the earliest birthdate for all employees?

SELECT MIN(e.BirthDate) FROM dbo.Employees AS e;

In [110]:
min(as.POSIXct(Employees$BirthDate))


Out[110]:
[1] "1945-11-17 EST"

1b. Add to the above, the most recent birthdate for all employees

SELECT MIN(e.BirthDate) AS 'Earliest Birthday', MAX(e.BirthDate) AS 'Most Reecent Birthday' FROM dbo.Employees AS e;

In [111]:
max(as.POSIXct(Employees$BirthDate))


Out[111]:
[1] "1985-07-01 EDT"

In [131]:
### This is getting TireSome => We create a new column that converts the BirthDate to a proper date
Employees$BD = as.POSIXct(Employees$BirthDate)

1c. Show the above results broken down by gender

SELECT e.Gender, MIN(e.BirthDate) AS 'Earliest Birthday', MAX(e.BirthDate) AS 'Most Reecent Birthday' FROM dbo.Employees AS e GROUP BY e.Gender ;

In [161]:
names(df)


Out[161]:
  1. 'Gender'
  2. 'BD'

In [174]:
aggregate(BD ~ Gender, data=Employees,
                function(x){c(format.Date(min(x)), format.Date(max(x)))})


Out[174]:
GenderBD
1F 1946-10-291985-07-01
2M 1945-11-171985-05-07

The above solution creates only two columns.


In [189]:
df = cbind(aggregate(BD ~ Gender, data=Employees, function(x){format.Date(min(x))}),
      aggregate(BD ~ Gender, data=Employees, function(x){format.Date(max(x))}))[,c(1,2,4)]
names(df)[2:3] <- c('EarliestBirthdate', 'MostRecentBirthdate')
df


Out[189]:
GenderEarliestBirthdateMostRecentBirthdate
1F 1946-10-291985-07-01
2M 1945-11-171985-05-07

1d. Show the above results broken down by gender, and salaried/hourly

SELECT e.Gender, e.SalariedFlag, MIN(e.BirthDate) AS 'Earliest Birthday', MAX(e.BirthDate) AS 'Most Reecent Birthday' FROM dbo.Employees AS e GROUP BY e.Gender, e.SalariedFlag ;

In [191]:
aggregate(BD ~ Gender + SalariedFlag, data=Employees,
                function(x){c(format.Date(min(x)), format.Date(max(x)))})


Out[191]:
GenderSalariedFlagBD
1F 0 1948-05-251985-07-01
2M 0 1946-04-031985-05-07
3F 1 1946-10-291980-07-06
4M 1 1945-11-171982-04-14

2a. What are the average vacation hours for all employees?

SELECT AVG(e.VacationHours) FROM dbo.Employees AS e ;

In [192]:
mean(Employees$VacationHours)


Out[192]:
50.6288659793814

2b. Add to the above, the minimum vacation hours for all employees

SELECT AVG(e.VacationHours) AS 'Average Vacation', MIN(e.VacationHours) AS 'Minumum Vacation' FROM dbo.Employees AS e ;

In [193]:
c(mean(Employees$VacationHours), min(Employees$VacationHours))


Out[193]:
  1. 50.6288659793814
  2. 0

2c. Show the above results broken down and ordered by job title

SELECT e.JobTitle, AVG(e.VacationHours) AS 'Average Vacation', MIN(e.VacationHours) AS 'Minimum Vacation' FROM dbo.Employees AS e GROUP BY e.JobTitle ;

In [194]:
aggregate(VacationHours ~ JobTitle, data=Employees,
                function(x){c(mean(x), min(x))})


Out[194]:
JobTitleVacationHours
1Accountant58.5 58
2Accounts Manager57 57
3Accounts Payable Specialist63.5 63
4Accounts Receivable Specialist61 60
5Application Specialist72.5 71
6Assistant to the Chief Financial Officer56 56
7Benefits Specialist51 51
8BI Professor55 55
9Buyer56 52
10Chief Executive Officer99 99
11Chief Financial Officer0 0
12Control Specialist75.5 75
13Database Administrator66.5 66
14Design Engineer5 4
15Document Control Assistant78.5 78
16Document Control Manager77 77
17Engineering Manager2 2
18European Sales Manager21 21
19Facilities Administrative Assistant87 87
20Facilities Manager86 86
21Finance Manager55 55
22Human Resources Administrative Assistant52.5 52
23Human Resources Manager54 54
24Information Services Manager65 65
25Janitor89.5 88
26Maintenance Supervisor92 92
27Marketing Assistant42 41
28Marketing Manager40 40
29Marketing Specialist46 44
30Master Scheduler44 44
39Production Supervisor - WC4072 71
40Production Supervisor - WC4575 74
41Production Supervisor - WC5078 77
42Production Supervisor - WC6081 80
43Production Technician - WC1091 83
44Production Technician - WC2010.5 0
45Production Technician - WC3034 22
46Production Technician - WC4059.5 47
47Production Technician - WC4580 73
48Production Technician - WC5046.6538461538462 0
49Production Technician - WC6026.5 14
50Purchasing Assistant50.5 50
51Purchasing Manager49 49
52Quality Assurance Manager80 80
53Quality Assurance Supervisor81 81
54Quality Assurance Technician83.5 82
55Recruiter49.5 49
56Research and Development Engineer62.5 62
57Research and Development Manager38.5 16
58Sales Representative31 22
59Scheduling Assistant46.5 45
60Senior Design Engineer3 3
61Senior Tool Designer27.5 7
62Shipping and Receiving Clerk94.5 94
63Shipping and Receiving Supervisor93 93
64Stocker97 96
65Tool Designer8.5 8
66Vice President of Engineering1 1
67Vice President of Production64 64
68Vice President of Sales10 10

2d. Show the above results broken down by job title, and married/single employees

SELECT e.JobTitle, e.MaritalStatus, AVG(e.VacationHours) AS 'Average Vacation', MIN(e.VacationHours) AS 'Minimum Vacation' FROM dbo.Employees AS e GROUP BY e.JobTitle, e.MaritalStatus ;

In [196]:
aggregate(VacationHours ~ JobTitle + MaritalStatus, data=Employees,
                function(x){c(mean(x), min(x))})


Out[196]:
JobTitleMaritalStatusVacationHours
1AccountantM 58 58
2Accounts ManagerM 57 57
3Accounts Payable SpecialistM 63.5 63
4Accounts Receivable SpecialistM 60 60
5Application SpecialistM 71.5 71
6Benefits SpecialistM 51 51
7BI ProfessorM 55 55
8Buyer M 56.333333333333352
9Chief Financial OfficerM 0 0
10Control SpecialistM 75 75
11Database AdministratorM 66 66
12Design EngineerM 5 4
13Document Control AssistantM 79 79
14Document Control ManagerM 77 77
15Engineering ManagerM 2 2
16European Sales ManagerM 21 21
17Facilities Administrative AssistantM 87 87
18Facilities ManagerM 86 86
19Human Resources ManagerM 54 54
20Janitor M 89.666666666666788
21Maintenance SupervisorM 92 92
22Marketing AssistantM 41 41
23Marketing SpecialistM 45.6666666666667 44
24North American Sales ManagerM 14 14
25Pacific Sales ManagerM 20 20
26Production Control ManagerM 43 43
27Production Supervisor - WC10M 67 67
28Production Supervisor - WC20M 40.5 40
29Production Supervisor - WC40M 71.5 71
30Production Supervisor - WC45M 74 74
66Master SchedulerS 44 44
67Network AdministratorS 69.5 69
68Network ManagerS 68 68
69Production Supervisor - WC10S 65.5 65
70Production Supervisor - WC20S 42 42
71Production Supervisor - WC30S 69 68
72Production Supervisor - WC40S 73 73
73Production Supervisor - WC45S 75.5 75
74Production Supervisor - WC50S 77.5 77
75Production Supervisor - WC60S 81 80
76Production Technician - WC10S 92.7777777777778 85
77Production Technician - WC20S 11.9166666666667 0
78Production Technician - WC30S 33.4166666666667 25
79Production Technician - WC40S 61.8571428571429 47
80Production Technician - WC45S 79.4444444444444 73
81Production Technician - WC50S 53.2307692307692 0
82Production Technician - WC60S 28.5833333333333 16
83Purchasing ManagerS 49 49
84Quality Assurance ManagerS 80 80
85Quality Assurance TechnicianS 82.5 82
86RecruiterS 49.5 49
87Research and Development EngineerS 62 62
88Sales RepresentativeS 30.2857142857143 22
89Senior Design EngineerS 3 3
90Senior Tool DesignerS 27.5 7
91Shipping and Receiving SupervisorS 93 93
92StockerS 97.5 97
93Vice President of EngineeringS 1 1
94Vice President of ProductionS 64 64
95Vice President of SalesS 10 10

2e. Add to the above, the maximum vacation hours per group

SELECT e.JobTitle, e.MaritalStatus , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e GROUP BY e.JobTitle, e.MaritalStatus ;

In [197]:
aggregate(VacationHours ~ JobTitle + MaritalStatus, data=Employees,
                function(x){c(mean(x), min(x), max(x))})


Out[197]:
JobTitleMaritalStatusVacationHours
1AccountantM 58 58 58
2Accounts ManagerM 57 57 57
3Accounts Payable SpecialistM 63.5 63 64
4Accounts Receivable SpecialistM 60 60 60
5Application SpecialistM 71.5 71 72
6Benefits SpecialistM 51 51 51
7BI ProfessorM 55 55 55
8Buyer M 56.333333333333352 60
9Chief Financial OfficerM 0 0 0
10Control SpecialistM 75 75 75
11Database AdministratorM 66 66 66
12Design EngineerM 5 4 6
13Document Control AssistantM 79 79 79
14Document Control ManagerM 77 77 77
15Engineering ManagerM 2 2 2
16European Sales ManagerM 21 21 21
17Facilities Administrative AssistantM 87 87 87
18Facilities ManagerM 86 86 86
19Human Resources ManagerM 54 54 54
20Janitor M 89.666666666666788 91
21Maintenance SupervisorM 92 92 92
22Marketing AssistantM 41 41 41
23Marketing SpecialistM 45.6666666666667 44 47
24North American Sales ManagerM 14 14 14
25Pacific Sales ManagerM 20 20 20
26Production Control ManagerM 43 43 43
27Production Supervisor - WC10M 67 67 67
28Production Supervisor - WC20M 40.5 40 41
29Production Supervisor - WC40M 71.5 71 72
30Production Supervisor - WC45M 74 74 74
66Master SchedulerS 44 44 44
67Network AdministratorS 69.5 69 70
68Network ManagerS 68 68 68
69Production Supervisor - WC10S 65.5 65 66
70Production Supervisor - WC20S 42 42 42
71Production Supervisor - WC30S 69 68 70
72Production Supervisor - WC40S 73 73 73
73Production Supervisor - WC45S 75.5 75 76
74Production Supervisor - WC50S 77.5 77 78
75Production Supervisor - WC60S 81 80 82
76Production Technician - WC10S 92.7777777777778 85 99
77Production Technician - WC20S 11.9166666666667 0 21
78Production Technician - WC30S 33.4166666666667 25 44
79Production Technician - WC40S 61.8571428571429 47 72
80Production Technician - WC45S 79.4444444444444 73 85
81Production Technician - WC50S 53.2307692307692 0 98
82Production Technician - WC60S 28.5833333333333 16 38
83Purchasing ManagerS 49 49 49
84Quality Assurance ManagerS 80 80 80
85Quality Assurance TechnicianS 82.5 82 83
86RecruiterS 49.5 49 50
87Research and Development EngineerS 62 62 62
88Sales RepresentativeS 30.2857142857143 22 38
89Senior Design EngineerS 3 3 3
90Senior Tool DesignerS 27.5 7 48
91Shipping and Receiving SupervisorS 93 93 93
92StockerS 97.5 97 98
93Vice President of EngineeringS 1 1 1
94Vice President of ProductionS 64 64 64
95Vice President of SalesS 10 10 10

2f. Show the above results broken down by job title, married/single employees, and State

SELECT e.JobTitle, e.MaritalStatus, e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e GROUP BY e.JobTitle, e.MaritalStatus, e.StateProvinceName ;

In [198]:
aggregate(VacationHours ~ JobTitle + MaritalStatus + StateProvinceName, data=Employees,
                function(x){c(mean(x), min(x), max(x))})


Out[198]:
JobTitleMaritalStatusStateProvinceNameVacationHours
1Production Technician - WC20M Alberta 9 9 9
2Production Technician - WC40M Alberta 51 51 51
3Sales RepresentativeM Alberta 33 33 33
4Accounts Receivable SpecialistS Alberta 62 62 62
5Production Technician - WC20S Alberta 20 20 20
6Research and Development ManagerM California 16 16 16
7Sales RepresentativeM California 26 26 26
8Sales RepresentativeM England 37 37 37
9BI ProfessorM Georgia 55 55 55
10Sales RepresentativeS Gironde 34 34 34
11Sales RepresentativeS Hamburg 35 35 35
12Sales RepresentativeM Massachusetts 39 39 39
13Sales RepresentativeS Michigan 38 38 38
14Sales RepresentativeS Minnesota 24 24 24
15Senior Tool DesignerS Minnesota 48 48 48
16Sales RepresentativeM Ontario 31 31 31
17Sales RepresentativeS Oregon 22 22 22
18Sales RepresentativeM Tennessee 29 29 29
19Sales RepresentativeM Utah 27 27 27
20Sales RepresentativeS Victoria 36 36 36
21AccountantM Washington58 58 58
22Accounts ManagerM Washington 57 57 57
23Accounts Payable SpecialistM Washington 63.5 63 64
24Accounts Receivable SpecialistM Washington 60 60 60
25Application SpecialistM Washington 71.5 71 72
26Benefits SpecialistM Washington 51 51 51
27Buyer M Washington 56.333333333333352 60
28Chief Financial OfficerM Washington 0 0 0
29Control SpecialistM Washington 75 75 75
30Database AdministratorM Washington 66 66 66
84Master SchedulerS Washington 44 44 44
85Network AdministratorS Washington 69.5 69 70
86Network ManagerS Washington 68 68 68
87Production Supervisor - WC10S Washington 65.5 65 66
88Production Supervisor - WC20S Washington 42 42 42
89Production Supervisor - WC30S Washington 69 68 70
90Production Supervisor - WC40S Washington 73 73 73
91Production Supervisor - WC45S Washington 75.5 75 76
92Production Supervisor - WC50S Washington 77.5 77 78
93Production Supervisor - WC60S Washington 81 80 82
94Production Technician - WC10S Washington 92.7777777777778 85 99
95Production Technician - WC20S Washington 11.1818181818182 0 21
96Production Technician - WC30S Washington 33.4166666666667 25 44
97Production Technician - WC40S Washington 61.8571428571429 47 72
98Production Technician - WC45S Washington 79.4444444444444 73 85
99Production Technician - WC50S Washington 53.2307692307692 0 98
100Production Technician - WC60S Washington 28.5833333333333 16 38
101Purchasing ManagerS Washington 49 49 49
102Quality Assurance ManagerS Washington 80 80 80
103Quality Assurance TechnicianS Washington 82.5 82 83
104Recruiter S Washington49.5 49 50
105Research and Development EngineerS Washington 62 62 62
106Sales RepresentativeS Washington 23 23 23
107Senior Design EngineerS Washington 3 3 3
108Senior Tool DesignerS Washington 7 7 7
109Shipping and Receiving SupervisorS Washington 93 93 93
110Stocker S Washington97.5 97 98
111Vice President of EngineeringS Washington 1 1 1
112Vice President of ProductionS Washington 64 64 64
113Vice President of SalesS Washington 10 10 10

2g. Show the above results but only for American employees

SELECT e.JobTitle, e.MaritalStatus, e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.JobTitle, e.MaritalStatus, e.StateProvinceName ;

In [200]:
aggregate(VacationHours ~ JobTitle + MaritalStatus + StateProvinceName
          , data=Employees[Employees$CountryName=='United States', ]
          , function(x){c(mean(x), min(x), max(x))})


Out[200]:
JobTitleMaritalStatusStateProvinceNameVacationHours
1Research and Development ManagerM California 16 16 16
2Sales RepresentativeM California 26 26 26
3BI ProfessorM Georgia 55 55 55
4Sales RepresentativeM Massachusetts 39 39 39
5Sales RepresentativeS Michigan 38 38 38
6Sales RepresentativeS Minnesota 24 24 24
7Senior Tool DesignerS Minnesota 48 48 48
8Sales RepresentativeS Oregon 22 22 22
9Sales RepresentativeM Tennessee 29 29 29
10Sales RepresentativeM Utah 27 27 27
11AccountantM Washington58 58 58
12Accounts ManagerM Washington 57 57 57
13Accounts Payable SpecialistM Washington 63.5 63 64
14Accounts Receivable SpecialistM Washington 60 60 60
15Application SpecialistM Washington 71.5 71 72
16Benefits SpecialistM Washington 51 51 51
17Buyer M Washington 56.333333333333352 60
18Chief Financial OfficerM Washington 0 0 0
19Control SpecialistM Washington 75 75 75
20Database AdministratorM Washington 66 66 66
21Design EngineerM Washington 5 4 6
22Document Control AssistantM Washington 79 79 79
23Document Control ManagerM Washington 77 77 77
24Engineering ManagerM Washington 2 2 2
25European Sales ManagerM Washington 21 21 21
26Facilities Administrative AssistantM Washington 87 87 87
27Facilities ManagerM Washington 86 86 86
28Human Resources ManagerM Washington 54 54 54
29Janitor M Washington 89.666666666666788 91
30Maintenance SupervisorM Washington 92 92 92
74Master SchedulerS Washington 44 44 44
75Network AdministratorS Washington 69.5 69 70
76Network ManagerS Washington 68 68 68
77Production Supervisor - WC10S Washington 65.5 65 66
78Production Supervisor - WC20S Washington 42 42 42
79Production Supervisor - WC30S Washington 69 68 70
80Production Supervisor - WC40S Washington 73 73 73
81Production Supervisor - WC45S Washington 75.5 75 76
82Production Supervisor - WC50S Washington 77.5 77 78
83Production Supervisor - WC60S Washington 81 80 82
84Production Technician - WC10S Washington 92.7777777777778 85 99
85Production Technician - WC20S Washington 11.1818181818182 0 21
86Production Technician - WC30S Washington 33.4166666666667 25 44
87Production Technician - WC40S Washington 61.8571428571429 47 72
88Production Technician - WC45S Washington 79.4444444444444 73 85
89Production Technician - WC50S Washington 53.2307692307692 0 98
90Production Technician - WC60S Washington 28.5833333333333 16 38
91Purchasing ManagerS Washington 49 49 49
92Quality Assurance ManagerS Washington 80 80 80
93Quality Assurance TechnicianS Washington 82.5 82 83
94Recruiter S Washington49.5 49 50
95Research and Development EngineerS Washington 62 62 62
96Sales RepresentativeS Washington 23 23 23
97Senior Design EngineerS Washington 3 3 3
98Senior Tool DesignerS Washington 7 7 7
99Shipping and Receiving SupervisorS Washington 93 93 93
100Stocker S Washington97.5 97 98
101Vice President of EngineeringS Washington 1 1 1
102Vice President of ProductionS Washington 64 64 64
103Vice President of SalesS Washington 10 10 10

2h. Change the grouping above so it's broken down by married/single and State, no more job title

SELECT e.MaritalStatus, e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.MaritalStatus, e.StateProvinceName ;

In [201]:
aggregate(VacationHours ~ MaritalStatus + StateProvinceName
          , data=Employees[Employees$CountryName=='United States', ]
          , function(x){c(mean(x), min(x), max(x))})


Out[201]:
MaritalStatusStateProvinceNameVacationHours
1M California21 16 26
2M Georgia55 55 55
3M Massachusetts39 39 39
4S Michigan38 38 38
5S Minnesota36 24 48
6S Oregon22 22 22
7M Tennessee29 29 29
8M Utah27 27 27
9M Washington 49.42647058823530 99
10S Washington 54.3629629629630 99

2i. Limit the results above to States where the average vacation hours is greater than 30

SELECT e.MaritalStatus, e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.MaritalStatus, e.StateProvinceName HAVING AVG(e.VacationHours) > 30 ;
SELECT e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.StateProvinceName HAVING AVG(e.VacationHours) > 30 ;

In [206]:
df <- aggregate(VacationHours ~ MaritalStatus + StateProvinceName
          , data=Employees[Employees$CountryName=='United States', ]
          , function(x){c(mean(x), min(x), max(x))})
df[df$VacationHours[,1]>30, ]


Out[206]:
MaritalStatusStateProvinceNameVacationHours
2M Georgia55 55 55
3M Massachusetts39 39 39
4S Michigan38 38 38
5S Minnesota36 24 48
9M Washington 49.42647058823530 99
10S Washington 54.3629629629630 99

2j. Limit the results above to States where the average vacation hours is greater than 30 and the maximum vacation hours is less than 50

SELECT e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.StateProvinceName HAVING AVG(e.VacationHours) BETWEEN 31 AND 49 ;

In [207]:
1:10


Out[207]:
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10

In [209]:
df <- aggregate(VacationHours ~ MaritalStatus + StateProvinceName
          , data=Employees[Employees$CountryName=='United States', ]
          , function(x){c(mean(x), min(x), max(x))})
df[(df$VacationHours[,1]>30) & (df$VacationHours[,1]<50) , ]


Out[209]:
MaritalStatusStateProvinceNameVacationHours
3M Massachusetts39 39 39
4S Michigan38 38 38
5S Minnesota36 24 48
9M Washington 49.42647058823530 99

2k. Show the same results but only for non-American employees

SELECT e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName != 'United States' GROUP BY e.StateProvinceName HAVING AVG(e.VacationHours) BETWEEN 31 AND 49 ;

In [210]:
df <- aggregate(VacationHours ~ MaritalStatus + StateProvinceName
          , data=Employees[Employees$CountryName!='United States', ]
          , function(x){c(mean(x), min(x), max(x))})
df[(df$VacationHours[,1]>30) & (df$VacationHours[,1]<50) , ]


Out[210]:
MaritalStatusStateProvinceNameVacationHours
1M Alberta31 9 51
2S Alberta41 20 62
3M England37 37 37
4S Gironde34 34 34
5S Hamburg35 35 35
6M Ontario31 31 31
7S Victoria36 36 36

3a. Report how many employees are in the company

SELECT COUNT(1) FROM dbo.Employees;

In [211]:
dim(Employees)[1]


Out[211]:
291

3b. For the above report, show the number of employees per manager (hint: use ManagerID)

SELECT e.ManagerID, COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e GROUP BY e.ManagerID ;

In [213]:
aggregate(EmployeeID ~ ManagerID
          , data=Employees
          , function(x){length(x)})


Out[213]:
ManagerIDEmployeeID
1 1
218
3168
42276
52343
62355
72499
82530
925017
1026 178
112639
122733
1327410
142851
152873

3c. Remove any manager ID's that are NULL from the results above

SELECT e.ManagerID, COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e WHERE e.ManagerID IS NOT NULL GROUP BY e.ManagerID ;

In [236]:
aggregate(EmployeeID ~ ManagerID
          , data=Employees[!(is.null(Employees$ManagerID) | (as.character(Employees$ManagerID)=='')), ]
          , function(x){length(x)})


Out[236]:
ManagerIDEmployeeID
118
2168
32276
42343
52355
62499
72530
825017
926 178
102639
112733
1227410
132851
142873

3d. Show the same results as above, but only for managers who have at least 5 employees

SELECT e.ManagerID, COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e WHERE e.ManagerID IS NOT NULL GROUP BY e.ManagerID HAVING COUNT(1) >= 5 ;

In [239]:
df <- aggregate(EmployeeID ~ ManagerID
          , data=Employees[!(is.null(Employees$ManagerID) | (as.character(Employees$ManagerID)=='')), ]
          , function(x){length(x)})
names(df)[2] <- 'NumEmployees'
df[df$NumEmployees >= 5, ]


Out[239]:
ManagerIDNumEmployees
118
2168
32276
52355
62499
72530
825017
926 178
102639
1227410

4a. List the average vacation hours of all employees

SELECT AVG(e.VacationHours) FROM dbo.Employees AS e ;

In [240]:
mean(Employees$VacationHours)


Out[240]:
50.6288659793814

4b. Break down the results by State

SELECT e.StateProvinceName, AVG(e.VacationHours) AS 'Average Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName ;

In [242]:
aggregate(VacationHours ~ StateProvinceName, data=Employees, mean)


Out[242]:
StateProvinceNameVacationHours
1Alberta35
2California21
3England37
4Georgia55
5Gironde34
6Hamburg35
7Massachusetts39
8Michigan38
9Minnesota36
10Ontario31
11Oregon22
12Tennessee29
13Utah27
14Victoria36
15Washington 51.8856088560886

4c. Break down the results by city and State

SELECT e.StateProvinceName, e.City, AVG(e.VacationHours) AS 'Average Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ;

In [243]:
aggregate(VacationHours ~ StateProvinceName + City, data=Employees, mean)


Out[243]:
StateProvinceNameCityVacationHours
1GeorgiaAtlanta55
2Washington Bellevue 44.3055555555556
3HamburgBerlin 35
4Gironde Bordeaux34
5Washington Bothell 65.6923076923077
6AlbertaCalgary35
7England Cambridge37
8MassachusettsCambridge 39
9WashingtonCarnation 59
10MichiganDetroit 38
11MinnesotaDuluth 24
12WashingtonDuvall 42.2
13WashingtonEdmonds 47.64
14Washington Everett 57.8235294117647
15WashingtonGold Bar 70.8
16WashingtonIndex 76
17Washington Issaquah 67.6666666666667
18Washington Kenmore 53.9166666666667
19WashingtonKent 5
20Victoria Melbourne36
21TennesseeMemphis 29
22Minnesota Minneapolis48
23Washington Monroe 47.9166666666667
24Utah Nevada27
25Washington Newport Hills 62.8333333333333
26OntarioOttawa 31
27Oregon Portland22
28Washington Redmond 37.4285714285714
29Washington Renton 54.1764705882353
30Washington Sammamish 42.8235294117647
31California San Francisco21
32Washington Seattle 54.6590909090909
33WashingtonSnohomish 52.8

4d. Add something that shows the number of employees per city

SELECT e.StateProvinceName, e.City , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ;

In [247]:
aggregate(VacationHours ~ StateProvinceName + City, data=Employees, function(x){c(mean(x), length(x))})


Out[247]:
StateProvinceNameCityVacationHours
1GeorgiaAtlanta55 1
2Washington Bellevue 44.305555555555636
3HamburgBerlin 35 1
4Gironde Bordeaux34 1
5Washington Bothell 65.692307692307713
6AlbertaCalgary35 5
7England Cambridge37 1
8MassachusettsCambridge 39 1
9WashingtonCarnation 59 5
10MichiganDetroit 38 1
11MinnesotaDuluth 24 1
12WashingtonDuvall 42.2 10
13WashingtonEdmonds 47.64 25
14Washington Everett 57.823529411764717
15WashingtonGold Bar 70.8 5
16WashingtonIndex 76 5
17Washington Issaquah 67.666666666666715
18Washington Kenmore 53.916666666666712
19WashingtonKent 5 1
20Victoria Melbourne36 1
21TennesseeMemphis 29 1
22Minnesota Minneapolis48 1
23Washington Monroe 47.916666666666712
24Utah Nevada27 1
25Washington Newport Hills 62.83333333333336
26OntarioOttawa 31 1
27Oregon Portland22 1
28Washington Redmond 37.428571428571421
29Washington Renton 54.176470588235317
30Washington Sammamish 42.823529411764717
31California San Francisco21 2
32Washington Seattle 54.659090909090944
33WashingtonSnohomish 52.8 10

4e. Sort the results by the city and state

SELECT e.StateProvinceName, e.City , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ORDER BY e.StateProvinceName, e.City ;

In [258]:
df <- aggregate(VacationHours ~ StateProvinceName + City, data=Employees, function(x){c(mean(x), length(x))})
df[order(df$StateProvinceName, df$City), ]


Out[258]:
StateProvinceNameCityVacationHours
6AlbertaCalgary35 5
31California San Francisco21 2
7England Cambridge37 1
1GeorgiaAtlanta55 1
4Gironde Bordeaux34 1
3HamburgBerlin 35 1
8MassachusettsCambridge 39 1
10MichiganDetroit 38 1
11MinnesotaDuluth 24 1
22Minnesota Minneapolis48 1
26OntarioOttawa 31 1
27Oregon Portland22 1
21TennesseeMemphis 29 1
24Utah Nevada27 1
20Victoria Melbourne36 1
2Washington Bellevue 44.305555555555636
5Washington Bothell 65.692307692307713
9WashingtonCarnation 59 5
12WashingtonDuvall 42.2 10
13WashingtonEdmonds 47.64 25
14Washington Everett 57.823529411764717
15WashingtonGold Bar 70.8 5
16WashingtonIndex 76 5
17Washington Issaquah 67.666666666666715
18Washington Kenmore 53.916666666666712
19WashingtonKent 5 1
23Washington Monroe 47.916666666666712
25Washington Newport Hills 62.83333333333336
28Washington Redmond 37.428571428571421
29Washington Renton 54.176470588235317
30Washington Sammamish 42.823529411764717
32Washington Seattle 54.659090909090944
33WashingtonSnohomish 52.8 10

4f. Make city and State a single column in the format of "City, State"

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ORDER BY e.StateProvinceName, e.City ;

In [294]:
df <- aggregate(VacationHours ~ StateProvinceName + City, data=Employees, function(x){c(mean(x), length(x))})
df2 <- cbind(
    df[, c('StateProvinceName', 'City')]
    , data.frame(CityState=paste(df$City, df$StateProvinceName, sep = ', '))
    , as.data.frame(df$VacationHours)
)
names(df2)[4:5] <- c('AvgVacationHours', 'NumEmployees')
df2


Out[294]:
StateProvinceNameCityCityStateAvgVacationHoursNumEmployees
1Georgia Atlanta Atlanta, Georgia55 1
2Washington Bellevue Bellevue, Washington44.3055555555556 36
3Hamburg Berlin Berlin, Hamburg35 1
4Gironde Bordeaux Bordeaux, Gironde34 1
5Washington Bothell Bothell, Washington65.6923076923077 13
6Alberta Calgary Calgary, Alberta35 5
7England Cambridge Cambridge, England37 1
8Massachusetts Cambridge Cambridge, Massachusetts39 1
9Washington Carnation Carnation, Washington59 5
10Michigan Detroit Detroit, Michigan38 1
11Minnesota Duluth Duluth, Minnesota24 1
12Washington Duvall Duvall, Washington42.2 10
13Washington Edmonds Edmonds, Washington47.64 25
14Washington Everett Everett, Washington57.8235294117647 17
15Washington Gold Bar Gold Bar, Washington70.8 5
16Washington Index Index, Washington76 5
17Washington Issaquah Issaquah, Washington67.6666666666667 15
18Washington Kenmore Kenmore, Washington53.9166666666667 12
19Washington Kent Kent, Washington5 1
20Victoria Melbourne Melbourne, Victoria36 1
21Tennessee Memphis Memphis, Tennessee29 1
22Minnesota Minneapolis Minneapolis, Minnesota48 1
23Washington Monroe Monroe, Washington47.9166666666667 12
24Utah Nevada Nevada, Utah27 1
25Washington Newport Hills Newport Hills, Washington62.8333333333333 6
26Ontario Ottawa Ottawa, Ontario31 1
27Oregon Portland Portland, Oregon22 1
28Washington Redmond Redmond, Washington37.4285714285714 21
29Washington Renton Renton, Washington54.1764705882353 17
30Washington Sammamish Sammamish, Washington42.8235294117647 17
31California San Francisco San Francisco, California21 2
32Washington Seattle Seattle, Washington54.6590909090909 44
33Washington Snohomish Snohomish, Washington52.8 10

4g. Add a column that shows the difference between the maximum vacation hours and minimum vacation hours for each city

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' , MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ORDER BY e.StateProvinceName, e.City ;

In [295]:
df <- aggregate(VacationHours ~ StateProvinceName + City, data=Employees
                , function(x){c(mean(x), length(x), max(x)-min(x))})
df2 <- cbind(
    df[, c('StateProvinceName', 'City')]
    , data.frame(CityState=paste(df$City, df$StateProvinceName, sep = ', '))
    , as.data.frame(df$VacationHours)
)
names(df2)[4:6] <- c('AvgVacationHours', 'NumEmployees', 'DifferenceVacHrs')
df2


Out[295]:
StateProvinceNameCityCityStateAvgVacationHoursNumEmployeesDifferenceVacHrs
1Georgia Atlanta Atlanta, Georgia55 1 0
2Washington Bellevue Bellevue, Washington44.3055555555556 36 95
3Hamburg Berlin Berlin, Hamburg35 1 0
4Gironde Bordeaux Bordeaux, Gironde34 1 0
5Washington Bothell Bothell, Washington65.6923076923077 13 90
6Alberta Calgary Calgary, Alberta35 5 53
7England Cambridge Cambridge, England37 1 0
8Massachusetts Cambridge Cambridge, Massachusetts39 1 0
9Washington Carnation Carnation, Washington59 5 64
10Michigan Detroit Detroit, Michigan38 1 0
11Minnesota Duluth Duluth, Minnesota24 1 0
12Washington Duvall Duvall, Washington42.2 10 92
13Washington Edmonds Edmonds, Washington47.64 25 97
14Washington Everett Everett, Washington57.8235294117647 17 71
15Washington Gold Bar Gold Bar, Washington70.8 5 39
16Washington Index Index, Washington76 5 32
17Washington Issaquah Issaquah, Washington67.6666666666667 15 73
18Washington Kenmore Kenmore, Washington53.9166666666667 12 90
19Washington Kent Kent, Washington5 1 0
20Victoria Melbourne Melbourne, Victoria36 1 0
21Tennessee Memphis Memphis, Tennessee29 1 0
22Minnesota Minneapolis Minneapolis, Minnesota48 1 0
23Washington Monroe Monroe, Washington47.9166666666667 12 71
24Utah Nevada Nevada, Utah27 1 0
25Washington Newport Hills Newport Hills, Washington62.8333333333333 6 55
26Ontario Ottawa Ottawa, Ontario31 1 0
27Oregon Portland Portland, Oregon22 1 0
28Washington Redmond Redmond, Washington37.4285714285714 21 92
29Washington Renton Renton, Washington54.1764705882353 17 97
30Washington Sammamish Sammamish, Washington42.8235294117647 17 93
31California San Francisco San Francisco, California21 2 10
32Washington Seattle Seattle, Washington54.6590909090909 44 89
33Washington Snohomish Snohomish, Washington52.8 10 85

4h. Now sort the results by the new column created above

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' , MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ORDER BY 'Difference Vacation Hours' DESC ;

In [297]:
df <- aggregate(VacationHours ~ StateProvinceName + City, data=Employees
                , function(x){c(mean(x), length(x), max(x)-min(x))})
df2 <- cbind(
    df[, c('StateProvinceName', 'City')]
    , data.frame(CityState=paste(df$City, df$StateProvinceName, sep = ', '))
    , as.data.frame(df$VacationHours)
)
names(df2)[4:6] <- c('AvgVacationHours', 'NumEmployees', 'DifferenceVacHrs')
df2[order(-df2$DifferenceVacHrs), ]


Out[297]:
StateProvinceNameCityCityStateAvgVacationHoursNumEmployeesDifferenceVacHrs
13Washington Edmonds Edmonds, Washington47.64 25 97
29Washington Renton Renton, Washington54.1764705882353 17 97
2Washington Bellevue Bellevue, Washington44.3055555555556 36 95
30Washington Sammamish Sammamish, Washington42.8235294117647 17 93
12Washington Duvall Duvall, Washington42.2 10 92
28Washington Redmond Redmond, Washington37.4285714285714 21 92
5Washington Bothell Bothell, Washington65.6923076923077 13 90
18Washington Kenmore Kenmore, Washington53.9166666666667 12 90
32Washington Seattle Seattle, Washington54.6590909090909 44 89
33Washington Snohomish Snohomish, Washington52.8 10 85
17Washington Issaquah Issaquah, Washington67.6666666666667 15 73
14Washington Everett Everett, Washington57.8235294117647 17 71
23Washington Monroe Monroe, Washington47.9166666666667 12 71
9Washington Carnation Carnation, Washington59 5 64
25Washington Newport Hills Newport Hills, Washington62.8333333333333 6 55
6Alberta Calgary Calgary, Alberta35 5 53
15Washington Gold Bar Gold Bar, Washington70.8 5 39
16Washington Index Index, Washington76 5 32
31California San Francisco San Francisco, California21 2 10
1Georgia Atlanta Atlanta, Georgia55 1 0
3Hamburg Berlin Berlin, Hamburg35 1 0
4Gironde Bordeaux Bordeaux, Gironde34 1 0
7England Cambridge Cambridge, England37 1 0
8Massachusetts Cambridge Cambridge, Massachusetts39 1 0
10Michigan Detroit Detroit, Michigan38 1 0
11Minnesota Duluth Duluth, Minnesota24 1 0
19Washington Kent Kent, Washington5 1 0
20Victoria Melbourne Melbourne, Victoria36 1 0
21Tennessee Memphis Memphis, Tennessee29 1 0
22Minnesota Minneapolis Minneapolis, Minnesota48 1 0
24Utah Nevada Nevada, Utah27 1 0
26Ontario Ottawa Ottawa, Ontario31 1 0
27Oregon Portland Portland, Oregon22 1 0

4i. Limit the results to cities that have more than 1 employee

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' , MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City HAVING COUNT(1) > 1 ORDER BY 'Difference Vacation Hours' DESC ;

In [309]:
df <- aggregate(VacationHours ~ StateProvinceName + City, data=Employees
                , function(x){c(mean(x), length(x), max(x)-min(x))})
df2 <- cbind(
    df[, c('StateProvinceName', 'City')]
    , data.frame(CityState=paste(df$City, df$StateProvinceName, sep = ', '))
    , as.data.frame(df$VacationHours)
)
names(df2)[4:6] <- c('AvgVacationHours', 'NumEmployees', 'DifferenceVacHrs')
df3 <- df2[df2$NumEmployees>1, ]
df3[order(-df3$DifferenceVacHrs), ]


Out[309]:
StateProvinceNameCityCityStateAvgVacationHoursNumEmployeesDifferenceVacHrs
13Washington Edmonds Edmonds, Washington47.64 25 97
29Washington Renton Renton, Washington54.1764705882353 17 97
2Washington Bellevue Bellevue, Washington44.3055555555556 36 95
30Washington Sammamish Sammamish, Washington42.8235294117647 17 93
12Washington Duvall Duvall, Washington42.2 10 92
28Washington Redmond Redmond, Washington37.4285714285714 21 92
5Washington Bothell Bothell, Washington65.6923076923077 13 90
18Washington Kenmore Kenmore, Washington53.9166666666667 12 90
32Washington Seattle Seattle, Washington54.6590909090909 44 89
33Washington Snohomish Snohomish, Washington52.8 10 85
17Washington Issaquah Issaquah, Washington67.6666666666667 15 73
14Washington Everett Everett, Washington57.8235294117647 17 71
23Washington Monroe Monroe, Washington47.9166666666667 12 71
9Washington Carnation Carnation, Washington59 5 64
25Washington Newport Hills Newport Hills, Washington62.8333333333333 6 55
6Alberta Calgary Calgary, Alberta35 5 53
15Washington Gold Bar Gold Bar, Washington70.8 5 39
16Washington Index Index, Washington76 5 32
31California San Francisco San Francisco, California21 2 10

4j. Limit the results to non-U.S. cities

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' , MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours' FROM dbo.Employees AS e WHERE e.CountryName != 'United States' GROUP BY e.StateProvinceName, e.City HAVING COUNT(1) > 1 ORDER BY 'Difference Vacation Hours' DESC ;

In [313]:
df <- aggregate(VacationHours ~ StateProvinceName + City
                , data=Employees[Employees$CountryName != 'United States', ]
                , function(x){c(mean(x), length(x), max(x)-min(x))})
df2 <- cbind(
    df[, c('StateProvinceName', 'City')]
    , data.frame(CityState=paste(df$City, df$StateProvinceName, sep = ', '))
    , as.data.frame(df$VacationHours)
)
names(df2)[4:6] <- c('AvgVacationHours', 'NumEmployees', 'DifferenceVacHrs')
df3 <- df2[df2$NumEmployees>1, ]
df3[order(-df3$DifferenceVacHrs), ]


Out[313]:
StateProvinceNameCityCityStateAvgVacationHoursNumEmployeesDifferenceVacHrs
3Alberta Calgary Calgary, Alberta35 5 53