Topic:    Challenge Set 9 Part II
Subject:  SQL
Date:     02/20/2017
Name:     Prashant Tatineni



In [2]:

    
from sqlalchemy import create_engine
import pandas as pd
cnx = create_engine('postgresql://prashant:ptpro3@52.14.144.23:5432/prashant')
#port ~ 5432



In [2]:

    
pd.read_sql_query('''SELECT * FROM allstarfull LIMIT 5''',cnx)









    Out[2]:






  
    
      
      playerid
      yearid
      gamenum
      gameid
      teamid
      lgid
      gp
      startingpos
    
  
  
    
      0
      gomezle01
      1933
      0
      ALS193307060
      NYA
      AL
      1
      1
    
    
      1
      ferreri01
      1933
      0
      ALS193307060
      BOS
      AL
      1
      2
    
    
      2
      gehrilo01
      1933
      0
      ALS193307060
      NYA
      AL
      1
      3
    
    
      3
      gehrich01
      1933
      0
      ALS193307060
      DET
      AL
      1
      4
    
    
      4
      dykesji01
      1933
      0
      ALS193307060
      CHA
      AL
      1
      5



In [3]:

    
pd.read_sql_query('''SELECT * FROM schools LIMIT 5''',cnx)









    Out[3]:






  
    
      
      schoolid
      schoolname
      schoolcity
      schoolstate
      schoolnick
    
  
  
    
      0
      abilchrist
      Abilene Christian University
      Abilene
      TX
      Wildcats
    
    
      1
      adelphi
      Adelphi University
      Garden City
      NY
      Panthers
    
    
      2
      adrianmi
      Adrian College
      Adrian
      MI
      Bulldogs
    
    
      3
      airforce
      United States Air Force Academy
      Colorado Springs
      CO
      Falcons
    
    
      4
      akron
      University of Akron
      Akron
      OH
      Zips



In [35]:

    
pd.read_sql_query('''SELECT * FROM salaries LIMIT 5''',cnx)



In [4]:

    
pd.read_sql_query('''SELECT schoolstate,Count(schoolid) as ct FROM schools Group By schoolstate ORDER BY ct DESC LIMIT 5''',cnx)









    Out[4]:






  
    
      
      schoolstate
      ct
    
  
  
    
      0
      PA
      57
    
    
      1
      CA
      48
    
    
      2
      NY
      45
    
    
      3
      TX
      41
    
    
      4
      OH
      33



In [5]:

    
pd.read_sql_query('''SELECT playerid,salary 
                     FROM Salaries 
                     WHERE yearid = '1985' and salary > '500000' LIMIT 5;''',cnx)



In [ ]:



In [6]:









    Out[6]:






  
    
      
      schoolid
      schoolname
      schoolcity
      schoolstate
      schoolnick
    
  
  
    
      0
      abilchrist
      Abilene Christian University
      Abilene
      TX
      Wildcats
    
    
      1
      adelphi
      Adelphi University
      Garden City
      NY
      Panthers
    
    
      2
      adrianmi
      Adrian College
      Adrian
      MI
      Bulldogs
    
    
      3
      airforce
      United States Air Force Academy
      Colorado Springs
      CO
      Falcons
    
    
      4
      akron
      University of Akron
      Akron
      OH
      Zips



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:

1. What was the total spent on salaries by each team, each year?



In [14]:

    
pd.read_sql_query('''SELECT yearid, teamid, SUM(salary) FROM salaries
                        GROUP BY 1,2
                        ORDER BY 1 DESC
                        LIMIT 10
                        ''',cnx)









    Out[14]:






  
    
      
      yearid
      teamid
      sum
    
  
  
    
      0
      2013
      ARI
      90132000.0
    
    
      1
      2013
      PHI
      169863189.0
    
    
      2
      2013
      LAA
      124174750.0
    
    
      3
      2013
      SLN
      92260110.0
    
    
      4
      2013
      PIT
      77062000.0
    
    
      5
      2013
      SFN
      140180334.0
    
    
      6
      2013
      KCA
      80091725.0
    
    
      7
      2013
      MIN
      75337500.0
    
    
      8
      2013
      COL
      74409071.0
    
    
      9
      2013
      MIL
      76947033.0

2. What is the first and last year played for each player? Hint: Create a new table from 'Fielding.csv'.



In [19]:

    
pd.read_sql_query('''SELECT playerid, min(yearid), max(yearid)
                        FROM fielding
                        GROUP BY 1
                        LIMIT 10
                        ''',cnx)

3. Who has played the most all star games?



In [22]:

    
pd.read_sql_query('''SELECT playerid, COUNT(*)
                        FROM allstarfull
                        GROUP BY 1
                        ORDER BY 2 DESC
                        LIMIT 1
                        ''',cnx)









    Out[22]:






  
    
      
      playerid
      count
    
  
  
    
      0
      aaronha01
      25

4. Which school has generated the most distinct players? Hint: Create new table from 'CollegePlaying.csv'.



In [8]:

    
pd.read_sql_query('''SELECT schoolid, count(distinct playerid)
                        FROM schoolsplayers
                        GROUP BY 1
                        ORDER BY 2 DESC
                        LIMIT 1
                        ''',cnx)









    Out[8]:






  
    
      
      schoolid
      count
    
  
  
    
      0
      usc
      102

5. Which players have the longest career? Assume that the debut and finalGame columns comprise the start and end, respectively, of a player's career. Hint: Create a new table from 'Master.csv'. Also note that strings can be converted to dates using the DATE function and can then be subtracted from each other yielding their difference in days.



In [30]:

    
pd.read_sql_query('''SELECT playerid, finalgame, debut, (finalgame-debut) AS days
                        FROM master
                        WHERE finalgame IS NOT NULL and debut IS NOT NULL
                        ORDER BY 4 DESC
                        LIMIT 5''',cnx)

6. What is the distribution of debut months? Hint: Look at the DATE and EXTRACT functions.



In [34]:

    
pd.read_sql_query('''SELECT EXTRACT(MONTH FROM debut) AS debut_month, COUNT(*)
                        FROM master
                        GROUP BY 1
                        ORDER BY 1 ASC''',cnx)









    Out[34]:






  
    
      
      debut_month
      count
    
  
  
    
      0
      3.0
      41
    
    
      1
      4.0
      4711
    
    
      2
      5.0
      2230
    
    
      3
      6.0
      1893
    
    
      4
      7.0
      1978
    
    
      5
      8.0
      1943
    
    
      6
      9.0
      5061
    
    
      7
      10.0
      308
    
    
      8
      NaN
      189

7. What is the effect of table join order on mean salary for the players listed in the main (master) table? Hint: Perform two different queries, one that joins on playerID in the salary table and other that joins on the same column in the master table. You will have to use left joins for each since right joins are not currently supported with SQLalchemy.



In [43]:

    
pd.read_sql_query('''SELECT S.playerid, AVG(salary)
                        FROM Salaries S LEFT JOIN Master M
                        ON S.playerid = M.playerid
                        GROUP BY 1
                        LIMIT 5''',cnx)



In [48]:

    
pd.read_sql_query('''SELECT M.playerid, AVG(salary)
                        FROM Master M LEFT JOIN Salaries S
                        ON M.playerid = S.playerid
                        GROUP BY 1
                        LIMIT 5''',cnx)









    Out[48]:






  
    
      
      playerid
      avg
    
  
  
    
      0
      gagnied01
      None
    
    
      1
      ramospe01
      None
    
    
      2
      putkolu01
      None
    
    
      3
      boyerke01
      None
    
    
      4
      dillibo01
      None

By joining to the Master table, we can see that there's many players with no salary data.

	playerid	salary
0	barkele01	870000.0
1	bedrost01	550000.0
2	benedbr01	545000.0
3	campri01	633333.0
4	ceronri01	625000.0

	playerid	min	max
0	gagnied01	1914	1915
1	ramospe01	1955	1970
2	putkolu01	2012	2013
3	boyerke01	1955	1969
4	dillibo01	1946	1951
5	jeterde01	1995	2013
6	sperrst01	1936	1938
7	vidrojo01	1997	2008
8	izturce01	2001	2013
9	clarkda06	2007	2007

	playerid	finalgame	debut	days
0	altroni01	1933-10-01	1898-07-14	12862
1	orourji01	1904-09-22	1872-04-26	11836
2	minosmi01	1980-10-05	1949-04-19	11492
3	olearch01	1934-09-30	1904-04-14	11126
4	lathaar01	1909-09-30	1880-07-05	10678

	playerid	avg
0	bookech01	350000.0
1	wisede01	443750.0
2	contrjo01	6750000.0
3	myersro02	175000.0
4	jeterde01	14034338.5

	playerid	yearid	gameid	teamid	lgid	gp	startingpos
0	gomezle01	1933	ALS193307060	NYA	AL	1	1
1	ferreri01	1933	ALS193307060	BOS	AL	1	2
2	gehrilo01	1933	ALS193307060	NYA	AL	1	3
3	gehrich01	1933	ALS193307060	DET	AL	1	4
4	dykesji01	1933	ALS193307060	CHA	AL	1	5

	schoolid	schoolname	schoolcity	schoolstate	schoolnick
0	abilchrist	Abilene Christian University	Abilene	TX	Wildcats
1	adelphi	Adelphi University	Garden City	NY	Panthers
2	adrianmi	Adrian College	Adrian	MI	Bulldogs
3	airforce	United States Air Force Academy	Colorado Springs	CO	Falcons
4	akron	University of Akron	Akron	OH	Zips

	yearid	teamid	lgid	playerid	salary
0	1985	BAL	AL	murraed02	1472819.0
1	1985	BAL	AL	lynnfr01	1090000.0
2	1985	BAL	AL	ripkeca01	800000.0
3	1985	BAL	AL	lacyle01	725000.0
4	1985	BAL	AL	flanami01	641667.0

	yearid	teamid	sum
0	2013	ARI	90132000.0
1	2013	PHI	169863189.0
2	2013	LAA	124174750.0
3	2013	SLN	92260110.0
4	2013	PIT	77062000.0
5	2013	SFN	140180334.0
6	2013	KCA	80091725.0
7	2013	MIN	75337500.0
8	2013	COL	74409071.0
9	2013	MIL	76947033.0

	debut_month	count
0	3.0	41
1	4.0	4711
2	5.0	2230
3	6.0	1893
4	7.0	1978
5	8.0	1943
6	9.0	5061
7	10.0	308
8	NaN	189

	playerid	avg
0	gagnied01	None
1	ramospe01	None
2	putkolu01	None
3	boyerke01	None
4	dillibo01	None