In [1]:
from sportstat.models import Team, Athlete, Game, Play, Action, Observation

Parsing TSV files and populating the database


In [2]:
osu_roster_filepath = '../data/osu_roster.csv'

Inspecting the first few lines of the file gives a feel for the data schema. Despite the .csv extension, the fields are tab-separated.

Mongo Considerations:

- categorical fields (e.g. position, year) can be restricted to a fixed set of allowed values for validation
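As a rough illustration of category validation, here is a minimal sketch in plain Python. The names `YEARS`, `POSITIONS`, and `validate_category` are hypothetical, and the category sets are only the values visible in the first ten roster rows, so they would need extending for the full file. If the `sportstat` models happen to be mongoengine documents (an assumption; the notebook does not say), the same constraint could probably be expressed with the `choices` argument on a field.

```python
# Hypothetical category sets, taken from the first ten rows of the roster.
YEARS = {"FR", "SO", "JR", "SR"}
POSITIONS = {"CB", "RB", "QB", "TE", "SAF", "LB", "OL", "DL", "WR"}


def validate_category(field, value, allowed):
    """Return value unchanged, or raise ValueError if it is not an allowed category."""
    if value not in allowed:
        raise ValueError(f"{field}={value!r} is not one of {sorted(allowed)}")
    return value


# Usage: valid values pass through, unknown ones are rejected.
year = validate_category("year", "SO", YEARS)
position = validate_category("position", "QB", POSITIONS)
```

Rejecting unknown values at load time surfaces typos and unexpected position codes before they reach the database.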


In [3]:
!head {osu_roster_filepath}


13	apple eli	Apple, Eli	CB	73	6-1	200	3	SO	Voorhees, N.J. (Eastern)
28	ball warren	Ball, Warren	RB	73	6-1	225	5	JR	Columbus, Ohio (DeSales)
16	barrett j.t.	Barrett, J.T.	QB	74	6-2	225	3	SO	Wichita Falls, Texas (Rider)
85	baugh marcus	Baugh, Marcus	TE	77	6-5	255	3	SO	Riverside, Calif. (John W. North)
11	bell vonn	Bell, Vonn	SAF	71	5-11	205	5	JR	Rossville, Ga. (Ridgeland)
44	berger kyle	Berger, Kyle	LB	74	6-2	230	7	SR	Cleveland, Ohio (St. Ignatius)
33	booker dante	Booker, Dante	LB	75	6-3	233	3	SO	Akron (St. Vincent-St. Mary)
50	boren jacoby	Boren, Jacoby	OL	74	6-2	285	7	SR	Pickerington, Ohio (Pickerington Central)
97	bosa joey	Bosa, Joey	DL	78	6-6	275	5	JR	Fort Lauderdale, Fla. (St. Thomas Aquinas)
80	brown noah	Brown, Noah	WR	74	6-2	222	3	SO	Flanders, N.J. (Pope John XXIII)
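Columns, in order: Number, Name (lowercase "last first"), Name ("Last, First"), Position, Something (looks like height in inches), Height (feet-inches), Weight, Something (looks like a numeric code for class year), Year, Hometown (with high school in parentheses)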
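The column layout above can also be captured with the standard `csv` module instead of positional indexing. This is only a sketch: the field names in `FIELDS` are assumptions (the file has no header row), `height_in` and `year_code` are guesses at the two unlabeled columns, and `SAMPLE` simply repeats two rows from the `head` output so the block runs without the data file.

```python
import csv
import io

# Assumed field names; the roster file itself has no header row.
FIELDS = ["number", "name", "last_first", "position", "height_in",
          "height", "weight", "year_code", "year", "hometown"]

# Two rows copied from the head output above, so the sketch is self-contained.
SAMPLE = (
    "13\tapple eli\tApple, Eli\tCB\t73\t6-1\t200\t3\tSO\tVoorhees, N.J. (Eastern)\n"
    "28\tball warren\tBall, Warren\tRB\t73\t6-1\t225\t5\tJR\tColumbus, Ohio (DeSales)\n"
)


def read_roster(handle):
    """Yield one dict per roster row from a tab-separated file object."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        # The third column is formatted "Last, First".
        last, first = row["last_first"].split(", ")
        row["first"], row["last"] = first, last
        row["number"] = int(row["number"])
        yield row


rows = list(read_roster(io.StringIO(SAMPLE)))
print(rows[0]["first"], rows[0]["last"], rows[0]["hometown"])
```

Named fields keep the two unlabeled columns apart, and the `csv` reader drops the line terminator, so the last field carries no trailing newline.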

In [5]:
ohio_state = Team()

In [6]:
ohio_state.city = 'Columbus'
ohio_state.name = 'Buckeyes'
ohio_state.state = 'OH'

In [7]:
ohio_state.save()

In [9]:
with open(osu_roster_filepath, 'r') as f:
    for line in f:
        # Fields are tab-separated. The line keeps its trailing newline, so
        # hometown ends with '\n' and the printed output is double-spaced.
        items = line.split('\t')
        number = int(items[0])
        name = items[1]
        # Column 3 is formatted "Last, First".
        last, first = items[2].split(', ')
        position = items[3]
        something = items[4]  # unlabeled; matches height in inches in the sample
        height = items[5]
        weight = items[6]
        # Reuses the name `something`, so the column-5 value is lost and the
        # print below shows column 8 twice.
        something = items[7]
        year = items[8]
        hometown = items[9]

        athlete = Athlete()
        athlete.number = number
        athlete.name = name
        athlete.first = first
        athlete.last = last
        athlete.position = position
        athlete.height = height
        athlete.weight = weight
        athlete.year = year
        athlete.hometown = hometown

        # athlete.save()
        print(number, name, last, first, position, something, height, weight, something, year, hometown)


13 apple eli Apple Eli CB 3 6-1 200 3 SO Voorhees, N.J. (Eastern)

28 ball warren Ball Warren RB 5 6-1 225 5 JR Columbus, Ohio (DeSales)

16 barrett j.t. Barrett J.T. QB 3 6-2 225 3 SO Wichita Falls, Texas (Rider)

85 baugh marcus Baugh Marcus TE 3 6-5 255 3 SO Riverside, Calif. (John W. North)

11 bell vonn Bell Vonn SAF 5 5-11 205 5 JR Rossville, Ga. (Ridgeland)

44 berger kyle Berger Kyle LB 7 6-2 230 7 SR Cleveland, Ohio (St. Ignatius)

33 booker dante Booker Dante LB 3 6-3 233 3 SO Akron (St. Vincent-St. Mary)

50 boren jacoby Boren Jacoby OL 7 6-2 285 7 SR Pickerington, Ohio (Pickerington Central)

97 bosa joey Bosa Joey DL 5 6-6 275 5 JR Fort Lauderdale, Fla. (St. Thomas Aquinas)

80 brown noah Brown Noah WR 3 6-2 222 3 SO Flanders, N.J. (Pope John XXIII)

48 burger joe Burger Joe LB 7 6-2 230 7 SR Cincinnati, Ohio (LaSalle)

16 burrows cam Burrows Cam DB 5 6-0 208 5 JR Trotwood, Ohio (Trotwood Madison)

21 campbell parris Campbell Parris WR 3 6-1 205 3 SO Akron (St. Vincent-St. Mary)

28 cibene michael Cibene Michael SAF 5 6-0 185 5 JR Fort Lauderdale (Pine Crest)

82 clark james Clark James WR 3 5-10 185 3 SO New Smyrna Beach, Fla. (New Smyrna Beach)

13 collier stephen Collier Stephen QB 3 6-4 225 3 SO Leesburg, Ga. (Lee County)

19 conley gareon Conley Gareon CB 3 6-0 195 3 SO Massillon, Ohio (Washington)

34 conner nick Conner Nick LB 1 6-3 230 1 FR Dublin, Ohio (Scioto)

11 cook justin Cook Justin QB 3 6-3 225 3 SO Philadelphia, Pa. (Western Reserve Academy)

9 cornell jashon Cornell Jashon DL 1 6-3 265 1 FR St. Paul, Minn. (Cretin-Derham Hall)

4 dean jamel Dean Jamel DB 1 6-3 200 1 FR Cocoa, Fla. (Cocoa)

68 decker taylor Decker Taylor OL 7 6-8 315 7 SR Vandalia, Ohio (Butler)

1 dixon johnnie Dixon Johnnie WR 1 5-11 194 1 FR West Palm Beach, Fla. (Dwyer)

25 dunn bri'onte Dunn Bri'onte RB 5 6-0 215 5 JR Canton, Ohio (GlenOak)

65 elflein pat Elflein Pat OL 5 6-3 300 5 JR Pickerington, Ohio (Pickerington North)

15 elliott ezekiel Elliott Ezekiel RB 5 6-0 225 5 JR St. Louis, Mo. (John Burroughs)

38 fada craig Fada Craig LB 7 6-1 230 7 SR Columbus, Ohio (Bishop Watterson)

57 farris chase Farris Chase OL 7 6-5 310 7 SR Elyria, Ohio (Elyria)

87 ferrelli guy Ferrelli Guy TE 3 6-1 245 3 SO Columbus, Ohio (Bishop Ready)

70 fong chris Fong Chris DE 7 6-2 260 7 SR Troy, Ohio (Troy)

21 forte trevon Forte Trevon CB 3 5-8 175 3 SO Youngstown, Ohio (Cardinal Mooney)

3 franklin khaleed Franklin Khaleed SAF 5 6-1 215 5 JR Columbus, Ohio (Beechcroft)

61 gaskey logan Gaskey Logan OL 5 6-4 295 5 JR Long Grove, Ill. (Adlai E. Stevenson)

89 greene jeff Greene Jeff WR 7 6-5 220 7 SR Peachtree City, Ga. (Starr's Mill)

51 hale joel Hale Joel OL 7 6-4 295 7 SR Greenwood, Ind. (Center Grove)

41 haynes bryce Haynes Bryce LS 7 6-4 225 7 SR Cumming, Ga. (Pinecrest Academy)

77 hill michael Hill Michael DL 3 6-3 295 3 SO Pendleton, S.C. (Pendleton)

10 holmes jalyn Holmes Jalyn DL 3 6-5 265 3 SO Norfolk, Va. (Lake Taylor)

24 hooker malik Hooker Malik SAF 1 6-2 205 1 FR New Castle, Pa. (New Castle)

49 hubbard sam Hubbard Sam DE 1 6-5 265 1 FR Cincinnati, Ohio (Archbishop Moeller)

95 johnston cameron Johnston Cameron P 5 5-11 195 5 JR Geelong, Australia (St. Joseph's)

12 jones cardale Jones Cardale QB 5 6-5 250 5 JR Cleveland, Ohio (Glenville)

74 jones jamarco Jones Jamarco OL 3 6-5 310 3 SO Chicago (De La Salle)

64 jones marcelys Jones Marcelys OL 3 6-4 315 3 SO Cleveland, Ohio (Glenville)

78 knox demetrius Knox Demetrius OL 1 6-4 305 1 FR Fort Worth, Texas (All Saints Episcopal)

2 lattimore marshon Lattimore Marshon CB 1 6-0 195 1 FR Cleveland (Glenville)

43 lee darron Lee Darron LB 3 6-2 235 3 SO New Albany, Ohio (New Albany)

59 lewis tyquan Lewis Tyquan DL 3 6-4 260 3 SO Tarboro, N.C. (Tarboro)

75 lisle evan Lisle Evan OL 3 6-7 305 3 SO Centerville, Ohio (Centreville)

29 maduko mike Maduko Mike SAF 3 5-8 188 3 SO Naperville, Ill. (Montini Catholic)

17 marshall jalin Marshall Jalin H-B 3 5-11 205 3 SO Middletown, Ohio (Middletown)

42 mawhirter aaron Mawhirter Aaron LS 5 6-1 230 5 JR Sandusky, Ohio (Perkins)

20 mcdaniel devlin McDaniel Devlin WR 5 5-11 195 5 JR Marion, Ohio (Marion Pleasant)

83 mclaurin terry McLaurin Terry WR 3 6-1 200 3 SO Indianapolis (Cathedral)

5 mcmillan raekwon McMillan Raekwon LB 3 6-2 240 3 SO Hinesville, Ga. (Liberty County)

5 miller braxton Miller Braxton QB 7 6-2 215 7 SR Huber Heights, Ohio (Wayne)

18 mitchell kato Mitchell Kato WR 7 6-0 190 7 SR Cleveland, Ohio (John Hay)

44 morgan luke Morgan Luke QB 5 6-2 235 5 JR Lebanon, Ohio (Lebanon)

62 morris r.j. Morris R.J. OL 3 6-2 305 3 SO Naperville, Ill. (Naperville)

52 munger donovan Munger Donovan DL 3 6-4 300 3 SO Shaker Heights, Ohio (Shaker Heights)

96 nuernberger sean Nuernberger Sean K 3 6-1 220 3 SO Buckner, Ky. (Oldham Co.)

91 parry aaron Parry Aaron DL 5 6-5 275 5 JR Zanesville, Ohio (Bishop Rosecrans)

37 perry joshua Perry Joshua LB 7 6-4 254 7 SR Galena, Ohio (Olentangy)

23 powell tyvis Powell Tyvis SAF 5 6-3 210 5 JR Bedford, Ohio (Bedford)

54 price billy Price Billy OL 3 6-4 315 3 SO Austintown, Ohio (Fitch)

19 ramstetter joe Ramstetter Joe WR 5 6-3 225 5 JR Cincinnati, Ohio (Elder)

4 samuel curtis Samuel Curtis RB 3 5-11 200 3 SO Brooklyn, N.Y. (Erasmus Hall)

67 schmidt grant Schmidt Grant OL 1 6-6 300 1 FR Sioux Falls, S.D. (Roosevelt)

90 schutt tommy Schutt Tommy DL 7 6-3 290 7 SR Glen Ellyn, Ill. (Glenbard West)

42 slade darius Slade Darius DL 1 6-4 255 1 FR Montclair, N.J. (Montclair)

84 smith corey Smith Corey WR 7 6-1 195 7 SR Akron, Ohio (Akron Buchtel)

1 smith erick Smith Erick SAF 3 6-0 202 3 SO Cleveland, Ohio (Glenville)

93 sprinkle tracy Sprinkle Tracy DL 3 6-3 290 3 SO Elyria, Ohio (Elyria)

79 taylor brady Taylor Brady OL 1 6-5 300 1 FR Columbus, Ohio (Bishop Ready)

15 tensing nick Tensing Nick QB 3 6-5 215 3 SO Cincinnati (St. Xavier)

3 thomas michael Thomas Michael WR 5 6-3 210 5 JR Los Angeles, Calif. (Woodland Hills Taft)

94 thompson dylan Thompson Dylan DL 1 6-5 275 1 FR Lombard, Ill. (Montini Catholic)

71 trout kyle Trout Kyle OL 1 6-6 310 1 FR Lancaster, Ohio (Lancaster)

36 turnure zach Turnure Zach LB 5 6-1 235 5 JR St. Louis, Mo. (CBC)

81 vannett nick Vannett Nick TE 7 6-6 260 7 SR Westerville, Ohio (Central)

92 washington adolphus Washington Adolphus DL 7 6-4 290 7 SR Cincinnati, Ohio (Taft)

7 webb damon Webb Damon CB 3 5-11 193 3 SO Detroit, Mich (Cass Tech)

55 williams camren Williams Camren LB 7 6-1 225 7 SR West Roxbury, Mass. (Catholic Memorial)

2 wilson dontre Wilson Dontre H-B 5 5-10 195 5 JR DeSoto, Texas (DeSoto)

35 worley chris Worley Chris LB 3 6-2 225 3 SO Cleveland, Ohio (Glenville)

Notes:

  • high school should be a separate column
  • height needs a format converter to SI units
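Both notes can be sketched with two small helpers. The function names `height_to_meters` and `split_hometown` are hypothetical, and the code assumes heights always look like `6-1` (feet-inches) and that the high school, when present, is the parenthesised part at the end of the hometown field.

```python
import re


def height_to_meters(height):
    """Convert a 'feet-inches' string such as '6-1' to meters (3 decimals)."""
    feet, inches = (int(part) for part in height.split("-"))
    return round((feet * 12 + inches) * 0.0254, 3)


def split_hometown(hometown):
    """Split 'City, State (School)' into (hometown, high_school).

    Returns (hometown, None) when there is no parenthesised school.
    """
    match = re.fullmatch(r"(.*?)\s*\((.*)\)\s*", hometown)
    if match is None:
        return hometown.strip(), None
    return match.group(1), match.group(2)


# Usage on values from the roster sample.
meters = height_to_meters("6-1")
town, school = split_hometown("Voorhees, N.J. (Eastern)")
print(meters, town, school)
```

The trailing `\s*` in the pattern also tolerates the newline that the hometown field still carries when lines are split by hand.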

Suggested Reform:

numbers  last  first  positions  


Wait, the answer is in: rosters come from the web, so highly customized scrapers are probably the best approach.
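As a starting point for such a scraper, here is a minimal sketch that pulls table rows out of HTML using only the standard library. The `TableRows` class and the `PAGE` markup are hypothetical; a real roster page will have its own structure and needs its own handling, and fetching the page is left out entirely.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup standing in for a roster page.
PAGE = """
<table>
  <tr><th>No.</th><th>Name</th><th>Pos.</th></tr>
  <tr><td>13</td><td>Apple, Eli</td><td>CB</td></tr>
</table>
"""

parser = TableRows()
parser.feed(PAGE)
print(parser.rows)
```

Each site would likely get its own small subclass or post-processing step, which fits the "very customized" approach noted above.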

