Project_2: Read Chess Tournament Cross Table


In [1]:
import pandas as pd
import re
import sys
import numpy as np

Reading the text file into the Ipython Notebook


In [2]:
textfile = open('tournamentinfo.txt')
  • First i want to seperate the lines so i will strip them, and store them in a list using a list comprehension.

In [3]:
text_table = [line.strip() for line in textfile.readlines()]
  • Print out the list, to inspect how it's formated and changes that will need to be made. (Cleaning)

In [4]:
text_table


Out[4]:
['-----------------------------------------------------------------------------------------',
 'Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|',
 'Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |',
 '-----------------------------------------------------------------------------------------',
 '1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|',
 'ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|',
 'MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|',
 'MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|',
 'MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |',
 '-----------------------------------------------------------------------------------------',
 '5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|',
 'MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|',
 'OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2|',
 'MI | 11146376 / R: 1649   ->1673     |N:3  |W    |B    |W    |B    |B    |W    |W    |',
 '-----------------------------------------------------------------------------------------',
 '8 | EZEKIEL HOUGHTON                |5.0  |W   3|W  32|L  14|L   9|W  47|W  28|W  19|',
 'MI | 15142253 / R: 1641P17->1657P24  |N:3  |B    |W    |B    |W    |B    |W    |W    |',
 '-----------------------------------------------------------------------------------------',
 '9 | STEFANO LEE                     |5.0  |W  25|L  18|W  59|W   8|W  26|L   7|W  20|',
 'ON | 14954524 / R: 1411   ->1564     |N:2  |W    |B    |W    |B    |W    |B    |B    |',
 '-----------------------------------------------------------------------------------------',
 '10 | ANVIT RAO                       |5.0  |D  16|L  19|W  55|W  31|D   6|W  25|W  18|',
 'MI | 14150362 / R: 1365   ->1544     |N:3  |W    |W    |B    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '11 | CAMERON WILLIAM MC LEMAN        |4.5  |D  38|W  56|W   6|L   7|L   3|W  34|W  26|',
 'MI | 12581589 / R: 1712   ->1696     |N:3  |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3|',
 'MI | 12681257 / R: 1663   ->1670     |N:3  |W    |B    |W    |B    |     |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '13 | TORRANCE HENRY JR               |4.5  |W  36|W  27|L   7|D   5|W  33|L   3|W  32|',
 'MI | 15082995 / R: 1666   ->1662     |N:3  |B    |W    |B    |B    |W    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '14 | BRADLEY SHAW                    |4.5  |W  54|W  44|W   8|L   1|D  27|L   5|W  31|',
 'MI | 10131499 / R: 1610   ->1618     |N:3  |W    |B    |W    |W    |B    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '15 | ZACHARY JAMES HOUGHTON          |4.5  |D  19|L  16|W  30|L  22|W  54|W  33|W  38|',
 'MI | 15619130 / R: 1220P13->1416P20  |N:3  |B    |B    |W    |W    |B    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '16 | MIKE NIKITIN                    |4.0  |D  10|W  15|H    |W  39|L   2|W  36|U    |',
 'MI | 10295068 / R: 1604   ->1613     |N:3  |B    |W    |     |B    |W    |B    |     |',
 '-----------------------------------------------------------------------------------------',
 '17 | RONALD GRZEGORCZYK              |4.0  |W  48|W  41|L  26|L   2|W  23|W  22|L   5|',
 'MI | 10297702 / R: 1629   ->1610     |N:3  |W    |B    |W    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '18 | DAVID SUNDEEN                   |4.0  |W  47|W   9|L   1|W  32|L  19|W  38|L  10|',
 'MI | 11342094 / R: 1600   ->1600     |N:3  |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '19 | DIPANKAR ROY                    |4.0  |D  15|W  10|W  52|D  28|W  18|L   4|L   8|',
 'MI | 14862333 / R: 1564   ->1570     |N:3  |W    |B    |W    |B    |W    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '20 | JASON ZHENG                     |4.0  |L  40|W  49|W  23|W  41|W  28|L   2|L   9|',
 'MI | 14529060 / R: 1595   ->1569     |N:4  |W    |B    |W    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '21 | DINH DANG BUI                   |4.0  |W  43|L   1|W  47|L   3|W  40|W  39|L   6|',
 'ON | 15495066 / R: 1563P22->1562     |N:3  |B    |W    |B    |W    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '22 | EUGENE L MCCLURE                |4.0  |W  64|D  52|L  28|W  15|H    |L  17|W  40|',
 'MI | 12405534 / R: 1555   ->1529     |N:4  |W    |B    |W    |B    |     |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '23 | ALAN BUI                        |4.0  |L   4|W  43|L  20|W  58|L  17|W  37|W  46|',
 'ON | 15030142 / R: 1363   ->1371     |     |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '24 | MICHAEL R ALDRICH               |4.0  |L  28|L  47|W  43|L  25|W  60|W  44|W  39|',
 'MI | 13469010 / R: 1229   ->1300     |N:4  |B    |W    |B    |B    |W    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '25 | LOREN SCHWIEBERT                |3.5  |L   9|W  53|L   3|W  24|D  34|L  10|W  47|',
 'MI | 12486656 / R: 1745   ->1681     |N:4  |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '26 | MAX ZHU                         |3.5  |W  49|W  40|W  17|L   4|L   9|D  32|L  11|',
 'ON | 15131520 / R: 1579   ->1564     |N:4  |B    |W    |B    |W    |B    |W    |W    |',
 '-----------------------------------------------------------------------------------------',
 '27 | GAURAV GIDWANI                  |3.5  |W  51|L  13|W  46|W  37|D  14|L   6|U    |',
 'MI | 14476567 / R: 1552   ->1539     |N:4  |W    |B    |W    |B    |W    |B    |     |',
 '-----------------------------------------------------------------------------------------',
 '28 | SOFIA ADINA STANESCU-BELLU      |3.5  |W  24|D   4|W  22|D  19|L  20|L   8|D  36|',
 'MI | 14882954 / R: 1507   ->1513     |N:3  |W    |W    |B    |W    |B    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '29 | CHIEDOZIE OKORIE                |3.5  |W  50|D   6|L  38|L  34|W  52|W  48|U    |',
 'MI | 15323285 / R: 1602P6 ->1508P12  |N:4  |B    |W    |B    |W    |W    |B    |     |',
 '-----------------------------------------------------------------------------------------',
 '30 | GEORGE AVERY JONES              |3.5  |L  52|D  64|L  15|W  55|L  31|W  61|W  50|',
 'ON | 12577178 / R: 1522   ->1444     |     |W    |B    |B    |W    |W    |B    |B    |',
 '-----------------------------------------------------------------------------------------',
 '31 | RISHI SHETTY                    |3.5  |L  58|D  55|W  64|L  10|W  30|W  50|L  14|',
 'MI | 15131618 / R: 1494   ->1444     |     |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '32 | JOSHUA PHILIP MATHEWS           |3.5  |W  61|L   8|W  44|L  18|W  51|D  26|L  13|',
 'ON | 14073750 / R: 1441   ->1433     |N:4  |W    |B    |W    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '33 | JADE GE                         |3.5  |W  60|L  12|W  50|D  36|L  13|L  15|W  51|',
 'MI | 14691842 / R: 1449   ->1421     |     |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '34 | MICHAEL JEFFERY THOMAS          |3.5  |L   6|W  60|L  37|W  29|D  25|L  11|W  52|',
 'MI | 15051807 / R: 1399   ->1400     |     |B    |W    |B    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '35 | JOSHUA DAVID LEE                |3.5  |L  46|L  38|W  56|L   6|W  57|D  52|W  48|',
 'MI | 14601397 / R: 1438   ->1392     |     |W    |W    |B    |W    |B    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '36 | SIDDHARTH JHA                   |3.5  |L  13|W  57|W  51|D  33|H    |L  16|D  28|',
 'MI | 14773163 / R: 1355   ->1367     |N:4  |W    |B    |W    |B    |     |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '37 | AMIYATOSH PWNANANDAM            |3.5  |B    |L   5|W  34|L  27|H    |L  23|W  61|',
 'MI | 15489571 / R:  980P12->1077P17  |     |     |B    |W    |W    |     |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '38 | BRIAN LIU                       |3.0  |D  11|W  35|W  29|L  12|H    |L  18|L  15|',
 'MI | 15108523 / R: 1423   ->1439     |N:4  |W    |B    |W    |W    |     |B    |B    |',
 '-----------------------------------------------------------------------------------------',
 '39 | JOEL R HENDON                   |3.0  |L   1|W  54|W  40|L  16|W  44|L  21|L  24|',
 'MI | 12923035 / R: 1436P23->1413     |N:4  |B    |W    |B    |W    |B    |W    |W    |',
 '-----------------------------------------------------------------------------------------',
 '40 | FOREST ZHANG                    |3.0  |W  20|L  26|L  39|W  59|L  21|W  56|L  22|',
 'MI | 14892710 / R: 1348   ->1346     |     |B    |B    |W    |W    |B    |W    |W    |',
 '-----------------------------------------------------------------------------------------',
 '41 | KYLE WILLIAM MURPHY             |3.0  |W  59|L  17|W  58|L  20|X    |U    |U    |',
 'MI | 15761443 / R: 1403P5 ->1341P9   |     |B    |W    |B    |W    |     |     |     |',
 '-----------------------------------------------------------------------------------------',
 '42 | JARED GE                        |3.0  |L  12|L  50|L  57|D  60|D  61|W  64|W  56|',
 'MI | 14462326 / R: 1332   ->1256     |     |B    |W    |B    |B    |W    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '43 | ROBERT GLEN VASEY               |3.0  |L  21|L  23|L  24|W  63|W  59|L  46|W  55|',
 'MI | 14101068 / R: 1283   ->1244     |     |W    |B    |W    |W    |B    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '44 | JUSTIN D SCHILLING              |3.0  |B    |L  14|L  32|W  53|L  39|L  24|W  59|',
 'MI | 15323504 / R: 1199   ->1199     |     |     |W    |B    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '45 | DEREK YAN                       |3.0  |L   5|L  51|D  60|L  56|W  63|D  55|W  58|',
 'MI | 15372807 / R: 1242   ->1191     |     |W    |B    |W    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '46 | JACOB ALEXANDER LAVALLEY        |3.0  |W  35|L   7|L  27|L  50|W  64|W  43|L  23|',
 'MI | 15490981 / R:  377P3 ->1076P10  |     |B    |W    |B    |W    |B    |W    |W    |',
 '-----------------------------------------------------------------------------------------',
 '47 | ERIC WRIGHT                     |2.5  |L  18|W  24|L  21|W  61|L   8|D  51|L  25|',
 'MI | 12533115 / R: 1362   ->1341     |     |W    |B    |W    |B    |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '48 | DANIEL KHAIN                    |2.5  |L  17|W  63|H    |D  52|H    |L  29|L  35|',
 'MI | 14369165 / R: 1382   ->1335     |     |B    |W    |     |B    |     |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '49 | MICHAEL J MARTIN                |2.5  |L  26|L  20|D  63|D  64|W  58|H    |U    |',
 'MI | 12531685 / R: 1291P12->1259P17  |     |W    |W    |B    |W    |B    |     |     |',
 '-----------------------------------------------------------------------------------------',
 '50 | SHIVAM JHA                      |2.5  |L  29|W  42|L  33|W  46|H    |L  31|L  30|',
 'MI | 14773178 / R: 1056   ->1111     |     |W    |B    |W    |B    |     |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '51 | TEJAS AYYAGARI                  |2.5  |L  27|W  45|L  36|W  57|L  32|D  47|L  33|',
 'MI | 15205474 / R: 1011   ->1097     |     |B    |W    |B    |W    |B    |W    |W    |',
 '-----------------------------------------------------------------------------------------',
 '52 | ETHAN GUO                       |2.5  |W  30|D  22|L  19|D  48|L  29|D  35|L  34|',
 'MI | 14918803 / R:  935   ->1092     |N:4  |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '53 | JOSE C YBARRA                   |2.0  |H    |L  25|H    |L  44|U    |W  57|U    |',
 'MI | 12578849 / R: 1393   ->1359     |     |     |B    |     |W    |     |W    |     |',
 '-----------------------------------------------------------------------------------------',
 '54 | LARRY HODGE                     |2.0  |L  14|L  39|L  61|B    |L  15|L  59|W  64|',
 'MI | 12836773 / R: 1270   ->1200     |     |B    |B    |W    |     |W    |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '55 | ALEX KONG                       |2.0  |L  62|D  31|L  10|L  30|B    |D  45|L  43|',
 'MI | 15412571 / R: 1186   ->1163     |     |W    |B    |W    |B    |     |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '56 | MARISA RICCI                    |2.0  |H    |L  11|L  35|W  45|H    |L  40|L  42|',
 'MI | 14679887 / R: 1153   ->1140     |     |     |B    |W    |W    |     |B    |W    |',
 '-----------------------------------------------------------------------------------------',
 '57 | MICHAEL LU                      |2.0  |L   7|L  36|W  42|L  51|L  35|L  53|B    |',
 'MI | 15113330 / R: 1092   ->1079     |     |B    |W    |W    |B    |W    |B    |     |',
 '-----------------------------------------------------------------------------------------',
 '58 | VIRAJ MOHILE                    |2.0  |W  31|L   2|L  41|L  23|L  49|B    |L  45|',
 'MI | 14700365 / R:  917   -> 941     |     |W    |B    |W    |B    |W    |     |B    |',
 '-----------------------------------------------------------------------------------------',
 '59 | SEAN M MC CORMICK               |2.0  |L  41|B    |L   9|L  40|L  43|W  54|L  44|',
 'MI | 12841036 / R:  853   -> 878     |     |W    |     |B    |B    |W    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '60 | JULIA SHEN                      |1.5  |L  33|L  34|D  45|D  42|L  24|H    |U    |',
 'MI | 14579262 / R:  967   -> 984     |     |W    |B    |B    |W    |B    |     |     |',
 '-----------------------------------------------------------------------------------------',
 '61 | JEZZEL FARKAS                   |1.5  |L  32|L   3|W  54|L  47|D  42|L  30|L  37|',
 'ON | 15771592 / R:  955P11-> 979P18  |     |B    |W    |B    |W    |B    |W    |B    |',
 '-----------------------------------------------------------------------------------------',
 '62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    |',
 'MI | 15219542 / R: 1530   ->1535     |     |B    |     |     |     |     |     |     |',
 '-----------------------------------------------------------------------------------------',
 '63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |',
 'MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |',
 '-----------------------------------------------------------------------------------------',
 '64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|',
 'MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |',
 '-----------------------------------------------------------------------------------------']
  • I want to split up what's easy for me to take from the list, so i've created a few lists to store those in.

In [5]:
player_state = []

In [6]:
player_number = []

In [7]:
id_data = []

In [8]:
name_data = []
  • Using Regex to clean some of the data inside the table before we loop through it, so it can be easier to accesss the desired information from the table list we've created.

In [9]:
state = '([A-Z]{2})'

In [10]:
number = '([0-9]{1})'

In [11]:
dash = '^-'
  • Using a for-loop to parse through the table, i've split the rows seperately becaues they are formated differently, making it easier to work once on each unique row set.

In [12]:
for line in text_table:
    if not re.search(dash, line):
        if re.search(state, line) and not re.search(dash, line):
            text =  line.replace('/', '').replace('-','').replace('>','').replace(':','') 
            state_num = text.strip().replace('|', ',')[:3].strip(',')
            if re.search(state, state_num):
                player_state.append(state_num)
                id_data.append(text[5:].strip('').replace('|',','))
            elif re.search(number, state_num):
                name_data.append(text[4:].strip().replace('|',','))
                player_number.append(state_num)
  • Because i still need to process more information and because some cells are misformed with extra data, i need to run another forloop to reach a more detailed cleaning, I am also going to create some list of columns to store these values.

In [13]:
player_names = []
rounds = []
uscf_id = []
ratings = []
total_points = []

In [14]:
for item in name_data:
    total_points.append(float(item[33:36]))
    player_names.append(item[:28].split())
    rounds.append(item[39:].replace(' ', '' ).strip(',').split(','))
  • I want to set the max number of rows to 70 for DataFrames because there are a total of 64 I.D. and if there are more than set ammount it can be a sign that i've made a mistake.

In [15]:
pd.options.display.max_rows = 70

In [16]:
games_df = pd.DataFrame(rounds, columns = ['Round1','Round2','Round3','Round4','Round5','Round6','Round7'], 
                   index = [player for player in player_number])
  • I will de doing the same with the alternate rows and cleaning it in the process.

In [17]:
for item in id_data:
    uscf_id.append(item.replace('R', '').replace('P', ' ')[:8])
    ratings.append(item.replace('R', '').replace('P', ' ')[11:16])
  • Here I want to add the columns that the DataFrame needs so i do index and list comprehension for each particular one.

In [18]:
games_df['Ratings'] = [rate for rate in ratings]

In [19]:
games_df['Total_Points'] = [n for n in total_points]

In [20]:
games_df['Player_ID'] = [n for n in player_number]

In [21]:
games_df['State'] = [state for state in player_state]

In [22]:
games_df['Player_Names'] = [player for player in player_names]
  • The following is the use of indexing and mapping of cells containting letters and the process of removing them using map function and lambdas.

In [23]:
games_df['Round1'] = games_df['Round1'].map(lambda x: x.lstrip('W')).map(lambda x: x.lstrip('U'))
games_df['Round1'] = games_df['Round1'].map(lambda x: x.lstrip('D')).map(lambda x: x.lstrip('L'))
games_df['Round1'] = games_df['Round1'].map(lambda x: x.lstrip('H')).map(lambda x: x.lstrip('B'))
games_df['Round1'] = games_df['Round1'].map(lambda x: x.lstrip('X'))

In [24]:
games_df['Round2'] = games_df['Round2'].map(lambda x: x.lstrip('W')).map(lambda x: x.lstrip('U'))
games_df['Round2'] = games_df['Round2'].map(lambda x: x.lstrip('D')).map(lambda x: x.lstrip('L'))
games_df['Round2'] = games_df['Round2'].map(lambda x: x.lstrip('H')).map(lambda x: x.lstrip('B'))
games_df['Round2'] = games_df['Round2'].map(lambda x: x.lstrip('X'))

In [25]:
games_df['Round3'] = games_df['Round3'].map(lambda x: x.lstrip('W')).map(lambda x: x.lstrip('U'))
games_df['Round3'] = games_df['Round3'].map(lambda x: x.lstrip('D')).map(lambda x: x.lstrip('L'))
games_df['Round3'] = games_df['Round3'].map(lambda x: x.lstrip('H')).map(lambda x: x.lstrip('B'))
games_df['Round3'] = games_df['Round3'].map(lambda x: x.lstrip('X'))

In [26]:
games_df['Round4'] = games_df['Round4'].map(lambda x: x.lstrip('W')).map(lambda x: x.lstrip('U'))
games_df['Round4'] = games_df['Round4'].map(lambda x: x.lstrip('D')).map(lambda x: x.lstrip('L'))
games_df['Round4'] = games_df['Round4'].map(lambda x: x.lstrip('H')).map(lambda x: x.lstrip('B'))
games_df['Round4'] = games_df['Round4'].map(lambda x: x.lstrip('X'))

In [27]:
games_df['Round5'] = games_df['Round5'].map(lambda x: x.lstrip('W')).map(lambda x: x.lstrip('U'))
games_df['Round5'] = games_df['Round5'].map(lambda x: x.lstrip('D')).map(lambda x: x.lstrip('L'))
games_df['Round5'] = games_df['Round5'].map(lambda x: x.lstrip('H')).map(lambda x: x.lstrip('B'))
games_df['Round5'] = games_df['Round5'].map(lambda x: x.lstrip('X'))

In [28]:
games_df['Round6'] = games_df['Round6'].map(lambda x: x.lstrip('W')).map(lambda x: x.lstrip('U'))
games_df['Round6'] = games_df['Round6'].map(lambda x: x.lstrip('D')).map(lambda x: x.lstrip('L'))
games_df['Round6'] = games_df['Round6'].map(lambda x: x.lstrip('H')).map(lambda x: x.lstrip('B'))
games_df['Round6'] = games_df['Round6'].map(lambda x: x.lstrip('X'))

In [29]:
games_df['Round7'] = games_df['Round7'].map(lambda x: x.lstrip('W')).map(lambda x: x.lstrip('U'))
games_df['Round7'] = games_df['Round7'].map(lambda x: x.lstrip('D')).map(lambda x: x.lstrip('L'))
games_df['Round7'] = games_df['Round7'].map(lambda x: x.lstrip('H')).map(lambda x: x.lstrip('B'))
games_df['Round7'] = games_df['Round7'].map(lambda x: x.lstrip('X'))
  • I want to replace the empty string values with get.nan but because further along this notebook, i was presented with a ValueError i've returned and set them to 0. (Not sure if this is going to affect the data much), although some rows contain a number of unplayed games, and it may contribute to some difference in players with unplayed matches.

In [30]:
games_df['Round1'].replace('', 0, inplace=True)

In [31]:
games_df['Round2'].replace('', 0, inplace=True)

In [32]:
games_df['Round3'].replace('', 0, inplace=True)

In [33]:
games_df['Round4'].replace('', 0, inplace=True)

In [34]:
games_df['Round5'].replace('', 0, inplace=True)

In [35]:
games_df['Round6'].replace('', 0, inplace=True)

In [36]:
games_df['Round7'].replace('', 0, inplace=True)
  • Printing the DataFrame constantly throughout helps to visualize the changes and not make unnecessary changes to the object.

In [37]:
games_df


Out[37]:
Round1 Round2 Round3 Round4 Round5 Round6 Round7 Ratings Total_Points Player_ID State Player_Names
1 39 21 18 14 7 12 4 1794 6.0 1 ON [GARY, HUA]
2 63 58 4 17 16 20 7 1553 6.0 2 MI [DAKSHESH, DARURI]
3 8 61 25 21 11 13 12 1384 6.0 3 MI [ADITYA, BAJAJ]
4 23 28 2 26 5 19 1 1716 5.5 4 MI [PATRICK, H, SCHILLING]
5 45 37 12 13 4 14 17 1655 5.5 5 MI [HANSHI, ZUO]
6 34 29 11 35 10 27 21 1686 5.0 6 OH [HANSEN, SONG]
7 57 46 13 11 1 9 2 1649 5.0 7 MI [GARY, DEE, SWATHELL]
8 3 32 14 9 47 28 19 1641 5.0 8 MI [EZEKIEL, HOUGHTON]
9 25 18 59 8 26 7 20 1411 5.0 9 ON [STEFANO, LEE]
10 16 19 55 31 6 25 18 1365 5.0 10 MI [ANVIT, RAO]
11 38 56 6 7 3 34 26 1712 4.5 11 MI [CAMERON, WILLIAM, MC, LEMAN]
12 42 33 5 38 0 1 3 1663 4.5 12 MI [KENNETH, J, TACK]
13 36 27 7 5 33 3 32 1666 4.5 13 MI [TORRANCE, HENRY, JR]
14 54 44 8 1 27 5 31 1610 4.5 14 MI [BRADLEY, SHAW]
15 19 16 30 22 54 33 38 1220 4.5 15 MI [ZACHARY, JAMES, HOUGHTON]
16 10 15 0 39 2 36 0 1604 4.0 16 MI [MIKE, NIKITIN]
17 48 41 26 2 23 22 5 1629 4.0 17 MI [RONALD, GRZEGORCZYK]
18 47 9 1 32 19 38 10 1600 4.0 18 MI [DAVID, SUNDEEN]
19 15 10 52 28 18 4 8 1564 4.0 19 MI [DIPANKAR, ROY]
20 40 49 23 41 28 2 9 1595 4.0 20 MI [JASON, ZHENG]
21 43 1 47 3 40 39 6 1563 4.0 21 ON [DINH, DANG, BUI]
22 64 52 28 15 0 17 40 1555 4.0 22 MI [EUGENE, L, MCCLURE]
23 4 43 20 58 17 37 46 1363 4.0 23 ON [ALAN, BUI]
24 28 47 43 25 60 44 39 1229 4.0 24 MI [MICHAEL, R, ALDRICH]
25 9 53 3 24 34 10 47 1745 3.5 25 MI [LOREN, SCHWIEBERT]
26 49 40 17 4 9 32 11 1579 3.5 26 ON [MAX, ZHU]
27 51 13 46 37 14 6 0 1552 3.5 27 MI [GAURAV, GIDWANI]
28 24 4 22 19 20 8 36 1507 0.5 28 MI [SOFIA, ADINA, STANESCUBELLU]
29 50 6 38 34 52 48 0 1602 3.5 29 MI [CHIEDOZIE, OKORIE]
30 52 64 15 55 31 61 50 1522 3.5 30 ON [GEORGE, AVERY, JONES]
31 58 55 64 10 30 50 14 1494 3.5 31 MI [RISHI, SHETTY]
32 61 8 44 18 51 26 13 1441 3.5 32 ON [JOSHUA, PHILIP, MATHEWS]
33 60 12 50 36 13 15 51 1449 3.5 33 MI [JADE, GE]
34 6 60 37 29 25 11 52 1399 3.5 34 MI [MICHAEL, JEFFERY, THOMAS]
35 46 38 56 6 57 52 48 1438 3.5 35 MI [JOSHUA, DAVID, LEE]
36 13 57 51 33 0 16 28 1355 3.5 36 MI [SIDDHARTH, JHA]
37 0 5 34 27 0 23 61 980 3.5 37 MI [AMIYATOSH, PWNANANDAM]
38 11 35 29 12 0 18 15 1423 3.0 38 MI [BRIAN, LIU]
39 1 54 40 16 44 21 24 1436 3.0 39 MI [JOEL, R, HENDON]
40 20 26 39 59 21 56 22 1348 3.0 40 MI [FOREST, ZHANG]
41 59 17 58 20 0 0 0 1403 3.0 41 MI [KYLE, WILLIAM, MURPHY]
42 12 50 57 60 61 64 56 1332 3.0 42 MI [JARED, GE]
43 21 23 24 63 59 46 55 1283 3.0 43 MI [ROBERT, GLEN, VASEY]
44 0 14 32 53 39 24 59 1199 3.0 44 MI [JUSTIN, D, SCHILLING]
45 5 51 60 56 63 55 58 1242 3.0 45 MI [DEREK, YAN]
46 35 7 27 50 64 43 23 377 3.0 46 MI [JACOB, ALEXANDER, LAVALLEY]
47 18 24 21 61 8 51 25 1362 2.5 47 MI [ERIC, WRIGHT]
48 17 63 0 52 0 29 35 1382 2.5 48 MI [DANIEL, KHAIN]
49 26 20 63 64 58 0 0 1291 2.5 49 MI [MICHAEL, J, MARTIN]
50 29 42 33 46 0 31 30 1056 2.5 50 MI [SHIVAM, JHA]
51 27 45 36 57 32 47 33 1011 2.5 51 MI [TEJAS, AYYAGARI]
52 30 22 19 48 29 35 34 935 2.5 52 MI [ETHAN, GUO]
53 0 25 0 44 0 57 0 1393 2.0 53 MI [JOSE, C, YBARRA]
54 14 39 61 0 15 59 64 1270 2.0 54 MI [LARRY, HODGE]
55 62 31 10 30 0 45 43 1186 2.0 55 MI [ALEX, KONG]
56 0 11 35 45 0 40 42 1153 2.0 56 MI [MARISA, RICCI]
57 7 36 42 51 35 53 0 1092 2.0 57 MI [MICHAEL, LU]
58 31 2 41 23 49 0 45 917 2.0 58 MI [VIRAJ, MOHILE]
59 41 0 9 40 43 54 44 853 2.0 59 MI [SEAN, M, MC, CORMICK]
60 33 34 45 42 24 0 0 967 1.5 60 MI [JULIA, SHEN]
61 32 3 54 47 42 30 37 955 1.5 61 ON [JEZZEL, FARKAS]
62 55 0 0 0 0 0 0 1530 1.0 62 MI [ASHWIN, BALAJI]
63 2 48 49 43 45 0 0 1175 1.0 63 MI [THOMAS, JOSEPH, HOSMER]
64 22 30 31 49 46 42 54 1163 1.0 64 MI [BEN, LI]
  • I have indexed to the Player Names to compare the data better to the original table.

In [38]:
games_df.set_index(['Player_Names'])


Out[38]:
Round1 Round2 Round3 Round4 Round5 Round6 Round7 Ratings Total_Points Player_ID State
Player_Names
[GARY, HUA] 39 21 18 14 7 12 4 1794 6.0 1 ON
[DAKSHESH, DARURI] 63 58 4 17 16 20 7 1553 6.0 2 MI
[ADITYA, BAJAJ] 8 61 25 21 11 13 12 1384 6.0 3 MI
[PATRICK, H, SCHILLING] 23 28 2 26 5 19 1 1716 5.5 4 MI
[HANSHI, ZUO] 45 37 12 13 4 14 17 1655 5.5 5 MI
[HANSEN, SONG] 34 29 11 35 10 27 21 1686 5.0 6 OH
[GARY, DEE, SWATHELL] 57 46 13 11 1 9 2 1649 5.0 7 MI
[EZEKIEL, HOUGHTON] 3 32 14 9 47 28 19 1641 5.0 8 MI
[STEFANO, LEE] 25 18 59 8 26 7 20 1411 5.0 9 ON
[ANVIT, RAO] 16 19 55 31 6 25 18 1365 5.0 10 MI
[CAMERON, WILLIAM, MC, LEMAN] 38 56 6 7 3 34 26 1712 4.5 11 MI
[KENNETH, J, TACK] 42 33 5 38 0 1 3 1663 4.5 12 MI
[TORRANCE, HENRY, JR] 36 27 7 5 33 3 32 1666 4.5 13 MI
[BRADLEY, SHAW] 54 44 8 1 27 5 31 1610 4.5 14 MI
[ZACHARY, JAMES, HOUGHTON] 19 16 30 22 54 33 38 1220 4.5 15 MI
[MIKE, NIKITIN] 10 15 0 39 2 36 0 1604 4.0 16 MI
[RONALD, GRZEGORCZYK] 48 41 26 2 23 22 5 1629 4.0 17 MI
[DAVID, SUNDEEN] 47 9 1 32 19 38 10 1600 4.0 18 MI
[DIPANKAR, ROY] 15 10 52 28 18 4 8 1564 4.0 19 MI
[JASON, ZHENG] 40 49 23 41 28 2 9 1595 4.0 20 MI
[DINH, DANG, BUI] 43 1 47 3 40 39 6 1563 4.0 21 ON
[EUGENE, L, MCCLURE] 64 52 28 15 0 17 40 1555 4.0 22 MI
[ALAN, BUI] 4 43 20 58 17 37 46 1363 4.0 23 ON
[MICHAEL, R, ALDRICH] 28 47 43 25 60 44 39 1229 4.0 24 MI
[LOREN, SCHWIEBERT] 9 53 3 24 34 10 47 1745 3.5 25 MI
[MAX, ZHU] 49 40 17 4 9 32 11 1579 3.5 26 ON
[GAURAV, GIDWANI] 51 13 46 37 14 6 0 1552 3.5 27 MI
[SOFIA, ADINA, STANESCUBELLU] 24 4 22 19 20 8 36 1507 0.5 28 MI
[CHIEDOZIE, OKORIE] 50 6 38 34 52 48 0 1602 3.5 29 MI
[GEORGE, AVERY, JONES] 52 64 15 55 31 61 50 1522 3.5 30 ON
[RISHI, SHETTY] 58 55 64 10 30 50 14 1494 3.5 31 MI
[JOSHUA, PHILIP, MATHEWS] 61 8 44 18 51 26 13 1441 3.5 32 ON
[JADE, GE] 60 12 50 36 13 15 51 1449 3.5 33 MI
[MICHAEL, JEFFERY, THOMAS] 6 60 37 29 25 11 52 1399 3.5 34 MI
[JOSHUA, DAVID, LEE] 46 38 56 6 57 52 48 1438 3.5 35 MI
[SIDDHARTH, JHA] 13 57 51 33 0 16 28 1355 3.5 36 MI
[AMIYATOSH, PWNANANDAM] 0 5 34 27 0 23 61 980 3.5 37 MI
[BRIAN, LIU] 11 35 29 12 0 18 15 1423 3.0 38 MI
[JOEL, R, HENDON] 1 54 40 16 44 21 24 1436 3.0 39 MI
[FOREST, ZHANG] 20 26 39 59 21 56 22 1348 3.0 40 MI
[KYLE, WILLIAM, MURPHY] 59 17 58 20 0 0 0 1403 3.0 41 MI
[JARED, GE] 12 50 57 60 61 64 56 1332 3.0 42 MI
[ROBERT, GLEN, VASEY] 21 23 24 63 59 46 55 1283 3.0 43 MI
[JUSTIN, D, SCHILLING] 0 14 32 53 39 24 59 1199 3.0 44 MI
[DEREK, YAN] 5 51 60 56 63 55 58 1242 3.0 45 MI
[JACOB, ALEXANDER, LAVALLEY] 35 7 27 50 64 43 23 377 3.0 46 MI
[ERIC, WRIGHT] 18 24 21 61 8 51 25 1362 2.5 47 MI
[DANIEL, KHAIN] 17 63 0 52 0 29 35 1382 2.5 48 MI
[MICHAEL, J, MARTIN] 26 20 63 64 58 0 0 1291 2.5 49 MI
[SHIVAM, JHA] 29 42 33 46 0 31 30 1056 2.5 50 MI
[TEJAS, AYYAGARI] 27 45 36 57 32 47 33 1011 2.5 51 MI
[ETHAN, GUO] 30 22 19 48 29 35 34 935 2.5 52 MI
[JOSE, C, YBARRA] 0 25 0 44 0 57 0 1393 2.0 53 MI
[LARRY, HODGE] 14 39 61 0 15 59 64 1270 2.0 54 MI
[ALEX, KONG] 62 31 10 30 0 45 43 1186 2.0 55 MI
[MARISA, RICCI] 0 11 35 45 0 40 42 1153 2.0 56 MI
[MICHAEL, LU] 7 36 42 51 35 53 0 1092 2.0 57 MI
[VIRAJ, MOHILE] 31 2 41 23 49 0 45 917 2.0 58 MI
[SEAN, M, MC, CORMICK] 41 0 9 40 43 54 44 853 2.0 59 MI
[JULIA, SHEN] 33 34 45 42 24 0 0 967 1.5 60 MI
[JEZZEL, FARKAS] 32 3 54 47 42 30 37 955 1.5 61 ON
[ASHWIN, BALAJI] 55 0 0 0 0 0 0 1530 1.0 62 MI
[THOMAS, JOSEPH, HOSMER] 2 48 49 43 45 0 0 1175 1.0 63 MI
[BEN, LI] 22 30 31 49 46 42 54 1163 1.0 64 MI
  • Below I've creating a series of the data i wish to analyze further and do computational processes on.

In [39]:
games_df.set_index(['Round1', 'Round2', 'Round3', 'Round4', 'Round5', 'Round6', 'Round7'])['Ratings']


Out[39]:
Round1  Round2  Round3  Round4  Round5  Round6  Round7
39      21      18      14      7       12      4         1794 
63      58      4       17      16      20      7         1553 
8       61      25      21      11      13      12        1384 
23      28      2       26      5       19      1         1716 
45      37      12      13      4       14      17        1655 
34      29      11      35      10      27      21        1686 
57      46      13      11      1       9       2         1649 
3       32      14      9       47      28      19        1641 
25      18      59      8       26      7       20        1411 
16      19      55      31      6       25      18        1365 
38      56      6       7       3       34      26        1712 
42      33      5       38      0       1       3         1663 
36      27      7       5       33      3       32        1666 
54      44      8       1       27      5       31        1610 
19      16      30      22      54      33      38        1220 
10      15      0       39      2       36      0         1604 
48      41      26      2       23      22      5         1629 
47      9       1       32      19      38      10        1600 
15      10      52      28      18      4       8         1564 
40      49      23      41      28      2       9         1595 
43      1       47      3       40      39      6         1563 
64      52      28      15      0       17      40        1555 
4       43      20      58      17      37      46        1363 
28      47      43      25      60      44      39        1229 
9       53      3       24      34      10      47        1745 
49      40      17      4       9       32      11        1579 
51      13      46      37      14      6       0         1552 
24      4       22      19      20      8       36        1507 
50      6       38      34      52      48      0         1602 
52      64      15      55      31      61      50        1522 
58      55      64      10      30      50      14        1494 
61      8       44      18      51      26      13        1441 
60      12      50      36      13      15      51        1449 
6       60      37      29      25      11      52        1399 
46      38      56      6       57      52      48        1438 
13      57      51      33      0       16      28        1355 
0       5       34      27      0       23      61         980 
11      35      29      12      0       18      15        1423 
1       54      40      16      44      21      24        1436 
20      26      39      59      21      56      22        1348 
59      17      58      20      0       0       0         1403 
12      50      57      60      61      64      56        1332 
21      23      24      63      59      46      55        1283 
0       14      32      53      39      24      59        1199 
5       51      60      56      63      55      58        1242 
35      7       27      50      64      43      23         377 
18      24      21      61      8       51      25        1362 
17      63      0       52      0       29      35        1382 
26      20      63      64      58      0       0         1291 
29      42      33      46      0       31      30        1056 
27      45      36      57      32      47      33        1011 
30      22      19      48      29      35      34         935 
0       25      0       44      0       57      0         1393 
14      39      61      0       15      59      64        1270 
62      31      10      30      0       45      43        1186 
0       11      35      45      0       40      42        1153 
7       36      42      51      35      53      0         1092 
31      2       41      23      49      0       45         917 
41      0       9       40      43      54      44         853 
33      34      45      42      24      0       0          967 
32      3       54      47      42      30      37         955 
55      0       0       0       0       0       0         1530 
2       48      49      43      45      0       0         1175 
22      30      31      49      46      42      54        1163 
Name: Ratings, dtype: object
  • Here I wanted to create a function that would loop through the data and return the desired result.
  • This function is indexing the cells in the columns to their respective indexed players rating and replacing them, after this replacement is made all the data is then computed and divided by the number of games played which returns a column with the players Average Pre Tournament Chess Rating of Opponents.

In [40]:
def parse_series(series):
    player_mapping = {}
    average_rating = []
    for index, row in series.iterrows():
        player_mapping[int(row['Player_ID'])] = int(row['Ratings'])
    for index, row in series.iterrows():
        
        if row[0] is not None:
            try:
                key1 = int(row.Round1)
                row.Round1 = player_mapping[key1]

                key2 = int(row.Round2)
                row.Round2 = player_mapping[key2]

                key3 = int(row.Round3)
                row.Round3 = player_mapping[key3]

                key4 = int(row.Round4)
                row.Round4 = player_mapping[key4]

                key5 = int(row.Round5)
                row.Round5 = player_mapping[key5]

                key6 = int(row.Round6)
                row.Round6 = player_mapping[key6]

                key7 = int(row.Round7)
                row.Round7 = player_mapping[key7]

                total = row.Round1 + row.Round2 + row.Round3 + row.Round4 + row.Round5 + row.Round6 + row.Round7
                games = len([row.Round1, row.Round2, row.Round3, row.Round4, row.Round5, row.Round6, row.Round7])
                average_rating.append(total/games)
            except KeyError:
                total = int(row.Round1) + int(row.Round2) + int(row.Round3) + int(row.Round4) + int(row.Round5) + int(row.Round6) + int(row.Round7)
                games = len([row.Round1, row.Round2, row.Round3, row.Round4, row.Round5, row.Round6, row.Round7])
                average_rating.append(total/games)
    return average_rating
  • Creating an object that will process the parse_series function and return its list values.

In [41]:
average_rate = parse_series(games_df)
  • Adding the column -Average_Rating- to the games_df DataFrame object.

In [42]:
games_df['Average_Rating'] = [n for n in average_rate]
  • As we can see, the Average Ratings have been computed and now applied to the original DataFrame Object, here is the top 5 rows.

In [43]:
games_df.head()


Out[43]:
Round1 Round2 Round3 Round4 Round5 Round6 Round7 Ratings Total_Points Player_ID State Player_Names Average_Rating
1 39 21 18 14 7 12 4 1794 6.0 1 ON [GARY, HUA] 1605
2 63 58 4 17 16 20 7 1553 6.0 2 MI [DAKSHESH, DARURI] 1469
3 8 61 25 21 11 13 12 1384 6.0 3 MI [ADITYA, BAJAJ] 1563
4 23 28 2 26 5 19 1 1716 5.5 4 MI [PATRICK, H, SCHILLING] 1573
5 45 37 12 13 4 14 17 1655 5.5 5 MI [HANSHI, ZUO] 1500
  • Here are te last 5 rows in the DataFrame object.

In [44]:
games_df.tail()


Out[44]:
Round1 Round2 Round3 Round4 Round5 Round6 Round7 Ratings Total_Points Player_ID State Player_Names Average_Rating
60 33 34 45 42 24 0 0 967 1.5 60 MI [JULIA, SHEN] 950
61 32 3 54 47 42 30 37 955 1.5 61 ON [JEZZEL, FARKAS] 1327
62 55 0 0 0 0 0 0 1530 1.0 62 MI [ASHWIN, BALAJI] 169
63 2 48 49 43 45 0 0 1175 1.0 63 MI [THOMAS, JOSEPH, HOSMER] 964
64 22 30 31 49 46 42 54 1163 1.0 64 MI [BEN, LI] 1263
  • Now since all computations are done, and the data that we want is in the DataFrame, we can create and send to CSV File using the to_csv method.

In [45]:
games_df.to_csv('Project_2_results', sep=',', encoding='utf-8')