Setup MySQL Database for Tennis Predictions

Author: Carl Toews
Date: September 30, 2017
File: setup_mysql_database.ipynb

Description:
This notebook provides Python code to create a MySQL database from CSV files with data about professional tennis matches. The data comes from two sources: basic player and outcome information is from https://github.com/JeffSackmann/tennis_atp, while odds data are from http://www.tennis-data.co.uk/alldata. The code in this notebook should be run before running the learning algorithms in the notebook tennis_predictions, which assumes that the MySQL database has already been created.

Details:

Path and file names can be configured in the first cell below. As it stands, this notebook assumes that file names and directory structures are as reflected in this github site.

The CSV files within Sackmann dataset are as follows:

*  "atp_matches_xxxx.csv" (xxxx represents a year between 1968 and 2017)
*  "atp_matches_futures_xxxx.csv" (xxxx is a year between 1991 and 2017)
*  "atp_matches_qual_chall_xxxx.csv" (xxxx is a year between 1991 and 2017)

There is also a single CSV file with player info:

* "atp_players.csv"

and week-by-week rankings by decade, stored in files with names as follows:

*  "atp_rankings_xxs.csv" (xx represents a decade--70,80,90,00, or 10)

The rankings of the current decade are stored in

*  "atp_rankings_current.csv"

The CSV files within the directory "odds_data" are of the form

* "xxxx.csv" (xxxx a year between 2001 and 2017)

The notebook also assumes a MySQL database named "tennis" exists, and there there is a user called "testuser" with password "test623" who can create tables in this database. Again, username and password can be configured below.


Configure names

Pathnames, database names, usernames, and passwords configurable here:


In [116]:
# name of database
db_name = "tennis"
# name of db user 
username = "testuser"
# db password for db user
password = "test623"
# location of atp data files
atpfile_directory = "./data/tennis_atp-master/"
# location of odds data files
oddsfiles_directory = "./data/odds_data/"

Import packages

All file handing and mysql connection import commands are contained here.


In [117]:
import sqlalchemy # pandas-mysql interface library
import sqlalchemy.exc # exception handling
from   sqlalchemy import create_engine  # needed to define db interface
import glob # for file manipulation
import sys # for defining behavior under errors
from IPython.core.debugger import Tracer
import pdb

Establish database connection

Try to connect to the local mysql host. If successful, print out the MySQL version number, if unsuccessful, exit gracefully.


In [118]:
# create an engine for interacting with the MySQL database
try:
    eng_str = 'mysql+mysqldb://' + username + ':' + password + '@localhost/'
    engine = create_engine(eng_str)
    connection = engine.connect()
    version = connection.execute("SELECT VERSION()")
    print("Database version : ")
    print(version.fetchone())
    
# report what went wrong if this fails.    
except sqlalchemy.exc.DatabaseError as e:
    reason = e.message
    print("Error %s:" % (reason))
    sys.exit(1)

# close the connection
finally:            
    if connection:    
        connection.close()
    else:
        print("Failed to create connection.")


Database version : 
('5.7.16',)

Create "tennis" database if it doesn't already exist


In [4]:
# create an engine for interacting with the MySQL database

try:
    engine.execute("CREATE DATABASE IF NOT EXISTS " + db_name) #create db
    engine.execute("USE " + db_name) # select new db

    # report what went wrong if this fails.    
except sqlalchemy.exc.DatabaseError as e:
    reason = e.message
    print("Error %s:" % (reason))
    sys.exit(1)


/Users/ctoews/anaconda/envs/py3_6_2/lib/python3.6/site-packages/sqlalchemy/engine/default.py:470: Warning: (1007, "Can't create database 'tennis'; database exists")
  cursor.execute(statement, parameters)

Make table "matches"

This cell first checks to see if the table "matches" already exists. If it does, the code does nothing. If it does not, the code creates it, and then reads in the data from the relevant .csv files, the location of which is specified above. Note that there are three different match types: regular matches, qualifying matches, and futures matches. By default, all three get read into the table, though you can edit this behavior in the opening lines. by default, all three types of matches will be read into the table "matches"


In [101]:
table_name = "matches"
db_name = "tennis"
engine.execute("USE " + db_name + ";")

#all_match_types = ['atp_matches','atp_matches_qual_chall','atp_matches_futures']
all_match_types = ['atp_matches'] # focus for the moment on major tournaments

# start a transaction with the database 'tennis'
with engine.begin() as connection:
    query=("""SELECT count(*) FROM information_schema.TABLES WHERE 
    (TABLE_SCHEMA = '""" + db_name + """') AND (TABLE_NAME = '""" + 
    table_name + """');""")
    
    res = connection.execute(query)
    if (res.fetchone()[0]!=0):
        print("Table " + table_name + " already exists, skipping table creation.")
    else:
        connection.execute("CREATE TABLE matches (tourney_id VARCHAR(256));")
        connection.execute("ALTER TABLE matches ADD COLUMN tourney_name VARCHAR(256);")
        connection.execute("ALTER TABLE matches ADD COLUMN surface VARCHAR(256);")
        connection.execute("ALTER TABLE matches ADD COLUMN draw_size TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN tourney_level CHAR(1);")
        connection.execute("ALTER TABLE matches ADD COLUMN tourney_date DATE;")
        connection.execute("ALTER TABLE matches ADD COLUMN match_num SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_id MEDIUMINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_seed TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_entry VARCHAR(2);")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_name VARCHAR(256);")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_hand CHAR(1);")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_ht TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_ioc CHAR(3);")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_age DECIMAL(12,10);")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_rank SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN winner_rank_points SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN loser_id MEDIUMINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN loser_seed TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN loser_entry VARCHAR(2);")
        connection.execute("ALTER TABLE matches ADD COLUMN loser_name VARCHAR(256);") 
        connection.execute("ALTER TABLE matches ADD COLUMN loser_hand CHAR(1);")
        connection.execute("ALTER TABLE matches ADD COLUMN loser_ht TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN loser_ioc CHAR(3);")
        connection.execute("ALTER TABLE matches ADD COLUMN loser_age DECIMAL(12,10);")
        connection.execute("ALTER TABLE matches ADD COLUMN loser_rank SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN loser_rank_points SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN score VARCHAR(256);")
        connection.execute("ALTER TABLE matches ADD COLUMN best_of TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN round VARCHAR(4);")
        connection.execute("ALTER TABLE matches ADD COLUMN minutes SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN w_ace TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN w_df TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN w_svpt SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN w_1stIn SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN w_1stWon SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN w_2ndWon TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN w_SvGms TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN w_bpSaved TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN w_bpFaced TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN l_ace TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN l_df TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN l_svpt SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN l_1stIn SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN l_1stWon SMALLINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN l_2ndWon TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN l_SvGms TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN l_bpSaved TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN l_bpFaced TINYINT UNSIGNED;")
        connection.execute("ALTER TABLE matches ADD COLUMN match_type VARCHAR(256);")

        # loop through all three match types, and store data for each       
        for match_type in all_match_types:
            
            # store names of all .csv files for this match-type
            all_files = glob.glob(atpfile_directory + match_type + "_[0-9]*.csv")

            #all_files = ["./data/tennis_atp-master/atp_matches_2017.csv"]
            # load each file, and make sure missing data is stored as NULL
            for file in all_files:
                print(file)
                query = """LOAD DATA LOCAL INFILE '""" + file + \
                """' INTO TABLE matches \
                COLUMNS TERMINATED BY ',' \
                LINES TERMINATED BY '\n' \
                IGNORE 1 LINES \
                (@v0, @v1, @v2, @v3, @v4, @v5, @v6, @v7, @v8, @v9, @v10, @v11, @v12, @v13, @v14, @v15, @v16, @v17, @v18, @v19, @v20, @v21, @v22, @v23, @v24, @v25, @v26, @v27, @v28, @v29, @v30, @v31, @v32, @v33, @v34, @v35, @v36, @v37, @v38, @v39, @v40, @v41, @v42, @v43, @v44, @v45, @v46, @v47, @v48) \
                SET \
                tourney_id = nullif(@v0,''), \
                tourney_name = nullif(@v1,''), \
                surface = nullif(@v2,''), \
                draw_size = nullif(@v3,''), \
                tourney_level = nullif(@v4,''), \
                tourney_date = nullif(@v5,''), \
                match_num = nullif(@v6,''), \
                winner_id = nullif(@v7,''), \
                winner_seed = nullif(@v8,''), \
                winner_entry = nullif(@v9,''), \
                winner_name = nullif(@v10,''), \
                winner_hand = nullif(@v11,''), \
                winner_ht = nullif(@v12,''), \
                winner_ioc = nullif(@v13,''), \
                winner_age = nullif(@v14,''), \
                winner_rank = nullif(@v15,''), \
                winner_rank_points = nullif(@v16,''), \
                loser_id = nullif(@v17,''), \
                loser_seed = nullif(@v18,''), \
                loser_entry = nullif(@v19,''), \
                loser_name = nullif(@v20,''), \
                loser_hand = nullif(@v21,''), \
                loser_ht = nullif(@v22,''), \
                loser_ioc = nullif(@v23,''), \
                loser_age = nullif(@v24,''), \
                loser_rank = nullif(@v25,''), \
                loser_rank_points = nullif(@v26,''), \
                score = nullif(@v27,''), \
                best_of = nullif(@v28,''), \
                round = nullif(@v29,''), \
                minutes = nullif(@v30,''), \
                w_ace = nullif(@v31,''), \
                w_df = nullif(@v32,''), \
                w_svpt = nullif(@v33,''), \
                w_1stIn = nullif(@v34,''), \
                w_1stWon = nullif(@v35,''), \
                w_2ndWon = nullif(@v36,''), \
                w_SvGms = nullif(@v37,''), \
                w_bpSaved = nullif(@v38,''), \
                w_bpFaced = nullif(@v39,''), \
                l_ace = nullif(@v40,''), \
                l_df = nullif(@v41,''), \
                l_svpt = nullif(@v42,''), \
                l_1stIn = nullif(@v43,''), \
                l_1stWon = nullif(@v44,''), \
                l_2ndWon = nullif(@v45,''), \
                l_SvGms = nullif(@v46,''), \
                l_bpSaved = nullif(@v47,''), \
                l_bpFaced = nullif(@v48,''), \
                match_type = '""" + match_type + """' \
                ;"""

                connection.execute(query)


./data/tennis_atp-master/atp_matches_1968.csv
./data/tennis_atp-master/atp_matches_1969.csv
./data/tennis_atp-master/atp_matches_1970.csv
./data/tennis_atp-master/atp_matches_1971.csv
./data/tennis_atp-master/atp_matches_1972.csv
./data/tennis_atp-master/atp_matches_1973.csv
./data/tennis_atp-master/atp_matches_1974.csv
./data/tennis_atp-master/atp_matches_1975.csv
./data/tennis_atp-master/atp_matches_1976.csv
./data/tennis_atp-master/atp_matches_1977.csv
./data/tennis_atp-master/atp_matches_1978.csv
./data/tennis_atp-master/atp_matches_1979.csv
./data/tennis_atp-master/atp_matches_1980.csv
./data/tennis_atp-master/atp_matches_1981.csv
./data/tennis_atp-master/atp_matches_1982.csv
./data/tennis_atp-master/atp_matches_1983.csv
./data/tennis_atp-master/atp_matches_1984.csv
./data/tennis_atp-master/atp_matches_1985.csv
./data/tennis_atp-master/atp_matches_1986.csv
./data/tennis_atp-master/atp_matches_1987.csv
./data/tennis_atp-master/atp_matches_1988.csv
./data/tennis_atp-master/atp_matches_1989.csv
./data/tennis_atp-master/atp_matches_1990.csv
./data/tennis_atp-master/atp_matches_1991.csv
./data/tennis_atp-master/atp_matches_1992.csv
./data/tennis_atp-master/atp_matches_1993.csv
./data/tennis_atp-master/atp_matches_1994.csv
./data/tennis_atp-master/atp_matches_1995.csv
./data/tennis_atp-master/atp_matches_1996.csv
./data/tennis_atp-master/atp_matches_1997.csv
./data/tennis_atp-master/atp_matches_1998.csv
./data/tennis_atp-master/atp_matches_1999.csv
./data/tennis_atp-master/atp_matches_2000.csv
./data/tennis_atp-master/atp_matches_2001.csv
./data/tennis_atp-master/atp_matches_2002.csv
./data/tennis_atp-master/atp_matches_2003.csv
./data/tennis_atp-master/atp_matches_2004.csv
./data/tennis_atp-master/atp_matches_2005.csv
./data/tennis_atp-master/atp_matches_2006.csv
./data/tennis_atp-master/atp_matches_2007.csv
./data/tennis_atp-master/atp_matches_2008.csv
./data/tennis_atp-master/atp_matches_2009.csv
./data/tennis_atp-master/atp_matches_2010.csv
./data/tennis_atp-master/atp_matches_2011.csv
./data/tennis_atp-master/atp_matches_2012.csv
./data/tennis_atp-master/atp_matches_2013.csv
./data/tennis_atp-master/atp_matches_2014.csv
./data/tennis_atp-master/atp_matches_2015.csv
./data/tennis_atp-master/atp_matches_2016.csv
./data/tennis_atp-master/atp_matches_2017.csv

Make table "players"

This cell checks to see if the table "players" exists. If it does, the code does nothing, if it does not, the code creates it and populates it with the data in the files "players.csv".


In [6]:
# start a transaction with the "tennis" database
with engine.begin() as connection:

    # find out if the table "players" exists
    query="""\
    SELECT count(*) FROM information_schema.TABLES WHERE (TABLE_SCHEMA = '""" + db_name + """') AND (TABLE_NAME = 'players');\
    """
    res = connection.execute(query)

    # if "players does not exist, create it
    if (res.fetchone()[0]!=0):
        print("Table PLAYERS already exists, skipping table creation.")        
    else:
        connection.execute("CREATE TABLE players \
                     (player_id MEDIUMINT UNSIGNED, \
                     first_name VARCHAR(256), \
                     last_name VARCHAR(256), \
                     hand CHAR(1), \
                     birth_date DATE, \
                     country_code CHAR(3)) \
                     ;") 
        
        # read in the appropriate .csv files
        file = atpfile_directory + "atp_players.csv"
        query =  """LOAD DATA LOCAL INFILE '""" + file + """' INTO TABLE players \
                        COLUMNS TERMINATED BY ',' \
                        LINES TERMINATED BY '\n' \
                        (@v0, @v1, @v2, @v3, @v4, @v5) \
                        SET \
                        player_id = nullif(@v0,''), \
                        first_name = nullif(@v1,''), \
                        last_name = nullif(@v2,''), \
                        hand = nullif(@v3,''), \
                        birth_date = nullif(@v4,''), \
                        country_code = nullif(@v5,'')
                        ;"""
        connection.execute(query)


Table PLAYERS already exists, skipping table creation.

Make table "rankings"

This cell checks to see if the table "rankings" exists. If it does, the code does nothing, if it does not, the code creates it and populates it with the data in the relevant .csv files (see doc-string above for naming conventions.)


In [7]:
with engine.begin() as connection:
    query="""\
    SELECT count(*) FROM information_schema.TABLES WHERE (TABLE_SCHEMA = '""" + db_name + """') AND (TABLE_NAME = 'rankings');\
    """
    r1 = connection.execute(query)
    if (r1.fetchone()[0]!=0):
        print("Table RANKINGS already exists, skipping table creation.")
    else:
        connection.execute("CREATE TABLE rankings \
                     (ranking_date DATE, \
                     ranking SMALLINT UNSIGNED, \
                     player_id MEDIUMINT UNSIGNED, \
                     ranking_points SMALLINT UNSIGNED) \
                     ;") 
        # store in a list the names of all the relevant .csv files
        all_files = glob.glob("/Users/ctoews/Projects/Data_Science/Tennis/tennis_atp-master/atp_rankings_*.csv")

        # load each file, and make sure missing data is stored as NULL
        for file in all_files:
            query =  """LOAD DATA LOCAL INFILE '""" + file + """' INTO TABLE rankings \
                        COLUMNS TERMINATED BY ',' \
                        LINES TERMINATED BY '\n' \
                        (@v0, @v1, @v2, @v3, @v4) \
                        SET
                        ranking_date = nullif(@v0,''), \
                        ranking = nullif(@v1,''), \
                        player_id = nullif(@v2,''), \
                        ranking_points = nullif(@v3,'') \
                        ;"""
            connection.execute(query)


Table RANKINGS already exists, skipping table creation.

Make table "odds"

This cell checks to see if the table "odds" exists. If it does, the code does nothing, if it does not, the code creates it and populates it with the data in the relevant .csv files (see doc-string above for naming conventions.)


In [120]:
table_name = "odds"
db_name = "tennis"
engine.execute("USE " + db_name + ";")

with engine.begin() as connection:
    query=("""SELECT count(*) FROM information_schema.TABLES WHERE 
    (TABLE_SCHEMA = '""" + db_name + """') AND (TABLE_NAME = '""" + 
    table_name + """');""")
    
    res = connection.execute(query)
    if (res.fetchone()[0]!=0):
        print("Table " + table_name + " already exists, skipping table creation.")
    else:
        query=("""CREATE TABLE """ + table_name + """ 
        (ATP TINYINT UNSIGNED, 
         Location VARCHAR(256), 
         Tournament	VARCHAR(256),
         Date VARCHAR(256),
         Series VARCHAR(256),
         Court VARCHAR(256),
         Surface VARCHAR(256), 
         Round VARCHAR(256), 
         BestOf TINYINT UNSIGNED, 
         Winner VARCHAR(256), 
         Loser VARCHAR(256), 
         WRank SMALLINT UNSIGNED, 
         LRank SMALLINT UNSIGNED, 
         WPts SMALLINT UNSIGNED, 
         LPts SMALLINT UNSIGNED, 
         W1 TINYINT UNSIGNED, 
         L1 TINYINT UNSIGNED, 
         W2 TINYINT UNSIGNED, 
         L2 TINYINT UNSIGNED, 
         W3 TINYINT UNSIGNED,
         L3 TINYINT UNSIGNED, 
         W4 TINYINT UNSIGNED, 
         L4 TINYINT UNSIGNED, 
         W5 TINYINT UNSIGNED, 
         L5 TINYINT UNSIGNED, 
         Wsets TINYINT UNSIGNED, 
         Lsets TINYINT UNSIGNED, 
         Comment VARCHAR(256), 
         B365W DECIMAL(6,3),
         B365L DECIMAL(6,3),
         EXW DECIMAL(6,3),
         EXL DECIMAL(6,3),
         LBW DECIMAL(6,3),
         LBL DECIMAL(6,3),
         PSW DECIMAL(6,3),
         PSL DECIMAL(6,3),
         SJW DECIMAL(6,3),
         SJL DECIMAL(6,3),
         MaxW DECIMAL(6,3),
         MaxL DECIMAL(6,3),
         AvgW DECIMAL(6,3),
         AvgL DECIMAL(6,3),
         fname CHAR(4)) 
         ;""")
        connection.execute(query) 

        # store in a list the names of all the relevant .csv files
        all_files = glob.glob(oddsfiles_directory + "201[0-7].csv")
        
        # load each file, and make sure missing data is stored as NULL
        for file in all_files:
            fname = file.split('/')[-1].split('.')[0]
            #pdb.set_trace()
            print("using file ", file)
            query =  ("""LOAD DATA LOCAL INFILE '""" + file + """' INTO TABLE """ + table_name + 
            """ COLUMNS TERMINATED BY ',' 
            LINES TERMINATED BY '\r\n' 
            IGNORE 1 LINES 
            (@v0, @v1, @v2, @v3, @v4, @v5, @v6, @v7, @v8, @v9, @v10, @v11, @v12, @v13, @v14, @v15, @v16, @v17, @v18, @v19, @v20, @v21, @v22, @v23, @v24, @v25, @v26, @v27, @v28, @v29, @v30, @v31, @v32, @v33, @v34, @v35, @v36, @v37, @v38, @v39, @v40, @v41) 
            SET
            ATP = nullif(@v0,''),
            Location = nullif(@v1,''), 
            Tournament = nullif(@v2,''),
            Date = nullif(@v3,''),
            Series = nullif(@v4,''),
            Court = nullif(@v5,''),
            Surface = nullif(@v6,''), 
            Round = nullif(@v7,''), 
            BestOf = nullif(@v8,''), 
            Winner = nullif(@v9,''), 
            Loser = nullif(@v10,''),
            WRank = nullif(@v11,''),
            LRank = nullif(@v12,''), 
            WPts = nullif(@v13,''),
            LPts = nullif(@v14,''), 
            W1 = nullif(@v15,''),
            L1 = nullif(@v16,''), 
            W2 = nullif(@v17,''),
            L2 = nullif(@v18,''),
            W3 = nullif(@v19,''),
            L3 = nullif(@v20,''),
            W4 = nullif(@v21,''),
            L4 = nullif(@v22,''), 
            W5 = nullif(@v23,''),
            L5 = nullif(@v24,''),
            Wsets = nullif(@v25,''),
            Lsets = nullif(@v26,''),
            Comment = nullif(@v27,''),
            B365W = nullif(@v28,''),
            B365L = nullif(@v29,''),
            EXW = nullif(@v30,''),
            EXL = nullif(@v31,''),
            LBW = nullif(@v32,''),
            LBL = nullif(@v33,''),
            PSW = nullif(@v34,''),
            PSL = nullif(@v35,''),
            SJW = nullif(@v36,''),
            SJL = nullif(@v37,''),
            MaxW = nullif(@v38,''),
            MaxL = nullif(@v39,''),
            AvgW = nullif(@v40,''),
            AvgL = nullif(@v41,''),
            fname = '""" + fname + """';""")
            
            connection.execute(query)
            
        # modify date strings to be mysql "date" types
        query = """UPDATE """ + table_name + """ SET Date = str_to_date( Date, '%%m/%%d/%%Y');"""
        connection.execute(query)
        # change the type to "date", explicitly
        query = """ALTER TABLE """ + table_name + """ MODIFY COLUMN Date DATE;"""
        connection.execute(query)
        # add a primary key
        #query = "ALTER TABLE " + table_name + " ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY;"
        #connection.execute(query)


using file  ./data/odds_data/2010.csv
using file  ./data/odds_data/2011.csv
using file  ./data/odds_data/2012.csv
using file  ./data/odds_data/2013.csv
using file  ./data/odds_data/2014.csv
using file  ./data/odds_data/2015.csv
using file  ./data/odds_data/2016.csv
using file  ./data/odds_data/2017.csv

In [ ]: