This notebook will walk you through each step of creating a SQLite Database for the kaggle.com March Machine Learning Mania 2017 competition.

*In order to use this notebook you need to obtain the data from kaggle.com

Create a Database

1. Download DB Browser for SQLite at http://sqlitebrowser.org/ 
2. Open the program and make a New Database, call it Data.db

Import sqlite and pandas modules


In [1]:
import sqlite3 as sql
import pandas as pd

Make a connection to your database


In [2]:
# make a variable containing the file path (as a raw string) to your data.db file, below is an example
db = r'C:\Users\...\data.db'
# the variable conn contains the connection to your database
conn = sql.connect(db)

Create pandas dataframes for each .csv data file provided by kaggle.com, then add each dataframe to the database as a table in the database.


In [ ]:
# variables containing the file paths to the .csv data (this will vary depending on where you saved the files)
file1 = r'C:\Users\user\Desktop\Teams.csv'
file2 = r'C:\Users\user\Desktop\Seasons.csv'
file3 = r'C:\Users\user\Desktop\RegularSeasonCompactResults.csv'
file4 = r'C:\Users\user\Desktop\RegularSeasonDetailedResults.csv'
file5 = r'C:\Users\user\Desktop\TourneyCompactResults.csv'
file6 = r'C:\Users\user\Desktop\TourneyDetailedResults.csv'
file7 = r'C:\Users\user\Desktop\TourneySeeds.csv'
file8 = r'C:\Users\user\Desktop\TourneySlots.csv'

In [4]:
# list of files
files = [file1,file2, file3, file4, file5, file6, file7, file8]
# list of the future database Table names, with a place holder in index 0
table_names = ['Teams','Seasons','RegularSeasonCompactResults','RegularSeasonDetailedResults',
              'TourneyCompactResults', 'TourneyDetailedResults','TourneySeeds','TourneySlots']

In [5]:
# adds the csv data to the database as tables
for i in range(0,8):
    # create dataframe for each of the .csv files
    file_name = files[i]
    df = pd.read_csv(file_name)
    # adds the dataframe to the database
    table_name = table_names[i]
    df.to_sql(table_name, conn, if_exists='append', index=False)

In [6]:
# closes the connection to the database
conn.close()

That's it! You did it!