SeekWell

SeekWell is a package for quickly and easily querying SQL databases in Python. It was made with data analysts in mind and plays well with Jupyter notebooks. This notebook is a little tutorial to get you started working with SQL databases in under 5 minutes.

SeekWell is a higher level library built on top of SQLAlchemy, an amazing library that does all the heavy lifting. SeekWell is designed to get our of your way and focus on retrieving the data you want from your database.

Query results are retrieved in a lazy manner. That is, they aren't returned until you ask for them. Records are cached once you get them, so you only ever run the query once. SeekWell also provides methods for inspecting the tables and columns in your database.


In [1]:
from seekwell import Database

Connect to the database

SeekWell uses SQLAlchemy underneath to connect to literally any database you can throw at it. You just need the appropriate engines, for example, psychopg for PostGres. Make sure to check out SQLAlchemy's documentation for connecting to databases.

In SeekWell, you just need to import the Database class and create a new Database object. The path required is defined by SQLAlchemy (link to documentation here).

Here I'll load a database of European soccer matches, teams, and players available from Kaggle Datasets.


In [2]:
db = Database('sqlite:///database.sqlite')

I've found database introspection to be really useful, that is, listing out tables and columns. When connected to a Database, you can get a list of tables.


In [3]:
db.table_names


Out[3]:
['Country',
 'League',
 'country',
 'Match',
 'Player',
 'Team',
 'Player_Attributes',
 'Team_Attributes',
 'sqlite_sequence']

And you can get a table to inspect it.


In [4]:
table = db['Team']
table


Out[4]:
Table(Team, Database('sqlite:///database.sqlite'))

In [5]:
table.column_names


Out[5]:
['id', 'team_api_id', 'team_fifa_api_id', 'team_long_name', 'team_short_name']

To check out the data in a table, use the head method to print out the first few rows.


In [6]:
table.head()


Out[6]:
id team_api_id team_fifa_api_id team_long_name team_short_name
1 9987 673 KRC Genk GEN
2 9993 675 Beerschot AC BAC
3 10000 15005 SV Zulte-Waregem ZUL
4 9994 2007 Sporting Lokeren LOK
5 9984 1750 KSV Cercle Brugge CEB
6 8635 229 RSC Anderlecht AND
7 9991 674 KAA Gent GEN
8 9998 1747 RAEC Mons MON
9 7947 None FCV Dender EH DEN
10 9985 232 Standard de Liège STL

In a notebook, rows are printed out in an HTML table for nice viewing. In the terminal, rows are printed out as an ASCII table.


In [7]:
print(table.head())


 id | team_api_id | team_fifa_api_id | team_long_name    | team_short_name 
----+-------------+------------------+-------------------+-----------------
 1  | 9987        | 673              | KRC Genk          | GEN             
 2  | 9993        | 675              | Beerschot AC      | BAC             
 3  | 10000       | 15005            | SV Zulte-Waregem  | ZUL             
 4  | 9994        | 2007             | Sporting Lokeren  | LOK             
 5  | 9984        | 1750             | KSV Cercle Brugge | CEB             
 6  | 8635        | 229              | RSC Anderlecht    | AND             
 7  | 9991        | 674              | KAA Gent          | GEN             
 8  | 9998        | 1747             | RAEC Mons         | MON             
 9  | 7947        | None             | FCV Dender EH     | DEN             
 10 | 9985        | 232              | Standard de Liège | STL             

If you're using a database with schemas, you can get a list of the schema names with db.schema_names.


In [8]:
db.schema_names


Out[8]:
['main']

Querying

There we go, now you're connected to the database and it's ready to be queried. Data analysts are all about getting data and workign with it. Queries are run through the Database object's query method. It accepts a SQL statement as a string and returns a Records object.


In [9]:
records = db.query('SELECT * from Player limit 50')

The data isn't returned immediately, only when you request it.


In [10]:
records


Out[10]:
id player_api_id player_name player_fifa_api_id birthday height weight

Use fetch to get the data. Calling fetch without any arguments will return all the rows. Passing in a number will return that many rows.


In [11]:
records.fetch(10)


Out[11]:
id player_api_id player_name player_fifa_api_id birthday height weight
1 505942 Aaron Appindangoye 218353 1992-02-29 00:00:00 182.88 187
2 155782 Aaron Cresswell 189615 1989-12-15 00:00:00 170.18 146
3 162549 Aaron Doran 186170 1991-05-13 00:00:00 170.18 163
4 30572 Aaron Galindo 140161 1982-05-08 00:00:00 182.88 198
5 23780 Aaron Hughes 17725 1979-11-08 00:00:00 182.88 154
6 27316 Aaron Hunt 158138 1986-09-04 00:00:00 182.88 161
7 564793 Aaron Kuhl 221280 1996-01-30 00:00:00 172.72 146
8 30895 Aaron Lennon 152747 1987-04-16 00:00:00 165.1 139
9 528212 Aaron Lennox 206592 1993-02-19 00:00:00 190.5 181
10 101042 Aaron Meijers 188621 1987-10-28 00:00:00 175.26 170

The data is cached in records.rows


In [12]:
records.rows


Out[12]:
[(1, 505942, 'Aaron Appindangoye', 218353, '1992-02-29 00:00:00', 182.88, 187),
 (2, 155782, 'Aaron Cresswell', 189615, '1989-12-15 00:00:00', 170.18, 146),
 (3, 162549, 'Aaron Doran', 186170, '1991-05-13 00:00:00', 170.18, 163),
 (4, 30572, 'Aaron Galindo', 140161, '1982-05-08 00:00:00', 182.88, 198),
 (5, 23780, 'Aaron Hughes', 17725, '1979-11-08 00:00:00', 182.88, 154),
 (6, 27316, 'Aaron Hunt', 158138, '1986-09-04 00:00:00', 182.88, 161),
 (7, 564793, 'Aaron Kuhl', 221280, '1996-01-30 00:00:00', 172.72, 146),
 (8, 30895, 'Aaron Lennon', 152747, '1987-04-16 00:00:00', 165.1, 139),
 (9, 528212, 'Aaron Lennox', 206592, '1993-02-19 00:00:00', 190.5, 181),
 (10, 101042, 'Aaron Meijers', 188621, '1987-10-28 00:00:00', 175.26, 170)]

You can get rows using slices too.


In [13]:
records[5:15]


Out[13]:
id player_api_id player_name player_fifa_api_id birthday height weight
6 27316 Aaron Hunt 158138 1986-09-04 00:00:00 182.88 161
7 564793 Aaron Kuhl 221280 1996-01-30 00:00:00 172.72 146
8 30895 Aaron Lennon 152747 1987-04-16 00:00:00 165.1 139
9 528212 Aaron Lennox 206592 1993-02-19 00:00:00 190.5 181
10 101042 Aaron Meijers 188621 1987-10-28 00:00:00 175.26 170
11 23889 Aaron Mokoena 47189 1980-11-25 00:00:00 182.88 181
12 231592 Aaron Mooy 194958 1990-09-15 00:00:00 175.26 150
13 163222 Aaron Muirhead 213568 1990-08-30 00:00:00 187.96 168
14 40719 Aaron Niguez 183853 1989-04-26 00:00:00 170.18 143
15 75489 Aaron Ramsey 186561 1990-12-26 00:00:00 177.8 154

In [14]:
records[-5:]


Out[14]:
id player_api_id player_name player_fifa_api_id birthday height weight
46 409003 Abdoulaye Keita 212280 1994-01-05 00:00:00 175.26 165
47 37280 Abdoulaye Meite 41745 1980-10-06 00:00:00 185.42 181
48 439366 Abdoulaye Toure 210450 1994-03-03 00:00:00 187.96 170
49 148827 Abdoulwahid Sissoko 189568 1990-03-20 00:00:00 182.88 165
50 173011 Abdourahman Dampha 197901 1991-12-27 00:00:00 182.88 168

Or get all the rows by calling fetch with no arguments...


In [15]:
records.fetch()


Out[15]:
id player_api_id player_name player_fifa_api_id birthday height weight
1 505942 Aaron Appindangoye 218353 1992-02-29 00:00:00 182.88 187
2 155782 Aaron Cresswell 189615 1989-12-15 00:00:00 170.18 146
3 162549 Aaron Doran 186170 1991-05-13 00:00:00 170.18 163
4 30572 Aaron Galindo 140161 1982-05-08 00:00:00 182.88 198
5 23780 Aaron Hughes 17725 1979-11-08 00:00:00 182.88 154
6 27316 Aaron Hunt 158138 1986-09-04 00:00:00 182.88 161
7 564793 Aaron Kuhl 221280 1996-01-30 00:00:00 172.72 146
8 30895 Aaron Lennon 152747 1987-04-16 00:00:00 165.1 139
9 528212 Aaron Lennox 206592 1993-02-19 00:00:00 190.5 181
10 101042 Aaron Meijers 188621 1987-10-28 00:00:00 175.26 170
11 23889 Aaron Mokoena 47189 1980-11-25 00:00:00 182.88 181
12 231592 Aaron Mooy 194958 1990-09-15 00:00:00 175.26 150
13 163222 Aaron Muirhead 213568 1990-08-30 00:00:00 187.96 168
14 40719 Aaron Niguez 183853 1989-04-26 00:00:00 170.18 143
15 75489 Aaron Ramsey 186561 1990-12-26 00:00:00 177.8 154
16 597948 Aaron Splaine 226014 1996-10-13 00:00:00 172.72 163
17 161644 Aaron Taylor-Sinclair 213569 1991-04-08 00:00:00 182.88 176
18 23499 Aaron Wilbraham 2335 1979-10-21 00:00:00 190.5 159
19 120919 Aatif Chahechouhe 187939 1986-07-02 00:00:00 175.26 150
20 46447 Abasse Ba 156626 1976-07-12 00:00:00 187.96 185
21 167027 Abdelaziz Barrada 192274 1989-06-19 00:00:00 177.8 161
22 245653 Abdelfettah Boukhriss 202425 1986-10-22 00:00:00 185.42 161
23 128456 Abdelhamid El Kaoutari 188145 1990-03-17 00:00:00 180.34 161
24 42664 Abdelkader Ghezzal 178063 1984-12-05 00:00:00 182.88 172
25 425950 Abdellah Zoubir 212934 1991-12-05 00:00:00 180.34 161
26 38423 Abdelmajid Oulmers 52782 1978-09-12 00:00:00 172.72 143
27 3264 Abdelmalek Cherrad 51868 1981-01-14 00:00:00 185.42 165
28 467485 Abdelmalek El Hasnaoui 209399 1994-02-09 00:00:00 180.34 159
29 306735 Abdelouahed Chakhsi 210504 1986-10-01 00:00:00 182.88 170
30 41659 Abderrazak Jadid 149241 1983-06-01 00:00:00 177.8 157
31 31684 Abdeslam Ouaddou 33022 1978-11-01 00:00:00 190.5 181
32 32637 Abdessalam Benjelloun 177295 1985-01-28 00:00:00 187.96 179
33 563215 Abdou Diallo 225711 1996-05-04 00:00:00 182.88 159
34 41093 Abdou Traore 187048 1988-01-17 00:00:00 180.34 174
35 564712 Abdoul Ba 225050 1994-02-08 00:00:00 200.66 212
36 67334 Abdoul Karim Yoda 188232 1988-10-25 00:00:00 182.88 161
37 173955 Abdoul Razzagui Camara 193953 1990-02-20 00:00:00 177.8 157
38 39562 Abdoulay Konko 161999 1984-03-09 00:00:00 182.88 157
39 191784 Abdoulaye Ba 204826 1991-01-01 00:00:00 198.12 174
40 210400 Abdoulaye Bamba 199313 1990-04-25 00:00:00 182.88 150
41 201915 Abdoulaye Diaby 202330 1991-05-21 00:00:00 172.72 154
42 194479 Abdoulaye Diallo Sadio,22 204171 1990-12-28 00:00:00 182.88 168
43 189181 Abdoulaye Diallo 197233 1992-03-30 00:00:00 187.96 174
44 352887 Abdoulaye Doucoure 208135 1993-01-01 00:00:00 182.88 165
45 40005 Abdoulaye Faye 100329 1978-02-26 00:00:00 187.96 218
46 409003 Abdoulaye Keita 212280 1994-01-05 00:00:00 175.26 165
47 37280 Abdoulaye Meite 41745 1980-10-06 00:00:00 185.42 181
48 439366 Abdoulaye Toure 210450 1994-03-03 00:00:00 187.96 170
49 148827 Abdoulwahid Sissoko 189568 1990-03-20 00:00:00 182.88 165
50 173011 Abdourahman Dampha 197901 1991-12-27 00:00:00 182.88 168

Your statements can of course be as complex as you want. Using parameters in statements is possible using keyword arguments. This uses the SQLAlchemy text syntax, so read up on it here. Below is an example using :home_team as a parameter to filter for the desired home team in the statement.


In [16]:
statement = """
SELECT Match.date, 
       Match.home_team_goal, Match.away_team_goal,
       home_team.team_long_name AS home_team, 
       away_team.team_long_name AS away_team
FROM Match
JOIN Team AS home_team
  ON Match.home_team_api_id=home_team.team_api_id
JOIN Team AS away_team
  ON Match.away_team_api_id=away_team.team_api_id
WHERE home_team=:home_team
ORDER BY Match.date ASC
"""
records = db.query(statement, home_team='KRC Genk')
records.fetch()[:20]


Out[16]:
date home_team_goal away_team_goal home_team away_team
2008-08-17 00:00:00 1 1 KRC Genk Beerschot AC
2008-08-30 00:00:00 1 0 KRC Genk Sporting Lokeren
2008-09-21 00:00:00 0 1 KRC Genk Club Brugge KV
2008-10-04 00:00:00 2 1 KRC Genk KV Mechelen
2008-10-26 00:00:00 0 0 KRC Genk Standard de Liège
2008-11-15 00:00:00 1 1 KRC Genk KSV Roeselare
2008-11-29 00:00:00 3 2 KRC Genk KSV Cercle Brugge
2008-12-13 00:00:00 1 0 KRC Genk Sporting Charleroi
2009-01-24 00:00:00 2 0 KRC Genk RAEC Mons
2009-02-07 00:00:00 1 2 KRC Genk SV Zulte-Waregem
2009-02-20 00:00:00 1 1 KRC Genk Royal Excel Mouscron
2009-03-07 00:00:00 4 3 KRC Genk FCV Dender EH
2009-03-21 00:00:00 3 0 KRC Genk Tubize
2009-04-04 00:00:00 1 4 KRC Genk KVC Westerlo
2009-04-17 00:00:00 2 2 KRC Genk KAA Gent
2009-05-03 00:00:00 0 1 KRC Genk KV Kortrijk
2009-05-16 00:00:00 0 2 KRC Genk RSC Anderlecht
2009-08-15 00:00:00 1 1 KRC Genk KAA Gent
2009-08-30 00:00:00 1 1 KRC Genk Beerschot AC
2009-09-18 00:00:00 1 2 KRC Genk Sporting Charleroi

Exporting

You can export your records as a CSV file or a Pandas DataFrame.


In [17]:
records.to_csv('KRC_Genk_games.csv')

In [18]:
df = records.to_pandas()
df


Out[18]:
date home_team_goal away_team_goal home_team away_team
0 2008-08-17 00:00:00 1 1 KRC Genk Beerschot AC
1 2008-08-30 00:00:00 1 0 KRC Genk Sporting Lokeren
2 2008-09-21 00:00:00 0 1 KRC Genk Club Brugge KV
3 2008-10-04 00:00:00 2 1 KRC Genk KV Mechelen
4 2008-10-26 00:00:00 0 0 KRC Genk Standard de Liège
5 2008-11-15 00:00:00 1 1 KRC Genk KSV Roeselare
6 2008-11-29 00:00:00 3 2 KRC Genk KSV Cercle Brugge
7 2008-12-13 00:00:00 1 0 KRC Genk Sporting Charleroi
8 2009-01-24 00:00:00 2 0 KRC Genk RAEC Mons
9 2009-02-07 00:00:00 1 2 KRC Genk SV Zulte-Waregem
10 2009-02-20 00:00:00 1 1 KRC Genk Royal Excel Mouscron
11 2009-03-07 00:00:00 4 3 KRC Genk FCV Dender EH
12 2009-03-21 00:00:00 3 0 KRC Genk Tubize
13 2009-04-04 00:00:00 1 4 KRC Genk KVC Westerlo
14 2009-04-17 00:00:00 2 2 KRC Genk KAA Gent
15 2009-05-03 00:00:00 0 1 KRC Genk KV Kortrijk
16 2009-05-16 00:00:00 0 2 KRC Genk RSC Anderlecht
17 2009-08-15 00:00:00 1 1 KRC Genk KAA Gent
18 2009-08-30 00:00:00 1 1 KRC Genk Beerschot AC
19 2009-09-18 00:00:00 1 2 KRC Genk Sporting Charleroi
20 2009-09-26 00:00:00 1 1 KRC Genk KSV Roeselare
21 2009-10-18 00:00:00 1 2 KRC Genk KV Mechelen
22 2009-11-02 00:00:00 2 0 KRC Genk KSV Cercle Brugge
23 2009-11-08 00:00:00 0 2 KRC Genk RSC Anderlecht
24 2009-12-05 00:00:00 0 0 KRC Genk KVC Westerlo
25 2009-12-19 00:00:00 2 2 KRC Genk SV Zulte-Waregem
26 2009-12-29 00:00:00 0 0 KRC Genk Sint-Truidense VV
27 2010-01-24 00:00:00 2 0 KRC Genk Club Brugge KV
28 2010-02-07 00:00:00 1 0 KRC Genk Standard de Liège
29 2010-02-19 00:00:00 3 1 KRC Genk Sporting Lokeren
... ... ... ... ... ...
76 2014-08-02 00:00:00 1 1 KRC Genk KSV Cercle Brugge
77 2014-08-08 00:00:00 0 0 KRC Genk Sporting Lokeren
78 2014-08-22 00:00:00 3 2 KRC Genk KAA Gent
79 2014-09-14 00:00:00 1 1 KRC Genk Club Brugge KV
80 2014-09-27 00:00:00 1 1 KRC Genk Sporting Charleroi
81 2014-10-19 00:00:00 3 1 KRC Genk KVC Westerlo
82 2014-10-28 00:00:00 3 0 KRC Genk Lierse SK
83 2014-11-09 00:00:00 0 2 KRC Genk Standard de Liège
84 2014-11-23 00:00:00 3 0 KRC Genk KV Mechelen
85 2014-12-13 00:00:00 3 0 KRC Genk KV Kortrijk
86 2015-01-23 00:00:00 2 0 KRC Genk Royal Excel Mouscron
87 2015-02-03 00:00:00 1 1 KRC Genk KV Oostende
88 2015-02-08 00:00:00 1 0 KRC Genk Waasland-Beveren
89 2015-02-22 00:00:00 0 1 KRC Genk RSC Anderlecht
90 2015-03-07 00:00:00 3 2 KRC Genk SV Zulte-Waregem
91 2015-07-25 00:00:00 3 1 KRC Genk Oud-Heverlee Leuven
92 2015-08-15 00:00:00 2 1 KRC Genk KVC Westerlo
93 2015-08-28 00:00:00 2 0 KRC Genk Sporting Charleroi
94 2015-09-18 00:00:00 3 1 KRC Genk KV Mechelen
95 2015-10-04 00:00:00 3 1 KRC Genk Standard de Liège
96 2015-10-23 00:00:00 0 4 KRC Genk Royal Excel Mouscron
97 2015-10-30 00:00:00 0 2 KRC Genk Sporting Lokeren
98 2015-11-28 00:00:00 0 1 KRC Genk KAA Gent
99 2015-12-06 00:00:00 0 0 KRC Genk RSC Anderlecht
100 2015-12-19 00:00:00 3 0 KRC Genk Sint-Truidense VV
101 2016-01-15 00:00:00 2 1 KRC Genk SV Zulte-Waregem
102 2016-01-30 00:00:00 1 0 KRC Genk KV Kortrijk
103 2016-02-13 00:00:00 6 1 KRC Genk Waasland-Beveren
104 2016-02-28 00:00:00 3 2 KRC Genk Club Brugge KV
105 2016-03-13 00:00:00 4 1 KRC Genk KV Oostende

106 rows × 5 columns


In [19]:
df = df.assign(point_diff=(df['home_team_goal'] - df['away_team_goal']))
df.groupby('away_team')['point_diff'].agg({'wins': lambda x: sum(x>0),
                                           'losses': lambda x: sum(x<0),
                                           'ties': lambda x: sum(x==0)})


Out[19]:
wins losses ties
away_team
Beerschot AC 3 0 2
Club Brugge KV 5 1 1
FCV Dender EH 1 0 0
KAA Gent 3 2 2
KAS Eupen 1 0 0
KSV Cercle Brugge 4 0 2
KSV Roeselare 0 0 2
KV Kortrijk 4 1 2
KV Mechelen 5 1 1
KV Oostende 1 0 1
KVC Westerlo 3 2 1
Lierse SK 4 0 0
Oud-Heverlee Leuven 2 0 1
RAEC Mons 3 0 0
RSC Anderlecht 0 6 1
Royal Excel Mouscron 1 1 1
SV Zulte-Waregem 4 1 2
Sint-Truidense VV 2 0 2
Sporting Charleroi 4 1 1
Sporting Lokeren 4 2 1
Standard de Liège 4 2 1
Tubize 1 0 0
Waasland-Beveren 2 0 1