In this lab, we will explore the Marvel Superhero Universe using the concept of SQL JOINS
(in particular, we'll be using sqlite
).
There are a lot of superheroes in the Marvel Universe, and a lot of superhero movies, so we'll just be examining a small portion today. As you'll see, we are storing the information about the superheroes separately from the information about the superhero movies. In addition, since superheroes frequently use aliases rather than their real names, it makes sense that we might want to store those real names separately from their superhero personae.
When information is stored in separate tables, the way to recombine the information is to do a join. In SQL, JOIN is a means of combining fields from two tables by using values common to each.
We commonly use Venn diagrams to represent the behavior of SQL joins, such as this one:
Here are some additional resources that you may find helpful as you are learning about and practicing SQL JOINS (and using sqlite
):
Ok, let's get started. The first thing we need to do is import sqlite
:
In [ ]:
import sqlite3
Next, we need to establish a connection to the database. In this case, I have already built a database for you called "superheroes.db", and it will be your job to implement different kinds of joins in order to answer the questions below.
In [ ]:
DBPATH = 'dbmaker/superheroes.db'
conn = sqlite3.connect(DBPATH)
Here is what the database looks like:
Next we'll create a cursor object to help us execute our SQL statements:
In [ ]:
cursor = conn.cursor()
The JOIN statements we write will all follow roughly the same pattern that our SELECT statements followed in this morning's workshop. That is to say they will look something like:
cursor.execute("SELECT [information] FROM [first_table] [SOME JOIN] [second_table] ON firsttable.id = secondtable.id")
print cursor.fetchall()
How can we query our database to determine who is the star of the movie? In this case, let's use LEFT JOIN to pair the title of each movie with the name of its main character.
In [ ]:
# Write the SQL code here
In [ ]:
# Write the SQL code here:
In [ ]:
# Write the SQL code here:
In [ ]:
# Write the SQL code here:
In [ ]:
# Write the SQL code here:
...and which superheroes don't have movies named (strictly) after them? Let's use another LEFT OUTER JOIN combined with a WHERE clause to select only the superheroes that don't share a name with a movie title.
In [ ]:
# Write the SQL code here:
Actually... you can't do them in sqlite
:(
If we were to use Postgres instead, what kinds of questions would we be able to ask of our superhero database using right and full outer joins?
There be dragons!
What is a Cartesian product and why can they be potentially dangerous when we are writing SQL queries?