Heroic Joins with SQL

In this lab, we will explore the Marvel Superhero Universe using the concept of SQL JOINS (in particular, we'll be using sqlite).

(from http://vignette1.wikia.nocookie.net/marveldatabase/images/e/e1/The_Marvel_Universe.png/revision/latest?cb=20110513164401)

There are a lot of superheroes in the Marvel Universe, and a lot of superhero movies, so we'll just be examining a small portion today. As you'll see, we are storing the information about the superheroes separately from the information about the superhero movies. In addition, since superheroes frequently use aliases rather than their real names, it makes sense that we might want to store those real names separately from their superhero personae.

When information is stored in separate tables, the way to recombine the information is to do a join. In SQL, JOIN is a means of combining fields from two tables by using values common to each.

We commonly use Venn diagrams to represent the behavior of SQL joins, such as this one:

Here are some additional resources that you may find helpful as you are learning about and practicing SQL JOINS (and using sqlite):

Get Connected

Ok, let's get started. The first thing we need to do is import sqlite:


In [ ]:
import sqlite3

Next, we need to establish a connection to the database. In this case, I have already built a database for you called "superheroes.db", and it will be your job to implement different kinds of joins in order to answer the questions below.


In [ ]:
DBPATH = 'dbmaker/superheroes.db'
conn = sqlite3.connect(DBPATH)

Here is what the database looks like:

Next we'll create a cursor object to help us execute our SQL statements:


In [ ]:
cursor = conn.cursor()

The JOIN statements we write will all follow roughly the same pattern that our SELECT statements followed in this morning's workshop. That is to say they will look something like:

cursor.execute("SELECT [information] FROM [first_table] [SOME JOIN] [second_table] ON firsttable.id = secondtable.id")
print cursor.fetchall()

Who's the star of the movie?

How can we query our database to determine who is the star of the movie? In this case, let's use LEFT JOIN to pair the title of each movie with the name of its main character.


In [ ]:
# Write the SQL code here

What's your real name?

We can also use LEFT JOIN to query our database to find out a superhero's real name. Let's pair the real name of each character with their superhero alias.


In [ ]:
# Write the SQL code here:

Who's actually a superhero?

Let's use an INNER JOIN to select only characters who have a superhero alias.


In [ ]:
# Write the SQL code here:

Which superheroes have their own movies?

Not all superheroes have their own movies. Let's use another INNER JOIN to select only superheroes who have a lead role in a movie.


In [ ]:
# Write the SQL code here:

What's in a name?

Which movies are named things other than (just) the name of the lead superhero? Let's experiment with using a LEFT OUTER JOIN to select only the movies that don't share the same name as the lead superhero. Hint: you will also need a WHERE clause!


In [ ]:
# Write the SQL code here:

...and which superheroes don't have movies named (strictly) after them? Let's use another LEFT OUTER JOIN combined with a WHERE clause to select only the superheroes that don't share a name with a movie title.


In [ ]:
# Write the SQL code here:

Stretch Goals

Right Joins and Full Outer Joins

Actually... you can't do them in sqlite

:(

If we were to use Postgres instead, what kinds of questions would we be able to ask of our superhero database using right and full outer joins?

Cartesian Products

There be dragons!

What is a Cartesian product and why can they be potentially dangerous when we are writing SQL queries?