Learning SQLAlchemy

Practical Examples

At this point, you're probably tired of my expounding on how to use databases and would probably like some examples on what they can do. I've put together what is hopefully a nice set of examples for you using the tools I've already presented in addition to a few more I'll throw into the mix.

A Full Database

I'm defining a full database here for your convenience. It will include all the tools that we've used so far and some hopefully useful examples of their use together as a functioning whole. For your first time through this, don't worry too much about the minutiae of each class. Just execute all the way through to the next section, so you can play around a bit with a full database. Later, you should definitely come back and check out all the new pieces of code and look them up in SQLAlchemy's documentation.



In [1]:

    
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.ext.declarative import declarative_base
import math

engine = create_engine ('sqlite:///:memory:')
Base = declarative_base ()

# We're going to be using some trig functions in the database to do some great circle math, so we need to create these functions
raw_con = engine.raw_connection()
raw_con.create_function("cos", 1, math.cos)
raw_con.create_function("acos", 1, math.acos)

We'll now define a full Star object. We'll give it a convenient init method, so you can use RA and Dec tuples instead of floats. We'll also implement some great circle math, so that you can query based on proximity of stars to sky coordinates and other stars.



In [2]:

    
from sqlalchemy import func
from sqlalchemy.ext.hybrid import hybrid_property, hybrid_method

import numpy as np

class Star (Base):    
    __tablename__ = 'stars'
    
    id = Column (Integer, primary_key = True)

    name = Column (String, unique = True)
    ra = Column (Float)
    dec = Column (Float)
    cos_dec = Column (Float)
    sin_dec = Column (Float)
    
    def __init__ (self, *args, **kwargs):
        try:
            kwargs ["ra"] = float (kwargs ["ra"])
        except TypeError:
            kwargs ["ra"] = self.raToFloat (*(kwargs ["ra"]))
            
        try:
            kwargs ["dec"] = float (kwargs ["dec"])
        except TypeError:
            kwargs ["dec"] = self.decToFloat (*(kwargs ["dec"]))
            
        kwargs ["cos_dec"] = math.cos (kwargs ["dec"])
        kwargs ["sin_dec"] = math.sin (kwargs ["dec"])
        
        super (Star, self).__init__ (*args, **kwargs)

    def __repr__ (self):
        return "<Star Object %s at (RA=%f, DEC=%f)>" % (self.name, self.ra, self.dec)
    
    @classmethod
    def load (cls, session, data, tags = None, convert = False, **kwargs):
        filters = ('u', 'g', 'r', 'i', 'z')
        if tags is None:
            tags= []

        for datum in data:
            if convert:
                star = Star (name = str (datum ['objid']), ra = np.radians (datum ['ra']), dec = np.radians (datum ['dec']))
            else:
                star = Star (name = str (datum ['objid']), ra = datum ['ra'], dec = datum ['dec'])
            session.add (star)
            for tag in tags:
                star.tags.append (Tag.get (session, tag))
            for filter in filters:
                phot = Photometry (filter = filter, mag = datum [filter])
                phot.star = star

        session.commit ()
        
    @staticmethod
    def raToFloat (hours, minutes = 0.0, seconds = 0.0):
        return np.radians ((hours + minutes / 60.0 + seconds / 3600.0) * 360 / 24)
    
    @staticmethod
    def decToFloat (degrees, arcmins = 0.0, arcsecs = 0.0):
        return (np.radians (degrees + arcmins / 60.0 + arcsecs / 3600.0))
    
    @hybrid_method
    def greatCircleDistance (self, other):
        return math.acos ((self.cos_dec * other.cos_dec * math.cos (self.ra - other.ra) + self.sin_dec * other.sin_dec) * (1.0 - 2.e-16))

    @greatCircleDistance.expression
    def greatCircleDistance (cls, other):
        return func.acos ((cls.cos_dec * other.cos_dec * func.cos (cls.ra - other.ra) + cls.sin_dec * other.sin_dec) * (1.0 - 2.e-16))

We also implement a SkyLocation object. This is designed to hold an ra and dec and we can use it to calculate distances between fixed sky locations and individual stars. This can be used, for example, to calculate the angular separation between a star and the zenith to filter by airmass.



In [14]:

    
from sidereal import LatLon, SiderealTime, AltAz
from datetime import datetime
import numpy as np

class SkyLocation (object):
    
    def __init__ (self, ra, dec):
        try:
            self.ra = float (ra)
        except TypeError:
            self.ra = Star.raToFloat (*ra)
            
        try:
            self.dec = float (dec)
        except TypeError:
            self.dec = Star.decToFloat (*dec)
            
        self.cos_dec = math.cos (self.dec)
        self.sin_dec = math.sin (self.dec)
        
        super (SkyLocation, self).__init__ ()

    def greatCircleDistance (self, other):
        return math.acos (self.cos_dec * other.cos_dec * math.cos(self.ra - other.ra) + self.sin_dec * other.sin_dec)
    
    @classmethod
    def zenith (cls, latitude, longitude, time = None):
        """
        time should be a datetime object or None to use the current time
        """
        if time is None:
            time = datetime.now ()
        altaz = AltAz (math.pi / 2.0, 0.0)
        radec = altaz.raDec (SiderealTime.fromDatetime (time), LatLon (np.radians (latitude), np.radians (longitude)))
        return cls (radec.ra, radec.dec)

We also implement the Photometry class from the One to Many lesson to show some applications of the photometry class.



In [4]:

    
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship, backref

class Photometry (Base):    
    __tablename__ = 'photometry'

    id = Column (Integer, primary_key = True)

    filter = Column (String)
    mag = Column (Float)
    date = Column (Float)
    comments = Column (String)
    
    star_id = Column (Integer, ForeignKey("stars.id"))
    star = relationship ("Star", backref = backref ("photometry_set", 
                                   cascade="all, delete-orphan", lazy = "dynamic"))

We can also implement the Tag class from the Many to Many lesson to show some of its applications.



In [5]:

    
from sqlalchemy import Table
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.orm.exc import NoResultFound

star_tag = Table('star_tag', Base.metadata,
    Column('star_id', Integer, ForeignKey('stars.id')),
    Column('tag_id', Integer, ForeignKey('tags.id'))
)

class Tag (Base):
    __tablename__ = 'tags'
    
    id = Column(Integer, primary_key = True)
    tag = Column(String(50), nullable=False, unique=True)
    
    stars = relationship ('Star', secondary = star_tag, backref = 'tags')

    @classmethod
    def get (cls, session, tag):
        try:
            return session.query (cls).filter_by (tag = tag).one ()
        except NoResultFound:
            return cls (tag = tag)
        
    def __repr__ (self):
        return "<Tag %s>" % self.tag



In [6]:

    
from sqlalchemy.orm import sessionmaker

Base.metadata.create_all (engine)
Session = sessionmaker (bind = engine)

Populating the Database

We've now set up a database, and I'd like to show you a few neat things you can do with it, but first, we need to populate it. I have three datasets for you: some M3 photometry, some M13 photometry, and a list of the 10 brightest stars in the night sky. We're going to put all three into the database and look at some applications.



In [7]:

    
import numpy as np

session = Session ()

# Load the M3 dataset from a csv file using numpy, tag all the data as "M3"
Star.load (session, np.genfromtxt ("sdss-dr7-m3.csv", names = True, delimiter = ',', dtype = None), tags = ["M3"], convert = True)



In [8]:

    
import numpy as np

# Load the M13 dataset from a csv file using numpy, tag all the data as "M13"
Star.load (session, np.genfromtxt ("sdss-dr7-m13.csv", names = True, delimiter = ',', dtype = None), tags = ["M13"], convert = True)



In [9]:

    
# The 10 brightest stars in the night sky, tagged as "Bright"
sirius = Star (name = "Sirius", ra = (6, 45, 08.9), dec = (-16, 42, 58))
sirius.tags.append (Tag.get (session, "Bright"))
session.add (sirius)

canopus = Star (name = "Canopus", ra = (6, 23, 57.1), dec = (-52, 41, 45))
canopus.tags.append (Tag.get (session, "Bright"))
session.add (canopus)

rigil = Star (name = "Rigil Kentaurus", ra = (14, 39, 35.9), dec = (-60, 50, 7))
rigil.tags.append (Tag.get (session, "Bright"))
session.add (rigil)

arcturus = Star (name = "Arcturus", ra = (14, 15, 39.7), dec = (19, 10, 57))
arcturus.tags.append (Tag.get (session, "Bright"))
session.add (arcturus)

vega = Star (name = "Vega", ra = (18, 36, 56.3), dec = (38, 47, 1))
vega.tags.append (Tag.get (session, "Bright"))
session.add (vega)

capella = Star (name = "Capella", ra = (5, 16, 41.4), dec = (45, 59, 53))
capella.tags.append (Tag.get (session, "Bright"))
session.add (capella)

rigel = Star (name = "Rigel", ra = (5, 14, 32.3), dec = (-8, 12, 6))
rigel.tags.append (Tag.get (session, "Bright"))
session.add (rigel)

procyon = Star (name = "Procyon", ra = (7, 39, 18.1), dec = (5, 13, 30))
procyon.tags.append (Tag.get (session, "Bright"))
session.add (procyon)

achernar = Star (name = "Achernar", ra = (1, 37, 42.9), dec = (-57, 14, 12))
achernar.tags.append (Tag.get (session, "Bright"))
session.add (achernar)

betelgeuse = Star (name = "Betelgeuse", ra = (5, 55, 10.3), dec = (7, 24, 25))
betelgeuse.tags.append (Tag.get (session, "Bright"))
session.add (betelgeuse)

session.commit ()

Using the Database

Now that we've loaded the data, let's try to actually use it. For most of this, we'll actually be plotting the data, so let's import matplotlib and use the matplotlib magic to make figures appear inline



In [10]:

    
import matplotlib.pyplot as plt
%matplotlib inline

1. Plot All Targets on Sky

We can do a very simple query on the database to plot the location of every star on the sky. We'll color them by tag to make the tags more apparent. Here, we use a Mollweide projection, but you could get the same effect by just plotting RA and Dec.



In [11]:

    
import numpy as np

query = session.query (Star, Tag).join (Tag, Star.tags)
results = query.all ()

tagToColor = {"M13": "blue", "M3": "green", "Bright": "white"}

ras = np.array ([star.ra for star, tag in results])
decs = np.array ([star.dec for star, tag in results])
colors = np.array ([tagToColor [tag.tag] for star, tag in results])

fig = plt.figure ()
ax = plt.subplot (111, projection = "mollweide")

ax.scatter (ras - math.pi, decs, c = colors, s = 50)

ax.grid (True)









    



/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/matplotlib/projections/geo.py:489: RuntimeWarning: invalid value encountered in arcsin
  theta = np.arcsin(y / np.sqrt(2))

2. Filter by Dataset and by Functions of Color and Magnitude

We can use the tags to filter and color by dataset, and we can even add filters that are functions of both color and magnitude.

There are a few interesting features we're using here. The big one is aliasing. An alias is a reference to a table in the database, but occasionally, we may want to filter by more than one entry in a database, e.g. when filtering by color. To filter by color, we need two Photometry type objects, and the way we specify this is with aliases.



In [12]:

    
import numpy as np
from sqlalchemy.orm import aliased

# Set up the aliases to the Photometry object
photog = aliased (Photometry)
photor = aliased (Photometry)

# We'll need to set up a query that returns two photometry objects and a Tag; we join the Tag table to the Star table
query = session.query (photog, photor, Tag).join (Tag, Star.tags)
# We join the photog Photometry table to the Star table and then filter by forcing photog to be a "g" filter
query = query.join (photog, Star.photometry_set).filter (photog.filter == "g")
# We repeart for photor
query = query.join (photor, Star.photometry_set).filter (photor.filter == "r")
# We then construct a filter to filter based on color and magnitude
# Try commenting out this next line
query = query.filter (5 * (photog.mag - photor.mag) + 20 > photog.mag)

# We separate the query into one for each cluster
m3 = query.filter (Star.tags.contains (Tag.get (session, "M3"))).all ()
m13 = query.filter (Star.tags.contains (Tag.get (session, "M13"))).all ()

m3gmags = np.array ([photog.mag for photog, photor, tag in m3])
m3rmags = np.array ([photor.mag for photog, photor, tag in m3])

m13gmags = np.array ([photog.mag for photog, photor, tag in m13])
m13rmags = np.array ([photor.mag for photog, photor, tag in m13])

fig = plt.figure ()
ax = plt.subplot (111)

ax.scatter (m3gmags - m3rmags, m3gmags, s = 5, lw = 0, c = "green", label = "M3")
ax.scatter (m13gmags - m13rmags, m13gmags, s = 5, lw = 0, c = "blue", label = "M13")

ax.set_xlabel ("g - r")
ax.set_ylabel ("g")

ax.set_ylim ((25, 15))
ax.set_xlim ((0, 2))

ax.legend ()









    Out[12]:





<matplotlib.legend.Legend at 0x10a5a8ef0>

3. Querying for Visible Stars and Prioritizing by Airmass

We can use our database to query for any stars in our database that are currently observable, and we can prioritize them by airmass (or by whatever metric you desire). This could be useful in the construction of automated queues that choose the best targets on any given night.



In [15]:

    
import math

# We generate a SkyLocation object at the current zenith and specify our desired maximum airmass
zenith = SkyLocation.zenith (36.974117, -122.030796)
maxAirmass = 2.0

# We want the query to return a Star object and its distance from the zenith
query = session.query (Star, Star.greatCircleDistance (zenith)).filter (Star.tags.contains (Tag.get (session, "Bright")))
# We filter by our maximum airmass, making sure the angle from zenith doesn't exceed the desired airmass
query = query.filter (Star.greatCircleDistance (zenith) < math.acos (1.0 / maxAirmass))
# We order by the distance from the zenith so that the stars appear in order of the best potential observations
for star, dist in query.order_by (Star.greatCircleDistance (zenith)):
    print ("Found visible star, %s, with airmass %f" % (star.name, 1.0 / math.cos (dist)))









    



Found visible star, Vega, with airmass 1.237511

4. Building Statistics on Clustering

We can use the angular distance math to calculate statistics of clustering in the globular clusters in our database as well.



In [16]:

    
from sqlalchemy.orm import aliased
from sqlalchemy import func

# We need two star aliases because we're comparing distances
first = aliased (Star)
second = aliased (Star)

# We want the query to return the star and the distance to the nearest neighbor (not including itself)
query = session.query (first, func.min (first.greatCircleDistance (second))).filter (first.id != second.id)
# Both stars should be members of M3
query = query.filter (first.tags.contains (Tag.get (session, "M3"))).filter (second.tags.contains (Tag.get (session, "M3")))
# Since we're returning the min of the collection, we need to group by the first star.
# This would return a group of results, except that the min function now takes the minimum of each group
query = query.order_by (first.greatCircleDistance (second)).group_by (first)

results = query.all ()

distances = np.array ([distance for star, distance in results])

Now we have a numpy array that contains, for every star, the distance to the nearest star in order of increasing separation. We can do any number of things with this information. For example, we can look at the probability density function of this information.



In [17]:

    
import numpy as np

values, base = np.histogram (distances, bins = np.logspace(np.log10 (min (distances)), np.log10 (max (distances)), 50))

fig = plt.figure ()
ax = plt.subplot (111)

ax.plot (base [:-1], values)

ax.set_xscale ("log")
# ax.set_yscale ("log")

ax.set_xlabel ("Angular Separation to Nearest Neighbor (Radians)")









    Out[17]:





<matplotlib.text.Text at 0x109df1a90>

Try It Yourself!

Now, it's your turn. Try querying the database and making some interesting plots of your own. Or add your own dataset and try out some of the tools you've seen here.



In [47]:

Conclusion

Hopefully, I've provided you with a series of useful tools that can be applied to your own unique data set, whatever that may be. Databases are extremely powerful ways to query and organize data, but like any tool, their features can be abused.

First, there are many examples (particularly with homogenous data sets) where a database is the extremely powerful tool you don't need. A small, homogenous data table is almost always easier to deal with in the form of numpy arrays. Relational databases are much more useful when dealing with large, heterogeneous data sets. In addition, excessively building relationships can slow down your database, particularly if your database contains huge numbers of entries.

Tha

A final note: all the work here has been done with sqlite databases, which are nice for learning purposes but do not scale well to production-level code. Now that you're familiar with the basics, I definitely recommend looking through some of the SQLAlchemy documentation here.

Good luck!



In [ ]: